vignettes/v02-scp.Rmd
v02-scp.Rmd
Last modified: 2021-08-12 13:40:23
Compiled: Thu Aug 12 15:59:38 2021
Mass spectrometry (MS)-based single-cell proteomics (SCP) is emerging thanks to several recent technological advances in sample preparation, liquid chromatography (LC) and MS (see Kelly (2020) for a comprehensive review). The improvements tackle the issues encountered when dealing with small sample amounts and focus on:
Two main strategies have currently been developed. On the one hand, label-free protocols acquire one single cell per MS run leading to accurate quantification but low sensitivity and throughput. On the other hand, label-based protocols multiplex several single-cell samples in one MS run leading to higher throughput (1000 cells per week). Another advantage of label-based protocols is the inclusion of a carrier sample, that is a sample containing between tens to hundreds of cells. Including a carrier increases sensitivity thanks to increased sample material, but at the cost of decreased quantification accuracy due to chemical noise (linked to sample labelling) and competition between the single-cell samples and the carrier sample during MS acquisition.
Although the scp
package can handle the data acquired from the two strategies, the exercise of this vignette will focus on the multiplexed strategy developed by Specht et al. (2021), called SCoPE2. An overview of the acquisition and data processing pipeline is depicted below.
scp
data frameworkSCP data is very similar to bulk proteomics data with the exception that the PSM data may be composed of tens to hundreds of separate acquisition runs. The QFeatures
class is able to store this acquisition structure by considering each MS run as a separate assay. Because the assays hold information about single cells, they are stored as SingleCellExperiment
objects (Lun and Risso (2020)) to create a direct interface to existing Bioconductor packages. Performing downstream analyses (such as dimension reduction, clustering, finding markers) very easy. The links between related features across different assays are also stored to facilitate manipulation and visualization of of PSM, peptide and protein data as we go along with the processing workflow.
scp
packageThe general workflow for processing SCP data is very similar to the workflow for bulk proteomics that we presented in the previous vignette. Therefore, the QFeatures
package already contains most of the tools required for the processing of SCP data. The scp
package implements the missing functions that are specifically designed for dealing with SCP data. Below, we provide the list of functions from scp
that extend the QFeatures
functions:
readSCP
: this is the main feature of the scp
package. It loads and formats standard data tables into QFeatures
objects ready for data processing.aggregateFeaturesOverAssays
: extends aggregateFeatures
to allow streamlined aggregation over multiple assayscomputeSCR
: compute the sample over carrier ratio (SCR), a useful metric for feature QCdivideByReference
: divide columns by a reference columnmedianCVperCell
: compute the median coefficient of variation (CV) per cell, a useful metric for single-cell QCnormalizeSCP
: extends QFeatures::normalize
to SingleCellExperiment
objectspep2qvalue
: compute q-values from posterior error probabilitiesrowDataToDF
: extract the rowData
of a QFeatures
object to a DataFrame
We will demonstrate those functions later in this vignette.
There are two input tables required for starting an analysis with scp
: the feature data table and the sample annotation table.
The feature data are generated after the identification and quantification of the MS spectra by a pre-processing software such as MaxQuant, ProteomeDiscoverer or MSFragger (the list of available software is actually much longer). We will here use as an example a data table that has been generated by MaxQuant. The table is available from the scp
package and is called mqScpData
(for MaxQuant-generated SCP data).
In this toy example, there are 1361 rows corresponding to features (quantified PSMs) and 149 columns corresponding to different data fields recorded by MaxQuant during the processing of the MS spectra.
Some of the columns contain quantitative data. In our example, those column names start with Reporter.intensity.
followed by a number.
(quantCols <- grep("Reporter.intensity.\\d", colnames(mqScpData), value = TRUE))
## [1] "Reporter.intensity.1" "Reporter.intensity.2" "Reporter.intensity.3"
## [4] "Reporter.intensity.4" "Reporter.intensity.5" "Reporter.intensity.6"
## [7] "Reporter.intensity.7" "Reporter.intensity.8" "Reporter.intensity.9"
## [10] "Reporter.intensity.10" "Reporter.intensity.11" "Reporter.intensity.12"
## [13] "Reporter.intensity.13" "Reporter.intensity.14" "Reporter.intensity.15"
## [16] "Reporter.intensity.16"
There are 16 columns with quantitative data because 16 labels were used during multiplexing. Let’s have a quick look into the quantitative data:
head(mqScpData[, quantCols])
## Reporter.intensity.1 Reporter.intensity.2 Reporter.intensity.3
## 1 61251 501.71 3731.3
## 2 58648 1099.80 2837.7
## 3 150350 3705.00 9361.0
## 4 27347 405.90 1525.2
## 5 84035 583.09 4092.3
## 6 44895 700.23 2283.0
## Reporter.intensity.4 Reporter.intensity.5 Reporter.intensity.6
## 1 1643.30 871.84 981.87
## 2 494.32 349.26 1030.50
## 3 0.00 1945.40 1188.60
## 4 0.00 0.00 318.74
## 5 530.13 718.13 2204.50
## 6 1109.60 0.00 675.79
## Reporter.intensity.7 Reporter.intensity.8 Reporter.intensity.9
## 1 1200.10 939.06 1457.50
## 2 0.00 1214.10 800.58
## 3 1574.00 2302.10 2176.10
## 4 0.00 519.81 0.00
## 5 960.51 453.77 1188.40
## 6 0.00 809.38 668.88
## Reporter.intensity.10 Reporter.intensity.11 Reporter.intensity.12
## 1 1329.80 981.83 NA
## 2 807.79 391.38 NA
## 3 1399.50 1307.50 2192.4
## 4 507.23 370.79 NA
## 5 740.99 0.00 NA
## 6 1467.50 901.38 NA
## Reporter.intensity.13 Reporter.intensity.14 Reporter.intensity.15
## 1 NA NA NA
## 2 NA NA NA
## 3 1791.4 1727.5 2157.3
## 4 NA NA NA
## 5 NA NA NA
## 6 NA NA NA
## Reporter.intensity.16
## 1 NA
## 2 NA
## 3 1398
## 4 NA
## 5 NA
## 6 NA
Most columns in the mqScpData
table contain feature metadata, that is data generated during the identification of the MS spectra. For instance, you may find the charge of the parent ion, the score and probability of a correct match between the MS spectrum and a peptide sequence, the sequence of the best matching peptide, its length, its modifications, the retention time of the peptide on the LC, the protein(s) the peptide originates from and much more.
head(mqScpData[, c("Charge", "Score", "PEP", "Sequence", "Length",
"Retention.time", "Proteins")])
## Charge Score PEP Sequence Length Retention.time
## 1 2 41.029 5.2636e-04 ATNFLAHEK 9 65.781
## 2 2 44.349 5.8789e-04 ATNFLAHEK 9 63.787
## 3 2 51.066 4.0315e-24 SHTILLVQPTK 11 71.884
## 4 2 63.816 4.7622e-06 SHTILLVQPTK 11 68.633
## 5 2 74.464 6.8709e-09 SHTILLVQPTK 11 71.946
## 6 2 41.502 5.3705e-02 SLVIPEK 7 76.204
## Proteins
## 1 sp|P29692|EF1D_HUMAN
## 2 sp|P29692|EF1D_HUMAN
## 3 sp|P84090|ERH_HUMAN
## 4 sp|P84090|ERH_HUMAN
## 5 sp|P84090|ERH_HUMAN
## 6 sp|P62269|RS18_HUMAN
Finally, some columns may contain metadata related to the MS instrument. In MaxQuant, only the file names generated by the MS instrument are stored.
unique(mqScpData$Raw.file)
## [1] "190321S_LCA10_X_FP97AG" "190222S_LCA9_X_FP94BM"
## [3] "190914S_LCB3_X_16plex_Set_21" "190321S_LCA10_X_FP97_blank_01"
The 1361 PSM were found across 4 MS runs. While these 4 runs are contained in a single table, we prefer to split them in separate assay when building the QFeatures
object, this will allow to have unique sample identifier that represent a combination of the MS run and the quantification column.
Next to the feature, sample annotation contains the experimental design generated by the researcher. The rows of the sample annotation table correspond to a sample in the experiment and the columns correspond to the different pieces of available information. We will here use the second example table provided in scp
:
data("sampleAnnotation")
head(sampleAnnotation)
## Raw.file Channel SampleType lcbatch sortday digest
## 1 190222S_LCA9_X_FP94BM Reporter.intensity.1 Carrier LCA9 s8 N
## 2 190222S_LCA9_X_FP94BM Reporter.intensity.2 Reference LCA9 s8 N
## 3 190222S_LCA9_X_FP94BM Reporter.intensity.3 Unused LCA9 s8 N
## 4 190222S_LCA9_X_FP94BM Reporter.intensity.4 Monocyte LCA9 s8 N
## 5 190222S_LCA9_X_FP94BM Reporter.intensity.5 Blank LCA9 s8 N
## 6 190222S_LCA9_X_FP94BM Reporter.intensity.6 Monocyte LCA9 s8 N
readSCP
readSCP
is the function that converts the sample and the feature data into a QFeatures
object following the data structure described above. The data belonging to each MS batch are stored in a separate SingleCellExperiment
object, where feature metadata are stored in the rowData
and the quantitative data are retrieved with assay()
. The sample annotations are stored as the colData
and are linked to the columns of each SingleCellExperiment
object.
In practice, we provide the two tables to readSCP
and tell the function which column holds the MS run information and which column in the sample annotation holds the quantitative column names in the feature data.
scp <- readSCP(featureData = mqScpData,
colData = sampleAnnotation,
batchCol = "Raw.file",
channelCol = "Channel")
## Loading data as a 'SingleCellExperiment' object
## Splitting data based on 'Raw.file'
## Formatting sample metadata (colData)
## Formatting data as a 'QFeatures' object
scp
## An instance of class QFeatures containing 4 assays:
## [1] 190222S_LCA9_X_FP94BM: SingleCellExperiment with 395 rows and 16 columns
## [2] 190321S_LCA10_X_FP97_blank_01: SingleCellExperiment with 109 rows and 16 columns
## [3] 190321S_LCA10_X_FP97AG: SingleCellExperiment with 487 rows and 16 columns
## [4] 190914S_LCB3_X_16plex_Set_21: SingleCellExperiment with 370 rows and 16 columns
The object returned by readSCP
is a QFeatures
object containing 4 SingleCellExperiment
assays that have been named after the 4 MS batches.
As mentioned in the previous vignette, sample data is retrieved from the colData
. Note that unique sample names were automatically generated by combining the batch name and the column names of the quantitative data:
colData(scp)
## DataFrame with 64 rows and 6 columns
## Raw.file Channel
## <character> <character>
## 190222S_LCA9_X_FP94BMReporter.intensity.1 190222S_LC... Reporter.i...
## 190222S_LCA9_X_FP94BMReporter.intensity.2 190222S_LC... Reporter.i...
## 190222S_LCA9_X_FP94BMReporter.intensity.3 190222S_LC... Reporter.i...
## 190222S_LCA9_X_FP94BMReporter.intensity.4 190222S_LC... Reporter.i...
## 190222S_LCA9_X_FP94BMReporter.intensity.5 190222S_LC... Reporter.i...
## ... ... ...
## 190914S_LCB3_X_16plex_Set_21Reporter.intensity.12 190914S_LC... Reporter.i...
## 190914S_LCB3_X_16plex_Set_21Reporter.intensity.13 190914S_LC... Reporter.i...
## 190914S_LCB3_X_16plex_Set_21Reporter.intensity.14 190914S_LC... Reporter.i...
## 190914S_LCB3_X_16plex_Set_21Reporter.intensity.15 190914S_LC... Reporter.i...
## 190914S_LCB3_X_16plex_Set_21Reporter.intensity.16 190914S_LC... Reporter.i...
## SampleType lcbatch
## <character> <character>
## 190222S_LCA9_X_FP94BMReporter.intensity.1 Carrier LCA9
## 190222S_LCA9_X_FP94BMReporter.intensity.2 Reference LCA9
## 190222S_LCA9_X_FP94BMReporter.intensity.3 Unused LCA9
## 190222S_LCA9_X_FP94BMReporter.intensity.4 Monocyte LCA9
## 190222S_LCA9_X_FP94BMReporter.intensity.5 Blank LCA9
## ... ... ...
## 190914S_LCB3_X_16plex_Set_21Reporter.intensity.12 Macrophage LCB3
## 190914S_LCB3_X_16plex_Set_21Reporter.intensity.13 Macrophage LCB3
## 190914S_LCB3_X_16plex_Set_21Reporter.intensity.14 Macrophage LCB3
## 190914S_LCB3_X_16plex_Set_21Reporter.intensity.15 Macrophage LCB3
## 190914S_LCB3_X_16plex_Set_21Reporter.intensity.16 Macrophage LCB3
## sortday digest
## <character> <character>
## 190222S_LCA9_X_FP94BMReporter.intensity.1 s8 N
## 190222S_LCA9_X_FP94BMReporter.intensity.2 s8 N
## 190222S_LCA9_X_FP94BMReporter.intensity.3 s8 N
## 190222S_LCA9_X_FP94BMReporter.intensity.4 s8 N
## 190222S_LCA9_X_FP94BMReporter.intensity.5 s8 N
## ... ... ...
## 190914S_LCB3_X_16plex_Set_21Reporter.intensity.12 s9 R
## 190914S_LCB3_X_16plex_Set_21Reporter.intensity.13 s9 R
## 190914S_LCB3_X_16plex_Set_21Reporter.intensity.14 s9 R
## 190914S_LCB3_X_16plex_Set_21Reporter.intensity.15 s9 R
## 190914S_LCB3_X_16plex_Set_21Reporter.intensity.16 s9 R
The feature metadata is retrieved from the rowData
. Since the feature metadata is specific to each assay, we need to tell from which assay we want to get the rowData
:
rowData(scp[["190222S_LCA9_X_FP94BM"]])[, 1:5]
## DataFrame with 395 rows and 5 columns
## uid Sequence Length Modifications Modified.sequence
## <character> <character> <integer> <character> <character>
## PSM2 _(Acetyl (... ATNFLAHEK 9 Acetyl (Pr... _(Acetyl (...
## PSM4 _(Acetyl (... SHTILLVQPT... 11 Acetyl (Pr... _(Acetyl (...
## PSM6 _(Acetyl (... SLVIPEK 7 Acetyl (Pr... _(Acetyl (...
## PSM9 _AAGLALK_ ... AAGLALK 7 Unmodified _AAGLALK_
## PSM12 _AALSAGK_ ... AALSAGK 7 Unmodified _AALSAGK_
## ... ... ... ... ... ...
## PSM1242 _YSQVLANGL... YSQVLANGLD... 12 Unmodified _YSQVLANGL...
## PSM1244 _YVLGPAVR_... YVLGPAVR 8 Unmodified _YVLGPAVR_
## PSM1247 _YYPTEDVPR... YYPTEDVPR 9 Unmodified _YYPTEDVPR...
## PSM1249 _YYSVNSR_ ... YYSVNSR 7 Unmodified _YYSVNSR_
## PSM1252 _YYTVFDR_ ... YYTVFDR 7 Unmodified _YYTVFDR_
Finally, we can also retrieve the quantification matrix for an assay of interest:
assay(scp, "190222S_LCA9_X_FP94BM")[1:5, 1:5]
## 190222S_LCA9_X_FP94BMReporter.intensity.1
## PSM2 58648
## PSM4 27347
## PSM6 44895
## PSM9 122070
## PSM12 58605
## 190222S_LCA9_X_FP94BMReporter.intensity.2
## PSM2 1099.80
## PSM4 405.90
## PSM6 700.23
## PSM9 1153.50
## PSM12 895.25
## 190222S_LCA9_X_FP94BMReporter.intensity.3
## PSM2 2837.7
## PSM4 1525.2
## PSM6 2283.0
## PSM9 7361.9
## PSM12 2763.8
## 190222S_LCA9_X_FP94BMReporter.intensity.4
## PSM2 494.32
## PSM4 0.00
## PSM6 1109.60
## PSM9 1732.30
## PSM12 867.82
## 190222S_LCA9_X_FP94BMReporter.intensity.5
## PSM2 349.26
## PSM4 0.00
## PSM6 0.00
## PSM9 1515.60
## PSM12 1050.30
scpdata
packageNext to scp
, we also developed scpdata
, a data package that disseminates published SCP data sets formatted using the scp
data structure. The package heavily relies on the ExperimentHub
infrastructure. This package is an ideal platform for data sharing and promotes for open and reproducible science in SCP, it facilitates the access for developers to SCP data to build and benchmark new methodologies and it facilitates the access for new users to data in the context of training and demonstration (like this workshop).
After loading the package, you can have a look at the available datasets by running:
library(scpdata)
scpdata()
## DataFrame with 12 rows and 15 columns
## title dataprovider species taxonomyid genome
## <character> <character> <character> <integer> <character>
## EH3899 specht2019... SlavovLab ... Homo sapie... 9606 NA
## EH3900 specht2019... SlavovLab ... Homo sapie... 9606 NA
## EH3901 dou2019_ly... MassIVE Homo sapie... 9606 NA
## EH3902 dou2019_mo... MassIVE Mus muscul... 10090 NA
## EH3903 dou2019_bo... MassIVE Mus muscul... 10090 NA
## ... ... ... ... ... ...
## EH3906 zhu2018NC_... PRIDE Homo sapie... 9606 NA
## EH3907 zhu2018NC_... PRIDE Homo sapie... 9606 NA
## EH3908 cong2020AC PRIDE Homo sapie... 9606 NA
## EH3909 zhu2019EL PRIDE Gallus gal... 9031 NA
## EH6011 liang2020_... PRIDE Homo sapie... 9606 NA
## description coordinate_1_based maintainer rdatadateadded
## <character> <integer> <character> <character>
## EH3899 SCP expres... 1 Christophe... 2020-11-05
## EH3900 SCP expres... 1 Christophe... 2020-11-05
## EH3901 SCP expres... 1 Christophe... 2020-11-05
## EH3902 SCP expres... 1 Christophe... 2020-11-05
## EH3903 SCP expres... 1 Christophe... 2020-11-05
## ... ... ... ... ...
## EH3906 Near SCP e... 1 Christophe... 2020-11-05
## EH3907 Near SCP e... 1 Christophe... 2020-11-05
## EH3908 SCP expres... 1 Christophe... 2020-11-05
## EH3909 SCP expres... 1 Christophe... 2020-11-05
## EH6011 Expression... 1 Christophe... 2021-04-27
## preparerclass tags rdataclass
## <character> <list> <character>
## EH3899 scpdata Experiment...,Expression...,Experiment...,... QFeatures
## EH3900 scpdata Experiment...,Expression...,Experiment...,... QFeatures
## EH3901 scpdata Experiment...,Expression...,Experiment...,... QFeatures
## EH3902 scpdata Experiment...,Expression...,Experiment...,... QFeatures
## EH3903 scpdata Experiment...,Expression...,Experiment...,... QFeatures
## ... ... ... ...
## EH3906 scpdata Experiment...,Expression...,Experiment...,... QFeatures
## EH3907 scpdata Experiment...,Expression...,Experiment...,... QFeatures
## EH3908 scpdata Experiment...,Expression...,Experiment...,... QFeatures
## EH3909 scpdata Experiment...,Expression...,Experiment...,... QFeatures
## EH6011 scpdata Expression...,MassSpectr...,Proteome,... QFeatures
## rdatapath sourceurl sourcetype
## <character> <character> <character>
## EH3899 scpdata/sp... https://sc... CSV
## EH3900 scpdata/sp... https://sc... CSV
## EH3901 scpdata/do... ftp://mass... XLS/XLSX
## EH3902 scpdata/do... ftp://mass... XLS/XLSX
## EH3903 scpdata/do... ftp://mass... XLS/XLSX
## ... ... ... ...
## EH3906 scpdata/zh... ftp://ftp.... TXT
## EH3907 scpdata/zh... ftp://ftp.... TXT
## EH3908 scpdata/co... ftp://ftp.... TXT
## EH3909 scpdata/zh... ftp://ftp.... TXT
## EH6011 scpdata/li... ftp://ftp.... TXT
For instance, loading the data sets published in Zhu et al. (2019) is as simple as calling the title of the data set:
zhu2019EL()
## An instance of class QFeatures containing 62 assays:
## [1] 1H1a: SingleCellExperiment with 152 rows and 1 columns
## [2] 1H1b: SingleCellExperiment with 267 rows and 1 columns
## [3] 1H1c: SingleCellExperiment with 128 rows and 1 columns
## ...
## [60] 2N0c: SingleCellExperiment with 61 rows and 1 columns
## [61] peptides: SingleCellExperiment with 3444 rows and 60 columns
## [62] proteins: SingleCellExperiment with 840 rows and 60 columns
As a use-case, we will reproduce the analysis of the SCoPE2 data published in Specht et al. (2021). SCoPE2 is the first published SCP protocol that has been used to profile thousands of proteins in thousands of single-cells. This is a technical milestone for the field and it opens the door for a fine-grain understanding of biological processes at the protein level.
Along their acquisition protocol, the authors have also provided an R script to reproduce their data processing. The code is hard to read and built from scratch. We have used scp
to standardize this workflow (Vanderaa and Gatto (2021)) and reproduce the results using code that can easily be adapted to other data sets. The outline of the workflow is shown below.
The replication analysis on the full data set is provided in another vignette.
The remainder of the vignette includes a few exercises for you to have your hands on the functions in QFeatures
and scp
. We have hidden the solution under Solution
ticks that you can click on to reveal a good answer (sometimes, several solution are possible but we provided only one). You can also make use of some hints and fill-the-blank if you feel you are getting stuck. Each exercise has an associated data that you can load upfront. This will avoid you to rerun the whole vignette for every exercise.
Also, to reduce the computational burden, we will focus the replication only on a part of the SCoPE2 data. You can find the full replication in another vignette (Vanderaa and Gatto (2021))
In a previous section we have shown how to format the feature and sample tables to a Qfeatures
object for a small subset of the SCoPE2 data set. We will here import the data directly from scpdata
to gain some time.
(scope2 <- specht2019v3())
## An instance of class QFeatures containing 179 assays:
## [1] 190222S_LCA9_X_FP94AA: SingleCellExperiment with 2777 rows and 11 columns
## [2] 190222S_LCA9_X_FP94AB: SingleCellExperiment with 4348 rows and 11 columns
## [3] 190222S_LCA9_X_FP94AC: SingleCellExperiment with 4917 rows and 11 columns
## ...
## [177] 191110S_LCB7_X_APNOV16plex2_Set_9: SingleCellExperiment with 4934 rows and 16 columns
## [178] peptides: SingleCellExperiment with 9354 rows and 1490 columns
## [179] proteins: SingleCellExperiment with 3042 rows and 1490 columns
The data set provides 177 assays containing the PSM data, but also two assays that contain the peptide and protein data generated by the authors in the original work. We will here work only on samples acquired as part of the LCB3
batch.
This section will guide you through the filtering of low-quality features based on different metrics.
First, only the assays that have sufficient PSMs are kept. The authors keep an assay if it has over 500 PSMs. Before filtering, let’s first look at the distribution of the number of PSMs per assay. You can extract the number of rows (here PSMs) and the number of columns (TMT channels) of each assay using the dims
function implemented in QFeatures
.
nPSMs <- dims(lcb3)[1, ]
Let’s have a look at the number of features that were identified in the different runs. The data visualization in this vignette is performed using the ggplot2
function.
library("ggplot2")
ggplot(data.frame(nPSMs)) +
aes(x = nPSMs) +
geom_histogram() +
geom_vline(xintercept = 500)
No MS run failed in the LB3 batch. If some runs had failed, we could have subset the data taking advantage of the subsetting method of a QFeatures
object.
lcb3 <- lcb3[, , nPSMs > 500]
All MS-based proteomic experiment is subjected to contamination from the environment. The most common contamination is keratin released from the researcher’s skin. A small database of known contaminants is commonly supplied to tag PSMs originating from potential contaminants. Next to that, a decoy database consisting of reversed peptide sequence is used to assess the reliability of the PSM identification and tag false hits.
You can remove the PSMs that were matched to contaminant proteins (the protein name starts with CON
) or to the decoy database (the protein name starts with REV
) using the filterFeatures
function (cf previous vignette). To match the protein identifiers, you can use the grepl
function and the filterFeatures
will automatically retrieve the requested variable from the rowData
.
lcb3 <- filterFeatures(lcb3,
~ !grepl("REV|CON", protein))
Next, the SCoPE2 workflow filters PSMs based on the false discovery rate (FDR) for identification. This will remove the PSMs that were matched by chance. The PSM data were already processed with DART-ID (Chen, Franks, and Slavov (2019)), a python software that updates the confidence in peptide identification using an Bayesian inference approach. DART-ID outputs for every PSM the updated posterior error probability (PEP). Filtering on the PEP is too conservative and it is rather advised to filter based on FDR (Käll et al. (2008)). To control for the FDR, we need to compute q-values, that correspond to the minimal FDR threshold that would still select the associated feature.
The scp
package provides some utility function to compute common metrics and statistics from the rowData
that are automatically added to the rowData
. Converting PEPs to q-values is one such example. You can use the pep2qvalue
function to easily convert PEPs to q-values. We here compute the q-values from the dart_PEP
and store the result in the rowData
of each assay under qvalue_psm
.
lcb3 <- pep2qvalue(lcb3,
i = names(lcb3),
PEP = "dart_PEP",
rowDataName = "qvalue_psm")
You can also compute de protein-level q-values when providing the protein name (taken from the rowData
) as a grouping variable.
lcb3 <- pep2qvalue(lcb3,
i = names(lcb3),
groupBy = "protein",
PEP = "dart_PEP",
rowDataName = "qvalue_protein")
Now that we have q-values available in the rowData
, we can use them to filter out low-confidence PSMs using filterFeatures
. The SCoPE2 authors removed features so to control a 1% PSM and protein FDR.
lcb3 <- filterFeatures(lcb3,
~ qvalue_psm < 0.01 & qvalue_protein < 0.01)
The SCoPE2 authors suggested in their paper a new PSM quality control for SCP data: the sample to carrier ratio (SCR). The SCR is computed for every PSM separately as the (mean) intensity of the single-cell samples divided by the intensity of the carrier (200 cells) acquired during the same run as the sample. It is expected that the carrier intensities are much higher than the single-cell intensities.
computeSCR
is another example of utility function from the scp
package. It can be used to automatically compute the SCR for each PSM in the assays of interest. All that is required is to provide the function a way to tell what sample is what. You can achieve this thanks to the sample annotations contained in the colData
. In this dataset, SampleType
gives the type of sample that is present in each TMT channel. The SCoPE2 protocol includes 5 types of samples:
table(lcb3$SampleType)
##
## Blank Carrier Macrophage Monocyte Reference Unused
## 22 24 192 74 24 48
Carrier
) contain 200-cell equivalents and are meant to boost the peptide identification rate.Reference
) contain 5 cell equivalents and are used to partially correct for between-run variation.Unused
) are channels that are left empty due to isotopic cross-contamination by the carrier channel.Blank
) contain samples that do not contain any cell but are processed as single-cell samples.Macrophage
or Monocyte
).The computeSCR
function expects the user to provide a pattern (following regular expression syntax) that identifies the carrier column(s) (carrierPattern
) a pattern that identifies the sample columns (samplePattern
). The function will store the computed SCRs of each PSM in the rowData
of the corresponding assay. When multiple matches are found for samples or carrier, you can also provide a summarizing function through sampleFUN
and carrierFUN
, respectively.
Exercise: It’s your turn now, compute the SCR by considering blanks, monocytes and macrophages as samples of interest. Note there are multiple samples per run for each PSM and you should therefore compute the mean of the samples. There is only a single carrier, so you don’t need to bother providing carrierFUN
.
data("lcb3_1")
lcb3 <- lcb3_1
colData
column SampleType
. The output above provides an overview of the available sample types in the data.
sampleFUN
.
lcb3 <- computeSCR(lcb3,
i = names(lcb3),
colvar = ...,
carrierPattern = ...,
samplePattern = ...,
sampleFUN = ...,
rowDataName = "MeanSCR")
lcb3 <- computeSCR(lcb3,
i = names(lcb3),
colvar = ...,
carrierPattern = "Carrier",
samplePattern = "Blank|Monocyte|Macrophage",
sampleFUN = ...,
rowDataName = "MeanSCR")
lcb3 <- computeSCR(lcb3,
i = names(lcb3),
colvar = "SampleType",
carrierPattern = "Carrier",
samplePattern = "Blank|Monocyte|Macrophage",
sampleFUN = mean,
rowDataName = "MeanSCR")
Before applying the filter, let’s plot the distribution of the mean SCR. You can extract the MeanSCR
from the rowData
of several assays using the rbindRowData
function. It takes the rowData
of interest and returns a single DataFrame
table with all rowData
variables common to the selected assays.
rbindRowData(lcb3, i = names(lcb3)) %>%
data.frame %>%
ggplot(aes(x = MeanSCR)) +
geom_histogram() +
geom_vline(xintercept = c(1/200, 0.1),
lty = 2:1) +
scale_x_log10()
## Warning: Transformation introduced infinite values in continuous x-axis
## Warning: Removed 2832 rows containing non-finite values (stat_bin).
A great majority of the PSMs have a mean SCR that is lower than 10%, as expected. The mode of the distribution is located close to 1%, as expected since every carrier sample contains 200 cells leading to an expected ratio of 0.5% (dashed line).
Exercise: keep only PSMs that have a mean SCR lower than 10%. Make sure also to remove mean SCR values that are NA
(due to missing quantification data or to dividing zero by zero) or infinite (when carrier is zero but not samples).
filterFeatures
function a few times in this vignette, take inspiration from the previous sections.
is.na(MeanSCR)
, and similarly is.infinite(MeanSCR)
for infinite values.
lcb3 <- filterFeatures(lcb3,
~ !is.na(MeanSCR) &
!is.infinite(MeanSCR) &
...)
lcb3 <- filterFeatures(lcb3,
~ !is.na(MeanSCR) &
!is.infinite(MeanSCR) &
MeanSCR < 0.1)
A PIF (parental ion fraction) smaller than 80 % indicates the associated spectra is contaminated by co-isolated peptides and therefore the quantification becomes unreliable. The PIF was computed by MaxQuant and is readily available for filtering. Using again filterFeatures
, we keep only PSMs that have low spectral contamination, that is, PSMs with an associated PIF greater than 0.8.
lcb3 <- filterFeatures(lcb3,
~ !is.na(PIF) & PIF > 0.8)
In order to partially correct for between-run variation, the SCoPE2 authors compute relative reporter ion intensities. This means that intensities measured for single-cells are divided by the reference channel (5-cell equivalents). Before normalization:
assay(lcb3[[1]])[1:5, 1:5]
## 190913S_LCB3_X_16plex_Set_12_RI1 190913S_LCB3_X_16plex_Set_12_RI2
## PSM149 112830 3638.1
## PSM1114 352770 10268.0
## PSM2144 179260 4943.6
## PSM2703 247810 6965.6
## PSM3259 240900 8922.6
## 190913S_LCB3_X_16plex_Set_12_RI3 190913S_LCB3_X_16plex_Set_12_RI4
## PSM149 5805.9 0.00
## PSM1114 19621.0 0.00
## PSM2144 8769.5 0.00
## PSM2703 10308.0 0.00
## PSM3259 16451.0 360.84
## 190913S_LCB3_X_16plex_Set_12_RI5
## PSM149 1012.9
## PSM1114 1974.0
## PSM2144 1193.9
## PSM2703 1094.8
## PSM3259 3423.6
You can use the divideByReference
function that divides channels of interest by the reference channel. Similarly to computeSCR
, you can point to the samples and the reference columns in each assay using the annotation contained in the colData
.
lcb3 <- divideByReference(lcb3,
i = names(lcb3),
colvar = "SampleType",
samplePattern = ".",
refPattern = "Reference")
After normalization:
assay(lcb3[[1]])[1:5, 1:5]
## 190913S_LCB3_X_16plex_Set_12_RI1 190913S_LCB3_X_16plex_Set_12_RI2
## PSM149 31.01344 1
## PSM1114 34.35625 1
## PSM2144 36.26102 1
## PSM2703 35.57626 1
## PSM3259 26.99886 1
## 190913S_LCB3_X_16plex_Set_12_RI3 190913S_LCB3_X_16plex_Set_12_RI4
## PSM149 1.595860 0.00000000
## PSM1114 1.910888 0.00000000
## PSM2144 1.773910 0.00000000
## PSM2703 1.479844 0.00000000
## PSM3259 1.843745 0.04044113
## 190913S_LCB3_X_16plex_Set_12_RI5
## PSM149 0.2784146
## PSM1114 0.1922478
## PSM2144 0.2415042
## PSM2703 0.1571724
## PSM3259 0.3836998
Now that the PSM assays are processed, you can aggregate them to peptides. This is performed using the aggregateFeaturesOverAssays
function. This is a wrapper function in scp
that sequentially calls the aggregateFeatures
from the QFeatures
package over the different assays. For each assay, the function aggregates several PSMs into a unique peptide given an aggregating variable in the rowData
(peptide sequence) and a user-supplied aggregating function (the median for instance). Regarding the aggregating function, the SCoPE2 analysis removes duplicated peptide sequences per run by taking the first non-missing value. While better alternatives are documented in QFeatures::aggregateFeatures
, we suggest to use this approach for the sake of replication, but also to illustrate that custom functions can be applied when aggregating.
For each assay, a new aggregated assay will be added to the dataset. The aggregated peptide assays must be given a name. We here suggest to use the original names prefixed with peptides_
.
You now have all the required information to aggregate the PSMs to peptides.
Exercise: use the aggregateFeaturesOverAssays
function to aggregate PSMs to peptides. Don’t forget to provide the remove.duplicates
as an aggregating function.
data("lcb3_2")
lcb3 <- lcb3_2
fcol = "peptide"
. The function will take the peptide
column in the rowData
that contains the peptide sequences and create a new aggregated feature for each unique sequence.
lcb3 <- aggregateFeaturesOverAssays(lcb3,
i = names(lcb3),
fcol = "peptide",
name = ...,
fun = ...)
lcb3 <- aggregateFeaturesOverAssays(lcb3,
i = names(lcb3),
fcol = "peptide",
name = peptideAssays,
fun = remove.duplicates)
lcb3
## An instance of class QFeatures containing 48 assays:
## [1] 190913S_LCB3_X_16plex_Set_12: SingleCellExperiment with 1621 rows and 16 columns
## [2] 190913S_LCB3_X_16plex_Set_17_reinject2: SingleCellExperiment with 1484 rows and 16 columns
## [3] 190914S_LCB3_X_16plex_Set_1: SingleCellExperiment with 1415 rows and 16 columns
## ...
## [46] peptides_190914S_LCB3_X_16plex_Set_7: SingleCellExperiment with 1771 rows and 16 columns
## [47] peptides_190914S_LCB3_X_16plex_Set_8: SingleCellExperiment with 1348 rows and 16 columns
## [48] peptides_190914S_LCB3_X_16plex_Set_9: SingleCellExperiment with 1944 rows and 16 columns
There are as many new assay that are added as there are assays to aggregate. Under the hood, the QFeatures
architecture preserves the relationship between the aggregated assays. See ?AssayLinks
for more information on relationships between assays.
It is now time to have an overview of our data set:
plot(lcb3)
Exercise: there are too many assays displayed on the graph above that is becoming too crowded. The plot
function also provides an interactive version of this graph that you can explore manually. Search the documentation and create an interactive plot for the lcb3
data.
plot(lcb3, interactive = TRUE)
The next step is to replace zero and infinite values by NA
s. The zeros can be biological zeros or technical zeros and differentiating between the two types is a difficult task, they are therefore better considered as missing data that should be modelled using dedicated methods. The infinite values appear during the normalization by the reference when the reference channel is zero. This artefact could easily be avoided if we had replaced the zeros by NA
s at the beginning of the workflow, what we strongly recommend for future analyses.
The infIsNA
and the zeroIsNA
functions automatically detect infinite and zero values, respectively, and replace them with NA
s. Those two functions are provided by the QFeatures
package. See here how you can replace infinite values by NA:
sum(is.infinite(assay(lcb3, peptideAssays[1])))
## [1] 31
lcb3 <- infIsNA(lcb3, i = peptideAssays)
sum(is.infinite(assay(lcb3, peptideAssays[1])))
## [1] 0
You can also replace all zero values by NA.
Up to now, the data belonging to each MS run are kept in separate assays. You can combine all batches into a single assay using the joinAssays
function from the QFeatures
package.
lcb3 <- joinAssays(lcb3,
i = peptideAssays,
name = "peptides")
Note joinAssays
has created a new assay called peptides
that combines the previously aggregated peptide assays.
lcb3
## An instance of class QFeatures containing 49 assays:
## [1] 190913S_LCB3_X_16plex_Set_12: SingleCellExperiment with 1621 rows and 16 columns
## [2] 190913S_LCB3_X_16plex_Set_17_reinject2: SingleCellExperiment with 1484 rows and 16 columns
## [3] 190914S_LCB3_X_16plex_Set_1: SingleCellExperiment with 1415 rows and 16 columns
## ...
## [47] peptides_190914S_LCB3_X_16plex_Set_8: SingleCellExperiment with 1348 rows and 16 columns
## [48] peptides_190914S_LCB3_X_16plex_Set_9: SingleCellExperiment with 1944 rows and 16 columns
## [49] peptides: SingleCellExperiment with 4365 rows and 384 columns
At this point, you can regularly have a look at plot(lcb3, interactive = TRUE)
to keep an overview of the assay hierarchy.
The SCoPE2 workflow proceeds with the filtering of low quality cells. The filtering is based on the median coefficient of variation (CV) per cell. The CV is measured for each protein in each sample as the standard deviation of the peptide quantifications belonging to the same protein divided by the average expression of those peptides. Taking the median CV per cell will give a measure of the robustness of the quantification within that cell. We want to remove cells that exhibit a high median CV because inconsistent measures imply artefacts during samples preparation.
This is performed using the medianCVperCell
function from the scp
package. The function takes the protein information from the rowData
of the assays. This information will tell how to group the features (peptides) when computing the CV. The SCoPE2 authors applied a custom normalization (norm = "SCoPE2"
) to the data prior to computing CVs and they computed CVs only if at least 6 peptides are available (nobs = 6
). See the methods section in Specht et al. (2021) for more information.
lcb3 <- medianCVperCell(lcb3,
i = peptideAssays,
groupBy = "protein",
nobs = 6,
colDataName = "MedianCV",
norm = "SCoPE2")
The computed CVs are stored in the colData
. We can now filter cells that have reliable quantifications. The blank samples are not expected to have reliable quantifications and hence can be used to estimate a null distribution of the CV. This distribution helps defining a threshold that filters out single-cells that contain noisy quantification.
colData(lcb3) %>%
data.frame %>%
filter(SampleType %in% c("Macrophage", "Monocyte", "Blank")) %>%
ggplot(aes(x = MedianCV,
fill = SampleType)) +
geom_histogram() +
geom_vline(xintercept = 0.365)
We can see that the protein quantification for single-cells are much more consistent within single-cell channels than within blank channels. A threshold of 0.365 best separates single-cells from empty channels.
Exercise: keep cells that have a median CV lower than 0.365. You should also keep macrophages and monocytes as those represent the samples of interest. You can easily achieve this by subsetting the samples based on the associated colData
using the subsetByColData
function from the MultiAssayExperiment
package.
data("lcb3_3")
lcb3 <- lcb3_3
<- subsetByColData(lcb3,
lcb3 $MedianCV ... &
lcb3$SampleType ...) lcb3
lcb3 <- subsetByColData(lcb3,
lcb3$MedianCV < 0.365 &
lcb3$SampleType %in% c("Macrophage", "Monocyte"))
Although you already normalized by the reference channels, the authors of SCoPE2 suggested to proceed with further data normalization. First, they normalize the columns (samples) of the peptide data by the median intensities. Then, the rows (peptides) are normalized by dividing the relative intensities by the mean relative intensities.
## Scale column with median
lcb3 <- normalizeSCP(lcb3,
i = "peptides",
method = "div.median",
name = "peptides_norm1")
The second normalization method is not available in normalize
, but you can still apply it using the sweep
method from the QFeatures
package that is inspired from the base::sweep
function. We here show how to use it, note that is it a bit more complicated since you need to manually provide the scaling factors.
sf <- rowMeans(assay(lcb3[["peptides_norm1"]]), na.rm = TRUE)
## Scale rows with mean
lcb3 <- sweep(lcb3,
i = "peptides_norm1", MARGIN = 1,
FUN = "/",
STATS = sf,
name = "peptides_norm2")
Each normalization step is stored in a separate assay. You can have a look at the QFeatures
plot as suggested previously.
Peptides that contain many missing values are not informative. Therefore, it is advised to remove highly missing peptides from downstream analysis. The SCoPE2 authors removed peptides with more than 99 % missing data. You can achieve this using the filterNA
function from QFeatures
.
lcb3 <- filterNA(lcb3,
i = "peptides_norm2",
pNA = 0.99)
The last processing step of the peptide data before aggregating to proteins is to log2-transform the data.
lcb3 <- logTransform(lcb3,
base = 2,
i = "peptides_norm2",
name = "peptides_log")
Similarly to aggregating PSM data to peptide data, you can aggregate peptide data to protein data. Note this time, you can use the aggregateFeatures
function instead of the aggregateFeaturesOverAssays
function since you only need to aggregate only one assay (peptides_log
).
lcb3 <- aggregateFeatures(lcb3,
i = "peptides_log",
name = "proteins",
fcol = "protein",
fun = matrixStats::colMedians, na.rm = TRUE)
Normalization is performed similarly to peptide normalization. You can use the same functions, but since the data were log-transformed at the peptide level, you should subtract/center by the statistic (median or mean) instead of dividing.
## Center columns with median
lcb3 <- normalizeSCP(lcb3,
i = "proteins",
method = "center.median",
name = "proteins_norm1")
## Center rows with mean
lcb3 <- sweep(lcb3,
i = "proteins_norm1",
MARGIN = 1,
FUN = "-",
STATS = rowMeans(assay(lcb3[["proteins_norm1"]]),
na.rm = TRUE),
name = "proteins_norm2")
The protein data contains a lot of missing values. Let’s have a look at the distribution of the percent missing data per sample. You can easily achieve this by using the nNA
function. Cells contain on average between 60 and 70% missing values!
nNAres <- nNA(lcb3, "proteins_norm2")$nNAcols
ggplot(data.frame(nNAres),
aes(x = pNA)) +
geom_histogram()
The missing data is imputed using K nearest neighbors. QFeatures
provides the impute
function that serves as an interface to different imputation algorithms among which the KNN algorithm from impute::impute.knn
.
lcb3 <- impute(lcb3,
i = "proteins_norm2",
method = "knn",
k = 3, rowmax = 1, colmax= 1,
maxp = Inf, rng.seed = 1234)
The final step is to model the remaining batch effects and correct for it. The data were acquired as a series of MS runs. Each MS run can be subjected to technical perturbations that lead to differences in the data. This must be accounted for to avoid attributing biological effects to technical effects. The ComBat
algorithm (Johnson, Li, and Rabinovic (2007)) is used in the SCoPE2 script to correct for those batch effects. ComBat
is part of the sva
package. It requires a batch variable, in this case the LC-MS/MS run, and adjusts for batch effects, while protecting variables of interest, the sample type in this case.
Important: we do not claim ComBat
is the best method to model batch effect, we simply follow the SCoPE2 workflow. Therefore, we do not provide a wrapper function for applying batch correction on QFeatures
object. However, this section is an excellent example on how to apply custom functions to the data.
Let’s first extract the assays with the associated colData
.
sce <- getWithColData(lcb3, "proteins_norm2")
## Warning: 'experiments' dropped; see 'metadata'
You can then apply the function of interest, here batch correction with ComBat
. See how we easily retrieve the sample annotation required to perform the batch correction. The output of ComBat
is used to overwrite the data in the assay.
batch <- sce$Set
model <- model.matrix(~ SampleType, data = colData(sce))
assay(sce) <- ComBat(dat = assay(sce),
batch = batch,
mod = model)
Finally, the modified assay needs to be added to the QFeatures
object. To properly achieve this, you will need two functions. First, addAssay
allows you to add the new assay to the data set. Second, the addAssayLinkOneToOne
will create one-to-one link between the proteins of the new assay and the proteins of a parent assay.
Exercise: add the batch corrected protein data as a new assay in the QFeatures
object (you can call it proteins_batchC
). Create a one-to-one relationship between the features of the last assay (proteins_norm2
) and the features of the newly added assay.
data("lcb3_4")
lcb3 <- lcb3_4
lcb3 <- addAssay(lcb3,
y = ...,
name = ...)
lcb3 <- addAssayLinkOneToOne(lcb3,
from = ...,
to = ...)
lcb3 <- addAssay(lcb3,
y = sce,
name = "proteins_batchC")
lcb3 <- addAssayLinkOneToOne(lcb3, from = "proteins_norm2",
to = "proteins_batchC")
Note that in the case the new assay has not a one-to-one relationship with the parent assay, you can also add custom relationships using the addAssayLink
function.
Exercise: create the interactive plot for an overview of the assays in lcb3
. Compare it to your first plot and see how you kept track of the intermediate processing steps.
plot(lcb3, interactive = TRUE)
A final normalization step is performed in the SCoPE2 workflow. This is exactly the same as three sections above.
## Center columns with median
lcb3 <- normalizeSCP(lcb3,
i = "proteins_batchC",
method = "center.median",
name = "proteins_batchC_norm1")
## Center rows with mean
lcb3 <- sweep(lcb3,
i = "proteins_batchC_norm1",
MARGIN = 1,
FUN = "-",
STATS = rowMeans(assay(lcb3[["proteins_batchC_norm1"]]),
na.rm = TRUE),
name = "proteins_scp")
By running this last step, you have replicated the data processing performed by the SCoPE2 workflow! The data is now ready for data exploration and downstream analyses.
The QFeatures
package provides the longFormat
function that formats the data set as a long table, ideal for the integration with ggplot2
. This can be used for instance to explore the different processing steps applied to the data.
Exercise: convert the lcb3
dataset to a long format table. First subset the data Filamin-A (P21333
) and all its associated features (PSMs and peptides). Remember the 3-index subsetting of a QFeatures
object. You can also include rowData
and colData
variable. For this exercise, include the colData
variables: Set
, Channel
and SampleType
.
data("lcb3_5")
lcb3 <- lcb3_5
## Subsetting
filamin <- lcb3[..., ..., ...]
## Conversion to longFormat
lf <- longFormat(filamin,
colvars = ...)
filamin <- lcb3["P21333", , ]
lf <- longFormat(filamin,
colvars = c("Set", "Channel", "SampleType"))
You can use the metadata to filter for a batch of interest, for example 190913S_LCB3_X_16plex_Set_12
. This filtered long data is then passed to ggplot2
for data visualization.
lf <- filter(data.frame(lf), Set == "190913S_LCB3_X_16plex_Set_12")
ggplot(lf) +
aes(x = Channel,
y = value,
col = SampleType) +
geom_point() +
facet_wrap(~ assay, scales = "free") +
theme(axis.text.x = element_text(angle = 90))
## Warning: Removed 1262 rows containing missing values (geom_point).
This graph can be used to track the processing of the quantitative data.
Since each assay is a SingleCellExperiment
object, the data can also easily be plugged into dimension reduction functions from the Bioconductor package scater
. We show here an example of dimension reduction results using t-SNE.
sce <- getWithColData(lcb3, "proteins_scp")
## Warning: 'experiments' dropped; see 'metadata'
## Warning: Ignoring redundant column names in 'colData(x)':
library(scater)
set.seed(1234)
sce <- runTSNE(sce,
ncomponents = 2,
ntop = Inf,
scale = TRUE,
exprs_values = 1,
name = "TSNE")
## Plotting is performed in a single line of code
plotTSNE(sce, colour_by = "SampleType")
This graph shows a low dimension representation of the final processed protein data. Each point represents a cell. The sample type is accessed from the colData
and coloured on the graph. This allows to evaluate whether the quantitative data contains information to separate monocytes from differentiated macrophages.
We hope we could convince you that QFeatures
and its extension scp
are an ideal environment to manipulate and process MS-SCP quantification data. This is only the beginning of the journey, as further development and benchmarking are required to offer improved processing workflows. Furthermore, downstream analyses, such as differential expression analyses, require the development of new statistical methods to model the complex data structure present in SCP data (Vanderaa and Gatto (2021)). Those methods could highly benefit from the QFeatures
and scp
infrastructure to access and manipulate the required information. Furthermore, the scpdata
package provides ready-to-process data that represent valuable use cases to build analytical method onto.
scp
in this vignette.QFeatures
object.Have a look at our paper that describes the reproduction of the complete SCoPE2 data set and highlights some important challenges that remain to be tackled in the field.
sessionInfo()
## R version 4.1.0 (2021-05-18)
## Platform: x86_64-pc-linux-gnu (64-bit)
## Running under: Ubuntu 20.04.2 LTS
##
## Matrix products: default
## BLAS/LAPACK: /usr/lib/x86_64-linux-gnu/openblas-pthread/libopenblasp-r0.3.8.so
##
## locale:
## [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
## [3] LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8
## [5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=C
## [7] LC_PAPER=en_US.UTF-8 LC_NAME=C
## [9] LC_ADDRESS=C LC_TELEPHONE=C
## [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
##
## attached base packages:
## [1] stats4 stats graphics grDevices utils datasets methods
## [8] base
##
## other attached packages:
## [1] scater_1.21.3 scuttle_1.3.1
## [3] SingleCellExperiment_1.15.1 forcats_0.5.1
## [5] stringr_1.4.0 dplyr_1.0.7
## [7] purrr_0.3.4 readr_2.0.1
## [9] tidyr_1.1.3 tibble_3.1.3
## [11] ggplot2_3.3.5 tidyverse_1.3.1
## [13] sva_3.41.0 BiocParallel_1.27.3
## [15] genefilter_1.75.0 mgcv_1.8-36
## [17] nlme_3.1-152 scpdata_1.1.0
## [19] ExperimentHub_2.1.4 AnnotationHub_3.1.4
## [21] BiocFileCache_2.1.1 dbplyr_2.1.1
## [23] scp_1.3.3 QFeatures_1.3.7
## [25] MultiAssayExperiment_1.19.5 SummarizedExperiment_1.23.1
## [27] Biobase_2.53.0 GenomicRanges_1.45.0
## [29] GenomeInfoDb_1.29.3 IRanges_2.27.0
## [31] S4Vectors_0.31.0 BiocGenerics_0.39.1
## [33] MatrixGenerics_1.5.3 matrixStats_0.60.0
## [35] BiocStyle_2.21.3
##
## loaded via a namespace (and not attached):
## [1] readxl_1.3.1 backports_1.2.1
## [3] systemfonts_1.0.2 igraph_1.2.6
## [5] lazyeval_0.2.2 splines_4.1.0
## [7] digest_0.6.27 htmltools_0.5.1.1
## [9] viridis_0.6.1 fansi_0.5.0
## [11] magrittr_2.0.1 memoise_2.0.0
## [13] ScaledMatrix_1.1.0 cluster_2.1.2
## [15] tzdb_0.1.2 limma_3.49.4
## [17] Biostrings_2.61.2 annotate_1.71.0
## [19] modelr_0.1.8 pkgdown_1.6.1
## [21] colorspace_2.0-2 ggrepel_0.9.1
## [23] blob_1.2.2 rvest_1.0.1
## [25] rappdirs_0.3.3 textshaping_0.3.5
## [27] haven_2.4.3 xfun_0.25
## [29] crayon_1.4.1 RCurl_1.98-1.3
## [31] jsonlite_1.7.2 impute_1.67.0
## [33] survival_3.2-11 glue_1.4.2
## [35] gtable_0.3.0 zlibbioc_1.39.0
## [37] XVector_0.33.0 DelayedArray_0.19.1
## [39] BiocSingular_1.9.1 scales_1.1.1
## [41] DBI_1.1.1 edgeR_3.35.0
## [43] Rcpp_1.0.7 viridisLite_0.4.0
## [45] xtable_1.8-4 clue_0.3-59
## [47] rsvd_1.0.5 bit_4.0.4
## [49] MsCoreUtils_1.5.0 httr_1.4.2
## [51] ellipsis_0.3.2 farver_2.1.0
## [53] pkgconfig_2.0.3 XML_3.99-0.6
## [55] sass_0.4.0 locfit_1.5-9.4
## [57] utf8_1.2.2 labeling_0.4.2
## [59] tidyselect_1.1.1 rlang_0.4.11
## [61] later_1.2.0 AnnotationDbi_1.55.1
## [63] munsell_0.5.0 BiocVersion_3.14.0
## [65] cellranger_1.1.0 tools_4.1.0
## [67] cachem_1.0.5 cli_3.0.1
## [69] generics_0.1.0 RSQLite_2.2.7
## [71] broom_0.7.9 evaluate_0.14
## [73] fastmap_1.1.0 yaml_2.2.1
## [75] ragg_1.1.3 knitr_1.33
## [77] bit64_4.0.5 fs_1.5.0
## [79] KEGGREST_1.33.0 AnnotationFilter_1.17.1
## [81] sparseMatrixStats_1.5.2 mime_0.11
## [83] xml2_1.3.2 compiler_4.1.0
## [85] rstudioapi_0.13 beeswarm_0.4.0
## [87] filelock_1.0.2 curl_4.3.2
## [89] png_0.1-7 interactiveDisplayBase_1.31.2
## [91] reprex_2.0.1 bslib_0.2.5.1
## [93] stringi_1.7.3 highr_0.9
## [95] desc_1.3.0 lattice_0.20-44
## [97] ProtGenerics_1.25.1 Matrix_1.3-4
## [99] vctrs_0.3.8 pillar_1.6.2
## [101] lifecycle_1.0.0 BiocManager_1.30.16
## [103] jquerylib_0.1.4 BiocNeighbors_1.11.0
## [105] cowplot_1.1.1 irlba_2.3.3
## [107] bitops_1.0-7 httpuv_1.6.1
## [109] R6_2.5.0 promises_1.2.0.1
## [111] gridExtra_2.3 vipor_0.4.5
## [113] MASS_7.3-54 assertthat_0.2.1
## [115] rprojroot_2.0.2 withr_2.4.2
## [117] GenomeInfoDbData_1.2.6 parallel_4.1.0
## [119] hms_1.1.0 beachmat_2.9.0
## [121] grid_4.1.0 DelayedMatrixStats_1.15.2
## [123] rmarkdown_2.10 Rtsne_0.15
## [125] shiny_1.6.0 lubridate_1.7.10
## [127] ggbeeswarm_0.6.0