Application of the scp
pipeline to the adult cardiomyocytes data from Ai et al. (2025). This vignette is distributed under a CC BY-SA license.
scp 1.17.0
In this vignette, we will reanalyse single-cell proteomics data from (Ai et al. 2025).
iPSC-derived cardiomyocytes (iCMs) have been proposed as a model to study cardiovascular diseases. However, this requires that iCMs display the same biological features as adult cardiomyocytes (aCMs). The authors of the manuscript used MS-based single-cell proteomics to analyze metabolic changes in iPSCs during differentiation process and to compare the resulting iCMs to freshly-isolated aCMs.
Here, we will focus on the analysis of the aCMs data.
Single cardiomyocytes were isolated from 3 human hearts using enzymatic digestion isolation technique.
Cells were isolated from 4 regions of the heart: - LVepi: left ventricle, epi-myocardium - LVendo: left ventricle, endo-myocardium - LVmid: left ventricle, mid-myocardium - RV: right ventricle
Single cell proteomes were analyzed in a label-free approach. MS data was acquired in data-independent mode (DIA-MS). Data were analyzed with DIA-NN 1.8.1 and DIA-NN 1.8.2. The DIA-NN reports were downloaded from MassIVE (MSV000094438).
The following packages were used to compile this vignette.
library(QFeatures)
library(scp)
library(tidyverse)
library(patchwork)
We can now read in the results file from DIA-NN. We use the file
report.tsv
file generate by DIA-NN, that can be downloaded via the
MsDataHub package1 Version 1.7.1 or later.
and read it as a tibble
with read_tsv()
.
(acmsTab <- read_tsv(MsDataHub::Ai2025_aCMs_report.tsv()))
## # A tibble: 1,031,054 × 56
## File.Name Run Protein.Group Protein.Ids Protein.Names Genes PG.Quantity
## <chr> <chr> <chr> <chr> <chr> <chr> <dbl>
## 1 "X:\\TimsTOF… CMs_… Q9NQP4 Q9NQP4 PFD4_HUMAN PFDN4 26834.
## 2 "X:\\TimsTOF… CMs_… Q9NQP4 Q9NQP4 PFD4_HUMAN PFDN4 23905.
## 3 "X:\\TimsTOF… CMs_… Q9NQP4 Q9NQP4 PFD4_HUMAN PFDN4 19715.
## 4 "X:\\TimsTOF… CMs_… Q9NQP4 Q9NQP4 PFD4_HUMAN PFDN4 3046.
## 5 "X:\\TimsTOF… CMs_… Q9NQP4 Q9NQP4 PFD4_HUMAN PFDN4 22406.
## 6 "X:\\TimsTOF… CMs_… Q9NQP4 Q9NQP4 PFD4_HUMAN PFDN4 17476.
## 7 "X:\\TimsTOF… CMs_… Q9NQP4 Q9NQP4 PFD4_HUMAN PFDN4 18305.
## 8 "X:\\TimsTOF… CMs_… Q9NQP4 Q9NQP4 PFD4_HUMAN PFDN4 11097.
## 9 "X:\\TimsTOF… CMs_… Q9NQP4 Q9NQP4 PFD4_HUMAN PFDN4 20368.
## 10 "X:\\TimsTOF… CMs_… Q9NQP4 Q9NQP4 PFD4_HUMAN PFDN4 17047.
## # ℹ 1,031,044 more rows
## # ℹ 49 more variables: PG.Normalised <dbl>, PG.MaxLFQ <dbl>,
## # Genes.Quantity <dbl>, Genes.Normalised <dbl>, Genes.MaxLFQ <dbl>,
## # Genes.MaxLFQ.Unique <dbl>, Modified.Sequence <chr>,
## # Stripped.Sequence <chr>, Precursor.Id <chr>, Precursor.Charge <dbl>,
## # Q.Value <dbl>, PEP <dbl>, Global.Q.Value <dbl>, Protein.Q.Value <dbl>,
## # PG.Q.Value <dbl>, Global.PG.Q.Value <dbl>, GG.Q.Value <dbl>, …
We need to extract metadata from the file names, in particular: - File name - Sample name - Acquisition date - Subject number - Location in the heart from which each cell was extracted (see above) - Position in the 384 well plate - Position (BA18, …)
Our goal is then to create a data frame with 299 rows, one for each cell, with the information defined above as the columns.
## Extracting file name, sample name, date, subject, well position and
## position
tab <- tibble(File.Name = unique(acmsTab[[1]])) |>
mutate(Sample = sub("^.+CM_PROJECT\\\\", "", File.Name)) |>
mutate(Sample = sub("\\\\", "_", Sample)) |>
mutate(Date = ymd(as.integer(substring(Sample, 1, 6)))) |>
mutate(Subject = sub("^.+_(Subject[0-9])_.+$", "\\1", Sample)) |>
mutate(PlateWell = sub("^.+_([A-Z][0-9]+)_.+$", "\\1", Sample)) |>
mutate(PlateRow = gsub("[0-9]*", "", PlateWell)) |>
mutate(PlateColumn = gsub("[A-Z]*", "", PlateWell)) |>
mutate(PlateColumn = factor(PlateColumn, levels = 1:24)) |>
mutate(Position = sub("^.+_([A-Z]+[0-9]+)_1_[0-9]+\\.d$", "\\1", Sample)) |>
mutate(FileIndex = sub("^.+_1_([0-9]+)\\.d$", "\\1", Sample)) |>
mutate(FileIndex = factor(FileIndex, levels = sort(unique(FileIndex))))
## Extracting the heart locations
tab$HeartLocation <- NA
ExpectedLocations <- c("Lvendo", "Lvepi", "Lvmid", "RV", "sytox")
for (i in 1:5) {
loc <- ExpectedLocations[i]
tab$HeartLocation[grep(loc, tab$File.Name, ignore.case = TRUE)] <- loc
}
tab
## # A tibble: 299 × 10
## File.Name Sample Date Subject PlateWell PlateRow PlateColumn Position
## <chr> <chr> <date> <chr> <chr> <chr> <fct> <chr>
## 1 "X:\\TimsT… 22101… 2022-10-19 Subjec… B15 B 15 BB17
## 2 "X:\\TimsT… 22101… 2022-10-19 Subjec… F13 F 13 BF14
## 3 "X:\\TimsT… 22101… 2022-10-19 Subjec… I14 I 14 BI17
## 4 "X:\\TimsT… 22101… 2022-10-19 Subjec… A19 A 19 BA20
## 5 "X:\\TimsT… 22101… 2022-10-19 Subjec… B24 B 24 BB3
## 6 "X:\\TimsT… 22101… 2022-10-19 Subjec… C22 C 22 BC23
## 7 "X:\\TimsT… 22101… 2022-10-19 Subjec… D21 D 21 BD22
## 8 "X:\\TimsT… 22101… 2022-10-19 Subjec… D24 D 24 BD3
## 9 "X:\\TimsT… 22101… 2022-10-19 Subjec… E24 E 24 BE3
## 10 "X:\\TimsT… 22101… 2022-10-19 Subjec… F19 F 19 BF2
## # ℹ 289 more rows
## # ℹ 2 more variables: FileIndex <fct>, HeartLocation <chr>
After creating the data frame we can check that the values in the different columns.
table(tab[, c("Date","Subject")])
## Subject
## Date Subject3 Subject4 Subject5
## 2022-10-19 156 0 0
## 2022-10-22 0 0 21
## 2022-10-24 0 122 0
table(tab[, c("HeartLocation","Subject")])
## Subject
## HeartLocation Subject3 Subject4 Subject5
## Lvendo 35 34 0
## Lvepi 62 33 0
## Lvmid 29 23 0
## RV 30 32 0
## sytox 0 0 21
We can see that there are 21 cells from subject 5, annotated as “sytox”, acquired on one specific date. Sytox is a stain used for assessing cell viability. We will therefore remove these cells before we start doing statistical analysis.
To create the QFeatures
object, we need to add a column runCol
to
properly associate the colData
to each sample.
tab$runCol <- tab$File.Name
Now we can create the QFeatures
object, and include the metadata
that we extracted. We also use the fnames
argument2 Note that the
fnames
requires QFeatures
version 1.17.2 or later. to use
percursor identifiers (i.e. peptides sequence and precursor charge) as
assay rownames (this is important to correctly join the assays below).
acms <- readSCPfromDIANN(acmsTab,
colData = DataFrame(tab),
fnames = "Precursor.Id")
## Checking arguments.
## Loading data as a 'SummarizedExperiment' object.
## Splitting data in runs.
## Formatting sample annotations (colData).
## Formatting data as a 'QFeatures' object.
## Setting assay rownames.
# Setting the names of the QFeatures objects
names(acms) <- acms$Sample
acms
## An instance of class QFeatures containing 299 set(s):
## [1] 221019_CM_SUBJECT-03_CMs_Subject3_Lvendo_A13_BA14_1_7445.d: SingleCellExperiment with 6634 rows and 1 columns
## [2] 221019_CM_SUBJECT-03_CMs_Subject3_Lvendo_A14_BA15_1_7446.d: SingleCellExperiment with 4326 rows and 1 columns
## [3] 221019_CM_SUBJECT-03_CMs_Subject3_Lvendo_A15_BA16_1_7447.d: SingleCellExperiment with 6263 rows and 1 columns
## ...
## [297] 221024_Subject-04_CMs_Subject4_D_RV_G6_BG7_1_7841.d: SingleCellExperiment with 2108 rows and 1 columns
## [298] 221024_Subject-04_CMs_Subject4_D_RV_H4_BH7_1_7858.d: SingleCellExperiment with 2491 rows and 1 columns
## [299] 221024_Subject-04_CMs_Subject4_D_RV_I1_BI11_1_7861.d: SingleCellExperiment with 858 rows and 1 columns
The QFeatures
object contains 299 assays, on per single
cell.
We immediately replace 0 by NA
for downstream analysis.
acms <- zeroIsNA(acms, names(acms))
The objective of quality control is to remove low-quality data. These data can be filtered based on PSM-level annotations and on cell-level annotations.
We first explore the features annotation. We can retrieve all the data
annotations along the quantified values in a single long table using
longFormat()
lf <- longFormat(acms,
colvars = c("HeartLocation", "Subject", "FileIndex"),
rowvars = c("PEP", "RT", "Predicted.RT", "IM", "Predicted.IM")) |>
data.frame()
We first explore the distributions of the posterior error
probabilities (PEP) for precursor identification for each cell
separately, stratified by subject, a potential variable for batch
effects. We sort the cells according to the sequence of MS acquisition
(available in the FileIndex
column).
ggplot(lf) +
aes(x = log10(PEP),
y = FileIndex,
colour = Subject) +
geom_boxplot() +
theme(axis.text.y = element_blank(),
axis.ticks.y = element_blank())
The plot indicates that the identification errors of every precursor are stable across the experiment. We will therefore not use PEP for PSM filtering.
We now generate a similar plot, but focusing on the difference between the observed retention time (RT) and the predicted RT, where large differences may indicate wrong identification or issues during liquid chromatography. We hence plot the absolute difference between observed RT.
ggplot(lf,
aes(x = abs(RT - Predicted.RT),
y = FileIndex,
colour = Subject)) +
geom_boxplot() +
geom_hline(yintercept = 0.18, linetype = "dashed") +
scale_x_log10() +
theme(axis.text.y = element_blank(),
axis.ticks.y = element_blank())
## Warning in scale_x_log10(): log-10 transformation introduced infinite values.
## Warning: Removed 1961 rows containing non-finite outside the scale range
## (`stat_boxplot()`).
Again, the plot shows stable metrics across cells, with a few outliers
above 0.18 spread across the experiment. We will remove those outliers
using filterFeatures()
that takes the QFeatures
object and an
arbitrary formula to filter the PSM. Note that you can directly filter
on variables from the rowData by including them in the formula.
acms <- filterFeatures(acms, ~ abs(RT - Predicted.RT) < 0.18)
## 'RT' found in 299 out of 299 assay(s)
## 'Predicted.RT' found in 299 out of 299 assay(s)
Next, we remove precursors that map on multiple proteins, as these are
of limited use when interpreting downstream, we will remove them.
DIA-NN generated the isProteotypic
, however the table below shows
that some precursors considered as proteotypic may still be part of a
protein group where the Protein.Ids
contain multiple proteins
separated by a ;
.
rd <- rbindRowData(acms, names(acms))
table(isProteotypic = rd$Proteotypic,
isProteinGroup = grepl(";", rd$Protein.Ids))
## isProteinGroup
## isProteotypic FALSE TRUE
## 0 0 100377
## 1 929019 690
We here keep all precursor that are both proteotypic and not part of a protein group.
acms <- filterFeatures(acms,
~ !grepl(";", Protein.Names) &
Proteotypic == 1)
## 'Protein.Names' found in 299 out of 299 assay(s)
## 'Proteotypic' found in 299 out of 299 assay(s)
Finally, we add the information whether a precursor belongs to a
contaminant protein or not. We retrieve this information using the
cRAP database, through the camprotR
package.
head(contaminants <- camprotR::get_ccp_crap())
## [1] "P00330" "P02768" "P00883" "P0DUB6" "P0DTE7" "P0DTE8"
We next retrieve all the rowData table in a single table using
rbindRowData()
, we add a new column isContaminant
and inject it
back into the QFeatures
object after splitting it by assay.
rd <- rbindRowData(acms, names(acms))
rd$isContaminant <- rd$Protein.Ids %in% contaminants
rowData(acms) <- split(rd, rd$assay)
We decide not to remove contaminants as they may be used later for model exploration and diagnostics.
We now remove low-quality cells. We will compute a few metrics, explore the experimental design and assess how it impacts the computed metrics and we will conclude with removing cells identified as low-quality, if any.
First, we compute the median intensity within each cell. This is meant as a proxy for the amount of material that has been injected in the MS. Note that in single-cell experiments, difference in the amount of material injected are subject to technical variability, but also to biological variability (cell size, proteome complexity, cell density, …). Hence, different median intensities may not be a problematic artifact if different across heart location.
acms$MedianIntensity <- sapply(names(acms), function(i) {
median(log2(assay(acms[[i]])), na.rm = TRUE)
})
We also compute the number of identified precursors within each cell.
This relates to the number of rows in each set of the QFeatures
object.
acms$TotalIds <- nrows(acms)
We now explore the experimental design. First, the sample annotation contain subjects and different dates. The table below shows that each subject has been processed in a different day.
table(acms$Date, acms$Subject)
##
## Subject3 Subject4 Subject5
## 2022-10-19 156 0 0
## 2022-10-22 0 0 21
## 2022-10-24 0 122 0
This is not a problematic design as soon we are not interested in distinguishing between subject effects and date effects, which is the case for this experiment.
Next we will explore the plate design. We first retrieve all the
available sample annotations from the colData
and perform a little
data cleaning for improved visualisation.
We will first explore how the data acquisition strategy, by plotting the order of the the MS runs on the plate layout.
data.frame(colData(acms)) |>
ggplot(aes(y = PlateRow,
x = PlateColumn,
fill = as.numeric(FileIndex))) +
geom_tile() +
facet_grid(~ Date) +
scale_fill_continuous(type = "viridis") +
theme_minimal() +
theme(axis.text.x = element_text(angle = 90, hjust = 1),
legend.position = "bottom")
We can see that each plate as been acquired sequentially (as already suggested by the date annotation), in order of the rows. Next, we now plot the plate layout with respect to the heart location of the cells, for each data/subject separately.
data.frame(colData(acms)) |>
ggplot(aes(y = PlateRow,
x = PlateColumn,
fill = HeartLocation)) +
geom_tile() +
facet_grid(~ Date) +
theme_minimal() +
theme(axis.text.x = element_text(angle = 90, hjust = 1),
legend.position = "bottom")
Two observations can be drawn from the plot. First, all sytox
cells
are from one date and hence their effect cannot be estimated.
According to the original study, this sample preparation for this
subject failed. Hence we will remove the corresponding data later.
Also cells have been assigned in block on the plate, one block for
each heart location. While this is probably the consequence of
practical constraints, this means we can no longer distinguish between
well column effect (if any) and heart location effects. Hence, we will
ignore cell column effects.
We plot the same layout, this time colouring by the number of precursors identified.
data.frame(colData(acms)) |>
ggplot(aes(y = PlateRow,
x = PlateColumn,
fill = TotalIds)) +
geom_tile() +
facet_grid(~ Date) +
scale_fill_continuous(type = "viridis") +
theme_minimal() +
theme(axis.text.x = element_text(angle = 90, hjust = 1),
legend.position = "bottom")
Columns 10-24 in the first plate seem to have more identified precursors than the other columns, potentially indicating some batch effect during cell sorting. A cell type effect may also be an explanation, but we do not see the effect in the last plate.
data.frame(colData(acms)) |>
ggplot(aes(y = PlateRow,
x = PlateColumn,
fill = MedianIntensity)) +
geom_tile() +
facet_grid(~ Date) +
scale_fill_continuous(type = "viridis") +
theme_minimal() +
theme(axis.text.x = element_text(angle = 90, hjust = 1),
legend.position = "bottom")
Interestingly, this plot shows that columns 1-9 in the first plate have lower median intensity. We now combine the information
data.frame(colData(acms)) |>
ggplot(aes(y = MedianIntensity,
x = TotalIds,
colour = as.numeric(FileIndex),
shape = Subject)) +
geom_point(size = 2) +
scale_colour_continuous(type = "viridis")
Upon data exploration, we identified that one of the subjects should be removed.
We now join the 299 single-cell assays into one containing all precursors across all cells.
(acms <- joinAssays(acms, i = names(acms), name = "precursors"))
## An instance of class QFeatures containing 300 set(s):
## [1] 221019_CM_SUBJECT-03_CMs_Subject3_Lvendo_A13_BA14_1_7445.d: SingleCellExperiment with 6082 rows and 1 columns
## [2] 221019_CM_SUBJECT-03_CMs_Subject3_Lvendo_A14_BA15_1_7446.d: SingleCellExperiment with 3958 rows and 1 columns
## [3] 221019_CM_SUBJECT-03_CMs_Subject3_Lvendo_A15_BA16_1_7447.d: SingleCellExperiment with 5729 rows and 1 columns
## ...
## [298] 221024_Subject-04_CMs_Subject4_D_RV_H4_BH7_1_7858.d: SingleCellExperiment with 2206 rows and 1 columns
## [299] 221024_Subject-04_CMs_Subject4_D_RV_I1_BI11_1_7861.d: SingleCellExperiment with 689 rows and 1 columns
## [300] precursors: SingleCellExperiment with 11731 rows and 299 columns
We then log-transforming our data for downstream statistical
analyses. The logTransfrorm()
function will create a new assay,
named precursors_log
.
(acms <- logTransform(acms, "precursors", "precursors_log"))
## An instance of class QFeatures containing 301 set(s):
## [1] 221019_CM_SUBJECT-03_CMs_Subject3_Lvendo_A13_BA14_1_7445.d: SingleCellExperiment with 6082 rows and 1 columns
## [2] 221019_CM_SUBJECT-03_CMs_Subject3_Lvendo_A14_BA15_1_7446.d: SingleCellExperiment with 3958 rows and 1 columns
## [3] 221019_CM_SUBJECT-03_CMs_Subject3_Lvendo_A15_BA16_1_7447.d: SingleCellExperiment with 5729 rows and 1 columns
## ...
## [299] 221024_Subject-04_CMs_Subject4_D_RV_I1_BI11_1_7861.d: SingleCellExperiment with 689 rows and 1 columns
## [300] precursors: SingleCellExperiment with 11731 rows and 299 columns
## [301] precursors_log: SingleCellExperiment with 11731 rows and 299 columns
While we will be performing our analyses with precursor data, below we show how to aggregate these data into peptide and protein data.
acms <- aggregateFeatures(acms,
i = "precursors_log",
name = "peptides",
fcol = "Modified.Sequence",
fun = colMedians,
na.rm = TRUE)
## Your quantitative data contain missing values. Please read the relevant
## section(s) in the aggregateFeatures manual page regarding the effects
## of missing values on data aggregation.
acms <- aggregateFeatures(acms,
i = "peptides",
name = "proteins",
fcol = "Protein.Ids",
fun = colMedians,
na.rm = TRUE)
## Your quantitative data contain missing values. Please read the relevant
## section(s) in the aggregateFeatures manual page regarding the effects
## of missing values on data aggregation.
We now have two additional assays names peptides
and proteins
:
acms
## An instance of class QFeatures containing 303 set(s):
## [1] 221019_CM_SUBJECT-03_CMs_Subject3_Lvendo_A13_BA14_1_7445.d: SingleCellExperiment with 6082 rows and 1 columns
## [2] 221019_CM_SUBJECT-03_CMs_Subject3_Lvendo_A14_BA15_1_7446.d: SingleCellExperiment with 3958 rows and 1 columns
## [3] 221019_CM_SUBJECT-03_CMs_Subject3_Lvendo_A15_BA16_1_7447.d: SingleCellExperiment with 5729 rows and 1 columns
## ...
## [301] precursors_log: SingleCellExperiment with 11731 rows and 299 columns
## [302] peptides: SingleCellExperiment with 10277 rows and 299 columns
## [303] proteins: SingleCellExperiment with 2044 rows and 299 columns
We will model the data following the scplainer pipeline (Vanderaa and Gatto 2024).
We extract the log-transformed precursor assez and remove sytox cells (all from subject 5):
sce <- getWithColData(acms, "precursors_log")
## Warning: 'experiments' dropped; see 'drops()'
sce <- sce[, sce$HeartLocation != "sytox"]
We can now model the precursor quantities considering median intensity in each cell, plate position (column), subject and heart location as factors that influence the measure abundances.
sce <- scpModelWorkflow(
sce,
formula = ~ 1 + ## intercept
## normalisation
MedianIntensity +
## batch effects
PlateRow +
Subject +
## biological variability
HeartLocation,
verbose = FALSE)
Note that above, we set the verbose
argument to FALSE
to switch
the progress bar out and avoid unnecessary output in this
vignette. For interactive use, we recommend to keep the default
(verbose = TRUE
) to monitor progress.
Here, we define the missing data filtering threshold: we want at least 3 measurement for each coefficient estimation.
scpModelFilterThreshold(sce) <- 3
scpModelFilterPlot(sce)
## To change the threshold, use:
## scpModelFilterThreshold(object, name) <- threshold
(vaRes <- scpVarianceAnalysis(sce))
## DataFrameList of length 5
## names(5): Residuals MedianIntensity PlateRow Subject HeartLocation
## Add annotations from the rowData()
vaRes <- scpAnnotateResults(
vaRes, rowData(sce), by = "feature", by2 = "Precursor.Id"
)
scpVariancePlot(vaRes)
scpVariancePlot(
vaRes, top = 20, by = "percentExplainedVar", effect = "Subject",
decreasing = TRUE, combined = FALSE, fcol = "Protein.Names"
) + scpVariancePlot(
vaRes, top = 10, by = "percentExplainedVar", effect = "HeartLocation",
decreasing = TRUE, combined = FALSE, fcol = "Protein.Names"
) + plot_layout(ncol = 1, guides = "collect")
locations <- unique(sce$HeartLocation)
combinations <- combn(locations, 2)
contrasts <- lapply(1:ncol(combinations),
function(i) c("HeartLocation",
combinations[, i]))
(daRes <- scpDifferentialAnalysis(
sce, contrast = contrasts
))
## DataFrameList of length 6
## names(6): HeartLocation_Lvendo_vs_Lvepi ... HeartLocation_Lvmid_vs_RV
daRes[[1]]
## DataFrame with 6051 rows and 7 columns
## feature Estimate SE Df tstatistic pvalue
## <character> <numeric> <numeric> <numeric> <numeric> <numeric>
## 1 ILNPAAIPEG... 0.133536 0.288787 256 0.462402 6.44186e-01
## 2 VFDEFKPLVE... 1.099511 0.288538 256 3.810624 1.73664e-04
## 3 VLSIGDGIAR... -1.083659 0.262455 255 -4.128928 4.94204e-05
## 4 ELTTEIDNNI... -0.417190 0.270743 254 -1.540908 1.24584e-01
## 5 KVPQVSTPTL... 0.174581 0.244313 254 0.714580 4.75525e-01
## ... ... ... ... ... ... ...
## 6047 ADAPEEEDHV... 0.8504744 0.581988 65 1.461327 0.14874334
## 6048 GGVSDHSAFL... 0.7455283 0.280467 63 2.658166 0.00994636
## 6049 VVNISSMLGR... -0.3850981 0.192371 30 -2.001847 0.05441535
## 6050 ELSGLGSALK... -0.1667748 0.194061 146 -0.859392 0.39153328
## 6051 ELSTYYQVEP... -0.0600048 0.239444 50 -0.250601 0.80314989
## padj
## <numeric>
## 1 0.80546425
## 2 0.00294238
## 3 0.00134100
## 4 0.28573059
## 5 0.67930494
## ... ...
## 6047 0.3419627
## 6048 0.0419948
## 6049 0.1800259
## 6050 0.7845657
## 6051 1.0000000
daRes <-
scpAnnotateResults(
daRes, rowData(sce),
by = "feature", by2 = "Precursor.Id")
scpVolcanoPlot(daRes,
textBy = "Protein.Names",
pointParams = list(aes(colour = Lib.Q.Value)))[[1]]
(caRes <- scpComponentAnalysis(
sce, ncomp = 15, method = "APCA"))
## List of length 2
## names(2): bySample byFeature
caResCells <- caRes$bySample
sce$cell <- colnames(sce)
caResCells <- scpAnnotateResults(caResCells,
colData(sce), by = "cell")
scpComponentPlot(
caResCells,
pointParams = list(aes(colour = HeartLocation,
shape = Subject),
size = 3)) |>
wrap_plots() +
plot_layout(guides = "collect")
library(scater)
## Loading required package: SingleCellExperiment
## Loading required package: scuttle
sce <-addReducedDims(sce, caRes$bySample)
sce <- runTSNE(sce, dimred = "APCA_HeartLocation")
plotTSNE(sce, colour_by = "HeartLocation") +
plotTSNE(sce, colour_by = "Subject") +
plotTSNE(sce, colour_by = "PlateRow") +
plotTSNE(sce, colour_by = "MedianIntensity")
We can not extract the hear location effect matrix, removing the subject, plate position and median intensity effects for downstream analysis.
scebr <- scpRemoveBatchEffect(
sce, effects = c("Subject", "PlateRow", "MedianIntensity"),
intercept = TRUE)
Coming soon.
This vignette was prepared as part of the EuBIC 2025 developer hackathon.
using the follwing software
## R Under development (unstable) (2025-01-25 r87633)
## Platform: x86_64-pc-linux-gnu
## Running under: Ubuntu 24.04.2 LTS
##
## Matrix products: default
## BLAS: /opt/Rdevel/lib/R/lib/libRblas.so
## LAPACK: /opt/Rdevel/lib/R/lib/libRlapack.so; LAPACK version 3.12.0
##
## locale:
## [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C
## [3] LC_TIME=en_US.UTF-8 LC_COLLATE=en_US.UTF-8
## [5] LC_MONETARY=en_US.UTF-8 LC_MESSAGES=en_US.UTF-8
## [7] LC_PAPER=en_US.UTF-8 LC_NAME=C
## [9] LC_ADDRESS=C LC_TELEPHONE=C
## [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C
##
## time zone: Europe/Brussels
## tzcode source: system (glibc)
##
## attached base packages:
## [1] stats4 stats graphics grDevices utils datasets methods
## [8] base
##
## other attached packages:
## [1] scater_1.35.1 scuttle_1.17.0
## [3] SingleCellExperiment_1.29.1 MsDataHub_1.7.1
## [5] patchwork_1.3.0 lubridate_1.9.4
## [7] forcats_1.0.0 stringr_1.5.1
## [9] dplyr_1.1.4 purrr_1.0.2
## [11] readr_2.1.5 tidyr_1.3.1
## [13] tibble_3.2.1 ggplot2_3.5.1
## [15] tidyverse_2.0.0 scp_1.17.0
## [17] QFeatures_1.17.2 MultiAssayExperiment_1.33.8
## [19] SummarizedExperiment_1.37.0 Biobase_2.67.0
## [21] GenomicRanges_1.59.1 GenomeInfoDb_1.43.4
## [23] IRanges_2.41.2 S4Vectors_0.45.2
## [25] BiocGenerics_0.53.6 generics_0.1.3
## [27] MatrixGenerics_1.19.1 matrixStats_1.5.0
## [29] BiocStyle_2.35.0
##
## loaded via a namespace (and not attached):
## [1] RColorBrewer_1.1-3 jsonlite_1.9.0 magrittr_2.0.3
## [4] ggbeeswarm_0.7.2 magick_2.8.5 farver_2.1.2
## [7] MALDIquant_1.22.3 rmarkdown_2.29 vctrs_0.6.5
## [10] memoise_2.0.1 tinytex_0.54 htmltools_0.5.8.1
## [13] S4Arrays_1.7.1 BiocBaseUtils_1.9.0 AnnotationHub_3.15.0
## [16] curl_6.2.1 BiocNeighbors_2.1.2 SparseArray_1.7.4
## [19] mzID_1.45.0 sass_0.4.9 bslib_0.8.0
## [22] plyr_1.8.9 impute_1.81.0 cachem_1.1.0
## [25] igraph_2.1.4 iterators_1.0.14 mime_0.12
## [28] lifecycle_1.0.4 pkgconfig_2.0.3 rsvd_1.0.5
## [31] Matrix_1.7-2 R6_2.6.1 fastmap_1.2.0
## [34] GenomeInfoDbData_1.2.13 clue_0.3-66 digest_0.6.37
## [37] fdrtool_1.2.18 pcaMethods_1.99.0 colorspace_2.1-1
## [40] AnnotationDbi_1.69.0 irlba_2.3.5.1 ExperimentHub_2.15.0
## [43] camprotR_0.0.0.9000 lpsymphony_1.35.0 RSQLite_2.3.9
## [46] beachmat_2.23.6 filelock_1.0.3 labeling_0.4.3
## [49] timechange_0.3.0 httr_1.4.7 abind_1.4-8
## [52] compiler_4.5.0 doParallel_1.0.17 bit64_4.6.0-1
## [55] withr_3.0.2 BiocParallel_1.41.2 viridis_0.6.5
## [58] DBI_1.2.3 MASS_7.3-64 rappdirs_0.3.3
## [61] DelayedArray_0.33.4 mzR_2.41.1 tools_4.5.0
## [64] vipor_0.4.7 PSMatch_1.11.0 beeswarm_0.4.0
## [67] glue_1.8.0 grid_4.5.0 Rtsne_0.17
## [70] cluster_2.1.8 reshape2_1.4.4 gtable_0.3.6
## [73] tzdb_0.4.0 preprocessCore_1.69.0 hms_1.1.3
## [76] ScaledMatrix_1.15.0 BiocSingular_1.23.0 metapod_1.15.0
## [79] utf8_1.2.4 XVector_0.47.2 foreach_1.5.2
## [82] ggrepel_0.9.6 BiocVersion_3.21.1 pillar_1.10.1
## [85] limma_3.63.3 vroom_1.6.5 robustbase_0.99-4-1
## [88] BiocFileCache_2.15.1 lattice_0.22-6 bit_4.5.0.1
## [91] tidyselect_1.2.1 Biostrings_2.75.3 knitr_1.49
## [94] gridExtra_2.3 bookdown_0.42 ProtGenerics_1.39.2
## [97] IHW_1.35.0 xfun_0.50 statmod_1.5.0
## [100] MSnbase_2.33.2
## [ reached 'max' / getOption("max.print") -- omitted 30 entries ]