The "MSnSet" Class for MS Proteomics Expression Data and Meta-Data
MSnSet-class.Rd
The MSnSet
holds quantified expression data for MS proteomics
data and the experimental meta-data.
The MSnSet
class is derived from the
"eSet"
class and mimics the
"ExpressionSet"
class classically used for
microarray data.
Objects from the Class
The constructor MSnSet(exprs, fData, pData)
can be used to
create MSnSet
instances. Argument exprs
is a
matrix
and fData
and pData
must be of class
data.frame
or "AnnotatedDataFrame"
and all
must meet the dimensions and name validity constrains.
Objects can also be created by calls of the form new("MSnSet",
exprs, ...)
. See also "ExpressionSet"
for
helpful information. Expression data produced from other softwares
can thus make use of this standardized data container to benefit
R
and Bioconductor
packages. Proteomics expression data
available as spreadsheets, as produced by third-party software such as
Proteome Discoverer, MaxQuant, ... can be imported using the
readMSnSet
and readMSnSet2
functions.
Coercion methods are also available to transform MSnSet
objects
to IBSpectra
, to data.frame
and to/from
ExpressionSet
and SummarizedExperiment
objects. In the
latter case, the metadata available in the protocolData
,
experimentData
are completely dropped, and only the logging
information of the processingData
slot is retained. All these
metadata can be subsequently be added using the
addMSnSetMetadata
(see examples below). When converting a
SummarizedExperiment
to an MSnSet
, the respective
metadata slots will be populated if available in the
SummarizedExperiment
metadata.
In the frame of the MSnbase
package, MSnSet
instances
can be generated from "MSnExp"
experiments using
the quantify
method).
Slots
qual
:Object of class
"data.frame"
that records peaks data for each of the reporter ions to be used as quality metrics.processingData
:Object of class
"MSnProcess"
that records all processing.assayData
:Object of class
"assayData"
containing a matrix with equal with column number equal tonrow(phenoData)
.assayData
must contain a matrixexprs
with rows represening features (e.g., reporters ions) and columns representing samples. See the"AssayData"
class,exprs
andassayData
accessor for more details. This slot in indirectly inherited from"eSet"
.phenoData
:Object of class
"AnnotatedDataFrame"
containing experimenter-supplied variables describing sample (i.e the individual tags for an labelled MS experiment) (indireclty inherited from"eSet"
). SeephenoData
and the"eSet"
class for more details. This slot can be accessed as adata.frame
withpData
and be replaced by a new valid (i.e. of compatible dimensions and row names)data.frame
withpData()<-
.featureData
:Object of class
"AnnotatedDataFrame"
containing variables describing features (spectra in our case), e.g. identificaiton data, peptide sequence, identification score,... (inherited indirectly from"eSet"
). SeefeatureData
and the"eSet"
class for more details. This slot can be accessed as adata.frame
withfData
and be replaced by a new valid (i.e. of compatible dimensions and row names)data.frame
withfData()<-
.experimentData
:Object of class
"MIAPE"
, containing details of experimental methods (inherited from"eSet"
). SeeexperimentData
and the"eSet"
class for more details.annotation
:not used here.
protocolData
:Object of class
"AnnotatedDataFrame"
containing equipment-generated variables (inherited indirectly from"eSet"
). SeeprotocolData
and the"eSet"
class for more details..__classVersion__
:Object of class
"Versions"
describing the versions of R, the Biobase package,"eSet"
,"pSet"
andMSnSet
of the current instance. Intended for developer use and debugging (inherited indirectly from"eSet"
).
Extends
Class "eSet"
, directly.
Class "VersionedBiobase"
, by class "eSet", distance 2.
Class "Versioned"
, by class "eSet", distance 3.
Methods
MSnSet specific methods or over-riding it's super-class are described
below. See also more "eSet"
for
inherited methods.
- acquisitionNum
acquisitionNum(signature(object = "MSnSet"))
: Returns the a numeric vector with acquisition number of each spectrum. The vector names are the corresponding spectrum names. The information is extracted from the object'sfeatureData
slot.- fromFile
fromFile(signature(object = "MSnSet"))
: get the index of the file (infileNames(object)
) from which the raw spectra from which the corresponding feature were originally read. The relevant information is extracted from the object'sfeatureData
slot.Returns a numeric vector with names corresponding to the spectrum names.
- dim
signature(x = "MSnSet")
: Returns the dimensions of object's assay data, i.e the number of samples and the number of features.- fileNames
signature(object = "MSnSet")
: Access file names in theprocessingData
slot.- msInfo
signature(object = "MSnSet")
: Prints the MIAPE-MS meta-data stored in theexperimentData
slot.- processingData
signature(object = "MSnSet")
: Access theprocessingData
slot.- show
signature(object = "MSnSet")
: Displays object content as text.- qual
signature(object = "MSnSet")
: Access the reporter ion peaks description.- purityCorrect
signature(object = "MSnSet", impurities = "matrix")
: performs reporter ions purity correction. SeepurityCorrect
documentation for more details.- normalise
signature(object = "MSnSet")
: PerformsMSnSet
normalisation. Seenormalise
for more details.- t
signature(x = "MSnSet")
: Returns a transposedMSnSet
object where features are now aligned along columns and samples along rows and thephenoData
andfeatureData
slots have been swapped. TheprotocolData
slot is always dropped.- as(,"ExpressionSet")
signature(x = "MSnSet")
: Coerce object fromMSnSet
toExpressionSet-class
. TheexperimentData
slot is converted to aMIAME
instance. It is also possible to coerce anExpressionSet
to andMSnSet
, in which case theexperimentData
slot is newly initialised.- as(,"IBSpectra")
signature(x = "MSnSet")
: Coerce object fromMSnSet
toIBSpectra
from theisobar
package.- as(,"data.frame")
signature(x = "MSnSet")
: Coerce object fromMSnSet
todata.frame
. TheMSnSet
is transposed and thePhenoData
slot is appended.- as(,"SummarizedExperiment")
signature(x = "MSnSet")
: Coerce object fromMSnSet
toSummarizedExperiment
. Only part of the metadata is retained. SeeaddMSnSetMetadata
and the example below for details.- write.exprs
signature(x = "MSnSet")
: Writes expression values to a tab-separated file (default istmp.txt
). ThefDataCols
parameter can be used to specify whichfeatureData
columns (as column names, column number orlogical
) to append on the right of the expression matrix. The following arguments are the same aswrite.table
.- combine
signature(x = "MSnSet", y = "MSnSet", ...)
: Combines 2 or moreMSnSet
instances according to their feature names. Note that thequal
slot and the processing information are silently dropped.- topN
signature(object = "MSnSet", groupBy, n = 3, fun, ..., verbose = isMSnbaseVerbose())
: Selects then
most intense features (typically peptides or spectra) out of all available for each set defined bygroupBy
(typically proteins) and creates a new instance of classMSnSet
. If less thann
features are available, all are selected. Thencol(object)
features are summerised usingfun
(default issum
) prior to be ordered in decreasing order. Additional parameters can be passed tofun
through...
, for instance to control the behaviour oftopN
in case ofNA
values. (Works also withmatrix
instances.)See also the
nQuants
function to retrieve the actual number of retained peptides out ofn
.A complete use case using
topN
andnQuants
is detailed in thesynapter
package vignette.- filterNA
signature(object = "MSnSet", pNA = "numeric", pattern = "character", droplevels = "logical")
: This method subsetsobject
by removing features that have (strictly) more thanpNA
percent of NA values. DefaultpNA
is 0, which removes any feature that exhibits missing data. The method can also be used with a character pattern composed of0
or1
characters only. A0
represent a column/sample that is allowed a missing values, while columns/samples with and1
must not haveNA
s.This method also accepts
matrix
instances.droplevels
defines whether unused levels in the feature meta-data ought to be lost. Default isTRUE
. See thedroplevels
method below.See also the
is.na.MSnSet
andplotNA
methods for missing data exploration.- filterZero
signature(object = "MSnSet", pNA = "numeric", pattern = "character", droplevels = "logical")
: AsfilterNA
, but for zeros.- filterMsLevel
signature(object = "MSnSet", msLevel. = "numeric", fcol = "character")
Keeps only spectra with levelmsLevel.
, as defined by thefcol
feature variable (default is"msLevel"
).- log
signature(object = "MSnSet", base = "numeric")
Log transformsexprs(object)
usingbase::log
.base
(defaults ise='exp(1)'
) must be a positive or complex number, the base with respect to which logarithms are computed.- droplevels
signature(x = "MSnSet", ...)
Drops the unused factor levels in thefeatureData
slot. Seedroplevels
for details.- impute
signature(object = "MSnSet", ...)
Performs data imputation on theMSnSet
object. Seeimpute
for more details.- trimws
signature(object = "MSnSet", ...)
Trim leading and/or trailing white spaces in the feature data slot. Also available fordata.frame
objects. See?base::trimws
for details.
Additional accessors for the experimental metadata
(experimentData
slot) are defined. See
"MIAPE"
for details.
Plotting
- meanSdPlot
signature(object = "MSnSet")
Plots row standard deviations versus row means. SeemeanSdPlot
(vsn
package) for more details.- image
signature(x = "MSnSet", facetBy = "character", sOrderBy = "character", legend = "character", low = "character", high = "character", fnames = "logical", nmax = "numeric")
Produces an heatmap of expression values in thex
object. Simple horizontal facetting is enabled by passing a single character asfacetBy
. Arbitrary facetting can be performed manually by saving the return value of the method (see example below). Re-ordering of the samples is possible by providing the name of a phenotypic variable tosOrderBy
. The title of the legend can be set withlegend
and the colours with thelow
andhigh
arguments. If any negative value is detected in the data, the values are considered as log fold-changes and a divergent colour scale is used. Otherwise, a gradient from low to high is used. To scale the quantitative data inx
prior to plotting, please see thescale
method.When there are more than
nmax
(default is 50) features/rows, these are not printed. This behaviour can be controlled by settingfnames
toTRUE
(always print) orFALSE
(never print). See examples below.The code is based on Vlad Petyuk's
vp.misc::image_msnset
. The previous version of this method is still available through theimage2
function.- plotNA
signature(object = "MSnSet", pNA = "numeric")
Plots missing data for anMSnSet
instance.pNA
is anumeric
of length 1 that specifies the percentage of accepted missing data values per features. This value will be highlighted with a point on the figure, illustrating the overall percentage of NA values in the full data set and the number of proteins retained. Default is 1/2. See alsoplotNA
.- MAplot
signature(object = "MSnSet", log.it = "logical", base = "numeric", ...)
Produces MA plots (Ratio as a function of average intensity) for the samples inobject
. Ifncol(object) == 2
, then one MA plot is produced using thema.plot
function from theaffy
package. Ifobject
has more than 2 columns, thenmva.pairs
.log.it
specifies is the data should be log-transformed (default isTRUE
) usingbase
. Further...
arguments will be passed to the respective functions.- addIdentificationData
signature(object = "MSnSet", ...)
: Adds identification data to aMSnSet
instance. SeeaddIdentificationData
documentation for more details and examples.- removeNoId
signature(object = "MSnSet", fcol = "pepseq", keep = NULL)
: Removes non-identified features. SeeremoveNoId
documentation for more details and examples.- removeMultipleAssignment
signature(object = "MSnSet", fcol = "nprot")
: Removes protein groups (or feature belong to protein groups) with more than one member. The latter is defined by extracting a feature variable (default is"nprot"
). Also removes non-identified features.- idSummary
signature(object = "MSnSet", ...)
: Prints a summary that lists the percentage of identified features per file (calledcoverage
).
Functions
- updateFvarLabels
signature(object, label, sep)
This function updatesobject
's featureData variable labels by appendinglabel
. By default,label
is the variable name and the separatorsep
is.
.- updateSampleNames
signature(object, label, sep)
This function updatesobject
's sample names by appendinglabel
. By default,label
is the variable name and the separatorsep
is.
.- updateFeatureNames
signature(object, label, sep)
This function updatesobject
's feature names by appendinglabel
. By default,label
is the variable name and the separatorsep
is.
.- ms2df
signature(x, fcols)
Coerces theMSnSet
instance to adata.frame
. The direction of the data is retained and the feature variable labels that matchfcol
are appended to the expression values. See alsoas(x, "data.frame")
above.- addMSnSetMetadata
signature(x, y)
When coercing anMSnSet
y
to aSummarizedExperiment
x
withx <- as(y, "SummarizedExperiment")
, most ofy
's metadata is lost. Only the file names, the processing log and the MSnbase version from theprocessingData
slots are passed along. TheaddMSnSetMetadata
function can be used to add the completeprocessingData
,experimentData
andprotocolData
slots. The downside of this is that MSnbase is now required to use theSummarizedExperiment
object.
See also
"eSet"
, "ExpressionSet"
and
quantify
. MSnSet
quantitation values and
annotation can be exported to a file with
write.exprs
. See readMSnSet
to
create and MSnSet
using data available in a spreadsheet or
data.frame
.
Examples
data(msnset)
msnset <- msnset[10:15]
exprs(msnset)[1, c(1, 4)] <- NA
exprs(msnset)[2, c(1, 2)] <- NA
is.na(msnset)
#> iTRAQ4.114 iTRAQ4.115 iTRAQ4.116 iTRAQ4.117
#> X18 TRUE FALSE FALSE TRUE
#> X19 TRUE TRUE FALSE FALSE
#> X2 FALSE FALSE FALSE FALSE
#> X20 FALSE FALSE FALSE FALSE
#> X21 FALSE FALSE FALSE FALSE
#> X22 FALSE FALSE FALSE FALSE
featureNames(filterNA(msnset, pNA = 1/4))
#> [1] "X2" "X20" "X21" "X22"
featureNames(filterNA(msnset, pattern = "0110"))
#> [1] "X18" "X2" "X20" "X21" "X22"
M <- matrix(rnorm(12), 4)
pd <- data.frame(otherpdata = letters[1:3])
fd <- data.frame(otherfdata = letters[1:4])
x0 <- MSnSet(M, fd, pd)
sampleNames(x0)
#> [1] "1" "2" "3"
M <- matrix(rnorm(12), 4)
colnames(M) <- LETTERS[1:3]
rownames(M) <- paste0("id", LETTERS[1:4])
pd <- data.frame(otherpdata = letters[1:3])
rownames(pd) <- colnames(M)
fd <- data.frame(otherfdata = letters[1:4])
rownames(fd) <- rownames(M)
x <- MSnSet(M, fd, pd)
sampleNames(x)
#> [1] "A" "B" "C"
## Visualisation
library("pRolocdata")
data(dunkley2006)
image(dunkley2006)
## Changing colours
image(dunkley2006, high = "darkgreen")
image(dunkley2006, high = "darkgreen", low = "yellow")
## Forcing feature names
image(dunkley2006, fnames = TRUE)
## Facetting
image(dunkley2006, facetBy = "replicate")
p <- image(dunkley2006)
library("ggplot2") ## for facet_grid
p + facet_grid(replicate ~ membrane.prep, scales = 'free', space = 'free')
p + facet_grid(markers ~ replicate)
## Fold-changes
dd <- dunkley2006
exprs(dd) <- exprs(dd) - 0.25
image(dd)
image(dd, low = "green", high = "red")
## Feature names are displayed by default for smaller data
dunkley2006 <- dunkley2006[1:25, ]
image(dunkley2006)
image(dunkley2006, legend = "hello")
## Coercion
if (require("SummarizedExperiment")) {
data(msnset)
se <- as(msnset, "SummarizedExperiment")
metadata(se) ## only logging
se <- addMSnSetMetadata(se, msnset)
metadata(se) ## all metadata
msnset2 <- as(se, "MSnSet")
processingData(msnset2)
}
#> Loading required package: SummarizedExperiment
#> Loading required package: MatrixGenerics
#> Loading required package: matrixStats
#>
#> Attaching package: ‘matrixStats’
#> The following objects are masked from ‘package:Biobase’:
#>
#> anyMissing, rowMedians
#>
#> Attaching package: ‘MatrixGenerics’
#> The following objects are masked from ‘package:matrixStats’:
#>
#> colAlls, colAnyNAs, colAnys, colAvgsPerRowSet, colCollapse,
#> colCounts, colCummaxs, colCummins, colCumprods, colCumsums,
#> colDiffs, colIQRDiffs, colIQRs, colLogSumExps, colMadDiffs,
#> colMads, colMaxs, colMeans2, colMedians, colMins, colOrderStats,
#> colProds, colQuantiles, colRanges, colRanks, colSdDiffs, colSds,
#> colSums2, colTabulates, colVarDiffs, colVars, colWeightedMads,
#> colWeightedMeans, colWeightedMedians, colWeightedSds,
#> colWeightedVars, rowAlls, rowAnyNAs, rowAnys, rowAvgsPerColSet,
#> rowCollapse, rowCounts, rowCummaxs, rowCummins, rowCumprods,
#> rowCumsums, rowDiffs, rowIQRDiffs, rowIQRs, rowLogSumExps,
#> rowMadDiffs, rowMads, rowMaxs, rowMeans2, rowMedians, rowMins,
#> rowOrderStats, rowProds, rowQuantiles, rowRanges, rowRanks,
#> rowSdDiffs, rowSds, rowSums2, rowTabulates, rowVarDiffs, rowVars,
#> rowWeightedMads, rowWeightedMeans, rowWeightedMedians,
#> rowWeightedSds, rowWeightedVars
#> The following object is masked from ‘package:Biobase’:
#>
#> rowMedians
#> Loading required package: GenomicRanges
#> Loading required package: IRanges
#> Loading required package: GenomeInfoDb
#> - - - Processing information - - -
#> Data loaded: Wed May 11 18:54:39 2011
#> iTRAQ4 quantification by trapezoidation: Wed Apr 1 21:41:53 2015
#> Subset [55,4][6,4] Tue Oct 15 15:25:49 2024
#> MSnbase version: 1.1.22
as(msnset, "ExpressionSet")
#> ExpressionSet (storageMode: lockedEnvironment)
#> assayData: 6 features, 4 samples
#> element names: exprs
#> protocolData: none
#> phenoData
#> sampleNames: iTRAQ4.114 iTRAQ4.115 iTRAQ4.116 iTRAQ4.117
#> varLabels: mz reporters
#> varMetadata: labelDescription
#> featureData
#> featureNames: X18 X19 ... X22 (6 total)
#> fvarLabels: spectrum ProteinAccession ... collision.energy (15 total)
#> fvarMetadata: labelDescription
#> experimentData: use 'experimentData(object)'
#> Annotation: No annotation