The "MSnSet" Class for MS Proteomics Expression Data and Meta-Data
MSnSet-class.RdThe MSnSet holds quantified expression data for MS proteomics
  data and the experimental meta-data.
  The MSnSet class is derived from the
  "eSet" class and mimics the
  "ExpressionSet" class classically used for
  microarray data.
Objects from the Class
The constructor MSnSet(exprs, fData, pData) can be used to
  create MSnSet instances. Argument exprs is a
  matrix and fData and pData must be of class
  data.frame or "AnnotatedDataFrame" and all
  must meet the dimensions and name validity constrains.
Objects can also be created by calls of the form new("MSnSet",
  exprs, ...).  See also "ExpressionSet" for
  helpful information.  Expression data produced from other softwares
  can thus make use of this standardized data container to benefit
  R and Bioconductor packages. Proteomics expression data
  available as spreadsheets, as produced by third-party software such as
  Proteome Discoverer, MaxQuant, ... can be imported using the
  readMSnSet and readMSnSet2 functions.
Coercion methods are also available to transform MSnSet objects
  to IBSpectra, to data.frame and to/from
  ExpressionSet and SummarizedExperiment objects. In the
  latter case, the metadata available in the protocolData,
  experimentData are completely dropped, and only the logging
  information of the processingData slot is retained. All these
  metadata can be subsequently be added using the
  addMSnSetMetadata (see examples below). When converting a
  SummarizedExperiment to an MSnSet, the respective
  metadata slots will be populated if available in the
  SummarizedExperiment metadata.
In the frame of the MSnbase package, MSnSet instances
  can be generated from "MSnExp" experiments using
  the quantify method).
Slots
qual:Object of class
"data.frame"that records peaks data for each of the reporter ions to be used as quality metrics.processingData:Object of class
"MSnProcess"that records all processing.assayData:Object of class
"assayData"containing a matrix with equal with column number equal tonrow(phenoData).assayDatamust contain a matrixexprswith rows represening features (e.g., reporters ions) and columns representing samples. See the"AssayData"class,exprsandassayDataaccessor for more details. This slot in indirectly inherited from"eSet".phenoData:Object of class
"AnnotatedDataFrame"containing experimenter-supplied variables describing sample (i.e the individual tags for an labelled MS experiment) (indireclty inherited from"eSet"). SeephenoDataand the"eSet"class for more details. This slot can be accessed as adata.framewithpDataand be replaced by a new valid (i.e. of compatible dimensions and row names)data.framewithpData()<-.featureData:Object of class
"AnnotatedDataFrame"containing variables describing features (spectra in our case), e.g. identificaiton data, peptide sequence, identification score,... (inherited indirectly from"eSet"). SeefeatureDataand the"eSet"class for more details. This slot can be accessed as adata.framewithfDataand be replaced by a new valid (i.e. of compatible dimensions and row names)data.framewithfData()<-.experimentData:Object of class
"MIAPE", containing details of experimental methods (inherited from"eSet"). SeeexperimentDataand the"eSet"class for more details.annotation:not used here.
protocolData:Object of class
"AnnotatedDataFrame"containing equipment-generated variables (inherited indirectly from"eSet"). SeeprotocolDataand the"eSet"class for more details..__classVersion__:Object of class
"Versions"describing the versions of R, the Biobase package,"eSet","pSet"andMSnSetof the current instance. Intended for developer use and debugging (inherited indirectly from"eSet").
Extends
Class "eSet", directly.
  Class "VersionedBiobase", by class "eSet", distance 2.
  Class "Versioned", by class "eSet", distance 3.
Methods
MSnSet specific methods or over-riding it's super-class are described
  below. See also more "eSet" for
  inherited methods.
- acquisitionNum
 acquisitionNum(signature(object = "MSnSet")): Returns the a numeric vector with acquisition number of each spectrum. The vector names are the corresponding spectrum names. The information is extracted from the object'sfeatureDataslot.- fromFile
 fromFile(signature(object = "MSnSet")): get the index of the file (infileNames(object)) from which the raw spectra from which the corresponding feature were originally read. The relevant information is extracted from the object'sfeatureDataslot.Returns a numeric vector with names corresponding to the spectrum names.
- dim
 signature(x = "MSnSet"): Returns the dimensions of object's assay data, i.e the number of samples and the number of features.- fileNames
 signature(object = "MSnSet"): Access file names in theprocessingDataslot.- msInfo
 signature(object = "MSnSet"): Prints the MIAPE-MS meta-data stored in theexperimentDataslot.- processingData
 signature(object = "MSnSet"): Access theprocessingDataslot.- show
 signature(object = "MSnSet"): Displays object content as text.- qual
 signature(object = "MSnSet"): Access the reporter ion peaks description.- purityCorrect
 signature(object = "MSnSet", impurities = "matrix"): performs reporter ions purity correction. SeepurityCorrectdocumentation for more details.- normalise
 signature(object = "MSnSet"): PerformsMSnSetnormalisation. Seenormalisefor more details.- t
 signature(x = "MSnSet"): Returns a transposedMSnSetobject where features are now aligned along columns and samples along rows and thephenoDataandfeatureDataslots have been swapped. TheprotocolDataslot is always dropped.- as(,"ExpressionSet")
 signature(x = "MSnSet"): Coerce object fromMSnSettoExpressionSet-class. TheexperimentDataslot is converted to aMIAMEinstance. It is also possible to coerce anExpressionSetto andMSnSet, in which case theexperimentDataslot is newly initialised.- as(,"IBSpectra")
 signature(x = "MSnSet"): Coerce object fromMSnSettoIBSpectrafrom theisobarpackage.- as(,"data.frame")
 signature(x = "MSnSet"): Coerce object fromMSnSettodata.frame. TheMSnSetis transposed and thePhenoDataslot is appended.- as(,"SummarizedExperiment")
 signature(x = "MSnSet"): Coerce object fromMSnSettoSummarizedExperiment. Only part of the metadata is retained. SeeaddMSnSetMetadataand the example below for details.- write.exprs
 signature(x = "MSnSet"): Writes expression values to a tab-separated file (default istmp.txt). ThefDataColsparameter can be used to specify whichfeatureDatacolumns (as column names, column number orlogical) to append on the right of the expression matrix. The following arguments are the same aswrite.table.- combine
 signature(x = "MSnSet", y = "MSnSet", ...): Combines 2 or moreMSnSetinstances according to their feature names. Note that thequalslot and the processing information are silently dropped.- topN
 signature(object = "MSnSet", groupBy, n = 3, fun, ..., verbose = isMSnbaseVerbose()): Selects thenmost intense features (typically peptides or spectra) out of all available for each set defined bygroupBy(typically proteins) and creates a new instance of classMSnSet. If less thannfeatures are available, all are selected. Thencol(object)features are summerised usingfun(default issum) prior to be ordered in decreasing order. Additional parameters can be passed tofunthrough..., for instance to control the behaviour oftopNin case ofNAvalues. (Works also withmatrixinstances.)See also the
nQuantsfunction to retrieve the actual number of retained peptides out ofn.A complete use case using
topNandnQuantsis detailed in thesynapterpackage vignette.- filterNA
 signature(object = "MSnSet", pNA = "numeric", pattern = "character", droplevels = "logical"): This method subsetsobjectby removing features that have (strictly) more thanpNApercent of NA values. DefaultpNAis 0, which removes any feature that exhibits missing data. The method can also be used with a character pattern composed of0or1characters only. A0represent a column/sample that is allowed a missing values, while columns/samples with and1must not haveNAs.This method also accepts
matrixinstances.droplevelsdefines whether unused levels in the feature meta-data ought to be lost. Default isTRUE. See thedroplevelsmethod below.See also the
is.na.MSnSetandplotNAmethods for missing data exploration.- filterZero
 signature(object = "MSnSet", pNA = "numeric", pattern = "character", droplevels = "logical"): AsfilterNA, but for zeros.- filterMsLevel
 signature(object = "MSnSet", msLevel. = "numeric", fcol = "character")Keeps only spectra with levelmsLevel., as defined by thefcolfeature variable (default is"msLevel").- log
 signature(object = "MSnSet", base = "numeric")Log transformsexprs(object)usingbase::log.base(defaults ise='exp(1)') must be a positive or complex number, the base with respect to which logarithms are computed.- droplevels
 signature(x = "MSnSet", ...)Drops the unused factor levels in thefeatureDataslot. Seedroplevelsfor details.- impute
 signature(object = "MSnSet", ...)Performs data imputation on theMSnSetobject. Seeimputefor more details.- trimws
 signature(object = "MSnSet", ...)Trim leading and/or trailing white spaces in the feature data slot. Also available fordata.frameobjects. See?base::trimwsfor details.
Additional accessors for the experimental metadata
  (experimentData slot) are defined. See
  "MIAPE" for details.
Plotting
- meanSdPlot
 signature(object = "MSnSet")Plots row standard deviations versus row means. SeemeanSdPlot(vsnpackage) for more details.- image
 signature(x = "MSnSet", facetBy = "character", sOrderBy = "character", legend = "character", low = "character", high = "character", fnames = "logical", nmax = "numeric")Produces an heatmap of expression values in thexobject. Simple horizontal facetting is enabled by passing a single character asfacetBy. Arbitrary facetting can be performed manually by saving the return value of the method (see example below). Re-ordering of the samples is possible by providing the name of a phenotypic variable tosOrderBy. The title of the legend can be set withlegendand the colours with thelowandhigharguments. If any negative value is detected in the data, the values are considered as log fold-changes and a divergent colour scale is used. Otherwise, a gradient from low to high is used. To scale the quantitative data inxprior to plotting, please see thescalemethod.When there are more than
nmax(default is 50) features/rows, these are not printed. This behaviour can be controlled by settingfnamestoTRUE(always print) orFALSE(never print). See examples below.The code is based on Vlad Petyuk's
vp.misc::image_msnset. The previous version of this method is still available through theimage2function.- plotNA
 signature(object = "MSnSet", pNA = "numeric")Plots missing data for anMSnSetinstance.pNAis anumericof length 1 that specifies the percentage of accepted missing data values per features. This value will be highlighted with a point on the figure, illustrating the overall percentage of NA values in the full data set and the number of proteins retained. Default is 1/2. See alsoplotNA.- MAplot
 signature(object = "MSnSet", log.it = "logical", base = "numeric", ...)Produces MA plots (Ratio as a function of average intensity) for the samples inobject. Ifncol(object) == 2, then one MA plot is produced using thema.plotfunction from theaffypackage. Ifobjecthas more than 2 columns, thenmva.pairs.log.itspecifies is the data should be log-transformed (default isTRUE) usingbase. Further...arguments will be passed to the respective functions.- addIdentificationData
 signature(object = "MSnSet", ...): Adds identification data to aMSnSetinstance. SeeaddIdentificationDatadocumentation for more details and examples.- removeNoId
 signature(object = "MSnSet", fcol = "pepseq", keep = NULL): Removes non-identified features. SeeremoveNoIddocumentation for more details and examples.- removeMultipleAssignment
 signature(object = "MSnSet", fcol = "nprot"): Removes protein groups (or feature belong to protein groups) with more than one member. The latter is defined by extracting a feature variable (default is"nprot"). Also removes non-identified features.- idSummary
 signature(object = "MSnSet", ...): Prints a summary that lists the percentage of identified features per file (calledcoverage).
Functions
- updateFvarLabels
 signature(object, label, sep)This function updatesobject's featureData variable labels by appendinglabel. By default,labelis the variable name and the separatorsepis..- updateSampleNames
 signature(object, label, sep)This function updatesobject's sample names by appendinglabel. By default,labelis the variable name and the separatorsepis..- updateFeatureNames
 signature(object, label, sep)This function updatesobject's feature names by appendinglabel. By default,labelis the variable name and the separatorsepis..- ms2df
 signature(x, fcols)Coerces theMSnSetinstance to adata.frame. The direction of the data is retained and the feature variable labels that matchfcolare appended to the expression values. See alsoas(x, "data.frame")above.- addMSnSetMetadata
 signature(x, y)When coercing anMSnSetyto aSummarizedExperimentxwithx <- as(y, "SummarizedExperiment"), most ofy's metadata is lost. Only the file names, the processing log and the MSnbase version from theprocessingDataslots are passed along. TheaddMSnSetMetadatafunction can be used to add the completeprocessingData,experimentDataandprotocolDataslots. The downside of this is that MSnbase is now required to use theSummarizedExperimentobject.
See also
"eSet", "ExpressionSet" and
  quantify. MSnSet quantitation values and
  annotation can be exported to a file with
  write.exprs. See readMSnSet to
  create and MSnSet using data available in a spreadsheet or
  data.frame.
Examples
data(msnset)
msnset <- msnset[10:15]
exprs(msnset)[1, c(1, 4)] <- NA
exprs(msnset)[2, c(1, 2)] <- NA
is.na(msnset)
#>     iTRAQ4.114 iTRAQ4.115 iTRAQ4.116 iTRAQ4.117
#> X18       TRUE      FALSE      FALSE       TRUE
#> X19       TRUE       TRUE      FALSE      FALSE
#> X2       FALSE      FALSE      FALSE      FALSE
#> X20      FALSE      FALSE      FALSE      FALSE
#> X21      FALSE      FALSE      FALSE      FALSE
#> X22      FALSE      FALSE      FALSE      FALSE
featureNames(filterNA(msnset, pNA = 1/4))
#> [1] "X2"  "X20" "X21" "X22"
featureNames(filterNA(msnset, pattern = "0110"))
#> [1] "X18" "X2"  "X20" "X21" "X22"
M <- matrix(rnorm(12), 4)
pd <- data.frame(otherpdata = letters[1:3])
fd <- data.frame(otherfdata = letters[1:4])
x0 <- MSnSet(M, fd, pd)
sampleNames(x0)
#> [1] "1" "2" "3"
M <- matrix(rnorm(12), 4)
colnames(M) <- LETTERS[1:3]
rownames(M) <- paste0("id", LETTERS[1:4])
pd <- data.frame(otherpdata = letters[1:3])
rownames(pd) <- colnames(M)
fd <- data.frame(otherfdata = letters[1:4])
rownames(fd) <- rownames(M)
x <- MSnSet(M, fd, pd)
sampleNames(x)
#> [1] "A" "B" "C"
## Visualisation
library("pRolocdata")
data(dunkley2006)
image(dunkley2006)
## Changing colours
image(dunkley2006, high = "darkgreen")
image(dunkley2006, high = "darkgreen", low = "yellow")
## Forcing feature names
image(dunkley2006, fnames = TRUE)
## Facetting
image(dunkley2006, facetBy = "replicate")
p <- image(dunkley2006)
library("ggplot2") ## for facet_grid
p + facet_grid(replicate ~ membrane.prep, scales = 'free', space = 'free')
p + facet_grid(markers ~ replicate)
## Fold-changes
dd <- dunkley2006
exprs(dd) <- exprs(dd) - 0.25
image(dd)
image(dd, low = "green", high = "red")
## Feature names are displayed by default for smaller data
dunkley2006 <- dunkley2006[1:25, ]
image(dunkley2006)
image(dunkley2006, legend = "hello")
## Coercion
if (require("SummarizedExperiment")) {
    data(msnset)
    se <- as(msnset, "SummarizedExperiment")
    metadata(se) ## only logging
    se <- addMSnSetMetadata(se, msnset)
    metadata(se) ## all metadata
    msnset2 <- as(se, "MSnSet")
    processingData(msnset2)
}
#> Loading required package: SummarizedExperiment
#> Loading required package: MatrixGenerics
#> Loading required package: matrixStats
#> 
#> Attaching package: ‘matrixStats’
#> The following objects are masked from ‘package:Biobase’:
#> 
#>     anyMissing, rowMedians
#> 
#> Attaching package: ‘MatrixGenerics’
#> The following objects are masked from ‘package:matrixStats’:
#> 
#>     colAlls, colAnyNAs, colAnys, colAvgsPerRowSet, colCollapse,
#>     colCounts, colCummaxs, colCummins, colCumprods, colCumsums,
#>     colDiffs, colIQRDiffs, colIQRs, colLogSumExps, colMadDiffs,
#>     colMads, colMaxs, colMeans2, colMedians, colMins, colOrderStats,
#>     colProds, colQuantiles, colRanges, colRanks, colSdDiffs, colSds,
#>     colSums2, colTabulates, colVarDiffs, colVars, colWeightedMads,
#>     colWeightedMeans, colWeightedMedians, colWeightedSds,
#>     colWeightedVars, rowAlls, rowAnyNAs, rowAnys, rowAvgsPerColSet,
#>     rowCollapse, rowCounts, rowCummaxs, rowCummins, rowCumprods,
#>     rowCumsums, rowDiffs, rowIQRDiffs, rowIQRs, rowLogSumExps,
#>     rowMadDiffs, rowMads, rowMaxs, rowMeans2, rowMedians, rowMins,
#>     rowOrderStats, rowProds, rowQuantiles, rowRanges, rowRanks,
#>     rowSdDiffs, rowSds, rowSums2, rowTabulates, rowVarDiffs, rowVars,
#>     rowWeightedMads, rowWeightedMeans, rowWeightedMedians,
#>     rowWeightedSds, rowWeightedVars
#> The following object is masked from ‘package:Biobase’:
#> 
#>     rowMedians
#> Loading required package: GenomicRanges
#> Loading required package: IRanges
#> Loading required package: GenomeInfoDb
#> - - - Processing information - - -
#> Data loaded: Wed May 11 18:54:39 2011 
#> iTRAQ4 quantification by trapezoidation: Wed Apr  1 21:41:53 2015 
#> Subset [55,4][6,4] Tue Aug 26 11:37:36 2025 
#>  MSnbase version: 1.1.22 
as(msnset, "ExpressionSet")
#> ExpressionSet (storageMode: lockedEnvironment)
#> assayData: 6 features, 4 samples 
#>   element names: exprs 
#> protocolData: none
#> phenoData
#>   sampleNames: iTRAQ4.114 iTRAQ4.115 iTRAQ4.116 iTRAQ4.117
#>   varLabels: mz reporters
#>   varMetadata: labelDescription
#> featureData
#>   featureNames: X18 X19 ... X22 (6 total)
#>   fvarLabels: spectrum ProteinAccession ... collision.energy (15 total)
#>   fvarMetadata: labelDescription
#> experimentData: use 'experimentData(object)'
#> Annotation: No annotation