Mass spectrometry (MS) is a technology that separates charged molecules (ions) based on their mass to charge ratio (M/Z). It is often coupled to chromatography (liquid LC, but can also be gas-based GC). The time an analytes takes to elute from the chromatography column is the retention time.
An mass spectrometer is composed of three components:
When using mass spectrometry for proteomics, the proteins are first digested with a protease such as trypsin. In mass shotgun proteomics, the analytes assayed in the mass spectrometer are peptides.
Often, ions are subjected to more than a single MS round. After a first round of separation, the peaks in the spectra, called MS1 spectra, represent peptides. At this stage, the only information we possess about these peptides are their retention time and their mass-to-charge (we can also infer their charge be inspecting their isotopic envelope, i.e the peaks of the individual isotopes, see below), which is not enough to infer their identify (i.e. their sequence).
In MSMS (or MS2), the settings of the mass spectrometer are set automatically to select a certain number of MS1 peaks (for example 20)1 Here, we will focus in data dependent acquisition (DDA), where MS1 peaks a selected. In data independent acquisition (DIA), all peaks in the MS1 spectrum are fragmented.. Once a narrow M/Z range has been selected (corresponding to one high-intensity peak, a peptide, and some background noise), it is fragmented (using for example collision-induced dissociation (CID), higher energy collisional dissociation (HCD) or electron-transfer dissociation (ETD)). The fragment ions are then themselves separated in the analyser to produce a MS2 spectrum. The unique fragment ion pattern can then be used to infer the peptide sequence using de novo sequencing (when the spectrum is of high enough quality) of using a search engine such as, for example Mascot, MSGF+, …, that will match the observed, experimental spectrum to theoratical spectra (see details below).
The animation below show how 25 ions different ions (i.e. having different M/Z values) are separated throughout the MS analysis and are eventually detected (i.e. quantified). The final frame shows the hypothetical spectrum.
The figures below illustrate the two rounds of MS. The spectrum on the left is an MS1 spectrum acquired after 21 minutes and 3 seconds of elution. 10 peaks, highlited by dotted vertical lines, were selected for MS2 analysis. The peak at M/Z 460.79 (488.8) is highlighted by a red (orange) vertical line on the MS1 spectrum and the fragment spectra are shown on the MS2 spectrum on the top (bottom) right figure.
The figures below represent the 3 dimensions of MS data: a set of spectra (M/Z and intensity) of retention time, as well as the interleaved nature of MS1 and MS2 (and there could be more levels) data.
MS-based proteomics data is disseminated through the ProteomeXchange infrastructure, which centrally coordinates submission, storage and dissemination through multiple data repositories, such as the PRoteomics IDEntifications (PRIDE) database at the EBI for mass spectrometry-based experiments (including quantitative data, as opposed as the name suggests), PASSEL at the ISB for Selected Reaction Monitoring (SRM, i.e. targeted) data and the MassIVE resource. These data can be downloaded within R using the rpx package.
## 15 new ProteomeXchange annoucements
## Data.Set Publication.Data Message
## 1 PXD024245 <NA> New
## 2 PXD022662 <NA> New
## 3 PXD018575 <NA> New
## 4 PXD018912 <NA> New
## 5 PXD022915 <NA> New
## 6 PXD019603 <NA> New
## 7 PXD023536 <NA> New
## 8 PXD023537 <NA> New
## 9 PXD023983 <NA> New
## 10 PXD022144 <NA> New
## 11 PXD016427 <NA> New
## 12 PXD024022 <NA> New
## 13 PXD021868 <NA> New
## 14 PXD022997 <NA> New
## 15 PXD017256 <NA> New
Using the unique PXD000001
identifier, we can retrieve the relevant
metadata that will be stored in a PXDataset
object. The names of the
files available in this data can be retrieved with the pxfiles
accessor function.
## Object of class "PXDataset"
## Id: PXD000001 with 11 files
## [1] 'F063721.dat' ... [11] 'erwinia_carotovora.fasta'
## Use 'pxfiles(.)' to see all files.
## [1] "F063721.dat"
## [2] "F063721.dat-mztab.txt"
## [3] "PRIDE_Exp_Complete_Ac_22134.xml.gz"
## [4] "PRIDE_Exp_mzData_Ac_22134.xml.gz"
## [5] "PXD000001_mztab.txt"
## [6] "README.txt"
## [7] "TMT_Erwinia_1uLSike_Top10HCD_isol2_45stepped_60min_01-20141210.mzML"
## [8] "TMT_Erwinia_1uLSike_Top10HCD_isol2_45stepped_60min_01-20141210.mzXML"
## [9] "TMT_Erwinia_1uLSike_Top10HCD_isol2_45stepped_60min_01.mzXML"
## [10] "TMT_Erwinia_1uLSike_Top10HCD_isol2_45stepped_60min_01.raw"
## [11] "erwinia_carotovora.fasta"
Other metadata for the px
data set:
## [1] "Erwinia carotovora"
## [1] "ftp://ftp.pride.ebi.ac.uk/pride/data/archive/2012/03/PXD000001"
## [1] "Gatto L, Christoforou A. Using R and Bioconductor for proteomics data analysis. Biochim Biophys Acta. 2013 May 18. doi:pii: S1570-9639(13)00186-6. 10.1016/j.bbapap.2013.04.032"
Data files can then be downloaded with the pxget
function. Below, we
retrieve the raw data file. The file is
downloaded2 If the file is already available, it is not downloaded a second time.
in the working directory and the name of the file is return by the
function and stored in the mzf
variable for later use.3 This and other files are also availabel in the msdata
package, described below
## Loading TMT_Erwinia_1uLSike_Top10HCD_isol2_45stepped_60min_01-20141210.mzML from cache.
## [1] "~/.cache/rpx/10c1f4e4c6980_TMT_Erwinia_1uLSike_Top10HCD_isol2_45stepped_60min_01-20141210.mzML"
Some data are also distributed through dedicated packages. The msdata, for example, provides some general raw data files relevant for both proteomics and metabolomics.
## [1] "MRM-standmix-5.mzML.gz"
## [2] "MS3TMT10_01022016_32917-33481.mzML.gz"
## [3] "MS3TMT11.mzML"
## [4] "TMT_Erwinia_1uLSike_Top10HCD_isol2_45stepped_60min_01-20141210.mzML.gz"
## [5] "TMT_Erwinia_1uLSike_Top10HCD_isol2_45stepped_60min_01.mzML.gz"
## [1] "TMT_Erwinia_1uLSike_Top10HCD_isol2_45stepped_60min_01-20141210.mzid"
## [1] "cptac_a_b_peptides.txt"
More often, such experiment packages distribute processed data; an example of such is the pRolocdata package, that offers quantitative proteomics data.
Page built: 2021-03-17