Last update: Wed May 11 20:00:24 2016


This vignette is available under a Creative Commons CC-BY license. You are free to share (copy and redistribute the material in any medium or format) and adapt (remix, transform, and build upon the material) for any purpose, even commercially.


1 Introduction

This document provides annotated and reproducible quantitative proteomics data analysis examples for the Quantitative Proteomics And Data Analysis course (intro slides).

To be able to execute the code below, you will need to have a working R installation. I also recommend using the RStudio editor. To install the proteomics add-on packages required for this tutorial, you will need to run the following code:

## try http:// if https:// URLs are not supported
source("https://bioconductor.org/biocLite.R")
biocLite("RforProteomics", dependencies = TRUE)
biocLite("AnnotationHub")
biocLite("genefilter")
biocLite("gplots")
biocLite("qvalue")
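
On recent Bioconductor releases, biocLite has been superseded by the BiocManager package. The following is a minimal sketch of an equivalent installation, assuming a recent R version. The names pkgs, installed and to_install are introduced here for illustration only, and the install step is guarded so that it only runs in an interactive session.

```r
## Packages installed by the biocLite() calls above
pkgs <- c("RforProteomics", "AnnotationHub", "genefilter", "gplots", "qvalue")

## Check which of them can already be loaded
installed <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
to_install <- pkgs[!installed]

## Install the missing ones with BiocManager (interactive sessions only)
if (interactive() && length(to_install) > 0) {
    if (!requireNamespace("BiocManager", quietly = TRUE))
        install.packages("BiocManager")
    BiocManager::install(to_install, dependencies = TRUE)
}
```

Checking first avoids re-installing packages that are already available.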

For a more thorough introduction to R for proteomics, please read the RforProteomics vignette (online or off-line with vignette("RforProteomics") after installing as described above), the visualisation vignette and the corresponding papers [1, 2].

We first need to load the proteomics packages:

library("MSnbase")
library("rpx")
library("mzR")
library("RforProteomics")
library("pRoloc")
library("pRolocdata")
library("msmsTests")
library("AnnotationHub")
library("lattice")
library("gridExtra")
library("gplots")
library("genefilter")
library("qvalue")

2 Getting example data

AnnotationHub is a cloud resource set up and managed by the Bioconductor project that programmatically disseminates omics data. I am currently working on contributing proteomics data.

Below, we download a raw mass spectrometry dataset with identifier AH49008 and store it in a variable named ms.

ah <- AnnotationHub()
ms <- ah[["AH49008"]]
ms
## Mass Spectrometry file handle.
## Filename:  55314 
## Number of scans:  7534

The data contains 7534 spectra: 1431 MS1 spectra and 6103 MS2 spectra. The filename, 55314, is not very descriptive because the data originates from the AnnotationHub cloud repository. If the data were read from a local file, it would be named after the mzML (or mzXML) file.
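
The MS1/MS2 breakdown quoted above can be obtained by tabulating the msLevel column of the scan metadata. The sketch below uses a small made-up data frame (hd_toy, a hypothetical stand-in) so that it runs without downloading anything; with the real data, the same table() call would be applied to the data frame returned by mzR's header(ms).

```r
## Toy stand-in for the data frame returned by header(ms);
## only the msLevel column is needed here (values are made up)
hd_toy <- data.frame(seqNum = 1:6,
                     msLevel = c(1L, 2L, 2L, 1L, 2L, 2L))

## Number of spectra per MS level
lvl_counts <- table(hd_toy$msLevel)
lvl_counts

## With the real data, assuming ms was created as above:
## table(header(ms)$msLevel)
```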

Later, we will use data that is distributed directly with packages and access it using the data function. One can also use the rpx package to access and download data from the ProteomeXchange repository.

px1 <- PXDataset("PXD000001")
px1
## Object of class "PXDataset"
##  Id: PXD000001 with 10 files
##  [1] 'F063721.dat' ... [10] 'erwinia_carotovora.fasta'
##  Use 'pxfiles(.)' to see all files.
mzf <- pxget(px1, 6)
mzf
## [1] "TMT_Erwinia_1uLSike_Top10HCD_isol2_45stepped_60min_01-20141210.mzML"

Manual download:

f1 <- downloadData("http://proteome.sysbiol.cam.ac.uk/lgatto/files/Thermo-HELA-PRT/Thermo_Hela_PRTC_1.mzML")
f2 <- downloadData("http://proteome.sysbiol.cam.ac.uk/lgatto/files/Thermo-HELA-PRT/Thermo_Hela_PRTC_2.mzML")
f3 <- downloadData("http://proteome.sysbiol.cam.ac.uk/lgatto/files/Thermo-HELA-PRT/Thermo_Hela_PRTC_3.mzML")
f3
## [1] "./Thermo_Hela_PRTC_3.mzML"

3 Visualising raw data

3.1 A full chromatogram

chromatogram(ms)

plot of chunk chromato

3.2 Multiple chromatograms

c1 <- chromatogram(f1)
c2 <- chromatogram(f2, plot = FALSE)
lines(c2, col = "steelblue", lty = "dashed")
c3 <- chromatogram(f3, plot = FALSE)
lines(c3, col = "orange", lty = "dotted")

plot of chunk chromato3

3.3 An extracted ion chromatogram

par(mfrow = c(1, 2))
xic(ms, mz = 636.925, width = 0.01)
x <- xic(ms, mz = 636.925, width = 0.01, rtlim = c(2120, 2200))

plot of chunk xic
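
To make explicit what an extracted ion chromatogram represents, here is a minimal base R sketch: for each MS1 spectrum, the intensities of the peaks falling within a window around the target M/Z are summed and reported against retention time. This is not the implementation of the xic function used above. The function name xic_sketch, the toy spectra and the interpretation of width as a half-width are all assumptions made for illustration.

```r
## Toy MS1 spectra: each is a two-column matrix (M/Z, intensity), the same
## shape as what peaks(ms, i) returns. All values are made up.
spectra <- list(
    cbind(mz = c(500.10, 636.92, 700.30), intensity = c(10, 200, 5)),
    cbind(mz = c(500.12, 636.93, 636.98), intensity = c(12, 850, 40)),
    cbind(mz = c(636.925, 800.00),        intensity = c(300, 7))
)
## Retention time (in seconds) of each toy spectrum
rt <- c(2150.2, 2151.4, 2152.6)

## Sum, per spectrum, the intensities of peaks within mz +/- width
xic_sketch <- function(spectra, rt, mz, width) {
    in_window <- function(p) sum(p[abs(p[, 1] - mz) <= width, 2])
    data.frame(rt = rt,
               intensity = vapply(spectra, in_window, numeric(1)))
}

x_toy <- xic_sketch(spectra, rt, mz = 636.925, width = 0.01)
x_toy
```

Plotting x_toy$intensity against x_toy$rt gives the trace of that single ion over time, which is what distinguishes an extracted ion chromatogram from a chromatogram built from all ions.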

3.4 Spectra

We first load a test iTRAQ dataset called itraqdata, distributed as part of the MSnbase package, using the data function. This pre-packaged dataset comes as a dedicated data structure called MSnExp. We then plot the 10th spectrum using specific code that recognizes what to do with an element of an MSnExp.

data(itraqdata)
itraqdata
## Object of class "MSnExp"
##  Object size in memory: 1.88 Mb
## - - - Spectra data - - -
##  MS level(s): 2 
##  Number of MS1 acquisitions: 1 
##  Number of MSn scans: 55 
##  Number of precursor ions: 55 
##  55 unique MZs
##  Precursor MZ's: 401.74 - 1236.1 
##  MSn M/Z range: 100 2069.27 
##  MSn retention times: 19:9 - 50:18 minutes
## - - - Processing information - - -
## Data loaded: Wed May 11 18:54:39 2011 
##  MSnbase version: 1.1.22 
## - - - Meta data  - - -
## phenoData
##   rowNames: 1
##   varLabels: sampleNames sampleNumbers
##   varMetadata: labelDescription
## Loaded from:
##   dummyiTRAQ.mzXML 
## protocolData: none
## featureData
##   featureNames: X1 X10 ... X9 (55 total)
##   fvarLabels: spectrum ProteinAccession ProteinDescription
##     PeptideSequence
##   fvarMetadata: labelDescription
## experimentData: use 'experimentData(object)'
plot(itraqdata[[10]], reporters = iTRAQ4, full=TRUE) 

plot of chunk itraqdata
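
Because each element of an MSnExp is a structured spectrum object rather than a bare matrix of peaks, its content can be queried with accessor functions. The following sketch, which assumes MSnbase is installed, extracts a few properties of the 10th spectrum; the variable name sp is introduced here for illustration.

```r
library("MSnbase")
data(itraqdata)

## The 10th spectrum, an MS2 spectrum
sp <- itraqdata[[10]]

msLevel(sp)      ## MS level of the spectrum
precursorMz(sp)  ## M/Z of the precursor ion selected for fragmentation
rtime(sp)        ## retention time, in seconds
peaksCount(sp)   ## number of peaks in the spectrum

## The raw peak data: M/Z values and matching intensities
head(mz(sp))
head(intensity(sp))
```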

The ms data is not pre-packaged as an MSnExp object. It is a more bare-bones mzRramp object, a pointer to a raw data file (here 55314): we first need to extract a spectrum of interest (here the 3071st spectrum, an MS1 spectrum) and then use the generic plot function to visualise it.

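## hd holds the per-scan metadata (one row per spectrum), including retention times
hd <- header(ms)
plot(peaks(ms, 3071), type = "h",
     xlab = "M/Z", ylab = "Intensity",
     sub = formatRt(hd[3071, "retentionTime"]))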

plot of chunk ms1

Below, we use data downloaded from ProteomeXchange (see above) to generate additional raw data visualisations. These examples are taken from the RforProteomics visualisation vignette. The code, which is not displayed here, can also be seen in the source document.

The importance of flexible access to specialised data becomes visible in the figure below (taken from the RforProteomics visualisation vignette). Not only can we access specific data and understand/visualise them, but we can also traverse all the data and extract/visualise/understand structured slices of it.

The upper panel represents the chromatogram of the TMT_Erwinia_1uLSike_Top10HCD_isol2_45stepped_60min_01-20141210.mzML raw data file. We concentrate on a specific retention time, 30:1 minutes (1800.6836 seconds), corresponding to the 2807th MS1 spectrum, shown on the second row of figures. On the right, we zoom in on the isotopic envelope of one peptide in particular. The vertical lines (red and grey) represent all the ions that were selected for a second round of MS analysis; these are represented in the bottom part of the figure.

plot of chunk mslayout
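
The retention time above is given both in seconds and as a minutes:seconds label. The conversion can be sketched in base R as follows. The helper name rt_to_min_sec is hypothetical and is not the formatRt function used earlier; it simply reproduces the unpadded minutes:seconds style seen in this document.

```r
## Convert a retention time in seconds to an unpadded "minutes:seconds" label
rt_to_min_sec <- function(rt) {
    stopifnot(is.numeric(rt))
    total <- round(rt)          ## round to whole seconds first
    paste0(total %/% 60, ":", total %% 60)
}

rt_to_min_sec(1800.6836)
## [1] "30:1"
```

Rounding to whole seconds before splitting into minutes and seconds avoids labels with 60 in the seconds field.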

Below, we illustrate some additional visualisations and animations of raw MS data, also taken from the RforProteomics visualisation vignette. On the left, we have a heatmap-like visualisation of an MS map and a 3-dimensional representation of the same data. On the right, 2 MS1 spectra in blue and the set of 10 interleaved MS2 spectra.


plot of chunk msmap1
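
The MS2 spectra interleaved between two MS1 spectra can be identified from the scan metadata. The sketch below assumes that MS2 scans are acquired immediately after the MS1 scan they derive from, and uses a made-up data frame (hd_toy) in place of the one returned by header on the raw data. The variable names are illustrative. When the precursorScanNum column of the real header is populated, it provides a more direct link between an MS2 scan and its parent MS1 scan.

```r
## Toy stand-in for the scan metadata: made-up scans, two full acquisition cycles
hd_toy <- data.frame(
    acquisitionNum = 1:8,
    msLevel = c(1L, 2L, 2L, 2L, 1L, 2L, 2L, 1L)
)

## Each MS1 scan opens a new cycle; cumsum() numbers the cycles
cycle <- cumsum(hd_toy$msLevel == 1L)

## MS2 scans grouped by the cycle (that is, the MS1 scan) they follow
is_ms2 <- hd_toy$msLevel == 2L
ms2_by_cycle <- split(hd_toy$acquisitionNum[is_ms2], cycle[is_ms2])
ms2_by_cycle
```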

Below, we have animations built from extracting successive slices as above.