Abstract

In this course, we will use R/Bioconductor packages to explore, process, visualise and understand mass spectrometry-based proteomics data. We start with raw data and proceed with identification and quantitation data, discussing some of their peculiarities compared to sequencing data along the way. The workflow is aimed at beginner to intermediate levels: for example, seasoned R users who want to get started with mass spectrometry and proteomics, or proteomics practitioners who want to familiarise themselves with the R and Bioconductor infrastructure.
This material is available under a Creative Commons CC-BY license. You are free to share (copy and redistribute the material in any medium or format) and adapt (remix, transform, and build upon the material) for any purpose, even commercially.
If you (re-)use this material, please cite the following reference:
Gatto, Laurent. (2019, January). Bioconductor tools for mass spectrometry and proteomics. Zenodo. http://doi.org/10.5281/zenodo.2547971
Before we start:
If you identify typos, or if there are parts that you would like to see expanded or clarified, please let me know by telling me directly (during workshops), opening a GitHub issue or emailing me. Please also briefly specify your background/familiarity with mass spectrometry and/or proteomics (beginner, intermediate or expert) so that I can update the material accordingly.
In recent years, we have seen an increase in the number of R and Bioconductor packages to analyse mass spectrometry and proteomics data, as well as an increase in their total number of downloads. See the vignette Proteomics packages in Bioconductor for more details and the code underlying these figures.
It is also worth highlighting that several of these packages have become group efforts, supported by several developers in the community. This post illustrates the various contributions to MSnbase. mzR has benefited from a similarly wide range of successful contributions. Both packages, and in particular mzR, are used by many other packages, and will be described in some detail in this workflow.
This workflow illustrates the R/Bioconductor infrastructure for proteomics. The topics covered focus on support for open, community-driven formats for raw data and identification results, and on packages for peptide-spectrum matching, data processing and analysis.
Links to other packages and references are also documented. In particular, the vignettes included in the RforProteomics package also contain relevant material.
This workflow provides a general introduction to Bioconductor software for mass spectrometry and proteomics. If you are interested in more specific topics, the following vignettes may be useful:
the pRoloc tutorial, vignette("pRoloc-tutorial", package = "pRoloc"), also available online;
the MSnID vignette, vignette("msnid_vignette", package = "MSnID"), also available online. In addition, the vignettes of the msmsTest package describe how to analyse spectral counting data using packages dedicated to the analysis of high-throughput sequencing data;
the MALDIquant introduction, vignette("MALDIquant-intro", package = "MALDIquant"), also available online;
the Cardinal walkthrough, vignette("Cardinal-walkthrough", package = "Cardinal"), also available online.
The following packages will be used throughout this document. R version 3.5 or higher is required to install all the packages using BiocManager::install.
library("mzR")
library("mzID")
library("MSnID")
library("MSnbase")
library("rpx")
library("MLInterfaces")
library("pRoloc")
library("pRolocdata")
library("MSGFplus")
library("rols")
library("hpar")
library("ensembldb")
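Before loading these packages, it can help to check which of them are already available, so that only the missing ones need to be installed. The sketch below uses base R only; the helper missing_pkgs() and the vector workflow_pkgs are hypothetical names introduced for illustration, not functions from any of the packages above.

```r
## A minimal sketch, not part of any package: check the R version and
## report which of the workflow's packages are not yet installed.
## `missing_pkgs` and `workflow_pkgs` are hypothetical names.

## The workflow requires R 3.5 or higher.
if (getRversion() < "3.5.0")
    warning("R >= 3.5.0 is required to install all workflow packages")

## Return the subset of `pkgs` that cannot be found in the library paths.
## system.file(package = p) returns "" when package `p` is not installed.
missing_pkgs <- function(pkgs) {
    found <- vapply(pkgs,
                    function(p) nzchar(system.file(package = p)),
                    logical(1))
    pkgs[!found]
}

## The packages loaded in this workflow
workflow_pkgs <- c("mzR", "mzID", "MSnID", "MSnbase", "rpx",
                   "MLInterfaces", "pRoloc", "pRolocdata", "MSGFplus",
                   "rols", "hpar", "ensembldb")
missing_pkgs(workflow_pkgs)
```

Any package names returned can then be passed, as a character vector, to BiocManager::install().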
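The most convenient way to install most of the tutorial's requirements (and more related content) is to install RforProteomics with all its dependencies.
## Install BiocManager from CRAN if it is not already available
if (!requireNamespace("BiocManager", quietly = TRUE))
    install.packages("BiocManager")
## Install RforProteomics together with all its dependencies
BiocManager::install("RforProteomics", dependencies = TRUE)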
Other packages of interest, such as rTANDEM or MSGFgui will be described later in the document but are not required to execute the code in this workflow.
In Bioconductor version 3.6, there are 92 proteomics and 62 mass spectrometry software packages, and 17 mass spectrometry experiment packages. These packages can be listed with the proteomicsPackages(), massSpectrometryPackages() and massSpectrometryDataPackages() functions from RforProteomics and explored interactively, or browsed through the respective biocViews on the Bioconductor web page.
library("RforProteomics")
pp <- proteomicsPackages()
DT::datatable(pp)