Chapter 1 Preamble

1.1 About this course

This course will introduce participants to the analysis and exploration of mass spectrometry (MS) based proteomics data using R and Bioconductor. The course will cover all levels of MS data, from raw data to identification and quantitation data, up to the statistical interpretation of a typical shotgun MS experiment and will focus on hands-on tutorials. At the end of this course, the participants will be able to manipulate MS data in R and use existing packages for their exploratory and statistical proteomics data analysis.

Targeted audience and assumed background

The course is aimed at proteomics practitioners and data analysts/bioinformaticians who would like to learn how to use R and Bioconductor to analyse proteomics data. Familiarity with MS or proteomics in general is desirable, but not essential, as we will walk through and describe typical MS data as part of learning about the tools. Participants need to have a working knowledge of R (R syntax, commonly used functions, basic data structures such as data frames, vectors, matrices, … and their manipulation). Familiarity with other Bioconductor omics data classes and the tidyverse syntax is useful, but not required.
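As a rough self-check of the assumed R background, the following minimal sketch uses only base R. The object names and values are made up for illustration and are not taken from the course material.

```r
# A numeric vector of (made-up) m/z values
mz <- c(100.1, 200.2, 300.3)

# A matrix with 3 rows (features) and 2 columns (samples)
m <- matrix(1:6, nrow = 3,
            dimnames = list(paste0("f", 1:3), c("s1", "s2")))

# A data frame combining the m/z values with (made-up) intensities
df <- data.frame(mz = mz, intensity = c(10, 250, 40))

# Subsetting: keep the rows with an intensity above 20
high <- df[df$intensity > 20, ]

# Summarising: mean of each matrix column
cm <- colMeans(m)
```

Participants who can read this code and predict what `high` and `cm` contain should be comfortable with the level of R used in the course.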

Program

In the first part of this course, we will focus on raw MS data, including how mass spectrometry works, what raw MS data look like, MS data formats, and how to extract, manipulate and visualise raw data.
The second part will focus on identification data, how to combine them with raw data, quantitation of MS data, and will introduce the data structures used for quantitative proteomics data.
The last part will focus on quantitative proteomics, including data structures, data processing, visualisation, and statistical analysis to identify differentially expressed proteins between two groups.
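To give a flavour of what identifying differentially expressed proteins between two groups can look like, here is a minimal base R sketch. It assumes a tiny, made-up matrix of log2 intensities and a plain per-protein Welch t-test with Benjamini-Hochberg adjustment. This is only an illustration of the idea, not necessarily the method or the packages used later in the course.

```r
# Toy log2 intensities (made up): 3 proteins (rows) x 6 samples (columns)
x <- rbind(
    protA = c(10.1, 10.3,  9.9, 12.0, 12.2, 11.9),
    protB = c( 8.0,  8.2,  7.9,  8.1,  7.8,  8.3),
    protC = c(15.2, 15.0, 15.1, 13.1, 13.0, 13.3)
)

# The first three samples are controls, the last three are treated
group <- factor(rep(c("ctrl", "treated"), each = 3))

# One Welch t-test per protein, comparing the two groups
pvals <- apply(x, 1, function(row) t.test(row ~ group)$p.value)

# Adjust the p-values for multiple testing (Benjamini-Hochberg)
adj <- p.adjust(pvals, method = "BH")
```

In this toy example, `protA` and `protC` differ clearly between the groups while `protB` does not, which the adjusted p-values in `adj` should reflect.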

The material for this course is compiled from various documents, including the bioc-ms-prot and CSAMA labs. See also a previous iteration of this course.

1.2 The R for Mass Spectrometry initiative

The R for Mass Spectrometry initiative is a relatively recent project. Its aim is to provide efficient, thoroughly documented, tested and flexible R software for the analysis and interpretation of high throughput mass spectrometry assays, including proteomics and metabolomics experiments. The project formalises the longtime collaborative development efforts of its core members under the R for Mass Spectrometry organisation to facilitate dissemination and accessibility of their work.

We will be making use of several of these packages in this course.

1.3 Setup

Participants should set up R and RStudio and be familiar with R basics.

Familiarity with Bioconductor is useful, but not necessary. We will be learning about different types of objects related to mass spectrometry and proteomics throughout the course. No experience in object-oriented programming is necessary.

Package installation instructions:

if (!requireNamespace("BiocManager", quietly = TRUE))
    install.packages("BiocManager")

BiocManager::install(version = "3.12")

## Install the required packages
BiocManager::install("msdata")
BiocManager::install("mzR")
BiocManager::install("lgatto/ProtGenerics")
BiocManager::install("RforMassSpectrometry/MsCoreUtils")
BiocManager::install("RforMassSpectrometry/QFeatures")
BiocManager::install("RforMassSpectrometry/PSM")
BiocManager::install("RforMassSpectrometry/Spectra")
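
Once the installation has finished, it can be useful to check that every package is available. The following is a minimal sketch using base R's `requireNamespace()`. The variable names `pkgs`, `installed` and `missing_pkgs` are illustrative, and the package names are assumed to match the repository names in the installation commands above.

```r
# Packages installed by the instructions above
pkgs <- c("msdata", "mzR", "ProtGenerics", "MsCoreUtils",
          "QFeatures", "PSM", "Spectra")

# requireNamespace() returns TRUE if a package can be loaded, FALSE otherwise
installed <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)

# Report any package that still needs to be installed
missing_pkgs <- pkgs[!installed]
if (length(missing_pkgs) > 0) {
    message("Missing packages: ", paste(missing_pkgs, collapse = ", "))
} else {
    message("All required packages are installed.")
}
```

If any package is reported as missing, re-running the corresponding `BiocManager::install()` line is a reasonable first step.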

1.4 License

This material is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License. You are free to share (copy and redistribute the material in any medium or format) and adapt (remix, transform, and build upon the material) for any purpose, even commercially, as long as you give appropriate credit and distribute your contributions under the same license as the original.

Laurent Gatto, R/Bioconductor for Mass Spectrometry and Proteomics, DOI:10.5281/zenodo.4604531 2021.

Page built: 2021-03-17