Short bio

I am a Senior Research Associate in the Department of Biochemistry at the University of Cambridge. I am an avid open research advocate and make every possible effort to make my research reproducible and openly available. I am a Software Sustainability Institute fellow, a Data and Software Carpentry instructor, an affiliated member of the Bioconductor project and a founding member of OpenConCam, our local OpenCon group. My current open researcher activities focus on the Wellcome Trust Open Research Project, where we explore the barriers to open research, and the Bullied Into Bad Science campaign, an initiative by and for early career researchers who aim for a fairer, more open and ethical research and publication environment. Since 2017, I have also been part of the eLife Early-career advisory group.

I moved to Cambridge, UK, in January 2010 to work in the Cambridge Centre for Proteomics on various aspects of quantitative and spatial proteomics, developing new methods and implementing computational tools with a strong emphasis on rigorous and reproducible data analysis. I am also a visiting scientist in the PRIDE team at the European Bioinformatics Institute, and a member of the affiliate teaching staff at the Cambridge Computational Biology Institute. I am currently a PI in the Cambridge Systems Biology Centre where I lead the Computational Proteomics Unit.

Research

As pointed out by D. Donoho, “An article about computational science in a scientific publication is not the scholarship itself, it is merely advertising of the scholarship. The actual scholarship is the complete software development environment and the complete set of instructions which generated the figures.” This directly applies to high throughput biology data analysis, and I strongly believe that being able to reproduce the complete set of results, replicate an analysis with new data and track the evolution of the work that led to the scientific novelty are essential aspects of the process of doing research. Hence, I regard the development of scientific software, as well as agile and robust analysis methodologies that facilitate reproducible research, as an important aspect of my scientific activity.

Clarity and traceability of the data and the analysis methodology enable us to better understand what we do, how and why we do it and consequently exploit the data and comprehend the biology. While not sufficient, these are nevertheless necessary requirements for effective data-driven science.

The collaborative and interdisciplinary nature of much of the research in biology calls for open approaches, influenced by the open source and free/libre software movements, from communication between stakeholders, open research and development to open dissemination of all research outputs.

Proteomics

My work on the design and implementation of reproducible mass spectrometry-based proteomics data analysis pipelines has materialised in the development of the MSnbase (Gatto et al., 2012) package to manipulate, process and analyse quantitative proteomics data. The MSnbase infrastructure also supports the work on statistical learning applied to spatial proteomics (see below). The synapter package and the associated publications (Bond et al., 2013 and Shliaha et al., 2013) address MSE label-free quantitation, optionally including ion mobility separation.

Spatial proteomics

In biology, localisation is function: knowledge of the localisation of proteins is of paramount importance to assess and study their function, and spatial proteomics is the systematic study of the sub-cellular localisation of proteins and changes thereof (Gatto et al., 2010). Since 2010, I have developed novel software and machine learning approaches enabling more reliable and systematic inference of protein localisations using quantitative proteomics. This work has materialised in the pRoloc package (Gatto et al., 2014) that implements various established classification algorithms, effective visualisation techniques (Gatto et al., 2015) as well as novelty detection (Breckels et al., 2013) and transfer learning, harvesting GO annotations of microscopy-based methods to improve the spatial resolution of experimental spatial proteomics data (Breckels et al., 2016).

Past research

During my MSc and PhD work, I studied micro-evolutionary genetic patterns of the Broom leaf beetle Gonioctena variabilis in Southern Europe (Gatto et al., 2008), the application of short interspersed mobile elements (SINEs) to study the evolution of cetaceans, and the applicability of the General Time Reversible nucleotide substitution model in the light of differential lineage sorting (Gatto et al., 2006). I also spent 3 years in industry working on genomics and transcriptomics data, in particular the microarray quality control (Shi et al., 2010).

Teaching

Over the years I have been involved in many teaching activities, including beginner and advanced R courses, genome biology, proteomics bioinformatics, integrative omics, scientific computing as part of the MPhil in Computational Biology in Cambridge, as well as several Software and Data Carpentry bootcamps. All my teaching material is available in the TeachingMaterial repository.

Please do get in touch if you are interested in running workshops.

Publications

See also my Google Scholar profile.

Journal articles

Thul PJ, et al. A subcellular map of the human proteome. Science. 2017 May 11. pii: eaal3321. doi:10.1126/science.aal3321. [Epub ahead of print] PubMed PMID:28495876.

Mulvey CM, Breckels LM, Geladaki A, Kocevar Britovsek N, Nightingale DJH, Christoforou A, Elzek M, Deery MJ, Gatto L, Lilley KS. Using HyperLOPIT to perform high-resolution mapping of the spatial proteome. Nature Protocols, 12, 1110–1135 (2017) doi:10.1038/nprot.2017.026 (See the F1000Research workflow for details on the computational side of the protocol.)

Leprevost FD, et al. BioContainers: An open-source and community-driven framework for software standardization. Bioinformatics. 2017 Mar 30. doi:10.1093/bioinformatics/btx192. [Epub ahead of print] PubMed PMID:28379341.

Breckels LM, Mulvey CM, Lilley KS and Gatto L. A Bioconductor workflow for processing and analysing spatial proteomics data F1000Research 2016, 5:2926 (doi:10.12688/f1000research.10411.1). [Software: MSnbase, pRoloc, pRolocGUI]

Wieczorek S, Combes F, Lazar C, Giai Gianetto Q, Gatto L, Dorffer A, Hesse A, Coute Y, Ferro M, Bruley C, and Burger T. DAPAR & ProStaR: software to perform statistical analyses in quantitative discovery proteomics Bioinformatics 2016, doi:10.1093/bioinformatics/btw580.

Perez-Riverol Y, Gatto L, Wang R, Sachsenberg T, Uszkoreit J, Leprevost Fda V, Fufezan C, Ternent T, Eglen SJ, Katz DS, Pollard TJ, Konovalov A, Flight RM, Blin K, Vizcaino JA. Ten Simple Rules for Taking Advantage of Git and GitHub. PLoS Comput Biol. 2016 Jul 14;12(7):e1004947. doi:10.1371/journal.pcbi.1004947 PMID:27415786.

Breckels LM, Holden S, Wojnar D, Mulvey CM, Christoforou A, Groen AJ, Kohlbacher O, Lilley KS, Gatto L. Learning from heterogeneous data sources: an application in spatial proteomics. PLoS Comput Biol. 2016 May 13;12(5):e1004920 (doi:10.1371/journal.pcbi.1004920, Software)

Fabre B, Korona D, Groen A, Vowinckel J, Gatto L, Deery MJ, Ralser M, Russell S, Lilley KS. Analysis of the Drosophila melanogaster proteome dynamics during the embryo early development by a combination of label-free proteomics approaches, Proteomics, 2016 (PMID:27029218, Publisher)

Lazar C, Gatto L, Ferro M, Bruley C, Burger T. Accounting for the Multiple Natures of Missing Values in Label-Free Quantitative Proteomics Data Sets to Compare Imputation Strategies. J Proteome Res. 2016 Apr 1;15(4):1116-25. (Publisher, PMID:26906401, Software: CRAN and Bioconductor)

Christoforou A, Mulvey CM, Breckels LM, Geladaki A, Hurrell T, Hayward PC, Naake T, Gatto L, Viner R, Arias AM, Lilley KS. A draft map of the mouse pluripotent stem cell spatial proteome. Nat Commun. 2016 Jan 12;7:9992 doi:10.1038/ncomms9992 (PMID:26754106, data, PRIDE, resource)

Gatto L, Hansen KD, Hoopmann MR, Hermjakob H, Kohlbacher O and Beyer A. Testing and validation of computational methods for mass spectrometry. J Proteome Res. 2015. doi: 10.1002/stem.2067 (PubMed).

Mulvey CM, Schröter C, Gatto L, Dikicioglu D, Baris Fidaner I, Christoforou A, Deery MJ, Cho LT, Niakan KK, Martinez-Arias A, Lilley KS. Dynamic proteomic profiling of extra-embryonic endoderm differentiation in mouse embryonic stem cells. Stem Cells. 2015 Jun 8. doi: 10.1002/stem.2067 (PubMed).

Gatto L, Breckels LM, Naake T and Gibb S Visualisation of proteomics data using R and Bioconductor. Proteomics. 2015 Feb 18. doi:10.1002/pmic.201400392 (PubMed, Publisher and software: Bioconductor, github).

Huber W et al. Orchestrating high-throughput genomic analysis with Bioconductor. Nat Methods. 2015 Jan 29;12(2):115-21 (PubMed, Publisher).

Hiemstra TF et al. Human urinary exosomes as innate immune effectors, J Am Soc Nephrol. 2014 Sep;25(9):2017-27. (PubMed, Publisher).

Nikolovski N, Shliaha PV, Gatto L, Dupree P and Lilley KS Label free protein quantification for plant Golgi protein localisation and abundance, Plant Physiol. pp.114.245589; First Published on August 13, 2014; doi:10.1104/pp.114.245589 (Publisher, PubMed)

Griss J, et al. The mzTab Data Exchange Format: communicating MS-based proteomics and metabolomics experimental results to a wider audience, Mol Cell Proteomics. 2014 June 30. (Publisher)

Tomizioli M, et al. Deciphering thylakoid sub-compartments using a mass spectrometry-based approach, Mol Cell Proteomics. 2014 May 28. (Publisher, PubMed)

Gatto L, et al. A foundation for reliable spatial proteomics data analysis, Mol Cell Proteomics. 2014 Aug;13(8):1937-52. (Publisher, PubMed, software, press coverage)

Walzer M, et al. qcML: an exchange format for quality control metrics from mass spectrometry experiments, Mol Cell Proteomics. 2014 Apr 23. (PubMed).

Vizcaíno J.A. et al. ProteomeXchange: globally co-ordinated proteomics data submission and dissemination, Nature Biotechnology 2014, 32, 223–226. (PubMed)

Gatto L., Breckels L.M, Burger T, Wieczorek S. and Lilley K.S. Mass-spectrometry based spatial proteomics data analysis using pRoloc and pRolocdata, Bioinformatics, 2014 (software, PubMed, publisher, software and data).

Groen A., Sancho-Andrés G., Breckels LM., Gatto L., Aniento F. and Lilley K.S. Identification of Trans Golgi Network proteins in Arabidopsis thaliana root tissue Journal of Proteome Research, 2013 (PubMed, publisher).

Wilf N.M. et al. RNA-seq reveals the RNA binding proteins, Hfq and RsmA, play various roles in virulence, antibiotic production and genomic flux in Serratia sp. ATCC 39006 BMC Genomics 2013, 14:822.

Gatto L. and Christoforou A. Using R and Bioconductor for proteomics data analysis, Biochim Biophys Acta - Proteins and Proteomics, 2013. (PubMed, pre-print and software: Bioconductor, github).

Bond N.J., Shliaha P.V, Lilley K.S., and Gatto L. Improving qualitative and quantitative performance for MSE-based label free proteomics, J. Proteome Res., 2013 (PubMed, publisher, software).

Shliaha P.V, Bond N.J., Gatto L. and Lilley K.S. The Effects of Travelling Wave Ion Mobility Separation on Data Independent Acquisition in Proteomics Studies, J. Proteome Res., 2013 (PubMed, publisher, software).

Breckels L.M., Gatto L., Christoforou A., Groen A.J., Lilley K.S. and Trotter M.W.B. The Effect of Organelle Discovery upon Sub-Cellular Protein Localisation, Journal of Proteomics, 2013 (PubMed, software).

Chambers M. et al. A Cross-platform Toolkit for Mass Spectrometry and Proteomics, Nature Biotechnology 30, 918–920, 2012 (PubMed, pdf, software [1|2]).

Gatto L. and Lilley K.S. MSnbase - an R/Bioconductor package for isobaric tagged mass spectrometry data visualisation, processing and quantitation, Bioinformatics, 28(2), 288-289, 2012 (PubMed, pdf, software).

Capuano F., Bond N.J., Gatto L., Beaudoin F., Napier J., Benvenuto E., Lilley K.S, and Baschieri S. LC-MS/MS methods for absolute quantification and identification of proteins associated to chimeric plant oil bodies, Analytical Chemistry, Dec 15;83(24):9267-72, 2011 (PubMed - data).

Foster J.M., Degroeve S., Gatto L., Visser, M., Wang, R., Griss J., Apweiler R. and Martens L. A posteriori quality control for the curation and reuse of public proteomics data, Proteomics 11(11):2182-94, 2011 (PubMed, pdf).

Lilley K.S., Deery M.J. and Gatto L. Challenges for Proteomics Core Facilities, Proteomics 11: 1017–1025, 2011 (PubMed, pdf).

Gatto L., Vizcaíno J.A., Hermjakob H., Huber W. and Lilley K.S. Organelle proteomics experimental designs and analysis Proteomics, 10:22, 3957-3969, 2010 (PubMed, pdf).

MAQC Consortium The MicroArray Quality Control (MAQC)-II study of common practices for the development and validation of microarray-based predictive models Nature Biotechnology 28, 827–838 2010 (PubMed, pdf).

Gatto L., Mardulyn P. and Pasteels J.M. Morphological and mitochondrial DNA analysis suggest the presence of a hybrid zone between two species of the leaf beetle Gonioctena variabilis species complex in southern Spain, Biological Journal of the Linnean Society, 2008, 94(1), 105-114 (abstract, pdf).

Danis B., George T.C., Goriely S., Dutta B., Renneson J., Gatto L., Fitzgerald-Bocarsly P., Marchant A., Goldman M., Willems F. and De Wit D. Interferon regulatory factor 7-mediated responses are defective in cord blood plasmacytoid dendritic cells. Eur J Immunol. 2008 Feb;38(2):507-17. (PubMed, pdf).

Gatto L., Catanzaro D. and Milinkovitch M.C. Assessing the Applicability of the GTR Nucleotide Substitution Model Through Simulations Evolutionary Bioinformatics 2006:2 (PubMed, pdf).

Book chapters

Christoforou A., Mulvey C., Breckels LM., Gatto L. and Lilley KS. Spatial Proteomics: Practical Considerations for Data Acquisition and Analysis in Protein Subcellular Localisation Studies in Quantitative Proteomics, 185-210, The Royal Society of Chemistry, 2014.

Breckels LM, Gibb S, Petyuk V and Gatto L R for Proteomics in Proteome Informatics, The Royal Society of Chemistry, November 2016.

Technical Notes

Gatto L. Data Management Plan for a Biotechnology and Biological Sciences Research Council (BBSRC) Tools and Resources Development Fund (TRDF) Grant, Research Ideas and Outcomes (2017), doi:10.3897/rio.3.e11624.

Gatto, L. and Schretter, C. Designing Primer Pairs and Oligos with OligoFaktorySE. EMBnet.news North America, 15, Oct. 2009 (pdf, software).

Schretter, C. and Gatto, L. A Tiny Queuing System for Blast Servers, December 2005 (short and slightly longer versions).

Software

I have developed and contributed to many open source R/Bioconductor packages, in particular proteomics software and data packages, all of which are available on my own and my group’s GitHub pages. See Gatto and Christoforou, 2014, Gatto et al., 2014 and the RforProteomics vignettes for an overview of the R/Bioconductor infrastructure for mass spectrometry and proteomics.

Talks

Forthcoming talks:

  • Talk at the Leibniz Institut for Aging, on 8 November 2017, invited by Alessandro Ori. I will be presenting my work in spatial proteomics, its applications and findings, and the open and collaborative R/Bioconductor software ecosystem.

  • EuBIC 2018 developer’s meeting, 9 - 12 January 2018, Ghent, Belgium. I’ll be giving a keynote talk and running a hackathon project. More details soon.

Open Science in Practice, 25 September 2017, Lausanne, Switzerland. An early career researcher’s view on modern and open scholarship.

Proteomics Method Forum, Oxford, UK, 22-23 June 2017. The Bioconductor project - analysis and comprehension of high-throughput proteomics data.

Research Data Management Forum, London, UK, 9th June 2017. An early career researcher’s view on modern and open scholarship … and careers.

Office of Scholarly Communication Training - How to Get the Most Out of Modern Peer Review, Cambridge, UK, 30 Mar 2017. The role of peer-reviewers in promoting open science.

European Bioconductor Developer Meeting, Zurich, Switzerland, 6 - 7 December 2016. MSnbase2 - disk access is the limit.

Cambridge Computational Biology Institute, UK, 16 November 2016. Mapping the sub-cellular proteome: Computational analyses of high-throughput mass spectrometry-based spatial proteomics data.

Dialogue on methods for ecology, Cambridge, UK, 15 November 2016. Learning from heterogeneous data in spatial proteomics.

Quantitative Proteomics and Data Analysis, Chester, UK, 4 - 5 April 2016. Inspection, visualisation and analysis of quantitative proteomics data (slides, vignette).

Introduction to Integrative Omics: proteomics, European Bioinformatics Institute, Hinxton, UK, 8 March 2016.
