I am a Senior Research Associate in the Department of Biochemistry at the University of Cambridge. I am an avid open research advocate and make every possible effort to make my research reproducible and openly available. I am a Software Sustainability Institute fellow, a Data and Software Carpentry instructor, an affiliated member of the Bioconductor project and a founding member of OpenConCam, our local OpenCon group. My current open research activities focus on the Wellcome Trust Open Research Project, where we explore the barriers to open research, and the Bullied Into Bad Science campaign, an initiative by and for early career researchers who aim for a fairer, more open and ethical research and publication environment. Since 2017, I have also been a member of the eLife Early-career advisory group and an #ASAPbio ambassador.
I moved to Cambridge, UK, in January 2010 to work in the Cambridge Centre for Proteomics on various aspects of quantitative and spatial proteomics, developing new methods and implementing computational tools with a strong emphasis on rigorous and reproducible data analysis. I am also a visiting scientist in the PRIDE team at the European Bioinformatics Institute, and a member of the affiliate teaching staff at the Cambridge Computational Biology Institute. I am currently a PI in the Cambridge Systems Biology Centre, where I lead the Computational Proteomics Unit.
As pointed out by D. Donoho, "An article about computational science in a scientific publication is not the scholarship itself, it is merely advertising of the scholarship. The actual scholarship is the complete software development environment and the complete set of instructions which generated the figures." This directly applies to high throughput biology data analysis, and I strongly believe that being able to reproduce the complete set of results, replicate an analysis with new data and track the evolution of the work that led to the scientific novelty are essential aspects of the process of doing research. Hence, I regard the development of scientific software, as well as agile and robust analysis methodologies that facilitate reproducible research, as an important aspect of my scientific activity.
Clarity and traceability of the data and the analysis methodology enable us to better understand what we do, how and why we do it and consequently exploit the data and comprehend the biology. While not sufficient, these are nevertheless necessary requirements for effective data-driven science.
The collaborative and interdisciplinary nature of much of the research in biology calls for open approaches, influenced by the open source and free/libre software movements, from communication between stakeholders, through open research and development, to open dissemination of all research outputs.
My work on the design and implementation of reproducible mass
spectrometry-based proteomics data analysis pipelines has materialised
in the development of the MSnbase package
(Gatto et al., 2012)
to manipulate, process and analyse quantitative proteomics
data. The MSnbase infrastructure also supports the work on
statistical learning applied to spatial proteomics (see below). The
synapter package and
the associated publications
(Bond et al., 2013 and
Shliaha et al., 2013)
address MSE label-free quantitation, optionally including
ion mobility separation.
In biology, localisation is function: knowledge of the localisation of
proteins is of paramount importance to assess and study their
function, and spatial proteomics is the systematic study of the
sub-cellular localisation of proteins and changes thereof
(Gatto et al., 2010). Since
2010, I have developed novel software and machine learning approaches
enabling more reliable and systematic inference of protein
localisations using quantitative proteomics. This work has
resulted in the pRoloc package
(Gatto et al., 2014)
that implements various established classification algorithms,
effective visualisation techniques
(Gatto et al., 2015) as
well as novelty detection
(Breckels et al., 2013)
and transfer learning, harvesting GO annotations of microscopy-based
methods to improve the spatial resolution of experimental spatial
proteomics data (Breckels et al., 2016).
During my MSc and PhD work, I studied micro-evolutionary genetic patterns of the Broom leaf beetle Gonioctena variabilis in Southern Europe (Gatto et al., 2008), the application of short interspersed mobile elements (SINEs) to study the evolution of cetaceans, and the applicability of the General Time Reversible nucleotide substitution model in the light of differential lineage sorting (Gatto et al., 2006). I also spent 3 years in industry working on genomics and transcriptomics data, in particular on microarray quality control (Shi et al., 2010).
Over the years I have been involved in many teaching activities, including beginner and advanced R courses, genome biology, proteomics bioinformatics, integrative omics and scientific computing as part of the MPhil in Computational Biology in Cambridge, as well as several Software and Data Carpentry bootcamps. All my teaching material is available in the TeachingMaterial repository.
Please do get in touch if you are interested in running workshops.
See also my Google scholar profile.
Mulvey CM, Breckels LM, Geladaki A, Kocevar Britovsek N, Nightingale DJH, Christoforou A, Elzek M, Deery MJ, Gatto L, Lilley KS. Using HyperLOPIT to perform high-resolution mapping of the spatial proteome. Nature Protocols, 12, 1110–1135 (2017) doi:10.1038/nprot.2017.026 (See the F1000Research workflow for details on the computational side of the protocol.)
Leprevost FD, et al. BioContainers: An open-source and community-driven framework for software standardization. Bioinformatics. 2017 Mar 30. doi:10.1093/bioinformatics/btx192. [Epub ahead of print] PubMed PMID:28379341.
Breckels LM, Mulvey CM, Lilley KS and Gatto L. A Bioconductor workflow for processing and analysing spatial proteomics data F1000Research 2016, 5:2926 (doi:10.12688/f1000research.10411.1). [Software: MSnbase, pRoloc, pRolocGUI]
Wieczorek S, Combes F, Lazar C, Giai Gianetto Q, Gatto L, Dorffer A, Hesse A, Coute Y, Ferro M, Bruley C, and Burger T. DAPAR & ProStaR: software to perform statistical analyses in quantitative discovery proteomics Bioinformatics 2016, doi:10.1093/bioinformatics/btw580.
Perez-Riverol Y, Gatto L, Wang R, Sachsenberg T, Uszkoreit J, Leprevost Fda V, Fufezan C, Ternent T, Eglen SJ, Katz DS, Pollard TJ, Konovalov A, Flight RM, Blin K, Vizcaino JA. Ten Simple Rules for Taking Advantage of Git and GitHub. PLoS Comput Biol. 2016 Jul 14;12(7):e1004947. doi:10.1371/journal.pcbi.1004947 PMID:27415786.
Breckels LM, Holden S, Wojnar D, Mulvey CM, Christoforou A, Groen AJ, Kohlbacher O, Lilley KS, Gatto L. Learning from heterogeneous data sources: an application in spatial proteomics. PLoS Comput Biol. 2016 May 13;12(5):e1004920 (doi:10.1371/journal.pcbi.1004920, Software)
Fabre B, Korona D, Groen A, Vowinckel J, Gatto L, Deery MJ, Ralser M, Russell S, Lilley KS. Analysis of the Drosophila melanogaster proteome dynamics during the embryo early development by a combination of label-free proteomics approaches, Proteomics, 2016 (PMID:27029218, Publisher)
Lazar C, Gatto L, Ferro M, Bruley C, Burger T. Accounting for the Multiple Natures of Missing Values in Label-Free Quantitative Proteomics Data Sets to Compare Imputation Strategies. J Proteome Res. 2016 Apr 1;15(4):1116-25. (Publisher, PMID:26906401, Software: CRAN and Bioconductor)
Christoforou A, Mulvey CM, Breckels LM, Geladaki A, Hurrell T, Hayward PC, Naake T, Gatto L, Viner R, Arias AM, Lilley KS. A draft map of the mouse pluripotent stem cell spatial proteome. Nat Commun. 2016 Jan 12;7:9992 doi:10.1038/ncomms9992 (PMID:26754106, data, PRIDE, resource)
Mulvey CM, Schröter C, Gatto L, Dikicioglu D, Baris Fidaner I, Christoforou A, Deery MJ, Cho LT, Niakan KK, Martinez-Arias A, Lilley KS. Dynamic proteomic profiling of extra-embryonic endoderm differentiation in mouse embryonic stem cells. Stem Cells. 2015 Jun 8. doi: 10.1002/stem.2067 (PubMed).
Gatto L, Breckels LM, Naake T and Gibb S Visualisation of proteomics data using R and Bioconductor. Proteomics. 2015 Feb 18. doi:10.1002/pmic.201400392 (PubMed, Publisher and software: Bioconductor, github).
Nikolovski N, Shliaha PV, Gatto L, Dupree P and Lilley KS Label free protein quantification for plant Golgi protein localisation and abundance, Plant Physiol. pp.114.245589; First Published on August 13, 2014; doi:10.1104/pp.114.245589 (Publisher, PubMed)
Griss J, et al. The mzTab Data Exchange Format: communicating MS-based proteomics and metabolomics experimental results to a wider audience, Mol Cell Proteomics. 2014 June 30. (Publisher)
Walzer M, et al. qcML: an exchange format for quality control metrics from mass spectrometry experiments, Mol Cell Proteomics. 2014 Apr 23. (PubMed).
Vizcaíno J.A. et al. ProteomeXchange: globally co-ordinated proteomics data submission and dissemination, Nature Biotechnology 2014, 32, 223–226. (PubMed)
Gatto L., Breckels L.M, Burger T, Wieczorek S. and Lilley K.S. Mass-spectrometry based spatial proteomics data analysis using pRoloc and pRolocdata, Bioinformatics, 2014 (software, PubMed, publisher, software and data).
Groen A., Sancho-Andrés G., Breckels LM., Gatto L., Aniento F. and Lilley K.S. Identification of Trans Golgi Network proteins in Arabidopsis thaliana root tissue Journal of Proteome Research, 2013 (PubMed, publisher).
Wilf N.M. et al. RNA-seq reveals the RNA binding proteins, Hfq and RsmA, play various roles in virulence, antibiotic production and genomic flux in Serratia sp. ATCC 39006 BMC Genomics 2013, 14:822.
Shliaha P.V, Bond N.J., Gatto L. and Lilley K.S. The Effects of Travelling Wave Ion Mobility Separation on Data Independent Acquisition in Proteomics Studies, J. Proteome Res., 2013 (PubMed, publisher, software).
Breckels L.M., Gatto L., Christoforou A., Groen A.J., Lilley K.S. and Trotter M.W.B. The Effect of Organelle Discovery upon Sub-Cellular Protein Localisation, Journal of Proteomics, 2013 (PubMed, software).
Gatto L. and Lilley K.S. MSnbase - an R/Bioconductor package for isobaric tagged mass spectrometry data visualisation, processing and quantitation, Bioinformatics, 28(2), 288-289, 2012 (PubMed, pdf, software).
Capuano F., Bond N.J., Gatto L., Beaudoin F., Napier J., Benvenuto E., Lilley K.S, and Baschieri S. LC-MS/MS methods for absolute quantification and identification of proteins associated to chimeric plant oil bodies, Analytical Chemistry, Dec 15;83(24):9267-72, 2011 (PubMed - data).
Foster J.M., Degroeve S., Gatto L., Visser, M., Wang, R., Griss J., Apweiler R. and Martens L. A posteriori quality control for the curation and reuse of public proteomics data, Proteomics 11(11):2182-94, 2011 (PubMed, pdf).
MAQC Consortium The MicroArray Quality Control (MAQC)-II study of common practices for the development and validation of microarray-based predictive models Nature Biotechnology 28, 827–838 2010 (PubMed, pdf).
Gatto L., Mardulyn P. and Pasteels J.M. Morphological and mitochondrial DNA analysis suggest the presence of a hybrid zone between two species of the leaf beetle Gonioctena variabilis species complex in southern Spain, Biological Journal of the Linnean Society, 2008, 94(1), 105-114 (abstract, pdf).
Danis B., George T.C., Goriely S., Dutta B., Renneson J., Gatto L., Fitzgerald-Bocarsly P., Marchant A., Goldman M., Willems F. and De Wit D. Interferon regulatory factor 7-mediated responses are defective in cord blood plasmacytoid dendritic cells. Eur J Immunol. 2008 Feb;38(2):507-17. (PubMed, pdf).
Christoforou A., Mulvey C., Breckels LM., Gatto L. and Lilley KS. Spatial Proteomics: Practical Considerations for Data Acquisition and Analysis in Protein Subcellular Localisation Studies in Quantitative Proteomics, 185-210, The Royal Society of Chemistry, 2014.
Breckels LM, Gibb S, Petyuk V and Gatto L R for Proteomics in Proteome Informatics, The Royal Society of Chemistry, November 2016.
Gatto L. Data Management Plan for a Biotechnology and Biological Sciences Research Council (BBSRC) Tools and Resources Development Fund (TRDF) Grant, Research Ideas and Outcomes (2017), doi:10.3897/rio.3.e11624.
I have developed and contributed to many open source
R/Bioconductor software and data packages, all of which are
openly available. See
Gatto and Christoforou, 2014 and
Gatto et al., 2014
for an overview of the R/Bioconductor infrastructure for mass
spectrometry and proteomics.
Current plans are to be at the Howard Hughes Medical Institute in Chevy Chase, MD in February 2018 and at Northeastern University in Boston, MA in May 2018.
A talk presenting R/Bioconductor for proteomics and applications at the Sainsbury Laboratory in Norwich on 15 January 2018.
Open source and open development proteomics software at the EuBIC 2018 developer’s meeting, 9 - 12 January 2018, Ghent, Belgium.
Mapping the sub-cellular proteome, 8 November 2017, Leibniz Institute on Aging, Jena, Germany.
Open Science in Practice, 25 September 2017, Lausanne, Switzerland. An early career researcher’s view on modern and open scholarship.
Proteomics Method Forum, Oxford, UK, 22-23 June 2017. The Bioconductor project - analysis and comprehension of high-throughput proteomics data.
Research Data Management Forum, London, UK, 9 June 2017. An early career researcher’s view on modern and open scholarship … and careers.
Office of Scholarly Communication Training - How to Get the Most Out of Modern Peer Review, Cambridge, UK, 30 Mar 2017. The role of peer-reviewers in promoting open science.
Cambridge Computational Biology Institute, UK, 16 November 2016. Mapping the sub-cellular proteome: Computational analyses of high-throughput mass spectrometry-based spatial proteomics data.
Dialogue on methods for ecology, Cambridge, UK, 15 November 2016, Learning from heterogeneous data in spatial proteomics.
Introduction to Integrative Omics: proteomics, European Bioinformatics Institute, Hinxton, UK, 8 March 2016.