Becoming a better scientist with open and reproducible research

May 16, 2019

Who

I am Laurent Gatto, de Duve Institute, UCLouvain, Belgium.

Open science, reproducible research, data champion, computational
biology, proteomics, more omics, emacs, a lot of R, quite a bit of
running, and parenting.

More about me: https://lgatto.github.io/

Slides: https://lgatto.github.io/2019_05_16_rr_Delft/

Blog post: https://lgatto.github.io/rr-publ/

The material is available under CC-BY.

Disclaimer: I do not speak from authority. I speak of personal experience. My experience is in computational biology, bioinformatics and high throughput biology data. My experience doesn't directly translate to other fields or domains (for example when it comes to data privacy) or even to other personalities in the same field.

Notes

Being open doesn't automatically make a piece of research good, where good is defined as of high academic quality.

Being reproducible doesn't automatically make a piece of research good, where good is defined as of high academic quality.

Open and reproducible research

Open != reproducible

Open research and reproducible research aren't the same thing, and one doesn't imply the other. They are historically also very different.

Many people seem to think that #OpenScience & the reproducibility crisis in psychology are somehow causally related. They are not. Open science is decades old & did not focus on reproducibility as a single issue — more here: https://t.co/KpJHIEqPj3 & here: https://t.co/KdMeK6PCUT pic.twitter.com/qF5yPTqNqu
— Olivia Guest | Ολίβια Γκεστ (@o_guest) December 1, 2018

The Mertonian norms (1942)

Communism: all scientists should have common ownership of scientific goods (intellectual property), to promote collective collaboration; secrecy is the opposite of this norm.
Universalism: scientific validity is independent of the sociopolitical status/personal attributes of its participants.
Disinterestedness: scientific institutions act for the benefit of a common scientific enterprise, rather than for the personal gain of individuals within them.
Organised scepticism: scientific claims should be exposed to critical scrutiny before being accepted: both in methodology and institutional codes of conduct.

There isn't only one type of open science

Open science has evolved continuously since the 17th century, with the advent of dissemination of research in scientific journals and the societal demand for access to scientific knowledge at large. Technology and communication have further accelerated this evolution, and put it in the spotlight among researchers and academics (for example through funder mandates) and more widely in the press, with the cost of publications (see for example the Guardian long read article Is the staggeringly profitable business of scientific publishing bad for science? or the Paywall movie).

Open science/research is the process of transparent dissemination of, and access to, knowledge, which can be applied to various scientific practices (see the six principles of open science image from Wikipedia).

As a result

Open science/research can mean different things to different people, in particular when it is broken down along its many technical and philosophical attributes.

Take home message:

Open isn't binary, it's a gradient, it's multidisciplinary, it's multidimensional.

How to be an open scientist:

Let's be open and understanding of different situations and constraints.

Why become an open research practitioner (1)

It's the right thing to do. See the Mertonian norms… Or is it?
Benefits for your academic career: some examples from the Open as a career boost paragraph
Networking opportunities
Get more funding: Meet funder requirements, and qualify for special funds such as the Wellcome Trust Open Research Fund

Why become an open research practitioner (2)

Get that promotion: Open research is increasingly recognised in promotion and tenure. See also Reproducibility and open science are starting to matter in tenure and promotion (July 14th, 2017, Brian Nosek) and the EU's Evaluation of Research Careers fully acknowledging Open Science Practice, which defines an Open Science Career Assessment Matrix (OS-CAM).

But are there any risks?

Does it take more time to work openly?

Isn't it worth investing time in managing data in a way that others (including one's future self) can find and understand it? That's, IMHO, particularly important from a group leader's perspective, where I want to build a corpus of data/software/research that other lab members can find, mine and re-use.

Are senior academics always supportive?

No.

Is there a risk of being scooped?

There certainly is a benefit in releasing one's research early!

But, importantly, working with open and reproducible research in mind doesn't mean releasing everything prematurely; it means managing research in such a way that

one can find data and results at every stage,
one can reproduce/repeat results, re-run/compare them with new data or different methods/parameters, and
one can release data (or parts thereof) when/if appropriate.

So, are there any risks?

The Bullied Into Bad Science campaign is an initiative by early career researchers (ECRs) for early career researchers who aim for a fairer, more open and ethical research and publication environment.

Bullied Into Bad Science

Why reproducibility is important

For scientific reasons: think reproducibility crisis.
For political reasons: public trust in science, in data, in experts; without (public) trust in science and research, there won't be any funding anymore.

But what do we mean by reproducibility?

Repeat my experiment i.e. obtain the same tables/graphs/results using the same setup (data, software, …) in the same lab or on the same computer.
Reproduce an experiment (not mine) i.e. obtain the same tables/graphs/results in a different lab or on a different computer, using the same setup.
Replicate an experiment, i.e. obtain the same (similar enough) tables/graphs/results in a different set up.
Finally, re-use the information/knowledge from one experiment to run a different experiment with the aim to confirm results from scratch.

(From a But what do we mean by reproducibility? blog post.)

Another view (from a talk by Kirstie Whitaker):

	Same Data	Different Data
Same Code	reproduce	replicate
Different Code	robust	generalisable
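
The table above can be read as a simple lookup on two questions: is the data the same, and is the code the same? Here is a minimal sketch in Python. The function name `classify` and its arguments are only illustrative and are not part of the talk the table comes from.

```python
def classify(same_data: bool, same_code: bool) -> str:
    """Name the kind of result, following the data/code table above.

    `classify` is a hypothetical helper, written only to restate the table.
    """
    table = {
        (True, True): "reproduce",       # same data, same code
        (False, True): "replicate",      # different data, same code
        (True, False): "robust",         # same data, different code
        (False, False): "generalisable", # different data, different code
    }
    return table[(same_data, same_code)]


if __name__ == "__main__":
    # Re-running someone's published script on newly collected data.
    print(classify(same_data=False, same_code=True))
```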

See also this opinion piece by Jeffrey T. Leek and Roger D. Peng, Reproducible research can still be wrong: Adopting a prevention approach.

From Gabriel Becker, An Imperfect Guide to Imperfect Reproducibility, May Institute for Computational Proteomics, Boston, 2019:

(Computational) Reproducibility Is Not The Point

But rather

The goal is trust, verification and guarantees

Trust in Reporting - result is accurately reported
Trust in Implementation - analysis code successfully implements chosen methods
Statistical Trust - data and methods are (still) appropriate
Scientific Trust - result convincingly supports claim(s) about underlying systems or truths

Take home messages

Reproducibility isn't binary, it's a gradient, it's multidisciplinary, it's multidimensional.

Reproducibility isn't easy.

(But then why become a reproducible research practitioner?)

Why become a reproducible research practitioner

Florian Markowetz, Five selfish reasons to work reproducibly, Genome Biology 2015, 16:274.

And so, my fellow scientists: ask not what you can do for reproducibility; ask what reproducibility can do for you! Here, I present five reasons why working reproducibly pays off in the long run and is in the self-interest of every ambitious, career-oriented scientist.

Five selfish reasons to work reproducibly

reproducibility helps to avoid disaster
reproducibility makes it easier to write papers
reproducibility helps reviewers see it your way
reproducibility enables continuity of your work
reproducibility helps to build your reputation

And career perspectives: Faculty promotion must assess reproducibility.

What can you do to improve trust in (your) research?

Be an open research practitioner
Be a reproducible research practitioner

This includes (but is not limited to)

Preprints are the best!

Read, post, review and cite preprints (see ASAPbio for lots of resources about preprints).

Promoting open research through peer review

Accept sound/valid research and provide constructive comments

and hence

Focus firstly on the validity of the research by inspecting the data, software and method. If the methods and/or data fail, the rest is meaningless.

(See The role of peer-reviewers in promoting open science for details.)

Availability: Are the data/software/methods accessible and understandable in a way that would allow an informed researcher to reproduce and/or verify the results underlying the claims?
Meta-data: It's of course not enough to provide a wild dump of the data/software/…, but these need to be appropriately documented.
Do numbers match?: The first thing when reproducing someone's analysis is to match the data files to the experimental design.
What data, what format: Is the data readable with standard and open/free software? Are the raw and processed data available, and have the authors described how to get from one to the other?
License: Is the data shared in a way that allows users to re-use it? Under what conditions?
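
As an illustration of the "Do numbers match?" check, here is a minimal sketch in Python that compares the sample names listed in an experimental design with the data files actually provided. It assumes one file per sample, named after the sample, with a `.csv` extension; the function name `compare_files_to_design`, the sample names and the file layout are all hypothetical and would need adapting to a real submission.

```python
import tempfile
from pathlib import Path


def compare_files_to_design(design_samples, data_dir, suffix=".csv"):
    """Return (missing, unexpected) sample names.

    missing:    samples listed in the design that have no data file.
    unexpected: data files that do not correspond to any design sample.
    Assumes (for this sketch only) one file per sample, named <sample><suffix>.
    """
    expected = set(design_samples)
    found = {path.stem for path in Path(data_dir).glob(f"*{suffix}")}
    missing = sorted(expected - found)
    unexpected = sorted(found - expected)
    return missing, unexpected


if __name__ == "__main__":
    # Build a throw-away directory with made-up files to show the check.
    with tempfile.TemporaryDirectory() as tmp:
        for name in ["ctrl_1", "ctrl_2", "extra"]:
            (Path(tmp) / f"{name}.csv").touch()
        missing, unexpected = compare_files_to_design(
            ["ctrl_1", "ctrl_2", "treat_1"], tmp
        )
        print("In the design but without a file:", missing)
        print("Files not in the design:", unexpected)
```

If both lists come back empty, the files and the design agree at least in number and naming, which is only the first step before looking at the content.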

Make sure that the data adhere to the FAIR principles:

Findable and Accessible and Interoperable and Reusable

(and clearly, supplementary information in research manuscripts doesn't comply!)

As a quick note, my ideal review system would be one where you

submit your data to a repository, where it gets checked (by specialists, data scientists, data curators) for quality, annotation, meta-data;
submit your research with a link to the peer-reviewed data. Reviewers first assess the intro and methods, and only then the results (to avoid positive results bias).

When talking about open research and peer review, one logical extension is open peer review. But …

Open peer review

While I personally value open peer review and practice it when possible, it can be a difficult issue for ECRs, exposing them unnecessarily when reviewing work from prominent peers. It can also reinforce an already unwelcoming environment for underrepresented minorities. See more about this in the Inclusivity section below.

Registered reports

Define your data collection and analysis protocol in advance. Get it reviewed and, if accepted, get the right to publish once data have been collected and analysed, irrespective of the (positive or negative) result.

See Open science challenges, benefits and tips in early career and beyond (2019).

Three challenges: Restrictions on flexibility, The time cost, Incentive structure isn’t in place yet.

Three benefits: Greater faith in research, New helpful systems (see technical solutions below), Investment in your future.

Make allies

This is very important!

Other ECRs
Librarians
Data stewards/champions
Research software engineers
On/off-line networking

Collaborative work and cooperation are certainly important concepts that gravitate around open science/research (see the Mertonian norm of communism), but they are neither necessary nor sufficient for open science.

Open research can lead to collaborative research. The development of MSnbase is an example I am very proud of.

The credit goes to the outstanding contributions and contributors!
— Laurent Gⓐtt⓪ (@lgatt0) May 6, 2019

Just do it!

Build openness at the core of your research

(according to your possibilities)

Open and reproducible research doesn't work if it's an afterthought.

Technical solutions

Scripting, scripting, scripting (applies to code, data, analyses, manuscripts, …).
Avoid manual steps.
Document everything, especially manual steps (which you should avoid anyway).
Version control, such as git with GitHub, Bitbucket, …
Literate analyses: reproducible documents with R markdown, Sweave (R with LaTeX), Jupyter notebooks, …
Shareable compute environments (Docker containers).
Document and share all artefacts related to your research (when possible): data, code, protocols, …
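
To make a few of these points concrete, here is a minimal sketch in Python of a scripted analysis that avoids manual steps, fixes its random seed, and records what it was run on. The function names (`run_analysis`, `analysis_record`), the bootstrap example and the fields of the record are assumptions made for illustration; they are not taken from any of the tools mentioned above.

```python
import hashlib
import json
import platform
import random
import statistics


def run_analysis(values, seed=42, n_boot=200):
    """Toy analysis: mean of the data and bootstrap spread of that mean.

    The random generator is seeded explicitly, so repeating the call with
    the same data and seed gives the same numbers.
    """
    rng = random.Random(seed)
    boot_means = [
        statistics.fmean(rng.choices(values, k=len(values)))
        for _ in range(n_boot)
    ]
    return {
        "mean": statistics.fmean(values),
        "boot_sd": statistics.stdev(boot_means),
    }


def analysis_record(values, seed=42):
    """Bundle the result with what is needed to repeat it later."""
    raw = json.dumps(values).encode("utf-8")
    return {
        "input_sha256": hashlib.sha256(raw).hexdigest(),  # which data was used
        "seed": seed,                                     # which random seed
        "python": platform.python_version(),              # which interpreter
        "result": run_analysis(values, seed=seed),
    }


if __name__ == "__main__":
    record = analysis_record([1.0, 2.0, 3.0, 4.0], seed=1)
    # Written as JSON so the record can be version-controlled with the code.
    print(json.dumps(record, indent=2))
```

Keeping such a record next to the script, under version control, is one small way of documenting an analysis as it happens rather than as an afterthought.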

Some examples of my own research

Spatial proteomics software: systematic and high throughput analysis of sub-cellular protein localisation (… which is all reproducible, by the way.)

Software: pRoloc.
QSep: quantify resolution of a spatial proteomics experiment (a function within pRoloc).
SpatialMap: visualisation and data sharing platform for spatial proteomics.

Inclusivity

Open research and open research

There is

Open Science as in widely disseminated and openly accessible

and

Open Science as in inclusive and welcoming

The primary value proposition of #openscience is that diverse contributions allow better critique, refinement, and application 3/n
— CⓐmeronNeylon (@CameronNeylon) August 10, 2017

It was a damned hard community to break into. Any step I took to be more open, I felt attacked for not doing enough/doing it right.
— Christie Bahlai (@cbahlai) June 4, 2017

Even if there are efforts to promote diversity, under-represented minorities (URMs) don't necessarily feel included. When it comes to open science/research, URMs can be further discriminated against through greater exposure, or can't always afford to be vocal.

Not everybody has the privilege to be open.
There are different levels in how open one wants to be, or can afford to be.
Every voice and support is welcome.

Conclusions (1)

Standing on the shoulders of giants only really makes sense in the context of open and reproducible research.

If you are here (or have read this), chances are you are on the path towards open and reproducible research.
You are the architect of the kind of research and researcher you want to become. I hope these include openness and reproducibility.

Conclusions (2)

It's a long path, that constantly evolves, depending on constraints, aspirations, environment, …
The sky is the limit, be creative: work out the (open and reproducible) research that works for you now …
… and that you want to work (for you and others) in the future.

Acknowledgements

One piece of advice I gave was to make allies. I have been lucky to meet wonderful allies and inspiring friends along the path towards open and reproducible research that works for me. Among these, I would like to highlight Corina Logan, Stephen Eglen, Marta Teperek, Kirstie Whitaker, Chris Hartgenink, Naomie Penfold, Yvonne Nobis.