Laurent Gatto
1 July 2019
The Reproducible research chapter in the Introduction to Bioinformatics course at http://bit.ly/WSBIM1207
Direct link: https://uclouvain-cbio.github.io/WSBIM1207/sec-rr.html
Reproducible research refers to research that can be reproduced under various conditions and by different people.
Repeat my experiment, i.e. obtain the same tables/graphs/results using the same setup (data, software, …) in the same lab or on the same computer.
Reproduce an experiment (not one's own), i.e. obtain the same tables/graphs/results in a different lab or on a different computer, using the same or a similar setup.
Replicate an experiment, i.e. obtain the same (or similar enough) tables/graphs/results in a different setup.
The table below summarises these levels of reproducible research, focusing on data and code in computational projects.
| | Same data | Different data |
|---|---|---|
| Same code | Repeat | Reproduce |
| Different code | Reproduce | Replicate |
Finally, re-use the information/knowledge from one experiment to run a different experiment, with the aim of confirming the results from scratch.
There are many reasons to work reproducibly, and Markowetz (2015) nicely summarises five good ones. Importantly, he stresses that the first beneficiaries of reproducible work are the students and researchers who apply these principles.
knitr and rmarkdown
`knitr::knit` converts the `Rmd` into `md` by executing the code chunks and replacing the code by its output (text, tables, figures, …).
The `md` file is then compiled into the desired output format (typically html or pdf) using `pandoc`.
In practice, in R, these two steps are automatically handled in one go by `rmarkdown::render()`.
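The two steps described above can be sketched as follows. This is a minimal illustration, assuming that the knitr and rmarkdown packages and pandoc are installed. The file names and the chunk content are hypothetical, and the document is written to a temporary directory only to keep the example self-contained.

```r
# Assumes knitr, rmarkdown and pandoc are installed.
# File names and chunk content are hypothetical.
rmd <- file.path(tempdir(), "example.Rmd")
writeLines(c(
  "Some text in markdown.",
  "",
  "```{r}",
  "summary(cars)",
  "```"
), rmd)

# Step 1 only: knitr executes the chunk and writes a markdown file
md <- knitr::knit(rmd, output = file.path(tempdir(), "example.md"),
                  quiet = TRUE)

# Both steps in one go: knit, then convert to html with pandoc
html <- rmarkdown::render(rmd, output_format = "html_document",
                          quiet = TRUE)
```

Opening the file stored in `md` shows the chunk followed by its printed output, while `html` holds the path of the final document.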
An Rmarkdown (Rmd) document is composed of:
An optional YAML header, delimited by `---`.
Text in simple markdown format.
One or more R code chunks delimited by three backticks. Each code chunk can be uniquely named and parametrised with a set of code chunk options.
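As an illustration of these three parts, the sketch below stores a small, hypothetical Rmd document as a character vector and knits it directly. The title, the chunk name `cars-summary` and the choice of the `echo = FALSE` option are assumptions made for the example. It assumes that the knitr package is installed.

```r
library("knitr")

# A hypothetical Rmd document, one element per line
rmd_lines <- c(
  "---",                                  # optional YAML header
  "title: \"Stopping distances\"",
  "output: html_document",
  "---",
  "",
  "The speed of the cars is summarised below.",  # markdown text
  "",
  "```{r cars-summary, echo = FALSE}",    # named chunk with one option
  "summary(cars$speed)",
  "```"
)

# Knit the text; echo = FALSE hides the code but keeps its output
md <- knitr::knit(text = rmd_lines, quiet = TRUE)
cat(md)
```

With `echo = FALSE`, the resulting markdown contains the summary statistics but not the `summary()` call itself. Removing the option would show both.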
Caching with `cache = TRUE`.
Interactive tables with `DT::datatable()`.
Packages and versions with `sessionInfo()`.
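These three features could be combined in a single document along the following lines. This is a sketch only: the file name, the chunk names and the `Sys.sleep()` call standing in for a slow computation are hypothetical. Rendering is attempted only if the rmarkdown and DT packages and pandoc are available.

```r
# Hypothetical Rmd file using caching, an interactive table and
# session information
rmd <- file.path(tempdir(), "features.Rmd")
writeLines(c(
  "```{r slow-step, cache = TRUE}",
  "Sys.sleep(1)  # stand-in for a slow computation",
  "avg_dist <- mean(cars$dist)",
  "```",
  "",
  "```{r interactive-table}",
  "DT::datatable(cars)",
  "```",
  "",
  "```{r session}",
  "sessionInfo()",
  "```"
), rmd)

# Render only if the required tools are available
if (requireNamespace("rmarkdown", quietly = TRUE) &&
    requireNamespace("DT", quietly = TRUE) &&
    rmarkdown::pandoc_available()) {
  out <- rmarkdown::render(rmd, quiet = TRUE)
}
```

When the document is rendered a second time without changing the `slow-step` chunk, knitr loads the cached result instead of running the chunk again.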
RStudio also supports Notebook documents, which execute individual code chunks independently and display their output directly in the source document.
Using Rmarkdown, it is also possible to produce slides, websites, complete books, interactive documents and R package vignettes.