<p>~/ Laurent Gatto's online home (feed: https://lgatto.github.io/feed.xml, updated 2024-03-15T10:06:37+01:00)</p>
<h1>HUPO ECR Online Panel Discussion - Getting recognised for your work</h1>
<p>Laurent Gatto, 2024-02-28, https://lgatto.github.io/ecr-recognition</p>
<p>The HUPO Early Career Researcher (ECR) committee has organised a
discussion panel on <em>Getting recognised for your work</em> and has asked
me to participate - thank you! I am always keen on such events,
organised by and for ECRs.</p>
<h2 id="introductions">Introductions</h2>
<p>The first part of the panel is a short 5-minute introduction of the
panellists: <a href="https://www.malakerlab.com/">Prof Stacy
Malaker</a> from Yale University, <a href="https://www.ebi.ac.uk/people/person/juan-vizcaino/">Dr
Juan Antonio
Vizcaino</a> from
EMBL-EBI, and myself.</p>
<p>I prepared this career flow chart as a visual aid:</p>
<p><img src="/images/2024-ecr-recognition-careerpath.png" alt="Career path" /></p>
<ul>
<li>I earned my PhD in 2006, from the <a href="https://www.ulb.be/">Free University of
Brussels</a> (ULB). My PhD work focused on the
evaluation of different types of evolutionary genetic markers to
study cetacean phylogeny.</li>
<li>During my PhD (probably around 2004 or so), when my work and
interests shifted toward bioinformatics, I started a part-time
degree in computer science at the <a href="https://unamur.be/">University of
Namur</a>. That lasted until I left Belgium for the
UK in 2010, after completing all my exams, but before finishing my
master’s project - I never graduated.</li>
<li>After my PhD, I worked for 3 years in industry, in a small company
(we probably were about 15 employees). I didn’t see much point in
continuing in academia at that point, considering my experience so
far and my personal situation. The environment and general
atmosphere were very much like those of an academic lab, but with many
more collaborations within the team, and clear and common
objectives. This goal-oriented work environment was a very
refreshing experience that has been influential for the next steps
of my career. At some point, I felt I was starting to run in circles
and got a chance to move back to academia, at the <a href="https://www.cam.ac.uk/">University of
Cambridge</a> no less.</li>
<li>In 2010, I started a post-doctoral research associate (PDRA)
position in the <a href="https://proteomics.bio.cam.ac.uk/">Cambridge Centre for
Proteomics</a>, working on mass
spectrometry-based proteomics.</li>
<li>In 2013, I got promoted to senior research associate (SRA), which
allowed me to earn some grants as main PI and develop a small
research team.</li>
<li>In 2018, I joined the
<a href="https://uclouvain.be/fr/index.html">UCLouvain</a> as a professor of
bioinformatics. I teach in the <a href="https://uclouvain.be/fr/facultes/fasb">faculty of pharmacy and biomedical
sciences</a> (FASB) and run the
CBIO <a href="https://lgatto.github.io/cbio-lab/">computational research
group</a> in the <a href="https://www.deduveinstitute.be/">de Duve
Institute</a>.</li>
</ul>
<p>An interesting fact is that I started working on DNA during my PhD,
moved on with RNA in the private company, and since moving back to
academia, I have been focusing on proteins: my career followed the
main path of the <a href="https://en.wikipedia.org/wiki/Central_dogma_of_molecular_biology">central dogma of molecular
biology</a>.</p>
<p>To give more context to my career path, I also highlight some other
activities and interests that have guided and supported my academic
activities.</p>
<ul>
<li>
<p>I started to realise the importance of open and reproducible
research around 2010, both with respect to the rigour of doing
research and in light of the (at times) <a href="http://bulliedintobadscience.org/">oppressive and
restrictive global research
environment</a> that ECRs have to
endure. The desire for others to benefit from my research by making
it as open, collaborative and reproducible as possible, and being
<a href="https://lgatto.github.io/open-and-rr-2/">vocal about it</a>, has
stayed with me ever since.</p>
</li>
<li>
<p>The <a href="https://bioconductor.org/">Bioconductor project</a> has been
instrumental for me. It has allowed me over the years to meet and be
influenced by outstanding scientists, and has offered an
international environment in which I was able to grow and
flourish. I published my first (now retired) <a href="https://bioconductor.org/packages/3.12/bioc/html/yaqcaffy.html">Bioconductor
package</a>
around 2007 (Bioconductor 2.2 and R 2.7), and many more followed. I
am a Bioconductor package reviewer and a member of the European
Bioconductor (EuroBioc) conference organisation committee (I was a
local organiser for a handful of EuroBioc conferences in Cambridge, in
2019 in Brussels, and in <a href="https://eurobioc2023.bioconductor.org/">2023 in
Ghent</a>). Until recently,
I was part of the <a href="https://bioconductor.org/about/code-of-conduct/">Code of
Conduct</a> committee.
I am part of the social media working group, co-lead the <a href="https://bioconductor.org/help/education-training/">Teaching
committee</a>, have
been a member of the <a href="https://bioconductor.org/about/technical-advisory-board/">Technical Advisory
Board</a> since 2018,
and co-created, in 2021, the <a href="https://bioconductor.org/about/european-bioconductor-society/">European Bioconductor
Society</a>.</p>
</li>
</ul>
<h2 id="recognition">Recognition</h2>
<p>The panellists were also asked to comment on how they have been
recognised for their work, with a particular emphasis on their time as
ECRs. This is of course very subjective, and I’m not sure if my answer
will reflect how I have been recognised (assuming I have), or how I
hope I have been. I will also try to look beyond papers, the obvious
academic outputs - without those, there is little chance to get
academic recognition. This is of course a major problem, as there’s
much more to research than papers:</p>
<blockquote>
<p>An article about computational science in a scientific publication
is not the scholarship itself, it is merely advertising of the
scholarship. The actual scholarship is the complete software
development environment and that complete set of instructions that
generated the figures.</p>
</blockquote>
<p>[Buckheit and Donoho 1995, after Claerbout]</p>
<p>I think I’m known for the computational development and applications
in spatial (2010…) and single-cell proteomics (2018…), my efforts
to produce open and reproducible research, open and collaborative
software development, my R/Bioconductor contributions (some packages
have been around since 2010) as well as my involvement in
teaching, such as international workshops (for example the mythical -
for me at least - Bioconductor <a href="https://csama2024.bioconductor.eu/">CSAMA
course</a>).</p>
<p>One noteworthy aspect of my publication strategy that highlights my
efforts towards openness and reproducibility is the workflow that
typically starts with the release of the software (more often than not
after review, on Bioconductor), then the publication of a pre-print
with code to reproduce the analyses and, eventually, a peer-reviewed
paper.</p>
<p><img src="/images/2024-ecr-recognition-pubworkflow.png" alt="open software, pre-print and paper publication workflow" /></p>
<p>I think I have gained some reputation as someone having expertise in
computational quantitative proteomics, including demonstrable
<em>technical skills</em>, in addition to more standard <em>scientific/academic
output</em>.</p>
<p>In terms of recognition, I suppose that invitations to give talks (for
scientific outputs), teach at workshops (pedagogical and technical
skills) and to submit papers are obvious goals. Being recognised for
my open and collaborative contributions with a <a href="https://bioconductor.org/about/awards/">Bioconductor community
award</a> is one of my proudest
moments. High on the list are also the
<a href="https://lgatto.github.io/msnbase-contribs/">many</a>
<a href="https://lgatto.github.io/msnbase-contribs-2/">contributions</a> that the
<a href="https://bioconductor.org/packages/release/bioc/html/MSnbase.html">MSnbase</a>
package benefited from - some of these indirectly initiated the
collaborations that led to the creation of the <a href="https://www.rformassspectrometry.org/">R for Mass
Spectrometry</a> initiative.</p>
<p>But what matters the most, in my eyes, and what in the end is the most
meaningful recognition, are the (shared) <strong>values</strong> that we promote
with the research we do, and the <strong>intrinsic motivation</strong> that drives
us.</p>
<h2 id="questions">Questions</h2>
<p>We were also asked to prepare answers to three short questions. These
have been pre-determined by the HUPO ECR to get things going and give
us, the panellists, a chance to think about the comments we would like
to make.</p>
<blockquote>
<p>When hiring a new postdoctoral researcher for your group, what are
the most important attributes for them to have on their CV? What do
you look for other than publication history?</p>
</blockquote>
<p>Here are a couple of things I look for, and that I consider absolutely
essential, much more important than papers. Papers are only one of the
attributes that will help me assess the following:</p>
<ul>
<li>Do the candidate’s skills match the project’s needs?</li>
<li>What are <strong>concrete signs of mastery</strong>? I perform regular (and
constructive) appraisals with the researchers in my group, and one
question in that appraisal is “What do you want to become an expert
in?”. In a CV, I want to find what the candidate is an expert in,
what they can teach me/bring to the lab.</li>
<li>I also need to see public/open <code class="language-plaintext highlighter-rouge">code</code>, for example
<strong>active</strong> GitHub/GitLab profiles and repositories and
<strong>contributions</strong> (to their own or others’ code bases).</li>
</ul>
<p>And of course, last but not least, will the person be a good lab/team
member? It is of course very difficult (and arguably subjective) to
assess, but we (the lab) will be attentive to red flags pointing to
the contrary. In case of doubt, I will invite the candidate on site if
the interview (always with the whole group) was remote. I wouldn’t
want to take any risks that could harm the cohesion and well-being of
the group.</p>
<blockquote>
<p>How would you recommend that ECRs promote their work other than
research e.g., teaching, outreach, committee work? Is there anything
they can do other than add a line on their CV?</p>
</blockquote>
<ul>
<li>Yes, ‘add lines’ to your CV, but not at all costs. Be pragmatic! There is no
need to run or teach workshops several times a year to demonstrate
that you have done some teaching. Don’t forget that your post-doc
years should be the most productive research-wise of your career!</li>
<li>Promote <strong>your research</strong>, don’t be a vehicle for someone else’s
research (typically your advisor’s), don’t limit yourself to merely
doing it. Show how you go the extra mile - for example by delivering
reproducible research.</li>
<li><strong>Do things you like!</strong> Nothing beats motivation when it comes to
convincing others that you are good at what you do.</li>
</ul>
<blockquote>
<p>How much does networking (either via social media or in-person
meetings) play a role in promoting your work?</p>
</blockquote>
<p>It is <strong>very important!</strong> Networking offers opportunities to learn,
share, discuss, and make yourself known. Networking can be hard
though, so don’t be too hard on yourselves. It takes time.</p>
<p>Here’s a simple example illustrating the importance of building a
network: I happily spare myself organising and running interviews when
I can find a candidate in my direct or indirect network.</p>
<p>To build that network, there are of course in-person or remote
conferences and workshops, there may be social media (which might not be for
everybody), but also GitHub issues, code and documentation
contributions, and typo fixes. These large and small contributions are
very concrete examples that address the first question above.</p>
<h1>The Grid poster, in R</h1>
<p>Laurent Gatto, 2024-02-18, https://lgatto.github.io/the-grid-in-r</p>
<p>The <a href="https://museel.be/">MuseeL</a> is the
<a href="https://uclouvain.be/">UCLouvain</a> University museum in
Louvain-la-Neuve. Highly recommended. It’s located on the lively
<em>place des sciences</em>, in a nice brutalist-style building, formerly the
science university library. If you ever spend some time in
Louvain-la-Neuve, do spare a couple of hours to visit it.</p>
<p>As an academic and an ‘amis du musée’, I can get in for free, and
sometimes enjoy the quiet and rather unique atmosphere to get some
work done. The previous exhibition, named <a href="https://museel.be/fr/evenement/visite-guidee/decouvrir-lexposition-grid-0">The
Grid</a>,
was dedicated to the use of grids in science. The poster and book of
the exhibition, shown below, show a grid formed of smaller, slightly
irregular squares. I thought this was a fun example to reproduce in
R.</p>
<p><img src="/images/grid-poster.jpg" alt="The Grid poster" /></p>
<p>The first thing I need is to be able to draw squares. The
<code class="language-plaintext highlighter-rouge">plotSquare()</code> function below plots one of width <code class="language-plaintext highlighter-rouge">width</code> centred at position
<code class="language-plaintext highlighter-rouge">x</code> and <code class="language-plaintext highlighter-rouge">y</code>.</p>
<div class="language-r highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">plotSquare</span><span class="w"> </span><span class="o"><-</span><span class="w"> </span><span class="k">function</span><span class="p">(</span><span class="n">x</span><span class="p">,</span><span class="w"> </span><span class="n">y</span><span class="p">,</span><span class="w"> </span><span class="n">width</span><span class="p">)</span><span class="w"> </span><span class="p">{</span><span class="w">
</span><span class="n">x1</span><span class="w"> </span><span class="o"><-</span><span class="w"> </span><span class="n">x</span><span class="w"> </span><span class="o">-</span><span class="w"> </span><span class="p">(</span><span class="n">width</span><span class="w"> </span><span class="o">/</span><span class="w"> </span><span class="m">2</span><span class="p">)</span><span class="w">
</span><span class="n">y1</span><span class="w"> </span><span class="o"><-</span><span class="w"> </span><span class="n">y</span><span class="w"> </span><span class="o">-</span><span class="w"> </span><span class="p">(</span><span class="n">width</span><span class="w"> </span><span class="o">/</span><span class="w"> </span><span class="m">2</span><span class="p">)</span><span class="w">
</span><span class="n">x2</span><span class="w"> </span><span class="o"><-</span><span class="w"> </span><span class="n">x1</span><span class="w"> </span><span class="o">+</span><span class="w"> </span><span class="n">width</span><span class="w">
</span><span class="n">y2</span><span class="w"> </span><span class="o"><-</span><span class="w"> </span><span class="n">y1</span><span class="w"> </span><span class="o">+</span><span class="w"> </span><span class="n">width</span><span class="w">
</span><span class="n">rect</span><span class="p">(</span><span class="n">x1</span><span class="p">,</span><span class="w"> </span><span class="n">y1</span><span class="p">,</span><span class="w"> </span><span class="n">x2</span><span class="p">,</span><span class="w"> </span><span class="n">y2</span><span class="p">)</span><span class="w">
</span><span class="p">}</span><span class="w">
</span></code></pre></div></div>
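<p>As a quick check of <code class="language-plaintext highlighter-rouge">plotSquare()</code>, here is a minimal sketch that draws a few squares on an empty canvas. The canvas size and the coordinates are illustrative values, not taken from the original script. Because <code class="language-plaintext highlighter-rouge">rect()</code> is vectorised, passing vectors of positions draws several squares in a single call, which is what the loop further down relies on.</p>
<div class="language-r highlighter-rouge"><div class="highlight"><pre class="highlight"><code>## Same definition as above: a square of side `width` centred on (x, y)
plotSquare <- function(x, y, width) {
    x1 <- x - (width / 2)
    y1 <- y - (width / 2)
    rect(x1, y1, x1 + width, y1 + width)
}

## Empty 10 by 10 canvas (illustrative values, not from the post)
plot(0:10, 0:10, type = "n", xlab = "", ylab = "", asp = 1)
## One square of width 2 centred on (5, 5)
plotSquare(5, 5, 2)
## Vectors of positions: three squares of width 1 in one call
plotSquare(c(2, 5, 8), c(8, 8, 8), 1)
</code></pre></div></div>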
<p>I want an <code class="language-plaintext highlighter-rouge">nsq</code> by <code class="language-plaintext highlighter-rouge">nsq</code> grid of squares. Below, I set
that value to 10, to draw a total of 100 squares.</p>
<div class="language-r highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c1">## Number of squares</span><span class="w">
</span><span class="n">nsq</span><span class="w"> </span><span class="o"><-</span><span class="w"> </span><span class="m">10</span><span class="w">
</span></code></pre></div></div>
<p>I also want some jitter, i.e. some random displacements from a perfect
10 by 10 alignment, set by the <code class="language-plaintext highlighter-rouge">amount</code> variable.</p>
<div class="language-r highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c1">## amount of square jittering</span><span class="w">
</span><span class="n">amount</span><span class="w"> </span><span class="o"><-</span><span class="w"> </span><span class="m">1.2</span><span class="w">
</span></code></pre></div></div>
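<p>To get a feel for what this value means: when <code class="language-plaintext highlighter-rouge">jitter()</code> is given a positive <code class="language-plaintext highlighter-rouge">amount</code>, it adds uniform noise between <code class="language-plaintext highlighter-rouge">-amount</code> and <code class="language-plaintext highlighter-rouge">amount</code> to each value. The small sketch below illustrates this on a made-up vector; the seed, the position 50 and the vector length are arbitrary choices for the illustration.</p>
<div class="language-r highlighter-rouge"><div class="highlight"><pre class="highlight"><code>set.seed(1)                      ## arbitrary seed, for reproducibility
amount <- 1.2
x <- rep(50, 1000)               ## 1000 copies of the same position
xj <- jitter(x, amount = amount) ## each value is moved independently
range(xj - x)                    ## displacements stay within -1.2 and 1.2
</code></pre></div></div>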
<p>Finally, I need to define how much space is dedicated to the border
between the squares.</p>
<div class="language-r highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c1">## border ratio</span><span class="w">
</span><span class="n">ratio</span><span class="w"> </span><span class="o"><-</span><span class="w"> </span><span class="m">0.2</span><span class="w">
</span></code></pre></div></div>
<p>Assuming that the grid will have a width and a height of 100
(arbitrary) units, below I define the width <code class="language-plaintext highlighter-rouge">sq_w</code> of a square,
considering the number of squares and the space that is dedicated to
the border between squares. Once I have the width of a square, I can
compute the width <code class="language-plaintext highlighter-rouge">border_w</code> of the border between two squares.</p>
<div class="language-r highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">sq_w</span><span class="w"> </span><span class="o"><-</span><span class="w"> </span><span class="p">(</span><span class="m">100</span><span class="w"> </span><span class="o">/</span><span class="w"> </span><span class="n">nsq</span><span class="p">)</span><span class="w"> </span><span class="o">*</span><span class="w"> </span><span class="p">(</span><span class="m">1</span><span class="w"> </span><span class="o">-</span><span class="w"> </span><span class="n">ratio</span><span class="p">)</span><span class="w">
</span><span class="n">border_w</span><span class="w"> </span><span class="o"><-</span><span class="w"> </span><span class="p">(</span><span class="m">100</span><span class="w"> </span><span class="o">-</span><span class="w"> </span><span class="p">(</span><span class="n">nsq</span><span class="w"> </span><span class="o">*</span><span class="w"> </span><span class="n">sq_w</span><span class="p">))</span><span class="w"> </span><span class="o">/</span><span class="w"> </span><span class="p">(</span><span class="n">nsq</span><span class="w"> </span><span class="o">+</span><span class="w"> </span><span class="m">1</span><span class="p">)</span><span class="w">
</span></code></pre></div></div>
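<p>With the values chosen above, the arithmetic can be checked by hand. The sketch below repeats the two definitions and verifies that the 10 squares and the 11 borders (one between each pair of squares, plus one on each outer edge) add up to the 100 units.</p>
<div class="language-r highlighter-rouge"><div class="highlight"><pre class="highlight"><code>nsq <- 10
ratio <- 0.2
sq_w <- (100 / nsq) * (1 - ratio)             ## 10 * 0.8, i.e. 8 units
border_w <- (100 - (nsq * sq_w)) / (nsq + 1)  ## 20 / 11, about 1.82 units
## squares and borders together fill the whole width
all.equal(nsq * sq_w + (nsq + 1) * border_w, 100)
</code></pre></div></div>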
<p>I can now compute the x and y position of my squares. Given that my
final grid is a square itself, these x and y positions apply to rows
and columns of squares.</p>
<div class="language-r highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">pos</span><span class="w"> </span><span class="o"><-</span><span class="w"> </span><span class="n">seq</span><span class="p">(</span><span class="n">border_w</span><span class="p">,</span><span class="w"> </span><span class="m">100</span><span class="w"> </span><span class="o">-</span><span class="w"> </span><span class="n">border_w</span><span class="p">,</span><span class="w">
</span><span class="n">length.out</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">nsq</span><span class="p">)</span><span class="w">
</span></code></pre></div></div>
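<p>To see what <code class="language-plaintext highlighter-rouge">pos</code> contains, the sketch below recomputes it from the parameters above and inspects it. Since <code class="language-plaintext highlighter-rouge">plotSquare()</code> centres each square on the position it is given, these values are the centres of the squares, evenly spaced between <code class="language-plaintext highlighter-rouge">border_w</code> and <code class="language-plaintext highlighter-rouge">100 - border_w</code>.</p>
<div class="language-r highlighter-rouge"><div class="highlight"><pre class="highlight"><code>nsq <- 10
ratio <- 0.2
sq_w <- (100 / nsq) * (1 - ratio)
border_w <- (100 - (nsq * sq_w)) / (nsq + 1)
pos <- seq(border_w, 100 - border_w, length.out = nsq)
length(pos)                  ## one position per row/column, i.e. nsq
range(pos)                   ## from border_w to 100 - border_w
unique(round(diff(pos), 6))  ## a single constant step, about 10.71
</code></pre></div></div>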
<p>We can now produce the figure. I first define the margins of my plot
with the <code class="language-plaintext highlighter-rouge">par()</code> function: the margins have width 1 and the outer
margins 0. The <code class="language-plaintext highlighter-rouge">plot()</code> function doesn’t plot anything (<code class="language-plaintext highlighter-rouge">type = "n"</code>):
no axes, no frame, no labels. It does, however, set up the plotting region, ranging
from -2 to 100, to accommodate my squares and borders.</p>
<div class="language-r highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">par</span><span class="p">(</span><span class="n">mar</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="nf">rep</span><span class="p">(</span><span class="m">1</span><span class="p">,</span><span class="w"> </span><span class="m">4</span><span class="p">),</span><span class="w"> </span><span class="n">oma</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="nf">rep</span><span class="p">(</span><span class="m">0</span><span class="p">,</span><span class="w"> </span><span class="m">4</span><span class="p">))</span><span class="w">
</span><span class="n">plot</span><span class="p">(</span><span class="m">-2</span><span class="o">:</span><span class="p">(</span><span class="m">100</span><span class="w"> </span><span class="o">-</span><span class="w"> </span><span class="n">border_w</span><span class="w"> </span><span class="o">+</span><span class="w"> </span><span class="m">2</span><span class="p">),</span><span class="w"> </span><span class="m">-2</span><span class="o">:</span><span class="p">(</span><span class="m">100</span><span class="w"> </span><span class="o">-</span><span class="w"> </span><span class="n">border_w</span><span class="w"> </span><span class="o">+</span><span class="w"> </span><span class="m">2</span><span class="p">),</span><span class="w">
</span><span class="n">type</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s2">"n"</span><span class="p">,</span><span class="w"> </span><span class="n">xaxt</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s2">"n"</span><span class="p">,</span><span class="w"> </span><span class="n">yaxt</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s2">"n"</span><span class="p">,</span><span class="w">
</span><span class="n">xlab</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s2">""</span><span class="p">,</span><span class="w"> </span><span class="n">ylab</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="s2">""</span><span class="p">,</span><span class="w">
</span><span class="n">frame.plot</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="kc">FALSE</span><span class="p">)</span><span class="w">
</span></code></pre></div></div>
<p>The last step is to place the squares. The x and y positions are
symmetrical, i.e. both defined by the <code class="language-plaintext highlighter-rouge">pos</code> variable above: the lines and
columns are at <code class="language-plaintext highlighter-rouge">pos[1]</code>, <code class="language-plaintext highlighter-rouge">pos[2]</code>, …, respectively. The squares are
added one column at a time, starting with the column at <code class="language-plaintext highlighter-rouge">pos[1]</code> (the loop value, although named <code class="language-plaintext highlighter-rouge">y</code>, sets the shared x position, which makes no difference given the symmetry). A small amount of
noise (defined by <code class="language-plaintext highlighter-rouge">amount</code> above) is added to the actual x and y
positions by the <code class="language-plaintext highlighter-rouge">jitter()</code> function.</p>
<div class="language-r highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">for</span><span class="w"> </span><span class="p">(</span><span class="n">y</span><span class="w"> </span><span class="k">in</span><span class="w"> </span><span class="n">pos</span><span class="p">)</span><span class="w"> </span><span class="p">{</span><span class="w">
</span><span class="n">pos_x</span><span class="w"> </span><span class="o"><-</span><span class="w"> </span><span class="n">jitter</span><span class="p">(</span><span class="nf">rep</span><span class="p">(</span><span class="n">y</span><span class="p">,</span><span class="w"> </span><span class="n">nsq</span><span class="p">),</span><span class="w"> </span><span class="n">amount</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">amount</span><span class="p">)</span><span class="w">
</span><span class="n">pos_y</span><span class="w"> </span><span class="o"><-</span><span class="w"> </span><span class="n">jitter</span><span class="p">(</span><span class="n">pos</span><span class="p">,</span><span class="w"> </span><span class="n">amount</span><span class="w"> </span><span class="o">=</span><span class="w"> </span><span class="n">amount</span><span class="p">)</span><span class="w">
</span><span class="n">plotSquare</span><span class="p">(</span><span class="n">pos_x</span><span class="p">,</span><span class="w"> </span><span class="n">pos_y</span><span class="p">,</span><span class="w"> </span><span class="n">sq_w</span><span class="p">)</span><span class="w">
</span><span class="p">}</span><span class="w">
</span></code></pre></div></div>
<p>The final output (with <code class="language-plaintext highlighter-rouge">set.seed(123)</code>), with the parameters above, is
shown here.</p>
<p><img src="/images/grid123.png" alt="The R Grid" /></p>
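<p>To try other parameter values more easily, the steps above could be wrapped in a single function. This is only a sketch: the name <code class="language-plaintext highlighter-rouge">drawGrid()</code>, its arguments and the place where the seed is set are my assumptions and are not part of the original script, so its output may differ in detail from the figure above.</p>
<div class="language-r highlighter-rouge"><div class="highlight"><pre class="highlight"><code>## Hypothetical wrapper around the steps above (not in the original script)
drawGrid <- function(nsq = 10, amount = 1.2, ratio = 0.2, seed = 123) {
    set.seed(seed)  ## assumption: seed set once, before any jittering
    sq_w <- (100 / nsq) * (1 - ratio)
    border_w <- (100 - (nsq * sq_w)) / (nsq + 1)
    pos <- seq(border_w, 100 - border_w, length.out = nsq)
    par(mar = rep(1, 4), oma = rep(0, 4))
    plot(-2:(100 - border_w + 2), -2:(100 - border_w + 2),
         type = "n", xaxt = "n", yaxt = "n",
         xlab = "", ylab = "", frame.plot = FALSE)
    for (p in pos) {
        ## nsq squares sharing (roughly) the same x position
        pos_x <- jitter(rep(p, nsq), amount = amount)
        pos_y <- jitter(pos, amount = amount)
        rect(pos_x - sq_w / 2, pos_y - sq_w / 2,
             pos_x + sq_w / 2, pos_y + sq_w / 2)
    }
    invisible(pos)  ## return the unjittered positions
}

drawGrid()                                     ## parameters used in the post
drawGrid(nsq = 20, amount = 0.5, ratio = 0.4)  ## denser, tidier grid
</code></pre></div></div>
<p>Note that a positive <code class="language-plaintext highlighter-rouge">amount</code> is assumed here: with <code class="language-plaintext highlighter-rouge">amount = 0</code>, <code class="language-plaintext highlighter-rouge">jitter()</code> falls back to a default based on the range of the data rather than switching the noise off.</p>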
<p>The full script is available
<a href="https://gist.github.com/lgatto/4fa4b3a8a6668a6b755b47da40d8ca81">here</a>. The
fun part is of course to play with the parameters, which is left as an
exercise for the reader :-).</p>
<h1>Podcasts du LLL: repensons l’évaluation (LLL podcasts: let’s rethink assessment)</h1>
<p>Laurent Gatto, 2024-02-17, https://lgatto.github.io/podcast-eval</p>
<p>For once, a post originally written in French, to draw your
attention to the LLL podcasts. The LLL, or <a href="https://uclouvain.be/fr/etudier/lll/a-propos.html">Louvain Learning
Lab</a>, supports
everyone involved in education at UCLouvain in their
teaching activities. On top of that, they are a really great team!</p>
<p>Among their
<a href="https://uclouvain.be/fr/etudier/lll/les-podcasts-du-lll.html">podcasts</a>,
there is one that looks at <a href="https://www.podcastics.com/podcast/repensons-les-evaluations/">the assessment of
student learning</a>,
produced and written by Emilie Malcourant, which I
recommend.</p>
<p>Among the points discussed was the question</p>
<blockquote>
<p>What would the ideal assessment be?</p>
</blockquote>
<p>which prompted the following reflection.</p>
<p>Assessment must above all serve learning.
To me, certificative assessment is a dead
end, and I struggle to see how it is part
of my teaching mission.</p>
<p><img src="/images/eval.png" alt="Certificative :-( and formative :-) assessments" /></p>
<p>To me, an ideal assessment is a formality: an
assessment with no reason to exist, because those teaching the course
know that the students have mastered the material, that the final
assessment is merely a formality, and that it is therefore no longer necessary.</p>
<p>The goal of teaching would therefore be to make certificative
assessment irrelevant, to ensure that it is no longer
pertinent, to render it impertinent.</p>
<p>Edit (2024-02-19): The emoticons in my simple chart above use the
number <code class="language-plaintext highlighter-rouge">8</code> for the eyes, rather than the standard <code class="language-plaintext highlighter-rouge">:</code>, because the
colon has a specific meaning in
<a href="https://github.com/stathissideris/ditaa">ditaa</a>, the cool
mini-language used to generate the figure, and these special characters
<a href="https://github.com/stathissideris/ditaa/issues/9">can’t easily be
escaped</a>. But I just
learnt that I was 1 character away from calling students a
<a href="https://pc.net/emoticons/categories/characters"><code class="language-plaintext highlighter-rouge">8-E</code></a>.</p>
<h1>CBIO’s internal communication goes open source</h1>
<p>Laurent Gatto, 2024-02-04, https://lgatto.github.io/zulip</p>
<p>I have been using Slack for a rather long time as a discussion
platform for the lab, even before moving to Belgium. Based on the
access log, I set up the lab Slack workspace in June 2016. It has
proven very useful, even with the more recent limitations that come
with the free Slack plan. I have been thinking for some time that I
should move to an open source offering. Based on a little bit of
reading
(<a href="https://blog.ossph.org/best-open-source-alternatives-to-slack/">here</a>
and <a href="https://opensource.com/alternatives/slack">here</a>) and asking on
<a href="https://fosstodon.org/@lgatto/111840609295851868">social media</a>,
there were a couple of contenders:
<a href="https://mattermost.com/">Mattermost</a>,
<a href="https://element.io/">element.io</a>,
<a href="https://www.rocket.chat/">rocket.chat</a> and
<a href="https://zulip.com/">Zulip</a>.</p>
<p>My needs are pretty simple: I would like to have access to all
messages, and apps for various desktop and mobile OSes. Cost is also
an issue - with say 15 users at around $7 per user per month, it would cost the
lab over $1200 per year. While not a big amount per se for an academic
lab, given the restrictions on what can be paid for from academic
funding (salary only or consumables, for example), that money is money
that couldn’t be used to support students travelling to
conferences. I thus considered self-hosting, but it wasn’t clear I
could get the support from IT, or a public-facing IP with the necessary
ports accessible to the world.</p>
<p>Then I discovered that Zulip offers their Cloud Standard plan for
<a href="https://zulip.com/for/research/">free for academic research</a>, which
is what I finally went for with <code class="language-plaintext highlighter-rouge">cbio.zulipchat.com</code>. They ask to
acknowledge their support on the <a href="https://lgatto.github.io/cbio-lab/">lab
webpage</a> (see at the bottom),
which is a fair request. The sponsorship was accepted on the same day, and
a couple of days later, their support team
<a href="https://zulip.com/help/import-from-slack">imported</a><sup id="fnref:fn" role="doc-noteref"><a href="#fn:fn" class="footnote" rel="footnote">1</a></sup> all the
Slack messages from our non-private channels.</p>
<p>So far, everything works very smoothly, and I’m quite happy. The web
interface and the GNU/Linux and Android apps work very well. I think
I also like the possibility of giving a title to posts/threads -
time will tell if all members make use of this. Another useful
difference from Slack (at least the free plan) is that there are <a href="https://zulip.com/help/roles-and-permissions">guest
accounts</a> (that are
given access to specific streams/channels when joining), in addition
to regular members (who can join all public streams).</p>
<div class="footnotes" role="doc-endnotes">
<ol>
<li id="fn:fn" role="doc-endnote">
<p>Even if one can only see the most recent messages with a free Slack
plan, all messages from public channels get exported. This export and a
bot user OAuth token are all they need. <a href="#fnref:fn" class="reversefootnote" role="doc-backlink">↩</a></p>
</li>
</ol>
</div>Laurent GattoI have been using slack for a rather long time as a discussion platform for the lab, even before moving to Belgium. Based on the access log, I set up the lab slack workspace in June 2016. It has proven very useful, even with the more recent limitations that come with the free slack plan. I have been thinking for some time that I should move to an open source offering. Based on a little bit of reading (here and here) and asking on social media, there were a couple of contenders: Mattermost, element.io, rocket.chat and Zulip.CBIO’s EuroBioc2023 posters and talks2023-10-20T00:00:00+02:002023-10-20T00:00:00+02:00https://lgatto.github.io/EuroBioc-Ghent<p>Long overdue now, but here are the <a href="https://lgatto.github.io/cbio-lab/">CBIO
lab</a>’s contributions to the
<a href="https://eurobioc2023.bioconductor.org/">EuroBioc2023</a> conference that
was organised in Ghent, on 20 - 22 September 2023.</p>
<h2 id="poster-a-reliable-and-reproducible-resource-for-ct-genes">Poster: A reliable and reproducible resource for CT genes</h2>
<p>Julie Devis, Axelle Loriot, Charles De Smet and Laurent Gatto</p>
<p>Cancer-Testis (CT) genes are tissue-specific genes whose expression is
limited to the germline. They are normally repressed in somatic
tissues, but can be aberrantly activated in tumors. For many CT genes,
tumoral activation is enabled by loss of promoter DNA methylation. CT
genes are of great interest. First, they have clinical potential as
cancer-specific antigens, and can thus be used as targets for cancer
immunotherapy and as cancer biomarkers. Second, they are also a good
model to study DNA demethylation in cancer, which is still poorly
understood.</p>
<p>The definition of CT genes differs vastly according to the literature
source. Some databases already exist [1,2,3] but they are neither
up-to-date nor well annotated, and thus difficult to use. There is
therefore a need for a reliable and reproducible resource when
studying CT genes. We thus created CTexploreR, a package that
rigorously defines and explores CT genes. Our main objective was to
propose a reliable and well-defined list of CT genes based on publicly
available RNAseq databases. We also determined their precise
transcription start sites in order to enable accurate
promoter methylation analysis. Our list contains 307 CT genes that
were carefully classified as regulated by DNA methylation or not. We
also developed functions to visualise CT gene expression and promoter
DNA methylation in normal and tumoral tissues.</p>
<p>We also performed a thorough comparison of CTexploreR with the
available resources mentioned above. This allowed us to clearly
establish and characterise the differences between CT lists and clarify
the origins of the inconsistencies. These analyses demonstrate that
CTexploreR is a clear, curated and rigorously established up-to-date
reference. Our package can thus be used as the starting point for
further investigations.</p>
<p>[1] Almeida, L. G., Sakabe, N. J., deOliveira, A. R., Silva, M. C. C.,
Mundstein, A. S., Cohen, T., Chen, Y.-T., Chua, R., Gurung, S.,
Gnjatic, S., Jungbluth, A. A., Caballero, O. L., Bairoch, A.,
Kiesler, E., White, S. L., Simpson, A. J. G., Old, L. J., Camargo,
A. A., & Vasconcelos, A. T. R. (2009). CTdatabase: a
knowledge-base of high-throughput and curated data on
cancer-testis antigens. Nucleic Acids Research, 37(Database
issue), D816-9.</p>
<p>[2] Jamin, S. P., Hikmet, F., Mathieu, R., Jégou, B., Lindskog, C.,
Chalmel, F., & Primig, M. (2021). Combined RNA/tissue profiling
identifies novel Cancer/testis genes. Molecular Oncology, 15(11),
3003–3023.</p>
<p>[3] Wang, C., Gu, Y., Zhang, K., Xie, K., Zhu, M., Dai, N., Jiang, Y.,
Guo, X., Liu, M., Dai, J., Wu, L., Jin, G., Ma, H., Jiang, T.,
Yin, R., Xia, Y., Liu, L., Wang, S., Shen, B., … Hu,
Z. (2016). Systematic identification of genes with a cancer-testis
expression pattern in 19 cancer types. Nature Communications, 7,
10499.</p>
<h2 id="workshop-cytopipeline-building-and-visualizing-automated-pre-processing-and-quality-control-pipelines-for-flow-cytometry-data">Workshop: CytoPipeline: Building and visualizing automated pre-processing and quality control pipelines for flow cytometry data</h2>
<p><a href="https://github.com/phauchamps/CytoPipeline_BiocWS">Workshop repo</a> and
<a href="https://www.biorxiv.org/content/10.1101/2023.10.10.561699v1">pre-print</a></p>
<p>Philippe Hauchamps, Dan Lin and Laurent Gatto</p>
<p>With the increase of the dimensionality in conventional flow cytometry
data over the past years, there is a growing need to replace or
complement traditional manual analysis (i.e. iterative 2D gating) with
automated data analysis pipelines. Examples of such pipelines have
been documented in the recent literature (e.g. [1],[2],[3]). A crucial
part of these pipelines consists of pre-processing and applying
quality control filtering to the raw data, in order to use high
quality events in the downstream statistical analysis. This part can
in turn be split into a number of elementary steps: margin event
removal, signal compensation, scale transformations, debris and dead
cell removal, batch effect correction, etc.</p>
<p>However, when designing automated flow cytometry data analysis
pipelines, assembling and assessing the pre-processing part can be
challenging for a number of reasons. First, each of the involved
elementary steps can be implemented using various methods and R
packages. Second, the order of the steps can have an impact on the
downstream analysis results. Finally, each method typically comes with
its own specific, non-standardized diagnostics and visualizations, making
objective comparison difficult for the end user.</p>
<p>Here, we present CytoPipeline, an R package suite for building,
assessing and comparing pre-processing pipelines for flow cytometry
data. To exemplify our tool, we present the steps involved in
designing a pre-processing pipeline on a real life dataset and
demonstrate the visualization utilities. We also show how CytoPipeline
can nicely complement benchmarking tools, such as pipeComp [4], by
providing users with intuitive insight into benchmarking results.</p>
<p>[1] Quintelier, Katrien, Artuur Couckuyt, Annelies Emmaneel, Joachim
Aerts, Yvan Saeys, and Sofie Van Gassen. 2021. “Analyzing
High-Dimensional Cytometry Data Using FlowSOM.” Nature Protocols
16 (8): 3775–3801.</p>
<p>[2] Ashhurst, Thomas Myles, Felix Marsh-Wakefield, Givanna Haryono
Putri, Alanna Gabrielle Spiteri, Diana Shinko, Mark Norman Read,
Adrian Lloyd Smith, and Nicholas Jonathan Cole
King. 2021. “Integration, Exploration, and Analysis of
High-Dimensional Single-Cell Cytometry Data Using Spectre.”
Cytometry. Part A: The Journal of the International Society for
Analytical Cytology, no. cyto.a.24350
(April). https://doi.org/10.1002/cyto.a.24350.</p>
<p>[3] Nowicka, Malgorzata, Carsten Krieg, Helena L. Crowell, Lukas
M. Weber, Felix J. Hartmann, Silvia Guglietta, Burkhard Becher,
Mitchell P. Levesque, and Mark D. Robinson. 2017. “CyTOF Workflow:
Differential Discovery in High-Throughput High-Dimensional
Cytometry Datasets.” F1000Research 6 (May): 748.</p>
<p>[4] Germain, Pierre-Luc, Anthony Sonrel, and Mark
D. Robinson. 2020. “pipeComp, a General Framework for the
Evaluation of Computational Pipelines, Reveals Performant Single
Cell RNA-Seq Preprocessing Tools.” Genome Biology 21 (1): 227.</p>
<h2 id="poster-fmsne---fast-multi-scale-neighbour-embedding-in-r">Poster: <code class="language-plaintext highlighter-rouge">fmsne</code> - fast multi-scale neighbour embedding in R</h2>
<p>Laurent Gatto and Cyril de Bodt</p>
<p>Dimensionality reduction (DR) has been a workhorse of large scale,
multivariate omics data analysis from the early days. Since the advent
of single-cell RNA sequencing, non-linear approaches have taken
centre stage, with t-distributed stochastic neighbour embedding (t-SNE)
[1,2] being one of the main players, if not the main one. Packages such as <code class="language-plaintext highlighter-rouge">Rtsne</code>
[3] and <code class="language-plaintext highlighter-rouge">scater</code> [4] have made it easy to apply t-SNE in
R/Bioconductor workflows.</p>
<p>One sticking point with t-SNE is the single perplexity parameter, which
controls the number of nearest high-dimensional (HD) neighbours that
are taken into account when constructing the low-dimensional (LD)
embedding: small (resp. large) values only enable preserving small
(resp. large) neighbourhoods from HD to LD during DR, impairing the
reproduction of large (resp. small) neighbourhoods. It is thus a key
parameter, especially if the LD embedding is used for interpretation,
which is often the case in omics-based applications.</p>
<p>Multi-scale neighbour embedding [5] is an extension of single-scale
approaches such as t-SNE that exempts users from having to set a
single perplexity (scale) arbitrarily. Multi-scale approaches maximise
the LD embedding quality at all scales, preserving both local and
global HD neighbourhoods [6]. They have been shown to better capture
the structure of data and to significantly improve DR quality [7].</p>
<p>Here, we present <code class="language-plaintext highlighter-rouge">fmsne</code> (https://github.com/lgatto/fmsne), an R
package that relies on the <code class="language-plaintext highlighter-rouge">basilisk</code> package [8] to provide a
Bioconductor-friendly interface to fast multi-scale methods
implemented in Python. <code class="language-plaintext highlighter-rouge">fmsne</code> implements fast multi-scale functions
such as <code class="language-plaintext highlighter-rouge">runFMSTSNE()</code> and <code class="language-plaintext highlighter-rouge">plotFMSTSNE()</code>, based on scater’s
<code class="language-plaintext highlighter-rouge">scater::run*()</code> and <code class="language-plaintext highlighter-rouge">scater::plot*()</code> interface [4]. It also exposes
the <code class="language-plaintext highlighter-rouge">drQuality()</code> function to assess DR quality using rank-based
criteria [7]. Finally, we illustrate fast multi-scale methods on
various single-cell datasets.</p>
<p>[1] van der Maaten, L., & Hinton, G. (2008). Visualizing data using
t-SNE. <em>Journal of Machine Learning Research</em>, 9(Nov), 2579-2605.</p>
<p>[2] van der Maaten, L. (2014). Accelerating t-SNE using tree-based
algorithms. <em>Journal of Machine Learning Research</em>, 15(1),
3221-3245.</p>
<p>[3] Jesse H. Krijthe (2015). Rtsne: T-Distributed Stochastic Neighbor
Embedding using a Barnes-Hut Implementation, URL:
https://github.com/jkrijthe/Rtsne</p>
<p>[4] McCarthy DJ, Campbell KR, Lun ATL, Willis QF (2017). Scater:
pre-processing, quality control, normalisation and visualisation
of single-cell RNA-seq data in R. <em>Bioinformatics</em>, 33,
1179-1186. doi:10.1093/bioinformatics/btw777</p>
<p>[5] C. de Bodt, D. Mulders, M. Verleysen and J. A. Lee, “Fast
Multiscale Neighbor Embedding,” in <em>IEEE Transactions on Neural
Networks and Learning Systems</em>, 2020, doi:
10.1109/TNNLS.2020.3042807.</p>
<p>[6] Lee, J. A., Peluffo-Ordóñez, D. H., & Verleysen,
M. (2015). Multi-scale similarities in stochastic neighbour
embedding: Reducing dimensionality while preserving both local and
global structure. <em>Neurocomputing</em>, 169, 246-261.</p>
<p>[7] Lee, J. A., & Verleysen, M. (2009). Quality assessment of
dimensionality reduction: Rank-based criteria. <em>Neurocomputing</em>,
72(7-9), 1431-1443.</p>
<p>[8] Lun ATL (2022). basilisk: a Bioconductor package for managing
Python environments. <em>Journal of Open Source Software</em>,
7, 4742. doi:10.21105/joss.04742.</p>
<h2 id="talk-linear-models-for-single-cell-proteomics">Talk: Linear models for single-cell proteomics</h2>
<p>Christophe Vanderaa and Laurent Gatto</p>
<p>Mass spectrometry (MS)-based single-cell proteomics (SCP) has become a
credible player in the single-cell biology arena [1,2]. Continuous
technical improvements have pushed the boundaries of sensitivity and
throughput. However, the computational efforts to support the analysis
of these complex data have been missing. Strong batch effects coupled
to high proportions of missing values complicate the analysis, causing
strong entanglement between biological and technical variability
[3,4].</p>
<p>We propose a simple, yet powerful approach to address this need:
linear models. We use linear regression to model and remove undesired
technical factors while retaining the biological variability, even in
the presence of high proportions of missing values. The key advantage
of linear models lies in the interpretability of the results they
generate. Inspired by previous research [5], we streamlined modelling
and exploration of the patterns induced by known technical and
biological factors. The exploration enables a thorough assessment of
the model coefficients, and highlights key factors influencing
SCP experiments. Further exploration of the unmodelled variance
recovers unknown but biologically relevant patterns in the data,
leveraging the power of single-cell proteomics technologies. We
successfully applied our approach to a diverse collection of SCP
datasets [6], and could demonstrate that it is also amenable to
integrating datasets acquired using different technologies.</p>
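<p>To make the idea concrete, here is a minimal, self-contained sketch
of removing a batch effect with a linear model while keeping a
biological effect. It is only an illustration under stated assumptions,
not the implementation in the scp package: the data are made up, a
single protein is modelled, and the helper names (<code class="language-plaintext highlighter-rouge">solve</code>, <code class="language-plaintext highlighter-rouge">fit_ols</code>) are
hypothetical. Missing values are simply left out of the fit.</p>

```python
# Sketch: remove a batch effect from one protein's intensities with a
# linear model, keeping the biological effect. All data are made up and
# the helper names are hypothetical (this is not the scp package code).


def solve(a, b):
    """Solve the linear system a x = b by Gauss-Jordan elimination."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        # Partial pivoting: bring the largest remaining entry to the diagonal.
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [x - factor * y for x, y in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def fit_ols(design, y):
    """Least-squares coefficients via the normal equations X'X b = X'y."""
    p = len(design[0])
    xtx = [[sum(row[i] * row[j] for row in design) for j in range(p)]
           for i in range(p)]
    xty = [sum(row[i] * v for row, v in zip(design, y)) for i in range(p)]
    return solve(xtx, xty)


# One (batch, cell type) pair per cell, both coded as 0/1.
cells = [(0, 0), (0, 1), (1, 0), (1, 1), (0, 0), (1, 1), (0, 1), (1, 0)]
# Simulated log-intensities: baseline 10, batch shift 2, cell type effect 1.
intensity = [10.0, 11.0, 12.0, 13.0, None, 13.0, 11.0, None]  # None = missing

# Fit on observed cells only; missing values are left out of the model.
observed = [i for i, v in enumerate(intensity) if v is not None]
design = [[1.0, float(cells[i][0]), float(cells[i][1])] for i in observed]
coef = fit_ols(design, [intensity[i] for i in observed])

# Subtract only the estimated batch term; the cell type effect is retained.
corrected = [
    None if v is None else v - coef[1] * cells[i][0]
    for i, v in enumerate(intensity)
]
```

<p>The fitted coefficients (intercept, batch, cell type) can be inspected
directly, which is the interpretability argument made above: the batch
coefficient is what gets removed, and the cell type coefficient is what
remains in the corrected values.</p>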
<p>We implemented and documented this approach in our Bioconductor
package scp [7]. In summary, our approach represents a turning point
for principled SCP data analysis, moving the tension point from how to
perform the analysis to result generation and interpretation.</p>
<p>[1] “Single-Cell Proteomics: Challenges and Prospects.” 2023. Nature
Methods 20 (3): 317–18.</p>
<p>[2] Bennett HM, Stephenson W, Rose CM, and Darmanis S. 2023.
“Single-Cell Proteomics Enabled by next-Generation Sequencing or
Mass Spectrometry.” Nature Methods, March.</p>
<p>[3] Vanderaa C, and Gatto L. 2021. “Replication of Single-Cell
Proteomics Data Reveals Important Computational Challenges.”
Expert Review of Proteomics, October, 1–9.</p>
<p>[4] Vanderaa C, and Gatto L. 2023. “The Current State of Single-Cell
Proteomics Data Analysis.” Current Protocols 3 (1): e658.</p>
<p>[5] Thiel M, Féraud B, Govaerts B. 2017. “ASCA+ and APCA+: Extensions
of ASCA and APCA in the Analysis of Unbalanced Multifactorial
Designs.” Journal of Chemometrics 31 (6): e2895.</p>
<p>[6] Vanderaa C, and Gatto L. scpdata: Single-Cell Proteomics Data
Package. R package version 1.6.0,
<a href="https://bioconductor.org/packages/release/data/experiment/html/scpdata.html">https://bioconductor.org/packages/release/data/experiment/html/scpdata.html</a>.</p>
<p>[7] Vanderaa C, and Gatto L. scp: Mass Spectrometry-Based Single-Cell
Proteomics Data Analysis. R package version 1.8.0,
<a href="https://bioconductor.org/packages/release/bioc/html/scp.html">https://bioconductor.org/packages/release/bioc/html/scp.html</a>.</p>
<h2 id="workshop-spectra----an-expandable-infrastructure-to-handle-mass-spectrometry-data">Workshop: <code class="language-plaintext highlighter-rouge">Spectra</code> - an expandable infrastructure to handle mass spectrometry data</h2>
<p>Johannes Rainer, Sebastian Gibb, Laurent Gatto</p>
<p>Mass spectrometry (MS) is a key technology in modern metabolomics and
proteomics experiments. Continuous improvements in MS instrumentation, larger
experiments and new technological developments lead to ever growing data sizes
and an increased number of available variables, making <em>standard</em> in-memory data
handling and processing difficult.</p>
<p>The <code class="language-plaintext highlighter-rouge">Spectra</code> package provides a modern infrastructure for MS data handling
specifically designed to enable extension to additional data resources or
alternative data representations. These can be realized by extending the virtual
<code class="language-plaintext highlighter-rouge">MsBackend</code> class and its related methods. Implementations of such <code class="language-plaintext highlighter-rouge">MsBackend</code>
classes can be tailored for specific needs, such as low memory footprint, fast
processing, remote data access, or also support for specific additional data
types or variables. Importantly, data processing of <code class="language-plaintext highlighter-rouge">Spectra</code> objects is
independent of the backend in use due to a <em>lazy evaluation</em> mechanism that
caches data manipulations internally.</p>
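<p>The principle of backend-independent lazy evaluation can be sketched
in a few lines. The example below is a conceptual illustration only: it
is written in Python rather than R, and every class and function name
in it (<code class="language-plaintext highlighter-rouge">InMemoryBackend</code>, <code class="language-plaintext highlighter-rouge">LazySpectra</code>, <code class="language-plaintext highlighter-rouge">add_processing</code>, ...) is
hypothetical and not part of the <code class="language-plaintext highlighter-rouge">Spectra</code> API. Processing steps are
recorded in a queue and only applied when peak data are read, so the
same queue works with any backend that can return peaks.</p>

```python
# Sketch of backend-independent lazy evaluation; all names are hypothetical
# and this is not the Spectra package API.


class InMemoryBackend:
    """Toy backend holding peak lists as (mz, intensity) tuples."""

    def __init__(self, spectra):
        self._spectra = spectra

    def peaks(self, index):
        # Return a copy so that processing never alters the stored data.
        return list(self._spectra[index])


class LazySpectra:
    """Queues processing steps and applies them only when peaks are read."""

    def __init__(self, backend, queue=()):
        self.backend = backend
        self.queue = tuple(queue)

    def add_processing(self, step):
        # Nothing is computed here: the step is only recorded.
        return LazySpectra(self.backend, self.queue + (step,))

    def peaks(self, index):
        # The backend only supplies raw peaks; the queue is applied on access.
        data = self.backend.peaks(index)
        for step in self.queue:
            data = step(data)
        return data


def keep_intense(threshold):
    """Return a step that keeps peaks with intensity at or above threshold."""
    return lambda peaks: [p for p in peaks if p[1] >= threshold]


def scale_intensity(factor):
    """Return a step that multiplies every intensity by factor."""
    return lambda peaks: [(mz, i * factor) for mz, i in peaks]


backend = InMemoryBackend([[(100.1, 5.0), (200.2, 50.0), (300.3, 500.0)]])
raw = LazySpectra(backend)
processed = raw.add_processing(keep_intense(10.0)).add_processing(
    scale_intensity(0.5)
)
```

<p>In this sketch, swapping <code class="language-plaintext highlighter-rouge">InMemoryBackend</code> for a class that reads peaks
from disk or from a remote resource would leave the queued processing
untouched, and the original data are never modified.</p>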
<p>This workshop discusses different available data representations for MS data
along with their properties, advantages and performances. In addition,
<code class="language-plaintext highlighter-rouge">Spectra</code>’s concept of lazy evaluation for data manipulations is presented, as
well as a simple caching mechanism for data modifications. Finally, it explains
how new <code class="language-plaintext highlighter-rouge">MsBackend</code> instances can be implemented and tested to ensure
compliance.</p>
<h2 id="workshop-the-r-for-mass-spectrometry-initiative---from-raw-data-to-identifications-and-quantitative-proteomics-data-analysis">Workshop: The R for Mass Spectrometry initiative - from raw data to identifications and quantitative proteomics data analysis</h2>
<p>Laurent Gatto, Sebastian Gibb and Johannes Rainer</p>
<p>The aim of the RforMassSpectrometry initiative
(https://www.rformassspectrometry.org/) is to provide efficient,
thoroughly documented, tested and flexible R software for the analysis
and interpretation of high throughput mass spectrometry assays. In
this software demo, we will demonstrate three software packages that
are central for proteomics data analysis.</p>
<ul>
<li>
<p>The Spectra package [1] defines an efficient infrastructure
for storing and handling mass spectrometry spectra, and provides functionality
to subset, process, visualise and compare spectra data.</p>
</li>
<li>
<p>The PSMatch package [2] allows users to load, process and analyse
peptide-spectrum matches, and can, among other things, explore and
deconvolute the peptide-protein (group) relations using adjacency
matrices and connected components.</p>
</li>
<li>
<p>The QFeatures package [3] provides infrastructure to manage and
process quantitative features for high-throughput mass spectrometry
assays, in particular across assay levels (such as precursors,
peptide spectrum matches, peptides and proteins or protein groups),
in a coherent and tractable format.</p>
</li>
</ul>
<p>We will conclude by illustrating how the MsExperiment package [4] can
be used to bundle these three types of data together.</p>
<p>[1] Gatto L et al. (2023). <em>Spectra Infrastructure for Mass
Spectrometry Data</em>, R package version
1.9.15. <a href="https://rformassspectrometry.github.io/Spectra/">https://rformassspectrometry.github.io/Spectra/</a>.</p>
<p>[2] Gatto L, Rainer J, Gibb S (2023). <em>PSMatch: Handling and Managing
Peptide Spectrum Matches</em>. R package version 1.3.3,
<a href="https://github.com/RforMassSpectrometry/PSM">https://github.com/RforMassSpectrometry/PSM</a>.</p>
<p>[3] Gatto L, Vanderaa C (2023). <em>QFeatures: Quantitative features for
mass spectrometry data</em>. R package version 1.9.3,
<a href="https://github.com/RforMassSpectrometry/QFeatures">https://github.com/RforMassSpectrometry/QFeatures</a>.</p>
<p>[4] Gatto L, Rainer J, Gibb S (2022). <em>MsExperiment: Infrastructure
for Mass Spectrometry Experiments</em>. R package version 1.0.0,
<a href="https://github.com/RforMassSpectrometry/MsExperiment">https://github.com/RforMassSpectrometry/MsExperiment</a>.</p>
<h2 id="poster-a-mixed-cell-control-design-to-assess-data-processing-in-single-cell-proteomics">Poster: A mixed-cell control design to assess data processing in single-cell proteomics</h2>
<p>Samuel Grégoire, Sébastien Pyr dit Ruys, Christophe Vanderaa, Didier Vertommen and Laurent Gatto</p>
<p>Single-cell proteomics (SCP) aims at studying cellular heterogeneity
by focusing on the functional effectors of the cells, proteins. While
this is essential to identify cells undergoing subtle processes and
point out underlying relevant protein and proteoform abundance
patterns, assessing protein content inside a single cell is
challenging.</p>
<p>Thanks to recent breakthroughs in mass spectrometry and sample
processing, it has become possible to increase the depth of proteome
covered, reduce the time needed to analyse a cell and make this
technology more accessible [1].</p>
<p>However, extracting meaningful biological information from this type
of data requires robust and suitable data analysis methods. Progress
in this field is tempered by the lack of standardised
workflows. Currently, data analysis workflows are custom made and
substantially different from one research team to another
[2]. Moreover, it is difficult to evaluate specific steps or entire
pipelines as ground truths are missing. In an effort to bridge the gap
towards the standardisation of SCP data analysis, our team has
developed the scp package [3] relying on the QFeatures and
SingleCellExperiment infrastructures to provide a standardised
framework for SCP data analysis. In addition, we produced our own SCP
datasets to constitute a basis for data analysis benchmarking. To this
end, we used a design containing cell lines mixed in known proportions
to generate controlled variability [4].</p>
<p>In this work, we used the scp package to test different combinations
of data processing steps and evaluated them using our ground truth
data. We illustrate how we benefited from this modular, standardised
framework and highlight some crucial steps.</p>
<p>[1] Slavov, Nikolai. Scaling Up Single-Cell Proteomics. Molecular &
Cellular Proteomics 21, no 1 (2022): 100179.
https://doi.org/10.1016/j.mcpro.2021.100179.</p>
<p>[2] Vanderaa, Christophe, and Laurent Gatto. 2023. The Current State
of Single-Cell Proteomics Data Analysis. Current Protocols 3 (1):
e658. https://doi.org/10.1002/cpz1.658</p>
<p>[3] Vanderaa Christophe and Laurent Gatto. Replication of Single-Cell
Proteomics Data Reveals Important Computational Challenges.
Expert Review of Proteomics, 1–9
(2021). https://doi.org/10.1080/14789450.2021.1988571</p>
<p>[4] Tian, L., Dong, X., Freytag, S. et al. Benchmarking single cell
RNA-sequencing analysis pipelines using mixture control
experiments. Nat Methods 16, 479–487
(2019). https://doi.org/10.1038/s41592-019-0425</p>Laurent GattoLong overdue now, but here are the CBIO lab’s contributions to the EuroBioc2023 conference that was organised in Ghent, on 20 - 22 September 2023.One-minute introduction2023-07-22T00:00:00+02:002023-07-22T00:00:00+02:00https://lgatto.github.io/one-minute-intro<p>Every now and then, there’s a situation where I need to briefly
introduce myself, academically. This can be with or without any visual
support (i.e. a slide). Instead of doing so semi-randomly, I thought I
would prepare such a one-minute introduction, once and for all. So here
I go, with a nice illustrative <a href="https://docs.google.com/presentation/d/1R3W63-TdOJcjSs8p_QnTQ7iC7IsJnuwOe1EBb7thfTE/edit?usp=sharing">heck-sticker
slide</a>
and plenty of links.</p>
<p><img src="/images/one-minute-slide.png" alt="One-minute intro slide" /></p>
<ul>
<li>
<p>I am <a href="http://lgatto.github.io/about">Laurent Gatto</a>, professor of
bioinformatics at the <a href="https://uclouvain.be/">UCLouvain</a>. I teach
(most of my courses) at the <a href="https://uclouvain.be/fr/facultes/fasb">Faculty of pharmacy and biomedical
sciences</a> and run a
computational biology lab at the <a href="https://www.deduveinstitute.be/">de Duve
institute</a>.</p>
</li>
<li>
<p>The <a href="https://lgatto.github.io/cbio-lab/">lab</a>’s research focuses on
developing statistical and machine learning methods to process,
explore and comprehend high-dimensional biological data, such as those
typically produced by omics technologies. Working on the university
biomedical campus, we deploy our work on clinically and biomedically
relevant research projects, in collaboration with other laboratories
on the campus.</p>
</li>
<li>
<p>I am committed to the open, transparent and rigorous practice of
scientific enquiry. In particular, we make every possible effort to
make our <a href="http://lgatto.github.io/rr-what-should-be-our-goals/">research repeatable, reproducible and
replicable</a>.</p>
</li>
<li>
<p>The development and publication of scientific software (see
<a href="https://github.com/UCLouvain-CBIO/">here</a>,
<a href="https://github.com/lgatto">here</a> and
<a href="https://github.com/RforMassSpectrometry/">here</a>) is an integral
part of my work and is reflected by my contributions to the
<a href="http://www.bioconductor.org/">Bioconductor</a> project. Some specific
examples include spatial proteomics data analysis with
<a href="https://lgatto.github.io/pRoloc/">pRoloc</a>, single-cell proteomics
with <a href="https://uclouvain-cbio.github.io/scp/">scp</a> and mass
spectrometry data processing with the <a href="https://www.rformassspectrometry.org/">R for Mass
Spectrometry</a> packages.</p>
</li>
<li>
<p>I also serve on the Bioconductor <a href="https://bioconductor.org/about/technical-advisory-board/">technical advisory
board</a>,
<a href="https://bioconductor.org/about/european-bioconductor-society/">European Bioconductor
Society</a>,
<a href="https://bioconductor.org/help/education-training/">education and teaching
committee</a> and
the <a href="https://bioconductor.org/about/code-of-conduct/">Code of conduct
committee</a>, and
co-organise the yearly European <a href="https://eurobioc2023.bioconductor.org/">EuroBioc
conference</a>.</p>
</li>
</ul>Laurent GattoEvery now and then, there’s a situation where I need to briefly introduce myself, academically. This can be with or without any visual support (i.e. a slide). Instead of doing so semi-randomly, I thought I would prepare such a one-minute introduction, once and for all. So here I go, with a nice illustrative heck-sticker slide and plenty of links.Open position in the CBIO lab: single-cell proteomics2023-02-27T00:00:00+01:002023-02-27T00:00:00+01:00https://lgatto.github.io/scp-job-2023<p>A fully-funded position for a PhD student (4 years) or a post-doctoral
researcher (2 - 3 years, depending on experience) is open in the
Computational Biology and Bioinformatics lab (CBIO) at the de Duve
Institute, UCLouvain in Brussels, Belgium.</p>
<p>The <a href="https://lgatto.github.io/cbio-lab">CBIO lab</a> has developed a
leading expertise in mass spectrometry-based <a href="https://paperpile.com/shared/paTK2y">single-cell proteomics
data processing, analysis and
interpretation</a> and is looking
for a researcher to contribute to this research theme. The position
will focus on computational and statistical approaches, including
research software development, integration with other omics
modalities, and could, depending on the candidate’s interests and
experience, also include a small wet lab component.</p>
<p>The successful candidate will have a degree in bioinformatics,
statistics, computer sciences, biomedical sciences, bio-engineering,
or equivalent and be able to demonstrate experience and/or keen
interest in one or several of the following:</p>
<ul>
<li>experience in one or multiple omics experimental technologies, data
processing, analysis and/or interpretation - experience in mass
spectrometry and proteomics is an advantage;</li>
<li>a keen interest in understanding and tackling biomedically relevant
questions;</li>
<li>interest in robust method development;</li>
<li>background/expertise in statistics, machine learning and/or
artificial intelligence;</li>
<li>open and reproducible research (e.g. Rmarkdown, Jupyter notebooks,
Github, version control, …);</li>
<li>research software development (e.g. unit testing, version control,
continuous integration, …);</li>
<li>experience in R/Bioconductor data structures, in particular those
used for omics data analysis (e.g. SummarizedExperiment,
SingleCellExperiment, QFeatures, …);</li>
<li>contribution to an open and inclusive research environment;</li>
<li>good written and oral communication skills.</li>
</ul>
<p>The project is funded by the Fonds National de la Recherche
Scientifique (FNRS). The position is open immediately until filled.</p>
<h2 id="about-the-cbio-lab">About the CBIO lab</h2>
<p>The Computational Biology and Bioinformatics lab is headed by Prof
<a href="https://lgatto.github.io/about">Laurent Gatto</a> and is composed of
students and researchers with expertise in biomedical sciences,
statistics, omics data analysis and bioinformatics. Details about our
work and the members can be found at
<a href="https://lgatto.github.io/cbio-lab/">https://lgatto.github.io/cbio-lab/</a>.</p>
<p>The lab supports a friendly and supportive work environment through
flexible working hours and the possibility to work remotely. The lab
meetings are typically scheduled in-person (or mixed remote/in-person)
to favour interactions among lab members. By joining the lab and
contributing to our research, you will also have the opportunity to get
involved in the international Bioconductor community. The lab members
have opportunities to join courses and conferences in Belgium and
abroad to present their work.</p>
<h2 id="academic-environment">Academic environment</h2>
<p>The <a href="https://www.deduveinstitute.be/">de Duve Institute</a> is a
multidisciplinary biomedical research institute (250-300 scientists)
hosting several laboratories of the UCLouvain, as well as the Brussels
branch of the Ludwig Institute for Cancer Research. The focus is on
basic research in the fields of tumour immunology and signal
transduction in cancer, genetics and development, including human
genetics, stem cells and organ development, infection and inflammation
and metabolism and hormones. The de Duve Institute also features
several core facilities, including imaging, transgenesis, mass
spectrometry, flow cytometry and cell sorting, as well as
genomics. The de Duve Institute provides access to a high-performance
computing cluster and storage, managed by a dedicated IT team.</p>
<h2 id="application">Application</h2>
<p>To apply for the position, please send the following documents to
Laurent Gatto:</p>
<ul>
<li>A cover letter describing why you would like to join the lab and how
you match the requirements;</li>
<li>A detailed CV including, depending on seniority, some or all of the
following: a list of publications, pre-prints, posters and/or
software contributions; a link to publicly available code/data
analysis you have contributed to; education and professional
experience; current and past positions; awards; obtained funding;
teaching experience; any additional information you deem relevant.</li>
<li>If no code is publicly available, please include code chunks that
illustrate best your programming experience and interests.</li>
<li>Among your publications/software/projects, select up to 3 and
provide a short narrative describing why these are important in your
career, your specific contributions and/or the unique skills you
have gained through these.</li>
<li>A list of at least 2 (for a PhD application) or 3 (for a
post-doctoral researcher application) academic references.</li>
</ul>
<p>Expectations are different for PhD and PDRA applications and will be
assessed accordingly.</p>
<h2 id="salary">Salary</h2>
<p>A PhD-student salary is around 2000 euros in Belgium. For other positions,
such as research assistants (which assume an MS degree) or post-doctoral
associates (which assume a PhD degree), the gross salary will depend on
seniority (years since acquiring the degree) and the net salary will
depend on family circumstances.</p>
<p>For more details about this position, feel free to contact Laurent
Gatto at <code class="language-plaintext highlighter-rouge">laurent<DOT>gatto<AT>uclouvain.be</code>.</p>
<p>You might also be interested in <a href="https://lgatto.github.io/spatprot-job-2023/">this
position</a>.</p>
<p>Please do share this position with your students and colleagues
(<a href="/images/2023-job-scp.pdf">pdf</a>).</p>Laurent GattoA fully-funded position for a PhD student (4 years) or a post-doctoral researcher (2 - 3 years, depending on experience) is open in the Computational Biology and Bioinformatics lab (CBIO) at the de Duve Institute, UCLouvain in Brussels, Belgium.Open position in the CBIO lab: spatial proteomics and PTMs2023-02-27T00:00:00+01:002023-02-27T00:00:00+01:00https://lgatto.github.io/spatprot-job-2023<p>A position for a PhD student (4 years) or a research assistant (1 year
contracts, renewable) is open in the CBIO lab focusing on protein
sub-cellular localisation (spatial) proteomics, post-translational
modification and machine learning.</p>
<p>The goal of the project is to apply and extend existing spatial
proteomics methods [<a href="https://pubmed.ncbi.nlm.nih.gov/24413670/">1</a>,
<a href="https://journals.plos.org/ploscompbiol/article?id=10.1371/journal.pcbi.1006516">2</a>,
<a href="https://www.nature.com/articles/s41467-022-33570-9">3</a>] and software
infrastructure (such as the
<a href="https://bioconductor.org/packages/pRoloc">pRoloc</a> package) to focus
more specifically on the detection and the effect of
post-translational modifications in protein and proteoform
sub-cellular localisation.</p>
<p>This position is part of the <a href="https://www.fwo.be/en/">FWO</a>-funded
Protein Contours project, in collaboration with <a href="https://we.vub.ac.be/en/wim-vranken">Wim
Vranken</a> from the VUB and
<a href="https://www.compomics.com/people/lennart-martens/">Lennart Martens</a>
from the VIB and Ghent University.</p>
<p>The successful candidate will have a degree in bioinformatics,
statistics, computer sciences, biomedical sciences, or equivalent and
be able to demonstrate experience and/or keen interest in
several of the following:</p>
<ul>
<li>experience in one or multiple omics experimental technologies, data
processing, analysis and/or interpretation - experience in mass
spectrometry and proteomics is an advantage;</li>
<li>a keen interest in understanding and tackling biomedically relevant
questions;</li>
<li>interest in robust method development;</li>
<li>background/expertise in statistics, machine learning and/or
artificial intelligence;</li>
<li>open and reproducible research (e.g. Rmarkdown, Jupyter notebooks,
Github, version control, …);</li>
<li>experience in R/Bioconductor data structures, in particular those
used for omics data analysis (e.g. SummarizedExperiment,
SingleCellExperiment, QFeatures, …);</li>
<li>contribution to an open and inclusive research environment;</li>
<li>good written and oral communication skills.</li>
</ul>
<h2 id="about-the-cbio-lab">About the CBIO lab</h2>
<p>The Computational Biology and Bioinformatics lab is headed by Prof
<a href="https://lgatto.github.io/about">Laurent Gatto</a> and is composed of
students and researchers with expertise in biomedical sciences,
statistics, omics data analysis and bioinformatics. Details about our
work and the members can be found at
<a href="https://lgatto.github.io/cbio-lab/">https://lgatto.github.io/cbio-lab/</a>.</p>
<p>The lab fosters a friendly and supportive work environment through
flexible working hours and the possibility to work remotely. The lab
meetings are typically scheduled in-person (or mixed remote/in-person)
to favour interactions among lab members. By joining the lab and
contributing to our research, you will also have the opportunity to get
involved in the international Bioconductor community. The lab members
have opportunities to join courses and conferences in Belgium and
abroad to present their work.</p>
<h2 id="academic-environment">Academic environment</h2>
<p>The <a href="https://www.deduveinstitute.be/">de Duve Institute</a> is a
multidisciplinary biomedical research institute (250-300 scientists)
hosting several laboratories of the UCLouvain, as well as the Brussels
branch of the Ludwig Institute for Cancer Research. The focus is on
basic research in the fields of tumour immunology and signal
transduction in cancer, genetics and development, including human
genetics, stem cells and organ development, infection and inflammation
and metabolism and hormones. The de Duve Institute also features
several core facilities, including imaging, transgenesis, mass
spectrometry, flow cytometry and cell sorting, as well as
genomics. The de Duve Institute provides access to a high-performance
computing cluster and storage, managed by a dedicated IT team.</p>
<h2 id="application">Application</h2>
<p>To apply for the position, please send the following documents to
Laurent Gatto:</p>
<ul>
<li>A cover letter describing why you would like to join the lab and how
you match the requirements;</li>
<li>A detailed CV including some or all of the following: a list of
publications, pre-prints, posters and/or software contributions; a
link to publicly available code/data analysis you have contributed
to; education and professional experience; awards; teaching
experience; any additional information you deem relevant.</li>
<li>If no code is publicly available, please include code chunks that
illustrate best your programming experience and interests.</li>
<li>Among your projects, select one and provide a short narrative
describing why it is important in your career, your specific
contributions and/or the unique skills you have gained through
it.</li>
<li>A list of at least 2 academic references.</li>
</ul>
<h2 id="salary">Salary</h2>
<p>A PhD-student salary is around 2000 euros in Belgium. For other positions,
such as research assistants (assumes a MS degree) or post-doctoral
associates (assumes a PhD degree), the gross salary will depend on
seniority (years since acquiring the degree) and the net salary will
depend on family circumstances.</p>
<p>For more details about this position, feel free to contact Laurent
Gatto at <code class="language-plaintext highlighter-rouge">laurent<DOT>gatto<AT>uclouvain.be</code>.</p>
<p>You might also be interested in <a href="https://lgatto.github.io/scp-job-2023/">this
position</a>.</p>
<p>Please do share this position with your students and colleagues
(<a href="/images/2023-job-spatprot.pdf">pdf</a>).</p>Laurent GattoA position for a PhD student (4 years) or a research assistant (1 year contracts, renewable) is open in the CBIO lab focusing on protein sub-cellular localisation (spatial) proteomics, post-translational modification and machine learning.Installing and running MaxQuant on GNU/Linux2022-12-16T00:00:00+01:002022-12-16T00:00:00+01:00https://lgatto.github.io/maxquant-linux<p>After a first analysis of 20 plasma samples, we wanted to compare the
results of a single acquisition against a fractionation of the same
samples over 7 fractions, tallying a total of 140 raw files. This
required a more powerful computer… running GNU/Linux.</p>
<h2 id="installation">Installation</h2>
<p>My first reference was the following
<a href="https://www.youtube.com/watch?v=KHdvO1M85VM">video</a> by Pavel
Sinitcyn, a post-doc in the MaxQuant team. There’s no actual
installation of MaxQuant - the binaries simply need to be
<a href="http://www.coxdocs.org/doku.php?id=maxquant:common:download_and_installation#download_and_installation_guide">downloaded</a>
and unzipped. The challenge is however to install the infrastructure
to run MaxQuant’s C# code on Linux.</p>
<p>The instructions in the video suggested installing <code class="language-plaintext highlighter-rouge">dotnet</code>, as
documented
<a href="https://learn.microsoft.com/en-us/dotnet/core/install/linux-ubuntu#">here</a>. This
didn’t work under Ubuntu 22.04 LTS.</p>
<p>I am not sure of the reason… I had to install <code class="language-plaintext highlighter-rouge">dotnet</code> 3.1 (even
though 2.1 was mentioned in the video above), an older version than
the one available by default on Ubuntu 22.04 LTS. <code class="language-plaintext highlighter-rouge">dotnet</code> version 3.1
itself depended on an older version of <code class="language-plaintext highlighter-rouge">libssl</code>, namely <code class="language-plaintext highlighter-rouge">libssl1</code>, that
wasn’t available on Ubuntu (again, <code class="language-plaintext highlighter-rouge">libssl3</code> was the default
version). I had to download and compile <code class="language-plaintext highlighter-rouge">libssl1</code> by hand, which seemed
to be successful, but ended up with a crashing MaxQuant.</p>
<p>Next, I followed <a href="https://bioinformatics.stackexchange.com/a/13901">these
instructions</a>, which
suggested using <code class="language-plaintext highlighter-rouge">mono</code>, installed via <code class="language-plaintext highlighter-rouge">conda</code>. This proved to
be successful. I will be using that setup in the following sections.</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code># create the environment
conda create -n maxquant -c conda-forge mono
# activate the environment
conda activate maxquant
# run any maxquant version
mono /path/to/maxquant/bin/MaxQuantCmd.exe mqpar.xml
</code></pre></div></div>
<p>It is worth noting that there’s also a <a href="https://github.com/nickdelgrosso/DockerizeMaxQuant">MaxQuant docker
container</a>, but
for an older version of MQ, 1.6.5.0, while the current one (at the
time of writing) is 2.2.0.0.</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>docker run -it nickdg/maxquant
</code></pre></div></div>
<p>I couldn’t find an up-to-date MaxQuant container elsewhere.</p>
<h2 id="running-and-debugging">Running and debugging</h2>
<p>Once installed, running MaxQuant is only a matter of passing it a
parameter file.</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>mono /path/to/maxquant/bin/MaxQuantCmd.exe mqpar.xml
</code></pre></div></div>
<p>That parameter file can be generated using the GUI on Windows, and adapted
for Linux by changing the paths to the data and fasta file folders
using the <code class="language-plaintext highlighter-rouge">--changeFolder</code> argument. This is described in detail in
the instruction video above.</p>
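<p>Before launching a long run, it can be worth checking that no Windows
path survived in the adapted parameter file. Here is a minimal sketch,
assuming that the file generated on Windows stores absolute paths starting
with a drive letter (such as <code class="language-plaintext highlighter-rouge">C:\</code>); the function name is
hypothetical and it only reads the file.</p>

```shell
# List the lines (with line numbers) of a parameter file that still
# contain a Windows-style path, i.e. a drive letter followed by a colon
# and a backslash. Returns a non-zero status when nothing is found.
# Usage: find_windows_paths FILE
find_windows_paths() {
    grep -nE '[A-Za-z]:\\' "$1"
}

# Hypothetical usage:
# find_windows_paths mqpar.xml || echo "No Windows-style paths left"
```

<p>An empty result suggests that the raw and fasta file locations were all
rewritten; any line that is printed points at a path that still needs
changing.</p>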
<h2 id="debugging">Debugging</h2>
<p>The run crashed after just under a week’s continuous run. I had started
the job remotely and lost the output, so I didn’t have any error
messages to try to identify the cause. I knew that the first MS/MS
search was underway and thus resumed the run at that step/job, using
the <code class="language-plaintext highlighter-rouge">-p</code> argument. To find that job index, one can perform a dry run
with the <code class="language-plaintext highlighter-rouge">--dryrun</code> argument. This will list all the steps as defined
in the parameter file and their respective ids.</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>mono /path/to/maxquant/bin/MaxQuantCmd.exe --dryrun mqpar.xml
</code></pre></div></div>
<p>Take note of the index <code class="language-plaintext highlighter-rouge">N</code> of the step you want to resume your run with
and run it with:</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>mono /path/to/maxquant/bin/MaxQuantCmd.exe -p N mqpar.xml
</code></pre></div></div>
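<p>To avoid losing the output of a job started over a remote session, one
option is to detach it from the terminal and send everything it prints to
a log file. The following is a sketch rather than what I actually ran: the
<code class="language-plaintext highlighter-rouge">run_logged</code> helper and the log file name are hypothetical, and it
relies only on the standard <code class="language-plaintext highlighter-rouge">nohup</code> utility.</p>

```shell
# Run a command in the background, immune to hang-ups, appending both
# standard output and standard error to a log file. Prints the id of
# the background process.
# Usage: run_logged LOGFILE COMMAND [ARGS...]
run_logged() {
    logfile=$1
    shift
    nohup "$@" >> "$logfile" 2>&1 &
    echo $!
}

# Hypothetical usage (paths are placeholders, as elsewhere in this post):
# run_logged maxquant.log mono /path/to/maxquant/bin/MaxQuantCmd.exe -p N mqpar.xml
# tail -f maxquant.log
```

<p>A terminal multiplexer such as <code class="language-plaintext highlighter-rouge">tmux</code> or <code class="language-plaintext highlighter-rouge">screen</code> would
serve the same purpose; the log file has the advantage of keeping the full
error message on disk once the process has died.</p>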
<p>And, as expected, the error happened again, but this time I managed to
capture the output:</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>Reading search engine results
Preparing reverse hits
Finish search engine results
Filter identifications (MS/MS)
Calculating PEP
Unhandled Exception:
System.Exception: Exception during execution of external process: 1302419 Error: Garbage collector could not allocate 16384u bytes of memory for major heap section.
at QueueingSystem.WorkDispatcher.ProcessSingleRunExternalProcess (System.Int32 taskIndex, System.Int32 threadIndex) [0x0009d] in <48f64397b7dc4fdd807bfd54c44c2941>:0
at QueueingSystem.WorkDispatcher.DoWork (System.Int32 taskIndex, System.Int32 threadIndex) [0x0001e] in <48f64397b7dc4fdd807bfd54c44c2941>:0
at QueueingSystem.WorkDispatcher.Work (System.Object threadIndex) [0x00054] in <48f64397b7dc4fdd807bfd54c44c2941>:0
at System.Threading.ThreadHelper.ThreadStart_Context (System.Object state) [0x00025] in <aa5dff9b31c64fce86559bbbf6cd364f>:0
at System.Threading.ExecutionContext.RunInternal (System.Threading.ExecutionContext executionContext, System.Threading.ContextCallback callback, System.Object state, System.Boolean preserveSyncCtx) [0x00071] in <aa5dff9b31c64fce86559bbbf6cd364f>:0
at System.Threading.ExecutionContext.Run (System.Threading.ExecutionContext executionContext, System.Threading.ContextCallback callback, System.Object state, System.Boolean preserveSyncCtx) [0x00000] in <aa5dff9b31c64fce86559bbbf6cd364f>:0
at System.Threading.ExecutionContext.Run (System.Threading.ExecutionContext executionContext, System.Threading.ContextCallback callback, System.Object state) [0x0002b] in <aa5dff9b31c64fce86559bbbf6cd364f>:0
at System.Threading.ThreadHelper.ThreadStart (System.Object obj) [0x0000f] in <aa5dff9b31c64fce86559bbbf6cd364f>:0
[ERROR] FATAL UNHANDLED EXCEPTION: System.Exception: Exception during execution of external process: 1302419 Error: Garbage collector could not allocate 16384u bytes of memory for major heap section.
at QueueingSystem.WorkDispatcher.ProcessSingleRunExternalProcess (System.Int32 taskIndex, System.Int32 threadIndex) [0x0009d] in <48f64397b7dc4fdd807bfd54c44c2941>:0
at QueueingSystem.WorkDispatcher.DoWork (System.Int32 taskIndex, System.Int32 threadIndex) [0x0001e] in <48f64397b7dc4fdd807bfd54c44c2941>:0
at QueueingSystem.WorkDispatcher.Work (System.Object threadIndex) [0x00054] in <48f64397b7dc4fdd807bfd54c44c2941>:0
at System.Threading.ThreadHelper.ThreadStart_Context (System.Object state) [0x00025] in <aa5dff9b31c64fce86559bbbf6cd364f>:0
at System.Threading.ExecutionContext.RunInternal (System.Threading.ExecutionContext executionContext, System.Threading.ContextCallback callback, System.Object state, System.Boolean preserveSyncCtx) [0x00071] in <aa5dff9b31c64fce86559bbbf6cd364f>:0
at System.Threading.ExecutionContext.Run (System.Threading.ExecutionContext executionContext, System.Threading.ContextCallback callback, System.Object state, System.Boolean preserveSyncCtx) [0x00000] in <aa5dff9b31c64fce86559bbbf6cd364f>:0
at System.Threading.ExecutionContext.Run (System.Threading.ExecutionContext executionContext, System.Threading.ContextCallback callback, System.Object state) [0x0002b] in <aa5dff9b31c64fce86559bbbf6cd364f>:0
at System.Threading.ThreadHelper.ThreadStart (System.Object obj) [0x0000f] in <aa5dff9b31c64fce86559bbbf6cd364f>:0
</code></pre></div></div>
<p>Thanks to Kristina’s help,
<a href="https://gist.github.com/elrubio/4e7797d7d0d9add96ce82f0472f17908?permalink_comment_id=2961278">addressing</a>
the error was simply a matter of raising the kernel’s limit on the number of memory map areas a process may have with</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>sudo sysctl -w vm.max_map_count=655350
</code></pre></div></div>
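<p>Note that a value set with <code class="language-plaintext highlighter-rouge">sysctl -w</code> does not survive a reboot.
The sketch below shows how the current value could be inspected first and
how the setting could be made persistent; the helper names and the
configuration file name are hypothetical, and the target value is simply
the one used above.</p>

```shell
# Print the current limit, as exposed by the Linux kernel.
current_max_map_count() {
    cat /proc/sys/vm/max_map_count
}

# Succeed (exit status 0) if CURRENT is below TARGET, i.e. if the limit
# needs to be raised.
# Usage: needs_bump CURRENT TARGET
needs_bump() {
    [ "$1" -lt "$2" ]
}

# Hypothetical usage, with the value used in this post:
# needs_bump "$(current_max_map_count)" 655350 && sudo sysctl -w vm.max_map_count=655350
#
# To keep the setting across reboots (the file name is an arbitrary choice):
# echo 'vm.max_map_count=655350' | sudo tee /etc/sysctl.d/99-maxquant.conf
# sudo sysctl --system
```

<p>Checking the value before resuming a week-long run is cheap, and avoids
hitting the same error again after the machine has been restarted.</p>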
<p>I then resumed the search once more at step 25, corresponding to the
calculation of posterior error probabilities:</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>mono /path/to/maxquant/bin/MaxQuantCmd.exe -p 25 mqpar_linux.xml
</code></pre></div></div>
<p>and… after just over a week of total run time, success!</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>Calculating PEP
Copying identifications
Applying FDR
Assembling second peptide MS/MS
Combining second peptide files
Second peptide search
Reading search engine results (SP)
Finish search engine results (SP)
Filtering identifications (SP)
Applying FDR (SP)
Re-quantification
Reporter quantification
Retention time alignment
Matching between runs 1
Matching between runs 2
Matching between runs 3
Matching between runs 4
Prepare protein assembly
Assembling proteins
Assembling unidentified peptides
Finish protein assembly
Updating identifications
iBAQ
Label-free preparation
Label-free normalization
Label-free quantification
Label-free collect
Estimating complexity
Prepare writing tables
Writing tables
Finish writing tables
</code></pre></div></div>
<h2 id="acknowledgements">Acknowledgements</h2>
<p>Thank you very much to <a href="https://github.com/KristinaGomoryova">Kristina
Gomoryova</a> for her help in
finding the relevant information, advising during installation and
debugging, and generating the parameter file.</p>Laurent GattoAfter a first analysis of 20 plasma samples, we wanted to compare the results of a single acquisition against a fractionation of the same samples over 7 fractions, tallying a total of 140 raw files. This required a more powerful computer… running GNU/Linux.First ungrading assessment2022-12-15T00:00:00+01:002022-12-15T00:00:00+01:00https://lgatto.github.io/ungrading-assessment<p>Here’s the initial assessment of the new <em>ungrading and feedback</em>
pedagogy that the TAs and I have implemented this academic year
(2022-2023). In a nutshell, we decided to not mark any of the weekly
tests and to maximise opportunities for feedback to favour students’
self-assessment and reflection on their own work
(meta-cognition). Read the <a href="https://lgatto.github.io/ungrading/">full
post</a> to learn about the reasons
and opportunities for dropping the grading of the weekly tests and maximising
feedback with students.</p>
<!--more-->
<aside class="sidebar__right">
<nav class="toc">
<header><h4 class="nav__title"><i class="fa fa-file-text"></i> On This Page</h4></header>
<ul class="toc__menu" id="markdown-toc">
<li><a href="#part-1" id="markdown-toc-part-1">Part 1</a> <ul>
<li><a href="#third-bachelor-course" id="markdown-toc-third-bachelor-course">Third bachelor course</a></li>
<li><a href="#first-masters-course" id="markdown-toc-first-masters-course">First masters course</a></li>
<li><a href="#next-assessments" id="markdown-toc-next-assessments">Next assessments</a></li>
</ul>
</li>
<li><a href="#part-2" id="markdown-toc-part-2">Part 2</a> <ul>
<li><a href="#on-line-questionnaire" id="markdown-toc-on-line-questionnaire">On-line questionnaire</a></li>
</ul>
</li>
</ul>
</nav>
</aside>
<p>This initial assessment is split into several parts, describing
different sources of data used for the assessment. The different parts
were added at different times. Part 1 was originally added on
15 December 2022 and is based on direct, free-form informal and
semi-formal feedback. Part 2 is based on on-line evaluation forms that
students were invited to fill out and was published on 27 December
2022.</p>
<h2 id="part-1">Part 1</h2>
<h3 id="third-bachelor-course">Third bachelor course</h3>
<p>Let’s start with the third bachelor students’ feedback, collected as
part of the ‘Year committee’, where student representatives meet with
the professors and share the feedback they collected among all
students. This is a cohort that experienced the previous approach,
where each weekly test was graded and students that earned a decent
weighted average would get an exemption from the final exam<sup id="fnref:1" role="doc-noteref"><a href="#fn:1" class="footnote" rel="footnote">1</a></sup>.</p>
<ul>
<li>
<p>While some students did prefer being kept on their toes with
marking, the majority appreciate that marking was dropped, as it reduces
stress/pressure during weekly tests.</p>
</li>
<li>
<p>The general pedagogy and the feedback are well received. In
particular, students like the post-it notes for in-class
interactions and feedback, and the possibility to book one-to-one
sessions to get individual feedback.</p>
</li>
<li>
<p>This year, we also have the support of a third TA, which helps a lot
to provide prompt technical help and answer questions without delay
during class.</p>
</li>
</ul>
<p>The feedback we gathered on the post-it notes after the last lecture
and our subjective appreciation (based on the perceived motivation of
students, and the quantity and quality of their questions) is along
the same lines. Some students already mentioned on the post-it notes
that they were looking forward to next year’s (optional) Master’s
course, which is a strong argument in favour of our strategy.</p>
<h3 id="first-masters-course">First masters course</h3>
<p>This year, for the first time, students were asked to self-mark during
the final oral exam. They all provided a fair assessment of their
work, at times even slightly lower than what they really deserved (in
which case we obviously bumped the mark accordingly). I also
systematically asked what they thought they could have improved and if
more time would have helped. Interestingly, the topic of the last
project/presentations, which was a more open-ended and creative task,
was one point that came up repeatedly, with analysis approaches that
they didn’t think of but thought they should have.</p>
<p>We also considered some further updates for next year, to promote
feedback and allow students to act specifically on that feedback. For
the first report, which focuses on a <a href="https://uclouvain-cbio.github.io/WSBIM2122/sec-rnaseq.html">hands-on analysis of RNA-Seq
data</a>,
here are the steps that we plan to implement:</p>
<ol>
<li>After receiving their data, students will give a first short
presentation focusing on introducing their dataset, the
experimental design, the biological question(s) they want to focus
on, and the associated statistical model(s). This is a first
opportunity for feedback and to make sure we catch any
misdirections early on.</li>
<li>A full 4-hour session dedicated to questions and answers on their
data, the corresponding chapters, and how to prepare the report.</li>
<li>A report, written in R markdown and compiled to PDF, detailing the
analyses introduced in the presentation above (point 1).</li>
<li>We will read and annotate the reports, and provide an individual
feedback sheet including a short section with positive points, a
short section with possible improvements and a list of
questions. The questions we will ask during the end-of-term oral
exam (point 6 below) will be among those in this list, so that
students can prepare beforehand and thus address any short-comings
in their respective reports.</li>
<li>Each student will receive another student’s report to read and comment on
constructively. This will allow them to explore how others
have addressed their project and experience how to critically read
and assess another person’s work (and thus reflect on their own
contribution).</li>
<li>An oral exam to offer the students an opportunity to answer (some
of) the questions we handed them (point 4).</li>
</ol>
<p>As every year so far, I have also asked students to comment on one
aspect of the course they particularly appreciated and one that they
appreciated less, as well as to assess the amount of work they had to
invest. This is interesting as it provides them with an opportunity to
reflect on the whole course and discuss their impressions with us,
which we use to update and improve the course.</p>
<h3 id="next-assessments">Next assessments</h3>
<p>We need to consider whether the positive impact that we seem to
observe in the third bachelor’s course also translates into a better
success rate in the exam. I do have to admit that this makes me
slightly uncomfortable. What if we were to see a negative impact
(which, honestly, I doubt)? It wouldn’t necessarily mean that we did
worse, given that we have applied our new strategy on a single
cohort. This obviously also applies if we get better results - a
better or worse success rate might be the result of other confounding
factors. However, this cohort will be the only one that has experienced
the change in teaching strategy, and hence probably an ideal situation
for a direct assessment. If I’m being honest, I wouldn’t want to teach
to the test even if exam results were provably worse, for the many
reasons underlying the new <a href="https://lgatto.github.io/ungrading/">ungrading and
feedback</a> strategy.</p>
<p>Next term, we start with a brand new second bachelor cohort. This will
also be an interesting experience, as they will be immediately exposed
to this <a href="https://lgatto.github.io/ungrading/">ungrading and feedback</a>
strategy (with us having a term’s worth of experience), and will
experience and adapt<sup id="fnref:2" role="doc-noteref"><a href="#fn:2" class="footnote" rel="footnote">2</a></sup> to it over two years.</p>
<h2 id="part-2">Part 2</h2>
<h3 id="on-line-questionnaire">On-line questionnaire</h3>
<p>This second part of the assessment looks at on-line evaluation forms
that 3rd bachelor students are asked to fill out. There are two types
of forms: one for the teaching unit (i.e. the theoretical part of the
course) and a second one for the practical sessions. These evaluation
forms can be requested by the instructor or the faculty (and are
indeed asked for every three years by the latter, if I remember
correctly). I have three forms available, for two different cohorts.</p>
<ol>
<li>
<p>Evaluations for the teaching unit from 2021. That’s one that I
requested myself. Given that my courses blend theory and practice,
I only requested the teaching unit questions, asking students to
fill out the form considering both aspects. This form was completed
by 27 students (individual questions were answered by 25 to 27
students), corresponding to half of the class.</p>
</li>
<li>
<p>Evaluations for the teaching unit and practicals from 2022. These
were requested by the faculty, hence for both parts. The requests
and links to the forms were received after completion of the
course, and I was only able to inform and remind students through
forum announcements, hence a low participation rate: only 7
students (with one question getting 6 answers), corresponding to
just under 12% of the full cohort. This probably induces some bias,
where only the most committed students, and hence those more likely
to provide positive feedback, participated.</p>
</li>
</ol>
<p>Here are the scores for the 2022 teaching unit and practical
sessions. Students answer each question<sup id="fnref:3" role="doc-noteref"><a href="#fn:3" class="footnote" rel="footnote">3</a></sup> by providing a score
from 1 (<em>Don’t agree at all</em>) to 4 (<em>Totally agree</em>).</p>
<p><img src="/images/eval_scores_ue_22.png" alt="2022 evaluations for the teaching unit" />
<img src="/images/eval_scores_tp_22.png" alt="2022 evaluations for the practical sessions" /></p>
<p>These evaluations are quite good, especially when compared
to 2021. Below are the scores for the teaching unit of that year.</p>
<p><img src="/images/eval_scores_ue_21.png" alt="2021 evaluations for the teaching unit" /></p>
<p>I also computed the mean scores and standard deviations per question
to directly compare the 2021 and 2022 results. The figure below shows
the 2021 results in the top left panel and the 2022 results in the top
right and bottom left panels. The top right panel is helpful to
compare individual questions and the bottom panel to get a more
general assessment of the 2021 vs 2022 scores. The dotted red line
represents the yearly mean score.</p>
<p><img src="/images/eval_mean_ue_21_22.png" alt="Comparison of mean 2021 and 2022 scores for the teaching unit evaluations" /></p>
<p>The improvement is striking, to say the least. There are however two
points to keep in mind:</p>
<ul>
<li>In 2022, given that the evaluation request came in late, it is
possible that only the most committed students participated, hence
boosting these scores up.</li>
<li>The 2021 cohort had a hard time during the COVID lock-down. They had
to follow the second bachelor course (a prerequisite to the one
evaluated above) remotely, which was hard on them and is likely to
reduce the overall score.</li>
</ul>
<p>Even considering the above possible confounding factors, I am tempted
to take the figures from the student on-line evaluations as a strong
endorsement for the new teaching strategy. Hopefully, this will also
materialise in a likely part 3 to this post, describing the final exam
results.</p>
<div class="footnotes" role="doc-endnotes">
<ol>
<li id="fn:1" role="doc-endnote">
<p>The latter does read like a nice opportunity for students but do read the <a href="https://lgatto.github.io/ungrading/">motivation</a> for dropping this, and learn about the perverse incentives of such an approach. <a href="#fnref:1" class="reversefootnote" role="doc-backlink">↩</a></p>
</li>
<li id="fn:2" role="doc-endnote">
<p>The word <em>adapting</em> is quite important here. Students adapt to whatever the system throws at them, all too often for the worse… here, I hope for the better. What we propose is quite different from what they are used to, so it does take some adaptation. <a href="#fnref:2" class="reversefootnote" role="doc-backlink">↩</a></p>
</li>
<li id="fn:3" role="doc-endnote">
<p>The original questions are in French and were translated into English with Google Translate, with minimal editing. <a href="#fnref:3" class="reversefootnote" role="doc-backlink">↩</a></p>
</li>
</ol>
</div>Laurent GattoHere’s the initial assessment of the new ungrading and feedback pedagogy that the TAs and I have implemented this academic year (2022-2023). In a nutshell, we decided to not mark any of the weekly tests and to maximise opportunities for feedback to favour students’ self-assessment and reflection on their own work (meta-cognition). Read the full post to learn about the reasons and opportunities for dropping the grading of the weekly tests and maximising feedback with students.