The tragic death of open source research software

1. Introduction

Research software has become a central player in scientific research, to the point that it is hard to imagine scientific research without it.

But because of its nature and of how it is funded and valued, it can also be a single point of failure.

2. Setting the stage

Imagine that 6 months ago, you, a young and motivated PhD student in biomedical sciences, defined the ideal experimental design to answer an important biological question in your domain. After several months of hard work and thousands of euros of consumables, you have acquired the precious data.

You have even identified a research paper that tackles a similar question using exactly the same technology and type of data. That paper describes a data analysis method and comes with a published piece of software, both ideally suited to answer your question.

Experimental design + data + software = results

You have your data and have found the right software. Your results are within arm's reach, aren't they? What could go wrong?

3. Possible causes of death

Unfortunately, many things can stand in the way of your results:

  • The software isn't available (anymore).
  • The software is available, but it can't be installed.
  • The software can be installed, but it doesn't work.
  • The software "works", but you can't get it to run on your data.
  • The software "runs" with your data, but the results don't make any sense.

4. Software collapse

4.1. The software doesn't work

  • Software collapse (or software rot) is the fact that software eventually stops working if it is not actively maintained. Collapse can be the result of bugs, accidental changes, breaking changes in the software itself, 'natural' changes in software (and service) dependencies, or the disappearance of the software (for example of the page that hosted it, or of its availability on request)…
  • Or simply because the "software" was never meant to last beyond that one use case/paper. It should have been clearly labelled as a prototype, not as a tool that others can reuse.

4.2. Or the software works but

  • There is no example data, and it's not clear what the input should look like.
  • There is no documentation - the software works (with the example/test data or with yours), but the commands and/or output don't make any sense.
  • Even though the software (correctly?) runs, the lack or inadequacy of documentation makes it too difficult to use.

5. Making software survive longer

There are many steps one can take to minimise the risks described above. Some are technical, aimed at writing better and thus easier-to-maintain software; others are non-technical, aimed at growing a community around the software.

5.1. Administration

  • Stay within the law (legal constraints, intellectual property, authorship and copyright, funding obligations, licensing, academia vs industry, policies and regulations, …)
  • Make it widely available under an open source license: this increases usage, contributions, and visibility, and is often required for publication.

5.2. Open source development

  • Choose an open source license, publish your software (as a piece of software and as a research paper) and archive it (Zenodo, Software Heritage, …).
  • Foster a collaborative environment and a user and co-developer community around your software: code of conduct, onboarding, contribution guide, support forum, … This is particularly relevant if your software is itself part of a larger ecosystem.

5.3. Development

  • Implement modularity (to deal for instance with software collapse).
  • Implement best practice (automation, integration, testing, version control, versioning, …)
  • Don't reinvent the wheel, re-use existing and robust infrastructure. But beware of fragile dependencies.
  • Document: manuals, tutorials, example data, installation, user and developer guides, slides, videos, webpage, …
  • Ensure traceability and reproducibility when analysing data and when developing the software to do so.
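
As one possible illustration of traceability, an analysis script can store a small provenance record next to its results. The sketch below is not taken from any particular tool: the function name capture_provenance and the package names passed to it are assumptions chosen for the example, and only the Python standard library is used.

```python
import json
import platform
import sys
from datetime import datetime, timezone
from importlib import metadata


def capture_provenance(packages):
    """Describe the interpreter, platform and installed package versions.

    `packages` is a list of distribution names (hypothetical here) that
    the analysis depends on.
    """
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None  # not installed in this environment
    return {
        "timestamp_utc": datetime.now(timezone.utc).isoformat(),
        "python": sys.version.split()[0],
        "platform": platform.platform(),
        "packages": versions,
    }


def provenance_as_json(packages):
    """Serialise the provenance record so it can be saved with the results."""
    return json.dumps(capture_provenance(packages), indent=2)
```

Calling provenance_as_json(["numpy", "pandas"]) and saving the returned string alongside the output files gives a later reader (or your future self) a starting point to rebuild a comparable environment. It records versions only; it does not pin or reinstall them.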

5.4. Software life cycle

  • Think of your software's life cycle: maintenance, new features (if possible), new developers, …
  • Plan for sunsetting your software. Consider ending, pausing, or handing off.
  • Disaster planning: make a threat model considering social, financial and technical vulnerabilities.
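
On the technical side, one way to announce that a function (or a whole package) is being sunset is to warn users at call time, well before anything is removed. The following is a minimal sketch under stated assumptions: the decorator deprecated, the function normalise, its suggested replacement normalise_v2 and the version number are all hypothetical names invented for the example.

```python
import functools
import warnings


def deprecated(replacement, removal_version):
    """Decorator emitting a DeprecationWarning each time the function is called."""
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            warnings.warn(
                f"{func.__name__} is deprecated and will be removed in "
                f"version {removal_version}; use {replacement} instead.",
                DeprecationWarning,
                stacklevel=2,  # point the warning at the caller's code
            )
            return func(*args, **kwargs)
        return wrapper
    return decorator


@deprecated(replacement="normalise_v2", removal_version="2.0")
def normalise(values):
    """Scale values so that they sum to 1 (hypothetical legacy function)."""
    total = sum(values)
    return [v / total for v in values]
```

The function keeps working while the warning tells users what to migrate to and by when. Note that Python hides DeprecationWarning by default outside of code run directly in the main script and test runners, so such warnings are best paired with an announcement in the documentation or changelog.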

5.5. Community

User/developer communities:

  • Maintain your software, answer questions, accept contributions, credit contributors, … make your software findable, citable (DOI), and re-usable.
  • Announce and promote your software informally (on social media, mailing lists, forums, …) and through more formal academic publications (think of the different audiences), conference presentations, posters and workshops.

5.6. Use(rs)

  • Eat your own dog food - use the software you develop.
  • Produce software that users can install and use without root/admin privileges.
  • Make it (easy to) run on others' computers (no hard-coded paths, …)
  • Be explicit about what the code is: one-off analysis scripts, scripts supporting results, or a tool/software for wider consumption?
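
To make the point about hard-coded paths concrete, here is a minimal sketch of a command-line entry point that takes its input file and output directory as arguments instead of embedding a path that only exists on the author's machine. The function names, the --outdir option and the "_results.csv" suffix are assumptions made for the illustration.

```python
import argparse
from pathlib import Path


def parse_args(argv=None):
    """Parse command-line arguments; nothing is tied to one machine's layout."""
    parser = argparse.ArgumentParser(description="Hypothetical analysis entry point.")
    parser.add_argument("input", type=Path, help="input data file")
    parser.add_argument(
        "--outdir",
        type=Path,
        default=Path.cwd() / "results",
        help="output directory (default: ./results)",
    )
    return parser.parse_args(argv)


def resolve_output(args):
    """Return the path of the result file, derived from the input file name."""
    return args.outdir / (args.input.stem + "_results.csv")


def main(argv=None):
    args = parse_args(argv)
    if not args.input.is_file():
        raise SystemExit(f"Input file not found: {args.input}")
    # Created inside a user-chosen directory, so no admin rights are needed.
    args.outdir.mkdir(parents=True, exist_ok=True)
    print(f"Results would be written to {resolve_output(args)}")
```

A script would end with a call to main(), and a user would then run it with their own file and, optionally, their own --outdir. Defaulting to a directory under the current working directory keeps the tool usable without special privileges.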

5.7. Training

Appropriate training in data analysis, data management, and software development/usage is absolutely essential. (Some of) these should be delivered early, and ideally as part of the university curriculum to students who train in a software-heavy field (all STEM).

Here are some well-known examples:

5.8. Incentives and funding

6. Conclusions

  • It is hard to imagine scientific research without software.
  • Your results are only as good as the method and the software you use.
  • We need to produce more sustainable software. We need to support people that do so.

7. References

Author: Laurent Gatto

Created: 2025-10-20 Mon 16:16
