September data champions meeting

The Data Champions Initiative was set up to promote best practice in data management within the University's departments and institutes, and to give those who promote it explicit visibility for their efforts. This is a reflection on my role and contributions as a data champion, and a summary of related activities that I will present at the meeting.

Introduction

So I’ll start with an apology.

I’m sorry for being a bad data champion.

Why am I a bad data champion?

My personal and limited experience indicates that my department isn’t a place where good (and open) data management is worth my time and energy. Why? Because I don’t see it as a place that values that kind of skill set: open data, data management, reproducible science… Time and effort invested in my department will mostly remain unnoticed, and won’t benefit me.

Now that I have that out of the way, I can share other efforts I’m involved in that are related to the data champions effort and ethos. Here are some projects that I have been asked to share with you. They might not strike you as being in line with the data champions efforts as outlined at the beginning of the initiative. But there is a lot of work to be done on good data management in the light of more open, transparent and trustworthy science. I have spent a lot of time and effort over the last years thinking about open research. I don’t have perfect solutions, only opinions (that, IMHO, would make a positive change). And I have settled on arguing about the issues, and in the venues, that I see as the most appropriate.

The Open Research Pilot Project (ORPP)

The ORPP is a joint project by the Office of Scholarly Communication (OSC) at the University of Cambridge and the Wellcome Trust Open Research team. From the official page, the pilot project looks at:

  • the support research groups need in order to make all aspects of their research open,
  • why they want to do this,
  • how it benefits them,
  • how it improves the research process,
  • what barriers there might be that prevent the sharing of their research.

Four research groups have been selected to participate. So far, the project proceeds through meetings with all participants (every 6 months), discussions between the research groups and their recognised OSC collaborator, blog posts and occasional emails on a dedicated (private) mailing list.

I have written more about the ORPP here, where I share my first post-kick-off meeting thoughts about the project, and here, where I wrote a public grant application describing the SpatialMap project.

My opinion at the moment is that, ironically, the project itself isn’t open enough to really bring anything new to the debate (but maybe that is by design?). So far, all problems and needs that have been highlighted (bad incentives, lack of rewards for being an open researcher, need for better funding for open science, funding for maintenance, sustainability, …) have been known and discussed online extensively; the project would possibly benefit from the community’s wisdom and input. Also, so far, the project replicates the main issue that we have with investing time in promoting open research: it is an activity that is mainly driven by individual researchers’ desire to improve a research environment that doesn’t promote the best possible research, without clear benefits for the researchers themselves.

My hope is that the participants will be in a position to provide direct input to the Wellcome Trust and positively influence their impact on the promotion of open science and open scientists.

The role of peer review in promoting open science

This is based on a talk I gave at a peer review meeting organised by the OSC in March 2017. The slides are available here. There is more about peer review here, featuring a round table discussion at Cambridge University Press as part of #PeerRevWk17.

My main argument, the one most relevant to the data champions initiative, is encapsulated in this question:

Reviewing data/software/methods: is this asking too much from reviewers?

The above implies that the review can be very quick

If no data/software are available, there can’t be any review.

Tips

(See slides for details)

  • availability: data/software availability and a reasonable description thereof
  • meta-data
  • do numbers match?: Experimental design: 2 groups, 3 replicates, but the number of data files or columns is not a multiple of 6! Also look at the file names; is there a consistent naming convention?
  • what data, what format: Is it readable with standard and open/free software? Raw, processed, how to get from one to the other? Summary table?
  • license: If you share anything, make sure users are allowed to re-use it, and are aware of the terms under which they can re-use it. Examples: CC-BY, CC0, GPL, BSD, …
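To make the "do numbers match?" tip concrete, here is a minimal Python sketch of the kind of check a reviewer (or, better, an author) could run. It is only an illustration under stated assumptions: the function names, the file names and the naming pattern are all hypothetical and do not come from the slides or from any particular dataset.

```python
import re


def matches_design(n_items, n_groups, n_replicates):
    """Return True if the number of data files (or columns) is a
    non-zero multiple of groups x replicates."""
    expected = n_groups * n_replicates
    return n_items > 0 and n_items % expected == 0


def inconsistent_names(names, pattern):
    """Return the names that do not follow the given naming convention."""
    regex = re.compile(pattern)
    return [name for name in names if not regex.fullmatch(name)]


# Hypothetical file listing for a design with 2 groups and 3 replicates.
files = [
    "ctrl_rep1.csv", "ctrl_rep2.csv", "ctrl_rep3.csv",
    "treat_rep1.csv", "treat_rep2.csv", "Treatment-3.csv",
]

# Assumed convention: <group>_rep<number>.csv
convention = r"(ctrl|treat)_rep\d+\.csv"

print("Count matches design:", matches_design(len(files), 2, 3))
print("Names breaking the convention:", inconsistent_names(files, convention))
```

Neither check says anything about the quality of the data themselves; they only flag the obvious mismatches that are worth a question to the authors.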

Quick survey

  • Is this asking too much?
  • Who would like to see this done systematically?
  • Who does this in their peer review activities?

The real question however is

who applies these principles when they prepare data?

My “ideal” review system

  1. Submit your data to a repository, where it gets checked (by specialists, data scientists, data curators) for quality, annotation, meta-data.
  2. Submit your research with a link to the peer reviewed data. First review the intro and methods, and only then the results (to avoid positive results bias).

As data champions, I think it is important to promote the importance of data and data management on a day-to-day basis, but also to express this ethic in our academic capacity, such as peer review.

An early careers view on modern and open scholarship

This is a slide from a talk I gave at the Research Data Management forum (RDMF) in July 2017, referring to the ORPP above.

In my opinion, barriers to Open Research (and the issues Data Champions are addressing) are not technological, but rather sit at the political, institutional and community levels.

A critical point that is missing is the academic promotion of open research and open researchers (*), as a way to promote a more rigorous and sound research process and tackle the reproducibility crisis. What should the incentives be? How do we make sure that the next generation of academics genuinely value openness and transparency as a foundation of rigorous research? (link)

(*) and here, I extend this to roles such as data champions, data maintainers/curators, research software engineers.

Conclusions

As data champions, our duty is to promote good data, good data management and good data analysis, and, directly or indirectly, more open and trustworthy research. ‘Open’ here refers to open to your future self, open to your group/institute, or open to the research community and to the public.

But I think we also need to reflect on the incentives in place to do what we do, and the costs that these efforts have. Not because I believe we shouldn’t make these efforts, but because we deserve to be recognised and valued for our efforts.