Introduction

The idea behind the LabInform framework is to provide feasible ways of doing reproducible and reliable science on a larger scale, from planning experiments to publishing results and storing the data and results for a reasonable time (decades rather than years).

Reproducible research

Reproducibility of every aspect of research is at the heart of proper science. Documenting every step is a necessary prerequisite for achieving this goal. In large companies working in chemistry, pharmacy, and medicine, strict quality management has led to large-scale, commercial, and highly integrated systems that make every effort to ensure a gapless record and documentation of each and every step. In fundamental research in academia, though, such systems are rare at best, if not simply nonexistent. Instead, how well research is documented depends directly on the motivation of the individual scientist to deal with aspects of reproducibility, as well as on her or his discipline in following appropriate rules and workflows. In reality, the research performed is usually insufficiently documented, let alone the subsequent processing of the acquired data. Furthermore, most scientists do not or cannot take care to store data and results in ways that ensure, or at least enable, access after many years, ideally decades.

Personal freedom vs. reproducible science

At the same time, the individuality, independence, and personal responsibility of scientists are emphasised in the academic context for good reason. Generally, scientists should not be limited by too many or unnecessary rules. Moreover, developing an individual concept for documenting the experimental parameters necessary for a sensible and adequate analysis (and reproducibility) of the acquired data is often seen as an integral part of scientific education in universities.

Scientists are not software engineers

Another important aspect: programs for data processing and analysis are usually written by the individual scientist, who is at the same time their main user. However, scientists are normally neither familiar with nor aware of key aspects of software engineering developed over the last decades. These aspects, ranging from tips for code formatting and naming to patterns of application architecture, are essential for creating robust and reliable software of the necessary complexity that can be used by others as well and is sufficiently future-proof.

There are good reasons why developing and maintaining (more) complex software is usually the realm of professional software engineers who can focus solely on this task. However, most research groups in the academic context cannot hire professionals for this purpose. Besides that, individual scientists should not use “black boxes” whose inner workings they neither know about nor understand. One strategy to overcome this problem may be to make scientists familiar with both proven rules (“best practices”) for programming and general concepts for data handling that have been developed by scientists and have proven useful in day-to-day work. This would free scientists to focus on their actual science and would be beneficial in the long run.

Long-term storage and availability

Nowadays, most data are acquired and processed digitally using computers. This raises questions about long-term storage and availability. Although an increasing number of funding agencies in science require applicants to provide concepts for these aspects, developing appropriate solutions and actually implementing them in a working manner is far from easy. Given that often not only the raw data are of interest, but detailed analyses as well, this becomes even more difficult.

The goal: concepts for reproducibility

All the aspects mentioned above have led to the development of the LabInform framework (and its predecessors). The goal of the LabInform framework is not to solve the problems mentioned once and for all. Rather, the idea is to provide concepts and show ways to tackle many of the individual aspects. The ultimate goal remains a system that ensures complete reproducibility (and replicability wherever possible), starting from planning experiments and ending with final representations of results and the long-term storage of data and analyses.