Reproducibility and data repositories

The scientific method is founded on confirming results through repeated experiments. By replicating the results of a study, researchers can confidently claim that the results reflect a true effect rather than an anomaly. Better still is replication of the results by independent researchers, and best of all is replication by multiple independent researchers using modified methods, different statistical analyses and so on.

The ability to confirm results in this way is known as reproducibility, and we are in the middle of a reproducibility crisis: “More than 70% of researchers have tried and failed to reproduce another scientist’s experiments, and more than half have failed to reproduce their own experiments” [1].

What can be done about reproducibility?

Many labs have taken independent measures to tackle this problem, including getting a third party to repeat their studies, standardizing experimental methods, and pre-registration, where researchers submit their hypotheses, design and plans for data analysis to the target journal before conducting the experiments, which prevents cherry-picking [1].

On a wider scale, the consensus among the scientific community for improving reproducibility is to make data sharing mandatory. Data sharing is where researchers make their data publicly available. One of the ways they can do this is by depositing their datasets in online data repositories. This practice allows independent parties to verify findings [2] and promotes a culture of openness and transparency. It also “lowers the barriers to meta-studies and enables web-scale analysis” [2].

To promote this practice, many journals (PLOS One, Nature, The Royal Society and others) and funding bodies (NIH, STFC, NERC, Wellcome Trust and others) have mandatory data sharing policies.

Data repositories

There are currently more than 1,500 discipline-specific, institutional and generalist data repositories available.

The data types that can be uploaded are wide-ranging and include plain text, simple Excel files, source code, SPSS files, GIS shapefiles, genome-specific data formats, videos and images.

Some repositories also provide digital object identifiers (DOIs) or universal numerical fingerprints (UNFs) for datasets, so researchers can cite the dataset online and link to their data in later published articles or conference papers [3].
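As a rough illustration of how a DOI makes a dataset citable, the sketch below turns a DOI into a link through the public doi.org resolver and builds a citation line in the style of the reference list in this article. The function names, the author, the dataset title, the repository name and the DOI itself are all hypothetical and are used only for illustration.

```python
from urllib.parse import quote

DOI_RESOLVER = "https://doi.org/"


def doi_to_url(doi: str) -> str:
    """Turn a DOI into a resolvable link via the doi.org resolver."""
    doi = doi.strip()
    # Strip common prefixes so that "doi:10.x/y" and full links are both accepted.
    for prefix in ("https://doi.org/", "http://dx.doi.org/", "doi:"):
        if doi.lower().startswith(prefix):
            doi = doi[len(prefix):]
            break
    # Percent-encode unusual characters but keep the "/" between prefix and suffix.
    return DOI_RESOLVER + quote(doi, safe="/")


def cite_dataset(authors: str, year: int, title: str, repository: str, doi: str) -> str:
    """Build a one-line dataset citation ending in the DOI link."""
    return f"{authors} ({year}) {title}. {repository}. {doi_to_url(doi)}"


# Hypothetical dataset: none of these details refer to a real deposit.
citation = cite_dataset(
    authors="Smith, J.",
    year=2016,
    title="Example field measurements",
    repository="Example Data Repository",
    doi="doi:10.1234/example.dataset.v1",
)
print(citation)
```

Because the link is built from the identifier rather than from the repository's web address, the citation stays valid even if the repository reorganises its site.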

Repositories can be open access, which allows instant searching and downloading of datasets, or they can have restricted or closed access.

Most scientists will agree that the development of data repositories has been a significant step forward in tackling the irreproducibility problem. However, with well over a thousand repositories to choose from, searching through them and selecting the right one can be time-consuming for authors.

Fortunately, a registry is now available online. re3data.org currently provides an overview of 1637 data repositories, making it the largest of its kind.

You can search for appropriate repositories using a simple search box or the filters listed in a navigation panel. The list of filters is comprehensive, covering subject, content type (e.g. images, source code, plain text), software used (e.g. DSpace, Dataverse) and whether the uploaded data is openly accessible or restricted.
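The filtering idea can be sketched in a few lines of code. The example below is not the re3data.org interface or its API: it is a minimal illustration that assumes a small, made-up list of repository records and narrows it down by subject, content type and access level, much as the navigation panel does. The record fields, the function name and the repository names are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Repository:
    """A simplified, hypothetical repository record."""
    name: str
    subjects: frozenset
    content_types: frozenset
    access: str  # "open", "restricted" or "closed"


def find_repositories(repos, subject=None, content_type=None, access=None):
    """Return the repositories that match every filter that was supplied."""
    matches = []
    for repo in repos:
        if subject is not None and subject not in repo.subjects:
            continue
        if content_type is not None and content_type not in repo.content_types:
            continue
        if access is not None and repo.access != access:
            continue
        matches.append(repo)
    return matches


# Made-up records used only to demonstrate the filters.
SAMPLE = [
    Repository("Example Genomics Archive", frozenset({"life sciences"}),
               frozenset({"plain text", "images"}), "open"),
    Repository("Example Survey Bank", frozenset({"social sciences"}),
               frozenset({"plain text"}), "restricted"),
    Repository("Example Code Vault", frozenset({"life sciences", "computer science"}),
               frozenset({"source code"}), "open"),
]

open_life_science = find_repositories(SAMPLE, subject="life sciences", access="open")
for repo in open_life_science:
    print(repo.name)
```

Leaving a filter unset means it is ignored, so the same function covers both a broad search on one criterion and a narrow search on several.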

re3data.org provides you with an overview of each repository, including a short description of the repository, the institutions responsible for funding, the guidelines and policies of the repository, and the technical (e.g. versioning of datasets) and quality (e.g. certificates, audit processes) standards of the repository.

In the fight against irreproducibility, you will be increasingly required to upload your data to repositories. re3data.org can help you select the most appropriate one and save you valuable time.

  1. Baker, M. (2016) 1,500 scientists lift the lid on reproducibility. Nature 533(7604), 452–454.
  2. Taylor, M. (2013) Should research data be publicly available? Elsevier Connect. Weblog. Available at: https://www.elsevier.com/connect/should-research-data-be-publicly-available [Accessed 04 Nov 2016].
  3. Uzwyshyn, R. (2016) Research Data Repositories: The What, When, Why, and How. Information Today, Inc. Weblog. Available at: http://www.infotoday.com/cilmag/apr16/Uzwyshyn--Research-Data-Repositories.shtml [Accessed 04 Nov 2016].