Many journals and publishers (PLOS ONE, Nature and the Royal Society, among others) now have mandatory data sharing policies. This means that researchers must make their datasets publicly available, so that readers can “reach the conclusions drawn in the manuscript” and “replicate the reported study findings in their entirety.”
Datasets can be made publicly available in three ways:
- In the article itself: for small datasets that can be presented in full in a table.
- In the supporting information: for medium-sized datasets that can be presented in large tables or compressed files, which can be downloaded online from the journal website.
- In a data repository: for large datasets (e.g., DNA sequences) that need large database infrastructures to store them.
Although the third option (depositing data in a repository) is best suited to large datasets, it is strongly recommended that datasets of all sizes be uploaded to some form of repository.
Journals with mandatory data sharing policies require authors to include a data availability statement on the first page of the article, stating where the dataset can be found.
Examples of data availability statements:
Data Availability: All relevant data are within the paper.
Data Availability: All relevant data are within the paper and its Supporting Information files.
Data Availability: The TaqMan Human MicroRNA Array experiments are MIAME compliant and have been deposited at the NCBI Gene Expression Omnibus (GEO) database (http://www.ncbi.nlm.nih.gov/geo) under accession GSE6459.
Data Availability: All .bam sequencing files are available at the European Nucleotide Archive (http://www.ebi.ac.uk/ena) (accession numbers ERS700862, ERS700863, ERS700864, ERS700858, ERS700859, ERS700860, ERS700861).
In some cases, you may want to upload data before you are ready to publish it or release it publicly. In that case, you can upload it to a repository with tiered access, meaning the data will only be made available once the associated article has been published in a journal.
Are there exceptions to these mandatory policies?
Yes. In certain cases, datasets are too large to share, or they consist of human patient data that cannot be made publicly available for ethical reasons. In such cases, it is recommended that you contact the target journal to discuss possible solutions.
Are there costs?
The cost of depositing data in a repository varies. Dryad charges $120 per dataset (for datasets under 20 GB), but waives this charge for researchers from countries with low-income economies. In addition, Nature and some Royal Society journals (Biology Letters, Proceedings B and Royal Society Open Science) cover the cost of depositing data (under 20 GB) in Dryad and Figshare, two large generalist repositories.
When it comes to data sharing, it is better to provide as much information as possible. Open, transparent data sharing not only benefits the scientific community but also wins the favour of the taxpayers who fund public research.
- PLOS ONE. (2016) Data availability. PLOS ONE. Retrieved from http://journals.plos.org/plosone/s/data-availability on 15 November 2016.
- Wozniak, M.B., Scelo, G., Muller, D.C., Mukeria, A., Zaridze, D. and Brennan, P. (2015) Circulating microRNAs as non-invasive biomarkers for early detection of non-small-cell lung cancer. PLOS ONE 10(5), e0125026.
- Butler, T.M., Johnson-Camacho, K., Peto, M., Wang, N.J., Macey, T.A., Korkola, J.E., Koppie, T.M., Corless, C.L., Gray, J.W. and Spellman, P.T. (2015) Exome sequencing of cell-free DNA from metastatic cancer patients identifies clinically actionable mutations distinct from primary disease. PLOS ONE 10(8), e0136407.
- Dryad. (2016) Data publishing charges. Dryad. Retrieved from http://datadryad.org/pages/payment on 15 November 2016.
- The Royal Society. (2016) Data sharing and mining. The Royal Society. Retrieved from https://royalsociety.org/journals/ethics-policies/data-sharing-mining/ on 15 November 2016.