This text is taken from a translation of the guide published on the French site www.ouvrirlascience.fr

On the sharing of data associated with scientific publications (www.ouvrir_la_science.fr)

In science, publications are the traditional vectors for disseminating knowledge. The results they present are increasingly based on underlying data and analyses. Sharing data together with publications therefore plays an important role in the quality of research. The purpose of this guide is to familiarize you with the steps needed to share data linked to publications.

How to share data with publications?

When writing a scientific article, authors naturally adopt a pedagogical approach: they clearly define all notations and conventions used in the article and describe the assumptions, the framework and the state of the art, in order to make the article easier to read and understand. Data are part of this approach. If we want the shared data to be useful to the scientific community, the same attention must be paid to their publication.

Data preparation and documentation

Describing the data so as to make them intelligible to anyone who did not take part in their production is a preliminary step to their dissemination. Information on the origin of the data, the assumptions or constraints related to their production and the associated experimental protocols must be part of the descriptive information provided with the data: the metadata. Metadata standards exist, some generic and some domain specific. To support this process of continuous data management, one can also rely on a Data Management Plan, a document that defines the procedures for monitoring and describing the data.
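As an illustration, the descriptive information mentioned above can be gathered into a small machine-readable record stored alongside the data files. The sketch below is only a minimal example: the field names (title, creators, origin, assumptions, protocol) and the sample values are hypothetical placeholders and do not follow any particular metadata standard. A real deposit should use the schema required by the chosen repository.

```python
import json

# Hypothetical field names, chosen for illustration only; they do not
# correspond to a specific metadata standard.
REQUIRED_FIELDS = ("title", "creators", "origin", "assumptions", "protocol")


def missing_fields(record):
    """Return the required fields that are absent or empty in the record."""
    return [name for name in REQUIRED_FIELDS if not record.get(name)]


def to_json(record):
    """Serialize the record as readable JSON, keeping non-ASCII characters."""
    return json.dumps(record, indent=2, ensure_ascii=False, sort_keys=True)


# Placeholder values, not a real dataset.
example_record = {
    "title": "Example dataset",
    "creators": ["Doe, Jane"],
    "origin": "Field measurements collected for an example project",
    "assumptions": "Sensors calibrated before each campaign",
    "protocol": "",  # left empty on purpose to show the check
}

if __name__ == "__main__":
    print("Missing:", missing_fields(example_record))
    print(to_json(example_record))
```

Running such a check before deposit is one simple way to notice that part of the description, here the experimental protocol, has not been filled in yet.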

When the time comes to share data, several elements must be taken into account. Some data are subject to legal constraints that prevent their sharing or that require anonymization or authorization requests. Each research institution has its own data openness policy, constrained by legislation, which is an important prerequisite for choosing the means of sharing data.

It is not recommended to entrust data sharing to publishers, who offer to publish data in the form of “supplementary data” or “supplementary materials”. Such publication is often done in a format and an environment that do not allow the data to be documented correctly, which makes the data difficult for others to reuse. It may also be accompanied by a request for an exclusive transfer of rights, which contradicts national law and the spirit of open science. Finally, in some cases, it makes scientists captive to environments controlled by major scientific publishing companies.

It is therefore recommended instead to share data in institutional repositories, either general or discipline specific, which avoid such pitfalls and offer a documentation-oriented environment that allows open research data to be consulted and reused. Correctly linking the published datasets to the article then becomes a necessity, and one that should be anticipated.

Repository choice

In disciplines where data sharing is already well structured (astronomy, genomics, etc.), data producers have at their disposal repositories specific to their discipline. They will then naturally use the standards and good practices already in place to document and format their data. The practice of one's own community is the best guide, but directories of these repositories also exist (for example https://repositoryfinder.datacite.org/, listed in the cited references).
Alternatively, data producers can turn to the repository of the institution with which they are affiliated, if any, or use the multidisciplinary Research Data Gouv repository. In both cases, only minimum requirements are imposed by the repository, and responsibility for the quality of the data documentation rests more heavily on the depositor.

The national Research Data Gouv repository

The national platform Research Data Gouv offers a multidisciplinary data repository, operational from 2022: it ensures French sovereignty over the data, complies with French and European law, and guarantees the durability and indexing of the stored data, in accordance with the FAIR principles. It is the repository of choice when no disciplinary repository exists.

Regardless of the repository chosen to share data, it must in particular offer the following features:

The assignment of a persistent identifier (PID) of the DOI type, which makes it possible to cite the data (for example http://dx.doi.org/10.15497/RDA00027) and is the basic building block for linking to other research outputs such as publications.
A description of the data at a level sufficient to facilitate discovery, understanding and reuse (standardized metadata descriptions, controlled disciplinary vocabularies).
The use of licenses and the definition of access rules, so that reuse takes place within a well-defined legal framework compatible with French and European law.
A minimum retention period of several years, consistent with the institution’s data retention policy.
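The DOI example in the list above is written with the older dx.doi.org resolver prefix, whereas citations nowadays usually take the https://doi.org/ form. As a minimal sketch, the Python snippet below strips a known resolver prefix and rebuilds the link in that form. The pattern used is a deliberately loose check of the usual "10.registrant/suffix" shape, an assumption made for illustration; it does not verify that the identifier is registered or that it resolves.

```python
import re

# Loose pattern for the usual "10.<registrant>/<suffix>" shape of a DOI.
# It only checks the form; it does not check that the DOI actually resolves.
DOI_PATTERN = re.compile(r"^10\.\d{4,9}/\S+$")

# Resolver prefixes that may precede the bare DOI in a citation.
PREFIXES = (
    "https://doi.org/",
    "http://doi.org/",
    "https://dx.doi.org/",
    "http://dx.doi.org/",
    "doi:",
)


def bare_doi(text):
    """Strip a known resolver prefix and return the bare DOI, or None."""
    value = text.strip()
    lowered = value.lower()
    for prefix in PREFIXES:
        if lowered.startswith(prefix):
            value = value[len(prefix):]
            break
    return value if DOI_PATTERN.match(value) else None


def doi_url(text):
    """Return the DOI as an https://doi.org/ link, or None if malformed."""
    doi = bare_doi(text)
    return f"https://doi.org/{doi}" if doi else None


if __name__ == "__main__":
    print(doi_url("http://dx.doi.org/10.15497/RDA00027"))
```

Storing the bare DOI and generating the link when needed keeps citations consistent even if the preferred resolver address changes.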

Link data to publications

Several options are available to establish the link between an article and its associated data. When the data are deposited before the article is published, it is easy to create the link between the article and the associated data, according to the methods described in the diagram on the following pages. Likewise, referencing data-related publications (including data papers) is generally possible in all data repositories, even after the initial deposit. Conversely, adding an explicit link to the data after an article has been published is most often impossible at present. A workaround is to refer to the data in the version of the article deposited in an open archive (HAL, for example), which allows the persistent identifiers of data linked to the publication to be entered at any time in the dedicated “Associated Data” fields of the record. This scheme therefore allows a reciprocal link between publications and data, but only for the version deposited in the open archive.
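To make the idea of a reciprocal link concrete, the sketch below builds the pair of references that would be stored on each side: one in the dataset record pointing to the article, and one in the article record pointing to the dataset. The relation names and field names are borrowed from the DataCite metadata vocabulary, but the record layout is a simplified, hypothetical structure and not the interface of any particular repository or of HAL. The DOIs are placeholders.

```python
# Relation names borrowed from the DataCite vocabulary; the record layout
# itself is a simplified, hypothetical structure, not a repository API.
DATA_TO_ARTICLE = "IsSupplementTo"
ARTICLE_TO_DATA = "IsSupplementedBy"


def related_identifier(doi, relation):
    """Describe a link to another research product identified by a DOI."""
    return {
        "relatedIdentifier": doi,
        "relatedIdentifierType": "DOI",
        "relationType": relation,
    }


def reciprocal_links(dataset_doi, article_doi):
    """Return the link to store on each side so that both point to each other."""
    return {
        "dataset_record": related_identifier(article_doi, DATA_TO_ARTICLE),
        "article_record": related_identifier(dataset_doi, ARTICLE_TO_DATA),
    }


if __name__ == "__main__":
    # Placeholder DOIs, for illustration only.
    links = reciprocal_links("10.1234/example-dataset", "10.1234/example-article")
    for side, link in links.items():
        print(side, "->", link)
```

In practice only one side of the pair can often be edited after publication, which is why the text recommends anticipating the link before the article appears.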

Data papers

A data paper is a publication whose purpose is to describe a set of scientific data. Unlike a classic research article, a data paper consists of a detailed description of the scientific data, their metadata, and the circumstances and methods of their collection, without any analysis or interpretation of the data. The data described must be accessible (as far as possible), deposited in an appropriate repository, and provided with a persistent identifier of the DOI type. A data paper is published as a peer-reviewed article, which guarantees its quality, and can be cited in the same way as a “classic” article. The author of a data paper must therefore be convincing as to the quality and scientific scope of the data (including their potential for reuse). It can be published in dedicated journals (data journals) or in traditional scientific journals that accept this format.

Cite a dataset

How to cite a dataset linked to a scientific publication depends on the circumstances in which the data were produced:

If the data were produced and shared while the article was being written, it is recommended to add a dedicated “Data availability” section before the bibliographic references. For example: “Data availability: Datasets related to this article can be found at https://doi.org/10.23708/PQTQDA, an open-source online data repository hosted at DataSuds IRD (Granjon and Fossati, 2020).”
If the data were already produced and shared in a context other than that of the publication, the citation is given in the reference list in a form equivalent to that of bibliographic references, for example:

Van Halder, Inge; Sacristan, Alberto; Martín-García, Jorge; Pajares, Juan Alberto; Jactel, Herve, 2022, “Monochamus galloprovoncialis catches and pine tree composition in different landscape buffers in Spain”, https://doi.org/10.15454/JXFGPI, INRAE Data Portal, V1
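The two citation forms above follow a regular pattern, so they can be generated from a few fields rather than typed by hand. The sketch below is one possible way to do this in Python. The function names, the argument order and the exact wording of the availability sentence are assumptions made for illustration, not a format prescribed by a journal or a repository; the target journal's instructions should take precedence.

```python
def format_dataset_citation(authors, year, title, doi, repository, version):
    """Build a citation: authors, year, "title", DOI link, repository, version."""
    author_part = "; ".join(authors)
    return (
        f'{author_part}, {year}, "{title}", '
        f"https://doi.org/{doi}, {repository}, {version}"
    )


def data_availability_statement(doi, repository):
    """Build a one-sentence statement for a "Data availability" section."""
    return (
        "Datasets related to this article can be found at "
        f"https://doi.org/{doi}, hosted at {repository}."
    )


if __name__ == "__main__":
    # Values taken from the two examples given in the text above.
    print(
        format_dataset_citation(
            ["Van Halder, Inge", "Sacristan, Alberto"],
            2022,
            "Monochamus galloprovoncialis catches and pine tree composition "
            "in different landscape buffers in Spain",
            "10.15454/JXFGPI",
            "INRAE Data Portal",
            "V1",
        )
    )
    print(data_availability_statement("10.23708/PQTQDA", "DataSuds IRD"))
```

Many repositories already display a ready-made citation on the dataset page; when one is available, copying it is preferable to rebuilding it.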

Proper citation of data allows better indexing, and therefore better discoverability in searches, and gives lasting credit to the data producer.

Glossary

Research data: factual records (numerical records, textual documents, images and sounds, etc.) used as primary sources for scientific research, and commonly accepted in the scientific community as necessary to validate research results. For more information: https://legalinstruments.oecd.org/en/instruments/OECDLEGAL-034
Data warehouses (repositories): platforms on which research datasets are deposited, described and stored. They can be generalist or disciplinary.
FAIR: set of principles intended to support research by facilitating the reuse of data: Findable, Accessible, Interoperable, Reusable. For more information: https://www.ouverturelascience.fr/fair-principles/
Metadata: set of structured information that describes, explains and locates an information resource, with the aim of facilitating its retrieval, use and management. For more information: https://www.niso.org/publications/understanding-metadata-2017
PID: persistent unique identifier.
License: statement defining the conditions under which the data may be reused.

Cited references

Colavizza G, Hrynaszkiewicz I, Staden I, Whitaker K, McGillivray B (2020). The citation advantage of linking publications to research data. PLOS ONE 15(4): e0230416. https://doi.org/10.1371/journal.pone.0230416
https://doranum.fr/metadonnees-standards-formats/fichesynthetique/
https://doranum.fr/plan-gestion-donnees-dmp/minute/
https://repositoryfinder.datacite.org/
https://doranum.fr/aspects-juridiques-ethiques/leslicences-de-reutilisation-dans-le-cadre-de-lopen-data-2/

To go further

Guide de bonnes pratiques sur la gestion des données de la recherche : https://mi-gt-donnees.pages.math.unistra.fr/guide/00-introduction.html
Dedieu, L. ; Barale, M. 2020. Déposer des données dans un entrepôt, en 6 points. Montpellier (FRA) : CIRAD, 4 p. https://doi.org/10.18167/coopist/0070
Dedieu, L. 2014. Rédiger et publier un data paper dans une revue scientifique, en 5 points. Montpellier (FRA) : CIRAD, 7 p. https://doi.org/10.18167/coopist/0057
Deboin, M.C. 2021. Citer un jeu de données scientifiques, en 4 points. Montpellier (FRA) : CIRAD, 4 p. https://doi.org/10.18167/coopist/0058
How to cite datasets and link to publications https://www.dcc.ac.uk/sites/default/files/documents/publications/reports/guides/How_to_Cite_Link.pdf