Cite as: Data Citation Synthesis Group: Joint Declaration of Data Citation Principles. Martone M. (ed.) San Diego CA: FORCE11; 2014 https://doi.org/10.25490/a97f-egyk
Translations
Japanese – https://doi.org/10.11502/rduf_rdc_jddcp_ja (added 31.01.2020).
Preamble
Sound, reproducible scholarship rests upon a foundation of robust, accessible data. For this to be so in practice as well as theory, data must be accorded due importance in the practice of scholarship and in the enduring scholarly record. In other words, data should be considered legitimate, citable products of research. Data citation, like the citation of other evidence and sources, is good research practice and is part of the scholarly ecosystem supporting data reuse.
In support of this assertion, and to encourage good practice, we offer a set of guiding principles for data citation within scholarly literature, another dataset, or any other research object.
These principles are the synthesis of work by a number of groups. As we move into the next phase, we welcome your participation and endorsement of these principles.
Principles
The Data Citation Principles cover the purpose, function, and attributes of citations. These principles recognize the dual necessity of creating citation practices that are both human-understandable and machine-actionable.
These citation principles are not comprehensive recommendations for data stewardship. Because practices vary across communities and technologies will evolve over time, we do not include recommendations for specific implementations; instead, we encourage communities to develop practices and tools that embody these principles.
The principles are grouped so as to facilitate understanding, rather than according to any perceived criteria of importance.
1. Importance
Data should be considered legitimate, citable products of research. Data citations should be accorded the same importance in the scholarly record as citations of other research objects, such as publications[1].
2. Credit and Attribution
Data citations should facilitate giving scholarly credit and normative and legal attribution to all contributors to the data, recognizing that a single style or mechanism of attribution may not be applicable to all data[2].
3. Evidence
In scholarly literature, whenever and wherever a claim relies upon data, the corresponding data should be cited[3].
4. Unique Identification
A data citation should include a persistent method for identification that is machine-actionable, globally unique, and widely used by a community[4].
5. Access
Data citations should facilitate access to the data themselves and to such associated metadata, documentation, code, and other materials, as are necessary for both humans and machines to make informed use of the referenced data[5].
6. Persistence
Unique identifiers, and metadata describing the data and its disposition, should persist, even beyond the lifespan of the data they describe[6].
7. Specificity and Verifiability
Data citations should facilitate identification of, access to, and verification of the specific data that support a claim. Citations or citation metadata should include information about provenance and fixity sufficient to facilitate verifying that the specific timeslice, version and/or granular portion of data retrieved subsequently is the same as was originally cited[7].
8. Interoperability and Flexibility
Data citation methods should be sufficiently flexible to accommodate the variant practices among communities, but should not differ so much that they compromise interoperability of data citation practices across communities[8].
For further information, please refer to these examples.
Glossary
ATTRIBUTION
(First used in principle 2)
Specification of terms of use of data, usually in the form of a license.
Legal attribution is founded on intellectual property rights and licenses, as well as on strong normative values in the research community, and concerns individual rights and norms of credit and publicity. In these principles, legal attribution is therefore distinguished from normative (scholarly) attribution, which is concerned with the incentives and systems of scholarly credit and evaluation (adapted from CODATA 2013).
CITATION
(First used in preamble)
A formal structured reference to another scholarly published or unpublished work (adapted from https://www.jstage.jst.go.jp/article/dsj/12/0/12_OSOM13-043/_pdf).
In traditional print publishing, a “bibliographic citation” refers to a formal structured reference to another scholarly published or unpublished work. (This is in contrast to formal bibliometric terminology, in which references are made and citations received.) Typically, intra-document citation pointers to these structured references are marked and abbreviated. These pointers are accompanied by the full bibliographic reference to the work, which appears in the bibliography or reference list, often following the end of the main text, and is called a “reference” or “bibliographic reference.” Traditional print citations include “pinpointing” information, typically in the form of a page range that identifies which part of the cited work is being referenced.
The terminology commonly used for digital citation has come to differ from this older print usage. We adopt the more current usage, in which “citation” refers to the full bibliographic reference information for the object. This usage leaves open the terminology for more granular references to data, including subsets of observations, variables, or other components and subsets of a larger dataset. These granular references are often needed in-text to describe the precise evidential support for a data table, figure, or analysis, and are analogous to the “pin citation” used in the legal profession or the “page reference” used in citing journal articles. The term “deep citation” has been applied to such granular citations of data subsets.
DATA
(First used in preamble)
Any record which can be used to support a scholarly research argument, even if it may not be considered valid evidence in all disciplines. In the social sciences, data may include survey responses, interviews and historical documents. Source: modified from http://vso1.nascom.nasa.gov/vso/misc/vocab_2p3.pdf.
The term “data” as used in this document is meant to be broadly inclusive. In addition to digital manifestations of literature (including text, sound, still images, moving images, models, games, and simulations), digital data also refers to forms of data and databases that are not self-describing, that is, that generally require the assistance of metadata, computational machinery and/or software in order to be useful, such as various types of laboratory data including spectrographic, genomic sequencing, and electron microscopy data; observational data, such as remote sensing, geospatial, and socio-economic data; and other forms of data either generated or compiled by humans or machines (adapted from CODATA 2013).
DATASET
(First used in preamble)
“Recorded information, regardless of the form or medium on which it may be recorded, including writings, films, sound recordings, pictorial reproductions, drawings, designs, or other graphic representations, procedural manuals, forms, diagrams, work flow, charts, equipment descriptions, data files, data processing or computer programs (software), statistical records, and other research data.” (From the U.S. National Institutes of Health (NIH) Grants Policy Statement, via DataCite’s Best Practice Guide for Data Citation, as given in the DataCite Business Models Principles: http://www.datacite.org/sites/default/files/Business_Models_Principles_v1.0.pdf)
IDENTIFIER AND PERSISTENT IDENTIFIER
(First used in principle 6)
An identifier is an association between a character string and an object. Objects can be files, parts of files, names of persons or organizations, abstractions, etc. Objects can be online or offline. Character strings include URLs, serial numbers, names, addresses, etc. A “persistent identifier” is an identifier that is available and managed over time; it will not change if the item is moved or renamed. This means that an item can be reliably referenced for future access by humans and software (from http://n2t.net/ezid/home/understanding).
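The indirection that lets an identifier survive a move can be illustrated with a toy resolver. This is a minimal sketch under stated assumptions: the `ToyResolver` class, its methods, and the identifier string `pid:example/0001` are hypothetical and do not reflect the API of any real DOI, Handle, or ARK service. It shows only that when an object moves, the registry entry is updated while the identifier that appears in citations stays the same.

```python
from dataclasses import dataclass, field


@dataclass
class ToyResolver:
    """Hypothetical in-memory resolver mapping persistent identifiers to current locations.

    Real identifier services are far richer; this only illustrates the indirection.
    """
    _locations: dict[str, str] = field(default_factory=dict)

    def register(self, pid: str, url: str) -> None:
        # An identifier is assigned once and never reused for another object.
        if pid in self._locations:
            raise ValueError(f"identifier already registered: {pid}")
        self._locations[pid] = url

    def move(self, pid: str, new_url: str) -> None:
        # The object moves: only the registry entry changes, never the identifier.
        if pid not in self._locations:
            raise KeyError(pid)
        self._locations[pid] = new_url

    def resolve(self, pid: str) -> str:
        # Humans and software follow the identifier to wherever the object lives now.
        return self._locations[pid]


resolver = ToyResolver()
pid = "pid:example/0001"  # hypothetical identifier string
resolver.register(pid, "https://old-archive.example.org/data/0001")
resolver.move(pid, "https://new-archive.example.org/collections/0001")
print(resolver.resolve(pid))  # https://new-archive.example.org/collections/0001
```

Because citations store only `pid`, they keep working after the move; the burden of keeping the mapping current falls on whoever manages the identifier.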
INTEROPERABILITY
(First used in principle 8)
The ability to make systems and organizations work together (adapted from Wikipedia). Access to research data, as facilitated by data citations, requires technological infrastructure that is appropriately designed and based on interoperability best practices, including data quality control, security, and authorization. Interoperability at both the semantic and the infrastructure levels is currently important to ensure that data citations facilitate access to research data. To that end, organizations working to develop improved infrastructures that foster interoperability should widely communicate the standards, guidelines, and best practices they are implementing; adopt standards for data documentation (such as metadata) and dissemination (data citations, including bidirectional links from data to publications and vice versa); and keep up to date with the evolution not only of the technologies implemented but also of the best-practice efforts under way in the community of practice (adapted from CODATA 2013, Ch. 5).
MACHINE-ACTIONABLE
(First used in the introduction to the principles)
Content that can be used and manipulated by computers (http://www.libraries.psu.edu/tas/jca/ccda/docs/tf-MRData3.pdf).
METADATA
(First used in preamble)
Information about the data being tracked within a data system. Metadata typically conforms to a metadata information model. Metadata may include, for example, the name of the sensor used to collect the data or of the person who collected the data, where the data was collected, information about the units and dimensionality of the data, and other notes recorded by the investigator about how the data has been processed. Source: modified from http://vso1.nascom.nasa.gov/vso/misc/vocab_2p3.pdf.
Metadata is information (data) about the object and its disposition, such as the name of the object’s creator, the date of creation, the target URL, the version of the object, its title, and so on. (from: http://n2t.net/ezid/home/understanding).
RESEARCH OBJECT
(First used in preamble)
Sharable, reusable digital objects that enable research to be recorded and reused (adapted from Wikipedia).
SCHOLARSHIP
(First used in preamble)
Serious formal study or research of a subject (adapted from Merriam-Webster Dictionary).
VERIFICATION, PROVENANCE AND FIXITY
(First used in principle 7)
Verification means reliably establishing the relationship between the object cited in an original citation and a current object; verification enables one to confirm that the data retrieved is the data cited. This is separate from persistence, which remains the responsibility of the archive, not the citation.
Types of verification information include fixity, which can be used directly to assess the integrity of specific content, and provenance, which provides information about parts of the chain of custody and/or processing to which the content was subject. Specific forms of citation verification include, but are not limited to: embedding fixity information in the citation itself; associating the citation with a surrogate (such as a landing page) where additional metadata, such as the data form, fixity, and final stage of provenance, are given explicitly; or associating such metadata directly with the DOI, handle, or other persistent identifier itself, through the persistent identifier’s resolution or index service (adapted from CODATA 2013).
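Embedding fixity information in the citation itself, the first verification form listed in the glossary entry, can be sketched with a cryptographic digest. This sketch assumes SHA-256 as the fixity algorithm and uses a hypothetical `DataCitation` record with illustrative field names; real archives may use other checksums, or record fixity on a landing page or with the identifier's resolution service instead.

```python
import hashlib
from dataclasses import dataclass


@dataclass(frozen=True)
class DataCitation:
    """Hypothetical citation record carrying fixity information (field names are illustrative)."""
    identifier: str  # persistent identifier of the cited data
    version: str     # version or timeslice that was cited
    sha256: str      # fixity value recorded when the citation was made


def sha256_hex(content: bytes) -> str:
    return hashlib.sha256(content).hexdigest()


def cite(identifier: str, version: str, content: bytes) -> DataCitation:
    # Record fixity at citation time by hashing the exact bytes being cited.
    return DataCitation(identifier, version, sha256_hex(content))


def verify(citation: DataCitation, retrieved: bytes) -> bool:
    # The retrieved data matches the cited data only if the digests are equal.
    return sha256_hex(retrieved) == citation.sha256


original = b"station,date,temp_c\nA,2014-01-01,3.2\n"
citation = cite("pid:example/0001", "v1", original)  # hypothetical identifier
print(verify(citation, original))                             # True
print(verify(citation, original.replace(b"3.2", b"3.3")))     # False
```

A matching digest shows only that the bytes are identical; provenance information is still needed to explain how those bytes came to be.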
VERSION
(First used in principle 7)
A modified dataset based on a single designated dataset; roughly equivalent to an “edition” in FRBR terms (see http://archive.ifla.org/VII/s13/frbr/frbr2.htm).
A version is often denoted by a number that is incremented when the data changes. Where a formal version is unavailable, it can instead be described by a “timeslice” or access date (see, for example, https://doi.org/10.1045/january2011-starr).
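The fallback from a formal version to an access-date timeslice can be sketched as a small formatting helper. The function names, the field order, and the example values are hypothetical and illustrative only; they are not a citation style endorsed by these principles, which leave specific implementations to communities.

```python
from datetime import date
from typing import Optional


def version_label(version: Optional[str], accessed: date) -> str:
    """Hypothetical helper: use the formal version if one exists,
    otherwise fall back to an access-date timeslice."""
    if version:
        return f"Version {version}"
    return f"Accessed {accessed.isoformat()}"


def format_citation(creator: str, year: int, title: str, publisher: str,
                    identifier: str, version: Optional[str], accessed: date) -> str:
    # Field order is illustrative only; communities differ in citation style.
    return (f"{creator} ({year}). {title}. {version_label(version, accessed)}. "
            f"{publisher}. {identifier}")


# Example values are hypothetical.
print(format_citation("Example Lab", 2014, "Example Dataset", "Example Archive",
                      "pid:example/0001", "2.1", date(2014, 6, 1)))
# Example Lab (2014). Example Dataset. Version 2.1. Example Archive. pid:example/0001
print(format_citation("Example Lab", 2014, "Example Dataset", "Example Archive",
                      "pid:example/0001", None, date(2014, 6, 1)))
# Example Lab (2014). Example Dataset. Accessed 2014-06-01. Example Archive. pid:example/0001
```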
References
[1] CODATA/ITSCI Task Force on Data Citation, 2013. “Out of cite, out of mind: The Current State of Practice, Policy and Technology for Data Citation.” Data Science Journal 12: 1-75. http://dx.doi.org/10.2481/dsj.OSOM13-043, Sec. 3.2.1; Uhlir (ed.), 2012. Developing Data Attribution and Citation Practices and Standards. National Academies. http://www.nap.edu/download.php?record_id=13564, Ch. 14; Altman, Micah, and Gary King, 2007. “A proposed standard for the scholarly citation of quantitative data.” D-Lib Magazine 13.3/4. http://www.dlib.org/dlib/march07/altman/03altman.html
[2] CODATA 2013, Sec. 3.2, 7.2.3; Uhlir (ed.) 2012, Ch. 14
[3] CODATA 2013, Sec. 3.1, 7.2.3; Uhlir (ed.) 2012, Ch. 14
[4] Altman & King 2007; CODATA 2013, Sec. 3.2.3, Ch. 5; Ball, A., and Duke, M., 2012. “Data Citation and Linking.” DCC Briefing Papers. Edinburgh: Digital Curation Centre. http://www.dcc.ac.uk/resources/briefing-papers/introduction-curation/data-citation-and-linking
[5] CODATA 2013, Sec. 3.2.4, 3.2.5, 3.2.8
[6] Altman & King 2007; Ball & Duke 2012; CODATA 2013, Sec. 3.2.2
[7] Altman & King 2007; CODATA 2013, Sec. 3.2.7, 3.2.8
[8] CODATA 2013, Sec. 3.2.10