Much has been written about, and support stated for, sharing and publishing scientific data, in recognition of the benefits for the economy [1], scientific discovery [2] and public health [3]. Maximizing the potential of scientific data sharing for the discovery of new knowledge involves reducing barriers to data dissemination, reuse, reproducibility and integration. Licensing, ownership, copyright and intellectual property present legal obstacles to data integration and reuse, which has led to the development of, and calls for, licensing standards for open data, where data are explicitly placed in the public domain with the legal rights of the owners waived [4].
BioMed Central has previously stated that the concept of open data, analogous to its policy on open access to journals, goes beyond making data freely accessible. Data should also be free to distribute, copy, re-format, and integrate into new research, without legal impediments [5]. This position is consistent with the Panton Principles, which hold that for society to reap the full benefits of scientific research the published body of knowledge must be open – readily available such that it can be evaluated, reused, criticized and integrated with other knowledge without restrictions [6]. For the remainder of this article the term ‘open data’ is reserved exclusively for data available according to these principles.
Unfortunately, much data – and other content – that is freely available on the web is offered under restrictive or ambiguous terms, which risks impeding, or potentially criminalizing, secondary use of scientific data. According to evidence submitted to the UK Government’s intellectual property review by the Wellcome Trust, 87 per cent of the material housed in the full-text scholarly archive UK PubMed Central is unavailable for legal text and data mining [7]. A key finding of a more recent report, commissioned by JISC, was a need to overcome legal restrictions and uncertainties surrounding text mining of scientific literature [8].
Indeed, as recognition of the value of shared life science data has increased, so has recognition of intellectual property and copyright as barriers to progress. Writing in Nature in 2009, Schofield et al. urged that “any restrictions on use should be strongly resisted and we endorse explicit encouragement of open sharing” [9]; and Conway and VanLare, writing in JAMA in 2010, called for US health care data to be available without intellectual property constraints [10]. Waiver of all intellectual property rights in research data is central to the achievement of an “information commons”, advocated by organisations such as Sage Bionetworks, to enhance the (slowing) pace of drug discovery.
The genomics community has shown leadership in establishing a framework for an “information commons”, enshrined in the Bermuda Principles, and has established built-in temporal latencies for knowledge (when data are released) and for rights (when rights restricting use are removed) [11]. Researchers in this community typically must release their genetic sequence data immediately, and within 6–12 months release their exclusive rights in those data. During this relatively short embargo researchers have the opportunity to exploit the data for their own discoveries, after which the community at large can benefit, if it wishes, from the new data. A similar model for data release has since been proposed for clinical trials, although it is probably far from implementation [12]. A number of factors seem to have led to a successful culture of sharing in the genomics community: a need to collaborate and share to achieve a major goal (the sequencing of the human genome); effective mechanisms and infrastructure for sharing large amounts of data (well-funded genetic sequence databases); scientific community and funding agency mandates to share data; and, importantly in the context of this article, successful collaborations with the publishing community. Journals, their editors and publishers supported implementation of the Bermuda Principles by, for example, requiring accession numbers for data deposits as a condition of manuscript submission or publication.
BioMed Central, in its August 2010 open data statement [5] and at the subsequent cross-publisher Publishing Open Data Working Group meeting, identified that open data in journal publications could be implemented by specifying that, from a specific date, any author submitting to a journal or publisher agrees to dedicate the data elements of their article and supplementary material (in particular, additional data files; also known as “supplementary” data files) to the public domain [13]. Much of the content of academic journals could be considered data, but licensing terms cannot be applied retroactively by publishers without authors’ consent, and any changes to authors’ agreements should ideally be made in consultation between authors and publishers.
This article aims to describe practically what is needed from publishers to explicitly dedicate data within open access journals to the public domain, and discusses the implications of this development for authors, editors, publishers and funders of research. Illustrative examples and use cases are provided throughout the article. In this article “open access” is defined according to the Budapest Open Access Initiative definition [14