Providing for long-term and consistent public access to scientific data is a growing concern in biomedical research. One aspect of this problem can be demonstrated by evaluating the persistence of supplementary data associated with published biomedical papers.
We manually evaluated 655 supplementary data links extracted from PubMed abstracts published 1998–2005 (Method 1) as well as a further focused subset of 162 full-text manuscripts published within three representative high-impact biomedical journals between September and December 2004 (Method 2).
For Method 1, we found that since 2001, only 71–92% of supplementary data were still accessible via the links provided, with 93% of the inaccessible links occurring where supplementary data were not stored with the publishing journal. For the manuscripts evaluated in Method 2, we found that only 83% of the supplementary data links were available approximately a year after publication, with 55% of the inaccessible links pointing to locations outside the journal of publication.
We conclude that if supplementary data are required to support a publication, journal policies must take on the responsibility to accept and store such data, or require that they be maintained with a credible independent institution or under the terms of a strategic data storage plan specified by the authors. We further recommend that publishers provide automated systems to ensure that supplementary links remain persistent, and that granting bodies such as the NIH develop policies and funding mechanisms to maintain long-term persistent access to these data.
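
As an illustration of the kind of automated link monitoring recommended above, the following is a minimal sketch and not the procedure used in this study, where links were evaluated manually. The function names (`http_status`, `check_links`, `fraction_accessible`) are hypothetical, and the sketch assumes that a final HTTP status in the 200–399 range means a link is accessible. That assumption is weaker than manual inspection, since a server can return a success status for a page that no longer holds the data.

```python
import urllib.error
import urllib.request
from typing import Callable, Dict, Iterable, Optional


def http_status(url: str, timeout: float = 10.0) -> Optional[int]:
    """Return the final HTTP status code for url, or None if no response was obtained."""
    request = urllib.request.Request(url, method="HEAD")
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as err:
        # The server answered, but with an error status such as 404.
        return err.code
    except (urllib.error.URLError, OSError, ValueError):
        # DNS failure, refused connection, timeout, or malformed URL.
        return None


def check_links(
    urls: Iterable[str],
    fetch: Callable[[str], Optional[int]] = http_status,
) -> Dict[str, bool]:
    """Map each URL to True if it looks accessible (assumed: status 200-399)."""
    results = {}
    for url in urls:
        status = fetch(url)
        results[url] = status is not None and 200 <= status < 400
    return results


def fraction_accessible(results: Dict[str, bool]) -> float:
    """Fraction of checked links that were accessible (0.0 if none were checked)."""
    if not results:
        return 0.0
    return sum(results.values()) / len(results)
```

Run periodically over a journal's list of supplementary data links, a check of this kind could flag links for manual follow-up before the data are lost. The `fetch` parameter is left replaceable so that the status lookup can be swapped for a stricter test, for example one that also downloads and inspects the content.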