As of 9 March 2011, the Repository was tracking 111 NIDDK-funded studies. From these studies, the Repository offers resources for clinical, biochemical, statistical and genetic research, especially in the areas of diabetes, kidney disease, liver disease and inflammatory bowel disease. At present, the Repository offers clinical data from 29 completed NIDDK-funded studies, 15 of which currently offer biospecimens and 7 of which have available genotype data. The table of studies currently offering clinical data provides descriptions of these studies, the specimens available from each study, and the number of subjects enrolled. Since there is substantial variability in the types of clinical data available from each study, it is not feasible to summarize them in this article. Suffice it to say that the collection of clinical data is large, diverse and carefully curated. As an example of the studies included in this collection, we would note the DCCT-EDIC study, which is continuing to follow a cohort of Type 1 diabetic patients recruited in 1983. The clinical data include the results of physical examinations with extensive measurements at regular intervals of retinopathy, nephropathy, neuropathy and cardiovascular status, along with metabolic and lipid profiles. Biospecimens available from DCCT-EDIC include DNA, plasma, RNA, serum, urine and peripheral blood mononuclear cells (PBMC). Samples may include multiple aliquots of the same unique specimens.
Studies currently offering clinical data from the NIDDK Central Repository
The Repository houses biospecimens both from studies for which we have clinical data sets and from studies that have not yet deposited clinical data sets. As a result, the number and range of Repository biospecimens are substantially greater than those shown in the table of studies currently offering clinical data. In the table of banked biospecimens, we present a tabulation of the different types of biospecimens available from the Repository and the studies that contributed each type of specimen. That table shows that the Repository offers more than 20 different types of biospecimens with over 5 million samples in storage. The most common biospecimens are serum, plasma, urine, DNA and buffy coat, plus the more than 470 000 stool samples collected by The Environmental Determinants of Diabetes in the Young (TEDDY) study.
Biospecimens currently banked at the NIDDK Repository
Use of Repository
Since 2004, the Repository website (http://www.niddkrepository.org) has provided the public with access to details of all studies included in the NIDDK Central Repository, including study summaries, protocols, manuals of operation, data collection forms and lists of publications, available data sets and biospecimens. In addition, the website allows investigators to apply electronically for access to data and biospecimens. Although the Repository website provides an efficient and easily accessible portal for obtaining information on archived studies, Repository staff and statisticians frequently provide scientists with additional information prior to formal requests for data or biospecimens. So, for example, a researcher might send the Repository an e-mail saying: ‘I understand that a subset of patients in the Modification of Diet in Renal Disease (MDRD) study had polycystic kidney disease (PKD). How can I obtain information regarding the number of PKD patients in the MDRD database?’ The Repository has responded to numerous such requests for detailed information. Between 2008 and 2010, an average of 28 such requests were received annually via the ‘Ask the Repository’ link on the Repository’s website. Additional requests were received via the NIDDK telephone help line, the ‘Contact Us’ page of the Repository website, and by e-mails sent directly to Repository staff.
As of 9 March 2011, a total of 188 external requests for archived data sets and 31 external requests for biospecimens either have been approved or are pending. The number of requests has increased over time as the Repository has become better known in relevant scientific communities. In the Repository’s first 2 years of operation (2003–04), there were no approved data set or biospecimen requests; by 2010 requests had increased to an annual rate of 31.
As the table of request frequencies shows, there has been substantial variation in the popularity of data sets and biospecimens from different studies. The most frequently requested data sets involved studies of type 1 and type 2 diabetes. DCCT/EDIC ranks first in popularity, with 42 approved or pending requests for data and biospecimens from this landmark study of type 1 diabetes. The Diabetes Prevention Program for type 2 diabetes ranks second, with 21 approved or pending requests for data sets. Data sets and biospecimens from the Type 1 Diabetes Genetics Consortium (T1DGC; 20 requests) and Genetics of Kidneys in Diabetes (GoKinD; 13 requests) rank third and seventh, respectively. In addition, the Diabetes Prevention Trial of Type 1 Diabetes (DPT-1) ranks ninth (10 requests). These diabetes studies accounted for almost one-half (106 of 219) of the approved or pending requests for Repository data sets and biospecimens.
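The "almost one-half" figure can be checked directly from the counts quoted in the text. The following is a minimal sketch in Python; the variable names are illustrative only, and the overall total is assumed to be the sum of the 188 data set requests and 31 biospecimen requests reported as approved or pending on 9 March 2011.

```python
# Approved or pending requests per diabetes study, as quoted in the text.
diabetes_requests = {
    "DCCT/EDIC": 42,
    "Diabetes Prevention Program": 21,
    "T1DGC": 20,
    "GoKinD": 13,
    "DPT-1": 10,
}

# Overall external requests as of 9 March 2011 (assumed to be the denominator).
data_set_requests = 188
biospecimen_requests = 31
total_requests = data_set_requests + biospecimen_requests

diabetes_total = sum(diabetes_requests.values())
diabetes_share = diabetes_total / total_requests

print(f"{diabetes_total} of {total_requests} requests ({diabetes_share:.1%})")
# prints: 106 of 219 requests (48.4%)
```

The five diabetes studies listed therefore account for 48.4% of all approved or pending external requests.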
Frequency in rank order of approved and pending requests for data sets and biospecimens in NIDDK Data Repository (as of 9 March 2011)
Studies of renal disease were the second most requested category of data sets and biospecimens. These included the MDRD study (19 requests); the African American Study of Kidney Disease and Hypertension (AASK; nine requests); the Hemodialysis Study (HEMO; 14 requests); the Consortium for Radiologic Imaging Studies of PKD (CRISP; 13 requests); the Acute Renal Failure Trial Network (ATN; five requests); and the National Analgesic Nephropathy Study (NANS; two requests). Studies of liver disease and transplantation were the next most requested data sets and biospecimens.
In addition to the aforementioned requests from external researchers, the Repository also supports ancillary research by investigators participating in the original study group or collaborating with them who wish to use archived biospecimens to address research questions beyond the funded scope of the original study. As of 9 March 2011, 113 requests have been approved or are pending to provide biospecimens for such ancillary studies.
Sharing non-renewable resources
While digital data sets can be copied ad infinitum, some biospecimens stored in the Repository are not renewable. This creates unique challenges. In January 2010, NIDDK issued a program announcement (PAR-10-090) that was ‘intended to facilitate equitable and appropriate distribution of biosamples from the NIDDK Central Repositories.’ Investigators requesting nonrenewable biospecimens are required to consult with the Repository to determine whether a sufficient quantity of the samples is available and whether the proposed use of the biospecimens is consistent with the informed consent used in the research study. Investigators seeking nonrenewable biospecimens from the Repository are then required to submit an application describing ‘the background and rationale for request; a list of specific objectives; detailed information about the proposed studies; detailed information about the amount and type of samples needed and documentation from the Repository confirming that samples are available; plans for sample management; a description of follow-up plans.’ Requestors are also required to ‘explain how the proposed research will take advantage of the large amount of associated phenotypic data.’
Maintaining repositories of data and biospecimens is not cheap, but their costs pale in comparison to the costs of original data collection. From 2003 to 2013, NIDDK will spend a total of approximately $73 million for the NIDDK Repositories (1). Costs are highest for archiving biospecimens ($28 million) and genetic samples ($33 million), while data archiving is less expensive ($12 million). The cost of acquiring biosamples has ranged from ~$0.70 to $7 per tube, while the cost of producing DNA, or a cell line and DNA, has ranged from ~$70 to $800. Maintaining these samples in the Repository has cost ~$0.01 per tube per year for biosamples and $10 to $16 per cell line per year.
The cost of the original data collection is, however, much higher. The DCCT-EDIC, for example, has cost more than $200 million since its inception, while the archiving and distribution costs for genetic samples and immortalized cell lines, biospecimens and multiple data sets have been less than $3 million.
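The cost figures above can be cross-checked with simple arithmetic. The sketch below, in Python with illustrative variable names, sums the three Repository budget components and derives an upper bound on DCCT-EDIC archiving costs relative to collection costs. Because the text gives only bounds ("more than $200 million" and "less than $3 million"), the ratio is itself only a bound, not an exact value.

```python
# Repository budget for 2003 to 2013, in millions of US dollars, as quoted in the text.
repository_costs_musd = {
    "biospecimen archiving": 28,
    "genetic samples": 33,
    "data archiving": 12,
}
total_musd = sum(repository_costs_musd.values())

# Share of the total budget attributable to each component.
for component, cost in repository_costs_musd.items():
    print(f"{component}: ${cost} million ({cost / total_musd:.1%} of total)")
print(f"total: ${total_musd} million")

# DCCT-EDIC: the text gives bounds only, so this ratio is an upper bound.
dcct_collection_min_musd = 200  # "more than $200 million"
dcct_archiving_max_musd = 3     # "less than $3 million"
archiving_fraction_upper = dcct_archiving_max_musd / dcct_collection_min_musd
print(f"DCCT-EDIC archiving is below {archiving_fraction_upper:.1%} of collection cost")
```

The three components sum to the stated $73 million, and on the quoted bounds the archiving and distribution costs for DCCT-EDIC amount to less than 1.5% of the cost of collecting the data.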
Expectations for future use
The NIDDK Central Repository was established to improve the scientific yield of NIDDK-funded research by making valuable data and specimens available to the wider scientific community. At present, the Repository is being used by a widening community of researchers, and it is also providing valuable archival services for the original research teams. We expect that the use of the NIDDK Central Repository should increase not only with growing awareness of its resources by the scientific community but also with the issuance of RFAs for research that can effectively use this resource. So, for example, NIDDK solicited grant applications in 2009 to form a multicenter consortium to ‘discover or validate biomarkers for well-defined human chronic kidney diseases (CKD) (RFA-DK-08-015).’ Discovery and testing of candidate biomarkers requires biological samples (tissues, cells, or body fluids) from subjects whose disease status has been well characterized. As the RFA notes, the NIDDK Central Repository can provide the resources needed for such research.