Biospecimens are recognized as critical components of biomedical research, from basic studies to clinical trials and epidemiologic investigations. Biorepositories have existed in various forms for over 150 years, from early small collections in pathology laboratories to modern automated facilities managing millions of samples. As collaborative science has developed, it has been recognized that biospecimens must be of consistent quality. Recent years have seen a proliferation of best practices and the recognition of the field of “biospecimen science.” The future of this field will depend on the development of more evidence-based practices in both the research and clinical settings. As the field matures, educating a new generation of biospecimen/biobanking scientists will be an important need.
Biospecimen science; Biospecimen research; Biospecimen; Best practices; Biobanking; Biorepository
Detection, quantification, and prognosis of environmental exposures in humans have been vastly enhanced by the ability of epidemiologists to collect biospecimens for toxicologic or other laboratory evaluation. Ease of collection and level of invasiveness are commonly cited factors in whether study participants provide biospecimens for research purposes. The use of methodologies for the collection of biospecimens in the home offers promise for improving the validity of health effects linked to environmental exposures while maximizing the number and type of specimens capable of being collected in a timely and cost-effective manner. In this review we examine biospecimens (urine and blood) that have been successfully collected from the home environment. Related issues such as storage and transportation are also examined, as are promising new approaches for collecting less frequently studied biospecimens (including hair follicles, breast milk, semen, and others). Such biospecimens are useful in the monitoring of reproductive development and function.
The increasing trend for incorporation of biological sample collection within clinical trials requires sample collection procedures which are convenient and acceptable for both patients and clinicians. This study investigated the feasibility of using saliva-extracted DNA in comparison to blood-derived DNA, across two genotyping platforms: Applied Biosystems Taqman™ and Illumina Beadchip™ genome-wide arrays.
Patients were recruited from the Pharmacogenetics of Breast Cancer Chemotherapy (PGSNPS) study. Paired blood and saliva samples were collected from 79 study participants. The Oragene DNA Self-Collection kit (DNAgenotek®) was used to collect and extract DNA from saliva. DNA from EDTA blood samples (median volume 8 ml) was extracted by Gen-Probe, Livingstone, UK. DNA yields, standard measures of DNA quality, genotype call rates and genotype concordance between paired, duplicated samples were assessed.
Total DNA yields were lower from saliva (mean 24 μg, range 0.2–52 μg) than from blood (mean 210 μg, range 58–577 μg) and a 2-fold difference remained after adjusting for the volume of biological material collected. Protein contamination and DNA fragmentation measures were greater in saliva DNA. 78/79 saliva samples yielded sufficient DNA for use on Illumina Beadchip arrays and using Taqman assays. Four samples were randomly selected for genotyping in duplicate on the Illumina Beadchip arrays. All samples were genotyped using Taqman assays. DNA quality, as assessed by genotype call rates and genotype concordance between matched pairs of DNA was high (>97%) for each measure in both blood and saliva-derived DNA.
We conclude that DNA from saliva and blood samples is comparable when genotyping using either Taqman assays or genome-wide chip arrays. Saliva sampling has the potential to increase participant recruitment within clinical trials, as well as reducing the resources and organisation required for multicentre sample collection.
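The two quality measures used above, genotype call rate and genotype concordance between paired samples, can be illustrated with a minimal Python sketch. The SNP identifiers and genotype values below are made-up placeholders, not data from the study, and the function names are hypothetical; the study's own pipeline is not described in the abstract.

```python
from typing import Dict, Optional

# SNP identifier -> genotype call; None marks a failed call ("no call").
Genotypes = Dict[str, Optional[str]]


def normalise(genotype: str) -> str:
    """Treat 'AG' and 'GA' as the same unphased genotype."""
    return "".join(sorted(genotype))


def call_rate(calls: Genotypes) -> float:
    """Fraction of attempted SNPs that produced a genotype call."""
    if not calls:
        raise ValueError("no SNPs attempted")
    called = sum(1 for g in calls.values() if g is not None)
    return called / len(calls)


def concordance(sample_a: Genotypes, sample_b: Genotypes) -> float:
    """Fraction of SNPs called in both samples whose genotypes agree."""
    shared = [
        snp for snp in sample_a
        if sample_a[snp] is not None and sample_b.get(snp) is not None
    ]
    if not shared:
        raise ValueError("no SNPs called in both samples")
    matches = sum(
        1 for snp in shared
        if normalise(sample_a[snp]) == normalise(sample_b[snp])
    )
    return matches / len(shared)


# Hypothetical paired calls for one participant (illustrative only).
blood_calls = {"snp1": "AA", "snp2": "AG", "snp3": "GG", "snp4": "CT"}
saliva_calls = {"snp1": "AA", "snp2": "GA", "snp3": None, "snp4": "CC"}

print(f"blood call rate:  {call_rate(blood_calls):.2f}")
print(f"saliva call rate: {call_rate(saliva_calls):.2f}")
print(f"concordance:      {concordance(blood_calls, saliva_calls):.2f}")
```

Concordance is computed only over SNPs called in both samples, so a failed call lowers the call rate without counting as a discordant genotype. Other conventions exist and would give different figures.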
Human biological specimens (biospecimens) are increasingly important for research that aims to advance human health. Yet, despite significant proliferation in specimen-based research and discoveries during the past decade, research remains challenged by the inequitable access to high quality biospecimens that are collected under rigorous ethical standards. This is primarily caused by the complex level of control and ownership exerted by the myriad of stakeholders involved in the biospecimen research process. This article discusses the ethical model of custodianship as a framework for biospecimen-based research to promote fair research access and resolve issues of control and potential conflicts between biobanks, investigators, human research participants (human subjects), and sponsors. Custodianship is the caretaking obligation for biospecimens from initial collection to final dissemination of research findings. It endorses key practices and operating principles for responsible oversight of biospecimens collected for research. Embracing the custodial model would ensure transparency in research, fairness to human research participants, and shared accountability among all stakeholders involved in biospecimen-based research.
Various options exist for collecting biospecimens and biomarkers from cohort study participants, and these have important logistic, resource and scientific implications. Evidence on how different collection methods affect participation and data quality is lacking. This parallel-design randomised trial, the Link-Up Study, involved blood sample donation and other data collection among participants in an existing cohort study, The 45 and Up Study. It aimed to investigate the relation of fasting status, reminder letters and data collection site to response rates, data quality and biospecimen yield.
Individuals aged 45 and over participating in The 45 and Up Study and living ≤20 km from central Wagga Wagga, NSW (regional area) or ≤10 km from central Parramatta, NSW (urban area) (n = 2340) were randomised, stratified by area of residence, to be invited to give a blood sample and additional data by attending either a clinic established specifically for the trial, with an appointment time (“dedicated clinic”, n = 1336) or an existing local commercial pathology centre (n = 1004). Within dedicated clinic groups, participants were randomised into fasting (n = 668) or non-fasting (n = 668) and, at the Parramatta pathology centre site, reminder letter after two weeks (n = 336) or no reminder (n = 334).
Overall, 33% (762/2340) of invitees took part in the Link-Up Study; 41% (410/1002) among regional and 26% (352/1338) among urban-area residents (p < 0.0001). At the dedicated clinics, response rates were 38% (257/668) not fasting and 38% fasting (257/668) (participation rate ratio (RR) = 1.00, 95%CI 0.91-1.08, p = 0.98). The response rate was 22% among individuals randomised to attend the Parramatta pathology centre without a reminder and 23% among those sent a reminder letter (RR = 1.01, 0.93-1.09, p = 0.74). In total, the response rate was 38% (514/1336) at the dedicated clinics and 25% (248/1004) at the pathology centres (RR = 0.67, 0.56-0.78, p < 0.01); measures of height, weight and systolic and diastolic blood pressure did not vary materially between these groups, nor did the median number of aliquots of plasma, buffy coat and red cells collected.
Among cohort study participants, response rates for an additional study involving biospecimen collection, but not data quality or average biospecimen yield, were considerably higher at dedicated clinics than at existing commercial pathology sites.
Biobank; Response rate; Fasting status; Reminder; Biospecimens
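The response rates and participation rate ratios reported for the Link-Up Study can be recomputed in crude form from the counts given above. The sketch below is a minimal illustration that assumes a simple unstratified ratio of proportions with a Wald interval on the log scale. The published estimates may come from a stratified or otherwise adjusted analysis, so the crude values need not match them exactly. The function name is hypothetical.

```python
import math
from typing import Tuple


def rate_ratio(events_a: int, n_a: int, events_b: int, n_b: int,
               z: float = 1.96) -> Tuple[float, float, float]:
    """Crude ratio of two proportions (group a relative to group b).

    Returns (ratio, lower, upper), where the interval is a Wald interval
    computed on the log scale. This is an assumption for illustration,
    not necessarily the method used in the trial.
    """
    ratio = (events_a / n_a) / (events_b / n_b)
    se_log = math.sqrt(1 / events_a - 1 / n_a + 1 / events_b - 1 / n_b)
    lower = math.exp(math.log(ratio) - z * se_log)
    upper = math.exp(math.log(ratio) + z * se_log)
    return ratio, lower, upper


# Counts taken from the abstract above.
overall_rate = 762 / 2340                          # all invitees
fasting = rate_ratio(257, 668, 257, 668)           # fasting vs non-fasting
pathology = rate_ratio(248, 1004, 514, 1336)       # pathology centre vs dedicated clinic

print(f"overall response rate: {overall_rate:.1%}")
print("fasting vs non-fasting: RR = %.2f (%.2f-%.2f)" % fasting)
print("pathology vs dedicated: RR = %.2f (%.2f-%.2f)" % pathology)
```

A ratio below 1 in the last comparison means lower participation at the commercial pathology centres than at the dedicated clinics, which is the direction of the reported result.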
Human biospecimens are subject to a number of different collection, processing, and storage factors that can significantly alter their molecular composition and consistency. These biospecimen preanalytical factors, in turn, influence experimental outcomes and the ability to reproduce scientific results. Currently, the extent and type of information specific to the biospecimen preanalytical conditions reported in scientific publications and regulatory submissions varies widely. To improve the quality of research utilizing human tissues, it is critical that information regarding the handling of biospecimens be reported in a thorough, accurate, and standardized manner. The Biospecimen Reporting for Improved Study Quality recommendations outlined herein are intended to apply to any study in which human biospecimens are used. The purpose of reporting these details is to supply others, from researchers to regulators, with more consistent and standardized information to better evaluate, interpret, compare, and reproduce the experimental results. The Biospecimen Reporting for Improved Study Quality guidelines are proposed as an important and timely resource tool to strengthen communication and publications around biospecimen-related research and help reassure patient contributors and the advocacy community that the contributions are valued and respected.
The domestic dog presents an attractive model system for the study of the genetic basis of disease. The development of resources such as the canine genome sequence and SNP genotyping platforms has allowed for the implementation of canine genetic studies. Successful implementation of such studies depends not only on the quality of individual DNA samples, but also on the number of samples obtained. The latter can be maximized using a non-invasive DNA collection method that can increase study participation. We compared the DNA yield and quality obtained from blood and buccal swabs to those obtained using a non-invasive saliva collection kit (Oragene®•ANIMAL kit). We also assessed the success rate of PCR amplification and genotyping accuracy of DNA isolated using these collection methods.
Comparison of DNA yields from matched saliva, blood and buccal swab samples showed that yields from saliva were significantly higher than those from blood (p = 0.0198) or buccal swabs (p = 0.0008). Electrophoretic analysis revealed that blood and saliva produced higher quality DNA than buccal swabs. In addition, a 1.1-kb PCR fragment was successfully amplified using the paired DNA samples and genotyping by PCR-RFLP yielded identical results.
We demonstrate that DNA yields from canine saliva are higher than those from blood or buccal swabs. The quality of DNA extracted from saliva is sufficient for successful amplification of a 1.1-kb fragment and for accurate SNP genotyping by PCR-RFLP. We conclude that saliva presents a non-invasive alternative source of high quantities of canine genomic DNA suitable for genotyping studies.
Human biospecimens are subject to a number of different collection, processing, and storage factors that can significantly alter their molecular composition and consistency. These biospecimen preanalytical factors, in turn, influence experimental outcomes and the ability to reproduce scientific results. Currently, the extent and type of information specific to the biospecimen preanalytical conditions reported in scientific publications and regulatory submissions varies widely. To improve the quality of research utilizing human tissues, it is critical that information regarding the handling of biospecimens be reported in a thorough, accurate, and standardized manner. The Biospecimen Reporting for Improved Study Quality (BRISQ) recommendations outlined herein are intended to apply to any study in which human biospecimens are used. The purpose of reporting these details is to supply others, from researchers to regulators, with more consistent and standardized information to better evaluate, interpret, compare, and reproduce the experimental results. The BRISQ guidelines are proposed as an important and timely resource tool to strengthen communication and publications around biospecimen-related research and help reassure patient contributors and the advocacy community that the contributions are valued and respected.
Several challenges face the development and operation of a biospecimen bank linked to clinical information, a critical component of any effective translational research program. Melanoma adds particular complexity and difficulty to such an endeavor considering the unique characteristics of this malignancy. We present here a review of biospecimen banks and our experience in establishing a multi-disciplinary, prospective, integrated clinicopathological-biospecimen database in melanoma. The Interdisciplinary Melanoma Cooperative Group (IMCG), a prospective clinicopathological and biospecimen database, was established at the New York University (NYU) Langone Medical Center. With patients' informed consent, biospecimens from within and outside NYU, clinicopathological data, and follow-up information are collected using developed protocols. Information pertaining to biospecimens is recorded in 35 fields, and clinicopathological information is recorded in 371 fields within 5 modules in a virtual network system. Investigators conducting research utilizing the IMCG biospecimen resource are blinded to clinicopathological information, and molecular data generated using biospecimens are linked independently with clinicopathological data by biostatistics investigators. This translational research enterprise acts as a valuable resource to efficiently translate laboratory discoveries to the clinic.
Melanoma; clinical database; specimen bank; translational medicine; model
Medical research to improve health care faces a major problem in the relatively limited availability of adequately annotated and collected biospecimens. This limitation is creating a growing gap between the pace of scientific advances and successful exploitation of this knowledge. Biobanks are an important conduit for transfer of biospecimens (tissues, blood, body fluids) and related health data to research. They have evolved outside of the historical source of tissue biospecimens, clinical pathology archives. Research biobanks have developed advanced standards, protocols, databases, and mechanisms to interface with researchers seeking biospecimens. However, biobanks are often limited in their capacity and ability to ensure quality in the face of increasing demand. Our strategy to enhance both capacity and quality in research biobanking is to create a new framework that repatriates the activity of biospecimen accrual for biobanks to clinical pathology.
The British Columbia (BC) BioLibrary is a framework to maximize the accrual of high-quality, annotated biospecimens into biobanks. The BC BioLibrary design primarily encompasses: 1) specialized biospecimen collection units embedded within clinical pathology and linked to a biospecimen distribution system that serves biobanks; 2) a systematic process to connect potential donors with biobanks, and to connect biobanks with consented biospecimens; and 3) interdisciplinary governance and oversight informed by public opinion.
The BC BioLibrary has been embraced by biobanking leaders and translational researchers throughout BC, across multiple health authorities, institutions, and disciplines. An initial pilot network of three Biospecimen Collection Units has been successfully established. In addition, two public deliberation events have been held to obtain input from the public on the BioLibrary and on issues including consent, collection of biospecimens and governance.
The BC BioLibrary framework addresses common issues for clinical pathology, biobanking, and translational research across multiple institutions and clinical and research domains. We anticipate that our framework will lead to enhanced biospecimen accrual capacity and quality, reduced competition between biobanks, and a transparent process for donors that enhances public trust in biobanking.
Biospecimen quality is affected by a number of preanalytical factors that may or may not be obvious to the investigator. These factors are introduced through multiple biospecimen collection, processing and storage procedures which can differ dramatically within and between medical institutions and biorepositories. Biospecimen Science is the emerging field of study that is attempting to quantify and control such variability. A variety of efforts are under way around the world to establish research programs, evidence-based biospecimen protocols, and standards to improve the overall quality of biospecimens for research.
Biospecimen science; Biospecimen research; Biospecimen; Best practices
Large epidemiological studies in DNA biobanks have increasingly used less invasive methods for obtaining DNA samples, such as saliva collection. Although lower amounts of DNA are obtained as compared with blood collection, this method has been widely used because of its simpler logistics and increased response rate. The present study aimed to verify whether a storage time of 8 months decreases the quality of DNA from collected samples.
Saliva samples were collected with an Oragene™ DNA Self-Collection Kit from 4,110 subjects aged 14–15 years. The samples were processed in two aliquots with an 8-month interval between them. Quantitative and qualitative evaluations were carried out in 20% of the samples by spectrophotometry and genotyping. Descriptive analyses and paired t-tests were performed.
The mean volume of saliva collected was 2.2 mL per subject, yielding on average 184.8 μg DNA per kit. Most samples showed a Ratio of OD differences (RAT) between 1.6 and 1.8 in the qualitative evaluation. The evaluation of DNA quality by TaqMan®, High Resolution Melting (HRM), and restriction fragment length polymorphism-PCR (RFLP-PCR) showed a rate of success of up to 98% of the samples. The sample storage time did not reduce either the quantity or quality of DNA extracted with the Oragene kit.
The study results showed that a storage period of 8 months at room temperature did not reduce the quality of the DNA obtained. In addition, the use of the Oragene kit during fieldwork in large population-based studies allows for DNA of high quantity and high quality.
Identifying discriminatory human salivary RNA biomarkers reflective of disease in a low-cost non-invasive screening assay is crucial to salivary diagnostics. Recent studies have reported both mRNA and microRNA (miRNA) in saliva, but little information has been documented on the quality and yield of RNA collected. Therefore, the aim of the present study was to develop an improved RNA isolation method from saliva and to identify major miRNA species in human whole saliva.
RNA samples were isolated from normal human saliva using a combined protocol based on the Oragene®•RNA collection kit and the mirVana™ miRNA isolation kit in tandem. RNA samples were analyzed for quality and subjected to miRNA array analysis.
RNA samples isolated from twenty healthy donors ranged from 2.59–29.4 μg/ml saliva, with OD260/280nm ratios of 1.92–2.16. RNA yield and concentration of saliva samples were observed to be stable over 48 hours at room temperature. Analysis of total salivary RNA isolated from these twenty donors showed no statistically significant difference between sexes; however, the presence of high-, medium-, and low-yield salivary RNA producers was detected. MiRNA array analysis of salivary RNA detected five abundantly expressed miRNAs, miR-223, miR-191, miR-16, miR-203, and miR-24, that were similarly described in other published reports. Additionally, many previously undetected miRNAs were also identified.
High quality miRNAs can be isolated from saliva using available commercial kits, and in future studies, the availability of this isolation protocol may allow specific changes in their levels to be measured accurately in various relevant diseases.
biomarkers; gene expression; salivary RNA
The National Institute of Diabetes and Digestive and Kidney Diseases has established central repositories for the collection of DNA, biological samples, and clinical data to be catalogued at a single site. Here we present an overview of the site which stores the clinical data and links to biospecimens.
The NIDDK Data repository is a web-enabled resource cataloguing clinical trial data and supporting information from NIDDK supported studies. The Data Repository allows for the co-location of multiple electronic datasets that were created as part of clinical investigations. The Data Repository does not serve the role of a Data Coordinating Center, but rather as a warehouse for the clinical findings once the trials have been completed. Because both biological and genetic samples are collected from many of the studies, a data management system for the cataloguing and retrieval of samples was developed.
The Data Repository provides a unique resource for researchers in the clinical areas supported by NIDDK. In addition to providing a warehouse of data, Data Repository staff work with the users to educate them on the datasets as well as assist them in the acquisition of multiple data sets for cross-study analysis. Unlike the majority of biological databases, the Data Repository acts both as a catalogue for data, biosamples, and genetic materials and as a central processing point for the requests for all biospecimens. Due to regulations on the use of clinical data, the ultimate release of that data is governed under NIDDK data release policies. The Data Repository serves as the conduit for such requests.
Technical advances following the Human Genome Project revealed that high-quality and -quantity DNA may be obtained from whole saliva samples. However, usability of previously collected samples and the effects of environmental conditions on the samples during collection have not been assessed in detail. In five studies we document the effects of sample volume, handling and storage conditions, type of collection device, and oral sampling location, on quantity, quality, and genetic assessment of DNA extracted from cells present in saliva.
Saliva samples were collected from ten adults in each study. Saliva volumes from 0.10-1.0 ml, different saliva collection devices, sampling locations in the mouth, room temperature storage, and multiple freeze-thaw cycles were tested. One representative single nucleotide polymorphism (SNP) in the catechol-O-methyltransferase gene (COMT rs4680) and one representative variable number of tandem repeats (VNTR) in the serotonin transporter gene (5-HTTLPR: serotonin transporter linked polymorphic region) were selected for genetic analyses.
The smallest tested whole saliva volume of 0.10 ml yielded, on average, 1.43 ± 0.77 μg DNA and gave accurate genotype calls in both genetic analyses. The usage of collection devices reduced the amount of DNA extracted from the saliva filtrates compared to the whole saliva sample, as 54-92% of the DNA was retained on the device. An "adhered cell" extraction enabled recovery of this DNA and provided good quality and quantity DNA. The DNA from both the saliva filtrates and the adhered cell recovery provided accurate genotype calls. The effects of storage at room temperature (up to 5 days), repeated freeze-thaw cycles (up to 6 cycles), and oral sampling location on DNA extraction and on genetic analysis from saliva were negligible.
Whole saliva samples with volumes of at least 0.10 ml were sufficient to extract good quality and quantity DNA. Using 10 ng of DNA per genotyping reaction, the obtained samples can be used for more than one hundred candidate gene assays. When saliva is collected with an absorbent device, most of the nucleic acid content remains in the device; it is therefore advisable to collect the device separately for later genetic analyses.
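The claim of more than one hundred assays follows from simple arithmetic on the figures above: a mean yield of 1.43 μg is 1430 ng, which at 10 ng per reaction covers 143 reactions. A minimal Python sketch of that calculation follows. The function name is hypothetical, and the calculation deliberately ignores pipetting losses, dead volume and repeat reactions, which would lower the practical number.

```python
def assays_supported(dna_yield_ug: float, dna_per_reaction_ng: int) -> int:
    """Number of whole genotyping reactions a DNA yield can supply.

    Idealised: assumes every nanogram of extracted DNA is usable.
    """
    if dna_per_reaction_ng <= 0:
        raise ValueError("DNA per reaction must be positive")
    total_ng = round(dna_yield_ug * 1000)  # micrograms -> whole nanograms
    return total_ng // dna_per_reaction_ng


# Mean yield from 0.10 ml of whole saliva and input per reaction, as reported above.
mean_yield_ug = 1.43
per_reaction_ng = 10

print(assays_supported(mean_yield_ug, per_reaction_ng))
```

Because the reported yield varies considerably between samples (± 0.77 μg), the same function can be applied to a lower-end yield to check how many assays a poor sample would still support.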
The success of basic molecular research using biospecimens strongly depends on the quality of the specimen. In this study, we evaluated the effects of delayed freezing time on the stability of DNA and RNA in fresh frozen tissue from patients with colorectal cancer.
Tissues were frozen at 10, 30, 60, and 90 minutes after extirpation of colorectal cancer in 20 cases. Absorbance ratio of 260 to 280 nm (A260/A280) and agarose gel electrophoresis were evaluated. In addition, the RNA integrity number (RIN) was assayed for the analysis of the RNA integrity.
Regardless of delayed freezing time, all DNA and RNA samples revealed A260/A280 ratios of more than 1.9, and all DNA samples showed a discrete, high-molecular-weight band on agarose gel electrophoresis. The RINs were 7.53 ± 2.04, 6.70 ± 1.88, 6.47 ± 2.58, and 4.22 ± 2.34 at 10, 30, 60, and 90 minutes, respectively. Though the concentration of RNA was not affected by delayed freezing, the RNA integrity was decreased with increasing delayed freezing time.
According to the RIN results, we recommend that the collection of colorectal cancer tissue should be done within 10 minutes for studies requiring RNA of high quality and within 30 minutes for usual RNA studies.
Colorectal neoplasms; Tissue banks; DNA; RNA; Quality control
Epidemiological studies may require noninvasive methods for off-site DNA collection. We compared the DNA yield and quality obtained using a whole-saliva collection device (Oragene™ DNA collection kit) to those from three established noninvasive methods (cytobrush, foam swab, and oral rinse). Each method was tested on 17 adult volunteers from our center, using a random crossover collection design and analyzed using repeated-measures statistics. DNA yield and quality were assessed via gel electrophoresis, spectrophotometry, and polymerase chain reaction (PCR) amplification rate. The whole-saliva method provided a significantly greater DNA yield (mean ± SD = 154.9 ± 103.05 μg, median = 181.88) than the other methods (oral rinse = 54.74 ± 41.72 μg, 36.56; swab = 11.44 ± 7.39 μg, 10.72; cytobrush = 12.66 ± 6.19 μg, 13.22) (all pairwise P < 0.05). Oral-rinse and whole-saliva samples provided the best DNA quality, whereas cytobrush and swab samples provided poorer quality DNA, as shown by lower OD260/OD280 and OD260/OD230 ratios. We conclude that both a 10-ml oral-rinse sample and 2-ml whole-saliva sample provide sufficient DNA quantity and better quality DNA for genetic epidemiological studies than do the commonly used buccal swab and brush techniques.
Large multi-center clinical studies often involve the collection and analysis of biological samples. It is necessary to ensure timely, complete and accurate recording of analytical results and associated phenotypic and clinical information. The TRIBE-AKI Consortium (http://www.yale.edu/tribeaki) supports a network of multiple related studies and a sample biorepository, thus allowing researchers to take advantage of a larger specimen collection than they might have at an individual institution.
We describe a biospecimen data management system (BDMS) that supports TRIBE-AKI and is intended for multi-center collaborative clinical studies that involve shipment of biospecimens between sites. This system works in conjunction with a clinical research information system (CRIS) that stores the clinical data associated with the biospecimens, along with other patient-related parameters. Inter-operation between the two systems is mediated by an interactively invoked suite of Web Services, as well as by batch code. We discuss various challenges involved in integration.
Our experience indicates that an approach that emphasizes inter-operability is reasonably optimal in allowing each system to be utilized for the tasks for which it is best suited.
Biospecimen data management; clinical research information systems; multi-center clinical studies; biorepositories
Genome-wide association scans for genetic loci underlying both Mendelian and complex traits are increasingly common in canine genetics research. However, the demand for high-quality DNA for use on such platforms creates challenges for traditional blood sample ascertainment. Though the use of saliva as a means of collecting DNA is common in human studies, alternate means of DNA collection for canine research have instead been limited to buccal swabs, from which dog DNA is of insufficient quality and yield for use on most high-throughput array-based systems. We thus investigated an animal-based saliva collection method for ease of use and quality of DNA obtained and tested the performance of saliva-extracted canine DNA on genome-wide genotyping arrays.
Overall, we found that saliva sample collection using this method was efficient. Extractions yielded high concentrations (∼125 ng/μl) of high-quality DNA that performed as well as blood-extracted DNA on the Illumina Infinium canine genotyping platform, with average call rates >99%. Concordance rates between genotype calls of saliva- versus blood-extracted DNA samples from the same individual were also >99%. Additionally, in silico calling of copy number variants was successfully performed and verified by PCR.
Our findings validate the use of saliva-obtained samples for genome-wide association studies in canines, highlighting an alternative means of collecting samples in a convenient and non-invasive manner.
The National Institute of Diabetes and Digestive and Kidney Diseases (NIDDK) Central Repository makes data and biospecimens from NIDDK-funded research available to the broader scientific community. It thereby facilitates: the testing of new hypotheses without new data or biospecimen collection; pooling data across several studies to increase statistical power; and informative genetic analyses using the Repository’s well-curated phenotypic data. This article describes the initial database plan for the Repository and its revision using a simpler model. Among the lessons learned were the trade-offs between the complexity of a database design and the costs in time and money of implementation; the importance of integrating consent documents into the basic design; the crucial need for linkage files that associate biospecimen IDs with the masked subject IDs used in deposited data sets; and the importance of standardized procedures to test the integrity of data sets prior to distribution. The Repository is currently tracking 111 ongoing NIDDK-funded studies, many of which include genotype data, and it houses over 5 million biospecimens of more than 25 types including serum, plasma, stool, urine, DNA, red blood cells, buffy coat and tissue. Repository resources have supported a range of biochemical, clinical, statistical and genetic research (188 external requests for clinical data and 31 for biospecimens have been approved or are pending). Genetic research has included GWAS, validation studies, development of methods to improve statistical power of GWAS and testing of new statistical methods for genetic research.
We anticipate that the future impact of the Repository’s resources on biomedical research will be enhanced by (i) cross-listing of Repository biospecimens in additional searchable databases and biobank catalogs; (ii) ongoing deployment of new applications for querying the contents of the Repository; and (iii) increased harmonization of procedures, data collection strategies, questionnaires etc., both across research studies and within the vocabularies used by different repositories.
Database URL: http://www.niddkrepository.org
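The role of a linkage file, which associates biospecimen IDs with the masked subject IDs used in deposited data sets, can be illustrated with a small Python sketch. Everything here is hypothetical: the identifier formats, the function names and the integrity check are illustrative assumptions and do not describe the Repository's actual schema or procedures.

```python
from typing import Dict, List, Set


def specimens_by_subject(linkage: Dict[str, str]) -> Dict[str, List[str]]:
    """Group biospecimen IDs under the masked subject ID they link to."""
    grouped: Dict[str, List[str]] = {}
    for specimen_id, subject_id in linkage.items():
        grouped.setdefault(subject_id, []).append(specimen_id)
    return grouped


def unlinked_specimens(linkage: Dict[str, str],
                       dataset_subjects: Set[str]) -> List[str]:
    """Integrity check: specimens whose masked subject ID is absent
    from the deposited data set and so cannot be annotated."""
    return sorted(
        specimen_id
        for specimen_id, subject_id in linkage.items()
        if subject_id not in dataset_subjects
    )


# Hypothetical linkage file: biospecimen ID -> masked subject ID.
linkage = {
    "SPEC-001": "M-17",
    "SPEC-002": "M-17",
    "SPEC-003": "M-42",
    "SPEC-004": "M-99",
}
# Hypothetical masked subject IDs present in a deposited data set.
dataset_subjects = {"M-17", "M-42"}

print(specimens_by_subject(linkage))
print(unlinked_specimens(linkage, dataset_subjects))
```

Running a check of this kind before distribution would flag any specimen that cannot be joined to phenotypic data, which is one simple form the integrity testing mentioned above could take.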
DNA extraction from blood and genotyping for candidate single nucleotide polymorphisms (SNP) is now an important part of almost all molecular epidemiologic studies. However, in many studies, the amount of blood sample is limited or only serum is available. We conducted several pilot studies to identify methods for DNA extraction and high-throughput SNP genotyping of both white blood cell (WBC) and serum DNA that can be done centrally and reliably for large numbers of samples.
We used biospecimens from The Prostate Cancer Prevention Trial (PCPT), a phase III, double-blind, placebo-controlled trial that tested the efficacy of finasteride for the primary prevention of prostate cancer. DNA was extracted from WBCs, from serum, and also from serum after organic solvent extraction for analysis of hormones. We also conducted blinded high-throughput genotyping in 3 laboratories to assess feasibility and reliability of results with differing methodologies using DNA from WBCs and from serum.
Genotyping of DNA extracted from WBCs resulted in highly reliable, reproducible results across laboratories using different genotyping platforms. However, genotyping with DNA extracted from serum did not provide reliable data using high-throughput multiplex approaches such as Sequenom (hME and iPLEX) and Applied Biosystems SNPlex, but was successful using Taqman.
Based upon the results of these pilot studies, we conclude that DNA obtained from serum must be used judiciously, and that genotyping using multiplex methods is not suitable for serum DNA.
Prostate cancer prevention; molecular epidemiology; genotyping; serum DNA extraction; multiplex methods
Variability of plasma sample collection and of proteomics technology platforms has been detrimental to generation of large proteomic profile datasets from human biospecimens.
We carried out a clinical trial-like protocol to standardize collection of plasma from 204 healthy and 216 breast cancer patient volunteers. The breast cancer patients provided follow-up samples at 3-month intervals. We generated proteomics profiles from these samples with a stable and reproducible platform for differential proteomics that employs a highly consistent nanofabricated ChipCube™ chromatography system for peptide detection and quantification with fast, single dimension mass spectrometry (LC-MS). Protein identification is achieved with subsequent LC-MS/MS analysis employing the same ChipCube™ chromatography system.
With this consistent platform, over 800 LC-MS plasma proteomic profiles from prospectively collected samples of 420 individuals were obtained. Using a web-based data analysis pipeline for LC-MS profiling data, analyses of all peptide peaks from these plasma LC-MS profiles reveal an average coefficient of variability of less than 15%. Protein identification of peptide peaks of interest has been achieved with subsequent LC-MS/MS analyses and by referring to a spectral library created from about 150 discrete LC-MS/MS runs. Verification of peptide quantity and identity is demonstrated with several Multiple Reaction Monitoring analyses. These plasma proteomic profiles are publicly available through ProteomeCommons.
From a large prospective cohort of healthy and breast cancer patient volunteers and using a nanofabricated chromatography system, a consistent LC-MS proteomics dataset has been generated that includes more than 800 discrete human plasma profiles. This large proteomics dataset provides an important resource in support of breast cancer biomarker discovery and validation efforts.
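As an illustration of the "average coefficient of variability" figure quoted above, the following sketch computes a per-peak coefficient of variation (sample standard deviation divided by mean, as a percentage) and averages it over peaks. This definition is an assumption; the abstract does not state how the value was computed or across which runs. The peak names and intensity values are hypothetical.

```python
from statistics import mean, stdev


def coefficient_of_variation(values):
    """Return the sample standard deviation as a percentage of the mean."""
    centre = mean(values)
    if centre == 0:
        raise ValueError("coefficient of variation is undefined for a zero mean")
    return 100.0 * stdev(values) / centre


# Hypothetical intensities of two peptide peaks across three replicate runs.
peak_intensities = {
    "peak_1": [100.0, 110.0, 90.0],
    "peak_2": [200.0, 210.0, 190.0],
}

# One CV per peak, then the average over all peaks.
cv_by_peak = {
    name: coefficient_of_variation(values)
    for name, values in peak_intensities.items()
}
average_cv = mean(list(cv_by_peak.values()))
```

With these made-up values the two peaks have CVs of 10% and 5%, so the average is 7.5%; a platform would meet the threshold quoted above if the same calculation over its real peaks stayed below 15%.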
Evaluating biomarkers in epidemiological studies can be expensive and time consuming. Many investigators use techniques such as random sampling or pooling biospecimens in order to cut costs and save time on experiments. Commonly, analyses based on pooled data are strongly restricted by distributional assumptions that are challenging to validate because of the pooled biospecimens. Random sampling provides data that can be easily analyzed. However, random sampling methods are not optimal cost-efficient designs for estimating means. We propose and examine a cost-efficient hybrid design that involves taking a sample of both pooled and unpooled data in an optimal proportion in order to efficiently estimate the unknown parameters of the biomarker distribution. In addition, we find that this design can be utilized to estimate and account for different types of measurement and pooling error, without the need to collect validation data or repeated measurements. We show an example where application of the hybrid design leads to minimization of a given loss function based on variances of the estimators of the unknown parameters. Monte Carlo simulation and biomarker data from a study on coronary heart disease are used to demonstrate the proposed methodology.
Maximum likelihood; Measurement error; Pooling; Random sampling; Receiver operating characteristics; Sampling design
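The idea of mixing pooled and unpooled assays can be sketched with a small Monte Carlo simulation. This is a simplified illustration, not the authors' estimator: it assumes a normally distributed biomarker, pools that measure the exact average of their members, and no measurement or pooling error. The function names and all parameter values are hypothetical.

```python
import random
from statistics import mean, pvariance


def simulate_assays(mu, sigma, n_unpooled, n_pools, pool_size, rng):
    """Draw unpooled measurements and pooled measurements (pool averages)."""
    unpooled = [rng.gauss(mu, sigma) for _ in range(n_unpooled)]
    pooled = [
        mean([rng.gauss(mu, sigma) for _ in range(pool_size)])
        for _ in range(n_pools)
    ]
    return unpooled, pooled


def estimate(unpooled, pooled, pool_size):
    """Weighted estimates of the mean and variance from both kinds of assay.

    A pool average has variance sigma^2 / pool_size, so each pooled assay
    receives weight pool_size relative to an unpooled assay.
    """
    n_individuals = len(unpooled) + pool_size * len(pooled)
    mu_hat = (sum(unpooled) + pool_size * sum(pooled)) / n_individuals
    weighted_rss = sum((x - mu_hat) ** 2 for x in unpooled) + pool_size * sum(
        (z - mu_hat) ** 2 for z in pooled
    )
    n_assays = len(unpooled) + len(pooled)
    sigma2_hat = weighted_rss / (n_assays - 1)
    return mu_hat, sigma2_hat


def mean_estimator_variance(n_unpooled, n_pools, pool_size, n_sim=2000, seed=1):
    """Monte Carlo variance of the mean estimator for a given design."""
    rng = random.Random(seed)
    estimates = []
    for _ in range(n_sim):
        unpooled, pooled = simulate_assays(
            5.0, 1.0, n_unpooled, n_pools, pool_size, rng
        )
        estimates.append(estimate(unpooled, pooled, pool_size)[0])
    return pvariance(estimates)


# Both designs use 20 assays: all unpooled versus a 10 + 10 hybrid.
var_random = mean_estimator_variance(n_unpooled=20, n_pools=0, pool_size=4)
var_hybrid = mean_estimator_variance(n_unpooled=10, n_pools=10, pool_size=4)
```

Under these assumptions the theoretical variances of the mean estimator are 1/20 = 0.05 for the unpooled design and 1/50 = 0.02 for the hybrid, because the hybrid's 20 assays cover 50 individuals. The sketch does not model the measurement and pooling errors discussed above, nor does it search for the optimal proportion of pooled to unpooled assays.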
We generated extensive transcriptional and proteomic profiles from a Her2-driven mouse model of breast cancer that closely recapitulates human breast cancer. This report makes these data publicly available in raw and processed forms, as a resource to the community. Importantly, we previously made biospecimens from this same mouse model freely available through a sample repository, so researchers can obtain samples to test biological hypotheses without needing to breed animals and collect biospecimens.
Twelve datasets are available, encompassing 841 LC-MS/MS experiments (plasma and tissues) and 255 microarray analyses of multiple tissues (thymus, spleen, liver, blood cells, and breast). Cases and controls were rigorously paired to avoid bias.
In total, 18,880 unique peptides were identified (PeptideProphet peptide error rate ≤1%), with 3884 and 1659 non-redundant protein groups identified in plasma and tissue datasets, respectively. Sixty-one of these protein groups overlapped between cancer plasma and cancer tissue.
Conclusions and clinical relevance
These data are of use for advancing our understanding of cancer biology, for developing software and quality control tools, for investigating analytical variation in MS/MS data, and for selecting proteotypic peptides for MRM-MS. The availability of these datasets will contribute positively to clinical proteomics.
Breast cancer; Her2; mouse; proteome; transcriptome
The Kathleen Cuningham Foundation Consortium for Research into Familial Breast Cancer (kConFab) is a multidisciplinary, collaborative framework for the investigation of familial breast cancer. Based in Australia, the primary aim of kConFab is to facilitate high-quality research by amassing a large and comprehensive resource of epidemiological and clinical data with biospecimens from individuals at high risk of breast and/or ovarian cancer, and from their close relatives.
Epidemiological, family history and lifestyle data, as well as biospecimens, are collected from multiple-case breast cancer families ascertained through family cancer clinics in Australia and New Zealand. We used the Tyrer-Cuzick algorithms to assess the prospective risk of breast cancer in women in the kConFab cohort who were unaffected with breast cancer at the time of enrolment in the study.
Of kConFab's first 822 families, 518 families had multiple cases of female breast cancer alone, 239 had cases of female breast and ovarian cancer, 37 had cases of female and male breast cancer, and 14 had ovarian cancer as well as male and female breast cancer. Data are currently held for 11,422 people and germline DNAs for 7,389. Among the 812 families with at least one germline sample collected, the mean number of germline DNA samples collected per family is nine. Of the 747 families that have undergone some form of mutation screening, 229 (31%) carry a pathogenic or splice-site mutation in BRCA1 or BRCA2. Germline DNAs and data are stored from 773 proven carriers of BRCA1 or BRCA2 mutations. kConFab's fresh tissue bank includes 253 specimens of breast or ovarian tissue – both normal and malignant – including 126 from carriers of BRCA1 or BRCA2 mutations.
These kConFab resources are available to researchers anywhere in the world, who may apply to kConFab for biospecimens and data for use in ethically approved, peer-reviewed projects. A high calculated risk from the Tyrer-Cuzick algorithms correlated closely with the subsequent occurrence of breast cancer in BRCA1 and BRCA2 mutation positive families, but this was less evident in families in which no pathogenic BRCA1 or BRCA2 mutation has been detected.
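Two of the kConFab summary figures reported above can be reproduced directly from the stated counts. The short sketch below is only a worked arithmetic check; the variable names are illustrative, and it assumes that all 7,389 germline DNAs belong to the 812 families with at least one sample.

```python
# Counts as reported in the abstract.
families_screened = 747
families_with_mutation = 229
germline_dnas = 7389
families_with_germline_sample = 812

# Percentage of screened families carrying a BRCA1 or BRCA2 mutation.
mutation_percent = 100 * families_with_mutation / families_screened

# Mean number of germline DNA samples per family with at least one sample.
mean_samples_per_family = germline_dnas / families_with_germline_sample
```

Rounded to whole numbers, these give 31% and nine samples per family, matching the values quoted in the text.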