

HHS Public Access; Author manuscript; accepted for publication in a peer-reviewed journal.
Am J Clin Pathol. Author manuscript; available in PMC 2013 July 1.
Published in final edited form as:
PMCID: PMC3509484

Comprehensive Genomic Studies: Emerging Regulatory, Strategic, and Quality Assurance Challenges for Biorepositories


As part of the molecular revolution sweeping medicine, comprehensive genomic studies are adding powerful dimensions to medical research. However, their power exposes new regulatory, strategic, and quality assurance challenges for biorepositories. A key issue is that unlike other research techniques commonly applied to banked specimens, nucleic acid sequencing, if sufficiently extensive, yields data that could identify a patient. This evolving paradigm renders the concepts of anonymized and anonymous specimens increasingly outdated. The challenges for biorepositories in this new era include refined consent processes and wording, selection and use of legacy specimens, quality assurance procedures, institutional documentation, data sharing, and interaction with institutional review boards. Given current trends, biorepositories should consider these issues now, even if they are not currently experiencing sample requests for genomic analysis. We summarize our current experiences and best practices at Washington University Medical School, St Louis, MO, our perceptions of emerging trends, and recommendations.

Keywords: Genomic studies, Biorepositories, Biobanks, Quality assurance, Regulatory standards

Comprehensive Genomic Analyses as Part of the Ongoing Genetic Revolution

Given the molecular basis for many diseases, investigators have long used nucleic acid sequencing approaches to find the aberrations that occur in genes and, by inference, their downstream transcription and protein products. A profound revolution in medicine is underway, as advanced sequencing technology (also known as next-generation or massively parallel sequencing1) has yielded ever larger amounts of data at progressively lower costs. Thus, genetic changes underlying malignancy and other diseases can be found across the majority of the genome rather than within one or a few individual genes. Within the last few years, it has become feasible (from cost and resource standpoints) to sequence all 3 billion base pairs in a human genome (whole genome sequencing [WGS]), or the protein-coding portion (whole exome sequencing [WES]), consisting of most of the predicted 180,000 exons, or about 1% of the overall genome.2,3 Other approaches include genome-wide association studies (GWAS), which have identified multiple common alleles that are associated with breast cancer risk in the general population.4–6 Another approach is transcriptome sequencing,7 in which complementary DNA is analyzed to study the RNA transcriptome (ie, complete RNA transcribed material) associated with a genome.

Compared with earlier approaches that could analyze only one or a few genes at a time—requiring a focused hypothesis as to which genes or regions should be targeted—comprehensive genomic studies such as the aforementioned can expose a more complete spectrum of common and rare mutations underlying human malignancies on personalized and epidemiologic levels. This yields previously unknown causal or association data and helps facilitate new therapies. For example, in an attempt to find causative mutations in acute myeloid leukemia, the Washington University Medical School (WUMS) Genome Institute (St Louis, MO) used massively parallel sequencing initially to sequence the complete tumor and normal genomes from a patient with acute myeloid leukemia, with the identification of tumor-unique alterations through comparative analyses of the 2 genomes.8 Subsequently, the Institute has performed this type of data collection and analysis on hundreds of tumor and matched normal samples. Such technology has growing relevance, given not only the continuing lack of effective therapies for many malignancies but also the increasing recognition of how genetically complex human neoplasia is. Mutations in genes associated with carcinogenesis may clearly be associated with clinical and demographic factors and impact the optimal choice of therapy.9

Rarer or unsuspected mutations found by genome-wide approaches are important because they potentially reveal new clues to tumorigenesis and many nonneoplastic diseases (eg, inflammatory bowel disease, diabetes, cardiovascular disease, and schizophrenia) and may be a basis for novel therapies, especially for select groups of patients. Significantly, rarer mutations (which can occur as somatic mutations in disease lesions or constitutional, disease-causing mutations) could be missed without the patient and specimen numbers and consequent statistical power enabled by biorepositories and their partnerships with clinical studies. Also, rarer but significant mutations could be missed by earlier technologies not using a comprehensive genome-wide approach with the exquisite precision of DNA sequencing.

The enormous growth in genetic information created by novel approaches poses marked challenges for informatics resources and storage, but also opens up exciting new possible avenues of growth for medicine. We can envision a day in which the physical storage of tissues and biofluids will assume secondary importance to the banking of data itself—disease and matched nondiseased specimens will undergo comprehensive genomic analysis, and the data will be stored and evaluated quickly after procurement and routinely used as a basis for personalized medical therapies. Newly generated genomic data (such as a recurrent tumor) will be rapidly compared with archived data (such as a previously resected primary tumor and reference germline) and all common and rare mutational differences identified and used in real-time decisions for therapy—the genomic analogy to the traditional pathologist role of evaluating new material under the microscope and comparing it with previous slides.

The power of genome-wide approaches carries novel ethical challenges. These challenges affect personnel who direct and manage biorepositories, along with the physicians obtaining and using the specimens, the health care facilities where the biorepository operates, and institutional review boards (IRBs) charged with reviewing the research. A unique central issue is that the research activity itself—genomic sequencing of a biosample, beyond a certain threshold and potentially including any of the aforementioned types of genetic studies—may yield enough data to identify a patient.10

This situation stems from the enormous breadth of information that can be ascertained genome-wide, rather than from just a select number of genes analyzed by older approaches such as directed polymerase chain reaction and capillary sequencing. This situation is also quite distinct from pathology techniques commonly applied to banked specimens (eg, Western blotting and immunohistochemical analysis) in which the data are not sufficiently comprehensive to identify a patient. The profound implication is that the traditional hierarchy of identified, coded, deidentified, and anonymized/anonymous specimens,11 on which biorepositories have based so much of their past operating practices, is now largely outdated and needs to be rethought. In particular, once a banked sample is used in a genomics-based study, the notion of absolute anonymity is altered, even though, technically speaking, samples destined for this purpose can still be “deidentified” by being physically stripped of personal identifiers (and assigned a specific, highly secure coding system as a privacy safeguard) before analysis. It should be noted that although a person’s genotype may be decoded by advanced molecular techniques, the person technically still cannot be identified unless a second event yields similar information and the 2 data sets are then compared with each other.

Along with the patient identity concern, genomic research carries with it other risks, some previously established but now enhanced by the increasing “reach” of technology: Comprehensive genomic analysis will often reveal information about the subject that was not the intended aim of the research protocol, so-called incidental findings.12 Also, information may be revealed that impacts third parties, ie, relatives of the participant, who might not have been part of the informed consent process.12 Such information has a direct bearing on the future health and welfare of the subject and/or relatives. The research may also reveal unexpected information with potentially harmful consequences, such as data regarding relatedness to other family members, criminal liability, and aspects of future insurability not specifically protected by the Genetic Information Nondiscrimination Act.13 There are mechanisms, of course, for protecting genomic data and certifying persons who have access to it, some of which are addressed further on. However, inherent in any statement or acknowledgment of risks is that no method of protection or informatics firewall can be considered absolutely secure.

Given the broad data sharing often required as part of these studies, when coupled with the identifiable nature of the data, the potential impact of these risks is further amplified. Consistent and appropriate IRB review of the proposed studies is an essential element in the success of genomic research. We solicited extensive input from genomic researchers, clinical investigators, and experts in bioethics and developed guidelines that provide investigators and IRBs with a sound framework for planning, implementing, and reviewing these important studies in a manner that facilitates the research and appropriately protects participants. Summaries of the ethical issues involved with genomic research are also available in the literature.12,14,15

The implications of genomic research for biobanks are many because they must store, disburse, evaluate, and (depending on the nature of the protocol) sometimes obtain consent for samples subject to requests for genomic studies. What is it that biobanks need to know in this new era? For example, what should best practices be regarding IRB and consent issues and data sharing and submission to large restricted-access Internet-accessible databases such as the National Institutes of Health (NIH) dbGaP (Database of Genotypes and Phenotypes)? Also, what best practices and quality standards should apply to specimen processing, histopathology reviews, and nucleic acid preparations? How do these relate to the needs and expectations of customers in genomic research laboratories, and how might they evolve over time?

WUMS has been at the forefront of genomic technology,1,5,16–18 especially with the presence of the Genome Institute on campus, 1 of only 3 NIH-funded large-scale sequencing centers in the United States, and Genomics and Pathology Services at Washington University St Louis (GPS@WUSTL), one of the few College of American Pathologists (CAP)-accredited/Clinical Laboratory Improvement Amendments of 1988 (CLIA)-licensed laboratories in the United States focused on the clinical application of genomic testing. The biorepository at WUMS, which works in close association with the Genome Institute on a request-driven basis, provides a full array of laboratory services, including DNA and RNA isolation and characterization, frozen and paraffin tissue sectioning and staining, biofluid processing, pathology interpretation, and laser-capture microscopy (LCM). As such, we have gained experience with quality control, consent, and other issues relating to emerging genomics technologies and studies.

The WUMS biorepository approach is consistent with published best practice documents from the National Cancer Institute Office of Biorepositories and Biospecimen Research (OBBR)19 and the International Society of Biological and Environmental Repositories (ISBER).20 In these documents, sections B2.1 through 2.4 (OBBR, Biospecimen Collection, Processing, Storage, Retrieval, and Distribution), E2.0 through 3.0 (ISBER, Quality Assurance Program, Quality Standards), and J8.0 (ISBER, Specimen Collection, Processing and Retrieval) recommend biospecimen collection, storage, disbursal, and quality assessment procedures fit for the intended specimen uses. Sections C2 through C5 (OBBR, Ethical, Legal and Policy Best Practices) and K2.1 through 2.5 (ISBER, Common Principles) advise the implementation of sound informed consent activities, data access, and resource sharing policies that protect patient privacy and confidentiality.

Considerations for IRB Review and Informed Consent

The ethical principles of respect for persons, beneficence, and justice, as put forth in the Belmont Report,21 must underpin any human subjects research effort. IRBs are charged with ensuring that all human subjects research meets the criteria for approval specified in the federal regulations at 45 CFR 46.111. Applying these criteria to the use of tissue specimens for comprehensive genetic analyses is, therefore, a central task of the IRB and one that poses challenges for informed consent, protection of participant privacy, ensuring data confidentiality, and developing plans for the disclosure (or nondisclosure) of research-related and incidental findings.

The informed consent process is the primary vehicle with which we operationalize the ethical principle of respect for persons and is required for all human subjects research unless waived by the IRB. Through this process, we ensure that potential research participants have sufficient information to decide whether or not they want to participate in the proposed research. In Table 1, we have summarized the general IRB guidelines developed at Washington University (WU) regarding genomics and how they are handled by clinicians, investigators, and other stakeholders. Table 2 covers specific considerations for informed consent design that are used at WU. Our guidelines ensure that all criteria for approval (45 CFR 46.111) are met and are consistent with current Office of Human Research Protections guidance.22 Specific regulatory guidance on many genomic research issues has not yet been published but may be forthcoming.

Table 1
Considerations for WU IRB and WUMS Biorepository Regarding Banked Specimens and Genetic Studies
Table 2
Informed Consent Design Specifics

Specimens for genomic research may be anticipated or preexisting at the time the research is proposed. While the concerns of the IRB are similar for each, different approaches may be required; therefore, our guidelines differentiate early and clearly on this point (Table 1). In addition, we evaluate whether the proposed plans for return of results and data and/or sample sharing are consistent with the conditions under which the sample was obtained and appropriate for the proposed research. Proposals to use archival samples are common, and the various scenarios and ways of addressing them are also summarized in Table 1. As noted there, an occasionally encountered subset of these cases involves samples obtained for clinical purposes without consent for research or under a waiver of consent granted by the IRB. These are complex areas in which an IRB must carefully consider each study on an individual basis. Another scenario covered in Table 1 is the use of specimens from deceased persons.

Although informed consent is most often obtained by clinical investigators, an instance in which the WUMS biorepository is directly involved in the collection and consenting process is its general tumor bank protocol that samples and subsequently obtains patient consent for tissue discards from surgical pathology.23 To clarify the issues surrounding genomic research and broaden the samples potentially available for such, the WUMS biobank, under IRB supervision, is currently modifying its consent brochure and form to include enhanced detail on the definitions and risks of genetic information (Figure 1). Also, we have added a specific supplementary question (Figure 2) covering genomic research in the signature part. Patients are free to decline this question while still agreeing to (and signing) other parts of the consent covering general nongenetic research, blood collection, and future contact. The wording in Figures 1 and 2 is similar to that already used in WUMS clinical trial protocols in which large-scale genomic analysis of procured samples is anticipated.

Figure 1
Updated language in the Washington University Medical School biorepository (St Louis, MO) consent brochure for general tumor banking, which covers the possibility of future use for genomic studies. Patients are free to accept or decline this option by ...
Figure 2
Supplementary genetic question in the Washington University Medical School biorepository (St Louis, MO) consent form for general tumor banking from surgical pathology.

Return of Genetic Research Results

The personnel at biobanks must understand, in their partnerships with clinicians and protocols, whether and how research results might be returned to participants. There are unique considerations with respect to returning results from banked specimens. A full listing and discussion of these considerations is presented in Table 3.

Table 3
Considerations for Returning Research Results to Participants

Along with the IRB and consent issues surrounding return of genetic research results, there are regulatory issues governing clinical laboratory testing that need attention. Although the CAP has yet to develop a formal checklist for CLIA licensing of laboratories performing clinical genomic sequencing, and although the US Food and Drug Administration does not (yet) regulate clinical genomic sequence analysis, current regulations prohibit the use of research results (that is, test results that originate in a laboratory that is not CLIA-licensed) to direct a patient’s care beyond the original study consent, and, in any event, the cost of care based on research test results is virtually never covered by health insurance. A research result that has clinical relevance must therefore be verified by repeat analysis in a CLIA-licensed laboratory, which can become burdensome. (Tests with positive results that demonstrate a disease-associated mutation in one gene must be repeated, as must tests with negative results that demonstrate the lack of a mutation at other genes that could potentially impact patient treatment decisions.)

Standards for Data Sharing

Among the genetic research community, data sharing, and its promotion and facilitation across multiple groups and institutions, is of great interest because the volume of data generated in a typical study is often greater than any individual or small group can feasibly explore and because there is much scientific potential from analyzing aggregated genetic data. A key element of the NIH GWAS policy is the expectation that data from NIH-supported GWAS be deposited into its data repository, the dbGaP, at the National Center for Biotechnology Information, a component of the National Library of Medicine. The standards for submission of data to dbGaP are located online24: These standards include that the sample must have been obtained with appropriate consent specifically addressing genomic research, the deidentification of the data before its submission, and the assignment of a random, unique code to further protect privacy and confidentiality. A key requirement is that the responsible institutional official(s) of the submitting institution certify that they approve submission to the NIH GWAS data repository.
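As an illustration of two of the submission elements named above (removal of identifiers and assignment of a random, unique code), the following is a minimal Python sketch. It is not the dbGaP procedure or the WUMS implementation: the field names (`name`, `mrn`, `consent_genomic`), the identifier list, and the function `prepare_for_sharing` are all hypothetical. As discussed earlier, removing identifiers does not make extensive sequence data anonymous; the sketch covers only the bookkeeping step.

```python
import secrets

# Hypothetical list of direct identifiers; a real protocol defines its own.
DIRECT_IDENTIFIERS = {"name", "mrn", "date_of_birth", "address"}


def prepare_for_sharing(record, key_map):
    """Return a copy of `record` with direct identifiers removed and a random code added.

    `key_map` (code -> medical record number) is assumed to stay inside the
    submitting institution; it is the only link back to the patient.
    """
    # Refuse records whose consent did not specifically address genomic research.
    if not record.get("consent_genomic", False):
        raise ValueError("consent does not cover genomic research")

    # The code comes from a cryptographic random source, not from patient data.
    code = secrets.token_hex(8)
    while code in key_map:  # collisions are extremely unlikely, but cheap to rule out
        code = secrets.token_hex(8)
    key_map[code] = record.get("mrn")

    shared = {k: v for k, v in record.items() if k not in DIRECT_IDENTIFIERS}
    shared["subject_code"] = code
    return shared
```

Because the code is drawn at random rather than derived from the medical record number (for example, by hashing), it cannot be reversed by someone who guesses the input; re-identification then depends on access to the locally held key map. Institutional certification, the remaining requirement, is an administrative step and is not modeled here.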

The WU IRB considers the NIH dbGaP standards to represent the current accepted norm within the research community, and all plans for genetic data sharing for WUMS studies must conform to these standards; sharing with databases is not possible for samples not obtained in accordance with these guidelines. Plans for data sharing with other investigators within and outside WU will be assessed for appropriate protections of the privacy and confidentiality of research subjects. Privately funded research may have different requirements or standards for data sharing compared with that funded by the NIH. It is important to note that whole genome and exome data are deposited into the restricted access portion of the dbGaP. Only certified investigators are allowed access, for specific objectives, and there are serious consequences for the offending party if breaches of privacy occur.

Standards for Specimen Evaluation and Quality Assurance

General Issues

From a biorepository’s viewpoint, it is of great interest to know the quantity and quality thresholds that disbursed tissue and nucleic acid products must meet to be useful for genomic approaches. However, published standards for these considerations relating to genomic studies are currently rare to nonexistent, and evidence-based data regarding this topic remain a fertile area for investigation and reporting. Several obstacles hinder the development of such standards, including the existence of multiple analysis platforms, different ways of preparing libraries across laboratories and institutions, the diversity of tissues and tumors used as starting points, and the new, frequently evolving nature of the field. It seems a reasonable expectation that specific guidance may eventually be forthcoming from such organizations as the Food and Drug Administration and the CAP. More robust language for genomics in the standard best practice documents for biorepositories19,20 also would be desirable; this will perhaps also evolve from continuing knowledge and experience across institutions.

In the hands of the Genome Institute at WUMS, or GPS@WUSTL, a library for WGS or WES can generally be made starting with 100 ng or more of total DNA; smaller amounts can sometimes be used to make libraries in the hands of trained specialists, but such low amounts are suboptimal. The disadvantage conferred by smaller samples is that tumors are inherently heterogeneous and multiclonal, thus raising the risk of sampling insufficiently. For example, a starting point of 1 ng typically translates to only about 5,000 to 6,000 cells. Starting with 100 ng tumor DNA would sample about 500,000 cells, a quantity that probably represents many malignancies fairly well. (We evaluate/confirm heterogeneity of the genomes of the tumor cells from the input amount regardless of whether the tumor is solid tissue or liquid/hematologic type.)

For whole genome amplification, the WUMS Genome Institute generally uses a starting point of 200 ng DNA, thereby yielding several micrograms as a result, which then is used for downstream purposes such as validating detected somatic variants in cancer genomes. Whole genome amplification itself can be done with much lower input amounts, but yields from subsequent hybrid capture procedures (for example) may be insufficient if the starting point is too scant. Minimum nucleic acid input thresholds, in general, are subject to future adjustments as further refinements in genomic technology evolve, but probably the aforementioned cellular/clonal sampling issue will be rate-limiting in this regard.

Considerations for Solid and Liquid Tumors

For solid tumors, such as lung and colon carcinomas, samples for genomic techniques can be most easily obtained by sectioning directly from the tissue block (frozen or fixed), if the percentage of tumor cellularity (the percentage of overall histologically viable nuclei that belong to tumor cells) is sufficiently high. Although standards are not widespread and might evolve with further advances in the field, minimum thresholds of at least 60% tumor cellularity are common,25 with next-generation sequencing techniques probably sufficing with a somewhat lower threshold. Such histologic criteria usefully promote quality assurance consistency across sites and the value of histopathologic review. They also address economic and analysis limitations: below a certain tumor cellularity threshold, WGS or WES procedures become relatively costly because oversequencing well beyond the typical minimal coverage is needed to obtain sufficient coverage of the tumor genome. In addition, one would have to adjust bioinformatics analysis programs to unrealistic or inconvenient parameters to cope with data from specimens with very low tumor cellularity.
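The relationship between tumor cellularity and the amount of sequencing needed can be illustrated with a simple binomial model. The Python sketch below is not the method used by the Genome Institute or any specific variant caller; it assumes a clonal, heterozygous somatic variant in a diploid region with no copy number change, so that the expected fraction of reads carrying the variant is half the tumor cellularity. The function names and the defaults (at least 5 variant-supporting reads, 95% detection probability) are illustrative assumptions.

```python
from math import comb


def detection_probability(depth, purity, min_reads=5):
    """Probability of seeing at least `min_reads` variant reads at a site.

    Assumes reads are independent draws and the expected variant allele
    fraction is purity / 2 (heterozygous, diploid, clonal variant).
    """
    if not 0.0 < purity <= 1.0:
        raise ValueError("purity must be in (0, 1]")
    vaf = purity / 2.0
    # Sum the binomial probabilities of seeing fewer than min_reads variant reads.
    p_miss = sum(
        comb(depth, k) * vaf**k * (1.0 - vaf) ** (depth - k)
        for k in range(min_reads)
    )
    return 1.0 - p_miss


def required_depth(purity, min_reads=5, power=0.95, max_depth=5000):
    """Smallest read depth reaching the requested detection probability, or None."""
    for depth in range(min_reads, max_depth + 1):
        if detection_probability(depth, purity, min_reads) >= power:
            return depth
    return None


if __name__ == "__main__":
    for purity in (0.8, 0.6, 0.4, 0.2):
        print(f"tumor cellularity {purity:.0%}: depth >= {required_depth(purity)}")
```

Under these assumptions the required depth rises steeply as cellularity falls, which is the cost argument made above. Real tumors depart from the model through subclonality, aneuploidy, and sequencing error, so the output is best read as a lower bound rather than a coverage target.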

General best practices for specimen evaluation and quality assurance from the viewpoint of the biorepository have been previously described,26 and tissue specimens for comprehensive genetic studies are subject to rigorous quality assurance procedures at the WUMS biobank, similar to specimens for other purposes. Our most common tissue assessment is the one already described, overall percentage of tumor cellularity. This requires the involvement of a trained pathologist in the biorepository workflow, similar to how pathologists address other quality assurance functions.23,26 This pathologist review, currently, is of necessity a visual estimate, but image analysis software technology coupled with digital pathology could eventually provide a more precise numeric figure, especially for tumor morphologic features in which a distinct “rule set” separating tumor cells from others is feasible and can be readily implemented.

Judgments about the percentage of tumor cellularity are critical for assessing the amount of genetic “signal” coming from the tumor component and, thus, to the interpretation and reliability of the data derived from that tissue. Additional helpful evaluations include the area proportions respectively occupied by necrosis, desmoplasia, and/or admixed inflammatory cells (such as lymphocytes). As before, this helps assess the extent of genetic signal attributable to these various components. In the case of necrosis, the assessment helps determine the sample proportion that will likely not contribute to a meaningful genetic output. Figure 3 shows in schematic form how the pathologic assessment is done in the WUMS biorepository. It is important to note that macrodissection and microdissection methods, including LCM, can be used to enhance the percentage of cellularity for the tumor or other desired component,27 as long as the user accepts the required extra time and cost. LCM techniques should consider issues of tumor geography; for example, sampling from only a small area or edge of the tumor might inadequately represent multiclonality, even if an appreciable number of cells (>5,000–50,000) is procured.

Figure 3
Schematic diagram showing how tumor cellularity and other tissue components are assessed histologically at the Washington University Medical School (St Louis, MO) for samples intended for genetic studies. The percentage area occupied by necrosis is assessed ...

From a traditional viewpoint, an ideal specimen would consist entirely of tumor cells (or cells from the disease of interest) and no necrosis. However, other histologic components are almost invariably present (such as normal tissue, dysplasia, and/or desmoplasia), and including them in the analysis is important because, among other things, it will help show if such areas truly have a nonneoplastic genomic phenotype (as one might hypothesize) or share some of the aberrations characteristic of neoplasia. Communication of the concept in Figure 3 to users is helpful, especially at the time of specimen disbursal, because it provides a visual reference for quality assurance data and the potential relevance of histologic regions and LCM procedures.

Genome-wide approaches can use fresh frozen or paraffin-embedded, formalin-fixed tissues as their starting point, whether the desired molecular substrate is DNA or RNA. Traditionally, the fresh frozen tissues have been preferred because of the better preservation and molecular integrity attributed to such tissues28; however, fixed tissues have also seen successful applications in the field.29,30 The choice of the disposition of the starting tissue is on a case-by-case basis, involving specimen availability, the specific procedure and platform to be used, and the preferences and experience level of the investigator.

As with other investigative categories, genomic studies may use various comparative studies that access tissue bank inventories: primary vs metastatic tumors, better vs more poorly differentiated areas, and tumor vs surrounding (adjacent) normal samples. Such studies can help find the genetic changes underlying disease progression. But studies on retrospectively obtained material may be limited by the inventories typically available in tissue banks, for example, the common paucity of well-annotated primary tumors matched with metastases and other entities limited by the infrequency of surgical therapy or by competing needs from surgical pathology.23 Prospective tissue procurements for rare or poorly represented samples, energized and emboldened by the power of genomics technology and the inevitable reach of its clinical applications, will help address this availability problem. Biorepositories will need to promote such efforts among their clinician colleagues, emphasizing their strategic importance.

The availability of histologic review and selective dissection methods for cells and tissues, combined with advancing genomic technology, opens up exciting new avenues for possible future research.1,31 For example, the genomes of single cells could be analyzed and compared, yielding new insights into geographic or regional differences within solid tumors, as well as multiclonality generally. Also, gene expression profiles and patterns could be examined across a wide swath of developmental and adult tissues and histologic compartments.

Liquid neoplasias such as leukemias and myelodysplastic syndromes pose many of the same challenges for genomic studies as do other disease types. The WUMS biorepository maintains active collection and storage protocols for bone marrow and peripheral blood samples that enable studies of these disease entities. One emerging issue relevant to morphologic parameters is that myeloblast counts are often not a good indicator of clonality or mutational prevalence, particularly for myelodysplastic syndrome; in particular, the majority of sample cells in this situation can be largely or exclusively clonal regardless of specific myeloblast counts.17 An active research area currently in which the WUMS Genome Institute has been involved is the study of DNA methylation patterns and mutations in the genes responsible for such; these seem to be important in the etiology of acute myeloid leukemia and myelodysplastic syndromes.17

Nucleic Acid Isolation and Disbursal

On request, the WUMS biorepository may isolate, analyze, and disburse the DNA needed for genetic studies, instead of releasing the banked tissues and blood intended for such. For such requests, we use the same rigorous quality assurance procedures, backed by our standard operating procedure documents, for the released products as we do for other intended purposes. This includes quantification of the nucleic acids and assessments of overall quality by A260/A280 and A260/A230 spectrophotometric absorbance ratios. The quantification is important because the biorepository should ensure that it is disbursing enough product to be fit-for-purpose.

It is important to note that in the WUMS biorepository, we have observed that quantification of DNA by A260 (NanoDrop 2000 Spectrophotometer, Thermo Scientific, Waltham, MA) can be up to 3-fold greater than the reading by fluorometric assay (Invitrogen Qubit Fluorometer, Life Technologies, Grand Island, NY), especially when the A260/A230 ratio is less than 1.7. Because many genomic investigators use fluorometric measurements as their standard, we quantify by both methods for any DNA sample intended for genomic analysis, especially if the A260/A230 ratio is less than 1.7. Also, we recommend that DNA distribution for such purposes be made at twice the minimum quantity requested if feasible, especially if quantification by fluorometric assay has not been done.
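The quantification practice described above can be expressed as a small decision rule. The following Python sketch is one possible encoding, not the WUMS standard operating procedure: the function name, argument names, return fields, and warning texts are hypothetical, and only the 1.7 cutoff and the 2-fold margin come from the text.

```python
def plan_dna_disbursal(requested_ng, a260_ng, a260_a230, fluor_ng=None,
                       ratio_cutoff=1.7):
    """Suggest how much DNA to release for a genomic request.

    requested_ng: minimum quantity the investigator asked for
    a260_ng:      quantity estimated from A260 absorbance
    a260_a230:    A260/A230 absorbance ratio
    fluor_ng:     quantity from a fluorometric assay, if one was run
    """
    warnings = []
    if fluor_ng is None:
        # Fall back on absorbance, which may overestimate the true amount.
        available_ng = a260_ng
        basis = "A260"
        warnings.append("no fluorometric reading; A260 may overestimate DNA")
        if a260_a230 < ratio_cutoff:
            warnings.append("A260/A230 below cutoff; overestimate is more likely")
    else:
        available_ng = fluor_ng
        basis = "fluorometric"

    # Aim for twice the requested minimum when the stock allows it.
    target_ng = 2 * requested_ng
    if available_ng >= target_ng:
        release_ng = target_ng
    elif available_ng >= requested_ng:
        release_ng = requested_ng
        warnings.append("stock too low for the 2x margin")
    else:
        release_ng = 0
        warnings.append("insufficient DNA for the request")

    return {"release_ng": release_ng, "basis": basis, "warnings": warnings}
```

For example, a request for 100 ng against a stock measured at 500 ng by A260 alone, with an A260/A230 ratio of 1.5, yields a suggested release of 200 ng plus two warnings. If a fluorometric reading of 150 ng is then supplied, the suggestion drops to 100 ng because the 2-fold margin is no longer feasible.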

The biorepository should communicate closely with customers regarding the minimum nucleic acid quantities needed and should guard against assumptions, given varying user skills and techniques and the ever-changing nature of the field. If tissue or blood is requested as the primary deliverable with the intent that nucleic acid will be isolated by the receiving laboratory, the biobank can estimate the DNA or RNA that is likely to be yielded per milligram of starting sample,32–34 accounting for the differences in cellularity that exist across sites, for example, liver (highly cellular) compared with connective tissue (less so).
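
A yield calculation of this kind might look like the following Python sketch. The per-milligram yields are placeholder values chosen only to contrast a highly cellular tissue with a less cellular one; they are not taken from references 32–34, and a biobank would substitute figures from its own experience or from kit documentation. The function name and the safety factor are likewise assumptions.

```python
# Placeholder DNA yields in micrograms per milligram of tissue.
# These numbers are hypothetical and for illustration only.
ASSUMED_DNA_YIELD_UG_PER_MG = {
    "liver": 3.0,              # highly cellular
    "connective tissue": 0.5,  # less cellular
}


def tissue_mass_needed_mg(required_dna_ug: float,
                          tissue_type: str,
                          safety_factor: float = 2.0) -> float:
    """Estimate the tissue mass to disburse for a required DNA quantity.

    The safety factor pads the estimate to allow for variation in
    extraction technique at the receiving laboratory.
    """
    yield_per_mg = ASSUMED_DNA_YIELD_UG_PER_MG[tissue_type]
    return required_dna_ug * safety_factor / yield_per_mg


if __name__ == "__main__":
    for tissue in ASSUMED_DNA_YIELD_UG_PER_MG:
        mass = tissue_mass_needed_mg(3.0, tissue)
        print(f"{tissue}: about {mass:.1f} mg for 3 ug of DNA")
```

With these placeholder yields, the same DNA request translates into several times more connective tissue than liver, which is the kind of difference worth discussing with the requesting laboratory before disbursal.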

Genetic analyses of neoplastic disease frequently use matching blood specimens obtained from patients before or during therapy as a germline source of DNA for comparative purposes. This can complement or replace the use of matching nonneoplastic (adjacent) tissue because the latter might be unavailable or histologically unsuitable.26 Therefore, buffy coat specimens, derived from blood specimens and frozen as long-term biobank specimens, are frequently disbursed by the WUMS biobank for genomic studies. Evidence from the literature indicates that such buffy coat specimens are a stable source of DNA with yields suitable for many categories of genomic tests, even after storage for up to 9 years at −80°C.35

Recent technical advances will have important consequences for biorepositories. The development of inexpensive instruments (eg, the MiSeq, Illumina, San Diego, CA; and the Ion Torrent, Life Technologies, San Francisco, CA) that can perform high-throughput sequence analysis faster and cheaper than current platforms will make comprehensive genomic analysis accessible to a much larger group of laboratories and will stimulate application of genomic sequence analysis to a much broader range of diseases. Similarly, recognition that sequence analysis of panels of genes, instead of whole exomes or whole genomes, is sufficient to address many research questions36,37 will greatly simplify data analysis and thereby increase the number of laboratories that perform genomic studies and, thus, increase demand for appropriately characterized molecular specimens from biorepositories.


Comprehensive genetic or genome-wide studies represent a profound revolution that, combined with ever more robust informatics capabilities, is radically changing how we understand and treat disease at the molecular and personalized levels. Overall, this new paradigm will require biorepositories to update certain facets of their business, thereby addressing expanding technology and evolving regulatory standards. In this article, we have summarized our experiences and best practices at WUMS in the expectation they will be useful to others.


We acknowledge the Siteman Cancer Center (grant P30 CA91842) and the Institute for Clinical and Translational Science (grant UL1RR024992) at Washington University Medical Center, which have partially funded the biorepository at Washington University School of Medicine. We also acknowledge Timothy J. Ley, MD, Division of Oncology, Department of Internal Medicine, Washington University School of Medicine, for useful advice regarding the manuscript.


1. Mardis ER. A decade’s perspective on DNA sequencing technology. Nature. 2011;470:198–203. [PubMed]
2. Ng SB, Turner EH, Robertson PD, et al. Targeted capture and massively parallel sequencing of 12 human exomes. Nature. 2009;461:272–276. [PMC free article] [PubMed]
3. Teer JK, Mullikin JC. Exome sequencing: the sweet spot before whole genomes. Hum Mol Genet. 2010;19:R145–R151. [PMC free article] [PubMed]
4. Easton DF, Pooley KA, Dunning AM, et al. Genome-wide association study identifies novel breast cancer susceptibility loci. Nature. 2007;447:1087–1093. [PMC free article] [PubMed]
5. Thomas G, Jacobs KB, Kraft P, et al. A multistage genome-wide association study in breast cancer identifies two new risk alleles at 1p11.2 and 14q24.1 (RAD51L1) Nat Genet. 2009;41:579–584. [PMC free article] [PubMed]
6. Zheng W, Long J, Gao YT, et al. Genome-wide association study identifies a new breast cancer susceptibility locus at 6q25.1. Nat Genet. 2009;41:324–328. [PMC free article] [PubMed]
7. Morin RD, Bainbridge M, Fejes A, et al. Profiling the HeLa S3 transcriptome using randomly primed cDNA and massively parallel short-read sequencing. Biotechniques. 2008;45:81–94. [PubMed]
8. Ley TJ, Mardis ER, Ding L, et al. DNA sequencing of a cytogenetically normal acute myeloid leukaemia genome. Nature. 2008;456:66–72. [PMC free article] [PubMed]
9. Pao W, Miller V, Zakowski M, et al. EGF receptor gene mutations are common in lung cancers from “never smokers” and are associated with sensitivity of tumors to gefitinib and erlotinib. Proc Natl Acad Sci U S A. 2004;101:13306–13311. [PubMed]
10. Homer N, Szelinger S, Redman M, et al. Resolving individuals contributing trace amounts of DNA to highly complex mixtures using high-density SNP genotyping microarrays. PLoS Genet. 2008;4:e1000167. doi: 10.1371/journal.pgen.1000167. [PMC free article] [PubMed] [Cross Ref]
11. The Pharmacogenomics Working Group: Terminology for sample collection in clinical genetic studies. Pharmacogenomics J. 2001;1:101–103. [PubMed]
12. McGuire AL, Caulfield T, Cho MK. Research ethics and the challenge of whole genome sequencing. Nat Rev Genet. 2008;9:152–156. [PMC free article] [PubMed]
13. Pub L No. 110–233, 122 Stat 881.
14. Kaye J, Heeney C, Hawkins N, et al. Data sharing in genomics: re-shaping scientific practice. Nat Rev Genet. 2009;10:331–335. [PMC free article] [PubMed]
15. Lunshof JE, Chadwick R, Vorhaus DB, et al. From genetic privacy to open consent. Nat Rev Genet. 2008;9:406–411. [PubMed]
16. Mardis ER. New strategies and emerging technologies for massively parallel sequencing: applications in medical research. Genome Med. 2009;1:40. doi: 10.1186/gm40. [PMC free article] [PubMed] [Cross Ref]
17. Walter MJ, Ding L, Shao J, et al. Recurrent DNMT3A mutations in patients with myelodysplastic syndromes. Leukemia. 2011;25:1153–1158. [PMC free article] [PubMed]
18. Ding L, Ellis MJ, Li S, et al. Genome remodeling in a basal-like breast cancer metastasis and xenograft. Nature. 2010;464:999–1005. [PMC free article] [PubMed]
19. Office of Biorepositories and Biospecimen Research, National Cancer Institute, National Institutes of Health, US Department of Health and Human Services. [Accessed November 28, 2011];NCI Best Practices for Biospecimen Repositories, 2011 revised version.
20. International Society for Biological and Environmental Repositories (ISBER) [Accessed November 28, 2011];2008 Best Practices for Biorepositories: Collection, Storage, Retrieval and Distribution of Biological Materials for Research.
21. Department of Health, Education, and Welfare. [Accessed December 13, 2011];Belmont Report: Ethical Principles and Guidelines for the Protection of Human Subjects of Research: Report of the National Commission for the Protection of Human Subjects of Biomedical and Behavioral Research. [PubMed]
22. Department of Health and Human Services. [Accessed October 31, 2011];Guidance on research involving coded private information or biological specimens. 2008
23. McDonald SA, Chernock RD, Leach TA, et al. Procurement of human tissues for research banking in the surgical pathology laboratory: prioritization practices at Washington University Medical Center. Biopreserv Biobank. 2011;9:245–251. [PMC free article] [PubMed]
24. NIH Points to Consider for IRBs and Institutions in Their Review of Data Submission Plans for Institutional Certifications Under NIH’s Policy for Sharing of Data Obtained in NIH Supported or Conducted Genome-Wide Association Studies (GWAS) [Accessed October 28, 2011];
25. National Cancer Institute. TCGA Tissue Sample Requirements: High Quality Requirements Yield High Quality Data. [Accessed January 4, 2012];The Cancer Genome Atlas.
26. McDonald SA. Principles of research tissue banking and specimen evaluation from the pathologist’s perspective. Biopreserv Biobank. 2010;8:197–201. [PMC free article] [PubMed]
27. Emmert-Buck MR, Bonner RF, Smith PD, et al. Laser capture microdissection. Science. 1996;274:998–1001. [PubMed]
28. Williams C, Ponten F, Moberg C, et al. A high frequency of sequence alterations is due to formalin fixation of archival specimens. Am J Pathol. 1999;155:1467–1471. [PubMed]
29. Kerick M, Timmermann B, Schweiger MR. High-throughput sequencing of frozen and paraffin-embedded tumor and normal tissue [in German] Pathologe. 2010;31(suppl 2):255–257. [PubMed]
30. Duncavage EJ, Magrini V, Becker N, et al. Hybrid capture and next-generation sequencing identify viral integration sites from formalin-fixed, paraffin-embedded tissue. J Mol Diagn. 2011;13:325–333. [PubMed]
31. Green ED, Guyer MS; National Human Genome Research Institute. Charting a course for genomic medicine from base pairs to bedside. Nature. 2011;470:204–213. doi: 10.1038/nature09764. [PubMed] [Cross Ref]
32. ZR Genomic DNA–Tissue MiniPrep. [Accessed May 19, 2012];Zymo Research Web site.
33. Sample Preparation and Isolation of RNA or DNA. [Accessed November 29, 2011];AsuraGen Web site.
34. Average RNA yields typically obtained from various starting material. [Accessed November 29, 2011];MACS molecular Web site.
35. Mychaleckyj JC, Farber EA, Chmielewski J, et al. Buffy coat specimens remain viable as a DNA source for highly multiplexed genome-wide genetic tests after long term storage. J Transl Med. 2011;9:91. doi: 10.1186/1479-5876-9-91. [PMC free article] [PubMed] [Cross Ref]
36. Voelkerding KV, Dames S, Durtschi JD. Next generation sequencing for clinical diagnostics: principles and application to targeted resequencing for hypertrophic cardiomyopathy. J Mol Diagn. 2010;12:539–551. [PubMed]
37. Metzker ML. Sequencing technologies: the next generation. Nat Rev Genet. 2010;11:31–46. [PubMed]