BCM faculty members spearheaded development of a first generation Personal Genome Profile (Baylor PGP) assay to assist physicians in diagnosing and managing patients in this new era of medicine. The principles guiding the design and implementation of the Baylor PGP are high quality, robustness, low expense, flexibility, practical clinical utility and the ability to facilitate broad areas of clinical research. The single most distinctive feature of the approach taken is an emphasis on extensive screening for rare disease-causing mutations rather than common risk-increasing polymorphisms. Because these variants have very large direct effects, the ability to inexpensively screen for them could have major immediate clinical impact in disease diagnosis, carrier detection, pre-symptomatic detection of late onset disease and even prenatal diagnosis. In addition to creating a counseling tool for individual ‘consumers’, this system will fit into the established medical record and be used by physicians involved in direct patient care. This paper describes an overall framework for clinical diagnostic array genotyping and the available technologies, and highlights the opportunities and challenges for implementation.
For several years Baylor College of Medicine (BCM) leadership has envisioned an integrated healthcare model that actively develops and deploys genetic testing in its Personalized Medicine strategy. As part of these efforts, BCM, like other leading schools of medicine, became an active member of the Personalized Medicine Coalition, created an infrastructure (the Personalized Medicine Alliance) to coordinate these efforts within the institution and began planning for a new graduate training curriculum to train the next generation of medical leaders.
As part of this larger strategy, BCM faculty members spearheaded development of a first generation Personal Genome Profile (Baylor PGP) assay to assist physicians in diagnosing and managing patients in this new era of medicine. The principles guiding the design and implementation of the Baylor PGP are high quality, robustness, low expense, flexibility, practical clinical utility and the ability to facilitate broad areas of clinical research. The tests contained within the prototype (Personal Genomic Profile: Version 1) were targeted to strategic areas within the Baylor Clinic and affiliated clinical and translational research programs. Like most other groups developing testing for personalized medicine, we are including genetic tests that can aid in risk classification and that are linked to individual responses to pharmacologic agents. The single most distinctive feature of our approach is the emphasis on extensive screening for rare disease-causing mutations rather than just common risk-increasing polymorphisms. We are developing tools for large-scale genotyping for rare mutant alleles already known to be responsible for genetic diseases. Because these variants have very large direct effects, we believe that the ability to inexpensively screen for them could have major immediate clinical impact in disease diagnosis, carrier detection, presymptomatic detection of late onset disease and even prenatal diagnosis. We emphasize that our purpose is not exclusively to create a counseling tool for individual ‘consumers’ but rather to develop a system that fits into the established medical record and that can be used by physicians involved in direct patient care.
In this paper, we will describe an overall framework for clinical diagnostic array genotyping, describe the available technologies, and highlight the challenges for implementation.
Our simple answer to this question is that the content should include genetic markers that have a clear clinical interpretation. We give greatest priority to results that would have a practical and direct effect on diagnosis, prognosis, and clinical decision-making. We do not believe that risk counseling based on the results from common polymorphisms is yet understood well enough to convert to routine clinical practice – with a few notable exceptions.
Over the last 3 years there has been a steady stream of publications that have identified associations between disease risk and common single nucleotide polymorphisms (1). These studies have been tremendously successful in the sense that for the first time there are epidemiologically robust and statistically sound demonstrations of the effect of these common variants on disease. A remarkable result has been the identification of previously unsuspected pathogenic mechanisms that are likely to provide important avenues for novel interventions and drug development (2, 3). On the other hand, the impact of knowing the specific pattern of variants in individuals is not clear. A recent publication from Brautbar et al demonstrates how incorporation of a single genetic variant into a well-established risk prediction model for coronary artery disease might be used to aid the decision whether to start statin therapy in individuals with otherwise moderate risks (2). The general concept that genetic tests can be incorporated into clinical decision algorithms is very attractive. In contrast, vague assertions about lifestyle counseling based on genetic risks raise many questions about acceptability, cost, efficacy and application.
There are now a handful of loci and genetic variants for which there are convincing data indicating that they affect drug metabolism and/or risk of drug-induced toxicity (http://www.pharmgkb.org/). Genetic variants that influence the absorption, distribution, metabolism, and elimination (ADME) of various drugs have been associated with individual differences in pharmacokinetics (3). In addition, some genetic variants have also been associated with the risk of drug side effects and toxicities (4–6). The broad problem of pharmacogenetics is beyond the scope of this short commentary, but it is clear that there are a limited number of such markers with large effects and only a few for which testing might be ready to incorporate into clinical practice. Nevertheless, we recognize that pharmacogenetic testing is one of the most important potential applications for diagnostic genotyping. At the time of design of our assay, clinical pharmacogenetic testing was offered by at least one laboratory for 32 different genes. Review of the literature suggests that only a subset of these are supported by studies with adequate sample sizes, consistently defined effects on drug metabolism or toxicity, repeated independent replication of the claimed genetic effect, and unambiguously defined risk alleles or haplotypes. Although we have incorporated genetic analysis of all these candidate genes in our version 1.0 assay, we plan to report only a subset of proven markers to referring physicians. Key examples are the CYP2C9 and VKORC1 loci, which have been shown to have very large effects on the metabolism of warfarin (7). Table 1 shows some examples of the gene representation on the assay and the drugs whose metabolism is known to be affected by common variants in these loci. Another example relates to the predictive association of HLA-B*5701 and hypersensitivity to the reverse transcriptase inhibitor abacavir, used to treat HIV-1 infection. HLA-B*5701 is in tight linkage disequilibrium with a SNP in HCP5, which is included in the Baylor PGP: version 1.
The lifetime risk of being affected with a genetic disorder has been estimated to be at least 8% (8). Although most physicians tend to dismiss these conditions as “zebras” because individually they are relatively rare, the aggregate effect of these conditions is very large. In the past 20 years almost 3,000 human genes responsible for Mendelian (single gene) disease have been identified. Some of these mutations are strongly suspected to play a role in more common disorders and have been shown to be risk factors in multiple diseases (9–17) but, to date, there have been no systematic studies that address how much rare mutations contribute to common diseases.
There are virtually no higher prevalence single gene disorders in which the gene(s) responsible have eluded detection. Either because they are more frequent or because the medical impact is very high, there is now population information about the mutant alleles that underlie the most important of these diseases. Autosomal recessive conditions like cystic fibrosis and phenylketonuria have been extensively studied. Familial breast cancer, an autosomal dominant trait whose genetics encompass locus heterogeneity, age-dependent penetrance, and variable expression, has been the object of very large scale DNA sequencing. Several of the X-linked recessive disorders, especially Factor IX deficiency, have been intensively studied and have much to teach us about mechanisms of human mutation.
In some ways these ‘classic’ genetic diseases are exceptional in that most Mendelian diseases are even rarer. Fewer than 1,000 affected individuals have ever been observed in most of the several thousand Mendelian diseases, and so for those in which a disease gene has been identified there is a correspondingly short list of mutations. Whether considering higher prevalence diseases or extremely rare ones, there are certain general lessons that one can draw from all the available mutational data. There are some disease phenotypes that reflect the unique expression of a single or very small number of alleles. Perhaps the most dramatic illustration of this mechanism is the fact that >98% of individuals with achondroplastic dwarfism bear the G380R mutation in the FGFR3 locus. Some disease genes have distinctive structural features that predispose them to particular mutational mechanisms. For example, the dystrophin locus is very large and has intragenic low copy repeat structures that result in common deletions that together account for 70% of the mutant alleles. Another example is the duplication of a short region of chromosome 17 that results in the most common form of Charcot-Marie-Tooth disease. Demographic factors including migration, founder effects, and possibly selection have caused certain alleles responsible for the most common autosomal recessive diseases to have far higher frequencies than the average among all such alleles. Another important consideration is that certain nucleotide positions have high intrinsic mutation rates compared to surrounding bases. The best understood of these mechanisms is the C-to-T transition caused by deamination of 5-methylcytosine. Triplet-repeat expansions are another special type of mutation restricted to certain loci with a predisposing but otherwise normal repeat structure. Finally, mutations in some nucleotide positions are more likely to lead to deleterious amino acid substitutions in protein coding sequence and consequently are more likely to be sites observed in individuals with Mendelian disease.
For all these reasons, in about half of all Mendelian diseases, one sees a substantial fraction of mutant chromosomes contributed by mutations that have been observed in more than one unrelated individual. Figure 1 illustrates the cumulative percent of disease alleles in 2 examples: familial adenomatous polyposis (a dominant disorder caused by mutation in APC) and Wilson disease (a recessive disorder caused by mutation in ATP7B). In both cases, 25–45% of alleles have been observed more than once and in aggregate account for 70–90% of alleles in cases. To obtain a more representative sample of diseases, 24 faculty and fellows within the BCM Medical Genetics Clinical Program carried out literature and database review of 666 genes responsible for Mendelian disorders and for which there is clinical laboratory testing available. Figure 2 shows that in ~50% of single gene disorders, ≥20% of disease-associated alleles are accounted for by mutations that are common in the population (formally called identical-by-descent, or IBD, mutants) or by recurrent alleles. The remaining diseases are mostly quite rare or have had very limited investigation. There is an inverse correlation of disease frequency and the contribution of common and recurrent alleles. Although there are large numbers of singleton or private alleles in all such diseases (and many more to be found by sequencing), the majority of individuals affected with a Mendelian disorder bear a mutant allele already documented in the published literature.
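The two summary figures quoted above (the share of distinct mutations seen more than once, and the share of case alleles those mutations account for) can be computed from a simple list of observed mutant chromosomes. The following is a minimal sketch of that bookkeeping, not the procedure used for Figures 1 and 2; the function names and the mutation labels m1 to m5 are hypothetical, and the ten observations are made-up illustration data.

```python
from collections import Counter

def cumulative_allele_fraction(observations):
    """Rank mutations by how often they were observed and return
    (mutation, cumulative fraction of all observed alleles) pairs."""
    counts = Counter(observations)
    total = sum(counts.values())
    # Most frequent first; ties broken by label so the order is deterministic.
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    result = []
    running = 0
    for mutation, n in ranked:
        running += n
        result.append((mutation, running / total))
    return result

def recurrent_summary(observations):
    """Summarise how much of the allele pool is explained by mutations
    observed more than once (common/IBD or recurrent alleles)."""
    counts = Counter(observations)
    total = sum(counts.values())
    recurrent = {m: n for m, n in counts.items() if n > 1}
    return {
        "distinct_mutations_recurrent": len(recurrent) / len(counts),
        "alleles_from_recurrent": sum(recurrent.values()) / total,
    }

# Hypothetical data: ten mutant chromosomes from unrelated cases.
observed = ["m1"] * 4 + ["m2"] * 3 + ["m3", "m4", "m5"]
curve = cumulative_allele_fraction(observed)
summary = recurrent_summary(observed)
```

In this toy data set two of five distinct mutations are recurrent, yet they account for seven of the ten observed alleles, which is the pattern that makes genotyping a fixed list of known mutations attractive.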
That IBD-mutants and recurrent mutations can exist at important frequencies is one of the classic results of population genetics theory. For highly penetrant alleles, the allele frequency spectrum is predicted by mutation-selection balance (14, 15). For weakly penetrant alleles other evolutionary forces can play an important role in shaping their individual frequencies as well as the full allele frequency spectrum (18). Drift (e.g. founder effects and bottlenecks), balancing selection (e.g. heterozygote advantage or overdominance; frequency-dependent selection; time-dependent selection; and environmental diversity), intragenic allelic complementation, and non-allelic suppressors may all play a role in the frequency spectrum of deleterious alleles. Our current knowledge of the allele frequency spectrum of single gene disorders is affected by many factors. There is biased case ascertainment in almost all data sources on human pathological DNA sequence variation because the cost of sequencing in very large numbers of unaffected individuals has been, heretofore, prohibitive. Inference of causality of particular alleles is not based on uniform criteria in the literature, and there is evident positive bias in reporting of functional analyses. Both of these factors could and do lead to allele misclassifications. Allele-specific penetrances are unknown in the general population, possibly causing one of two recognized problems in diagnostic test development, called spectrum effect and spectrum bias (17–19). In such situations the magnitude of an effect is systematically overestimated because of biased sampling or because of true biological heterogeneity.
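For orientation, the standard textbook form of mutation-selection balance for a single deleterious allele is sketched here. It is a simplification of the full frequency-spectrum prediction cited above, and the symbols are assumptions introduced for this illustration: \mu is the per-generation mutation rate to the deleterious allele, s the selection coefficient against the affected genotype, h the dominance coefficient, and \hat{q} the equilibrium allele frequency.

```latex
% Fully recessive deleterious allele (h = 0):
\hat{q} \approx \sqrt{\frac{\mu}{s}}

% Allele with an appreciable heterozygous effect (h > 0):
\hat{q} \approx \frac{\mu}{h s}

% Illustrative values only: \mu = 10^{-6}, recessive lethal (s = 1)
\hat{q} \approx \sqrt{10^{-6}} = 10^{-3}, \qquad
2\hat{q}(1-\hat{q}) \approx 2 \times 10^{-3}, \qquad
\hat{q}^{2} = 10^{-6}
```

Under these illustrative values roughly 1 in 500 individuals would carry the allele while only about 1 in a million would be affected, which is one way to see why carrier detection for recessive disorders involves far more people than the disease prevalence alone suggests.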
The effect of rare deleterious mutations has been one of the main objects of research from the earliest development of Human Genetics as a distinct scientific discipline. Although at least 80,000 rare disease-causing mutations have been identified (Human Gene Mutation Database; http://www.hgmd.cf.ac.uk/ac/index.php), it is exceptionally difficult to study the effects of these variants in the general population. Our diagnostic program has been developing a technical platform that can begin to close the gap between common SNPs and known disease-associated rare variants. This suggests to us that by assaying for known disease-causing common/recurrent mutations in the general population one would have an extremely useful screening or diagnostic test. The cost of such testing until recently would have made such a strategy impractical. However, recent advances in technology demonstrate that this can be done. The opportunity and technical limitations of such an approach are considered next.
The research investment in the study of single gene disorders has led to many thousands of publications and a variety of gene-specific databases that catalog the allelic spectrum of the mutations. New genotyping technologies have made testing for very large numbers of common gene variants a routine process (19). So far all of the focus has been on genotyping single nucleotide polymorphisms (SNPs), but in principle the same methods can be extended to a broader array of genetic variations including small indels as well as exon deletions and duplications. Disease-causing mutations documented by their involvement in specific Mendelian diseases are rarer alleles than those typically chosen for SNP assays, but otherwise they could be analyzed by these new methods. The experience gained from production of standardized arrays will enable much more rapid development of targeted screening tools for clinical genetics practice. This would allow expensive sequencing tests to be reserved for either confirmation of genotyped alleles or for identification of private alleles not yet reported in any database. Furthermore, the approach would enable a genome-wide approach to single nucleotide and small indel mutation analysis as opposed to the current locus specific approach. Although indels are much less frequent than SNPs, they account for ~24% of severe mutations. Tackling these challenges requires an innovative approach, which will address technical gaps in current methods.
There are significant barriers that have inhibited development of high-throughput genotyping for mutations in both academic and private sectors. Almost no academic groups have all the necessary skills in general medicine, medical genetics, bioinformatics, genomics and diagnostics required for basic assay design. Academic groups do not have the necessary expertise in array fabrication, assay optimization, and quality control that are exclusively the domain of the commercial sector. The resources and experience at BCM in the diagnostics laboratory are unique and demonstrate a long-term commitment to development of this project. The potential role of rare variants in common diseases has been the subject of conjecture but there have not been basic studies that would stimulate commercial investment in the required assays. The complexity of the medical information and extremely broad diversity of single gene disorders has also inhibited commercial interest. Finally, the complexity of the intellectual property issues has also inhibited commercial entry into this obvious area of medical application. Diagnostic programs such as the one we have developed cannot solve this problem acting by themselves, but will certainly focus attention on the need for reform with organized cross licensing in the area of diagnostic gene patents.
We propose that large scale genotyping for rare mutant alleles already known to be responsible for genetic diseases could have a major clinical impact in carrier detection, presymptomatic detection of late onset disease, disease diagnosis, and risk assessment for common disease. Standardized arrays for testing such variants could be used in research, as a relatively inexpensive clinical screening tool, or as a confirmatory method following personal genome sequencing.
Mutation analysis in affected individuals consists of either mutation discovery or testing for known mutations. Mutation discovery relies on various screening procedures like DNA sequencing and/or multiplex ligation-dependent probe amplification (MLPA). Even these approaches have limitations, particularly in the discovery of regulatory mutants and various DNA rearrangements. When a disease diagnosis has been made based on clinical criteria, sequencing is often the single most effective way to identify the causative mutation. However, sequencing and methods for characterizing locus-specific structural alterations do not lend themselves to examining multiple genes and even modest locus heterogeneity in a disease can make sequencing too expensive for routine clinical application.
Testing for known mutations must be inexpensive, flexible, and highly accurate. Currently methods like allele-specific oligonucleotide hybridization, TaqMan, Invader, and MALDI-TOF are the most likely to be used. Facile single nucleotide polymorphism (SNP) genotyping methods have rapidly gained acceptance for linkage and association analyses. Testing for known mutations has much in common with SNP genotyping and single nucleotide mutations are essentially low frequency SNPs and so it may be possible to use these same chemistries and instrument platforms for mutation screening. Improved SNP assays have achieved a greater than two orders of magnitude reduction in the cost per genotype. The allele specific primer extension (ASPE) and single base extension (SBE) assays produced by Illumina in their iSelect custom genotyping format use a proprietary whole genome amplification scheme and can be multiplexed at up to 1 million markers per assay. We have selected this assay platform because of its flexibility and high accuracy.
There are some obvious limitations to these genotyping platforms regarding important human mutations. At present, for example, it is not possible to use these methods to identify triplet repeat expansions. There are also some sites that may not lend themselves to genotyping with particular chemistries. Assay conversion rates for these methods are generally above 90% if the variant position has been previously validated. If the surrounding sequence is not unique or is highly similar to paralogous loci or pseudogenes the assay may fail. Nucleotide content and secondary structure may also affect the properties of the probe oligonucleotides and reduce conversion rate. For positions that have not been previously assayed many of these properties can be successfully predicted in silico. In SNP genotyping it is usually possible to avoid markers with poor design scores through use of another linked and more favorable marker, but this will not be possible for mutation testing. None of these methods have been extensively tested using procedures designed to maximize assay conversion and genotyping accuracy for all tested sites. Duplicate assays and assays of both DNA strands for a single mutation site are a part of the design of probe pools that we are using for diagnostic test development.
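One way to use duplicate assays and assays of both DNA strands is to report a genotype only when every usable probe agrees after the calls are expressed on a common strand. The following is a minimal sketch of such a rule and is not the calling procedure used for the Baylor PGP; the function names, the two-letter call format, the strand labels "+" and "-", and the min_calls threshold are all assumptions made for illustration.

```python
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}

def to_forward(call, strand):
    """Express a two-allele call (e.g. 'AG') on the forward strand.
    Alleles are sorted so that 'AG' and 'GA' compare equal.
    A missing call (None) stays None."""
    if call is None:
        return None
    if strand == "-":
        call = "".join(COMPLEMENT[base] for base in call)
    return "".join(sorted(call))

def consensus_genotype(probe_calls, min_calls=2):
    """probe_calls is a list of (call, strand) pairs for one mutation site.
    Return the genotype if at least min_calls probes produced a call and
    all of them agree; otherwise return 'NO_CALL' so the site can be
    repeated or sent for confirmation."""
    forward = [to_forward(call, strand) for call, strand in probe_calls]
    usable = [call for call in forward if call is not None]
    if len(usable) < min_calls or len(set(usable)) != 1:
        return "NO_CALL"
    return usable[0]

# Hypothetical site assayed in duplicate on both strands; one probe failed.
concordant = consensus_genotype([("AG", "+"), ("CT", "-"), ("GA", "+"), (None, "-")])
discordant = consensus_genotype([("AG", "+"), ("TT", "-")])
too_few = consensus_genotype([("AG", "+"), (None, "-")])
```

A strict all-agree rule trades call rate for accuracy, which suits a setting where a linked substitute marker is not available and discordant sites are followed up by other means.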
All these genotyping methods have very low costs for each position with current reagent costs ranging from $0.005–0.07 per locus. For clinical testing one can anticipate somewhat higher costs as assay redundancy, multiple probe sets for indel analysis, and controls will be routinely employed. In addition, fixed overhead costs for diagnostic testing account for at least two-thirds of the total service cost. We envision that positive results from very rare variants will be confirmed by additional testing using DNA sequencing. The complexity of such follow up testing is an important additional part of the total cost picture.
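The cost statements above can be turned into a rough back-of-envelope estimate. The sketch below is only an illustration under stated assumptions: the per-locus reagent range and the two-thirds overhead share come from the text, while the panel size of 3,000 sites, the four probes per site (duplicates on both strands), the treatment of each probe as one billable locus, and the function name are hypothetical choices. Sequencing confirmation of positives is not included.

```python
def service_cost(n_sites, probes_per_site, reagent_cost_per_locus, overhead_fraction):
    """Return (reagent cost, total service cost) per sample.
    Assumes reagent cost is the only non-overhead component, so the
    total is the reagent cost divided by the non-overhead share."""
    if not 0 <= overhead_fraction < 1:
        raise ValueError("overhead_fraction must be in [0, 1)")
    reagent = n_sites * probes_per_site * reagent_cost_per_locus
    total = reagent / (1 - overhead_fraction)
    return reagent, total

# Hypothetical panel: 3,000 mutation sites, 4 probes per site.
low_reagent, low_total = service_cost(3000, 4, 0.005, 2 / 3)
high_reagent, high_total = service_cost(3000, 4, 0.07, 2 / 3)

print(f"reagents: ${low_reagent:.0f} to ${high_reagent:.0f} per sample")
print(f"total with overhead: ${low_total:.0f} to ${high_total:.0f} per sample")
```

Because overhead is stated as at least two-thirds of the total, the totals from this sketch are lower bounds for a given reagent cost: any larger overhead share raises them further.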
We have been developing array-based tools for genotyping rare mutations, common risk-increasing variants, and markers for pharmacogenetic testing in a single custom panel. This project has required a multidisciplinary collaboration between individuals in the fields of genomics, medical genetics, bioinformatics, array chemistry, and statistical genetics. The challenge of developing and validating these arrays requires interpretation of an exceedingly complex medical literature, application of advanced bioinformatics methods, and development of new genotyping technology.
Evaluation of assay designs for known deleterious rare variants has allowed improvement of array-based analyses and will stimulate further development of clinically useful standardized genotyping panels. In addition, experience gained from this developmental process will allow more efficient confirmation of some of the most important rare variants that are expected to be detected in individual genome sequencing studies (20). A standard research panel of known deleterious variants could also be widely employed in studies on their roles in common diseases such as type II diabetes, cancers, neurological and cardiovascular disorders.
There are three major areas in which a diagnostic test encompassing all common and recurrent mutations would be useful. This test will be cost effective as a screening tool to detect >3,000 common/recurrent mutations in 99 genes that result in 260 diseases including adult onset diseases, familial cancer predisposition, cardiovascular diseases (vascular, cardiomyopathy and arrhythmias) and neurological diseases (Parkinson’s, Alzheimer’s and peripheral neuropathy). In reproductive age adults, testing for carrier status of recessives will have great impact. One can also anticipate that carrier status for certain conditions confers risk in some common adult onset diseases. SNP genotypes associated with increased risk of common diseases such as breast and prostate cancer may ultimately be integrated into cancer screening algorithms. Decision-to-treat algorithms may also successfully incorporate both risk and pharmacogenetic information. Medication selection, initial dosing, and the monitoring plan for medication-induced toxicity may all be greatly influenced by pharmacogenetic test results. We believe that all these applications will have to be integrated into an active medical record as it will not be practical for either physicians or patients to fully control their resulting profile without the aid of automated tools.
All these potential clinical indications raise new questions about the use, disclosure, and impact of enormous amounts of individual genetic information. In almost all clinical applications only a portion of the data will be directly relevant to the patient at the time of initial testing. The question of what to do with carrier information in very young and old individuals is one such concern. Privacy and insurability are key concerns since some completely unexpected risks may be uncovered by very extensive testing. Mutation data will also provide a potentially rich resource for research on the molecular basis of more common diseases and research access will have to be addressed in the context of protection of the individual patient.
The principles that are guiding the design and implementation of a first generation Personal Genome Profile (Baylor PGP) assay are high quality, robustness, low expense, flexibility, practical clinical utility and the ability to facilitate broad areas of clinical research. The single most distinctive feature of its development is an emphasis on extensive screening for rare disease-causing mutations rather than common risk-increasing polymorphisms. Because these variants have very large direct effects, the ability to inexpensively screen for them could have major immediate clinical impact in disease diagnosis, carrier detection, pre-symptomatic detection of late onset disease and even prenatal diagnosis. This system will fit into the established medical record and be used by physicians involved in direct patient care.