The cost of genomic information has fallen steeply, but the path to clinical translation of risk estimates for common variants found in genome wide association studies remains unclear. Since the time and cost of sequencing complete genomes are rapidly declining, more comprehensive means of analysing these data in concert with rare variants for genetic risk assessment and individualisation of therapy are required. Here, we present the first integrated analysis of a complete human genome in a clinical context.
An individual with a family history of vascular disease and early sudden death was evaluated. Clinical assessment included risk prediction for coronary artery disease, screening for causes of sudden cardiac death, and genetic counselling. Genetic analysis included the development of novel methods for the integration of whole genome sequence data including 2.6 million single nucleotide polymorphisms and 752 copy number variations. The algorithm focused on predicting genetic risk from variants in genes associated with known Mendelian disease, on recognised drug responses, and on the pathogenicity of novel variants. Because integration of risk ratios derived from case-control studies is challenging, we estimated posterior probabilities from age- and sex-appropriate prior probabilities and likelihood ratios derived for each genotype. We also developed a visualisation approach to account for gene-environment interactions and conditionally dependent risks.
We found increased genetic risk for myocardial infarction, type 2 diabetes, and certain cancers. Rare variants in LPA are consistent with the family history of coronary artery disease. Pharmacogenomic analysis suggested a positive response to lipid lowering therapy, likely clopidogrel resistance, and a low initial dosing requirement for warfarin. Many variants of uncertain significance were reported.
Although challenges remain, our results suggest that whole genome sequencing can yield useful and clinically relevant information for individual patients, especially for those with a strong family history of significant disease.
Technological advances have brought a steep decline in the cost of genetic information, but the explanatory power and path to clinical translation of risk estimates for common variants found in genome wide association studies remain unclear. Much of the reason for this lies in rare and structural genetic variation. Since complete genomes can now be sequenced rapidly and inexpensively [1–5], more comprehensive genetic risk assessment and individualisation of therapies may be possible [6]. However, analytic tools to make these data accessible in a clinical context are presently lacking, and the clinical utility of these data at an individual level has not been formally evaluated.
The patient was assessed at Stanford's Center for Inherited Cardiovascular Disease by a cardiologist (EA) and a board-certified genetic counsellor (KO). The patient was a 40-year-old man who presented with a family history concerning for coronary artery disease and sudden death. He had no significant past medical history, exercised regularly without symptoms, and was taking no medications. A four-generation family pedigree was drawn (Figure 1). Family history revealed coronary artery disease and abdominal aortic aneurysm in first- and second-degree relatives, as well as early sudden death of presumed cardiac origin. On examination, the patient was well appearing, and clinical examination was within normal limits. Conventional risk assessment for coronary artery disease included a lipid panel (Table 1). Because of his family history of cardiovascular disease, he underwent electrocardiography, which showed sinus rhythm, a normal axis, and high praecordial voltage with early repolarisation. The family history of sudden death prompted an echocardiogram and a cardiopulmonary exercise test. The echocardiogram revealed normal right and left ventricular size and normal systolic, diastolic, and valvular function. At maximal exercise there were no wall motion abnormalities, and the 1.5 mm ST depression observed was upsloping. Maximum oxygen uptake was 49 ml/kg/min.
The technical details of the genome sequencing for this individual have been described previously [7]. In brief, genomic DNA was purified from 2 ml of whole blood and sequenced on a HeliScope genome sequencer. Output comprised 148 GB of raw sequence with an average read length of 33 bases. Sequence data were mapped to the National Center for Biotechnology Information reference human genome build 36 using the open-source aligner IndexDP [7]. Variant calling was performed with the UMKA algorithm, resulting in the detection of 2.6 million single nucleotide variations and 752 copy number variations relative to the reference sequence. A subset of SNP calls was independently validated with the Illumina BeadArray and with Sanger sequencing, and a subset of CNV calls was independently validated with digital PCR.
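Validating a subset of SNP calls against array genotypes amounts to measuring genotype concordance at the sites typed by both platforms. The sketch below illustrates that calculation under stated assumptions. Genotypes are unphased two-letter strings reported on the same strand. The function names and site identifiers are hypothetical and are not taken from the validation described above.

```python
def normalise(genotype: str) -> str:
    """Sort the two alleles so that unphased 'GA' and 'AG' compare equal."""
    return "".join(sorted(genotype.upper()))


def concordance(seq_calls: dict[str, str], array_calls: dict[str, str]) -> tuple[int, int, float]:
    """Compare sequencing-based and array-based genotypes at shared sites.

    Returns (matching sites, shared sites, fraction concordant).
    Assumes both call sets are reported on the same strand.
    """
    shared = seq_calls.keys() & array_calls.keys()
    if not shared:
        raise ValueError("no sites were genotyped on both platforms")
    matches = sum(normalise(seq_calls[s]) == normalise(array_calls[s]) for s in shared)
    return matches, len(shared), matches / len(shared)


# Hypothetical example: site2 is discordant; site4 and site5 are not shared.
seq = {"site1": "AG", "site2": "CC", "site3": "TT", "site4": "GG"}
array = {"site1": "GA", "site2": "CT", "site3": "TT", "site5": "AA"}
print(concordance(seq, array))
```

Restricting the comparison to shared sites keeps platform-specific content (sites on only one platform) from being counted as discordance.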
Disease and risk analysis of the genome focused on four areas: (i) variants in genes associated with known Mendelian disease, (ii) novel mutations, (iii) variants known to modulate response to pharmacotherapy, and (iv) single nucleotide polymorphisms previously associated with complex disease.
Database queries, biophysical prediction algorithms, and analyses of non-coding regions were used to screen rare and novel variants in the genome. We queried disease-specific mutation databases, the Human Gene Mutation Database (HGMD), and Online Mendelian Inheritance in Man (OMIM) to identify genes and mutations with known associations to monogenic diseases. We applied prediction algorithms to weight the likelihood of variant pathogenicity based on allele frequency, conservation, and protein domain disruption. In addition, we developed algorithms to index variants affecting or creating start sites, stop sites, splice sites, and microRNAs (Figure 2, Supplemental methods).
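Indexing variants that affect start, stop, and splice sites rests on simple sequence rules. The following is a minimal sketch, not the pipeline used in this study, and it makes several assumptions:

- the reference codon is given on the coding strand;
- the standard genetic code applies;
- only the two intronic bases flanking each internal exon boundary count as canonical splice sites;
- microRNA indexing is not attempted.

The function names are hypothetical.

```python
from itertools import product

# Standard genetic code, with codons enumerated in TCAG order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def classify_coding_snv(ref_codon: str, pos_in_codon: int, alt_base: str, is_start_codon: bool = False) -> str:
    """Classify a single-base change in a coding-strand codon (pos_in_codon is 0, 1 or 2)."""
    ref_codon = ref_codon.upper()
    alt_codon = ref_codon[:pos_in_codon] + alt_base.upper() + ref_codon[pos_in_codon + 1:]
    ref_aa, alt_aa = CODON_TABLE[ref_codon], CODON_TABLE[alt_codon]
    if is_start_codon and ref_codon == "ATG" and alt_codon != "ATG":
        return "start_lost"
    if ref_aa != "*" and alt_aa == "*":
        return "stop_gained"
    if ref_aa == "*" and alt_aa != "*":
        return "stop_lost"
    return "synonymous" if ref_aa == alt_aa else "missense"


def in_canonical_splice_site(pos: int, exons: list[tuple[int, int]]) -> bool:
    """True if pos falls in the two intronic bases flanking an internal exon boundary.

    Exons are (start, end) genomic coordinates, 1-based and inclusive. The outer
    ends of the first and last exons are not splice sites.
    """
    exons = sorted(exons)
    for i, (start, end) in enumerate(exons):
        if i > 0 and start - 2 <= pos <= start - 1:
            return True
        if i < len(exons) - 1 and end + 1 <= pos <= end + 2:
            return True
    return False


print(classify_coding_snv("TGG", 2, "A"))        # TGG (Trp) changed to TGA (stop)
print(classify_coding_snv("ATG", 2, "A", True))  # initiator ATG changed to ATA
print(in_canonical_splice_site(201, [(100, 200), (300, 400)]))
```

Working in genomic coordinates makes the splice-site check strand-agnostic: on either strand, the two intronic bases at each internal exon boundary are the canonical splice dinucleotides.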
PharmGKB [8] contains data on 2500 variants, of which 650 refer specifically to drug response phenotypes. PharmGKB curators examined these 650 annotations in the context of this patient's genotype. Key variants were then identified based on the relevance of the annotated phenotype, the patient's medical and family history, and the study population on which the annotation was based. Because our disease risk estimation and pharmacogenomic analysis draw on previously published observations, we rated the level of supporting evidence in one of three categories (Supplemental methods).
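Examining annotations in the context of the patient's genotype can be pictured as a join between annotation records and the patient's genotype calls. The matches are then sorted by evidence level before manual review. The sketch below is illustrative only. The record fields, the variant identifiers, and the convention that level 1 is the strongest of the three evidence categories are assumptions, not the PharmGKB data model.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Annotation:
    variant: str          # hypothetical variant identifier
    genotype: str         # unphased genotype the annotation refers to, e.g. "AG"
    phenotype: str        # drug response phenotype described by the annotation
    evidence_level: int   # assumed scale: 1 (strongest) to 3 (weakest)


def same_genotype(a: str, b: str) -> bool:
    """Unphased comparison: 'AG' matches 'GA'."""
    return sorted(a.upper()) == sorted(b.upper())


def matching_annotations(annotations, patient_genotypes, max_level=3):
    """Return annotations whose genotype the patient carries, strongest evidence first."""
    hits = [
        ann for ann in annotations
        if ann.variant in patient_genotypes
        and same_genotype(ann.genotype, patient_genotypes[ann.variant])
        and ann.evidence_level <= max_level
    ]
    return sorted(hits, key=lambda ann: ann.evidence_level)


# Hypothetical annotations and patient genotypes (not real PharmGKB content).
annotations = [
    Annotation("var1", "AG", "reduced activation of drug X", 1),
    Annotation("var1", "GG", "typical activation of drug X", 1),
    Annotation("var2", "CT", "lower dose requirement for drug Y", 2),
    Annotation("var3", "AA", "increased myopathy risk with drug Z", 3),
]
patient = {"var1": "GA", "var2": "TC", "var3": "AG"}
for ann in matching_annotations(annotations, patient):
    print(ann.variant, ann.evidence_level, ann.phenotype)
```

The automated join only narrows the list. Judging phenotype relevance and study population, as described above, remains a curator's task.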
To integrate common variant genetic risk across a spectrum of human disease, we built a manually curated disease-SNP database (Supplemental methods). Diseases and phenotypes were mapped to Unified Medical Language System (UMLS) Concept Unique Identifiers (CUIs). Because strand direction was variably reported across studies, we inferred it by comparing the reported alleles with the major and minor alleles found in the appropriate HapMap population. Odds ratios for allele comparisons were available in most cases (Supplemental Figure 1). However, to generate a medically relevant posterior probability of disease from integrated environmental and genetic risk, we calculated likelihood ratios (LRs) for the most significant SNP from each haplotype block. Pre-test probabilities were derived from published sources (Supplemental Table 4). The LR was applied to the pre-test odds of disease, calculated from age- and sex-appropriate population prevalence. Some studies did not report the genotype frequency data needed to calculate the LR.
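The calculation described above is Bayes' theorem in odds form. The likelihood ratio for a genotype is its frequency in cases divided by its frequency in controls, and the post-test odds equal the pre-test odds multiplied by the LR. The sketch below shows both steps together with the strand check against HapMap major/minor alleles. It rests on stated assumptions:

- genotype counts are available from a study;
- the function names are hypothetical;
- A/T and C/G SNPs are rejected, because their strand cannot be inferred from the alleles alone.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def align_to_hapmap_strand(allele: str, major: str, minor: str) -> str:
    """Express a reported allele on the strand of the HapMap major/minor alleles."""
    if {major, minor} in ({"A", "T"}, {"C", "G"}):
        # Complementing an A/T or C/G SNP yields the other allele, so strand is ambiguous.
        raise ValueError("strand-ambiguous SNP: cannot infer strand from alleles alone")
    allele = allele.upper()
    if allele in (major, minor):
        return allele
    flipped = allele.translate(COMPLEMENT)
    if flipped in (major, minor):
        return flipped
    raise ValueError(f"allele {allele} matches neither {major}/{minor} nor their complements")


def genotype_likelihood_ratio(case_counts: dict[str, int], control_counts: dict[str, int], genotype: str) -> float:
    """LR = P(genotype | disease) / P(genotype | no disease)."""
    p_case = case_counts[genotype] / sum(case_counts.values())
    p_control = control_counts[genotype] / sum(control_counts.values())
    return p_case / p_control


def post_test_probability(pre_test_probability: float, likelihood_ratio: float) -> float:
    """Convert probability to odds, multiply by the LR, and convert back."""
    pre_odds = pre_test_probability / (1 - pre_test_probability)
    post_odds = pre_odds * likelihood_ratio
    return post_odds / (1 + post_odds)


# Hypothetical genotype counts (not from any study cited here).
cases = {"AA": 300, "AG": 500, "GG": 200}
controls = {"AA": 200, "AG": 500, "GG": 300}
lr = genotype_likelihood_ratio(cases, controls, "AA")   # 0.30 / 0.20
print(round(lr, 2), round(post_test_probability(0.20, lr), 3))
```

Working in odds rather than probabilities is what allows an LR to be applied to an age- and sex-specific prevalence by simple multiplication.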
The study was approved by the Institutional Review Board of Stanford University. The patient received education and counselling before signing the consent form and throughout the process of testing and follow-up.
The study sponsors had no role in the design, data collection, data analysis, data interpretation, or writing of the report. Dr. Ashley had full access to all data in the study and final responsibility in the decision to submit the manuscript for publication.
A four-generation family pedigree (Figure 1) revealed atherosclerotic vascular disease with multiple manifestations, as well as prominent osteoarthritis. The patient's first cousin once removed (IV-1) died suddenly of unknown cause.
An important advantage of sequencing over DNA chip based genotyping is the identification of rare or novel variants. We searched for rare or novel variants that might predispose the patient or his family to disease (Table 2, Supplemental Table 2). Specific to cardiovascular disease, we found rare variants in three genes clinically associated with sudden cardiac death: TMEM43, DSP, and MYBPC3. The MYBPC3 variant, encoding an arginine to glutamine change at position 326 of cardiac myosin binding protein C, was originally associated with late-onset hypertrophic cardiomyopathy [9]. It has subsequently been found in multiple independent control populations without known hypertrophic cardiomyopathy [10], suggesting that it may be benign. Mutations in TMEM43 [11] or DSP [12] have been associated with familial arrhythmogenic right ventricular dysplasia/cardiomyopathy (ARVD/C). Review of the clinical assessment of extended family members revealed minor criteria for ARVD/C in one first cousin, whose son died suddenly in his teens. In contrast to the MYBPC3 variant, the TMEM43 variant, encoding a methionine to valine change at position 41 of transmembrane protein 43, has not been previously published, but was seen in 1 of 150 probands known to have ARVD/C [13]. The DSP variant, encoding an arginine to histidine change at amino acid 1838 of desmoplakin, has not been previously reported. Neither the DSP nor the TMEM43 variant has been found in control populations tested by clinical testing laboratories (more than 1000 chromosomes in total).
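The absence of both variants from more than 1000 control chromosomes can be turned into an approximate upper bound on their population frequency. For illustration, assume the control chromosomes are independent draws from a single population with variant frequency p. The largest p still consistent, at the 95% level, with zero observations in n chromosomes satisfies:

$$
(1-p_{\max})^{n} = 0.05 \quad\Longrightarrow\quad p_{\max} = 1 - 0.05^{1/n} \approx \frac{-\ln 0.05}{n} \approx \frac{3}{n}
$$

With n = 1000 this gives p_max ≈ 0.003. Each variant is therefore likely to be present on fewer than about 3 in 1000 chromosomes in comparable populations, and the bound tightens as n grows. This is the familiar "rule of three"; rarity alone does not establish pathogenicity.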
The patient's genome revealed three novel and potentially damaging variants in two related genes previously associated with haemochromatosis. Subsequent detailed review of personal and family history failed to identify haemochromatosis in the patient or in family members. Available clinical testing (echocardiographic parameters and liver function tests) showed no evidence of haemochromatosis. The justification for further surveillance and testing with serum iron studies was discussed with the patient. The patient was also found to harbour a novel stop mutation in a gene implicated in hyperparathyroidism and parathyroid tumours. Through a loss-of-heterozygosity mechanism, this variant may increase the probability of future hyperparathyroidism or parathyroid tumours. Osteoarthritis was prominent in the family history, and the patient reported knee pain without a formal diagnosis.
We found 64 previously described, clinically relevant pharmacogenomic variants (Table 3, Supplemental Table 3) and 12 novel, non-conservative, amino-acid-changing SNPs in genes known to be important for drug response. There was a heterozygous null mutation in CYP2C19. Its gene product is important for the metabolism of many drugs, including proton pump inhibitors, antiepileptics, and the anti-platelet agent clopidogrel. Notably, the rate of cardiovascular events is higher among patients taking clopidogrel who carry CYP2C19 loss-of-function mutations [14]. In addition, the patient has three distinct genetic variations that suggest a lower maintenance dose of warfarin:

- he carries the single most important VKORC1 variant associated with a lower maintenance dose [15];
- he is homozygous for a CYP4F2 SNP associated with lower dosing;
- he has a novel non-synonymous SNP in VKORC1 [16].

Warfarin loading could therefore be individualised for this patient, with lower expected doses. The patient has several variants associated with a good response to statins (including a lower risk of myopathy) and one variant suggesting that he may need a higher dose to achieve a good response. Finally, the patient is wild type (with no copy number variations) for three important drug-metabolising enzymes that affect hundreds of drug responses: CYP2D6, CYP2C9, and CYP3A4.
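A heterozygous null (no-function) allele in CYP2C19 is usually translated into a metabolizer phenotype by counting no-function and increased-function alleles. The sketch below shows that mapping in simplified form. The star-allele function table is an assumption based on commonly used assignments (*2 and *3 no function, *17 increased function) and covers only a handful of alleles. The text above does not name the specific CYP2C19 allele this patient carries, so the example genotypes are illustrative.

```python
# Simplified allele-function table (assumption; real tables list many more alleles).
ALLELE_FUNCTION = {"*1": "normal", "*2": "no function", "*3": "no function", "*17": "increased"}


def cyp2c19_phenotype(allele_1: str, allele_2: str) -> str:
    """Map a CYP2C19 diplotype to a metabolizer phenotype by counting allele functions."""
    functions = [ALLELE_FUNCTION[allele_1], ALLELE_FUNCTION[allele_2]]
    no_function = functions.count("no function")
    increased = functions.count("increased")
    if no_function == 2:
        return "poor metabolizer"
    if no_function == 1:
        # One null allele (heterozygous loss of function), with or without *17.
        return "intermediate metabolizer"
    if increased == 2:
        return "ultrarapid metabolizer"
    if increased == 1:
        return "rapid metabolizer"
    return "normal metabolizer"


for diplotype in [("*1", "*1"), ("*1", "*2"), ("*2", "*3"), ("*1", "*17")]:
    print(diplotype, cyp2c19_phenotype(*diplotype))
```

Under this mapping a heterozygous null carrier is an intermediate metabolizer. Because clopidogrel is a prodrug that relies partly on CYP2C19 for activation, this is the group in which reduced clopidogrel activation would be expected.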
While genome wide association studies have provided highly significant associations of many common variants with disease, integrating these small odds ratios in the context of the individual patient remains challenging. In particular, additive or multiplicative models of even highly significant SNPs can add little to the risk classification of the patient [17,18]. Further, these approaches take no account of the prior probability of disease. To address some of these concerns, we adopted established methods from evidence-based medicine that have rarely been applied to clinical genetics. We calculated pre-test probabilities from referenced sources for 121 diseases (Supplementary Table 4). Of the 55 diseases for which we could calculate a post-test probability, there was consistently increased genetic risk (likelihood ratio, LR > 2) for 8 diseases and decreased genetic risk (LR < 0.5) for 7 diseases (Figure 3A). Of note, an increase in genetic risk did not always translate into a high post-test probability: post-test probabilities were rarely an order of magnitude higher or lower than pre-test probabilities. Decisions to act on these predictions will necessarily depend on three factors:

- the post-test probability threshold for action (e.g. the post-test probability of type 2 diabetes);
- the consequences of action (e.g. regular fasting blood glucose testing);
- the utility and efficacy of action.
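Summarising genetic risk for one disease from several loci requires a combination rule. The sketch below assumes, as a simplification, that the selected SNPs (one per haplotype block) act independently, so their likelihood ratios multiply. It then applies the LR > 2 and LR < 0.5 cut-offs used above and converts the result to a post-test probability. The combination rule, function names, and input values are illustrative assumptions, not the study's reported data.

```python
import math


def combined_lr(block_lrs: list[float]) -> float:
    """Multiply per-block LRs, assuming independence between haplotype blocks."""
    return math.prod(block_lrs)


def risk_category(lr: float, high: float = 2.0, low: float = 0.5) -> str:
    """Classify combined genetic risk with the LR > 2 / LR < 0.5 cut-offs."""
    if lr > high:
        return "increased"
    if lr < low:
        return "decreased"
    return "neither"


def post_test_probability(pre_test_probability: float, lr: float) -> float:
    """Apply the LR to the pre-test odds and convert back to a probability."""
    pre_odds = pre_test_probability / (1 - pre_test_probability)
    post_odds = pre_odds * lr
    return post_odds / (1 + post_odds)


# Hypothetical per-block LRs and pre-test probabilities for two diseases.
diseases = {
    "disease A": ([1.3, 1.2, 1.4], 0.05),
    "disease B": ([0.8, 0.7, 0.9], 0.10),
}
for name, (lrs, pre) in diseases.items():
    lr = combined_lr(lrs)
    print(name, round(lr, 3), risk_category(lr), round(post_test_probability(pre, lr), 3))
```

For "disease A" the combined LR of about 2.2 more than doubles the odds, yet the post-test probability only moves from 5% to about 10%. This illustrates why an increase in genetic risk need not produce a high post-test probability.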
Increased genetic risk for myocardial infarction (MI) reflected 6 MI risk SNPs and 2 protective SNPs (Figure 3B). In addition, the patient had risk markers at the locus most replicated in genome wide association studies (9p21). For example, rs1333049 is associated with an odds ratio of 1.5 for early-onset myocardial infarction [19], and this marker is part of a commercial genetic “risk” test for myocardial infarction. Further, the patient harbours a single copy of the previously studied rare variant of the LPA gene, which encodes the apolipoprotein(a) precursor. Notably, the patient had a very high Lp(a) level (114 mg/dl, Table 1), which is associated with an increased risk of cardiovascular events. This variant is associated with:

- a 5-fold higher median plasma Lp(a) level;
- a 1.7-fold to 2-fold risk of coronary artery disease [20];
- a 3-fold adjusted odds ratio for severe coronary artery disease versus non-carriers [21].

This polymorphism has been associated with a low number of kringle IV type 2 (KIV-2) domain repeats in the LPA gene, high Lp(a) levels, and adverse cardiovascular events [22,23]. Because of the technical limitations of short-read sequencing, we could not precisely estimate the number of KIV-2 domain repeats in the patient's genome.
We placed the disease-associated genetic risk into the context of known environmental and behavioural modifiers, as well as predisposing conditions (Figure 3C). Diseases that may be independently associated with low genetic risk (e.g. abdominal aortic aneurysm) are visualised in the context of others that may be aetiologically related but for which genetic risk may be higher (e.g. obesity, which predisposes to type 2 diabetes and hypertension). Because direct and conditionally dependent information are illustrated together in the circuit, overall risk can be assessed using both. For example, we estimate a reduced post-test probability of hypertension of 16.8% (LR = 0.81) relative to the general population. However, the patient has a substantially elevated genetic risk for obesity (LR = 6.28), giving a high post-test probability of 56.1% for this predisposing risk factor for hypertension. Furthermore, hypertension is associated with a number of modifiable environmental factors. These influence risk either directly (e.g. sodium intake) or conditionally, through association with another node in the circuit (e.g. antipsychotics). No methods currently exist for the statistical integration of such conditionally dependent risks. Nevertheless, interpreting findings in the context of the causal circuit diagram allows individualised assessment of the combined effect of environmental and genetic risk.
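The figures in this example are linked by the same odds-form calculation used throughout (post-test odds = pre-test odds × LR). Working backwards from the reported values gives the pre-test probabilities they imply. These are back-calculated for illustration, are subject to rounding in the reported figures, and are not values reported by the study.

$$
\text{odds} = \frac{p}{1-p}, \qquad \text{odds}_{\text{post}} = \text{odds}_{\text{pre}} \times \mathrm{LR}, \qquad p = \frac{\text{odds}}{1+\text{odds}}
$$

$$
\text{Hypertension: } \frac{0.168}{0.832} \approx 0.202,\quad \frac{0.202}{0.81} \approx 0.249,\quad p_{\text{pre}} \approx \frac{0.249}{1.249} \approx 0.20
$$

$$
\text{Obesity: } \frac{0.561}{0.439} \approx 1.278,\quad \frac{1.278}{6.28} \approx 0.203,\quad p_{\text{pre}} \approx \frac{0.203}{1.203} \approx 0.17
$$

An LR of 0.81 thus moves hypertension risk only modestly below an implied baseline of about 20%. An LR of 6.28, by contrast, raises obesity from an implied baseline of about 17% to more than 50%. This illustrates why the conditionally dependent obesity node may matter more for this patient than the direct hypertension estimate.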
We discussed the possibility that this clinical assessment incorporating a personal genome might uncover a high risk of serious disease, including diseases for which no therapy is available. We also described the reproductive implications of heterozygous status for autosomal recessive diseases such as cystic fibrosis, which may not be predictable from family history (Table 2, Figure 1). We also warned that genetic risk for common diseases might be found to be increased or decreased. We noted that the vast majority of the sequence information available is currently difficult to interpret. We discussed error rates and validation processes, and we addressed the possibility of genetic discrimination. A specialised physician can provide information to a patient seeking a specific single-disease genetic test. Patients with whole genome sequence data, by contrast, need information on many more diseases across a wide clinical range (Table 2). For this reason, we offered extended access to clinical geneticists, genetic counsellors, and clinical laboratory directors to interpret the information we presented.
We provide an approach to comprehensive analysis of a human genome in a defined clinical context. We assessed whole genome genetic risk, focusing on variants in genes associated with Mendelian disease, novel rare variants across the genome, and variants of known pharmacogenomic importance. In addition, we developed an approach to integrating disease risk across multiple common polymorphisms. Although the methodology is nascent, the results provide proof of principle that clinically meaningful information about disease risk and response to medications can be derived for patients with whole genome sequence data.
A prominent aspect of the patient's family history (Figure 1) is the diagnosis of arrhythmogenic right ventricular dysplasia/cardiomyopathy in his first cousin (III-3) and the sudden death of his first cousin once removed (IV-1). On average, our patient shares 12.5% of his genetic information with his first cousin and 6.25% with that relative's son. A diagnostic work-up would involve targeted sequencing of DNA from these individuals; nevertheless, our analysis uncovered several variants in genes with potential explanatory value. Most were common variants. One variant, in MYBPC3, was previously associated with hypertrophic cardiomyopathy but has since been found in multiple control populations and may be benign, exemplifying the limitations of current variant databases. Two rare variants in genes previously associated with ARVD/C (TMEM43 and DSP) were novel.
Our patient reported a prominent family history of vascular disease, including aortic aneurysm and coronary artery disease (Figure 1, individuals II-1, II-2, I-1, I-2). The collagen variant we found may contribute to familial risk of aortic aneurysm, but disease in this family is more likely to be related to atherosclerosis. In estimating the risk of coronary artery disease, we integrated three sources of information:

- the most replicated risk associations;
- likelihood ratio projections from the curated literature;
- a known rare variant in the LPA gene that may not have been found using chip based genotyping.

According to the ATP-III guidelines [24], our patient does not currently have major risk factors for coronary artery disease. He would therefore require an LDL cholesterol level of 190 mg/dl or higher to qualify for lipid lowering therapy. However, he is borderline for three major risk factors, and any two of these would lower the LDL threshold for treatment to 160 mg/dl (his measured level was 156 mg/dl). No standards yet exist for incorporating global genetic risk into cardiovascular risk assessment, but physicians are accustomed to incorporating many sources of information in clinical decision-making. In this case, the patient's physician took this lifetime genetic risk into account in recommending a lipid lowering medication, and part of this decision focused on the likely response to therapy. His genome includes variants (Table 3) that predict a greater likelihood of benefit from statin medications and a lower risk of the adverse effect of skeletal myopathy. In addition, a significant reduction of attributable risk was found in carriers of the LPA risk allele who took aspirin [20]. This led to a discussion between the physician and his patient about the threshold for primary prevention with aspirin therapy.
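The treatment thresholds quoted above follow the ATP-III rule that the LDL cholesterol level at which drug therapy is considered falls as the number of major risk factors rises. The sketch below makes several simplifying assumptions:

- it encodes only the primary-prevention categories relevant here (patients without coronary heart disease or risk equivalents);
- the function names are hypothetical;
- the net count of major risk factors is taken as given;
- the 10-year risk value used in the example is assumed.

It is not a substitute for the guideline itself.

```python
def atp3_drug_threshold(major_risk_factors: int, ten_year_risk: float = 0.0) -> int:
    """LDL cholesterol (mg/dl) at which ATP III suggests considering drug therapy.

    Primary prevention only: patients with coronary heart disease or a risk
    equivalent (including 10-year risk above 20%) are outside this sketch.
    """
    if ten_year_risk > 0.20:
        raise ValueError("10-year risk above 20% is a CHD risk equivalent; not covered here")
    if major_risk_factors <= 1:
        return 190
    if ten_year_risk < 0.10:
        return 160
    return 130


def qualifies_for_drug_therapy(ldl: float, major_risk_factors: int, ten_year_risk: float = 0.0) -> bool:
    """True if the LDL level meets or exceeds the drug-therapy threshold."""
    return ldl >= atp3_drug_threshold(major_risk_factors, ten_year_risk)


ldl = 156  # measured LDL in mg/dl, as reported above
print(qualifies_for_drug_therapy(ldl, 0))        # no major risk factors
print(qualifies_for_drug_therapy(ldl, 2, 0.05))  # two risk factors, assumed 10-year risk below 10%
```

Both calls return False: on conventional criteria alone the patient falls just short of either threshold. The recommendation to treat therefore drew on additional, genetic information.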
Given a predisposition to coronary artery disease and to other diseases on which risk is conditionally dependent (Figure 3C), understanding the patient's potential response to clopidogrel and warfarin may be important for individualising future medical therapy. The patient is at risk of clopidogrel resistance because of his CYP2C19 loss-of-function mutation. His physician therefore recommended either a higher dose of clopidogrel in the event of future use or consideration of newer agents with alternative metabolism. In contrast, should the patient develop an indication for warfarin, his genotype at the VKORC1 and CYP4F2 loci suggests that he should receive lower initial doses. The novel VKORC1 variant may have additional effects on warfarin response, since VKORC1 encodes the drug's target.
In contrast, our patient did not report a family history of haemochromatosis or parathyroid tumours yet harbours some genetic risk for these conditions. An important contribution of clinical-genetic risk integration is the appropriate consideration of further screening studies. In addition, risk alleles may be discovered that carry reproductive or familial significance rather than personal significance (such as those for breast or ovarian cancer in a male patient). Appropriate incorporation of such risk alleles into both medical and ethical discussion is warranted.
There remain significant limitations to our ability to integrate genetic information comprehensively into clinical care. For example, there is no comprehensive database of rare mutations and no framework for the statistical combination of risk estimates from multiple common polymorphisms. Because risk estimates change as more studies are completed, a continuously updated pipeline is required. On a technical level, sequencing error rates remain a limitation, particularly for detecting structural variants. Finally, gene-environment interactions are challenging to quantify and have to date been little studied.
As whole genome sequencing becomes more widespread, obtaining genomic information will no longer be the limiting factor in the application of genetics to clinical medicine. Developing tools to integrate genetic data on common and rare variants along with clinical data to assist in clinical decision making is a large step towards individualised medicine. The transition to a new era of genome-informed medical care will require a team approach that incorporates medical and genetics professionals, ethicists and healthcare delivery organizations.
The authors would like to acknowledge the invaluable help of Josephine Puryear, Joshua Spin, Emidio Capriotti, and Connie Oshiro. We also thank Yuti Bhide and Prajkta Bhide from Optra Systems for the curation of disease-associated SNPs from the literature.
Grant support: This work was supported by grants from the National Institutes of Health General Medical Sciences (GM61374 and associated ARRA supplement, GM079719), National Institutes of Health Heart, Lung, and Blood Institute (F32HL097462, LM009719, K08 HL083914), National Institutes of Health Human Genome Research Institute (HG003389), Howard Hughes Medical Institute, and The John D. and Catharine T. MacArthur Foundation for The Law and Neuroscience Project.
Conflicts of Interest: RA is a consultant to a direct-to-consumer genetic testing company, 23andme. GC is an advisor to several sequencing and direct-to-consumer companies (23andme, Knome, Helicos); a full list is accessible here: http://arep.med.harvard.edu/gmc/tech.html. KO was a paid consultant as a member of the Genetic Counseling Task Force for Navigenics from 6/07 to 8/09. SQ is a founder, consultant, and equity holder in Helicos BioSciences. DP is an equity holder in Helicos BioSciences. EA, DB, AB, RC, MC, FD, JD, LG, HG, JH, LMH, LH, TK, JWK, AM, NN, AP, AR, HS, KS, JT, CT, RW, MTW, MW, and AZ declare that they have no conflicts of interest.