Patients, clinicians, researchers and payers are seeking to understand the value of using genomic information (as reflected by genotyping, sequencing, family history or other data) to inform clinical decision-making. However, there are challenges to the widespread clinical implementation of genomic medicine, which is a prerequisite for developing evidence of its real-world utility.
To address these challenges, the National Institutes of Health-funded IGNITE (Implementing GeNomics In pracTicE; www.ignite-genomics.org) Network, composed of six projects and a coordinating center, was established in 2013 to support the development, investigation and dissemination of genomic medicine practice models that seamlessly integrate genomic data into the electronic health record and that deploy tools for point-of-care decision-making. IGNITE site projects share the purpose of testing these models, but individual projects vary in scope and design: they explore genetic markers for disease risk prediction and prevention, develop tools for using family history data, incorporate pharmacogenomic data into clinical care, refine disease diagnosis using sequence-based mutation discovery, and create novel educational approaches.
This paper describes the IGNITE Network and member projects, including network structure, collaborative initiatives, clinical decision support strategies, methods for return of genomic test results, and educational initiatives for patients and providers. Clinical and outcomes data from individual sites and network-wide projects are expected to be published over the next few years.
The IGNITE Network is an innovative series of projects and pilot demonstrations that aims to enhance the translation of validated, actionable genomic information into clinical settings, and to develop and use measures of outcome for genome-based clinical interventions. Using a pragmatic framework, it seeks to provide early data and proofs of concept on the utility of these interventions. Through these efforts and collaboration with other stakeholders, IGNITE is poised to have a significant impact on accelerating the integration of genomic information into medical practice.
Precision medicine; Pharmacogenomics; Genomics; Personalized medicine; Clinical decision support; Electronic health record; Implementation
We describe here the design and initial implementation of the eMERGE-PGx project. eMERGE-PGx, a partnership of the eMERGE and PGRN consortia, has three objectives: 1) Deploy PGRNseq, a next-generation sequencing platform assessing sequence variation in 84 proposed pharmacogenes, in nearly 9,000 patients likely to be prescribed drugs of interest in a 1–3 year timeframe across several clinical sites; 2) Integrate well-established, clinically validated pharmacogenetic genotypes into the electronic health record with associated clinical decision support and assess process and clinical outcomes of implementation; and 3) Develop a repository of pharmacogenetic variants of unknown significance linked to a repository of EHR-based clinical phenotype data for ongoing pharmacogenomics discovery. We describe site-specific project implementation and anticipated products, including genetic variant and phenotype data repositories, novel variant association studies, clinical decision support modules, clinical and process outcomes, approaches to manage incidental findings, and patient and clinician education methods.
pharmacogenetics; pharmacogenomics; next generation sequencing; study design; pre-emptive genotyping
Precision medicine; Tailored interventions; Personalized medicine; Mobile health; Health informatics; Pharmacogenetics; Cohort studies; Behavioral risk factors; Environmental risk factors
Genome-wide association studies, DNA sequencing studies, and other genomic studies are finding an increasing number of genetic variants associated with clinical phenotypes that may be useful in developing diagnostic, preventive, and treatment strategies for individual patients. However, few common variants have been integrated into routine clinical practice. The reasons for this are several, but two of the most significant are limited evidence about the clinical implications of the variants and a lack of a comprehensive knowledge base that captures genetic variants, their phenotypic associations, and other pertinent phenotypic information that is openly accessible to clinical groups attempting to interpret sequencing data. As the field of medicine begins to incorporate genome-scale analysis into clinical care, approaches need to be developed for collecting and characterizing data on the clinical implications of variants, developing consensus on their actionability, and making this information available for clinical use. The National Human Genome Research Institute (NHGRI) and the Wellcome Trust thus convened a workshop to consider the processes and resources needed to: 1) identify clinically valid genetic variants; 2) decide whether they are actionable and what the action should be; and 3) provide this information for clinical use. This commentary outlines the key discussion points and recommendations from the workshop.
genomic medicine; clinical actionability; database; electronic health records (EHR); pharmacogenomics; DNA sequencing
Heart failure is more prevalent among African Americans than in the general population. Metabolomic studies among African Americans may efficiently identify novel biomarkers of heart failure. We used untargeted methods to measure 204 stable serum metabolites and evaluated their associations with incident heart failure hospitalization (n = 276) after a median follow-up of 20 years (1987–2008) by using Cox regression in data from 1,744 African Americans aged 45–64 years without heart failure at baseline from the Jackson, Mississippi, field center of the Atherosclerosis Risk in Communities (ARIC) Study. After adjustment for established risk factors, we found that 16 metabolites (6 named with known structural identities and 10 unnamed with unknown structural identities, the latter denoted by using the format X-12345) were associated with incident heart failure (P < 0.0004 based on a modified Bonferroni procedure). Of the 6 named metabolites, 4 are involved in amino acid metabolism, 1 (prolylhydroxyproline) is a dipeptide, and 1 (erythritol) is a sugar alcohol. After additional adjustment for kidney function, 2 metabolites remained associated with incident heart failure (for metabolite X-11308, hazard ratio = 0.75, 95% confidence interval: 0.65, 0.86; for metabolite X-11787, hazard ratio = 1.23, 95% confidence interval: 1.10, 1.37). Further structural analysis revealed X-11308 to be a dihydroxy docosatrienoic acid and X-11787 to be an isoform of either hydroxyleucine or hydroxyisoleucine. Our metabolomic analysis revealed novel biomarkers associated with incident heart failure independent of traditional risk factors.
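The hazard ratios above come from Cox regression, whose 95% confidence intervals are normally Wald intervals on the log scale, exp(β ± 1.96·SE). Assuming that construction, a reported HR and its interval are enough to recover the log-hazard coefficient β, its standard error, and an approximate two-sided p-value that can be compared with the P < 0.0004 threshold. This is a minimal sketch; the function name `wald_from_hr` is illustrative, and because the published values are rounded to two decimals the back-calculated statistics are only approximate.

```python
import math


def wald_from_hr(hr, lower, upper, z_crit=1.959964):
    """Back-calculate (beta, SE, z, two-sided p) from a hazard ratio and its 95% CI.

    Assumes the CI was computed as exp(beta +/- z_crit * SE), the usual Wald
    interval for a Cox regression coefficient.
    """
    beta = math.log(hr)
    se = (math.log(upper) - math.log(lower)) / (2 * z_crit)
    z = beta / se
    p = math.erfc(abs(z) / math.sqrt(2))  # two-sided standard-normal p-value
    return beta, se, z, p


# Kidney-function-adjusted estimates quoted in the abstract (rounded there).
for name, hr, lo, hi in [("X-11308", 0.75, 0.65, 0.86),
                         ("X-11787", 1.23, 1.10, 1.37)]:
    beta, se, z, p = wald_from_hr(hr, lo, hi)
    print(f"{name}: beta={beta:.3f} SE={se:.3f} z={z:.2f} p~{p:.1e}")
```

Working on the log scale is what makes the interval symmetric, which is why the SE is taken from the difference of the logged bounds rather than from the HR itself.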
heart failure; metabolomics; risk factors
Both the prevalence and incidence of heart failure (HF) are increasing, especially among African-Americans, but no large-scale genome-wide association study (GWAS) of HF-related metabolites has been reported. We sought to identify novel genetic variants associated with metabolites previously reported to relate to HF incidence. GWASs of three metabolites previously identified as risk factors for incident HF (pyroglutamine, dihydroxy docosatrienoic acid and X-11787, which is either hydroxy-leucine or hydroxy-isoleucine) were performed in 1260 African-Americans free of HF at the baseline examination of the Atherosclerosis Risk in Communities (ARIC) study. A significant association on chromosome 5q33 (rs10463316, MAF = 0.358, p-value = 1.92×10−10) was identified for pyroglutamine. A region on chromosome 2p13 containing a nonsynonymous substitution in N-acetyltransferase 8 (NAT8) was associated with X-11787 (rs13538, MAF = 0.481, p-value = 1.71×10−23). For dihydroxy docosatrienoic acid, the strongest association was with rs4006531 on chromosome 8q24 (MAF = 0.400, p-value = 6.98×10−7). None of these SNPs was individually associated with incident HF, but a genetic risk score (GRS) created by summing the most significant risk alleles for each metabolite was associated with an 11% greater risk of HF per allele. In summary, we identified three loci associated with previously reported HF-related metabolites. Further use of metabolomics technology will facilitate replication of these findings in independent samples.
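An unweighted genetic risk score of the kind described above simply counts risk alleles (0, 1 or 2 per SNP) across the lead SNPs. The sketch below assumes the reported "11% greater risk per allele" corresponds to a hazard ratio of 1.11 per allele under a multiplicative model, so a person's hazard relative to someone with zero risk alleles is 1.11 raised to the score. Which allele is the risk allele at each SNP is not given in the abstract; the genotype counts in the example, and the function names, are hypothetical.

```python
from typing import Dict


def genetic_risk_score(risk_allele_counts: Dict[str, int]) -> int:
    """Unweighted GRS: number of risk alleles (0, 1 or 2 per SNP) summed across SNPs."""
    for snp, count in risk_allele_counts.items():
        if count not in (0, 1, 2):
            raise ValueError(f"{snp}: risk-allele count must be 0, 1 or 2, got {count}")
    return sum(risk_allele_counts.values())


def relative_hazard(grs: int, hr_per_allele: float = 1.11) -> float:
    """Hazard relative to a zero-risk-allele person, assuming a multiplicative
    (log-additive) effect of each allele; 1.11 is an assumed reading of '11% per allele'."""
    return hr_per_allele ** grs


# Hypothetical participant: risk-allele counts at the three lead SNPs.
person = {"rs10463316": 2, "rs13538": 1, "rs4006531": 0}
score = genetic_risk_score(person)
print(score, round(relative_hazard(score), 3))
```

Summing allele counts gives each SNP equal weight; a weighted score would instead multiply each count by that SNP's estimated log effect before summing.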
metabolomics; genome-wide association; African-Americans; heart failure
Electrocardiographic QRS duration, a measure of cardiac intraventricular conduction, varies ~2-fold in individuals without cardiac disease. Slow conduction may promote reentrant arrhythmias.
Methods and Results
We performed a genome-wide association study (GWAS) to identify genomic markers of QRS duration in 5,272 individuals without cardiac disease, selected using electronic medical record (EMR) algorithms at five sites in the Electronic Medical Records and Genomics (eMERGE) network. The most significant loci were evaluated within the CHARGE consortium QRS GWAS meta-analysis. Twenty-three single nucleotide polymorphisms (SNPs) in 5 loci, previously described by CHARGE, were replicated in the eMERGE samples; 18 SNPs were in the chromosome 3 SCN5A and SCN10A loci, where the most significant SNPs were rs1805126 in SCN5A with p=1.2×10−8 (eMERGE) and p=2.5×10−20 (CHARGE) and rs6795970 in SCN10A with p=6×10−6 (eMERGE) and p=5×10−27 (CHARGE). The other loci were in NFIA, near CDKN1A, and near C6orf204. We then performed phenome-wide association studies (PheWAS) on variants in these five loci in 13,859 European Americans to search for diagnoses associated with these markers. PheWAS identified atrial fibrillation and cardiac arrhythmias as the diagnoses most commonly associated with SCN10A and SCN5A variants. SCN10A variants were also associated with subsequent development of atrial fibrillation and arrhythmia in the original 5,272 “heart-healthy” study population.
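The PheWAS step above tests a fixed set of variants against many EMR-derived diagnoses. A minimal sketch of that loop, assuming each patient has been reduced to a carrier flag and a set of diagnosis codes: for every phenotype it builds a 2×2 carrier-by-case table and computes a 1-degree-of-freedom Pearson chi-square p-value. A real analysis would normally use regression adjusted for covariates such as age and sex; this unadjusted test, the data layout and the names `phewas` and `chi2_2x2_p` are illustrative assumptions.

```python
import math
from typing import Dict, List, Set, Tuple


def chi2_2x2_p(a: int, b: int, c: int, d: int) -> float:
    """Pearson chi-square p-value (1 df, no continuity correction) for the table
    [[carrier cases, carrier controls], [non-carrier cases, non-carrier controls]]."""
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    if denom == 0:
        return 1.0  # a margin is empty, so there is nothing to test
    chi2 = n * (a * d - b * c) ** 2 / denom
    return math.erfc(math.sqrt(chi2 / 2))  # upper tail of chi-square with 1 df


def phewas(carrier: Dict[str, bool],
           diagnoses: Dict[str, Set[str]],
           phenotypes: List[str]) -> List[Tuple[str, float]]:
    """Test one variant against every phenotype; return (phenotype, p) sorted by p."""
    results = []
    for ph in phenotypes:
        a = b = c = d = 0
        for pid, is_carrier in carrier.items():
            case = ph in diagnoses.get(pid, set())
            if is_carrier and case:
                a += 1
            elif is_carrier:
                b += 1
            elif case:
                c += 1
            else:
                d += 1
        results.append((ph, chi2_2x2_p(a, b, c, d)))
    return sorted(results, key=lambda r: r[1])
```

Because every phenotype is scanned, the resulting p-values need a multiple-testing correction before any association is reported.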
We conclude that DNA biobanks coupled to EMRs not only provide a platform for GWAS but may also allow broad interrogation of the longitudinal incidence of disease associated with genetic variants. The PheWAS approach implicated sodium channel variants that modulate QRS duration in subjects without cardiac disease as predictors of subsequent arrhythmias.
cardiac conduction; QRS duration; atrial fibrillation; genome-wide association study; phenome-wide association study; electronic medical records
Candidate gene and genome-wide association studies (GWAS) have identified genetic variants that modulate risk for human disease; many of these associations still require replication. Here we report the first large-scale application of the phenome-wide association study (PheWAS) paradigm within electronic medical records (EMRs), an unbiased approach to replication and discovery that interrogates relationships between targeted genotypes and multiple phenotypes. We scanned for associations between 3,144 single-nucleotide polymorphisms (previously implicated by GWAS as mediators of human traits) and 1,358 EMR-derived phenotypes in 13,835 individuals of European ancestry. This PheWAS replicated 66% (51/77) of sufficiently powered prior GWAS associations and revealed 63 potentially pleiotropic associations with P < 4.6 × 10−6 (false discovery rate < 0.1); the strongest of these novel associations were replicated in an independent cohort (n = 7,406). These findings validate PheWAS as a tool to allow unbiased interrogation across multiple phenotypes in EMR-based cohorts and to enhance analysis of the genomic basis of human disease.
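The P < 4.6 × 10−6 cutoff above corresponds to a false discovery rate below 0.1. The abstract does not say which FDR procedure was used; the sketch below shows the standard Benjamini–Hochberg step-up procedure as one common choice. Sort the m p-values, find the largest rank k with p(k) ≤ (k/m)·q, and reject every hypothesis of rank ≤ k. If every SNP-phenotype pair were tested, m would be up to 3,144 × 1,358 = 4,269,552. The function name is illustrative.

```python
from typing import List


def benjamini_hochberg(pvalues: List[float], q: float = 0.1) -> List[bool]:
    """Return, for each p-value (in input order), whether it is rejected at FDR level q."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    # Largest 1-based rank k whose sorted p-value satisfies p_(k) <= k / m * q.
    k_max = 0
    for rank, idx in enumerate(order, start=1):
        if pvalues[idx] <= rank / m * q:
            k_max = rank
    # Step-up: reject every hypothesis ranked at or below k_max, even if its
    # own p-value missed its individual rank threshold.
    reject = [False] * m
    for rank, idx in enumerate(order, start=1):
        if rank <= k_max:
            reject[idx] = True
    return reject
```

In practice the effective p-value cutoff, such as the 4.6 × 10−6 reported here, is the largest p-value that still gets rejected.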
The National Human Genome Research Institute (NHGRI) Catalog of Published Genome-Wide Association Studies (GWAS Catalog) provides a publicly available, manually curated collection of published GWAS assaying at least 100 000 single-nucleotide polymorphisms (SNPs), together with all SNP-trait associations with P < 1 × 10−5. The Catalog includes 1751 curated publications of 11 912 SNPs. In addition to the SNP-trait association data, the Catalog also publishes a quarterly diagram of all SNP-trait associations mapped to the SNPs’ chromosomal locations. The Catalog can be accessed via a tabular web interface, via a dynamic visualization on the human karyotype, as a downloadable tab-delimited file and as an OWL knowledge base. This article presents a number of recent improvements to the Catalog, including novel ways for users to interact with the Catalog and changes to the curation infrastructure.
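Because the Catalog keeps every association with P < 1 × 10−5, users who want only genome-wide significant hits usually filter the downloadable tab-delimited file themselves. A minimal sketch with the standard `csv` module follows. It assumes the file has a header row and a p-value column named `P-VALUE`; that column name, and the `SNPS` and `DISEASE/TRAIT` names in the toy sample, are assumptions to check against the header of the file you actually download. The function name is illustrative.

```python
import csv
import io
from typing import Dict, Iterable, List


def associations_below(rows: Iterable[Dict[str, str]],
                       threshold: float = 5e-8,
                       p_col: str = "P-VALUE") -> List[Dict[str, str]]:
    """Keep rows whose p-value column parses as a float below `threshold`."""
    kept = []
    for row in rows:
        try:
            p = float(row[p_col])
        except (KeyError, TypeError, ValueError):
            continue  # skip rows with a missing or non-numeric p-value
        if p < threshold:
            kept.append(row)
    return kept


# Toy two-row file standing in for the real download (made-up contents).
sample = "SNPS\tDISEASE/TRAIT\tP-VALUE\nrs1\tTrait A\t3E-9\nrs2\tTrait B\t2E-6\n"
hits = associations_below(csv.DictReader(io.StringIO(sample), delimiter="\t"))
print([h["SNPS"] for h in hits])
```

For a real file, replace the `io.StringIO` wrapper with `open(path, newline="", encoding="utf-8")`.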
Nicotine dependence is a highly heritable disorder associated with severe medical morbidity and mortality. Recent meta-analyses have found novel genetic loci associated with cigarettes per day (CPD), a proxy for nicotine dependence. The aim of this paper is to evaluate the importance of phenotype definition (i.e. CPD versus Fagerström Test for Cigarette Dependence (FTCD) score as a measure of nicotine dependence) on genome-wide association studies of nicotine dependence.
Genome-wide association study
A total of 3,365 subjects who had smoked at least one cigarette were selected from the Study of Addiction: Genetics and Environment (SAGE). Of the participants, 2,267 were European Americans and 999 were African Americans.
Nicotine dependence, defined by an FTCD score ≥4, and CPD.
The genetic locus most strongly associated with nicotine dependence was rs1451240 on chromosome 8 in the region of CHRNB3 (OR=0.65, p=2.4×10−8). This association was further strengthened in a meta-analysis with a previously published dataset (combined p=6.7×10−16, total n=4,200). When CPD was used as an alternate phenotype, the association no longer reached genome-wide significance (β=−0.08, p=0.0007).
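The abstract reports a combined p-value from a meta-analysis with a previously published dataset but does not say which method was used. One common choice for combining GWAS results is the sample-size-weighted Z method (Stouffer's method with weights √n), sketched here: each study's two-sided p-value and effect direction are turned into a signed z-score, the weighted z-scores are summed, and the total is rescaled by the root sum of squared weights. The function names and input layout are assumptions, and no numbers from the study are reproduced.

```python
import math
from statistics import NormalDist
from typing import List, Tuple


def z_from_p(p: float, sign: int) -> float:
    """Signed z-score for a two-sided p-value (0 < p <= 1); sign is +1 or -1."""
    # inv_cdf(p / 2) is the negative quantile, so negate it to get |z|;
    # working in the lower tail keeps precision for very small p.
    return sign * -NormalDist().inv_cdf(p / 2)


def stouffer_weighted(studies: List[Tuple[float, int, int]]) -> float:
    """Combine (two-sided p, effect sign, sample size) tuples with weights sqrt(n);
    return the combined two-sided p-value."""
    num = 0.0
    den = 0.0
    for p, sign, n in studies:
        w = math.sqrt(n)
        num += w * z_from_p(p, sign)
        den += w * w
    z = num / math.sqrt(den)
    return math.erfc(abs(z) / math.sqrt(2))
```

The effect direction matters: two studies with identical p-values but opposite signs cancel out, while concordant ones reinforce each other, which is how a meta-analysis can strengthen an association such as the CHRNB3 signal.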
Daily cigarette consumption and the Fagerström Test for Cigarette Dependence (FTCD) show different associations with polymorphisms in genetic loci.
The Electronic Medical Records and Genomics (eMERGE) Network is a National Human Genome Research Institute (NHGRI)-funded consortium engaged in developing methods and best practices for using the Electronic Medical Record (EMR) as a tool for genomic research. Now in its sixth year and second funding cycle, and comprising nine research groups and a coordinating center, the network has played a major role in validating the concept that clinical data derived from EMRs can be used successfully for genomic research. Current work is advancing knowledge in multiple disciplines at the intersection of genomics and healthcare informatics, particularly electronic phenotyping, genome-wide association studies, genomic medicine implementation and the ethical and regulatory issues associated with genomics research and returning results to study participants. Here we describe the evolution, accomplishments, opportunities and challenges of the network, from its inception as a five-group consortium focused on genotype-phenotype associations for genomic discovery to its current form as a nine-group consortium pivoting towards implementation of genomic medicine.
electronic medical records; personalized medicine; genome-wide association studies; genetics and genomics; collaborative research
Only one GWAS of low-density lipoprotein cholesterol (LDL-C) has been reported in African Americans. We performed a GWAS of LDL-C in African Americans using data extracted from electronic medical records (EMR) in the eMERGE network. African Americans were genotyped on the Illumina 1M chip. All LDL-C measurements, prescriptions, and diagnoses of concomitant disease were extracted from the EMR. We created two analytic datasets: one with median LDL-C calculated after excluding some lab values based on comorbidities and medications (n = 618), and another with median LDL-C calculated without any exclusions (n = 1249). The APOE variant rs7412 was strongly associated with LDL-C at genome-wide significance in both datasets (p < 5 × 10−8). In the dataset with exclusions, LDL-C decreased by 20.0 mg/dl per minor allele; the effect size was attenuated (12.3 mg/dl) in the dataset without any lab values excluded. Although other signals in APOE have been detected in previous GWAS, this large SNP effect has not been well captured in large GWAS because rs7412 was not included on many genotyping arrays. Using median LDL-C extracted from the EMR after exclusions for medications and comorbidities increased the percentage of trait variance explained by genetic variation.
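The two analytic datasets above differ only in which LDL-C values enter each patient's median. A minimal sketch of that per-patient step, assuming each measurement has already been annotated with hypothetical boolean flags derived from prescriptions and diagnoses (the flag names and the `LdlMeasurement` type are illustrative, not the study's actual exclusion rules):

```python
from dataclasses import dataclass
from statistics import median
from typing import List, Optional


@dataclass
class LdlMeasurement:
    value_mg_dl: float
    on_lipid_lowering_drug: bool  # hypothetical flag derived from prescription records
    excluding_comorbidity: bool   # hypothetical flag, e.g. a diagnosis known to affect lipids


def median_ldl(measurements: List[LdlMeasurement],
               apply_exclusions: bool = True) -> Optional[float]:
    """Median LDL-C for one patient, optionally dropping flagged values.

    Returns None when no values remain, in which case the patient would be
    left out of the analytic dataset.
    """
    values = [m.value_mg_dl for m in measurements
              if not apply_exclusions
              or not (m.on_lipid_lowering_drug or m.excluding_comorbidity)]
    return median(values) if values else None
```

Patients whose every value is excluded drop out entirely, which is consistent with the dataset with exclusions being smaller (n = 618 versus n = 1249).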
GWAS; LDL; electronic medical records
To identify novel genetic loci influencing interindividual variation in red blood cell (RBC) traits in African-Americans, we conducted a genome-wide association study (GWAS) in 2315 individuals, divided into discovery (n = 1904) and replication (n = 411) cohorts. The traits included hemoglobin concentration (HGB), hematocrit (HCT), RBC count, mean corpuscular volume (MCV), mean corpuscular hemoglobin (MCH), and mean corpuscular hemoglobin concentration (MCHC). Patients were participants in the electronic MEdical Records and GEnomics (eMERGE) network and underwent genotyping of ~1.2 million single-nucleotide polymorphisms on the Illumina Human1M-Duo array. Association analyses were performed adjusting for age, sex, site, and population stratification. Three loci previously associated with resistance to malaria—HBB (11p15.4), HBA1/HBA2 (16p13.3), and G6PD (Xq28)—were associated (P ≤ 1 × 10−6) with RBC traits in the discovery cohort. The loci replicated in the replication cohort (P ≤ 0.02), and were significant at a genome-wide significance level (P < 5 × 10−8) in the combined cohort. The proportions of variance in RBC traits explained by significant variants at these loci were as follows: rs7120391 (near HBB) 1.3% of MCHC, rs9924561 (near HBA1/A2) 5.5% of MCV, 6.9% of MCH and 2.9% of MCHC, and rs1050828 (in G6PD) 2.4% of RBC count, 2.9% of MCV, and 1.4% of MCH, respectively. We were not able to replicate loci identified by a previous GWAS of RBC traits in a European ancestry cohort of similar sample size, suggesting that the genetic architecture of RBC traits differs by race. In conclusion, genetic variants that confer resistance to malaria are associated with RBC traits in African-Americans.
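The abstract reports the proportion of variance in each RBC trait explained by a single variant. For a biallelic SNP with minor allele frequency p and additive per-allele effect β, a standard approximation (assuming Hardy–Weinberg equilibrium) is 2p(1 − p)β² / Var(Y). The paper does not state that it used this formula; the sketch and its example inputs are illustrative only.

```python
def variance_explained(maf: float, beta: float, trait_variance: float) -> float:
    """Fraction of trait variance explained by one SNP under an additive model
    and Hardy-Weinberg equilibrium: 2 * p * (1 - p) * beta**2 / Var(Y)."""
    if not 0.0 <= maf <= 1.0 or trait_variance <= 0:
        raise ValueError("maf must be in [0, 1] and trait_variance must be positive")
    return 2 * maf * (1 - maf) * beta ** 2 / trait_variance


# Made-up inputs: MAF 0.2, effect -2.0 fL per allele on MCV, trait variance 36 fL^2.
print(f"{100 * variance_explained(0.2, -2.0, 36.0):.1f}%")  # about 3.6% with these inputs
```

The 2p(1 − p) term is the variance of the allele count, so a variant with a modest effect can still explain a few percent of variance when it is common.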
red blood cell (RBC) traits; genome-wide association study; African-Americans; natural selection; informatics; electronic medical record
Data in Electronic Medical Records (EMRs) are a potential source of longitudinal clinical data for research. The Electronic Medical Records and Genomics (eMERGE) Network investigates whether data captured through routine clinical care using EMRs can identify disease phenotypes with sufficient positive and negative predictive values for use in genome-wide association studies (GWAS). Using data from five different sets of EMRs, we identified five disease phenotypes with positive predictive values of 73–98% and negative predictive values of 98–100%. A majority of EMRs captured the key information used to define phenotypes (diagnoses, medications, laboratory tests) in a structured format. We identified natural language processing as an important tool for improving case identification rates. Efforts and incentives to increase the implementation of interoperable EMRs will markedly improve the availability of clinical data for genomics research.
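The predictive values quoted above come from comparing algorithm output with manual chart review. Positive predictive value is TP / (TP + FP), the share of algorithm-identified cases confirmed on review. Negative predictive value is TN / (TN + FN), the share of algorithm-identified controls confirmed on review. A minimal sketch follows; the counts in the example are hypothetical.

```python
from typing import Tuple


def predictive_values(tp: int, fp: int, tn: int, fn: int) -> Tuple[float, float]:
    """Return (PPV, NPV) from chart-review counts; NaN when a denominator is zero."""
    ppv = tp / (tp + fp) if tp + fp else float("nan")
    npv = tn / (tn + fn) if tn + fn else float("nan")
    return ppv, npv


# Hypothetical review: 100 algorithm cases (90 confirmed), 100 controls (98 confirmed).
print(predictive_values(tp=90, fp=10, tn=98, fn=2))
```

Sensitivity and specificity would also need the cases and controls the algorithm missed, which a review of algorithm output alone does not provide.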
Genome-wide association studies (GWAS) are a useful approach for studying the genetic components of complex phenotypes. Aside from large cohorts, GWAS have generally been limited to the study of one or a few diseases or traits. The emergence of biobanks linked to electronic medical records (EMRs) allows the efficient re-use of genetic data to yield meaningful genotype-phenotype associations for multiple phenotypes or traits. Phase I of the electronic MEdical Records and GEnomics (eMERGE-I) Network is a National Human Genome Research Institute (NHGRI)-supported consortium of five sites that performs genetic association studies using DNA repositories and EMR systems. The eMERGE sites have developed EMR-based algorithms for a core set of fourteen phenotypes, which are used to extract study samples from each site’s DNA repository. Each site selected samples for a specific phenotype, and these samples were genotyped at either the Broad Institute or the Center for Inherited Disease Research (CIDR) using Illumina Infinium BeadChip technology. In all, approximately 17,000 samples from the five sites were genotyped. The eMERGE Genomics Working Group developed a unified quality control (QC) pipeline to ensure thorough cleaning of the data; it examines sample quality, marker quality, and batch effects. After genotyping and QC analyses were completed for each site’s primary study, the eMERGE Coordinating Center merged the datasets from all five sites, and the merged dataset re-entered the established eMERGE QC pipeline. Based on lessons learned during this process, additional analyses and QC checkpoints were added to the pipeline to ensure proper merging. Here we explore the challenges of combining datasets from different genotyping centers and describe the expansion of the eMERGE QC pipeline for merged datasets.
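Sample and marker call rates are among the basic checks in a QC pipeline like the one described above. The sketch below filters markers first and then samples, recomputing sample call rates on the retained markers only. The 0.98 thresholds, the data layout (a list of samples, each a list of genotype calls with `None` for a no-call) and the function names are assumptions for illustration, not the eMERGE pipeline's actual settings.

```python
from typing import List, Optional, Tuple

Genotype = Optional[int]  # 0, 1 or 2 copies of the alternate allele; None = no call


def call_rate(calls: List[Genotype]) -> float:
    """Fraction of non-missing calls; 0.0 for an empty list."""
    return sum(g is not None for g in calls) / len(calls) if calls else 0.0


def qc_filter(matrix: List[List[Genotype]],
              marker_min: float = 0.98,
              sample_min: float = 0.98) -> Tuple[List[int], List[int]]:
    """matrix[i][j] is sample i at marker j. Returns (kept sample indices,
    kept marker indices) after a marker call-rate filter followed by a
    sample call-rate filter computed on the retained markers."""
    n_markers = len(matrix[0]) if matrix else 0
    kept_markers = [j for j in range(n_markers)
                    if call_rate([row[j] for row in matrix]) >= marker_min]
    kept_samples = [i for i, row in enumerate(matrix)
                    if call_rate([row[j] for j in kept_markers]) >= sample_min]
    return kept_samples, kept_markers
```

Filtering markers first keeps a few uniformly failed assays from dragging every sample's call rate down. The order is a design choice, and pipelines often iterate both filters.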
These additional steps will be useful as the eMERGE project expands to include additional sites in eMERGE-II and also serve as a starting point for investigators merging multiple genotype data sets accessible through the National Center for Biotechnology Information (NCBI) in the database of Genotypes and Phenotypes (dbGaP). Our experience demonstrates that merging multiple datasets after additional QC can be an efficient use of genotype data despite new challenges that appear in the process.
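One recurring problem when merging genotype data from different centers or dbGaP studies is that the same SNP may be reported on opposite DNA strands or with its alleles in a different order. A minimal sketch of an allele-harmonization check compares a SNP's allele pair in a second dataset with a reference dataset and classifies the relationship. A/T and C/G SNPs are flagged as ambiguous because their strand cannot be resolved from the alleles alone. The category names and the function are illustrative; this is not the eMERGE pipeline's actual code.

```python
from typing import Tuple

COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}


def harmonize(alleles_ref: Tuple[str, str], alleles_new: Tuple[str, str]) -> str:
    """Classify how a SNP's (allele1, allele2) in a new dataset relates to the reference.

    Returns 'match', 'swap' (alleles reversed), 'flip' (opposite strand),
    'flip_swap' (opposite strand and reversed), 'ambiguous' (A/T or C/G SNP)
    or 'mismatch' (cannot be reconciled, e.g. a different SNP or an indel).
    """
    r1, r2 = alleles_ref
    n1, n2 = alleles_new
    if {r1, r2} in ({"A", "T"}, {"C", "G"}):
        return "ambiguous"  # palindromic: strand cannot be inferred from alleles
    if (n1, n2) == (r1, r2):
        return "match"
    if (n1, n2) == (r2, r1):
        return "swap"
    f1, f2 = COMPLEMENT.get(n1), COMPLEMENT.get(n2)
    if (f1, f2) == (r1, r2):
        return "flip"
    if (f1, f2) == (r2, r1):
        return "flip_swap"
    return "mismatch"
```

Ambiguous SNPs are usually resolved with allele frequencies or simply dropped before merging. SNPs flagged as a mismatch are excluded.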
quality control; genome-wide association (GWAS); eMERGE; dbGaP; merging datasets
Genetic imputation has become standard practice in modern genetic studies. However, several important issues have not been adequately addressed, including the utility of a study-specific reference, performance in admixed populations, and quality for less common (minor allele frequency [MAF] 0.005–0.05) and rare (MAF < 0.005) variants. These issues have only recently become addressable through genome-wide association study (GWAS) follow-up studies that use dense genotyping or sequencing in large samples of non-European individuals. In this work, we constructed a study-specific reference panel of 3,924 haplotypes using African Americans in the Women’s Health Initiative (WHI) genotyped on both the Metabochip and the Affymetrix 6.0 GWAS platform. We used this reference panel to impute into 6,459 WHI SNP Health Association Resource (SHARe) study subjects with only GWAS genotypes. Our analysis confirmed the imputation quality metric Rsq (estimated r2, specific to each SNP) as an effective post-imputation filter. We recommend different Rsq thresholds for different MAF categories such that the average (across SNPs) Rsq is above the desired dosage r2 (squared Pearson correlation between imputed and experimental genotypes). With a desired dosage r2 of 80%, 99.9% (97.5%, 83.6%, 52.0%, 20.5%) of SNPs with MAF > 0.05 (0.03–0.05, 0.01–0.03, 0.005–0.01, and 0.001–0.005) passed the post-imputation filter. The average dosage r2 for these SNPs was 94.7%, 92.1%, 89.0%, 83.1%, and 79.7%, respectively. These results suggest that, for African Americans, imputation of Metabochip SNPs from GWAS data, including low-frequency SNPs with MAF 0.005–0.05, is feasible and worthwhile for increasing power in downstream association analysis, provided a sizable reference panel is available.
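The two quantities in this abstract can be made concrete. Dosage r2 is the squared Pearson correlation between imputed dosages and experimentally measured genotypes at a SNP. The post-imputation filter keeps a SNP only if its Rsq clears a cutoff that depends on its MAF bin. In the sketch below the bin boundaries follow the abstract, but the `RSQ_CUTOFFS` values are placeholders chosen for illustration, not the thresholds recommended by the study. The function names are also illustrative.

```python
from typing import Sequence


def dosage_r2(imputed: Sequence[float], observed: Sequence[float]) -> float:
    """Squared Pearson correlation between imputed dosages and observed genotypes."""
    n = len(imputed)
    mx = sum(imputed) / n
    my = sum(observed) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(imputed, observed))
    sxx = sum((x - mx) ** 2 for x in imputed)
    syy = sum((y - my) ** 2 for y in observed)
    if sxx == 0 or syy == 0:
        return float("nan")  # undefined for a monomorphic SNP
    return sxy ** 2 / (sxx * syy)


# (MAF lower bound, Rsq cutoff), checked from the most common bin down.
# PLACEHOLDER cutoffs for illustration only, not the study's recommendations.
RSQ_CUTOFFS = [(0.05, 0.5), (0.03, 0.6), (0.01, 0.7), (0.005, 0.8), (0.001, 0.9)]


def passes_rsq_filter(maf: float, rsq: float) -> bool:
    """Apply the cutoff of the first bin whose lower bound the MAF exceeds."""
    for lower, cutoff in RSQ_CUTOFFS:
        if maf > lower:
            return rsq >= cutoff
    return False  # MAF <= 0.001: outside the bins considered
```

Raising the cutoff as MAF falls reflects the pattern in the results: rarer SNPs are imputed less accurately, so a stricter filter is needed to keep the bin's average dosage r2 near the target.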
genotype imputation; Metabochip; internal reference; African Americans; rare variants
Recommendations and guidance on how to handle the return of genetic results to patients have offered limited insight into how to approach incidental genetic findings in the context of clinical trials. This paper provides the Genomics and Randomized Trials Network (GARNET) recommendations on incidental genetic findings in the context of clinical trials, and discusses the ethical and practical issues considered in formulating our recommendations. There are arguments both for and against returning incidental genetic findings in clinical trials. For instance, reporting incidental findings in clinical trials may improve the investigator-participant relationship and participants' satisfaction, but it may also blur the line between clinical care and research. Whether and how to return incidental genetic findings, including the costs of doing so, should be considered when developing clinical trial protocols. Once decided, plans for sharing individual results from the aim(s) of the trial, as well as incidental findings, should be discussed explicitly in the consent form. Institutional Review Boards (IRBs) and other study-specific governing bodies should take part in deciding whether, when, and how to return incidental findings, including when these plans are being reconsidered.
Clonal mosaicism for large chromosomal anomalies (duplications, deletions and uniparental disomy) was detected using SNP microarray data from over 50,000 subjects recruited for genome-wide association studies. This detection method requires a relatively high frequency of cells (>5–10%) with the same abnormal karyotype (presumably of clonal origin) in the presence of normal cells. The frequency of detectable clonal mosaicism in peripheral blood is low (<0.5%) from birth until 50 years of age, after which it rises rapidly to 2–3% in the elderly. Many of the mosaic anomalies are characteristic of those found in hematological cancers and identify common deleted regions that pinpoint the locations of genes previously associated with hematological cancers. Although only 3% of subjects with detectable clonal mosaicism had any record of hematological cancer prior to DNA sampling, those without a prior diagnosis have an estimated 10-fold higher risk of a subsequent hematological cancer (95% confidence interval = 6–18).
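The requirement that more than 5–10% of cells carry the same anomaly can be understood from a simple mixture calculation on SNP-array B-allele frequencies (BAF). At a heterozygous SNP, normal cells give BAF 0.5. If a fraction c of cells carries an anomaly, the expected BAF shifts only slightly when c is small. With one allele lost (deletion), BAF becomes (1 − c)/(2 − c). With copy-neutral loss of heterozygosity (uniparental disomy), it becomes (1 − c)/2. With one allele gained (duplication), it becomes 1/(2 + c). This textbook sketch is not necessarily the detection algorithm used in the study. Which allele is gained or lost is chosen arbitrarily here; the mirror case gives 1 − BAF.

```python
def expected_het_baf(event: str, cell_fraction: float) -> float:
    """Expected B-allele frequency at a heterozygous SNP when a fraction c of
    cells carries the anomaly and the remaining cells are normal (A/B)."""
    c = cell_fraction
    if not 0.0 <= c <= 1.0:
        raise ValueError("cell_fraction must be in [0, 1]")
    if event == "deletion":     # affected cells lose B: A copies = 1, B copies = 1 - c
        return (1 - c) / (2 - c)
    if event == "cn_loh":       # affected cells are A/A: A = 1 + c, B = 1 - c, total 2
        return (1 - c) / 2
    if event == "duplication":  # affected cells are A/A/B: A = 1 + c, B = 1
        return 1 / (2 + c)
    raise ValueError(f"unknown event type: {event}")


for event in ("deletion", "cn_loh", "duplication"):
    # At 10% of cells, every BAF deviation from 0.5 is small.
    print(event, round(expected_het_baf(event, 0.10), 4))
```

At c = 0.10 the expected shifts from 0.5 are only about 0.024 to 0.05, which illustrates why low-level mosaicism is hard to distinguish from array noise.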
To identify common genetic variants influencing red blood cell (RBC) traits.
Patients and Methods
We performed a genomewide association study from June 2008 through July 2011 of hemoglobin, hematocrit, RBC count, mean corpuscular volume, mean corpuscular hemoglobin, and mean corpuscular hemoglobin concentration in 12,486 patients of European ancestry from the electronic MEdical Records and Genomics (eMERGE) network. We developed an electronic medical record–based algorithm that included individuals who had RBC measurements obtained for clinical care and excluded values measured in the setting of hematopoietic disorders, comorbid conditions, or medications known to affect RBC production or a recent history of blood loss.
We identified 4 new genetic loci and replicated 11 loci previously reported to be associated with one or more RBC traits in individuals of European ancestry. Notably, genes present in 3 of the 4 newly identified loci (THRB, PTPLAD1, CDT1) and in 6 of the 11 replicated loci (KLF1, ALDH8A1, CCND3, SPTA1, FBXO7, TFR2/EPO) are implicated in erythroid differentiation and regulation of cell cycle in hematopoietic stem cells.
Genes in the erythroid differentiation and cell cycle regulation pathways influence interindividual variation in RBC indices. Our results provide insights into the molecular basis underlying variation in RBC traits.
eMERGE, electronic MEdical Records and GEnomics; EMMAX, mixed-model association-expedited; EMR, electronic medical record; eQTL, expression quantitative trait locus; GHC, Group Health Cooperative–University of Washington; GWAS, genomewide association study; HCT, hematocrit; HGB, hemoglobin; IBS, identity-by-state; LD, linkage disequilibrium; MC, Marshfield Clinic; MCH, mean corpuscular hemoglobin; MCHC, mean corpuscular hemoglobin concentration; MCV, mean corpuscular volume; MIM, Mendelian Inheritance in Man; NU, Northwestern University; RBC, red blood cell; SNP, single-nucleotide polymorphism; VUMC, Vanderbilt University Medical Center
Background and Purpose
Does progression of MRI-defined vascular disease predict subsequent vascular events in the elderly?
The Cardiovascular Health Study, a longitudinal cohort study of vascular disease in the elderly, allows the question to be answered because its participants had two MRI scans about five years apart and have been followed for about 9 years since the follow-up scan for incident vascular events.
Both MRI-defined incident infarcts and worsened white matter grade (WMG) were significantly associated with heart failure (HF), stroke and death but not with transient ischemic attacks, angina, or myocardial infarction. The strongest associations occurred when both incident infarcts and worsened WMG were present: for HF, hazard ratio 1.79 (95% confidence interval 1.18–2.73); for stroke, 2.58 (1.53–4.36); for death, 1.69 (1.28–2.24); and for cardiovascular death, 1.97 (1.24–3.14).
Progression of MRI-defined vascular disease identifies elderly people at increased risk of subsequent HF, stroke, and death. Whether aggressive risk factor management would reduce risk is unknown.
MRI; brain infarction; leukoaraiosis; stroke; death