Environment-wide association studies (EWAS) provide a way to uncover the environmental mechanisms involved in complex traits in a high-throughput manner. Genome-wide association studies have led to the discovery of genetic variants associated with many common diseases but do not take into account the environmental component of complex phenotypes. This EWAS assesses the comprehensive association between environmental variables and the outcome of type 2 diabetes (T2D) in the Marshfield Personalized Medicine Research Project Biobank (Marshfield PMRP). We sought replication in two National Health and Nutrition Examination Surveys (NHANES). The Marshfield PMRP currently uses four tools for measuring environmental exposures and outcome traits: 1) the PhenX Toolkit includes standardized exposure and phenotypic measures across several domains, 2) the Diet History Questionnaire (DHQ) is a food frequency questionnaire, 3) the Measurement of a Person's Habitual Physical Activity scores the level of an individual's physical activity, and 4) electronic health records (EHR) are used with validated algorithms to establish T2D case-control status. Using PLATO software, 314 environmental variables were tested for association with T2D using logistic regression, adjusting for sex, age, and BMI in over 2,200 European Americans. When available, similar variables were tested with the same methods and adjustment in samples from NHANES III and NHANES 1999-2002. Twelve and 31 associations were identified in the Marshfield samples at p<0.01 and p<0.05, respectively. Seven and 13 measures replicated in at least one of the NHANES at p<0.01 and p<0.05, respectively, with the same direction of effect. The most significant environmental exposures associated with T2D status included decreased alcohol use as well as increased smoking exposure in childhood and adulthood.
The results demonstrate the utility of the EWAS method and survey tools for identifying environmental components of complex diseases like type 2 diabetes. These high-throughput and comprehensive investigation methods can easily be applied to investigate the relation between environmental exposures and multiple phenotypes in future analyses.
Platelets are enucleated cell fragments derived from megakaryocytes that play key roles in hemostasis and in the pathogenesis of atherothrombosis and cancer. Platelet traits are highly heritable and identification of genetic variants associated with platelet traits and assessing their pleiotropic effects may help to understand the role of underlying biological pathways. We conducted an electronic medical record (EMR)-based study to identify common variants that influence inter-individual variation in the number of circulating platelets (PLT) and mean platelet volume (MPV), by performing a genome-wide association study (GWAS). We characterized association of variants influencing MPV and PLT using functional, pathway and disease enrichment analyses and assessed pleiotropic effects of such variants by performing a phenome-wide association study (PheWAS) with a wide range of EMR-derived phenotypes. A total of 13,582 participants in the electronic MEdical Records and GEnomic (eMERGE) network had data for PLT and 6,291 participants had data for MPV. We identified 5 chromosomal regions associated with PLT and 8 associated with MPV at genome-wide significance (P<5E-8). In addition, we replicated 20 SNPs (out of 56 SNPs (α: 0.05/56=9E-4)) influencing PLT and 22 SNPs (out of 29 SNPs (α: 0.05/29=2E-3)) influencing MPV in a meta-analysis of GWAS of PLT and MPV. While our GWAS did not reveal any novel associations, our functional analyses revealed that genes in these regions influence thrombopoiesis and encode kinases, membrane proteins, proteins involved in cellular trafficking, transcription factors, proteasome complex subunits, proteins of signal transduction pathways, proteins involved in megakaryocyte development and platelet production and hemostasis. PheWAS using a single-SNP Bonferroni correction for 1368 diagnoses (0.05/1368=3.6E-5) revealed that several variants in these genes have pleiotropic associations with myocardial infarction, autoimmune and hematologic disorders.
We conclude that multiple genetic loci influence interindividual variation in platelet traits and also have significant pleiotropic effects; the related genes are in multiple functional pathways including those relevant to thrombopoiesis.
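The Bonferroni thresholds quoted in this abstract come from dividing the family-wise α by the number of tests. The following Python sketch shows that arithmetic; the function name is illustrative and is not taken from the study.

```python
def bonferroni_threshold(alpha: float, n_tests: int) -> float:
    """Per-test significance threshold under a Bonferroni correction."""
    if n_tests <= 0:
        raise ValueError("n_tests must be positive")
    return alpha / n_tests


# Test counts are the ones stated in the abstract; alpha = 0.05 throughout.
for label, n_tests in [
    ("PLT replication SNPs", 56),
    ("MPV replication SNPs", 29),
    ("PheWAS diagnoses", 1368),
]:
    print(f"{label}: {bonferroni_threshold(0.05, n_tests):.2e}")
```

The computed values are close to, but not identical with, the rounded figures in the text (9E-4, 2E-3 and 3.6E-5), so the abstract's numbers are best read as approximations.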
To examine Lynch Syndrome (LS) screening of metastatic colorectal cancer (mCRC) patients in integrated healthcare delivery organizations.
We determined the availability of LS screening criteria and actual LS screening in the medical records among 1,188 patients diagnosed with mCRC between 2004–2009 at seven institutions in the Cancer Research Network (CRN).
We found infrequent use of LS screening (41/1188). Family history was available for 937 of the 1188 patients (79%). There was sufficient information to assess LS risk using family history based criteria in 719 of the 937 patients (77%) with family history documentation. In 391 individuals with a family history of a LS-associated cancer, 107 (27%) could not be evaluated due to missing information such as age of cancer onset. Eleven percent of patients who met Bethesda criteria and 25% of individuals who met the Amsterdam II criteria were screened for LS. When screening occurred, it followed recommended guidelines, but no testing method was preferred.
The information required for LS screening decisions is routinely collected but seldom utilized. There is a critical gap between collection of family history and its use to guide LS screening, which may support a case for implementation of universal screening guidelines.
Lynch Syndrome; genetic testing; metastatic colorectal cancer; family history; hereditary cancer screening
Thyroid stimulating hormone (TSH) levels are normally tightly regulated within an individual; thus, relatively small variations may indicate thyroid disease. Genome-wide association studies (GWAS) have identified variants in PDE8B and FOXE1 that are associated with TSH levels. However, prior studies lacked racial/ethnic diversity, limiting the generalization of these findings to individuals of non-European ethnicities. The Electronic Medical Records and Genomics (eMERGE) Network is a collaboration across institutions with biobanks linked to electronic medical records (EMRs). The eMERGE Network uses EMR-derived phenotypes to perform GWAS in diverse populations for a variety of phenotypes. In this report, we identified serum TSH levels from 4,501 European American and 351 African American euthyroid individuals in the eMERGE Network with existing GWAS data. Tests of association were performed using linear regression and adjusted for age, sex, body mass index (BMI), and principal components, assuming an additive genetic model. Our results replicate the known association of PDE8B with serum TSH levels in European Americans (rs2046045 p = 1.85×10−17, β = 0.09). FOXE1 variants, associated with hypothyroidism, were not genome-wide significant (rs10759944: p = 1.08×10−6, β = −0.05). No SNPs reached genome-wide significance in African Americans. However, multiple known associations with TSH levels in European ancestry were nominally significant in African Americans, including PDE8B (rs2046045 p = 0.03, β = −0.09), VEGFA (rs11755845 p = 0.01, β = −0.13), and NFIA (rs334699 p = 1.50×10−3, β = −0.17). We found little evidence that SNPs previously associated with other thyroid-related disorders were associated with serum TSH levels in this study. These results support the previously reported association between PDE8B and serum TSH levels in European Americans and emphasize the need for additional genetic studies in more diverse populations.
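Under an additive genetic model, each genotype is coded as the number of copies of the tested allele (0, 1 or 2), and β is the change in the trait per additional copy. The Python sketch below estimates that slope by ordinary least squares. It is a simplified illustration only: it omits the adjustment for age, sex, BMI and principal components described in the abstract, and the function name and toy data are hypothetical.

```python
from statistics import mean
from typing import Sequence


def additive_beta(dosage: Sequence[int], trait: Sequence[float]) -> float:
    """OLS slope of a quantitative trait on allele dosage (0, 1 or 2 copies)."""
    if len(dosage) != len(trait):
        raise ValueError("dosage and trait must have the same length")
    d_bar, t_bar = mean(dosage), mean(trait)
    sxx = sum((d - d_bar) ** 2 for d in dosage)
    if sxx == 0:
        raise ValueError("dosage has no variation; slope is undefined")
    sxy = sum((d - d_bar) * (t - t_bar) for d, t in zip(dosage, trait))
    return sxy / sxx


# Hypothetical toy data, not values from the study.
toy_dosage = [0, 0, 1, 1, 2, 2]
toy_trait = [1.0, 1.2, 1.1, 1.3, 1.2, 1.4]
print(additive_beta(toy_dosage, toy_trait))
```

A positive slope means the trait rises with each additional copy of the allele; a negative slope, as reported for several SNPs in African Americans, means it falls.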
Electronic health records (EHRs) are an emerging relational domain with large potential to improve clinical outcomes. We apply two statistical relational learning (SRL) algorithms to the task of predicting primary myocardial infarction. We show that one SRL algorithm, relational functional gradient boosting, outperforms propositional learners particularly in the medically-relevant high recall region. We observe that both SRL algorithms predict outcomes better than their propositional analogs and suggest how our methods can augment current epidemiological practices.
Large-scale multiple testing tasks often exhibit dependence, and leveraging the dependence between individual tests remains a challenging and important problem in statistics. With recent advances in graphical models, it is feasible to use them to perform multiple testing under dependence. We propose a multiple testing procedure which is based on a Markov-random-field-coupled mixture model. The ground truth of hypotheses is represented by a latent binary Markov random field, and the observed test statistics appear as the coupled mixture variables. The parameters in our model can be automatically learned by a novel EM algorithm. We use an MCMC algorithm to infer the posterior probability that each hypothesis is null (termed local index of significance), and the false discovery rate can be controlled accordingly. Simulations show that the numerical performance of multiple testing can be improved substantially by using our procedure. We apply the procedure to a real-world genome-wide association study on breast cancer, and we identify several SNPs with strong association evidence.
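The abstract does not spell out how the false discovery rate is controlled once the local index of significance (LIS) values are available. One commonly used step-up rule for posterior null probabilities is to sort them in ascending order and reject the largest leading set whose running mean stays at or below the target level. The Python sketch below assumes that rule; the function name and toy values are hypothetical, and the LIS values are taken as given rather than inferred by MCMC.

```python
from typing import List, Sequence


def lis_rejections(lis: Sequence[float], alpha: float) -> List[int]:
    """Indices of hypotheses rejected at nominal FDR level alpha.

    lis[i] is the posterior probability that hypothesis i is null.
    """
    order = sorted(range(len(lis)), key=lambda i: lis[i])
    running_sum = 0.0
    n_reject = 0
    for rank, idx in enumerate(order, start=1):
        running_sum += lis[idx]
        # The running mean of the sorted values estimates the FDR
        # of rejecting the `rank` most significant hypotheses.
        if running_sum / rank <= alpha:
            n_reject = rank
    return sorted(order[:n_reject])


# Hypothetical LIS values for five hypotheses.
toy_lis = [0.01, 0.90, 0.03, 0.40, 0.02]
print(lis_rejections(toy_lis, alpha=0.05))
```

Because the rule averages posterior null probabilities over the rejected set, a hypothesis with a moderately large LIS can still be rejected when it is accompanied by several very small ones.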
In the National Cancer Institute Cancer Genetic Markers of Susceptibility (CGEMS) genome-wide association study of breast cancer, a single nucleotide polymorphism (SNP) marker, rs999737, in the 14q24.1 interval, was associated with breast cancer risk. In order to fine map this region, we imputed a 3.93MB region flanking rs999737 for Stages 1 and 2 of the CGEMS study (5,692 cases, 5,576 controls) using the combined reference panels of the HapMap 3 and the 1000 Genomes Project. Single-marker association testing and variable-sized sliding-window haplotype analysis were performed, and for both analyses the initial tagging SNP rs999737 retained the strongest association with breast cancer risk. Investigation of contiguous regions did not reveal evidence for an additional independent signal. Therefore, we conclude that rs999737 is an optimal tag SNP for common variants in the 14q24.1 region and thus narrow the candidate variants that should be investigated in follow-up laboratory evaluation.
RAD51L1; breast cancer; genome-wide association study; fine-mapping; imputation
The objective of this study was to identify genetic variants associated with angiotensin-converting enzyme (ACE) inhibitor-associated angioedema.
Participants and methods
We carried out a genome-wide association study in 175 individuals with ACE inhibitor-associated angioedema and 489 ACE inhibitor-exposed controls from Nashville (Tennessee) and Marshfield (Wisconsin). We tested for replication in 19 cases and 57 controls who participated in Ongoing Telmisartan Alone and in Combination with Ramipril Global Endpoint Trial (ONTARGET).
There were no genome-wide significant associations of any single-nucleotide polymorphism (SNP) with angioedema. Sixteen SNPs in African Americans and 41 SNPs in European Americans were associated moderately with angioedema (P<10−4) and evaluated for association in ONTARGET. The T allele of rs500766 in PRKCQ was associated with a reduced risk, whereas the G allele of rs2724635 in ETV6 was associated with an increased risk of ACE inhibitor-associated angioedema in the Nashville/Marshfield sample and ONTARGET. In a candidate gene analysis, rs989692 in the gene encoding neprilysin (MME), an enzyme that degrades bradykinin and substance P, was significantly associated with angioedema in ONTARGET and Nashville/Marshfield African Americans.
Unlike other serious adverse drug effects, ACE inhibitor-associated angioedema is not associated with a variant with a large effect size. Variants in MME and genes involved in immune regulation may be associated with ACE inhibitor-associated angioedema.
adverse drug event; angioedema; angiotensin-converting enzyme; neprilysin
The incidence of angiotensin-converting enzyme (ACE) inhibitor-associated angioedema is increased in patients with seasonal allergies.
We tested the hypothesis that patients with ACE inhibitor-associated angioedema present during months when pollen counts are increased.
Cohort analysis examined the month of presentation of ACE inhibitor-associated angioedema and pollen counts in the ambulatory and hospital setting. Patients with ACE inhibitor-associated angioedema were ascertained through (1) an observational study of patients presenting to Vanderbilt University Medical Center, (2) patients presenting to the Marshfield Clinic and participating in the Marshfield Clinic Personalized Medicine Research Project, and (3) patients enrolled in The Ongoing Telmisartan Alone and in Combination with Ramipril Global Endpoint Trial (ONTARGET). Measurements include date of presentation of ACE inhibitor-associated angioedema, population exposure to ACE inhibitor by date, and local pollen counts by date.
At Vanderbilt, the rate of angioedema was significantly associated with tree pollen months (P = .01 from χ2 test). When separate analyses were conducted in patients with a history of seasonal allergies and patients without, the rate of ACE inhibitor-associated angioedema was increased during tree pollen months only in patients with a history of seasonal allergies (P = .002). In Marshfield, the rate of angioedema was significantly associated with ragweed pollen months (P = .025). In ONTARGET, a positive trend was observed between the ACE inhibitor-associated angioedema rate and grass season, although it was not statistically significant (P = .057).
Patients with ACE inhibitor-associated angioedema are more likely to present with this adverse drug event during months when pollen counts are increased.
Angiotensin-converting enzyme inhibitor; angioedema; pollen; bradykinin; substance P; seasonal allergies
Many informed consent studies demonstrate that research subjects poorly retain and understand information in written consent documents. Previous research in multimedia consent is mixed in terms of success for improving participants’ understanding, satisfaction, and retention. This failure may be due to a lack of a community-centered design approach to building the interventions. The goal of this study was to gather information from the community to determine the best way to undertake the consent process. Community perceptions regarding different computer-based consenting approaches were evaluated, and a computer-based consent was developed and tested. A second goal was to evaluate whether participants make truly informed decisions to participate in research. Simulations of an informed consent process were videotaped to document the process. Focus groups were conducted to determine community attitudes towards a computer-based informed consent process. Hybrid focus groups were conducted to determine the most acceptable hardware device. Usability testing was conducted on a computer-based consent prototype using a touch-screen kiosk. Based on feedback, a computer-based consent was developed. Representative study participants were able to easily complete the consent, and all were able to correctly answer the comprehension check questions. Community involvement in developing a computer-based consent proved valuable for a population-based genetic study. These findings may translate to other types of informed consents, such as genetic clinical trials consents. A computer-based consent may serve to better communicate consistent, clear, accurate, and complete information regarding the risks and benefits of study participation. Additional analysis is necessary to measure the level of comprehension of the check-question answers by larger numbers of participants. 
The next step will involve contacting participants to measure whether understanding of what they consented to is retained over time.
Decision making; focus groups; genetic research; computer-based informed consent; usability
Phenome-wide association studies (PheWAS) have demonstrated utility in validating genetic associations derived from traditional genetic studies as well as identifying novel genetic associations. Here we used an electronic health record (EHR)-based PheWAS to explore pleiotropy of genetic variants in the fat mass and obesity associated gene (FTO), some of which have been previously associated with obesity and type 2 diabetes (T2D). We used a population of 10,487 individuals of European ancestry with genome-wide genotyping from the Electronic Medical Records and Genomics (eMERGE) Network and another population of 13,711 individuals of European ancestry from the BioVU DNA biobank at Vanderbilt genotyped using Illumina HumanExome BeadChip. A meta-analysis of the two study populations replicated the well-described associations between FTO variants and obesity (odds ratio [OR] = 1.25, 95% Confidence Interval = 1.11–1.24, p = 2.10 × 10−9) and FTO variants and T2D (OR = 1.14, 95% CI = 1.08–1.21, p = 2.34 × 10−6). The meta-analysis also demonstrated that FTO variant rs8050136 was significantly associated with sleep apnea (OR = 1.14, 95% CI = 1.07–1.22, p = 3.33 × 10−5); however, the association was attenuated after adjustment for body mass index (BMI). Novel phenotype associations with obesity-associated FTO variants included fibrocystic breast disease (rs9941349, OR = 0.81, 95% CI = 0.74–0.91, p = 5.41 × 10−5) and trends toward associations with non-alcoholic liver disease and gram-positive bacterial infections. FTO variants not associated with obesity demonstrated other potential disease associations including non-inflammatory disorders of the cervix and chronic periodontitis. These results suggest that genetic variants in FTO may have pleiotropic associations, some of which are not mediated by obesity.
PheWAS; genetic association; pleiotropy; Exome chip; FTO; BMI
Emerging biomarkers for acute myocardial infarction (AMI) may enhance conventional risk prediction algorithms if they are informative and associated with risk independently of established predictors. In this study we constructed a cohort for testing emerging biomarkers for AMI in managed care populations using existing biospecimen repositories linked to EHR.
EHR-based biorepositories collected by healthcare systems can be federated to provide large, methodologically-sound testing sets for biomarker validation.
Subjects aged 40 to 80 were selected from two existing population-based biospecimen repositories. Incident AMI status and covariates were ascertained from EHR. An ad-hoc model for AMI risk was parameterized and validated. Simulation was used to test incremental gains in performance due to the inclusion of biomarkers in this model. Gains in performance were assessed in terms of area under the ROC curve and case reclassification.
A total of 18,329 individuals (57% female) contributed 108,400 person-years of EHR follow-up. The crude AMI incidence was 10.8 and 5.0 per 1,000 person-years among males and females, respectively. Compared to the model with risk factors alone, inclusion of a simulated biomarker yielded substantial gains in sensitivity without loss of specificity. Furthermore, a net ROC-AUC gain of 13.3% was observed as well as correct reclassification of 9.8% of incident cases (79 of 806) that were otherwise not considered statin-indicated at baseline under ATPIII criteria.
More research is needed to assess incremental contribution of emerging biomarkers for AMI prediction in managed care populations.
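The incidence and reclassification figures in the results are simple ratios. The Python sketch below shows both calculations. The function names are illustrative, and the inputs to the rate example are hypothetical because the abstract does not give sex-specific event counts or person-years.

```python
def rate_per_1000_person_years(events: int, person_years: float) -> float:
    """Crude incidence rate expressed per 1,000 person-years."""
    if person_years <= 0:
        raise ValueError("person_years must be positive")
    return 1000.0 * events / person_years


def percent(part: int, whole: int) -> float:
    """Share of `whole` represented by `part`, in percent."""
    if whole <= 0:
        raise ValueError("whole must be positive")
    return 100.0 * part / whole


# From the abstract: 79 of 806 incident cases were correctly reclassified.
print(f"{percent(79, 806):.1f}%")

# Hypothetical inputs: 50 events observed over 10,000 person-years.
print(rate_per_1000_person_years(50, 10_000))
```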
Human epidermal growth factor receptor 2 (HER2) expression is amplified in about 20% of breast cancer tumors, and evaluation of HER2 status should influence therapy selection. A critical gap in our knowledge is the real-world implementation of HER2 testing and its impact on treatment decisions for women diagnosed with breast cancer.
Study aims were to assess use of HER2 testing, to describe characteristics of patients who do or do not receive HER2 testing, to describe which HER2 tests were used (fluorescence in situ hybridization [FISH] or immunohistochemistry [IHC]), and to evaluate trastuzumab use as a function of HER2 results.
The population included 6,460 women diagnosed with invasive breast cancer between 1999 and 2007 at eight geographically distributed Cancer Research Network health care delivery systems in the United States.
Electronic records were used to identify patient and tumor characteristics and treatment with trastuzumab, and chart abstraction was performed for 400 women (50 per site) to identify receipt of HER2 testing and results.
Over 90% of study participants received HER2 testing. Everyone who received trastuzumab had a HER2 test, and nearly all (>95%) who received trastuzumab had a positive HER2 test result recorded in their medical chart. Most (77%) eligible cases with a positive HER2 test result diagnosed after 2005 received trastuzumab. This study expands upon previous work in individual health plans.
HER2 status has been successfully incorporated into medical practice to guide treatment decisions for breast cancer patients in diverse integrated health care delivery settings.
herceptin; trastuzumab; pharmacogenomics; her2neu; breast cancer
The electronic MEdical Records & GEnomics (eMERGE) network was established in 2007 by the National Human Genome Research Institute (NHGRI) of the National Institutes of Health (NIH) in part to explore the utility of electronic medical records (EMRs) in genome science. The initial focus was on discovery primarily using the genome-wide association paradigm, but more recently, the network has begun evaluating mechanisms to implement new genomic information coupled to clinical decision support into EMRs. Herein, we describe this evolution including the development of the individual and merged eMERGE genomic datasets, the contribution the network has made toward genomic discovery and human health, and the steps taken toward the next generation genotype-phenotype association studies and clinical implementation.
biobanks; genome-wide association studies; pharmacogenomics; electronic medical records
Experimental and epidemiologic studies suggest that vitamin D metabolites (1,25-dihydroxyvitamin D [1,25(OH)2D] and its precursor 25-hydroxyvitamin D [25(OH)D]) may reduce breast cancer risk. We examined subsequent breast cancer risk related to serum levels of these metabolites. In a cohort of women ages 55 to 74 years, who donated blood at baseline (1993–2001) in the Prostate, Lung, Colorectal, and Ovarian Cancer Screening Trial, we identified 1,005 incident breast cancer cases during follow-up through 2005 (mean time between blood draw and diagnosis, 3.9 years). Noncases (n = 1,005) were frequency matched to the cases based on age and year of entry. Sample weights that accounted for unequal probabilities of selecting cases and noncases were applied to make inferences that reflected the entire Prostate, Lung, Colorectal, and Ovarian cohort. Using Cox proportional hazards modeling, we computed breast cancer relative risks (RR) and 95% confidence intervals (95% CI) by quintile for each metabolite. The RR of breast cancer for the highest quintile of 25(OH)D concentration versus the lowest was 1.04 (95% CI, 0.75–1.45; Ptrend = 0.81). Similarly, the breast cancer RR for the highest quintile of 1,25(OH)2D compared with the lowest was 1.23 (95% CI, 0.91–1.68; Ptrend = 0.14). Excluding the first 2 years of follow-up did not materially alter these estimates. There was also no evidence of inverse risk in older women (≥60 years) versus younger women (<60 years). In this prospective study of postmenopausal women, we did not observe an inverse association between circulating 25(OH)D or 1,25(OH)2D and breast cancer risk, although we cannot exclude an association in younger women or with long-term or earlier exposure.
With white blood cell count emerging as an important risk factor for chronic inflammatory diseases, genetic associations of differential leukocyte types, specifically monocyte count, are providing novel candidate genes and pathways to further investigate. Circulating monocytes play a critical role in vascular diseases such as in the formation of atherosclerotic plaque. We performed joint and ancestry-stratified genome-wide association analyses to identify variants specifically associated with monocyte count in 11 014 subjects in the electronic Medical Records and Genomics Network. In the joint and European ancestry samples, we identified novel associations in the chromosome 16 interferon regulatory factor 8 (IRF8) gene (P-value = 2.78×10(−16), β = −0.22). Other monocyte associations include novel missense variants in the chemokine-binding protein 2 (CCBP2) gene (P-value = 1.88×10(−7), β = 0.30) and a region of replication found in ribophorin I (RPN1) (P-value = 2.63×10(−16), β = −0.23) on chromosome 3. The CCBP2 and RPN1 region is located near the GATA binding protein 2 gene, which has been previously shown to be associated with coronary heart disease. On chromosome 9, we found a novel association in the prostaglandin reductase 1 gene (P-value = 2.29×10(−7), β = 0.16), which is downstream from lysophosphatidic acid receptor 1. This region has previously been shown to be associated with monocyte count. We also replicated monocyte associations of genome-wide significance (P-value = 5.68×10(−17), β = −0.23) at the integrin, alpha 4 gene on chromosome 2. The novel IRF8 results and further replications provide supporting evidence of genetic regions associated with monocyte count.
We use genotype data from the Marshfield Clinical Research Foundation Personalized Medicine Research Project to investigate genetic similarity and divergence between Europeans and the sampled population of European Americans in Central Wisconsin, USA. To infer recent genetic ancestry of the sampled Wisconsinites, we train support vector machines (SVMs) on the positions of Europeans along top principal components (PCs). Our SVM models partition continent-wide European genetic variance into eight regional classes, which is an improvement over the geographically broader categories of recent ancestry reported by personal genomics companies. After correcting for misclassification error associated with the SVMs (<10%, in all cases), we observe a >14% discrepancy between insular ancestries reported by Wisconsinites and those inferred by SVM. Values of FST as well as Mantel tests for correlation between genetic and European geographic distances indicate minimal divergence between Europe and the local Wisconsin population. However, we find that individuals from the Wisconsin sample show greater dispersion along higher-order PCs than individuals from Europe. Hypothesizing that this pattern is characteristic of nascent divergence, we run computer simulations that mimic the recent peopling of Wisconsin. Simulations corroborate the pattern in higher-order PCs, demonstrate its transient nature, and show that admixture accelerates the rate of divergence between the admixed population and its parental sources relative to drift alone. Together, empirical and simulation results suggest that genetic divergence between European source populations and European Americans in Central Wisconsin is subtle but already under way.
population structure; genetic ancestry; admixture; support vector machine; principal component analysis
Epidermal growth factor receptor (EGFR) inhibitors are approved for treating metastatic colorectal cancer (CRC); KRAS mutation testing is recommended prior to treatment. We conducted a non-inferiority analysis to examine whether KRAS testing has impacted survival in CRC patients.
Patients and Methods
We included 1186 metastatic CRC cases from seven health plans. A cutpoint of July, 2008, was used to define two KRAS testing time period groups: “pre-testing” (n = 760 cases) and “post-testing” (n = 426 cases). Overall survival (OS) was estimated, and the difference in median OS between the groups was calculated. The lower bound of the one-sided 95% confidence interval (CI) for the difference in survival was used to test the null hypothesis of post-testing inferiority. Multivariable Cox regression models were constructed to adjust for covariates.
The median unadjusted OS was 15.4 months (95% CI: 14.0–17.5) and 12.8 months (95% CI: 10.0–15.2) in the pre- and post-testing groups, respectively. The OS difference was −2.6 months with one-sided 95% lower confidence bound of −5.13 months, which was less than the non-inferiority margin (−5.0 months, unadjusted p = 0.06), leading to a failure to reject inferiority of OS in the post-testing period. In contrast, in the adjusted analysis, OS non-inferiority was identified in the post-testing period (p = 0.001). Sensitivity analyses using cutpoints before and after July, 2008, also met the criteria for non-inferiority.
Implementation of KRAS testing did not influence CRC OS. Our data support the use of KRAS testing to guide administration of EGFR inhibitors for treatment of metastatic CRC without diminished OS.
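The unadjusted non-inferiority decision in the results depends on a single comparison: whether the lower one-sided confidence bound of the survival difference (post-testing minus pre-testing) lies above the pre-specified margin. The Python sketch below shows that comparison using the values reported in the abstract. The function name is illustrative, and the sketch does not reproduce how the confidence bound itself was estimated.

```python
def is_non_inferior(lower_bound: float, margin: float) -> bool:
    """True when the lower confidence bound of (new - reference) exceeds the margin.

    Both arguments are on the scale of the outcome difference, here months of
    median overall survival; the margin is negative.
    """
    return lower_bound > margin


# Values from the abstract's unadjusted analysis.
lower_bound_months = -5.13  # one-sided 95% lower confidence bound
margin_months = -5.0        # non-inferiority margin

print(is_non_inferior(lower_bound_months, margin_months))
```

With these inputs the bound falls just below the margin, so non-inferiority cannot be claimed, which matches the abstract's failure to reject inferiority in the unadjusted analysis.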
Electrocardiographic QRS duration, a measure of cardiac intraventricular conduction, varies ~2-fold in individuals without cardiac disease. Slow conduction may promote reentrant arrhythmias.
Methods and Results
We performed a genome-wide association study (GWAS) to identify genomic markers of QRS duration in 5,272 individuals without cardiac disease selected from electronic medical record (EMR) algorithms at five sites in the Electronic Medical Records and Genomics (eMERGE) network. The most significant loci were evaluated within the CHARGE consortium QRS GWAS meta-analysis. Twenty-three single nucleotide polymorphisms in 5 loci, previously described by CHARGE, were replicated in the eMERGE samples; 18 SNPs were in the chromosome 3 SCN5A and SCN10A loci, where the most significant SNPs were rs1805126 in SCN5A with p=1.2×10−8 (eMERGE) and p=2.5×10−20 (CHARGE) and rs6795970 in SCN10A with p=6×10−6 (eMERGE) and p=5×10−27 (CHARGE). The other loci were in NFIA, near CDKN1A, and near C6orf204. We then performed phenome-wide association studies (PheWAS) on variants in these five loci in 13,859 European Americans to search for diagnoses associated with these markers. PheWAS identified atrial fibrillation and cardiac arrhythmias as the diagnoses most commonly associated with SCN10A and SCN5A variants. SCN10A variants were also associated with subsequent development of atrial fibrillation and arrhythmia in the original 5,272 “heart-healthy” study population.
We conclude that DNA biobanks coupled to EMRs provide a platform not only for GWAS but may also allow broad interrogation of the longitudinal incidence of disease associated with genetic variants. The PheWAS approach implicated sodium channel variants modulating QRS duration in subjects without cardiac disease as predictors of subsequent arrhythmias.
cardiac conduction; QRS duration; atrial fibrillation; genome-wide association study; phenome-wide association study; electronic medical records
Candidate gene and genome-wide association studies (GWAS) have identified genetic variants that modulate risk for human disease; many of these associations require further study to replicate the results. Here we report the first large-scale application of the phenome-wide association study (PheWAS) paradigm within electronic medical records (EMRs), an unbiased approach to replication and discovery that interrogates relationships between targeted genotypes and multiple phenotypes. We scanned for associations between 3,144 single-nucleotide polymorphisms (previously implicated by GWAS as mediators of human traits) and 1,358 EMR-derived phenotypes in 13,835 individuals of European ancestry. This PheWAS replicated 66% (51/77) of sufficiently powered prior GWAS associations and revealed 63 potentially pleiotropic associations with P < 4.6 × 10−6 (false discovery rate < 0.1); the strongest of these novel associations were replicated in an independent cohort (n = 7,406). These findings validate PheWAS as a tool to allow unbiased interrogation across multiple phenotypes in EMR-based cohorts and to enhance analysis of the genomic basis of human disease.
The electronic Medical Records and Genomics (eMERGE) (Phase I) network was established in 2007 to further genomic discovery using biorepositories linked to the electronic health record (EHR). In Phase II, which began in 2011, genomic discovery efforts continue and in addition the network is investigating best practices for implementing genomic medicine, in particular, the return of genomic results in the EHR for use by physicians at point-of-care. To develop strategies for addressing the challenges of implementing genomic medicine in the clinical setting, the eMERGE network is conducting studies that return clinically-relevant genomic results to research participants and their health care providers. These genomic medicine pilot studies include returning individual genetic variants associated with disease susceptibility or drug response, as well as genetic risk scores for common “complex” disorders. Additionally, as part of a network-wide pharmacogenomics-related project, targeted resequencing of 84 pharmacogenes is being performed and select genotypes of pharmacogenetic relevance are being placed in the EHR to guide individualized drug therapy. Individual sites within the eMERGE network are exploring mechanisms to address incidental findings generated by resequencing of the 84 pharmacogenes. In this paper, we describe studies being conducted within the eMERGE network to develop best practices for integrating genomic findings into the EHR, and the challenges associated with such work.
genomics; electronic health records; incidental findings; implementation; genetic counseling; next generation sequencing; pharmacogenetics
The purpose of this manuscript is to describe the PhenX RISING network and the site experiences in the implementation of PhenX measures into ongoing population-based genomic studies.
Eighty PhenX measures were implemented across the seven PhenX RISING groups, thirty-three of which were used at more than two sites, allowing for cross-site collaboration. Each site used between four and 37 individual measures, and five of the sites are validating the PhenX measures through comparison with other study measures. Self-administered and computer-based administration modes are being evaluated at several sites, which required changes to the original PhenX Toolkit protocols. A network-wide data use agreement was developed to facilitate data sharing and collaboration.
PhenX Toolkit measures have been collected for more than 17,000 participants across the PhenX RISING network. The process of implementation provided information that was used to improve the PhenX Toolkit. The Toolkit was revised to allow researchers to select self- or interviewer administration when creating the data collection worksheets, and ranges of specimens necessary to run biological assays have been added to the Toolkit.
The PhenX RISING network has demonstrated that the PhenX Toolkit measures can be implemented successfully in ongoing genomic studies. The next step will be to conduct gene/environment studies.
PhenX; Phenotype; Epidemiology; Risk factors; Harmonization
A retrospective chart review of cases with congenital vertebral malformations (CVM) and controls with normal spine morphology.
To determine the relative contribution of maternal environmental factors (MEF) during pregnancy including maternal insulin dependent diabetes mellitus, valproic acid, alcohol, smoking, hyperthermia, twin gestation, assisted reproductive technology, in-vitro fertilization and maternal clomiphene usage to CVM development.
Summary of Background Data
Congenital vertebral malformations (CVM) represent defects in formation and segmentation of somites occurring with an estimated incidence of 0.13–0.50 per 1000 live births. CVM may be associated with congenital scoliosis, Klippel-Feil syndrome, hemifacial microsomia and VACTERL syndromes, and represent significant morbidity due to pain and cosmetic disfigurement.
A multicenter retrospective chart review of 229 cases with CVM and 267 controls with normal spine morphology between the ages of 1–50 years was performed in order to obtain the odds ratio (OR) of MEF related to CVM among cases vs. controls. CVM due to an underlying syndrome associated with a known gene mutation or chromosome etiology were excluded. An imputation-based analysis was performed in which subjects with no documentation of MEF history were treated as having no maternal exposure. Univariate and multivariate analyses were conducted to calculate the OR.
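As a rough illustration of the univariate step, the sketch below computes an odds ratio with a Wald 95% confidence interval from a 2x2 table, after treating undocumented exposure as no exposure. The abstract does not state how the interval was computed, so the Wald method is an assumption; the function names and all counts are hypothetical and are not the study data.

```python
import math


def count_exposure(records):
    """Count exposed and unexposed subjects.

    records holds True (exposed), False (unexposed), or None (no documentation).
    None is treated as unexposed, mirroring the imputation rule described above.
    """
    exposed = sum(1 for r in records if r is True)
    return exposed, len(records) - exposed


def odds_ratio_wald(exp_cases, unexp_cases, exp_controls, unexp_controls, z=1.96):
    """Odds ratio and Wald confidence interval; all four cells must be > 0."""
    odds_ratio = (exp_cases * unexp_controls) / (unexp_cases * exp_controls)
    # Standard error of log(OR) from the four cell counts.
    se = math.sqrt(1 / exp_cases + 1 / unexp_cases
                   + 1 / exp_controls + 1 / unexp_controls)
    lower = math.exp(math.log(odds_ratio) - z * se)
    upper = math.exp(math.log(odds_ratio) + z * se)
    return odds_ratio, lower, upper


# Hypothetical chart-review records (illustration only, not the study data).
cases = [True] * 20 + [False] * 50 + [None] * 30
controls = [True] * 5 + [False] * 60 + [None] * 35

a, b = count_exposure(cases)       # exposed / unexposed cases
c, d = count_exposure(controls)    # exposed / unexposed controls
or_value, ci_low, ci_high = odds_ratio_wald(a, b, c, d)
print(f"OR = {or_value:.2f}, 95% CI {ci_low:.2f} to {ci_high:.2f}")
```

Treating missing documentation as unexposed is a design choice that affects cases and controls alike; if documentation is more complete for cases, the estimated OR can be biased upward.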
Of the 229 total cases, 104 cases had single or multiple CVM without additional congenital malformations (CM) (Group 1) and 125 cases had single or multiple CVM and additional CM (Group 2). Nineteen percent of total cases had an identified MEF. The OR (95% CI, P-value) for MEF history in Group 1 was 6.0 (2.4–15.1, P<0.001) in the univariate analysis. The OR (95% CI, P-value) for MEF history in Group 2 was 9.1 (3.8–21.6, P<0.001) in the univariate analysis. The results were confirmed in the multivariate analysis, after adjusting for age, gender, and institution.
These results support a hypothesis for an association between the above MEF during pregnancy and CVM and have implications for development of prevention strategies. Further prospective studies are needed to quantify association between CVM and specific MEF.
Efforts to define the genetic architecture underlying variable statin response have met with limited success, possibly because previous studies were limited to effects based on a single dose. We leveraged electronic medical records (EMRs) to extract potency (ED50) and efficacy (Emax) of statin dose-response curves and tested them for association with 144 pre-selected variants. Two large biobanks were used to construct dose-response curves for 2,026 (simvastatin) and 2,252 (atorvastatin) subjects. Atorvastatin was more efficacious, more potent, and demonstrated less inter-individual variability than simvastatin. A pharmacodynamic variant emerging from randomized trials (PRDM16) was associated with Emax for both. For atorvastatin, Emax was 51.7 mg/dl in those homozygous for the minor allele versus 75.0 mg/dl for those homozygous for the major allele. We also identified several loci associated with ED50. The extraction of rigorously defined traits from EMRs for pharmacogenetic studies represents a promising approach to further understanding of genetic factors contributing to drug response.
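To make the two traits concrete, the sketch below assumes the standard hyperbolic Emax model, in which the effect at a given dose is Emax × dose / (ED50 + dose). The abstract does not specify the curve form or fitting method used, so the model, the brute-force grid fit, the function names, and all numbers here are illustrative assumptions.

```python
def emax_response(dose, emax, ed50):
    """Predicted effect at a dose under a hyperbolic Emax model (assumed form)."""
    return emax * dose / (ed50 + dose)


def fit_emax(doses, effects, emax_grid, ed50_grid):
    """Pick the (Emax, ED50) pair on a grid that minimizes squared error.

    A deliberately simple stand-in for a proper nonlinear least-squares fit.
    """
    best = None
    for emax in emax_grid:
        for ed50 in ed50_grid:
            sse = sum((emax_response(d, emax, ed50) - e) ** 2
                      for d, e in zip(doses, effects))
            if best is None or sse < best[0]:
                best = (sse, emax, ed50)
    return best[1], best[2]


# Hypothetical doses (mg) and effects (mg/dl) for one subject, generated from
# emax=60, ed50=10 so the fit has a known answer (illustration only).
doses = [10, 20, 40, 80]
effects = [emax_response(d, 60, 10) for d in doses]

fitted_emax, fitted_ed50 = fit_emax(
    doses, effects,
    emax_grid=range(40, 81, 5),
    ed50_grid=[2, 5, 10, 20, 40],
)
print(fitted_emax, fitted_ed50)
```

Under this model, ED50 is the dose producing half of the maximal effect, so a lower ED50 corresponds to higher potency and a higher Emax to greater efficacy.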
Integrating genomic information into clinical care and the electronic health record can facilitate personalized medicine through genetically guided clinical decision support. Stakeholder involvement is critical to the success of these implementation efforts. Prior work on implementation of clinical information systems provides broad guidance to inform effective engagement strategies. We add to this guidance evidence-based recommendations that are specific to issues at the intersection of genomics and the electronic health record. We describe stakeholder engagement strategies employed by the Electronic Medical Records and Genomics Network, a national consortium of US research institutions funded by the National Human Genome Research Institute to develop, disseminate, and apply approaches that combine genomic and electronic health record data. Through select examples drawn from sites of the Electronic Medical Records and Genomics Network, we illustrate a continuum of engagement strategies to inform genomic integration into commercial and homegrown electronic health records across a range of health-care settings. We frame engagement as activities to consult, involve, and partner with key stakeholder groups throughout specific phases of health information technology implementation. Our aim is to provide insights into engagement strategies to guide genomic integration based on our unique network experiences and lessons learned within the broader context of implementation research in biomedical informatics. On the basis of our collective experience, we describe key stakeholder practices, challenges, and considerations for successful genomic integration to support personalized medicine.
electronic health records; genomics; health information technology; personalized medicine; stakeholder engagement; translational medical research