Vancomycin, a commonly used antibiotic, can be nephrotoxic. Known risk factors such as age, creatinine clearance, vancomycin dose / dosing interval, and concurrent nephrotoxic medications fail to accurately predict nephrotoxicity. To identify potential genomic risk factors, we performed a genome-wide association study (GWAS) of serum creatinine levels while on vancomycin in 489 European American individuals and validated findings in three independent cohorts totaling 439 European American individuals. In primary analyses, the chromosome 6q22.31 locus was associated with increased serum creatinine levels while on vancomycin therapy (most significant variant rs2789047, risk allele A, β = −0.06, p = 1.1 × 10−7). SNPs in this region had consistent directions of effect in the validation cohorts, with a meta-p of 1.1 × 10−7. Variation in this region on chromosome 6, which includes the genes TBC1D32/C6orf170 and GJA1 (encoding connexin43), may modulate risk of vancomycin-induced kidney injury.
The QT interval, an electrocardiographic measure reflecting myocardial repolarization, is a heritable trait. QT prolongation is a risk factor for ventricular arrhythmias and sudden cardiac death (SCD) and could indicate the presence of the potentially lethal Mendelian Long QT Syndrome (LQTS). Using a genome-wide association and replication study in up to 100,000 individuals, we identified 35 common variant QT interval loci that collectively explain ∼8-10% of QT variation and highlight the importance of calcium regulation in myocardial repolarization. Rare variant analysis of 6 novel QT loci in 298 unrelated LQTS probands identified coding variants not found in controls but of uncertain causality and therefore requiring validation. Several newly identified loci encode proteins that physically interact with other recognized repolarization proteins. Our integration of common variant association, expression, and orthogonal protein-protein interaction screens provides new insights into cardiac electrophysiology and identifies novel candidate genes for ventricular arrhythmias, LQTS, and SCD.
genome-wide association study; QT interval; Long QT Syndrome; sudden cardiac death; myocardial repolarization; arrhythmias
The electronic MEdical Records and GEnomics (eMERGE) network brings together DNA biobanks linked to electronic health records (EHRs) from multiple institutions. Approximately 51,000 DNA samples from distinct individuals have been genotyped using genome-wide SNP arrays across the nine sites of the network. The eMERGE Coordinating Center and the Genomics Workgroup developed a pipeline to impute and merge genomic data across the different SNP arrays to maximize sample size and power to detect associations with a variety of clinical endpoints. The 1000 Genomes cosmopolitan reference panel was used for imputation. Imputation results were evaluated using the following metrics: accuracy of imputation, allelic R2 (estimated correlation between the imputed and true genotypes), and the relationship between allelic R2 and minor allele frequency. Computation time and memory resources required by two different software packages (BEAGLE and IMPUTE2) were also evaluated. A number of challenges were encountered due to the complexity of using two different imputation software packages, multiple ancestral populations, and many different genotyping platforms. We present lessons learned and describe the pipeline implemented here to impute and merge genomic data sets. The eMERGE imputed dataset will serve as a valuable resource for discovery, leveraging the clinical data that can be mined from the EHR.
imputation; genome-wide association; eMERGE; electronic health records
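As a rough illustration of the allelic R2 metric mentioned in the abstract above (the estimated correlation between imputed and true genotypes), the following sketch computes the squared Pearson correlation between imputed allele dosages and known genotypes at one masked marker. The function name and the toy data are hypothetical. Imputation software typically estimates this quantity from posterior genotype probabilities without access to true genotypes, so this is only a sketch of the concept and not the pipeline's actual computation.

```python
def allelic_r2(imputed_dosages, true_genotypes):
    """Squared Pearson correlation between imputed dosages and true genotypes."""
    n = len(imputed_dosages)
    if n != len(true_genotypes) or n < 2:
        raise ValueError("need two equal-length sequences with at least 2 samples")
    mean_x = sum(imputed_dosages) / n
    mean_y = sum(true_genotypes) / n
    sxx = sum((x - mean_x) ** 2 for x in imputed_dosages)
    syy = sum((y - mean_y) ** 2 for y in true_genotypes)
    sxy = sum((x - mean_x) * (y - mean_y)
              for x, y in zip(imputed_dosages, true_genotypes))
    if sxx == 0 or syy == 0:
        return float("nan")  # correlation is undefined for a monomorphic marker
    return (sxy * sxy) / (sxx * syy)


# Toy example (made-up data): dosages in [0, 2] versus masked true genotypes.
imputed = [0.1, 0.9, 1.8, 0.2, 1.1]
truth = [0, 1, 2, 0, 1]
r2 = allelic_r2(imputed, truth)
print(f"allelic R2 = {r2:.3f}")
```

Computing this value per marker and plotting it against minor allele frequency would give the kind of relationship the abstract describes evaluating.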
Thyroid stimulating hormone (TSH) levels are normally tightly regulated within an individual; thus, relatively small variations may indicate thyroid disease. Genome-wide association studies (GWAS) have identified variants in PDE8B and FOXE1 that are associated with TSH levels. However, prior studies lacked racial/ethnic diversity, limiting the generalization of these findings to individuals of non-European ethnicities. The Electronic Medical Records and Genomics (eMERGE) Network is a collaboration across institutions with biobanks linked to electronic medical records (EMRs). The eMERGE Network uses EMR-derived phenotypes to perform GWAS in diverse populations for a variety of phenotypes. In this report, we identified serum TSH levels from 4,501 European American and 351 African American euthyroid individuals in the eMERGE Network with existing GWAS data. Tests of association were performed using linear regression and adjusted for age, sex, body mass index (BMI), and principal components, assuming an additive genetic model. Our results replicate the known association of PDE8B with serum TSH levels in European Americans (rs2046045 p = 1.85×10−17, β = 0.09). FOXE1 variants, associated with hypothyroidism, were not genome-wide significant (rs10759944: p = 1.08×10−6, β = −0.05). No SNPs reached genome-wide significance in African Americans. However, multiple known associations with TSH levels in European ancestry were nominally significant in African Americans, including PDE8B (rs2046045 p = 0.03, β = −0.09), VEGFA (rs11755845 p = 0.01, β = −0.13), and NFIA (rs334699 p = 1.50×10−3, β = −0.17). We found little evidence that SNPs previously associated with other thyroid-related disorders were associated with serum TSH levels in this study. These results support the previously reported association between PDE8B and serum TSH levels in European Americans and emphasize the need for additional genetic studies in more diverse populations.
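The additive-model linear regression described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the study's analysis code: the helper `ols`, the toy data, and the simulated per-allele effect of 0.1 are all made up, and only age and sex are included as covariates (BMI and principal components would simply be further columns of the design matrix). Under the additive model, the genotype enters as a count of 0, 1, or 2 copies of the coded allele, so its coefficient is the per-allele effect β.

```python
def ols(X, y):
    """Ordinary least squares via the normal equations (X'X) b = X'y."""
    p = len(X[0])
    # Build the augmented matrix [X'X | X'y].
    A = [[sum(row[i] * row[j] for row in X) for j in range(p)]
         + [sum(row[i] * yi for row, yi in zip(X, y))]
         for i in range(p)]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(p):
        pivot = max(range(col, p), key=lambda r: abs(A[r][col]))
        if abs(A[pivot][col]) < 1e-12:
            raise ValueError("design matrix is rank deficient")
        A[col], A[pivot] = A[pivot], A[col]
        for r in range(p):
            if r != col:
                factor = A[r][col] / A[col][col]
                A[r] = [a - factor * b for a, b in zip(A[r], A[col])]
    return [A[i][p] / A[i][i] for i in range(p)]


# Toy data (made up): allele counts 0/1/2 encode the additive genetic model.
genotype = [0, 1, 2, 1, 0, 2]
age = [40, 55, 63, 35, 70, 48]
sex = [0, 1, 0, 1, 1, 0]
# Simulated noise-free trait with an arbitrary per-allele effect of 0.1.
trait = [1.0 + 0.1 * g + 0.01 * a + 0.2 * s
         for g, a, s in zip(genotype, age, sex)]

# Design matrix columns: intercept, genotype, age, sex.
design = [[1.0, g, a, s] for g, a, s in zip(genotype, age, sex)]
intercept, beta_snp, beta_age, beta_sex = ols(design, trait)
print(f"per-allele effect (beta) = {beta_snp:.3f}")
```

Because the toy trait is noise-free, the fit recovers the simulated effect; a real analysis would also need standard errors and p-values, which this sketch omits.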
Combining samples across multiple cohorts in large-scale scientific research programs is often required to achieve the necessary power for genome-wide association studies. Controlling for genomic ancestry through principal component analysis (PCA) to address the effect of population stratification is a common practice. In addition to local genomic variation, such as copy number variation and inversions, other factors directly related to combining multiple studies, such as platform and site recruitment bias, can drive the correlation patterns in PCA. In this report, we describe the combination and analysis of a multi-ethnic cohort with biobanks linked to electronic health records for large-scale genomic association discovery analyses. First, we outline the observed site and platform bias, in addition to ancestry differences. Second, we outline a general protocol for selecting variants for input into the subject variance-covariance matrix, the conventional PCA approach. Finally, we introduce an alternative approach to PCA by deriving components from subject loadings calculated from a reference sample. This alternative approach of generating principal components controlled for site and platform bias, in addition to ancestry differences, has the advantage of fewer covariates and degrees of freedom.
principal component analysis; ancestry; biobank; loadings; genetic association study
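One way to read the reference-based approach in the abstract above is that loadings are estimated once in a reference sample and then applied to study subjects, so the resulting components cannot be driven by site or platform structure within the study data. The sketch below illustrates that reading for the first component only, using power iteration. The function names and toy genotypes are hypothetical, and the abstract does not specify the authors' actual procedure.

```python
def column_means(matrix):
    """Per-variant mean genotype across the rows (samples) of a matrix."""
    n = len(matrix)
    return [sum(row[j] for row in matrix) / n for j in range(len(matrix[0]))]


def top_loading(reference, iterations=200):
    """First principal-component loading of the reference genotypes."""
    means = column_means(reference)
    centered = [[g - m for g, m in zip(row, means)] for row in reference]
    p = len(means)
    v = [1.0 / (j + 1) for j in range(p)]  # arbitrary nonzero starting vector
    for _ in range(iterations):
        # Multiply by X'X without forming it: scores = X v, then v = X' scores.
        scores = [sum(c * w for c, w in zip(row, v)) for row in centered]
        v = [sum(row[j] * s for row, s in zip(centered, scores))
             for j in range(p)]
        norm = sum(w * w for w in v) ** 0.5
        v = [w / norm for w in v]
    return means, v


def project(samples, means, loading):
    """Component scores for samples, centered on the reference means."""
    return [sum((g - m) * w for g, m, w in zip(row, means, loading))
            for row in samples]


# Toy reference panel (made up): two groups with different allele patterns.
reference = [[0, 0, 2, 2], [0, 1, 2, 2], [2, 2, 0, 0], [2, 2, 0, 1]]
ref_means, loading = top_loading(reference)

# Study subjects are projected with the reference loadings only.
study = [[0, 0, 2, 2], [2, 2, 0, 0]]
study_scores = project(study, ref_means, loading)
print(study_scores)
```

The sign of a principal component is arbitrary, so only the relative positions of the projected subjects are meaningful.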
The objective of this study was to identify genetic variants associated with angiotensin-converting enzyme (ACE) inhibitor-associated angioedema.
Participants and methods
We carried out a genome-wide association study in 175 individuals with ACE inhibitor-associated angioedema and 489 ACE inhibitor-exposed controls from Nashville (Tennessee) and Marshfield (Wisconsin). We tested for replication in 19 cases and 57 controls who participated in the Ongoing Telmisartan Alone and in Combination with Ramipril Global Endpoint Trial (ONTARGET).
There were no genome-wide significant associations of any single-nucleotide polymorphism (SNP) with angioedema. Sixteen SNPs in African Americans and 41 SNPs in European Americans were associated moderately with angioedema (P<10−4) and evaluated for association in ONTARGET. The T allele of rs500766 in PRKCQ was associated with a reduced risk, whereas the G allele of rs2724635 in ETV6 was associated with an increased risk of ACE inhibitor-associated angioedema in the Nashville/Marshfield sample and ONTARGET. In a candidate gene analysis, rs989692 in the gene encoding neprilysin (MME), an enzyme that degrades bradykinin and substance P, was significantly associated with angioedema in ONTARGET and Nashville/Marshfield African Americans.
Unlike other serious adverse drug effects, ACE inhibitor-associated angioedema is not associated with a variant with a large effect size. Variants in MME and genes involved in immune regulation may be associated with ACE inhibitor-associated angioedema.
adverse drug event; angioedema; angiotensin-converting enzyme; neprilysin
Phenome-wide association studies (PheWAS) have demonstrated utility in validating genetic associations derived from traditional genetic studies as well as identifying novel genetic associations. Here we used an electronic health record (EHR)-based PheWAS to explore pleiotropy of genetic variants in the fat mass and obesity associated gene (FTO), some of which have been previously associated with obesity and type 2 diabetes (T2D). We used a population of 10,487 individuals of European ancestry with genome-wide genotyping from the Electronic Medical Records and Genomics (eMERGE) Network and another population of 13,711 individuals of European ancestry from the BioVU DNA biobank at Vanderbilt genotyped using Illumina HumanExome BeadChip. A meta-analysis of the two study populations replicated the well-described associations between FTO variants and obesity (odds ratio [OR] = 1.25, 95% Confidence Interval = 1.11–1.24, p = 2.10 × 10−9) and FTO variants and T2D (OR = 1.14, 95% CI = 1.08–1.21, p = 2.34 × 10−6). The meta-analysis also demonstrated that FTO variant rs8050136 was significantly associated with sleep apnea (OR = 1.14, 95% CI = 1.07–1.22, p = 3.33 × 10−5); however, the association was attenuated after adjustment for body mass index (BMI). Novel phenotype associations with obesity-associated FTO variants included fibrocystic breast disease (rs9941349, OR = 0.81, 95% CI = 0.74–0.91, p = 5.41 × 10−5) and trends toward associations with non-alcoholic liver disease and gram-positive bacterial infections. FTO variants not associated with obesity demonstrated other potential disease associations including non-inflammatory disorders of the cervix and chronic periodontitis. These results suggest that genetic variants in FTO may have pleiotropic associations, some of which are not mediated by obesity.
PheWAS; genetic association; pleiotropy; Exome chip; FTO; BMI
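The abstract above reports pooled odds ratios from a meta-analysis of two populations but does not state the method used. As a sketch of one common choice, the code below performs fixed-effect inverse-variance pooling on the log odds ratio scale, recovering each standard error from the reported 95% confidence interval. The function name and the per-cohort inputs are hypothetical and are not values from the study.

```python
from math import exp, log, sqrt

Z_95 = 1.959963984540054  # two-sided 95% quantile of the standard normal


def pool_odds_ratios(studies):
    """Fixed-effect inverse-variance pooling of (OR, CI lower, CI upper)."""
    weighted_sum = 0.0
    total_weight = 0.0
    for odds_ratio, lower, upper in studies:
        beta = log(odds_ratio)
        # Recover the standard error of log(OR) from the 95% CI width.
        se = (log(upper) - log(lower)) / (2 * Z_95)
        weight = 1.0 / (se * se)
        weighted_sum += weight * beta
        total_weight += weight
    pooled_beta = weighted_sum / total_weight
    pooled_se = sqrt(1.0 / total_weight)
    return (exp(pooled_beta),
            exp(pooled_beta - Z_95 * pooled_se),
            exp(pooled_beta + Z_95 * pooled_se))


# Hypothetical per-cohort estimates, not values from the study.
cohorts = [(1.20, 1.05, 1.37), (1.10, 1.00, 1.21)]
pooled_or, ci_low, ci_high = pool_odds_ratios(cohorts)
print(f"pooled OR = {pooled_or:.2f} (95% CI {ci_low:.2f} to {ci_high:.2f})")
```

A pooled estimate computed this way always lies between the smallest and largest input estimates and inside its own confidence interval, which is a quick consistency check for any reported odds ratio and interval.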
Electrocardiographic QRS duration, a measure of cardiac intraventricular conduction, varies ~2-fold in individuals without cardiac disease. Slow conduction may promote reentrant arrhythmias.
Methods and Results
We performed a genome-wide association study (GWAS) to identify genomic markers of QRS duration in 5,272 individuals without cardiac disease selected from electronic medical record (EMR) algorithms at five sites in the Electronic Medical Records and Genomics (eMERGE) network. The most significant loci were evaluated within the CHARGE consortium QRS GWAS meta-analysis. Twenty-three single nucleotide polymorphisms in 5 loci, previously described by CHARGE, were replicated in the eMERGE samples; 18 SNPs were in the chromosome 3 SCN5A and SCN10A loci, where the most significant SNPs were rs1805126 in SCN5A with p=1.2×10−8 (eMERGE) and p=2.5×10−20 (CHARGE) and rs6795970 in SCN10A with p=6×10−6 (eMERGE) and p=5×10−27 (CHARGE). The other loci were in NFIA, near CDKN1A, and near C6orf204. We then performed phenome-wide association studies (PheWAS) on variants in these five loci in 13,859 European Americans to search for diagnoses associated with these markers. PheWAS identified atrial fibrillation and cardiac arrhythmias as the most common associated diagnoses with SCN10A and SCN5A variants. SCN10A variants were also associated with subsequent development of atrial fibrillation and arrhythmia in the original 5,272 “heart-healthy” study population.
We conclude that DNA biobanks coupled to EMRs provide a platform not only for GWAS but may also allow broad interrogation of the longitudinal incidence of disease associated with genetic variants. The PheWAS approach implicated sodium channel variants modulating QRS duration in subjects without cardiac disease as predictors of subsequent arrhythmias.
cardiac conduction; QRS duration; atrial fibrillation; genome-wide association study; phenome-wide association study; electronic medical records
Genetic variants of the enzyme that metabolizes warfarin, cytochrome P-450 2C9 (CYP2C9), and of a key pharmacologic target of warfarin, vitamin K epoxide reductase (VKORC1), contribute to differences in patients’ responses to various warfarin doses, but the role of these variants during initial anticoagulation is not clear.
In 297 patients starting warfarin therapy, we assessed CYP2C9 genotypes (CYP2C9 *1, *2, and *3), VKORC1 haplotypes (designated A and non-A), clinical characteristics, response to therapy (as determined by the international normalized ratio [INR]), and bleeding events. The study outcomes were the time to the first INR within the therapeutic range, the time to the first INR of more than 4, the time above the therapeutic INR range, the INR response over time, and the warfarin dose requirement.
As compared with patients with the non-A/non-A haplotype, patients with the A/A haplotype of VKORC1 had a decreased time to the first INR within the therapeutic range (P = 0.02) and to the first INR of more than 4 (P = 0.003). In contrast, the CYP2C9 genotype was not a significant predictor of the time to the first INR within the therapeutic range (P = 0.57) but was a significant predictor of the time to the first INR of more than 4 (P = 0.03). Both the CYP2C9 genotype and VKORC1 haplotype had a significant influence on the required warfarin dose after the first 2 weeks of therapy.
Initial variability in the INR response to warfarin was more strongly associated with genetic variability in the pharmacologic target of warfarin, VKORC1, than with CYP2C9.
Alzheimer disease (AD) is a devastating neurodegenerative disease affecting more than five million Americans. In this study, we have used updated genetic linkage data from chromosome 10 in combination with expression data from serial analysis of gene expression to choose a new set of thirteen candidate genes for genetic analysis in late onset Alzheimer disease (LOAD). Results in this study identify the KIAA1462 locus as a candidate locus for LOAD in APOE4 carriers. Two genes exist at this locus: KIAA1462, a gene associated with coronary artery disease, and “rokimi”, encoding an untranslated spliced RNA. The genetic architecture at this locus suggests that the gene product important in this association is either “rokimi” or a different isoform of KIAA1462 than the isoform that is important in cardiovascular disease. Expression data suggest that isoform f of KIAA1462 is a more attractive candidate for association with LOAD in APOE4 carriers than “rokimi”, which had no detectable expression in brain.
Genome-wide association studies (GWAS) are a useful approach in the study of the genetic components of complex phenotypes. Aside from large cohorts, GWAS have generally been limited to the study of one or a few diseases or traits. The emergence of biobanks linked to electronic medical records (EMRs) allows the efficient re-use of genetic data to yield meaningful genotype-phenotype associations for multiple phenotypes or traits. Phase I of the electronic MEdical Records and GEnomics (eMERGE-I) Network is a National Human Genome Research Institute (NHGRI)-supported consortium composed of five sites to perform various genetic association studies using DNA repositories and EMR systems. Each eMERGE site has developed EMR-based algorithms to comprise a core set of fourteen phenotypes for extraction of study samples from each site’s DNA repository. Each eMERGE site selected samples for a specific phenotype, and these samples were genotyped at either the Broad Institute or at the Center for Inherited Disease Research (CIDR) using the Illumina Infinium BeadChip technology. In all, approximately 17,000 samples from across the five sites were genotyped. A unified quality control (QC) pipeline was developed by the eMERGE Genomics Working Group and used to ensure thorough cleaning of the data. This process includes examination of sample quality, marker quality, and various batch effects. Upon completion of the genotyping and QC analyses for each site’s primary study, the eMERGE Coordinating Center merged the datasets from all five sites. This larger merged dataset re-entered the established eMERGE QC pipeline. Based on lessons learned during the process, additional analyses and QC checkpoints were added to the pipeline to ensure proper merging. Here we explore the challenges associated with combining datasets from different genotyping centers and describe the expansion to the eMERGE QC pipeline for merged datasets. 
These additional steps will be useful as the eMERGE project expands to include additional sites in eMERGE-II and also serve as a starting point for investigators merging multiple genotype data sets accessible through the National Center for Biotechnology Information (NCBI) in the database of Genotypes and Phenotypes (dbGaP). Our experience demonstrates that merging multiple datasets after additional QC can be an efficient use of genotype data despite new challenges that appear in the process.
quality control; genome-wide association (GWAS); eMERGE; dbGaP; merging datasets
Drug-induced long QT syndrome (diLQTS) is an adverse drug effect that has an important impact on drug use, development, and regulation. Here, we tested the hypothesis that common variants in key genes controlling cardiac electrical properties modify the risk of diLQTS.
Methods and Results
In a case-control setting, we included 176 patients of European descent from North America and Europe with diLQTS, defined as documented torsades de pointes during treatment with a QT-prolonging drug. Control samples were obtained from 207 patients of European ancestry who displayed <50 msec QT lengthening during initiation of therapy with a QT-prolonging drug, and 837 controls from the population-based KORA study. Subjects were successfully genotyped at 1,424 single nucleotide polymorphisms (SNPs) in 18 candidate genes, including 1,386 SNPs tagging common haplotype blocks and 38 non-synonymous ion channel gene SNPs. For validation we used a set of cases (n=57) and population-based controls of European descent. The SNP KCNE1 D85N (rs1805128), known to modulate an important potassium current in the heart, predicted diLQTS with an odds ratio of 9.0 (95% confidence interval: 3.5–22.9). The variant allele was present in 8.6% of cases, 2.9% of drug-exposed controls, and 1.8% of population controls. In the validation cohort the variant allele was present in 3.5% of cases and in 1.4% of controls.
This high-density candidate SNP approach identified a key potassium channel susceptibility allele that may be associated with the rare adverse drug reaction torsades de pointes.
candidate genes; death, sudden; SNP; torsade de pointes; adverse drug events
Genome-wide association studies (GWAS) are being conducted at an unprecedented rate in population-based cohorts and have increased our understanding of the pathophysiology of complex disease. The recent application of GWAS to clinic-based cohorts has also yielded genetic predictors of clinical outcomes. Regardless of context, the practical utility of this information will ultimately depend upon the quality of the original data. Quality control (QC) procedures for GWAS are computationally intensive, operationally challenging, and constantly evolving. With each new dataset, new realities are discovered about GWAS data and best practices continue to be developed. The Genomics Workgroup of the National Human Genome Research Institute (NHGRI)-funded electronic Medical Records and Genomics (eMERGE) network has invested considerable effort in developing strategies for QC of these data. The lessons learned by this group will be valuable for other investigators dealing with large-scale genomic datasets. Here we enumerate some of the challenges in QC of GWAS data and describe the approaches that the eMERGE network is using for quality assurance in GWAS data, thereby minimizing potential bias and error in GWAS results. In this protocol we discuss common issues associated with QC of GWAS data, including data file formats, software packages for data manipulation and analysis, sex chromosome anomalies, sample identity, sample relatedness, population substructure, batch effects, and marker quality. We propose best practices and discuss areas of ongoing and future research.
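Of the QC topics listed above, marker quality is the simplest to illustrate. The sketch below filters markers on call rate and minor allele frequency (MAF). The function names, the toy genotypes, and the thresholds (98% call rate, 1% MAF) are assumptions chosen for illustration and are not taken from the eMERGE protocol.

```python
def call_rate(genotypes):
    """Fraction of non-missing genotype calls (None marks a missing call)."""
    return sum(g is not None for g in genotypes) / len(genotypes)


def minor_allele_frequency(genotypes):
    """MAF from allele counts coded 0/1/2, ignoring missing calls."""
    called = [g for g in genotypes if g is not None]
    if not called:
        return 0.0
    freq = sum(called) / (2 * len(called))
    return min(freq, 1.0 - freq)


def passes_marker_qc(genotypes, min_call_rate=0.98, min_maf=0.01):
    """Apply both marker filters; the default thresholds are illustrative."""
    return (call_rate(genotypes) >= min_call_rate
            and minor_allele_frequency(genotypes) >= min_maf)


# Toy markers (made up), one list of per-sample calls per marker.
markers = {
    "snp_a": [0, 1, 2, 1, 0, 1, 0, 2],        # complete and polymorphic
    "snp_b": [0, None, 1, None, 0, 0, 1, 0],  # low call rate
    "snp_c": [0, 0, 0, 0, 0, 0, 0, 0],        # monomorphic
}
kept = [name for name, calls in markers.items() if passes_marker_qc(calls)]
print(kept)
```

In practice these filters are usually run with dedicated genotype-processing software rather than hand-written code, and the other checks named in the abstract (sex anomalies, relatedness, batch effects) need separate steps.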
Multiple sclerosis (MS) is a debilitating neuroimmunological and neurodegenerative disease affecting more than 400,000 individuals in the United States. Population and family-based studies have suggested that there is a strong genetic component. Numerous genomic linkage screens have identified regions of interest for MS loci. Our own second-generation genome-wide linkage study identified a handful of non-MHC regions with suggestive linkage. Several of these regions were further examined using single-nucleotide polymorphisms (SNPs) with average spacing between SNPs of approximately 1.0 Mb in a dataset of 173 multiplex families. The results of that study provided further evidence for the involvement of the chromosome 1q43 region. This region is of particular interest given linkage evidence in studies of other autoimmune and inflammatory diseases including rheumatoid arthritis and systemic lupus erythematosus. In this follow-up study, we saturated the region with ~700 SNPs (average spacing of 10 kb per SNP) in search of disease-associated variation. We found preliminary evidence to suggest that common variation within the RGS7 locus may be involved in disease susceptibility.
multiple sclerosis; linkage; association; 1q43; RGS7
A substantial body of research supports a genetic involvement in autism. Furthermore, results from various genomic screens implicate a region on chromosome 7q31 as harboring an autism susceptibility variant. We previously narrowed this 34 cM region to a 3 cM critical region (located between D7S496 and D7S2418) using the Collaborative Linkage Study of Autism (CLSA) chromosome 7 linked families. This interval encompasses about 4.5 Mb of genomic DNA and encodes over fifty known and predicted genes. Four candidate genes (NRCAM, LRRN3, KIAA0716, and LAMB1) in this region were chosen for examination based on their proximity to the marker most consistently cosegregating with autism in these families (D7S1817), their tissue expression patterns, and likely biological relevance to autism.
Thirty-six intronic and exonic single nucleotide polymorphisms (SNPs) and one microsatellite marker within and around these four candidate genes were genotyped in 30 chromosome 7q31 linked families. Multiple SNPs were used to provide as complete coverage as possible since linkage disequilibrium can vary dramatically across even very short distances within a gene. Analyses of these data used the Pedigree Disequilibrium Test for single markers and a multilocus likelihood ratio test.
As expected, linkage disequilibrium occurred within each of these genes but we did not observe significant LD across genes. None of the polymorphisms in NRCAM, LRRN3, or KIAA0716 gave p < 0.05 suggesting that none of these genes is associated with autism susceptibility in this subset of chromosome 7-linked families. However, with LAMB1, the allelic association analysis revealed suggestive evidence for a positive association, including one individual SNP (p = 0.02) and three separate two-SNP haplotypes across the gene (p = 0.007, 0.012, and 0.012).
NRCAM, LRRN3, and KIAA0716 are unlikely to be involved in autism. There is some evidence that variation in or near the LAMB1 gene may be involved in autism.