1.  Genetic Variation among 82 Pharmacogenes: the PGRN-Seq data from the eMERGE Network 
Genetic variation can affect drug response in multiple ways, though it remains unclear how rare genetic variants affect drug response. The electronic Medical Records and Genomics (eMERGE) Network, collaborating with the Pharmacogenomics Research Network, began eMERGE-PGx, a targeted sequencing study to assess genetic variation in 82 pharmacogenes critical for implementation of “precision medicine.” The February 2015 eMERGE-PGx data release includes sequence-derived data from ~5000 clinical subjects. We present the variant frequency spectrum categorized by variant type, ancestry, and predicted function. We found 95.12% of genes have variants with a scaled CADD score above 20, and 96.19% of all samples had one or more Clinical Pharmacogenetics Implementation Consortium Level A actionable variants. These data highlight the distribution and scope of genetic variation in relevant pharmacogenes, identifying challenges associated with implementing clinical sequencing for drug treatment at a broader level and underscoring the importance of multifaceted research in the execution of precision medicine.
PMCID: PMC5010878  PMID: 26857349
2.  Global Implementation of Genomic Medicine: We Are Not Alone 
Science translational medicine  2015;7(290):290ps13.
Advances in high-throughput genomic technologies coupled with a growing number of genomic results potentially useful in clinical care have led to ground-breaking genomic medicine implementation programs in various nations. Many of these innovative programs capitalize on unique local capabilities arising from the structure of their health care systems or their cultural or political milieu, as well as from unusual burdens of disease or risk alleles. Many such programs are being conducted in relative isolation and might benefit from sharing of approaches and lessons learned in other nations. The National Human Genome Research Institute recently brought together 25 of these groups from around the world to describe and compare projects, examine the current state of implementation and desired near-term capabilities, and identify opportunities for collaboration to promote the responsible implementation of genomic medicine.
The wide variety of nascent programs in diverse settings demonstrates that implementation of genomic medicine is expanding globally in varied and highly innovative ways. Opportunities for collaboration abound in the areas of evidence generation, health information technology, education, workforce development, pharmacogenomics, and policy and regulatory issues. Several international organizations that are already facilitating effective research collaborations should engage to ensure implementation proceeds collaboratively without potentially wasteful duplication. Efforts to coalesce these groups around concrete but compelling signature projects, such as global eradication of genetically-mediated drug reactions or developing a truly global genomic variant data resource across a wide number of ethnicities, would accelerate appropriate implementation of genomics to improve clinical care world-wide.
PMCID: PMC4898888  PMID: 26041702
medical genomics; implementation; global collaborations; practice standards; pharmacogenomics; personalized medicine; precision medicine
3.  A Genome-Wide Association Study Identifies Variants in KCNIP4 Associated with ACE Inhibitor Induced Cough 
The pharmacogenomics journal  2015;16(3):231-237.
The most common side effect of angiotensin converting enzyme inhibitor drugs (ACEi) is a cough. We conducted a genome wide association study (GWAS) of ACEi-induced cough among 7,080 subjects of diverse ancestries in the eMERGE network. Cases were subjects diagnosed with ACEi-induced cough. Controls were subjects with at least 6 months of ACEi use and no cough. A GWAS (1,595 cases and 5,485 controls) identified associations on chromosome 4 in an intron of KCNIP4. The strongest association was at rs145489027 (MAF=0.33, OR=1.3 [95%CI: 1.2–1.4], p=1.0×10−8). Replication for six SNPs in KCNIP4 was tested in a second eMERGE population (n=926) and in the GoDARTS cohort (n=4,309). Replication was observed at rs7675300 (OR=1.32 [1.01–1.70], p=0.04) in eMERGE and rs16870989 and rs1495509 (OR=1.15 [1.01–1.30], p=0.03 for both) in GoDARTS. The combined association at rs1495509 was significant (OR=1.23 [1.15–1.32], p=1.9×10−9). These results indicate that SNPs in KCNIP4 may modulate ACEi-induced cough risk.
PMCID: PMC4713364  PMID: 26169577
ACE inhibitor; angiotensin converting enzyme inhibitor; GWAS; KCNIP4; Drug Related Side Effects and Adverse Reactions; pharmacogenetics
4.  The IGNITE network: a model for genomic medicine implementation and research 
Patients, clinicians, researchers and payers are seeking to understand the value of using genomic information (as reflected by genotyping, sequencing, family history or other data) to inform clinical decision-making. However, challenges exist to widespread clinical implementation of genomic medicine, a prerequisite for developing evidence of its real-world utility.
To address these challenges, the National Institutes of Health-funded IGNITE (Implementing GeNomics In pracTicE) Network, composed of six projects and a coordinating center, was established in 2013 to support the development, investigation and dissemination of genomic medicine practice models that seamlessly integrate genomic data into the electronic health record and that deploy tools for point of care decision making. IGNITE site projects are aligned in their purpose of testing these models, but individual projects vary in scope and design, including exploring genetic markers for disease risk prediction and prevention, developing tools for using family history data, incorporating pharmacogenomic data into clinical care, refining disease diagnosis using sequence-based mutation discovery, and creating novel educational approaches.
This paper describes the IGNITE Network and member projects, including network structure, collaborative initiatives, clinical decision support strategies, methods for return of genomic test results, and educational initiatives for patients and providers. Clinical and outcomes data from individual sites and network-wide projects are anticipated to begin being published over the next few years.
The IGNITE Network is an innovative series of projects and pilot demonstrations aiming to enhance translation of validated actionable genomic information into clinical settings and develop and use measures of outcome in response to genome-based clinical interventions using a pragmatic framework to provide early data and proofs of concept on the utility of these interventions. Through these efforts and collaboration with other stakeholders, IGNITE is poised to have a significant impact on the acceleration of genomic information into medical practice.
PMCID: PMC4700677  PMID: 26729011
Precision medicine; Pharmacogenomics; Genomics; Personalized medicine; Clinical decision support; Electronic health record; Implementation
6.  Design and Anticipated Outcomes of the eMERGE-PGx Project: A Multi-Center Pilot for Pre-Emptive Pharmacogenomics in Electronic Health Record Systems 
We describe here the design and initial implementation of the eMERGE-PGx project. eMERGE-PGx, a partnership of the eMERGE and PGRN consortia, has three objectives: 1) Deploy PGRNseq, a next-generation sequencing platform assessing sequence variation in 84 proposed pharmacogenes, in nearly 9,000 patients likely to be prescribed drugs of interest in a 1–3 year timeframe across several clinical sites; 2) Integrate well-established clinically-validated pharmacogenetic genotypes into the electronic health record with associated clinical decision support and assess process and clinical outcomes of implementation; and 3) Develop a repository of pharmacogenetic variants of unknown significance linked to a repository of EHR-based clinical phenotype data for ongoing pharmacogenomics discovery. We describe site-specific project implementation and anticipated products, including genetic variant and phenotype data repositories, novel variant association studies, clinical decision support modules, clinical and process outcomes, approaches to manage incidental findings, and patient and clinician education methods.
PMCID: PMC4169732  PMID: 24960519
pharmacogenetics; pharmacogenomics; next generation sequencing; study design; pre-emptive genotyping
7.  News from the NIH: potential contributions of the behavioral and social sciences to the precision medicine initiative 
PMCID: PMC4537462  PMID: 26327928
Precision medicine; Tailored interventions; Personalized medicine; Mobile health; Health informatics; Pharmacogenetics; Cohort studies; Behavioral risk factors; Environmental risk factors
9.  Characterizing Genetic Variants for Clinical Action 
Genome-wide association studies, DNA sequencing studies, and other genomic studies are finding an increasing number of genetic variants associated with clinical phenotypes that may be useful in developing diagnostic, preventive, and treatment strategies for individual patients. However, few common variants have been integrated into routine clinical practice. The reasons for this are several, but two of the most significant are limited evidence about the clinical implications of the variants and a lack of a comprehensive knowledge base that captures genetic variants, their phenotypic associations, and other pertinent phenotypic information that is openly accessible to clinical groups attempting to interpret sequencing data. As the field of medicine begins to incorporate genome-scale analysis into clinical care, approaches need to be developed for collecting and characterizing data on the clinical implications of variants, developing consensus on their actionability, and making this information available for clinical use. The National Human Genome Research Institute (NHGRI) and the Wellcome Trust thus convened a workshop to consider the processes and resources needed to: 1) identify clinically valid genetic variants; 2) decide whether they are actionable and what the action should be; and 3) provide this information for clinical use. This commentary outlines the key discussion points and recommendations from the workshop.
PMCID: PMC4158437  PMID: 24634402
genomic medicine; clinical actionability; database; electronic health records (EHR); pharmacogenomics; DNA sequencing
10.  Associations Between Metabolomic Compounds and Incident Heart Failure Among African Americans: The ARIC Study 
American Journal of Epidemiology  2013;178(4):534-542.
Heart failure is more prevalent among African Americans than in the general population. Metabolomic studies among African Americans may efficiently identify novel biomarkers of heart failure. We used untargeted methods to measure 204 stable serum metabolites and evaluated their associations with incident heart failure hospitalization (n = 276) after a median follow-up of 20 years (1987–2008) by using Cox regression in data from 1,744 African Americans aged 45–64 years without heart failure at baseline from the Jackson, Mississippi, field center of the Atherosclerosis Risk in Communities (ARIC) Study. After adjustment for established risk factors, we found that 16 metabolites (6 named with known structural identities and 10 unnamed with unknown structural identities, the latter denoted by using the format X-12345) were associated with incident heart failure (P < 0.0004 based on a modified Bonferroni procedure). Of the 6 named metabolites, 4 are involved in amino acid metabolism, 1 (prolylhydroxyproline) is a dipeptide, and 1 (erythritol) is a sugar alcohol. After additional adjustment for kidney function, 2 metabolites remained associated with incident heart failure (for metabolite X-11308, hazard ratio = 0.75, 95% confidence interval: 0.65, 0.86; for metabolite X-11787, hazard ratio = 1.23, 95% confidence interval: 1.10, 1.37). Further structural analysis revealed X-11308 to be a dihydroxy docosatrienoic acid and X-11787 to be an isoform of either hydroxyleucine or hydroxyisoleucine. Our metabolomic analysis revealed novel biomarkers associated with incident heart failure independent of traditional risk factors.
PMCID: PMC3736751  PMID: 23788672
heart failure; metabolomics; risk factors
11.  Genome-Wide Association Study of a Heart Failure-Related Metabolomic Profile among African Americans in the Atherosclerosis Risk in Communities (ARIC) Study 
Genetic epidemiology  2013;37(8):840-845.
Both the prevalence and incidence of heart failure (HF) are increasing, especially among African-Americans, but no large-scale, genome-wide association study (GWAS) of HF-related metabolites has been reported. We sought to identify novel genetic variants that are associated with metabolites previously reported to relate to HF incidence. GWASs of three metabolites identified previously as risk factors for incident HF (pyroglutamine, dihydroxy docosatrienoic acid and X-11787, being either hydroxy-leucine or hydroxy-isoleucine) were performed in 1260 African-Americans free of HF at the baseline examination of the Atherosclerosis Risk in Communities (ARIC) study. A significant association on chromosome 5q33 (rs10463316, MAF = 0.358, p-value = 1.92×10−10) was identified for pyroglutamine. One region on chromosome 2p13 containing a nonsynonymous substitution in N-acetyltransferase 8 (NAT8) was associated with X-11787 (rs13538, MAF = 0.481, p-value = 1.71×10−23). The smallest p-value for dihydroxy docosatrienoic acid was at rs4006531 on chromosome 8q24 (MAF = 0.400, p-value = 6.98×10−7). None of the above SNPs were individually associated with incident HF, but a genetic risk score (GRS) created by summing the most significant risk alleles from each metabolite detected 11% greater risk of HF per allele. In summary, we identified three loci associated with previously reported HF-related metabolites. Further use of metabolomics technology will facilitate replication of these findings in independent samples.
PMCID: PMC4079107  PMID: 23934736
metabolomics; genome-wide association; African-Americans; heart failure
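Illustrative sketch (not from the paper): the abstract above describes a genetic risk score formed by summing risk alleles at the top SNP for each metabolite. Assuming an unweighted score in which each SNP contributes 0, 1, or 2 risk alleles, the calculation might look like the following. The function name and the example genotype are hypothetical; only the three rs identifiers come from the abstract.

```python
from typing import Mapping, Sequence

# Top SNP per metabolite, as named in the abstract.
TOP_SNPS = ("rs10463316", "rs13538", "rs4006531")


def genetic_risk_score(risk_allele_counts: Mapping[str, int],
                       snps: Sequence[str] = TOP_SNPS) -> int:
    """Unweighted GRS: sum of risk-allele counts (0, 1, or 2) over the SNPs."""
    total = 0
    for snp in snps:
        count = risk_allele_counts[snp]
        if count not in (0, 1, 2):
            raise ValueError(f"{snp}: risk-allele count must be 0, 1, or 2")
        total += count
    return total


# Hypothetical genotype for one participant (made-up values).
example_genotype = {"rs10463316": 1, "rs13538": 2, "rs4006531": 0}
example_score = genetic_risk_score(example_genotype)
print(example_score)
```

With three biallelic SNPs the score ranges from 0 to 6. Whether the authors weighted alleles or handled missing genotypes differently is not stated in the abstract.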
12.  Genome- and Phenome-Wide Analysis of Cardiac Conduction Identifies Markers of Arrhythmia Risk 
Circulation  2013;127(13):1377-1385.
Electrocardiographic QRS duration, a measure of cardiac intraventricular conduction, varies ~2-fold in individuals without cardiac disease. Slow conduction may promote reentrant arrhythmias.
Methods and Results
We performed a genome-wide association study (GWAS) to identify genomic markers of QRS duration in 5,272 individuals without cardiac disease selected from electronic medical record (EMR) algorithms at five sites in the Electronic Medical Records and Genomics (eMERGE) network. The most significant loci were evaluated within the CHARGE consortium QRS GWAS meta-analysis. Twenty-three single nucleotide polymorphisms in 5 loci, previously described by CHARGE, were replicated in the eMERGE samples; 18 SNPs were in the chromosome 3 SCN5A and SCN10A loci, where the most significant SNPs were rs1805126 in SCN5A with p=1.2×10−8 (eMERGE) and p=2.5×10−20 (CHARGE) and rs6795970 in SCN10A with p=6×10−6 (eMERGE) and p=5×10−27 (CHARGE). The other loci were in NFIA, near CDKN1A, and near C6orf204. We then performed phenome-wide association studies (PheWAS) on variants in these five loci in 13,859 European Americans to search for diagnoses associated with these markers. PheWAS identified atrial fibrillation and cardiac arrhythmias as the most common associated diagnoses with SCN10A and SCN5A variants. SCN10A variants were also associated with subsequent development of atrial fibrillation and arrhythmia in the original 5,272 “heart-healthy” study population.
We conclude that DNA biobanks coupled to EMRs provide a platform not only for GWAS but may also allow broad interrogation of the longitudinal incidence of disease associated with genetic variants. The PheWAS approach implicated sodium channel variants modulating QRS duration in subjects without cardiac disease as predictors of subsequent arrhythmias.
PMCID: PMC3713791  PMID: 23463857
cardiac conduction; QRS duration; atrial fibrillation; genome-wide association study; phenome-wide association study; electronic medical records
13.  Systematic comparison of phenome-wide association study of electronic medical record data and genome-wide association study data 
Nature biotechnology  2013;31(12):1102-1110.
Candidate gene and genome-wide association studies (GWAS) have identified genetic variants that modulate risk for human disease; many of these associations require further study to replicate the results. Here we report the first large-scale application of the phenome-wide association study (PheWAS) paradigm within electronic medical records (EMRs), an unbiased approach to replication and discovery that interrogates relationships between targeted genotypes and multiple phenotypes. We scanned for associations between 3,144 single-nucleotide polymorphisms (previously implicated by GWAS as mediators of human traits) and 1,358 EMR-derived phenotypes in 13,835 individuals of European ancestry. This PheWAS replicated 66% (51/77) of sufficiently powered prior GWAS associations and revealed 63 potentially pleiotropic associations with P < 4.6 × 10−6 (false discovery rate < 0.1); the strongest of these novel associations were replicated in an independent cohort (n = 7,406). These findings validate PheWAS as a tool to allow unbiased interrogation across multiple phenotypes in EMR-based cohorts and to enhance analysis of the genomic basis of human disease.
PMCID: PMC3969265  PMID: 24270849
14.  Counterpoint: “Streamlined” Does Not Mean Simple 
American Journal of Epidemiology  2013;177(4):283-284.
PMCID: PMC3626054  PMID: 23296355
15.  Vehement Agreement on New Models? 
American Journal of Epidemiology  2013;177(4):290-291.
PMCID: PMC3626056  PMID: 23296352
16.  The NHGRI GWAS Catalog, a curated resource of SNP-trait associations 
Nucleic Acids Research  2013;42(Database issue):D1001-D1006.
The National Human Genome Research Institute (NHGRI) Catalog of Published Genome-Wide Association Studies (GWAS Catalog) provides a publicly available manually curated collection of published GWAS assaying at least 100 000 single-nucleotide polymorphisms (SNPs) and all SNP-trait associations with P < 1 × 10−5. The Catalog includes 1751 curated publications of 11 912 SNPs. In addition to the SNP-trait association data, the Catalog also publishes a quarterly diagram of all SNP-trait associations mapped to the SNPs’ chromosomal locations. The Catalog can be accessed via a tabular web interface, via a dynamic visualization on the human karyotype, as a downloadable tab-delimited file and as an OWL knowledge base. This article presents a number of recent improvements to the Catalog, including novel ways for users to interact with the Catalog and changes to the curation infrastructure.
PMCID: PMC3965119  PMID: 24316577
18.  CHRNB3 is more strongly associated with FTCD-based nicotine dependence than cigarettes per day: phenotype definition changes GWAS results 
Addiction (Abingdon, England)  2012;107(11):2019-2028.
Nicotine dependence is a highly heritable disorder associated with severe medical morbidity and mortality. Recent meta-analyses have found novel genetic loci associated with cigarettes per day (CPD), a proxy for nicotine dependence. The aim of this paper is to evaluate the importance of phenotype definition (i.e. CPD versus Fagerström Test for Cigarette Dependence (FTCD) score as a measure of nicotine dependence) on genome-wide association studies of nicotine dependence.
Genome-wide association study
Community sample
A total of 3,365 subjects who had smoked at least one cigarette were selected from the Study of Addiction: Genetics and Environment (SAGE). Of the participants, 2,267 were European Americans and 999 were African Americans.
Nicotine dependence defined by FTCD score ≥4, CPD
The genetic locus most strongly associated with nicotine dependence was rs1451240 on chromosome 8 in the region of CHRNB3 (OR=0.65, p=2.4×10−8). This association was further strengthened in a meta-analysis with a previously published dataset (combined p=6.7×10−16, total n=4,200). When CPD was used as an alternate phenotype, the association no longer reached genome-wide significance (β=−0.08, p=0.0007).
Daily cigarette consumption and the Fagerström Test for Cigarette Dependence (FTCD) show different associations with polymorphisms in genetic loci.
PMCID: PMC3427406  PMID: 22524403
19.  The Electronic Medical Records and Genomics (eMERGE) Network: Past, Present and Future 
The Electronic Medical Records and Genomics (eMERGE) Network is a National Human Genome Research Institute (NHGRI)-funded consortium engaged in the development of methods and best-practices for utilizing the Electronic Medical Record (EMR) as a tool for genomic research. Now in its sixth year, its second funding cycle and comprising nine research groups and a coordinating center, the network has played a major role in validating the concept that clinical data derived from EMRs can be used successfully for genomic research. Current work is advancing knowledge in multiple disciplines at the intersection of genomics and healthcare informatics, particularly electronic phenotyping, genome-wide association studies, genomic medicine implementation and the ethical and regulatory issues associated with genomics research and returning results to study participants. Here we describe the evolution, accomplishments, opportunities and challenges of the network since its inception as a five-group consortium focused on genotype-phenotype associations for genomic discovery to its current form as a nine-group consortium pivoting towards implementation of genomic medicine.
PMCID: PMC3795928  PMID: 23743551
electronic medical records; personalized medicine; genome-wide association studies; genetics and genomics; collaborative research
20.  High Density GWAS for LDL Cholesterol in African Americans Using Electronic Medical Records Reveals a Strong Protective Variant in APOE 
Only one LDL-C GWAS has been reported in African Americans. We performed a GWAS of LDL-C in African Americans using data extracted from electronic medical records (EMR) in the eMERGE network. African Americans were genotyped on the Illumina 1M chip. All LDL-C measurements, prescriptions, and diagnoses of concomitant disease were extracted from EMR. We created two analytic datasets: one with median LDL-C calculated after the exclusion of some lab values based on co-morbidities and medication (n = 618) and another with median LDL-C calculated without any exclusions (n = 1249). Rs7412 in APOE was strongly associated with LDL-C at levels of GWAS significance in both datasets (p < 5 × 10−8). In the dataset with exclusions, a decrease of 20.0 mg/dl per minor allele was observed. The effect size was attenuated (12.3 mg/dl) in the dataset without any lab values excluded. Although other signals in APOE have been detected in previous GWAS, this large and important SNP association has not been well detected in large GWAS because rs7412 was not included on many genotyping arrays. Use of median LDL-C extracted from EMR after exclusions for medications and co-morbidities increased the percentage of trait variance explained by genetic variation.
PMCID: PMC3521536  PMID: 23067351
GWAS; LDL; electronic medical records
21.  Genetic Variants That Confer Resistance to Malaria Are Associated with Red Blood Cell Traits in African-Americans: An Electronic Medical Record-based Genome-Wide Association Study 
G3: Genes|Genomes|Genetics  2013;3(7):1061-1068.
To identify novel genetic loci influencing interindividual variation in red blood cell (RBC) traits in African-Americans, we conducted a genome-wide association study (GWAS) in 2315 individuals, divided into discovery (n = 1904) and replication (n = 411) cohorts. The traits included hemoglobin concentration (HGB), hematocrit (HCT), RBC count, mean corpuscular volume (MCV), mean corpuscular hemoglobin (MCH), and mean corpuscular hemoglobin concentration (MCHC). Patients were participants in the electronic MEdical Records and GEnomics (eMERGE) network and underwent genotyping of ~1.2 million single-nucleotide polymorphisms on the Illumina Human1M-Duo array. Association analyses were performed adjusting for age, sex, site, and population stratification. Three loci previously associated with resistance to malaria—HBB (11p15.4), HBA1/HBA2 (16p13.3), and G6PD (Xq28)—were associated (P ≤ 1 × 10−6) with RBC traits in the discovery cohort. The loci replicated in the replication cohort (P ≤ 0.02), and were significant at a genome-wide significance level (P < 5 × 10−8) in the combined cohort. The proportions of variance in RBC traits explained by significant variants at these loci were as follows: rs7120391 (near HBB) 1.3% of MCHC, rs9924561 (near HBA1/A2) 5.5% of MCV, 6.9% of MCH and 2.9% of MCHC, and rs1050828 (in G6PD) 2.4% of RBC count, 2.9% of MCV, and 1.4% of MCH, respectively. We were not able to replicate loci identified by a previous GWAS of RBC traits in a European ancestry cohort of similar sample size, suggesting that the genetic architecture of RBC traits differs by race. In conclusion, genetic variants that confer resistance to malaria are associated with RBC traits in African-Americans.
PMCID: PMC3704235  PMID: 23696099
red blood cell (RBC) traits; genome-wide association study; African-Americans; natural selection; informatics; electronic medical record
22.  Electronic Medical Records for Genetic Research: Results of the eMERGE Consortium 
Science translational medicine  2011;3(79):79re1.
Clinical data in Electronic Medical Records (EMRs) is a potential source of longitudinal clinical data for research. The Electronic Medical Records and Genomics Network or eMERGE investigates whether data captured through routine clinical care using EMRs can identify disease phenotypes with sufficient positive and negative predictive values for use in genome wide association studies (GWAS). Using data from five different sets of EMRs, we have identified five disease phenotypes with positive predictive values of 73–98% and negative predictive values of 98–100%. A majority of EMRs captured key information (diagnoses, medications, laboratory tests) used to define phenotypes in a structured format. We identified natural language processing as an important tool to improve case identification rates. Efforts and incentives to increase the implementation of interoperable EMRs will markedly improve the availability of clinical data for genomics research.
PMCID: PMC3690272  PMID: 21508311
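Illustrative sketch (not from the paper): the abstract above reports positive and negative predictive values for EMR-derived phenotypes. As a reminder of how those two quantities are defined, the following computes them from a confusion matrix of algorithm calls against chart review. The function name and all counts are hypothetical and are not taken from the eMERGE data.

```python
from typing import Tuple


def predictive_values(tp: int, fp: int, tn: int, fn: int) -> Tuple[float, float]:
    """Return (PPV, NPV) from confusion-matrix counts.

    PPV = TP / (TP + FP): share of algorithm-positive records that are true cases.
    NPV = TN / (TN + FN): share of algorithm-negative records that are true non-cases.
    """
    if tp + fp == 0 or tn + fn == 0:
        raise ValueError("need at least one predicted positive and one predicted negative")
    return tp / (tp + fp), tn / (tn + fn)


# Hypothetical chart-review counts for a single phenotype algorithm.
ppv, npv = predictive_values(tp=90, fp=10, tn=196, fn=4)
print(f"PPV={ppv:.2f}, NPV={npv:.2f}")
```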
23.  Pitfalls of Merging GWAS Data: Lessons Learned in the eMERGE Network and Quality Control Procedures to Maintain High Data Quality 
Genetic epidemiology  2011;35(8):887-898.
Genome-wide association studies (GWAS) are a useful approach in the study of the genetic components of complex phenotypes. Aside from large cohorts, GWAS have generally been limited to the study of one or a few diseases or traits. The emergence of biobanks linked to electronic medical records (EMRs) allows the efficient re-use of genetic data to yield meaningful genotype-phenotype associations for multiple phenotypes or traits. Phase I of the electronic MEdical Records and GEnomics (eMERGE-I) Network is a National Human Genome Research Institute (NHGRI)-supported consortium composed of five sites to perform various genetic association studies using DNA repositories and EMR systems. Each eMERGE site has developed EMR-based algorithms to comprise a core set of fourteen phenotypes for extraction of study samples from each site’s DNA repository. Each eMERGE site selected samples for a specific phenotype, and these samples were genotyped at either the Broad Institute or at the Center for Inherited Disease Research (CIDR) using the Illumina Infinium BeadChip technology. In all, approximately 17,000 samples from across the five sites were genotyped. A unified quality control (QC) pipeline was developed by the eMERGE Genomics Working Group and used to ensure thorough cleaning of the data. This process includes examination of sample quality, marker quality, and various batch effects. Upon completion of the genotyping and QC analyses for each site’s primary study, the eMERGE Coordinating Center merged the datasets from all five sites. This larger merged dataset re-entered the established eMERGE QC pipeline. Based on lessons learned during the process, additional analyses and QC checkpoints were added to the pipeline to ensure proper merging. Here we explore the challenges associated with combining datasets from different genotyping centers and describe the expansion to the eMERGE QC pipeline for merged datasets. 
These additional steps will be useful as the eMERGE project expands to include additional sites in eMERGE-II and also serve as a starting point for investigators merging multiple genotype data sets accessible through the National Center for Biotechnology Information (NCBI) in the database of Genotypes and Phenotypes (dbGaP). Our experience demonstrates that merging multiple datasets after additional QC can be an efficient use of genotype data despite new challenges that appear in the process.
PMCID: PMC3592376  PMID: 22125226
quality control; genome-wide association (GWAS); eMERGE; dbGaP; merging datasets
24.  Genotype Imputation of Metabochip SNPs Using a Study-Specific Reference Panel of ~4,000 Haplotypes in African Americans From the Women’s Health Initiative 
Genetic epidemiology  2012;36(2):107-117.
Genetic imputation has become standard practice in modern genetic studies. However, several important issues have not been adequately addressed including the utility of study-specific reference, performance in admixed populations, and quality for less common (minor allele frequency [MAF] 0.005–0.05) and rare (MAF < 0.005) variants. These issues only recently became addressable with genome-wide association study (GWAS) follow-up studies using dense genotyping or sequencing in large samples of non-European individuals. In this work, we constructed a study-specific reference panel of 3,924 haplotypes using African Americans in the Women’s Health Initiative (WHI) genotyped on both the Metabochip and the Affymetrix 6.0 GWAS platform. We used this reference panel to impute into 6,459 WHI SNP Health Association Resource (SHARe) study subjects with only GWAS genotypes. Our analysis confirmed the imputation quality metric Rsq (estimated r2, specific to each SNP) as an effective post-imputation filter. We recommend different Rsq thresholds for different MAF categories such that the average (across SNPs) Rsq is above the desired dosage r2 (squared Pearson correlation between imputed and experimental genotypes). With a desired dosage r2 of 80%, 99.9% (97.5%, 83.6%, 52.0%, 20.5%) of SNPs with MAF > 0.05 (0.03–0.05, 0.01–0.03, 0.005–0.01, and 0.001–0.005) passed the post-imputation filter. The average dosage r2 for these SNPs is 94.7%, 92.1%, 89.0%, 83.1%, and 79.7%, respectively. These results suggest that for African Americans imputation of Metabochip SNPs from GWAS data, including low frequency SNPs with MAF 0.005–0.05, is feasible and worthwhile for power increase in downstream association analysis provided a sizable reference panel is available.
PMCID: PMC3410659  PMID: 22851474
genotype imputation; Metabochip; internal reference; African Americans; rare variants
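Illustrative sketch (not from the paper): the abstract above recommends a separate Rsq threshold for each MAF category, chosen so that the average Rsq of the retained SNPs stays above a desired level. One possible reading of that rule, applied to the SNPs in a single MAF category, is to keep the largest set of top-Rsq SNPs whose mean Rsq still meets the target and to report the lowest Rsq in that set as the threshold. This is an assumption about the procedure, and the function name and input values are hypothetical.

```python
from typing import Optional, Sequence


def rsq_threshold(rsq_values: Sequence[float], target_mean: float) -> Optional[float]:
    """Most permissive Rsq cutoff whose retained SNPs average at least target_mean.

    SNPs are taken in order of decreasing Rsq, so the running mean can only
    fall as more are kept; the scan stops at the first SNP that would pull the
    mean below the target. Returns None if even the best SNP misses the target.
    Ties at the cutoff value are not treated specially in this sketch.
    """
    running_sum = 0.0
    threshold = None
    for kept, rsq in enumerate(sorted(rsq_values, reverse=True), start=1):
        running_sum += rsq
        if running_sum / kept < target_mean:
            break
        threshold = rsq
    return threshold


# Hypothetical Rsq values for the SNPs in one MAF category.
example_rsq = [0.95, 0.90, 0.70, 0.40, 0.20]
print(rsq_threshold(example_rsq, target_mean=0.80))
```

Running the function once per MAF category would give category-specific cutoffs; rarer categories, where Rsq tends to be lower, would generally need stricter ones.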
25.  Incidental genetic findings in randomized clinical trials: recommendations from the Genomics and Randomized Trials Network (GARNET) 
Genome Medicine  2013;5(1):7.
Recommendations and guidance on how to handle the return of genetic results to patients have offered limited insight into how to approach incidental genetic findings in the context of clinical trials. This paper provides the Genomics and Randomized Trials Network (GARNET) recommendations on incidental genetic findings in the context of clinical trials, and discusses the ethical and practical issues considered in formulating our recommendations. There are arguments in support of as well as against returning incidental genetic findings in clinical trials. For instance, reporting incidental findings in clinical trials may improve the investigator-participant relationship and the satisfaction of participation, but it may also blur the line between clinical care and research. The issues of whether and how to return incidental genetic findings, including the costs of doing so, should be considered when developing clinical trial protocols. Once decided, plans related to sharing individual results from the aim(s) of the trial, as well as incidental findings, should be discussed explicitly in the consent form. Institutional Review Boards (IRBs) and other study-specific governing bodies should be part of the decision as to if, when, and how to return incidental findings, including when plans in this regard are being reconsidered.
PMCID: PMC3706830  PMID: 23363732

