1.  Preemptive Genotyping for Personalized Medicine: Design of the Right Drug, Right Dose, Right Time – Using Genomic Data to Individualize Treatment Protocol 
Mayo Clinic proceedings  2014;89(1):25-33.
To report the design and implementation of the Right Drug, Right Dose, Right Time: Using Genomic Data to Individualize Treatment Protocol that was developed to test the concept that prescribers can deliver genome guided therapy at the point-of-care by using preemptive pharmacogenomics (PGx) data and clinical decision support (CDS) integrated in the electronic medical record (EMR).
Patients and Methods
We used a multivariable prediction model to identify patients with a high risk of initiating statin therapy within 3 years. The model was used to target a study cohort most likely to benefit from preemptive PGx testing among Mayo Clinic Biobank participants with a recruitment goal of 1000 patients. Cox proportional hazards model was utilized using the variables selected through the Lasso shrinkage method. An operational CDS model was adapted to implement PGx rules within the EMR.
The prediction model included age, sex, race, and 6 chronic diseases categorized by the Clinical Classifications Software for ICD-9 codes (dyslipidemia, diabetes, peripheral atherosclerosis, disease of the blood-forming organs, coronary atherosclerosis and other heart diseases, and hypertension). Of the 2000 Biobank participants invited, 50% provided blood samples, 13% refused, 28% did not respond, and 9% consented but did not provide a blood sample within the recruitment window (October 4, 2012 – March 20, 2013). Preemptive PGx testing included CYP2D6 genotyping and targeted sequencing of 84 PGx genes. Synchronous real-time CDS is integrated in the EMR and flags potential patient-specific drug-gene interactions and provides therapeutic guidance.
These interventions will improve understanding and implementation of genomic data in clinical practice.
PMCID: PMC3932754  PMID: 24388019
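As a quick arithmetic check on the recruitment figures reported in the abstract above, the sketch below converts the stated percentages of the 2000 invited participants into head counts. The category labels are shorthand introduced here for illustration, not terms used by the study.

```python
# Recruitment shares reported in the abstract, as whole percentages.
# The dictionary keys are shorthand labels, not the study's own terms.
INVITED = 2000
SHARES_PCT = {
    "provided_sample": 50,
    "refused": 13,
    "no_response": 28,
    "consented_no_sample": 9,
}


def head_counts(invited: int, shares_pct: dict[str, int]) -> dict[str, int]:
    """Convert percentage shares into participant counts."""
    if sum(shares_pct.values()) != 100:
        raise ValueError("shares must sum to 100%")
    return {label: invited * pct // 100 for label, pct in shares_pct.items()}


counts = head_counts(INVITED, SHARES_PCT)
```

Under these figures, the 50% who provided blood samples correspond to 1000 participants, which matches the stated recruitment goal.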
2.  Strength of Association for Incident Diabetes Risk Factors According to Diabetes Case Definitions 
American Journal of Epidemiology  2012;175(5):466-472.
Prospective epidemiologic studies have characterized major risk factors for incident diabetes by a variety of diabetes case definitions. Whether different definitions alter the association of diabetes with risk factors is largely unknown. Using 1987–1998 data from the ongoing Atherosclerosis Risk in Communities (ARIC) Study, the authors assessed the relation of traditional risk factors with 3 different diabetes case definitions and 4 fasting glucose categories. They compared the study protocol case definition with 2 nested case definitions, self-reported diabetes and a multiple-evidence definition. Significant differences in risk factor associations by case definition and by screening cutpoints were observed. Specifically, the magnitude of the association between the risk factors (baseline metabolic syndrome, fasting glucose, blood pressure, body mass index, and serum insulin) and incident diabetes differed by case definition. Associations with these risk factors were weaker with a case definition based on self-report compared with other definitions. These results illustrate the potential limitations of case definitions that rely solely on self-report or those that incorporate measured glucose values to ascertain undiagnosed cases. Although the ability to identify risk factors of diabetes was consistent for the case definitions studied, tests of novel risk factors may result in different estimates of effect sizes depending on the definition used.
PMCID: PMC3282875  PMID: 22247044
diabetes mellitus, type 2; epidemiologic methods
3.  Polymorphisms in the ICAM1 gene predict circulating soluble intercellular adhesion molecule-1(sICAM-1) 
Atherosclerosis  2011;216(2):390-394.
Polymorphisms within the ICAM1 structural gene have been shown to influence circulating levels of soluble intercellular adhesion molecule -1 (sICAM-1) but their relation to atherosclerosis has not been clearly established. We sought to determine whether ICAM1 SNPs are associated with circulating sICAM-1 concentration, coronary artery calcium (CAC), and common and internal carotid intima medial thickness (IMT).
Methods and Results
3,550 black and white Coronary Artery Risk Development in Young Adults (CARDIA) Study subjects who participated in the year 15 and/or 20 examinations and were part of the Young Adult Longitudinal Study of Antioxidants (YALTA) ancillary study were included in this analysis. In whites, rs5498 was significantly associated with sICAM-1 (p < 0.001) and each G-allele of rs5498 was associated with 5% higher sICAM-1 concentration. In blacks, each C-allele of rs5490 was associated with 6 % higher sICAM-1 level; this SNP was in strong linkage disequilibrium with rs5491, a functional variant. Subclinical measurements of atherosclerosis in either year 15 or year 20 were not significantly related to ICAM1 SNPs.
In CARDIA, ICAM1 DNA segment variants were associated with sICAM-1 protein level including the novel finding that levels differ by the functional variant rs5491. However, ICAM1 SNPs were not strongly related to either IMT or CAC. Our findings in CARDIA suggest that ICAM1 variants are not major early contributors to subclinical atherosclerosis.
PMCID: PMC3402038  PMID: 21392767
cell adhesion molecules; atherosclerosis; coronary calcium; genetics; inflammation
4.  Association of TNFSF8 Polymorphisms With Peripheral Neutrophil Count 
Mayo Clinic Proceedings  2011;86(11):1075-1081.
OBJECTIVE: To investigate the association between 347 single-nucleotide polymorphisms within candidate genes of the tumor necrosis factor, interleukin 1 and interleukin 6 families with neutrophil count.
PATIENTS AND METHODS: Four hundred cases with heart failure after myocardial infarction (MI) were matched by age, sex, and date of incident MI to 694 controls (MI without post-MI heart failure). Both genotypes and neutrophil count at admission for incident MI were available in 314 cases and 515 controls.
RESULTS: We found significant associations between the TNFSF8 polymorphisms rs927374 (P=5.1 × 10^−5) and rs2295800 (P=1.3 × 10^−4) and neutrophil count; these single-nucleotide polymorphisms are in high linkage disequilibrium (r^2=0.97). Associations persisted after controlling for clinical characteristics and were unchanged after adjusting for case-control status. For rs927374, the neutrophil count of GG homozygotes (7.6±5.1) was 16% lower than that of CC homozygotes (9.0±5.2).
CONCLUSION: The TNFSF8 polymorphisms rs927374 and rs2295800 were associated with neutrophil count. This finding suggests that post-MI inflammatory response is genetically modulated.
PMCID: PMC3202998  PMID: 22033252
5.  Mayo Genome Consortia: A Genotype-Phenotype Resource for Genome-Wide Association Studies With an Application to the Analysis of Circulating Bilirubin Levels 
Mayo Clinic Proceedings  2011;86(7):606-614.
OBJECTIVE: To create a cohort for cost-effective genetic research, the Mayo Genome Consortia (MayoGC) has been assembled with participants from research studies across Mayo Clinic with high-throughput genetic data and electronic medical record (EMR) data for phenotype extraction.
PARTICIPANTS AND METHODS: Eligible participants include those who gave general research consent in the contributing studies to share high-throughput genotyping data with other investigators. Herein, we describe the design of the MayoGC, including the current participating cohorts, expansion efforts, data processing, and study management and organization. A genome-wide association study to identify genetic variants associated with total bilirubin levels was conducted to test the genetic research capability of the MayoGC.
RESULTS: Genome-wide significant results were observed on 2q37 (top single nucleotide polymorphism, rs4148325; P=5.0 × 10^−62) and 12p12 (top single nucleotide polymorphism, rs4363657; P=5.1 × 10^−8) corresponding to a gene cluster of uridine 5′-diphospho-glucuronosyltransferases (the UGT1A cluster) and solute carrier organic anion transporter family, member 1B1 (SLCO1B1), respectively.
CONCLUSION: Genome-wide association studies have identified genetic variants associated with numerous phenotypes but have been historically limited by inadequate sample size due to costly genotyping and phenotyping. Large consortia with harmonized genotype data have been assembled to attain sufficient statistical power, but phenotyping remains a rate-limiting factor in gene discovery research efforts. The EMR consists of an abundance of phenotype data that can be extracted in a relatively quick and systematic manner. The MayoGC provides a model of a unique collaborative effort in the environment of a common EMR for the investigation of genetic determinants of diseases.
PMCID: PMC3127556  PMID: 21646302
6.  ICAM1 and VCAM1 polymorphisms, coronary artery calcium, and circulating levels of soluble ICAM-1: The Multi-Ethnic Study of Atherosclerosis (MESA) 
Atherosclerosis  2008;201(2):339-344.
Intercellular adhesion molecule-1 (ICAM-1) and vascular cell adhesion molecule-1 (VCAM-1) may be important contributors to the development and progression of atherosclerosis. Using a stratified random sample of 2,880 participants of the Multi-Ethnic Study of Atherosclerosis we investigated the relationship of 12 ICAM1 and 17 VCAM1 SNPs and coronary artery calcium (CAC) and ICAM1 SNPs and circulating levels of soluble ICAM-1 (sICAM-1). There were no ICAM1 or VCAM1 SNPs significantly associated with CAC in any of the four race/ethnic groups. In a subset of 1,451 subjects with sICAM-1 measurements, we observed a significant association with rs5491 in all four race/ethnic groups corroborating previous research that has shown that the T-allele of rs5491 interferes with the monoclonal antibody used to measure sICAM-1 in this study. After excluding all rs5491 T-allele carriers, several ICAM1 SNPs were significantly associated with sICAM-1 levels; rs5496 in African Americans, rs5498 and rs3093030 in European Americans, and rs1799969 in Hispanics. Our results identified ICAM1 polymorphisms that were significantly associated with sICAM-1 level but not CAC, a subclinical marker of atherosclerosis.
PMCID: PMC2615541  PMID: 18420209
coronary artery calcium; intercellular adhesion molecule-1 (ICAM-1); vascular adhesion molecule-1 (VCAM-1); soluble intercellular adhesion molecule-1 (sICAM-1); gene; single nucleotide polymorphism (SNP); haplotypes
7.  Circulating soluble ICAM-1 levels shows linkage to ICAM gene cluster region on Chromosome 19: the NHLBI Family Heart Study Follow-up Examination 
Atherosclerosis  2007;199(1):172-178.
Atherogenesis is a chronic inflammatory process in which intercellular adhesion molecule 1 (ICAM-1) plays a critical role. Circulating soluble ICAM-1 (sICAM-1) is thought to be the result of cleavage of membrane-bound ICAM-1 and its concentration in serum/plasma has been shown to be heritable. Genome-wide linkage scans were conducted for quantitative trait loci influencing sICAM-1. Phenotype and genetic marker data were available for 2,617 white and 531 black individuals in the NHLBI Family Heart Study follow-up examination. Heritability for sICAM-1 was 0.39 in whites and 0.59 in blacks. Significant linkage was observed on chromosome 19 (LOD = 4.0 at 14 cM) in whites near the ICAM gene cluster that includes the structural gene for ICAM-1. The T-allele of ICAM-1 SNP rs5491 has been strongly associated with the specific sICAM-1 assay we used in our study. Through additional genotyping we were able to rule out rs5491 as the cause of the linkage finding. This study provides preliminary evidence linking genetic variation in the ICAM-1 structural gene to circulating sICAM-1 levels.
PMCID: PMC2517220  PMID: 18045607
Intercellular adhesion molecule-1; Linkage (Genetics); ICAM gene cluster; inflammation; atherosclerosis
8.  TCF7L2 SNPs, cardiovascular disease, and all-cause mortality: The Atherosclerosis Risk in Communities (ARIC) Study 
Diabetologia  2008;51(6):968-970.
Aims and Hypothesis
We hypothesize that transcription factor 7-like 2 (TCF7L2) single nucleotide polymorphisms (SNPs) are associated with cardiovascular disease (CVD) and that the associations differ in diabetic and non-diabetic participants.
Black and white subjects from the Atherosclerosis Risk in Communities (ARIC) study who were free of prevalent CVD at baseline and genotyped for rs7903146, rs12255372, rs7901695, rs11196205, and rs7895340 were included in this analysis (n = 13,369). Cox proportional hazard regression was used to estimate the associations of polymorphisms and incident events and logistic and linear regression were used for associations with baseline risk factor levels.
TCF7L2 SNPs were not significantly associated with incident coronary heart disease, ischemic stroke, CVD, prevalent peripheral artery disease (PAD), or with all-cause mortality in the full cohort or stratified by race.
In the whole cohort, TCF7L2 SNPs were not associated with incident CVD, all-cause mortality, or prevalent PAD. This result suggests that the increased health risk associated with rs7903146 genotype is specific to diabetes.
PMCID: PMC2597203  PMID: 18437354
All-cause mortality; Cardiovascular disease; Coronary heart disease; Diabetes; Peripheral artery disease; Stroke; Transcription factor 7-like 2 (TCF7L2)
9.  Lack of Association between Uncoupling Protein-2 Ala55Val polymorphism and Incident Diabetes in the Atherosclerosis Risk in Communities Study (ARIC) 
Acta diabetologica  2008;45(3):179-182.
Type 2 diabetes mellitus (T2DM) is characterized by impaired insulin secretion, peripheral insulin resistance, and increased hepatic glucose production. Genes that contribute to genetic susceptibility to T2DM function in numerous biochemical pathways. Uncoupling protein-2 (UCP2) functions as a negative regulator of insulin secretion. Animal studies show induction of UCP2 plays a pathogenic role in the progression of obesity-induced T2DM, and some human studies have shown an association between a common UCP2 polymorphism, Ala55Val (rs660339), and T2DM, obesity, and resting metabolic rate with the Val/Val genotype conferring increased risk. We investigated the relationship between the Ala55Val variant and incidence of T2DM among 12,056 participants in the Atherosclerosis Risk in Communities (ARIC) Study ages 45−64 years at baseline. Incident T2DM (n=1,406) cases were identified over 9 years of follow-up. The Val55 allele frequency was 44% in blacks and 41% in whites. The rate of T2DM per 1,000 person-years was 15.0, 15.6, and 15.6 for Ala/Ala, Ala/Val, and Val/Val genotypes respectively. We found no significant association between UCP2 genotypes and incident T2DM in the whole cohort, in race-gender subgroups, or in categories of body mass index (normal-overweight-obese). The Ala55Val polymorphism of UCP2 was not associated with incident T2DM in the ARIC cohort.
PMCID: PMC2586599  PMID: 18496642
mitochondrial uncoupling protein 2; Diabetes Mellitus; Type 2; Polymorphism; Single Nucleotide; Obesity; genetics
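The rates quoted in the abstract above are expressed per 1,000 person-years. A minimal sketch of that calculation follows. The abstract does not report genotype-specific case counts or person-years, so the inputs in the example call are hypothetical values chosen only to illustrate the formula.

```python
def incidence_rate_per_1000(cases: int, person_years: float) -> float:
    """Incidence rate expressed as events per 1,000 person-years."""
    if person_years <= 0:
        raise ValueError("person_years must be positive")
    return 1000.0 * cases / person_years


# Hypothetical inputs (not from the study): 150 incident cases
# observed over 10,000 person-years of follow-up.
example_rate = incidence_rate_per_1000(cases=150, person_years=10_000)
```

With these illustrative inputs the function yields a rate on the same scale as the 15.0 to 15.6 per 1,000 person-years reported for the three genotypes.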
10.  Genetic Variants Associated with Serum Thyroid Stimulating Hormone (TSH) Levels in European Americans and African Americans from the eMERGE Network 
PLoS ONE  2014;9(12):e111301.
Thyroid stimulating hormone (TSH) levels are normally tightly regulated within an individual; thus, relatively small variations may indicate thyroid disease. Genome-wide association studies (GWAS) have identified variants in PDE8B and FOXE1 that are associated with TSH levels. However, prior studies lacked racial/ethnic diversity, limiting the generalization of these findings to individuals of non-European ethnicities. The Electronic Medical Records and Genomics (eMERGE) Network is a collaboration across institutions with biobanks linked to electronic medical records (EMRs). The eMERGE Network uses EMR-derived phenotypes to perform GWAS in diverse populations for a variety of phenotypes. In this report, we identified serum TSH levels from 4,501 European American and 351 African American euthyroid individuals in the eMERGE Network with existing GWAS data. Tests of association were performed using linear regression and adjusted for age, sex, body mass index (BMI), and principal components, assuming an additive genetic model. Our results replicate the known association of PDE8B with serum TSH levels in European Americans (rs2046045 p = 1.85 × 10^−17, β = 0.09). FOXE1 variants, associated with hypothyroidism, were not genome-wide significant (rs10759944: p = 1.08 × 10^−6, β = −0.05). No SNPs reached genome-wide significance in African Americans. However, multiple known associations with TSH levels in European ancestry were nominally significant in African Americans, including PDE8B (rs2046045 p = 0.03, β = −0.09), VEGFA (rs11755845 p = 0.01, β = −0.13), and NFIA (rs334699 p = 1.50 × 10^−3, β = −0.17). We found little evidence that SNPs previously associated with other thyroid-related disorders were associated with serum TSH levels in this study. These results support the previously reported association between PDE8B and serum TSH levels in European Americans and emphasize the need for additional genetic studies in more diverse populations.
PMCID: PMC4249871  PMID: 25436638
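The abstract above describes linear regression under an additive genetic model. The sketch below illustrates only the core idea: each genotype is coded as the number of copies of the tested allele (0, 1, or 2) and the slope β is the estimated change in the trait per additional copy. It is a simplified, unadjusted version; the study also adjusted for age, sex, BMI, and principal components, which are omitted here. The function name and the data are hypothetical.

```python
def additive_model_fit(allele_counts: list[int], trait: list[float]) -> tuple[float, float]:
    """Ordinary least squares of trait on allele count (0/1/2).

    Returns (beta, intercept), where beta is the per-allele effect.
    """
    if len(allele_counts) != len(trait) or len(trait) < 2:
        raise ValueError("need two equal-length samples")
    n = len(trait)
    mean_g = sum(allele_counts) / n
    mean_y = sum(trait) / n
    sxx = sum((g - mean_g) ** 2 for g in allele_counts)
    if sxx == 0:
        raise ValueError("genotype is monomorphic; slope is undefined")
    sxy = sum((g - mean_g) * (y - mean_y) for g, y in zip(allele_counts, trait))
    beta = sxy / sxx
    return beta, mean_y - beta * mean_g


# Hypothetical toy data, not taken from the study.
genotypes = [0, 0, 1, 1, 2, 2]
trait_values = [1.0, 1.2, 1.1, 1.3, 1.2, 1.4]
beta, intercept = additive_model_fit(genotypes, trait_values)
```

In a real analysis the covariates would enter as additional regressors, typically through a statistics package rather than hand-written code.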
11.  Atrial fibrillation in myocardial infarction patients: impact on healthcare utilization 
American heart journal  2013;166(4):10.1016/j.ahj.2013.07.013.
Atrial fibrillation (AF) often complicates myocardial infarction (MI). While AF adversely impacts survival in MI patients, the impact of AF on healthcare utilization has not been studied.
The risk of hospitalizations, emergency department (ED) visits, and outpatient visits associated with prior, early-onset (<30 days post-MI) and late-onset (≥30 days post-MI) AF was assessed among incident MI patients from the Olmsted County, MN community.
Of 1502 MI patients, 237 had prior AF, 163 developed early-onset AF, 113 developed late-onset AF, and 989 had no AF. Over a mean follow-up of 3.9 years, 3661 hospitalizations, 5559 ED visits, and 80,240 outpatient visits occurred. After adjustment, compared to patients without AF, those with prior and early-onset AF exhibited a 1.6-fold and 1.4-fold increased risk of hospitalization, respectively. In contrast, late-onset AF carried a 2.2-fold increased risk of hospitalization. The hazard ratios were 1.4, 1.2, and 1.8 for ED visits and 1.5, 1.2, and 1.7 for outpatient visits for prior, early-onset, and late-onset AF. Additional adjustment for time-dependent recurrent MI and heart failure attenuated the results slightly for hospitalizations and ED visits; however, patients with late-onset AF still exhibited more than a 50% increased risk for both utilization measures.
In MI patients, the risk of hospitalizations, ED visits, and outpatient visits differed by the timing of AF onset, with the greatest risk conferred by late-onset AF. AF imparts an adverse prognosis after MI underscoring the importance of its management in MI patients.
PMCID: PMC3811034  PMID: 24093857
12.  Hospitalizations and Emergency Department Use of Mayo Clinic Biobank Participants within the Employee and Community Health Medical Home 
Mayo Clinic proceedings  2013;88(9):963-969.
To evaluate the participants in the Mayo Clinic Biobank for their representativeness to the entire Employee and Community Health (ECH) primary care population with regards to hospital utilization.
Patients and Methods
Participants enrolled in the Mayo Clinic Biobank from April 1, 2009, to December 31, 2010, were linked to ECH panels. Subjects were categorized into risk tiers (0–4) based on the number of health conditions as of December 31, 2010. Outcomes were ascertained through December 31, 2011. Hazard ratios (HR) and 95% confidence intervals (CI) for risk of hospitalization, ER visits, and 30-day re-hospitalization were estimated using Cox regression, accounting for age and sex.
The 8,927 Biobank participants were part of an ECH panel (N=84,872). Compared to all of ECH, the Biobank-ECH participants were more likely to be female (64% vs 55%), older (median age of 58 years vs 47 years), and have a lower percentage in tier 0 (6% vs 24%). There were strong positive associations of tier (4 vs 0/1) with risk of hospitalization (HR=5.8; 95% CI, 4.6–7.5) and ER visits (HR=5.4; 95% CI, 4.2–6.8), among Biobank-ECH participants. Similar associations for risk of hospitalization (HR=8.5; 95% CI, 7.8–9.3) and ER visits (HR=6.9; 95%CI, 6.4–7.5) were observed for all of ECH.
Biobank-ECH participants were older and had more chronic conditions compared to the ECH panel. Nevertheless, the associations of risk tier with utilization outcomes were similar, supporting the use of the Biobank-ECH participants for assessing biomarkers for health care outcomes in the primary care setting.
PMCID: PMC4151531  PMID: 24001488
13.  Genome-wide and gene-centric analyses of circulating myeloperoxidase levels in the CHARGE and CARe consortia 
Human Molecular Genetics  2013;22(16):3381-3393.
Increased systemic levels of myeloperoxidase (MPO) are associated with the risk of coronary artery disease (CAD). To identify the genetic factors that are associated with circulating MPO levels, we carried out a genome-wide association study (GWAS) and a gene-centric analysis in subjects of European ancestry and African Americans (AAs). A locus on chromosome 1q31.1 containing the complement factor H (CFH) gene was strongly associated with serum MPO levels in 9305 subjects of European ancestry (lead SNP rs800292; P = 4.89 × 10^−41) and in 1690 AA subjects (rs505102; P = 1.05 × 10^−8). Gene-centric analyses in 8335 subjects of European ancestry additionally identified two rare MPO coding sequence variants that were associated with serum MPO levels (rs28730837, P = 5.21 × 10^−12; rs35897051, P = 3.32 × 10^−8). A GWAS for plasma MPO levels in 9260 European ancestry subjects identified a chromosome 17q22 region near MPO that was significantly associated (lead SNP rs6503905; P = 2.94 × 10^−12), but the CFH locus did not exhibit evidence of association with plasma MPO levels. Functional analyses revealed that rs800292 was associated with levels of complement proteins in serum. Variants at chromosome 17q22 also had pleiotropic cis effects on gene expression. In a case–control analysis of ∼80 000 subjects from CARDIoGRAM, none of the identified single-nucleotide polymorphisms (SNPs) were associated with CAD. These results suggest that distinct genetic factors regulate serum and plasma MPO levels, which may have relevance for various acute and chronic inflammatory disorders. The clinical implications for CAD and a better understanding of the functional basis for the association of CFH and MPO variants with circulating MPO levels require further study.
PMCID: PMC3723315  PMID: 23620142
14.  Meta-Analysis of Genome-Wide Association Studies in African Americans Provides Insights into the Genetic Architecture of Type 2 Diabetes 
Ng, Maggie C. Y. | Shriner, Daniel | Chen, Brian H. | Li, Jiang | Chen, Wei-Min | Guo, Xiuqing | Liu, Jiankang | Bielinski, Suzette J. | Yanek, Lisa R. | Nalls, Michael A. | Comeau, Mary E. | Rasmussen-Torvik, Laura J. | Jensen, Richard A. | Evans, Daniel S. | Sun, Yan V. | An, Ping | Patel, Sanjay R. | Lu, Yingchang | Long, Jirong | Armstrong, Loren L. | Wagenknecht, Lynne | Yang, Lingyao | Snively, Beverly M. | Palmer, Nicholette D. | Mudgal, Poorva | Langefeld, Carl D. | Keene, Keith L. | Freedman, Barry I. | Mychaleckyj, Josyf C. | Nayak, Uma | Raffel, Leslie J. | Goodarzi, Mark O. | Chen, Y-D Ida | Taylor, Herman A. | Correa, Adolfo | Sims, Mario | Couper, David | Pankow, James S. | Boerwinkle, Eric | Adeyemo, Adebowale | Doumatey, Ayo | Chen, Guanjie | Mathias, Rasika A. | Vaidya, Dhananjay | Singleton, Andrew B. | Zonderman, Alan B. | Igo, Robert P. | Sedor, John R. | Kabagambe, Edmond K. | Siscovick, David S. | McKnight, Barbara | Rice, Kenneth | Liu, Yongmei | Hsueh, Wen-Chi | Zhao, Wei | Bielak, Lawrence F. | Kraja, Aldi | Province, Michael A. | Bottinger, Erwin P. | Gottesman, Omri | Cai, Qiuyin | Zheng, Wei | Blot, William J. | Lowe, William L. | Pacheco, Jennifer A. | Crawford, Dana C. | Grundberg, Elin | Rich, Stephen S. | Hayes, M. Geoffrey | Shu, Xiao-Ou | Loos, Ruth J. F. | Borecki, Ingrid B. | Peyser, Patricia A. | Cummings, Steven R. | Psaty, Bruce M. | Fornage, Myriam | Iyengar, Sudha K. | Evans, Michele K. | Becker, Diane M. | Kao, W. H. Linda | Wilson, James G. | Rotter, Jerome I. | Sale, Michèle M. | Liu, Simin | Rotimi, Charles N. | Bowden, Donald W.
PLoS Genetics  2014;10(8):e1004517.
Type 2 diabetes (T2D) is more prevalent in African Americans than in Europeans. However, little is known about the genetic risk in African Americans despite the recent identification of more than 70 T2D loci primarily by genome-wide association studies (GWAS) in individuals of European ancestry. In order to investigate the genetic architecture of T2D in African Americans, the MEta-analysis of type 2 DIabetes in African Americans (MEDIA) Consortium examined 17 GWAS on T2D comprising 8,284 cases and 15,543 controls in African Americans in stage 1 analysis. Single nucleotide polymorphisms (SNPs) association analysis was conducted in each study under the additive model after adjustment for age, sex, study site, and principal components. Meta-analysis of approximately 2.6 million genotyped and imputed SNPs in all studies was conducted using an inverse variance-weighted fixed effect model. Replications were performed to follow up 21 loci in up to 6,061 cases and 5,483 controls in African Americans, and 8,130 cases and 38,987 controls of European ancestry. We identified three known loci (TCF7L2, HMGA2 and KCNQ1) and two novel loci (HLA-B and INS-IGF2) at genome-wide significance (4.15×10−94
Author Summary
Despite the higher prevalence of type 2 diabetes (T2D) in African Americans than in Europeans, recent genome-wide association studies (GWAS) were examined primarily in individuals of European ancestry. In this study, we performed meta-analysis of 17 GWAS in 8,284 cases and 15,543 controls to explore the genetic architecture of T2D in African Americans. Following replication in additional 6,061 cases and 5,483 controls in African Americans, and 8,130 cases and 38,987 controls of European ancestry, we identified two novel and three previous reported T2D loci reaching genome-wide significance. We also examined 158 loci previously reported to be associated with T2D or regulating glucose homeostasis. While 56% of these loci were shared between African Americans and the other populations, the strongest associations in African Americans are often found in nearby single nucleotide polymorphisms (SNPs) instead of the original SNPs reported in other populations due to differential genetic architecture across populations. Our results highlight the importance of performing genetic studies in non-European populations to fine map the causal genetic variants.
PMCID: PMC4125087  PMID: 25102180
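The MEDIA abstract above mentions an inverse variance-weighted fixed effect model for combining per-study results. A minimal sketch of that general technique follows, assuming each study contributes an effect estimate (beta) and its standard error. The function name and the example numbers are hypothetical and are not taken from the consortium's analysis.

```python
import math


def fixed_effect_meta(betas: list[float], ses: list[float]) -> tuple[float, float, float, float]:
    """Inverse variance-weighted fixed-effect meta-analysis.

    Returns (pooled_beta, pooled_se, z, two_sided_p).
    """
    if len(betas) != len(ses) or not betas:
        raise ValueError("need equal-length, non-empty inputs")
    if any(se <= 0 for se in ses):
        raise ValueError("standard errors must be positive")
    weights = [1.0 / se ** 2 for se in ses]        # w_i = 1 / SE_i^2
    total_weight = sum(weights)
    pooled_beta = sum(w * b for w, b in zip(weights, betas)) / total_weight
    pooled_se = math.sqrt(1.0 / total_weight)
    z = pooled_beta / pooled_se
    # Two-sided p-value under a standard normal reference distribution.
    p = math.erfc(abs(z) / math.sqrt(2.0))
    return pooled_beta, pooled_se, z, p


# Hypothetical per-study estimates for a single SNP.
pooled_beta, pooled_se, z, p = fixed_effect_meta([0.10, 0.20], [0.05, 0.05])
```

Because the weights are the reciprocals of the variances, more precise studies dominate the pooled estimate; with equal standard errors, as in the toy call, the pooled effect is simply the mean of the two estimates.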
Frontiers in Genetics  2014;5:250.
Phenome-wide association studies (PheWAS) have demonstrated utility in validating genetic associations derived from traditional genetic studies as well as identifying novel genetic associations. Here we used an electronic health record (EHR)-based PheWAS to explore pleiotropy of genetic variants in the fat mass and obesity associated gene (FTO), some of which have been previously associated with obesity and type 2 diabetes (T2D). We used a population of 10,487 individuals of European ancestry with genome-wide genotyping from the Electronic Medical Records and Genomics (eMERGE) Network and another population of 13,711 individuals of European ancestry from the BioVU DNA biobank at Vanderbilt genotyped using Illumina HumanExome BeadChip. A meta-analysis of the two study populations replicated the well-described associations between FTO variants and obesity (odds ratio [OR] = 1.25, 95% Confidence Interval = 1.11–1.24, p = 2.10 × 10^−9) and FTO variants and T2D (OR = 1.14, 95% CI = 1.08–1.21, p = 2.34 × 10^−6). The meta-analysis also demonstrated that FTO variant rs8050136 was significantly associated with sleep apnea (OR = 1.14, 95% CI = 1.07–1.22, p = 3.33 × 10^−5); however, the association was attenuated after adjustment for body mass index (BMI). Novel phenotype associations with obesity-associated FTO variants included fibrocystic breast disease (rs9941349, OR = 0.81, 95% CI = 0.74–0.91, p = 5.41 × 10^−5) and trends toward associations with non-alcoholic liver disease and gram-positive bacterial infections. FTO variants not associated with obesity demonstrated other potential disease associations including non-inflammatory disorders of the cervix and chronic periodontitis. These results suggest that genetic variants in FTO may have pleiotropic associations, some of which are not mediated by obesity.
PMCID: PMC4134007  PMID: 25177340
PheWAS; genetic association; pleiotropy; Exome chip; FTO; BMI
Genetic studies require precise phenotype definitions, but electronic medical record (EMR) phenotype data are recorded inconsistently and in a variety of formats.
To present lessons learned about validation of EMR-based phenotypes from the Electronic Medical Records and Genomics (eMERGE) studies.
Materials and methods
The eMERGE network created and validated 13 EMR-derived phenotype algorithms. Network sites are Group Health, Marshfield Clinic, Mayo Clinic, Northwestern University, and Vanderbilt University.
By validating EMR-derived phenotypes we learned that: (1) multisite validation improves phenotype algorithm accuracy; (2) targets for validation should be carefully considered and defined; (3) specifying time frames for review of variables eases validation time and improves accuracy; (4) using repeated measures requires defining the relevant time period and specifying the most meaningful value to be studied; (5) patient movement in and out of the health plan (transience) can result in incomplete or fragmented data; (6) the review scope should be defined carefully; (7) particular care is required in combining EMR and research data; (8) medication data can be assessed using claims, medications dispensed, or medications prescribed; (9) algorithm development and validation work best as an iterative process; and (10) validation by content experts or structured chart review can provide accurate results.
Despite the diverse structure of the five EMRs of the eMERGE sites, we developed, validated, and successfully deployed 13 electronic phenotype algorithms. Validation is a worthwhile process that not only measures phenotype performance but also strengthens phenotype algorithm definitions and enhances their inter-institutional sharing.
PMCID: PMC3715338  PMID: 23531748
electronic medical record; electronic health record; genomics; phenotype; validation studies
Nature biotechnology  2013;31(12):1102-1110.
Candidate gene and genome-wide association studies (GWAS) have identified genetic variants that modulate risk for human disease; many of these associations require further study to replicate the results. Here we report the first large-scale application of the phenome-wide association study (PheWAS) paradigm within electronic medical records (EMRs), an unbiased approach to replication and discovery that interrogates relationships between targeted genotypes and multiple phenotypes. We scanned for associations between 3,144 single-nucleotide polymorphisms (previously implicated by GWAS as mediators of human traits) and 1,358 EMR-derived phenotypes in 13,835 individuals of European ancestry. This PheWAS replicated 66% (51/77) of sufficiently powered prior GWAS associations and revealed 63 potentially pleiotropic associations with P < 4.6 × 10^−6 (false discovery rate < 0.1); the strongest of these novel associations were replicated in an independent cohort (n = 7,406). These findings validate PheWAS as a tool to allow unbiased interrogation across multiple phenotypes in EMR-based cohorts and to enhance analysis of the genomic basis of human disease.
PMCID: PMC3969265  PMID: 24270849
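The PheWAS abstract above reports associations at a false discovery rate below 0.1 but does not say which procedure was used. As an illustration only, the sketch below applies the Benjamini-Hochberg step-up procedure, one common way to control the FDR. The function name and the example p-values are hypothetical and are not taken from the study.

```python
def benjamini_hochberg(p_values, fdr=0.1):
    """Return the sorted indices of hypotheses rejected at the given FDR level.

    Illustrative Benjamini-Hochberg step-up procedure; the abstract does not
    state that this is the method the authors used.
    """
    m = len(p_values)
    # Sort the p-values while remembering their original positions.
    order = sorted(range(m), key=lambda i: p_values[i])
    # Find the largest 1-based rank k with p_(k) <= (k / m) * fdr.
    cutoff_rank = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank / m * fdr:
            cutoff_rank = rank
    # Reject every hypothesis ranked at or below that cutoff.
    return sorted(order[:cutoff_rank])


# Hypothetical p-values, for demonstration only.
example_p = [0.001, 0.008, 0.039, 0.041, 0.042, 0.06, 0.074, 0.205]
rejected = benjamini_hochberg(example_p, fdr=0.1)
print(rejected)
```

Note that the third p-value (0.039) misses its own rank threshold but is still rejected, because a larger p-value further down the list passes; this is the step-up behavior of the procedure.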
Clinical chemistry  2011;58(2):10.1373/clinchem.2011.168559.
Soluble intercellular adhesion molecule-1 (sICAM-1) is associated with endothelial dysfunction and clinical cardiovascular disease. We investigated the relationship of subclinical atherosclerosis with sICAM-1 concentration.
sICAM-1 concentration was assayed at year 15 of the Coronary Artery Risk Development in Young Adults (CARDIA) Study (black and white men and women, average age 40 years). We assessed progression of coronary artery calcification through year 20 (CAC, n=2378), and both carotid artery stenosis (n=2432) and intima media thickness at year 20 (IMT, n = 2240).
Median sICAM-1 was 145.9 ng/ml. Among a subgroup with advanced atherosclerotic plaque (either CAC or stenosis), IMT was 0.010 (95% confidence interval (CI) 0.003–0.017 mm) higher per standard deviation of sICAM-1 (44 ng/ml) in a model adjusted for age, race, sex, clinic, smoking, exercise, body size, education, blood pressure, antihypertensive medication, plasma lipids, and cholesterol lowering medication. With the same adjustment, the odds ratio (OR) for the presence of year 20 carotid artery stenosis per SD of sICAM-1 was 1.12 (CI 1.01–1.25, p<0.04), while for occurrence of CAC progression the OR was 1.16 (CI 1.04–1.31, p<0.01). The associations with CAC and carotid stenosis were strongest in the top 20th of the sICAM-1 distribution.
sICAM-1 concentration may be an early biomarker that indicates changes in the artery wall that accompany atherosclerosis, as well as the presence of advanced plaque in the coronary and carotid arteries. This finding holds in people with low total burden of atherosclerosis, decades prior to the development of clinical CVD.
PMCID: PMC3867124  PMID: 22179741
Only one LDL-C GWAS has been reported in African Americans. We performed a GWAS of LDL-C in African Americans using data extracted from electronic medical records (EMR) in the eMERGE network. African Americans were genotyped on the Illumina 1M chip. All LDL-C measurements, prescriptions, and diagnoses of concomitant disease were extracted from EMR. We created two analytic datasets: one with median LDL-C calculated after the exclusion of some lab values based on co-morbidities and medication (n = 618) and another with median LDL-C calculated without any exclusions (n = 1249). Rs7412 in APOE was strongly associated with LDL-C at levels of GWAS significance in both datasets (p < 5 × 10^−8). In the dataset with exclusions, a decrease of 20.0 mg/dl per minor allele was observed. The effect size was attenuated (12.3 mg/dl) in the dataset without any lab values excluded. Although other signals in APOE have been detected in previous GWAS, this large and important SNP association has not been well detected in large GWAS because rs7412 was not included on many genotyping arrays. Use of median LDL-C extracted from EMR after exclusions for medications and co-morbidities increased the percentage of trait variance explained by genetic variation.
PMCID: PMC3521536  PMID: 23067351
GWAS; LDL; electronic medical records
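The LDL-C abstract above describes building two datasets from the same EMR lab values, one with per-subject medians computed after exclusions and one without. A minimal sketch of that idea follows. The record layout, the field names (`subject`, `ldl`, `excluded`), and the values are all hypothetical; the actual exclusion rules are not given in the abstract.

```python
from collections import defaultdict
from statistics import median


def median_ldl_by_subject(measurements, apply_exclusions):
    """Return {subject: median LDL-C}, optionally skipping flagged lab values."""
    values = defaultdict(list)
    for m in measurements:
        # Skip values flagged for exclusion (for example, drawn while on a
        # lipid-lowering medication) when exclusions are requested.
        if apply_exclusions and m["excluded"]:
            continue
        values[m["subject"]].append(m["ldl"])
    return {subject: median(v) for subject, v in values.items()}


# Hypothetical lab values in mg/dl, for demonstration only.
labs = [
    {"subject": "A", "ldl": 130, "excluded": False},
    {"subject": "A", "ldl": 90, "excluded": True},
    {"subject": "A", "ldl": 140, "excluded": False},
    {"subject": "B", "ldl": 100, "excluded": True},
]

with_exclusions = median_ldl_by_subject(labs, apply_exclusions=True)
without_exclusions = median_ldl_by_subject(labs, apply_exclusions=False)
print(with_exclusions, without_exclusions)
```

In this toy data, subject B has no values left after exclusions and so drops out of the first dataset, while subject A's median shifts once the flagged value is removed.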
Science translational medicine  2011;3(79):79re1.
Clinical data in Electronic Medical Records (EMRs) is a potential source of longitudinal clinical data for research. The Electronic Medical Records and Genomics Network or eMERGE investigates whether data captured through routine clinical care using EMRs can identify disease phenotypes with sufficient positive and negative predictive values for use in genome wide association studies (GWAS). Using data from five different sets of EMRs, we have identified five disease phenotypes with positive predictive values of 73–98% and negative predictive values of 98–100%. A majority of EMRs captured key information (diagnoses, medications, laboratory tests) used to define phenotypes in a structured format. We identified natural language processing as an important tool to improve case identification rates. Efforts and incentives to increase the implementation of interoperable EMRs will markedly improve the availability of clinical data for genomics research.
PMCID: PMC3690272  PMID: 21508311
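The eMERGE abstract above summarizes phenotype quality as positive and negative predictive values. As a reminder of how those figures are derived from a chart-review validation, here is a small sketch using the standard definitions. The function name and the counts are hypothetical and are not the study's data.

```python
def predictive_values(tp, fp, tn, fn):
    """Return (PPV, NPV) from validation counts.

    PPV = TP / (TP + FP): share of algorithm-flagged cases confirmed on review.
    NPV = TN / (TN + FN): share of algorithm-flagged controls confirmed on review.
    """
    if tp + fp == 0 or tn + fn == 0:
        raise ValueError("need at least one predicted case and one predicted control")
    ppv = tp / (tp + fp)
    npv = tn / (tn + fn)
    return ppv, npv


# Hypothetical review of 50 algorithm-defined cases and 50 controls.
ppv, npv = predictive_values(tp=49, fp=1, tn=50, fn=0)
print(f"PPV = {ppv:.0%}, NPV = {npv:.0%}")
```

Both values depend on how common the phenotype is in the reviewed sample, so figures from one site's validation set need not carry over unchanged to another site.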
Information extraction (IE), a natural language processing (NLP) task that automatically extracts structured or semi-structured information from free text, has become popular in the clinical domain for supporting automated systems at point-of-care and enabling secondary use of electronic health records (EHRs) for clinical and translational research. However, a high performance IE system can be very challenging to construct due to the complexity and dynamic nature of human language. In this paper, we report an IE framework for cohort identification using EHRs that is a knowledge-driven framework developed under the Unstructured Information Management Architecture (UIMA). A system to extract specific information can be developed by subject matter experts through expert knowledge engineering of the externalized knowledge resources used in the framework.
PMCID: PMC3845757  PMID: 24303255
Genetic epidemiology  2011;35(8):887-898.
Genome-wide association studies (GWAS) are a useful approach in the study of the genetic components of complex phenotypes. Aside from large cohorts, GWAS have generally been limited to the study of one or a few diseases or traits. The emergence of biobanks linked to electronic medical records (EMRs) allows the efficient re-use of genetic data to yield meaningful genotype-phenotype associations for multiple phenotypes or traits. Phase I of the electronic MEdical Records and GEnomics (eMERGE-I) Network is a National Human Genome Research Institute (NHGRI)-supported consortium composed of five sites to perform various genetic association studies using DNA repositories and EMR systems. Each eMERGE site has developed EMR-based algorithms to comprise a core set of fourteen phenotypes for extraction of study samples from each site’s DNA repository. Each eMERGE site selected samples for a specific phenotype, and these samples were genotyped at either the Broad Institute or at the Center for Inherited Disease Research (CIDR) using the Illumina Infinium BeadChip technology. In all, approximately 17,000 samples from across the five sites were genotyped. A unified quality control (QC) pipeline was developed by the eMERGE Genomics Working Group and used to ensure thorough cleaning of the data. This process includes examination of sample quality, marker quality, and various batch effects. Upon completion of the genotyping and QC analyses for each site’s primary study, the eMERGE Coordinating Center merged the datasets from all five sites. This larger merged dataset re-entered the established eMERGE QC pipeline. Based on lessons learned during the process, additional analyses and QC checkpoints were added to the pipeline to ensure proper merging. Here we explore the challenges associated with combining datasets from different genotyping centers and describe the expansion to the eMERGE QC pipeline for merged datasets. 
These additional steps will be useful as the eMERGE project expands to include additional sites in eMERGE-II and also serve as a starting point for investigators merging multiple genotype data sets accessible through the National Center for Biotechnology Information (NCBI) in the database of Genotypes and Phenotypes (dbGaP). Our experience demonstrates that merging multiple datasets after additional QC can be an efficient use of genotype data despite new challenges that appear in the process.
PMCID: PMC3592376  PMID: 22125226
quality control; genome-wide association (GWAS); eMERGE; dbGaP; merging datasets
Genome-wide association studies (GWAS) require high specificity and large numbers of subjects to identify genotype–phenotype correlations accurately. The aim of this study was to identify type 2 diabetes (T2D) cases and controls for a GWAS, using data captured through routine clinical care across five institutions using different electronic medical record (EMR) systems.
Materials and Methods
An algorithm was developed to identify T2D cases and controls based on a combination of diagnoses, medications, and laboratory results. The performance of the algorithm was validated at three of the five participating institutions compared against clinician review. A GWAS was subsequently performed using cases and controls identified by the algorithm, with samples pooled across all five institutions.
The algorithm achieved 98% and 100% positive predictive values for the identification of diabetic cases and controls, respectively, as compared against clinician review. By standardizing and applying the algorithm across institutions, 3353 cases and 3352 controls were identified. Subsequent GWAS using data from five institutions replicated the TCF7L2 gene variant (rs7903146) previously associated with T2D.
By applying stringent criteria to EMR data collected through routine clinical care, cases and controls for a GWAS were identified that subsequently replicated a known genetic variant. The use of standard terminologies to define data elements enabled pooling of subjects and data across five different institutions to achieve the robust numbers required for GWAS.
An algorithm using commonly available data from five different EMRs can accurately identify T2D cases and controls for genetic study across multiple institutions.
PMCID: PMC3277617  PMID: 22101970
Analytics; application of biological knowledge to clinical care; bioinformatics; biomedical informatics; clinical phenotyping; controlled terminologies and vocabularies; data mining; EHR; EMR secondary and meaningful use; genetic epidemiology; genetics; genome-wide association studies; genomics; HIT data standards; improving the education and skills training of health professionals; infection control; information retrieval; knowledge representations; linking the genotype and phenotype; medical informatics; modeling; natural-language processing; ontologies; pharmacogenomics; phenotyping; reuseability; translational research
The ability to conduct genome-wide association studies (GWAS) has enabled new exploration of how genetic variations contribute to health and disease etiology. However, historically GWAS have been limited by inadequate sample size due to associated costs for genotyping and phenotyping of study subjects. This has prompted several academic medical centers to form “biobanks” where biospecimens linked to personal health information, typically in electronic health records (EHRs), are collected and stored on a large number of subjects. This provides tremendous opportunities to discover novel genotype-phenotype associations and foster hypotheses generation.
In this work, we study how emerging Semantic Web technologies can be applied in conjunction with clinical and genotype data stored at the Mayo Clinic Biobank to mine the phenotype data for genetic associations. In particular, we demonstrate the role of using Resource Description Framework (RDF) for representing EHR diagnoses and procedure data, and enable federated querying via standardized Web protocols to identify subjects genotyped for Type 2 Diabetes and Hypothyroidism to discover gene-disease associations. Our study highlights the potential of Web-scale data federation techniques to execute complex queries.
This study demonstrates how Semantic Web technologies can be applied in conjunction with clinical data stored in EHRs to accurately identify subjects with specific diseases and phenotypes, and identify genotype-phenotype associations.
PMCID: PMC3554594  PMID: 23244446
The ability to conduct genome-wide association studies (GWAS) has enabled new exploration of how genetic variations contribute to health and disease etiology. However, historically GWAS have been limited by inadequate sample size due to associated costs for genotyping and phenotyping of study subjects. This has prompted several academic medical centers to form “biobanks” where biospecimens linked to personal health information, typically in electronic health records (EHRs), are collected and stored on a large number of subjects. This provides tremendous opportunities to discover novel genotype-phenotype associations and foster hypothesis generation. In this work, we study how emerging Semantic Web technologies can be applied in conjunction with clinical and genotype data stored at the Mayo Clinic Biobank to mine the phenotype data for genetic associations. In particular, we demonstrate the role of using Resource Description Framework (RDF) for representing EHR diagnoses and procedure data, and enable federated querying via standardized Web protocols to identify subjects genotyped with Type 2 Diabetes for discovering gene-disease associations. Our study highlights the potential of Web-scale data federation techniques to execute complex queries.
PMCID: PMC3540447  PMID: 23304343
