Prospective epidemiologic studies have characterized major risk factors for incident diabetes by a variety of diabetes case definitions. Whether different definitions alter the association of diabetes with risk factors is largely unknown. Using 1987–1998 data from the ongoing Atherosclerosis Risk in Communities (ARIC) Study, the authors assessed the relation of traditional risk factors with 3 different diabetes case definitions and 4 fasting glucose categories. They compared the study protocol case definition with 2 nested case definitions, self-reported diabetes and a multiple-evidence definition. Significant differences in risk factor associations by case definition and by screening cutpoints were observed. Specifically, the magnitude of the association between the risk factors (baseline metabolic syndrome, fasting glucose, blood pressure, body mass index, and serum insulin) and incident diabetes differed by case definition. Associations with these risk factors were weaker with a case definition based on self-report compared with other definitions. These results illustrate the potential limitations of case definitions that rely solely on self-report or those that incorporate measured glucose values to ascertain undiagnosed cases. Although the ability to identify risk factors for diabetes was consistent for the case definitions studied, tests of novel risk factors may result in different estimates of effect sizes depending on the definition used.
diabetes mellitus, type 2; epidemiologic methods
Soluble intercellular adhesion molecule-1 (sICAM-1) is associated with endothelial dysfunction and clinical cardiovascular disease. We investigated the relationship of subclinical atherosclerosis with sICAM-1 concentration.
sICAM-1 concentration was assayed at year 15 of the Coronary Artery Risk Development in Young Adults (CARDIA) Study (black and white men and women, average age 40 years). We assessed progression of coronary artery calcification through year 20 (CAC, n=2378), and both carotid artery stenosis (n=2432) and intima media thickness at year 20 (IMT, n = 2240).
Median sICAM-1 was 145.9 ng/ml. Among a subgroup with advanced atherosclerotic plaque (either CAC or stenosis), IMT was 0.010 mm (95% confidence interval (CI) 0.003–0.017) higher per standard deviation (SD) of sICAM-1 (44 ng/ml) in a model adjusted for age, race, sex, clinic, smoking, exercise, body size, education, blood pressure, antihypertensive medication, plasma lipids, and cholesterol lowering medication. With the same adjustment, the odds ratio (OR) for the presence of year 20 carotid artery stenosis per SD of sICAM-1 was 1.12 (CI 1.01–1.25, p<0.04), while for occurrence of CAC progression the OR was 1.16 (CI 1.04–1.31, p<0.01). The associations with CAC and carotid stenosis were strongest in the top 20th of the sICAM-1 distribution.
sICAM-1 concentration may be an early biomarker that indicates changes in the artery wall that accompany atherosclerosis, as well as the presence of advanced plaque in the coronary and carotid arteries. This finding holds in people with low total burden of atherosclerosis, decades prior to the development of clinical CVD.
Aims and Hypothesis
We hypothesize that transcription factor 7-like 2 (TCF7L2) single nucleotide polymorphisms (SNPs) are associated with cardiovascular disease (CVD) and that the associations differ in diabetic and non-diabetic participants.
Black and white subjects from the Atherosclerosis Risk in Communities (ARIC) study who were free of prevalent CVD at baseline and genotyped for rs7903146, rs12255372, rs7901695, rs11196205, and rs7895340 were included in this analysis (n = 13,369). Cox proportional hazard regression was used to estimate the associations of polymorphisms and incident events and logistic and linear regression were used for associations with baseline risk factor levels.
TCF7L2 SNPs were not significantly associated with incident coronary heart disease, ischemic stroke, CVD, prevalent peripheral artery disease (PAD), or with all-cause mortality in the full cohort or stratified by race.
In the whole cohort, TCF7L2 SNPs were not associated with incident CVD, all-cause mortality, or prevalent PAD. This result suggests that the increased health risk associated with rs7903146 genotype is specific to diabetes.
All-cause mortality; Cardiovascular disease; Coronary heart disease; Diabetes; Peripheral artery disease; Stroke; Transcription factor 7-like 2 (TCF7L2)
Clinical data in Electronic Medical Records (EMRs) is a potential source of longitudinal clinical data for research. The Electronic Medical Records and Genomics Network or eMERGE investigates whether data captured through routine clinical care using EMRs can identify disease phenotypes with sufficient positive and negative predictive values for use in genome wide association studies (GWAS). Using data from five different sets of EMRs, we have identified five disease phenotypes with positive predictive values of 73–98% and negative predictive values of 98–100%. A majority of EMRs captured key information (diagnoses, medications, laboratory tests) used to define phenotypes in a structured format. We identified natural language processing as an important tool to improve case identification rates. Efforts and incentives to increase the implementation of interoperable EMRs will markedly improve the availability of clinical data for genomics research.
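As a minimal illustration of how positive and negative predictive values such as those reported above are computed, the sketch below derives both from validation counts. The function name and the counts are hypothetical and are not taken from the eMERGE data; it assumes that an EMR-based algorithm's calls have been compared against a reference standard such as manual chart review.

```python
def predictive_values(tp: int, fp: int, tn: int, fn: int) -> tuple[float, float]:
    """Return (PPV, NPV) for a phenotype algorithm checked against a reference standard.

    tp/fp: algorithm-positive records confirmed / not confirmed by review
    tn/fn: algorithm-negative records confirmed / not confirmed by review
    """
    if tp + fp == 0 or tn + fn == 0:
        raise ValueError("need at least one algorithm-positive and one algorithm-negative record")
    ppv = tp / (tp + fp)  # share of algorithm-positive records that are true cases
    npv = tn / (tn + fn)  # share of algorithm-negative records that are true non-cases
    return ppv, npv


# Hypothetical validation counts, for illustration only.
ppv, npv = predictive_values(tp=90, fp=10, tn=98, fn=2)
print(f"PPV = {ppv:.2f}, NPV = {npv:.2f}")
```

Both values depend on how common the phenotype is in the reviewed sample, so they are not directly comparable across sites with different case mixes.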
Information extraction (IE), a natural language processing (NLP) task that automatically extracts structured or semi-structured information from free text, has become popular in the clinical domain for supporting automated systems at point-of-care and enabling secondary use of electronic health records (EHRs) for clinical and translational research. However, a high performance IE system can be very challenging to construct due to the complexity and dynamic nature of human language. In this paper, we report an IE framework for cohort identification using EHRs that is a knowledge-driven framework developed under the Unstructured Information Management Architecture (UIMA). A system to extract specific information can be developed by subject matter experts through expert knowledge engineering of the externalized knowledge resources used in the framework.
Genome-wide association studies (GWAS) are a useful approach in the study of the genetic components of complex phenotypes. Aside from large cohorts, GWAS have generally been limited to the study of one or a few diseases or traits. The emergence of biobanks linked to electronic medical records (EMRs) allows the efficient re-use of genetic data to yield meaningful genotype-phenotype associations for multiple phenotypes or traits. Phase I of the electronic MEdical Records and GEnomics (eMERGE-I) Network is a National Human Genome Research Institute (NHGRI)-supported consortium composed of five sites to perform various genetic association studies using DNA repositories and EMR systems. Each eMERGE site has developed EMR-based algorithms to comprise a core set of fourteen phenotypes for extraction of study samples from each site’s DNA repository. Each eMERGE site selected samples for a specific phenotype, and these samples were genotyped at either the Broad Institute or at the Center for Inherited Disease Research (CIDR) using the Illumina Infinium BeadChip technology. In all, approximately 17,000 samples from across the five sites were genotyped. A unified quality control (QC) pipeline was developed by the eMERGE Genomics Working Group and used to ensure thorough cleaning of the data. This process includes examination of sample quality, marker quality, and various batch effects. Upon completion of the genotyping and QC analyses for each site’s primary study, the eMERGE Coordinating Center merged the datasets from all five sites. This larger merged dataset re-entered the established eMERGE QC pipeline. Based on lessons learned during the process, additional analyses and QC checkpoints were added to the pipeline to ensure proper merging. Here we explore the challenges associated with combining datasets from different genotyping centers and describe the expansion to the eMERGE QC pipeline for merged datasets. 
These additional steps will be useful as the eMERGE project expands to include additional sites in eMERGE-II and also serve as a starting point for investigators merging multiple genotype data sets accessible through the National Center for Biotechnology Information (NCBI) in the database of Genotypes and Phenotypes (dbGaP). Our experience demonstrates that merging multiple datasets after additional QC can be an efficient use of genotype data despite new challenges that appear in the process.
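The sample-quality and marker-quality checks mentioned above can be sketched as call-rate filters. This is a simplified illustration under stated assumptions, not the eMERGE pipeline itself: the function names, the in-memory data layout, and the thresholds are all hypothetical, and batch-effect checks are omitted.

```python
from typing import Optional

Call = Optional[int]  # 0, 1, or 2 copies of an allele; None marks a missing call


def call_rate(calls: list[Call]) -> float:
    """Fraction of non-missing genotype calls; 0.0 for an empty list."""
    if not calls:
        return 0.0
    return sum(c is not None for c in calls) / len(calls)


def filter_by_call_rate(
    genotypes: dict[str, list[Call]],
    min_sample_rate: float,
    min_marker_rate: float,
) -> tuple[dict[str, list[Call]], list[int]]:
    """Drop low-call-rate samples first, then low-call-rate markers.

    genotypes maps sample ID -> calls, with markers in the same order for every sample.
    Returns the cleaned genotypes and the indices of the markers that were kept.
    """
    kept = {s: c for s, c in genotypes.items() if call_rate(c) >= min_sample_rate}
    if not kept:
        return {}, []
    n_markers = len(next(iter(kept.values())))
    kept_markers = [
        j for j in range(n_markers)
        if call_rate([calls[j] for calls in kept.values()]) >= min_marker_rate
    ]
    cleaned = {s: [calls[j] for j in kept_markers] for s, calls in kept.items()}
    return cleaned, kept_markers


# Toy data with deliberately loose, hypothetical thresholds.
toy = {
    "s1": [0, 1, 2, None],
    "s2": [1, 1, 0, None],
    "s3": [None, None, None, 2],
}
cleaned, kept_markers = filter_by_call_rate(toy, min_sample_rate=0.7, min_marker_rate=0.7)
print(cleaned, kept_markers)
```

When datasets from different genotyping centers are merged, the same filters would be re-applied to the combined data, since a marker that passes within each site can still fail across the merged set.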
quality control; genome-wide association (GWAS); eMERGE; dbGaP; merging datasets
The ability to conduct genome-wide association studies (GWAS) has enabled new exploration of how genetic variations contribute to health and disease etiology. However, historically GWAS have been limited by inadequate sample size due to associated costs for genotyping and phenotyping of study subjects. This has prompted several academic medical centers to form “biobanks” where biospecimens linked to personal health information, typically in electronic health records (EHRs), are collected and stored on a large number of subjects. This provides tremendous opportunities to discover novel genotype-phenotype associations and foster hypothesis generation.
In this work, we study how emerging Semantic Web technologies can be applied in conjunction with clinical and genotype data stored at the Mayo Clinic Biobank to mine the phenotype data for genetic associations. In particular, we demonstrate the role of using Resource Description Framework (RDF) for representing EHR diagnoses and procedure data, and enable federated querying via standardized Web protocols to identify subjects genotyped for Type 2 Diabetes and Hypothyroidism to discover gene-disease associations. Our study highlights the potential of Web-scale data federation techniques to execute complex queries.
This study demonstrates how Semantic Web technologies can be applied in conjunction with clinical data stored in EHRs to accurately identify subjects with specific diseases and phenotypes, and identify genotype-phenotype associations.
The ability to conduct genome-wide association studies (GWAS) has enabled new exploration of how genetic variations contribute to health and disease etiology. However, historically GWAS have been limited by inadequate sample size due to associated costs for genotyping and phenotyping of study subjects. This has prompted several academic medical centers to form “biobanks” where biospecimens linked to personal health information, typically in electronic health records (EHRs), are collected and stored on a large number of subjects. This provides tremendous opportunities to discover novel genotype-phenotype associations and foster hypothesis generation. In this work, we study how emerging Semantic Web technologies can be applied in conjunction with clinical and genotype data stored at the Mayo Clinic Biobank to mine the phenotype data for genetic associations. In particular, we demonstrate the role of using Resource Description Framework (RDF) for representing EHR diagnoses and procedure data, and enable federated querying via standardized Web protocols to identify subjects genotyped with Type 2 Diabetes for discovering gene-disease associations. Our study highlights the potential of Web-scale data federation techniques to execute complex queries.
Markers of monocyte activation play a critical role in atherosclerosis, but little is known about the genetic influences on cellular levels. Therefore, we investigated the influence of genetic variants in monocyte differentiation antigen (CD14), toll-like receptor-4 (TLR4), toll-like receptor-2 (TLR2), and myeloperoxidase (MPO) on monocyte surface receptor levels. The study sample consisted of 1,817 members of a biracial cohort of adults from the Atherosclerosis Risk in Communities Carotid MRI Study. Monocyte receptors were measured using flow cytometry on fasting whole blood samples. TLR2 rs1816702 genotype was significantly associated with CD14+/TLR2+ percent of positive cells (%) and median fluorescence intensity (MFI) in whites but not in blacks (p < 0.001). Specifically, the presence of the minor T-allele was associated with increased receptor levels. In blacks, TLR4 rs5030719 was significantly associated with CD14+/TLR4+ monocytes (MFI) with mean ± SE intensities of 16.7 ± 0.05 and 16.0 ± 0.14 for GG and GT/TT genotypes, respectively (p < 0.001). Variants in TLR2 and TLR4 were associated with monocyte receptor levels of TLR2 and TLR4, respectively, in a biracial cohort of adults. To our knowledge, this is the first study to look at associations between variants in the toll-like receptor family and toll-like receptor levels on monocytes.
Polymorphisms within the ICAM1 structural gene have been shown to influence circulating levels of soluble intercellular adhesion molecule-1 (sICAM-1), but their relation to atherosclerosis has not been clearly established. We sought to determine whether ICAM1 SNPs are associated with circulating sICAM-1 concentration, coronary artery calcium (CAC), and common and internal carotid intima medial thickness (IMT).
Methods and Results
3,550 black and white Coronary Artery Risk Development in Young Adults (CARDIA) Study subjects who participated in the year 15 and/or 20 examinations and were part of the Young Adult Longitudinal Study of Antioxidants (YALTA) ancillary study were included in this analysis. In whites, rs5498 was significantly associated with sICAM-1 (p < 0.001) and each G-allele of rs5498 was associated with 5% higher sICAM-1 concentration. In blacks, each C-allele of rs5490 was associated with 6% higher sICAM-1 level; this SNP was in strong linkage disequilibrium with rs5491, a functional variant. Subclinical measurements of atherosclerosis in either year 15 or year 20 were not significantly related to ICAM1 SNPs.
In CARDIA, ICAM1 DNA segment variants were associated with sICAM-1 protein level including the novel finding that levels differ by the functional variant rs5491. However, ICAM1 SNPs were not strongly related to either IMT or CAC. Our findings in CARDIA suggest that ICAM1 variants are not major early contributors to subclinical atherosclerosis.
cell adhesion molecules; atherosclerosis; coronary calcium; genetics; inflammation
Atrial fibrillation (AF) often coexists with myocardial infarction (MI), yet its prognostic influence is controversial. Prior reports studied the role of AF during the early hospitalization for acute MI on the risk of death and could not address the timing of AF in relation to the MI (i.e. prior, during, post). Further, as data come mostly from clinical trials, their applicability to the community is uncertain. The aims of our study were to assess the occurrence of AF among MI patients, determine whether it has changed over time, and quantify its impact and the impact of its timing on mortality after MI.
Methods and Results
This was a community-based cohort of 3220 patients hospitalized with incident (first-ever) MI from 1983 to 2007 in Olmsted County, Minnesota. AF was identified by diagnostic codes and ECG. Outcomes were all-cause and cardiovascular death. AF prior to MI was identified in 304 patients and 729 developed AF after MI (218 (30%) within 2 days, 119 (16%) between 3 and 30 days, and 392 (54%) >30 days post-MI). The cumulative incidence of AF after MI at 5 years was 19% and did not change over calendar year of MI. During a mean follow-up of 6.6 years, 1638 deaths occurred. AF was associated with an increased risk of death (HR (95% CI) 3.77 (3.37–4.21)), independently of clinical characteristics at the time of MI and heart failure. This risk differed markedly according to the timing of AF and was the greatest for AF occurring >30 days post-MI (HR (95% CI) 1.63 (1.37–1.93) for AF within 2 days, 1.81 (1.45–2.27) for AF between 3 and 30 days, and 2.58 (2.21–3.00) for AF >30 days post-MI).
In the community, AF is frequent in the setting of MI. AF carries an excess risk of death, which is the highest for AF developing more than 30 days post-MI.
atrial fibrillation; myocardial infarction; mortality
OBJECTIVE: To investigate the association between 347 single-nucleotide polymorphisms within candidate genes of the tumor necrosis factor, interleukin 1 and interleukin 6 families with neutrophil count.
PATIENTS AND METHODS: Four hundred cases with heart failure after myocardial infarction (MI) were matched by age, sex, and date of incident MI to 694 controls (MI without post-MI heart failure). Both genotypes and neutrophil count at admission for incident MI were available in 314 cases and 515 controls.
RESULTS: We found significant associations between the TNFSF8 polymorphisms rs927374 (P=5.1 × 10⁻⁵) and rs2295800 (P=1.3 × 10⁻⁴) and neutrophil count; these single-nucleotide polymorphisms are in high linkage disequilibrium (r²=0.97). Associations persisted after controlling for clinical characteristics and were unchanged after adjusting for case-control status. For rs927374, the neutrophil count of GG homozygotes (7.6±5.1) was 16% lower than that of CC homozygotes (9.0±5.2).
CONCLUSION: The TNFSF8 polymorphisms rs927374 and rs2295800 were associated with neutrophil count. This finding suggests that post-MI inflammatory response is genetically modulated.
Elevated fasting glucose level is associated with increased carotid intima-media thickness (IMT), a measure of subclinical atherosclerosis. It is unclear if this association is causal. Using the principle of Mendelian randomization, we sought to explore the causal association between circulating glucose and IMT by examining the association of a genetic risk score with IMT.
RESEARCH DESIGN AND METHODS
The sample was drawn from the Atherosclerosis Risk in Communities (ARIC) study and included 7,260 nondiabetic Caucasian individuals with IMT measurements and relevant genotyping. Components of the fasting glucose genetic risk score (FGGRS) were selected from a fasting glucose genome-wide association study in ARIC. The score was created by combining five single nucleotide polymorphisms (SNPs) (rs780094 [GCKR], rs560887 [G6PC2], rs4607517 [GCK], rs13266634 [SLC30A8], and rs10830963 [MTNR1B]) and weighting each SNP by its strength of association with fasting glucose. IMT was measured through bilateral carotid ultrasound. Mean IMT was regressed on the FGGRS and on the component SNPs, individually.
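A weighted genetic risk score of the kind described above can be sketched as a weighted sum of risk-allele counts. The SNP identifiers below come from the text; the per-allele weights, the function name, and the example genotype are hypothetical placeholders, not the ARIC estimates, and the sketch assumes each count is the number of glucose-raising alleles (0, 1, or 2).

```python
# Hypothetical per-allele weights; the study weighted each SNP by its
# association with fasting glucose, but those values are not given here.
WEIGHTS = {
    "rs780094": 0.03,    # GCKR
    "rs560887": 0.07,    # G6PC2
    "rs4607517": 0.06,   # GCK
    "rs13266634": 0.03,  # SLC30A8
    "rs10830963": 0.07,  # MTNR1B
}


def weighted_risk_score(allele_counts: dict[str, int],
                        weights: dict[str, float] = WEIGHTS) -> float:
    """Sum of (weight x risk-allele count) over the SNPs in the score."""
    score = 0.0
    for snp, weight in weights.items():
        count = allele_counts[snp]
        if count not in (0, 1, 2):
            raise ValueError(f"{snp}: allele count must be 0, 1, or 2, got {count}")
        score += weight * count
    return score


# Hypothetical individual, for illustration only.
person = {"rs780094": 2, "rs560887": 1, "rs4607517": 0, "rs13266634": 2, "rs10830963": 1}
score = weighted_risk_score(person)
print(f"weighted score = {score:.2f}")
```

In an analysis like the one described, the outcome would then be regressed on this score, typically after standardizing it so that effects can be expressed per 1 SD increment.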
The FGGRS was significantly associated (P = 0.009) with mean IMT. The difference in IMT predicted by a 1 SD increment in the FGGRS (0.0048 mm) was not clinically relevant but was larger than would have been predicted based on observed associations between the FGGRS, fasting glucose, and IMT. Additional adjustment for baseline measured glucose in regression models attenuated the association by about one third.
The significant association of the FGGRS with IMT suggests a possible causal association of elevated fasting glucose with atherosclerosis, although it may be that these loci influence IMT through nonglucose pathways.
OBJECTIVE: To create a cohort for cost-effective genetic research, the Mayo Genome Consortia (MayoGC) has been assembled with participants from research studies across Mayo Clinic with high-throughput genetic data and electronic medical record (EMR) data for phenotype extraction.
PARTICIPANTS AND METHODS: Eligible participants include those who gave general research consent in the contributing studies to share high-throughput genotyping data with other investigators. Herein, we describe the design of the MayoGC, including the current participating cohorts, expansion efforts, data processing, and study management and organization. A genome-wide association study to identify genetic variants associated with total bilirubin levels was conducted to test the genetic research capability of the MayoGC.
RESULTS: Genome-wide significant results were observed on 2q37 (top single nucleotide polymorphism, rs4148325; P=5.0 × 10⁻⁶²) and 12p12 (top single nucleotide polymorphism, rs4363657; P=5.1 × 10⁻⁸) corresponding to a gene cluster of uridine 5′-diphospho-glucuronosyltransferases (the UGT1A cluster) and solute carrier organic anion transporter family, member 1B1 (SLCO1B1), respectively.
CONCLUSION: Genome-wide association studies have identified genetic variants associated with numerous phenotypes but have been historically limited by inadequate sample size due to costly genotyping and phenotyping. Large consortia with harmonized genotype data have been assembled to attain sufficient statistical power, but phenotyping remains a rate-limiting factor in gene discovery research efforts. The EMR consists of an abundance of phenotype data that can be extracted in a relatively quick and systematic manner. The MayoGC provides a model of a unique collaborative effort in the environment of a common EMR for the investigation of genetic determinants of diseases.
Although GWAS have been performed in longitudinal studies, most used only a single trait measure. GWAS of fasting glucose have generally included only normoglycemic individuals. We examined the impact of both repeated measures and sample selection on GWAS in ARIC, a study which obtained four longitudinal measures of fasting glucose and included both individuals with and without prevalent diabetes. The sample included Caucasians and the Affymetrix 6.0 chip was used for genotyping. Sample sizes for GWAS analyses ranged from 8372 (first study visit) to 5782 (average fasting glucose). Candidate SNP analyses with SNPs identified through fasting glucose or diabetes GWAS were conducted in 9133 individuals, including 761 with prevalent diabetes. For a constant sample size, smaller p-values were obtained for the average measure of fasting glucose compared to values at any single visit, and two additional significant GWAS signals were detected. For four candidate SNPs (rs780094, rs10830963, rs7903146, and rs4607517), the strength of association between genotype and glucose was significantly (p-interaction < .05) different in those with and without prevalent diabetes and for all five fasting glucose candidate SNPs (rs780094, rs10830963, rs560887, rs4607517, rs13266634) the association with measured fasting glucose was more significant in the smaller sample without prevalent diabetes than in the larger combined sample of those with and without diabetes. This analysis demonstrates the potential utility of averaging trait values in GWAS studies and explores the advantage of using only individuals without prevalent diabetes in GWAS of fasting glucose.
GWAS; fasting glucose; type 2 diabetes; sample selection
To compare the sleep-disordered breathing prevalence among Hispanic and white Americans and Japanese, we performed a one-night sleep study with a single channel airflow monitor on 211 Hispanics and 246 whites from the Minnesota Field Center of the Multi-Ethnic Study of Atherosclerosis (MESA), and 978 Japanese from three community-based cohorts of the Circulatory Risk in Communities Study (CIRCS) in Japan.
The respiratory disturbance index and sleep-disordered breathing, defined as respiratory disturbance index ≥ 15 disturbances/hr, were estimated. The sleep-disordered breathing prevalence was higher in men (34.2%) than women (14.8%), and higher among Hispanics (36.5%) and whites (33.3%) than among Japanese (18.4%), corresponding to differences in body mass index. Within body mass index strata, the race difference in sleep-disordered breathing was attenuated. This was also true when we adjusted for body mass index instead of stratification. The strong association between body mass index and sleep-disordered breathing was similar in Japanese and Americans.
The sleep-disordered breathing prevalence was lower among Japanese than the Americans. However, the association of body mass index with sleep-disordered breathing was strong, and similar among the race/ethnic groups studied. The majority of the race/ethnic difference in sleep-disordered breathing prevalence was explained by a difference in body mass index distribution.
cross-sectional study; epidemiology; prevalence; sleep apnea
The transcription factor hepatocyte nuclear factor 1 (HNF-1) α regulates the activity of a number of genes involved in innate immunity, blood coagulation, lipid and glucose transport and metabolism, and cellular detoxification. Common polymorphisms of the HNF-1α gene (HNF1A) were recently associated with plasma C-reactive protein (CRP) and gamma-glutamyl transferase (GGT) concentration in middle-aged to older European-Americans (EA).
Methods and Results
We assessed whether common variants of HNF1A are associated with CRP, GGT, and other atherosclerotic and metabolic risk factors, in the large, population-based CARDIA study of healthy young European-American (EA; n=2,154) and African-American (AA; n=2,083) adults. The minor alleles of Ile27Leu (rs1169288) and Ser486Asn (rs2464196) were associated with 0.10 to 0.15 standard deviation units lower CRP and GGT levels in EA. The same HNF1A coding variants were associated with higher LDL cholesterol, apolipoprotein B, creatinine, and fibrinogen in EA. We replicated the associations between HNF1A coding variants and CRP, fibrinogen, LDL cholesterol, and renal function in a second population-based sample of EA adults 65 years and older from the Cardiovascular Health Study (CHS). The HNF1A Ser486Asn and/or Ile27Leu variants were also associated with increased risk of subclinical coronary atherosclerosis in CARDIA and with incident coronary heart disease in CHS. The Ile27Leu and Ser486Asn variants were 3-fold less common in AA than in EA. There was little evidence of association between HNF1A genotype and atherosclerosis-related phenotypes in AA.
Common polymorphisms of HNF1A appear to influence multiple phenotypes related to cardiovascular risk in the general population of younger and older EA adults.
atherosclerosis; genetics; C-reactive protein; HNF-1; gamma glutamyl transferase
We examined the association of variation in the type 2 diabetes risk-conferring TCF7L2 gene with the risk of incident coronary heart disease (CHD) among the lean, overweight, and obese members of the Atherosclerosis Risk in Communities (ARIC) Study cohort. Cox proportional hazard regression analyses were performed using a general model, with the major homozygote as the reference category. For 9,865 whites, a significant increase in the risk of CHD was seen only among lean (BMI < 25 kg/m²) individuals homozygous for the T allele of the TCF7L2 rs7903146 gene risk variant (hazard ratio 1.42; 95% CI 1.03, 1.97; P = .01). No association was found among 3,631 blacks, regardless of BMI status. An attenuated hazard ratio was observed among the nondiabetic ARIC cohort members. This study suggests that body mass modifies the association of the TCF7L2 rs7903146 T allele with CHD risk.
Intercellular adhesion molecule-1 (ICAM-1) and vascular cell adhesion molecule-1 (VCAM-1) may be important contributors to the development and progression of atherosclerosis. Using a stratified random sample of 2,880 participants of the Multi-Ethnic Study of Atherosclerosis we investigated the relationship of 12 ICAM1 and 17 VCAM1 SNPs and coronary artery calcium (CAC) and ICAM1 SNPs and circulating levels of soluble ICAM-1 (sICAM-1). There were no ICAM1 or VCAM1 SNPs significantly associated with CAC in any of the four race/ethnic groups. In a subset of 1,451 subjects with sICAM-1 measurements, we observed a significant association with rs5491 in all four race/ethnic groups corroborating previous research that has shown that the T-allele of rs5491 interferes with the monoclonal antibody used to measure sICAM-1 in this study. After excluding all rs5491 T-allele carriers, several ICAM1 SNPs were significantly associated with sICAM-1 levels; rs5496 in African Americans, rs5498 and rs3093030 in European Americans, and rs1799969 in Hispanics. Our results identified ICAM1 polymorphisms that were significantly associated with sICAM-1 level but not CAC, a subclinical marker of atherosclerosis.
coronary artery calcium; intercellular adhesion molecule-1 (ICAM-1); vascular adhesion molecule-1 (VCAM-1); soluble intercellular adhesion molecule-1 (sICAM-1); gene; single nucleotide polymorphism (SNP); haplotypes
Atherogenesis is a chronic inflammatory process in which intercellular adhesion molecule 1 (ICAM-1) plays a critical role. Circulating soluble ICAM-1 (sICAM-1) is thought to be the result of cleavage of membrane-bound ICAM-1 and its concentration in serum/plasma has been shown to be heritable. Genome-wide linkage scans were conducted for quantitative trait loci influencing sICAM-1. Phenotype and genetic marker data were available for 2,617 white and 531 black individuals in the NHLBI Family Heart Study follow-up examination. Heritability for sICAM-1 was 0.39 in whites and 0.59 in blacks. Significant linkage was observed on chromosome 19 (LOD = 4.0 at 14 cM) in whites near the ICAM gene cluster that includes the structural gene for ICAM-1. The T-allele of ICAM-1 SNP rs5491 has been strongly associated with the specific sICAM-1 assay we used in our study. Through additional genotyping we were able to rule out rs5491 as the cause of the linkage finding. This study provides preliminary evidence linking genetic variation in the ICAM-1 structural gene to circulating sICAM-1 levels.
Intercellular adhesion molecule-1; Linkage (Genetics); ICAM gene cluster; inflammation; atherosclerosis
To determine whether variation in the transcription factor 7-like 2 (TCF7L2) gene, which influences diabetes risk, is associated with incidence of cancers.
RESEARCH DESIGN AND METHODS
We related diabetes and TCF7L2 variation with occurrence of several common cancers in a prospective cohort study of 13,117 middle-aged adults initially free of cancer in 1987–89. We assessed five SNPs in TCF7L2 including the putative SNP (rs7903146) for diabetes. We identified incident cancers through 2000 via cancer registries, supplemented by hospital records.
Diabetes was associated marginally inversely with incidence of prostate cancer, but not associated with incidence of colorectal, colon, lung, or breast cancer. The T allele of rs7903146 (frequency = 30%) was associated with increased risk of colorectal cancer and, more specifically, colon cancer, with adjusted hazard ratios (95% CI) of 1.0 for CC, 1.25 (0.85, 1.83) for CT, and 2.15 (1.27, 3.64) for TT genotypes (p for trend = 0.009). TCF7L2 variation also was associated with lung cancer incidence in whites but not blacks, but residual confounding by smoking may be present.
Initially cancer-free subjects carrying certain genetic variants of TCF7L2, most notably the T allele of rs7903146, have increased risk of colon cancer. This association appears to be an independent gene effect, not explained by diabetes. Because the T allele of rs7903146 is common, if a causal link is established, this variant could account for a sizable proportion (approximately 17% here) of colon cancer cases in the general population.
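The order of magnitude of the attributable proportion quoted above can be checked with a back-of-the-envelope calculation. The sketch below is one plausible route, not necessarily the authors' method: it assumes Hardy-Weinberg genotype proportions for a T-allele frequency of 30%, treats the reported hazard ratios as relative risks, and applies a multi-category population attributable fraction formula. The function names are illustrative.

```python
def genotype_frequencies(risk_allele_freq: float) -> dict[str, float]:
    """Hardy-Weinberg genotype proportions (an assumption, not stated in the abstract)."""
    q = risk_allele_freq
    p = 1.0 - q
    return {"CC": p * p, "CT": 2.0 * p * q, "TT": q * q}


def attributable_fraction(freqs: dict[str, float],
                          relative_risks: dict[str, float]) -> float:
    """Population attributable fraction for several exposure levels:
    PAF = sum(f_i * (RR_i - 1)) / (1 + sum(f_i * (RR_i - 1)))
    """
    excess = sum(freqs[g] * (relative_risks[g] - 1.0) for g in freqs)
    return excess / (1.0 + excess)


freqs = genotype_frequencies(0.30)              # T-allele frequency from the abstract
hrs = {"CC": 1.0, "CT": 1.25, "TT": 2.15}       # adjusted hazard ratios from the abstract
paf = attributable_fraction(freqs, hrs)
print(f"attributable fraction = {paf:.3f}")
```

Under these assumptions the result is roughly 0.17, in line with the approximately 17% stated in the text. The estimate inherits the wide confidence intervals of the hazard ratios and is meaningful only if the association is causal.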
Type 2 diabetes mellitus (T2DM) is characterized by impaired insulin secretion, peripheral insulin resistance, and increased hepatic glucose production. Genes that contribute to genetic susceptibility to T2DM function in numerous biochemical pathways. Uncoupling protein-2 (UCP2) functions as a negative regulator of insulin secretion. Animal studies show induction of UCP2 plays a pathogenic role in the progression of obesity-induced T2DM, and some human studies have shown an association between a common UCP2 polymorphism, Ala55Val (rs660339), and T2DM, obesity, and resting metabolic rate with the Val/Val genotype conferring increased risk. We investigated the relationship between the Ala55Val variant and incidence of T2DM among 12,056 participants in the Atherosclerosis Risk in Communities (ARIC) Study ages 45−64 years at baseline. Incident T2DM (n=1,406) cases were identified over 9 years of follow-up. The Val55 allele frequency was 44% in blacks and 41% in whites. The rate of T2DM per 1,000 person-years was 15.0, 15.6, and 15.6 for Ala/Ala, Ala/Val, and Val/Val genotypes respectively. We found no significant association between UCP2 genotypes and incident T2DM in the whole cohort, in race-gender subgroups, or in categories of body mass index (normal-overweight-obese). The Ala55Val polymorphism of UCP2 was not associated with incident T2DM in the ARIC cohort.
mitochondrial uncoupling protein 2; Diabetes Mellitus, Type 2; Polymorphism, Single Nucleotide; Obesity; genetics
Data accumulated from mouse studies and in vitro studies of human arteries support the notion that soluble intercellular adhesion molecule-1 (sICAM-1) and monocyte chemoattractant protein-1 (MCP-1) play important roles in the inflammation process involved in atherosclerosis. However, at the population level, the utility of sICAM-1 and MCP-1 as biomarkers for subclinical atherosclerosis is less clear. In the follow-up exam of the NHLBI Family Heart Study, we evaluated whether plasma levels of sICAM-1 and MCP-1 were associated with coronary artery calcification (CAC), a measure of the burden of coronary atherosclerosis.
CAC was measured using the Agatston score with multidetector computed tomography. Information on CAC and MCP-1 was obtained in 2246 whites and 470 African Americans (mean age 55 years) without a history of coronary heart disease (CHD). Information on sICAM-1 was obtained for white participants only.
In whites, after adjustment for age and gender, the odds ratios (ORs) of CAC (CAC > 0) associated with the second, third, fourth, and fifth quintiles of sICAM-1 compared to the first quintile were 1.22 (95% confidence interval [CI]: 0.91–1.63), 1.15 (0.84–1.58), 1.49 (1.09–2.05), and 1.72 (1.26–2.36) (p = 0.0005 for trend test), respectively. The corresponding ORs for the second to fifth quintiles of MCP-1 were 1.26 (0.92–1.73), 0.99 (0.73–1.34), 1.42 (1.03–1.96), and 2.00 (1.43–2.79) (p < 0.0001 for trend test), respectively. In multivariable analysis that additionally adjusted for other CHD risk factors, the association of CAC with sICAM-1 and MCP-1 was attenuated and no longer statistically significant. In African Americans, the age- and gender-adjusted ORs of CAC associated with the second and third tertiles of MCP-1 compared to the first tertile were 1.16 (0.64–2.08) and 1.25 (0.70–2.23) (p = 0.44 for trend test), respectively. This result did not change materially after additional adjustment for other CHD risk factors. A test of race interaction showed that the magnitude of association between MCP-1 and CAC did not differ significantly between African Americans and whites. Similar results were obtained when CAC ≥ 10 was analyzed as an outcome for both MCP-1 and sICAM-1.
This study suggests that sICAM-1 and MCP-1 are biomarkers of coronary atherosclerotic burden and that their association with CAC is mainly driven by established CHD risk factors.
Only one LDL-C GWAS has been reported in African Americans. We performed a GWAS of LDL-C in African Americans using data extracted from electronic medical records (EMR) in the eMERGE network. African Americans were genotyped on the Illumina 1M chip. All LDL-C measurements, prescriptions, and diagnoses of concomitant disease were extracted from EMR. We created two analytic datasets: one with median LDL-C calculated after the exclusion of some lab values based on co-morbidities and medication (n = 618), and another with median LDL-C calculated without any exclusions (n = 1249). Rs7412 in APOE was strongly associated with LDL-C at levels of GWAS significance in both datasets (p < 5 × 10^-8). In the dataset with exclusions, a decrease of 20.0 mg/dl per minor allele was observed. The effect size was attenuated (12.3 mg/dl) in the dataset without any lab values excluded. Although other signals in APOE have been detected in previous GWAS, this large and important SNP association has not been well detected in large GWAS because rs7412 was not included on many genotyping arrays. Use of median LDL-C extracted from EMR after exclusions for medications and co-morbidities increased the percentage of trait variance explained by genetic variation.
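The construction of the two analytic datasets can be illustrated with a minimal sketch. It assumes each lab value has already been flagged as excludable or not (the abstract does not give the exclusion rules), and the record layout, function name, and patient values are made up for illustration only.

```python
from statistics import median


def median_ldl_by_patient(records, apply_exclusions):
    """Return {patient_id: median LDL-C} from per-measurement records.

    records: {patient_id: [(ldl_mg_dl, excluded_flag), ...]} (hypothetical layout).
    Patients with no usable measurement are dropped from the result.
    """
    result = {}
    for patient_id, measurements in records.items():
        values = [
            ldl for ldl, excluded in measurements
            if not (apply_exclusions and excluded)
        ]
        if values:
            result[patient_id] = median(values)
    return result


# Made-up example values, not study data.
records = {
    "P1": [(130, False), (95, True), (140, False)],  # one value flagged, e.g. on medication
    "P2": [(100, True), (102, True)],                # every value flagged
}

with_exclusions = median_ldl_by_patient(records, apply_exclusions=True)
without_exclusions = median_ldl_by_patient(records, apply_exclusions=False)
print(with_exclusions)
print(without_exclusions)
```

In this toy example the patient whose values are all flagged drops out of the dataset with exclusions, which mirrors why that dataset is smaller (n = 618) than the one without exclusions (n = 1249).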
GWAS; LDL; electronic medical records
Genome-wide association studies (GWAS) require high specificity and large numbers of subjects to identify genotype–phenotype correlations accurately. The aim of this study was to identify type 2 diabetes (T2D) cases and controls for a GWAS, using data captured through routine clinical care across five institutions using different electronic medical record (EMR) systems.
An algorithm was developed to identify T2D cases and controls based on a combination of diagnoses, medications, and laboratory results. The performance of the algorithm was validated against clinician review at three of the five participating institutions. A GWAS was subsequently performed using cases and controls identified by the algorithm, with samples pooled across all five institutions.
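A rule-based classifier of this general kind might look like the sketch below. The abstract does not list the actual criteria, so the specific rules here (a case needs at least two kinds of evidence, a control needs none plus a normal glucose result) and all field names are hypothetical, chosen only to show how diagnoses, medications, and laboratory results can be combined while leaving ambiguous records out.

```python
from dataclasses import dataclass


@dataclass
class PatientSummary:
    """Per-patient EMR flags (hypothetical fields, not the study's data model)."""
    has_t2d_diagnosis: bool
    on_diabetes_medication: bool
    has_abnormal_glucose_lab: bool
    has_normal_glucose_lab: bool


def classify(patient):
    """Return 'case', 'control', or 'indeterminate' using illustrative rules."""
    evidence = sum([
        patient.has_t2d_diagnosis,
        patient.on_diabetes_medication,
        patient.has_abnormal_glucose_lab,
    ])
    # Illustrative rule: require two independent kinds of evidence for a case.
    if evidence >= 2:
        return "case"
    # Illustrative rule: a control has no evidence and a documented normal lab.
    if evidence == 0 and patient.has_normal_glucose_lab:
        return "control"
    # Anything ambiguous is left out, trading sample size for specificity.
    return "indeterminate"


print(classify(PatientSummary(True, True, False, False)))
print(classify(PatientSummary(False, False, False, True)))
print(classify(PatientSummary(True, False, False, False)))
```

Leaving single-evidence records unclassified reflects the emphasis on high specificity: a smaller but cleaner set of cases and controls is preferred over a larger, noisier one.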
The algorithm achieved 98% and 100% positive predictive values for the identification of diabetic cases and controls, respectively, as compared against clinician review. By standardizing and applying the algorithm across institutions, 3353 cases and 3352 controls were identified. Subsequent GWAS using data from five institutions replicated the TCF7L2 gene variant (rs7903146) previously associated with T2D.
By applying stringent criteria to EMR data collected through routine clinical care, cases and controls for a GWAS were identified that subsequently replicated a known genetic variant. The use of standard terminologies to define data elements enabled pooling of subjects and data across five different institutions to achieve the robust numbers required for GWAS.
An algorithm using commonly available data from five different EMR systems can accurately identify T2D cases and controls for genetic study across multiple institutions.
Analytics; application of biological knowledge to clinical care; bioinformatics; biomedical informatics; clinical phenotyping; controlled terminologies and vocabularies; data mining; EHR; EMR secondary and meaningful use; genetic epidemiology; genetics; genome-wide association studies; genomics; HIT data standards; improving the education and skills training of health professionals; infection control; information retrieval; knowledge representations; linking the genotype and phenotype; medical informatics; modeling; natural-language processing; ontologies; pharmacogenomics; phenotyping; reusability; translational research