More than 100 clinical variables (Table ) were extracted from the EHR for a total of 824 patients who had consented to participate in a bariatric surgery clinical research program on the genetics of obesity and related comorbidities. EHR data were obtained from a comprehensive history and physical examination performed at the initial visit, and laboratory measurements were obtained within one month before surgery.
To define a morbidly obese study population, 49 patients (5.9%) with a body mass index (BMI) < 40 and 16 patients (1.9%) with missing height and/or weight data were excluded from the analysis, leaving 759 patients. Genotyping was then performed on available DNA from 709 of these patients. Gender, age, race, diagnoses, and medication use were obtained from the EHR for all patients. Laboratory values were available for at least 98% of patients for glomerular filtration rate, glucose, blood urea nitrogen (bun), sodium, potassium, chloride, CO2, calcium, and creatinine; for at least 97% of patients for white blood cell count (wbc), red blood cell count (rbc), hemoglobin (hgb), hematocrit (hct), mean cell volume (mcv), mean cell hemoglobin (mch), mean cell hemoglobin concentration (mchc), and red cell distribution width (rdw); for at least 96% of patients for triglycerides, cholesterol, high density lipoprotein cholesterol (hdl), alanine aminotransferase (alt), aspartate aminotransferase (ast), alkaline phosphatase, total bilirubin, and thyroid stimulating hormone (tsh); and for at least 94% of patients for low density lipoprotein cholesterol (ldl calculated), insulin, and hemoglobin A1c. Values were available for smaller percentages of patients for iron (81%), iron binding capacity (81%), ferritin (81%), platelet count (86%), mean platelet volume (86%), albumin (33%), and total protein (33%). The lower percentages for the "iron panel" (iron, iron binding capacity, and ferritin) reflect its addition to the clinical protocol after recruitment had begun. Platelet count and mean platelet volume were not reported when a hemoglobin and hematocrit, rather than a complete blood count, was ordered, which likely accounts for the lower percentages for these measures. Total protein and albumin were ordered only when evaluation of nutritional status was deemed clinically necessary.
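The exclusion step can be expressed as a simple filter. The sketch below is hypothetical: the record layout (`Patient` with `height_m` and `weight_kg`) and the helper names are assumptions for illustration, not the study's actual data pipeline. It also re-derives the reported counts and percentages from the numbers in the text.

```python
from typing import Optional, TypedDict


class Patient(TypedDict):
    # Hypothetical record layout; the real EHR extract schema is not described.
    height_m: Optional[float]
    weight_kg: Optional[float]


def bmi(p: Patient) -> Optional[float]:
    """Body mass index in kg/m^2, or None if height or weight is missing."""
    if p["height_m"] is None or p["weight_kg"] is None:
        return None
    return p["weight_kg"] / p["height_m"] ** 2


def morbidly_obese(patients: list[Patient]) -> list[Patient]:
    """Drop patients with missing height/weight data or BMI < 40."""
    kept = []
    for p in patients:
        value = bmi(p)
        if value is not None and value >= 40:
            kept.append(p)
    return kept


def pct(part: int, whole: int) -> float:
    """Percentage of `whole`, rounded to one decimal place."""
    return round(100 * part / whole, 1)


# Counts reported in the text.
TOTAL_CONSENTED = 824
EXCLUDED_BMI_UNDER_40 = 49
EXCLUDED_MISSING_HT_WT = 16
ANALYSED = TOTAL_CONSENTED - EXCLUDED_BMI_UNDER_40 - EXCLUDED_MISSING_HT_WT  # 759
```

Treating missing height or weight as a reason for exclusion (rather than imputing BMI) matches the text, which lists those 16 patients as a separate exclusion group.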
The genotyped cohort consisted of 709 patients with a BMI of 40 or greater, 97.5% of whom were Caucasian by self-report or clinical verification. Other demographic and relevant clinical data are shown in Table .
Demographic and selected clinical data on patient cohort (n = 709)
Clinical correlates of T2D and CHD
The database was used to determine whether expected relationships with diabetes (i.e., ICD-9 code 250) could be found. Diabetes was defined as a binary variable for both split prediction and regression analyses, which were performed with the Golden Helix statistical software package. Of the more than 150 variables examined, 35 were associated with a diagnosis of diabetes after Bonferroni correction (bP < 0.05). The ten measures most strongly related to ICD-9 code 250 are shown in Table (3 variables appear in both the split prediction and the regression analyses). All can be directly related to diabetes. Pre-operative hemoglobin A1c was the most highly correlated (by regression), followed by the biguanide class of diabetes medications, hemoglobin A1c (by split prediction), and insulin. Use of the statin class of lipid-lowering drugs was also related, as was age (by both split prediction and regression). All of these relationships are expected from the clinical features of diabetes.
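The Bonferroni-adjusted P-value (bP) used throughout multiplies each raw P-value by the number of tests performed and caps the result at 1. A minimal sketch, assuming the standard single-step Bonferroni adjustment (the software's internal implementation is not described in the text):

```python
def bonferroni(p_values: list[float]) -> list[float]:
    """Bonferroni-adjusted P-values: bP = min(1, m * p) for m tests."""
    m = len(p_values)
    return [min(1.0, m * p) for p in p_values]


def significant(p_values: list[float], alpha: float = 0.05) -> list[int]:
    """Indices of tests whose Bonferroni-adjusted P-value is below alpha."""
    return [i for i, bp in enumerate(bonferroni(p_values)) if bp < alpha]


# With roughly 150 variables tested, a raw P-value must fall below
# 0.05 / 150 (about 3.3e-4) to reach bP < 0.05.
```

Bonferroni correction is conservative when many of the tested variables are correlated (for example, hemoglobin A1c and glucose), so the 35 associations reported are unlikely to be inflated by multiple testing.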
Clinical variables with the highest statistical relationship to the diagnosis code for diabetes (ICD-9 250)
A similar analysis was performed with CHD (defined as ICD-9 code 414 by clinical staff) as the dependent variable (Table ). Thirteen database variables were statistically significant after Bonferroni correction (bP < 0.05). CHD medications (nitrates, beta blockers, platelet aggregation inhibitors, aspirin, statins, and fibric acid derivatives), age (by both regression and split prediction), and gender were all statistically related, as was a diagnosis of hypercholesterolemia.
Clinical variables with the highest statistical relationship to the diagnosis code for ischemic heart disease (ICD-9 code 414)
Genotypic correlates of T2D and CHD
A total of 709 patient DNA samples were genotyped for the chromosome 9p21 T2D SNP (rs10811661) and CHD SNP (rs2383206) (Table ). Patients were classified as carriers of the "C" and/or "T" alleles at the T2D SNP and the "G" and/or "A" alleles at the CHD SNP. The T2D "T" allele and the CHD "G" allele are considered the high-risk alleles. The minor allele frequencies reported for control populations (McPherson et al. 2007; Saxena et al. 2007) are in good agreement with those observed here (0.17 vs. 0.17 for T2D and 0.49 vs. 0.48 for CHD).
Frequencies of the SNP DNA sequences
To determine whether the population was genetically skewed by inbreeding or strong founder effects, both SNPs were tested for Hardy–Weinberg equilibrium. Neither SNP deviated significantly from Hardy–Weinberg equilibrium (T2D P > 0.19; CHD P > 0.81), so the allele frequencies are consistent with an outbred, mixed Caucasian/European population.
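A Hardy–Weinberg test compares observed genotype counts with the counts expected from the sample allele frequency (p², 2pq, q²). The text does not say which test the software used; the sketch below assumes the common Pearson chi-square form with 1 degree of freedom, and the function name `hwe_test` is hypothetical.

```python
import math


def hwe_test(n_hom1: int, n_het: int, n_hom2: int) -> tuple[float, float]:
    """Pearson chi-square test (1 df) for Hardy-Weinberg equilibrium.

    Takes genotype counts (e.g. CC, CT, TT) and returns (chi2, P).
    """
    n = n_hom1 + n_het + n_hom2
    p = (2 * n_hom1 + n_het) / (2 * n)  # frequency of allele 1
    q = 1 - p
    if p == 0 or q == 0:
        raise ValueError("SNP is monomorphic; HWE test is undefined")
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    observed = (n_hom1, n_het, n_hom2)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    # Upper-tail probability of a chi-square with 1 df: erfc(sqrt(chi2 / 2)).
    return chi2, math.erfc(math.sqrt(chi2 / 2))
```

A large P-value, as reported for both SNPs, means the observed genotype counts are close to the expected proportions; for a rare allele with few homozygotes, an exact test is usually preferred over this chi-square approximation.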
Because the two SNPs lie within 20,000 bases of each other on chromosome 9, the extent of linkage disequilibrium (LD) between them was determined. No significant LD was observed (LD correlation r = 0.034), consistent with their location in two distinct haplotype blocks.
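When genotypes are unphased, one way to estimate the LD correlation r is the Pearson correlation of per-patient allele counts (0, 1, or 2 copies of a given allele) at the two SNPs. This "composite" estimate approximates haplotype r under Hardy–Weinberg equilibrium. It is offered as an illustrative sketch only: the software used in the study may estimate haplotype frequencies (for example by expectation-maximization) instead, and `dosage_r` is a hypothetical name.

```python
import math


def dosage_r(snp1: list[int], snp2: list[int]) -> float:
    """Pearson correlation of per-patient allele counts (0, 1, 2) at two SNPs.

    Under Hardy-Weinberg equilibrium this genotype ("composite") correlation
    approximates the haplotype LD correlation r without needing phase.
    """
    if len(snp1) != len(snp2) or not snp1:
        raise ValueError("need two equal-length, non-empty genotype vectors")
    n = len(snp1)
    m1, m2 = sum(snp1) / n, sum(snp2) / n
    cov = sum((a - m1) * (b - m2) for a, b in zip(snp1, snp2))
    var1 = sum((a - m1) ** 2 for a in snp1)
    var2 = sum((b - m2) ** 2 for b in snp2)
    if var1 == 0 or var2 == 0:
        raise ValueError("a SNP with no variation has undefined r")
    return cov / math.sqrt(var1 * var2)
```

A value near 0, such as the r = 0.034 reported here, indicates that knowing a patient's genotype at one SNP says almost nothing about the genotype at the other.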
The diploid genotypes of each patient at each SNP (T2D "CC", "CT", and "TT"; CHD "AA", "AG", and "GG") were also analyzed (Table ). The T2D homozygous high-risk "TT" genotype was present in ~70% of the population and the CHD homozygous high-risk "GG" genotype in ~27%, consistent with previous studies. The T2D heterozygous "CT" and CHD heterozygous "AG" genotypes were present in ~27% and ~50%, respectively. The low-risk T2D genotype "CC" was present in ~3.5% of the population and the low-risk CHD genotype "AA" in ~24%.
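The genotype proportions and the allele frequencies reported earlier can be cross-checked: an allele's frequency equals the proportion of its homozygotes plus half the proportion of heterozygotes. The short sketch below (function names are illustrative) applies this to the approximate percentages above and also gives the genotype proportions Hardy–Weinberg equilibrium would predict.

```python
def allele_freq(f_hom: float, f_het: float) -> float:
    """Allele frequency from the proportion of its homozygotes and of heterozygotes."""
    return f_hom + f_het / 2


def hwe_expected(p: float) -> tuple[float, float, float]:
    """Expected genotype proportions (p^2, 2pq, q^2) under Hardy-Weinberg equilibrium."""
    q = 1 - p
    return p * p, 2 * p * q, q * q


# Approximate genotype proportions quoted in the text.
t2d_c = allele_freq(0.035, 0.27)  # "CC" ~3.5%, "CT" ~27% -> C allele, about 0.17
chd_a = allele_freq(0.24, 0.50)   # "AA" ~24%, "AG" ~50% -> A allele, about 0.49
cc, ct, tt = hwe_expected(t2d_c)  # expected T2D genotype proportions
```

With a C allele frequency of 0.17, Hardy–Weinberg equilibrium predicts roughly 2.9% CC, 28.2% CT, and 68.9% TT, close to the observed ~3.5%, ~27%, and ~70%.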
Frequencies of T2D and CHD SNP genotypes
The relationship of the T2D and CHD SNP genotypes to the approximately 100 clinical variables obtained from the EHR was analyzed using the HelixTree Genetics Analysis Software. The initial analysis used the individual T2D and CHD SNP genotypes (T2D "CC", "CT", and "TT"; CHD "AA", "AG", and "GG"). For T2D SNP rs10811661, two variables differed significantly across genotypes (bP < 0.05): the percentage of patients diagnosed with polycystic ovary syndrome (PCOS) and the percentage diagnosed with hypertension (HTN). Notably, no patient with the CC genotype was diagnosed with PCOS, and a correspondingly lower percentage of CC patients had a diagnosis of HTN (Fig. ). The mechanism by which this variant is related to PCOS and HTN is not clear.
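Testing whether a binary EHR variable (such as a PCOS diagnosis) differs across the three genotypes amounts to testing a 3 × 2 genotype-by-diagnosis table. The exact test HelixTree applied is not stated; this sketch assumes a Pearson chi-square test with 2 degrees of freedom followed by Bonferroni adjustment, and all names are hypothetical.

```python
import math


def genotype_trait_p(cases: tuple[int, int, int],
                     controls: tuple[int, int, int]) -> float:
    """Raw P-value of a Pearson chi-square test on a 3 x 2 genotype table (2 df).

    `cases` and `controls` hold counts for the three genotypes (e.g. CC, CT, TT)
    among patients with and without a given diagnosis.
    """
    col_totals = [a + b for a, b in zip(cases, controls)]
    row_totals = [sum(cases), sum(controls)]
    if 0 in col_totals or 0 in row_totals:
        raise ValueError("every genotype and both diagnosis groups need patients")
    n = sum(row_totals)
    chi2 = 0.0
    for row, row_total in zip((cases, controls), row_totals):
        for observed, col_total in zip(row, col_totals):
            expected = row_total * col_total / n
            chi2 += (observed - expected) ** 2 / expected
    # Upper-tail probability of a chi-square with 2 df is exp(-chi2 / 2).
    return math.exp(-chi2 / 2)


def bonferroni_p(p: float, n_tests: int) -> float:
    """Bonferroni-adjusted P-value (bP) for one of n_tests comparisons."""
    return min(1.0, p * n_tests)
```

A genotype with zero affected patients (as with CC and PCOS) is handled naturally here, since only the expected counts must be nonzero; with very small expected counts, however, an exact test would be the safer choice.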
Association of T2D SNP rs10811661 with the diagnosis of polycystic ovary syndrome (PCOS) and the diagnosis of hypertension (HTN). No patient with a “CC” genotype was diagnosed with PCOS
For CHD SNP rs2383206, three variables met the Bonferroni-corrected P-value threshold of 0.05: the percentages of patients taking tricyclic antidepressants and sulfonylureas, and the laboratory value creatine kinase (CK). A fourth variable, the percentage of patients taking statins, had a bP-value of 0.064. The genotype distributions for tricyclic antidepressant and sulfonylurea use differed from those for CK and statin use. AG heterozygotes had the highest use of tricyclics and sulfonylureas relative to AA and GG homozygotes (Fig. ), whereas the AG and GG genotypes had higher statin use. Patients with the high-risk GG genotype had CK levels more than 2-fold higher than those with non-GG genotypes (GG = 196 vs. AG = 86 vs. AA = 92).
Fig. 2 Association of CHD SNP rs2383206 with tricyclic antidepressant, sulfonylurea, and statin use. The "AG" heterozygotes had the highest tricyclic and sulfonylurea use, while statin use was higher in patients carrying a "G" allele
Because each patient inherits the T2D and CHD risk alleles independently, we tested for associations with compound genotypes (T2D/CHD "CC"/"AA", "CC"/"AG", "CC"/"GG", "CT"/"AA", "CT"/"AG", "CT"/"GG", "TT"/"AA", "TT"/"AG", and "TT"/"GG"). Each T2D and CHD genotype was classified as low (L), medium (M), or high (H) risk according to the risk groups predicted in previous studies (McPherson et al. 2007; Saxena et al. 2007). Each patient could thus be categorized from T2D low/CHD low, or L/L ("CC"/"AA"), through T2D high/CHD high, or H/H ("TT"/"GG").
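The compound classification is a lookup of each single-SNP genotype's risk group followed by concatenation. A minimal sketch, with hypothetical names; the mapping tables encode the L/M/H assignments described above, and allele order within a genotype is normalized so that, for example, "TC" and "CT" are treated alike.

```python
from itertools import product

# Risk group of each single-SNP genotype, following the text:
# rs10811661 (T2D): "T" is the high-risk allele; rs2383206 (CHD): "G" is.
T2D_RISK = {"CC": "L", "CT": "M", "TT": "H"}
CHD_RISK = {"AA": "L", "AG": "M", "GG": "H"}


def normalise(genotype: str) -> str:
    """Sort the two alleles alphabetically, so "TC" -> "CT" and "GA" -> "AG"."""
    return "".join(sorted(genotype.upper()))


def compound_class(t2d: str, chd: str) -> str:
    """Compound T2D/CHD risk class such as "L/L", "M/H", or "H/H"."""
    return f"{T2D_RISK[normalise(t2d)]}/{CHD_RISK[normalise(chd)]}"


# All nine compound genotypes, and those containing at least one low-risk genotype.
ALL_COMPOUNDS = list(product(T2D_RISK, CHD_RISK))
WITH_LOW = [(t, c) for t, c in ALL_COMPOUNDS if "L" in compound_class(t, c)]
```

Enumerating the nine combinations shows that five of them include at least one low-risk genotype (CC/AA, CC/AG, CC/GG, CT/AA, and TT/AA), which is the denominator behind statements such as "4 of 5 compound genotypes with a low risk genotype".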
A total of 19 EHR-derived variables (Table ) differed significantly among the compound genotype groups (bP < 0.05). The percentage of patients diagnosed with CHD was influenced by both SNPs: 4 of the 5 compound genotypes that include a low-risk genotype had no patients diagnosed with CHD (Fig. ). A similar pattern was present for diagnoses of respiratory disorders (Fig. ) and neurotic disorders (Fig. ). Thiazide diuretic use was skewed toward the low-risk T2D/CHD genotypes (Fig. ). Patterns for the other associated variables were more complex and did not trend toward either low- or high-risk genotypes.
Clinical variables with statistical association with type 2 diabetes/coronary heart disease compound genotypes
Frequency of diagnosis of CHD among patients with each T2D and CHD SNP compound genotype. The pattern is skewed toward those carrying the medium (M) and high (H) risk genotypes
Frequency of diagnosis of respiratory disorders (RESP) among patients with each T2D and CHD compound genotype. The pattern is skewed toward those carrying the medium (M) and high (H) risk genotypes, similar to the pattern with CHD
Frequency of diagnosis of neurotic disorders among patients with each T2D and CHD compound genotype. The pattern is skewed toward those carrying the medium (M) and high (H) risk genotypes, similar to the pattern with CHD
Frequency of thiazide medication use among patients with each T2D and CHD compound genotype. The distribution is skewed toward those with low risk genotypes