1.  Genome- and Phenome-Wide Analysis of Cardiac Conduction Identifies Markers of Arrhythmia Risk 
Circulation  2013;127(13):1377-1385.
Electrocardiographic QRS duration, a measure of cardiac intraventricular conduction, varies ~2-fold in individuals without cardiac disease. Slow conduction may promote reentrant arrhythmias.
Methods and Results
We performed a genome-wide association study (GWAS) to identify genomic markers of QRS duration in 5,272 individuals without cardiac disease selected from electronic medical record (EMR) algorithms at five sites in the Electronic Medical Records and Genomics (eMERGE) network. The most significant loci were evaluated within the CHARGE consortium QRS GWAS meta-analysis. Twenty-three single nucleotide polymorphisms in 5 loci, previously described by CHARGE, were replicated in the eMERGE samples; 18 SNPs were in the chromosome 3 SCN5A and SCN10A loci, where the most significant SNPs were rs1805126 in SCN5A with p=1.2×10−8 (eMERGE) and p=2.5×10−20 (CHARGE) and rs6795970 in SCN10A with p=6×10−6 (eMERGE) and p=5×10−27 (CHARGE). The other loci were in NFIA, near CDKN1A, and near C6orf204. We then performed phenome-wide association studies (PheWAS) on variants in these five loci in 13,859 European Americans to search for diagnoses associated with these markers. PheWAS identified atrial fibrillation and cardiac arrhythmias as the most common associated diagnoses with SCN10A and SCN5A variants. SCN10A variants were also associated with subsequent development of atrial fibrillation and arrhythmia in the original 5,272 “heart-healthy” study population.
We conclude that DNA biobanks coupled to EMRs provide a platform not only for GWAS but may also allow broad interrogation of the longitudinal incidence of disease associated with genetic variants. The PheWAS approach implicated sodium channel variants modulating QRS duration in subjects without cardiac disease as predictors of subsequent arrhythmias.
PMCID: PMC3713791  PMID: 23463857
cardiac conduction; QRS duration; atrial fibrillation; genome-wide association study; phenome-wide association study; electronic medical records
2.  Systematic comparison of phenome-wide association study of electronic medical record data and genome-wide association study data 
Nature biotechnology  2013;31(12):1102-1110.
Candidate gene and genome-wide association studies (GWAS) have identified genetic variants that modulate risk for human disease; many of these associations require further study to replicate the results. Here we report the first large-scale application of the phenome-wide association study (PheWAS) paradigm within electronic medical records (EMRs), an unbiased approach to replication and discovery that interrogates relationships between targeted genotypes and multiple phenotypes. We scanned for associations between 3,144 single-nucleotide polymorphisms (previously implicated by GWAS as mediators of human traits) and 1,358 EMR-derived phenotypes in 13,835 individuals of European ancestry. This PheWAS replicated 66% (51/77) of sufficiently powered prior GWAS associations and revealed 63 potentially pleiotropic associations with P < 4.6 × 10−6 (false discovery rate < 0.1); the strongest of these novel associations were replicated in an independent cohort (n = 7,406). These findings validate PheWAS as a tool to allow unbiased interrogation across multiple phenotypes in EMR-based cohorts and to enhance analysis of the genomic basis of human disease.
PMCID: PMC3969265  PMID: 24270849
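The false discovery rate criterion quoted in the abstract above can be illustrated with a minimal sketch of the Benjamini-Hochberg step-up procedure. This is an assumption for illustration only: the abstract states that the P threshold corresponds to a false discovery rate below 0.1, but not which FDR procedure was used. The function name and the p-values in the usage note are hypothetical.

```python
def benjamini_hochberg(p_values, q=0.1):
    """Return a list of booleans marking which hypotheses are rejected
    by the Benjamini-Hochberg procedure at FDR level q.

    Assumption: this is a generic textbook procedure, not necessarily
    the method used in the study summarized above.
    """
    m = len(p_values)
    # Indices of the p-values, ordered from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])

    # Find the largest rank k with p_(k) <= (k / m) * q.
    max_k = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank / m * q:
            max_k = rank

    # Reject every hypothesis whose rank is at most max_k.
    rejected = [False] * m
    for rank, idx in enumerate(order, start=1):
        if rank <= max_k:
            rejected[idx] = True
    return rejected
```

With the hypothetical input `[0.001, 0.04, 0.03, 0.5]` and `q=0.1`, the three smallest p-values fall under their rank-specific thresholds and are rejected, while the fourth is not.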
3.  Return of results in the genomic medicine projects of the eMERGE network 
The electronic Medical Records and Genomics (eMERGE) (Phase I) network was established in 2007 to further genomic discovery using biorepositories linked to the electronic health record (EHR). In Phase II, which began in 2011, genomic discovery efforts continue and in addition the network is investigating best practices for implementing genomic medicine, in particular, the return of genomic results in the EHR for use by physicians at point-of-care. To develop strategies for addressing the challenges of implementing genomic medicine in the clinical setting, the eMERGE network is conducting studies that return clinically-relevant genomic results to research participants and their health care providers. These genomic medicine pilot studies include returning individual genetic variants associated with disease susceptibility or drug response, as well as genetic risk scores for common “complex” disorders. Additionally, as part of a network-wide pharmacogenomics-related project, targeted resequencing of 84 pharmacogenes is being performed and select genotypes of pharmacogenetic relevance are being placed in the EHR to guide individualized drug therapy. Individual sites within the eMERGE network are exploring mechanisms to address incidental findings generated by resequencing of the 84 pharmacogenes. In this paper, we describe studies being conducted within the eMERGE network to develop best practices for integrating genomic findings into the EHR, and the challenges associated with such work.
PMCID: PMC3972474  PMID: 24723935
genomics; electronic health records; incidental findings; implementation; genetic counseling; next generation sequencing; pharmacogenetics
4.  Analysis of Maternal Risk Factors Associated With Congenital Vertebral Malformations 
Spine  2013;38(5):E293-E298.
Study Design
A retrospective chart review of cases with congenital vertebral malformations (CVM) and controls with normal spine morphology.
To determine the relative contribution of maternal environmental factors (MEF) during pregnancy including maternal insulin dependent diabetes mellitus, valproic acid, alcohol, smoking, hyperthermia, twin gestation, assisted reproductive technology, in-vitro fertilization and maternal clomiphene usage to CVM development.
Summary of Background Data
Congenital vertebral malformations (CVM) represent defects in formation and segmentation of somites occurring with an estimated incidence of between 0.13–0.50 per 1000 live births. CVM may be associated with congenital scoliosis, Klippel-Feil syndrome, hemifacial microsomia and VACTERL syndromes, and represent significant morbidity due to pain and cosmetic disfigurement.
A multicenter retrospective chart review of 229 cases with CVM and 267 controls with normal spine morphology between the ages of 1–50 years was performed in order to obtain the odds ratio (OR) of MEF related to CVM among cases vs. controls. CVM due to an underlying syndrome associated with a known gene mutation or chromosome etiology were excluded. An imputation based analysis was performed in which subjects with no documentation of MEF history were treated as having no maternal exposure. Univariate and multivariate analyses were conducted to calculate the OR.
Of the 229 total cases, 104 cases had single or multiple CVM without additional congenital malformations (CM) (Group 1) and 125 cases had single or multiple CVM and additional CM (Group 2). Nineteen percent of total cases had an identified MEF. The OR (95% CI, P-value) for MEF history for Group 1 was 6.0 (2.4–15.1, P<0.001) in the univariate analysis. The OR (95% CI, P-value) for MEF history in Group 2 was 9.1 (3.8–21.6, P<0.001) in the univariate analysis. The results were confirmed in the multivariate analysis, after adjusting for age, gender, and institution.
These results support a hypothesis for an association between the above MEF during pregnancy and CVM and have implications for development of prevention strategies. Further prospective studies are needed to quantify association between CVM and specific MEF.
PMCID: PMC3959640  PMID: 23446706
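The odds ratios and 95% confidence intervals reported in the abstract above are the kind of quantity that can be computed from a 2×2 exposure table. The sketch below uses the standard Wald interval on the log odds ratio. It is an assumption for illustration: the abstract does not state how its intervals were computed, and the function name and the counts in the usage note are hypothetical, not the study's data.

```python
import math


def odds_ratio_wald_ci(exposed_cases, unexposed_cases,
                       exposed_controls, unexposed_controls, z=1.96):
    """Return (odds ratio, lower bound, upper bound) for a 2x2 table.

    The interval is the Wald interval on the log scale; z=1.96 gives an
    approximate 95% interval. All four cell counts must be positive.
    """
    a, b = exposed_cases, unexposed_cases
    c, d = exposed_controls, unexposed_controls
    if min(a, b, c, d) <= 0:
        raise ValueError("all cell counts must be positive")

    odds_ratio = (a * d) / (b * c)
    # Standard error of log(OR) from the four cell counts.
    se_log_or = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(odds_ratio) - z * se_log_or)
    upper = math.exp(math.log(odds_ratio) + z * se_log_or)
    return odds_ratio, lower, upper
```

For example, a hypothetical table with 20 exposed and 80 unexposed cases and 10 exposed and 190 unexposed controls gives an odds ratio of 4.75, with an interval that excludes 1.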
5.  Characterization of Statin Dose-response within Electronic Medical Records 
Efforts to define the genetic architecture underlying variable statin response have met with limited success, possibly because previous studies were limited to the effect of a single dose. We leveraged electronic medical records (EMRs) to extract potency (ED50) and efficacy (Emax) of statin dose-response curves and tested them for association with 144 pre-selected variants. Two large biobanks were used to construct dose-response curves for 2,026 (simvastatin) and 2,252 (atorvastatin) subjects. Atorvastatin was more efficacious, more potent, and demonstrated less inter-individual variability than simvastatin. A pharmacodynamic variant emerging from randomized trials (PRDM16) was associated with Emax for both. For atorvastatin, Emax was 51.7 mg/dl in subjects homozygous for the minor allele versus 75.0 mg/dl in those homozygous for the major allele. We also identified several loci associated with ED50. The extraction of rigorously defined traits from EMRs for pharmacogenetic studies represents a promising approach to further understanding of genetic factors contributing to drug response.
PMCID: PMC3944214  PMID: 24096969
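The roles of ED50 (potency) and Emax (efficacy) in the abstract above can be made concrete with the hyperbolic Emax model commonly used in pharmacology. This is a sketch under an assumption: the abstract does not state which functional form was fitted to the dose-response curves, and the function name and the values in the usage note are hypothetical.

```python
def emax_response(dose, emax, ed50):
    """Predicted effect at a given dose under the hyperbolic Emax model.

    emax is the maximal achievable effect (efficacy) and ed50 is the
    dose producing half of that effect (potency). The effect is in
    whatever unit emax is expressed in, for example mg/dl.
    """
    if dose < 0:
        raise ValueError("dose must be non-negative")
    if ed50 <= 0:
        raise ValueError("ed50 must be positive")
    return emax * dose / (ed50 + dose)
```

By construction, a dose equal to ED50 yields half of Emax, so `emax_response(10, 50, 10)` gives 25.0, and the predicted effect approaches Emax as the dose grows.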
6.  Stakeholder engagement: a key component of integrating genomic information into electronic health records 
Integrating genomic information into clinical care and the electronic health record can facilitate personalized medicine through genetically guided clinical decision support. Stakeholder involvement is critical to the success of these implementation efforts. Prior work on implementation of clinical information systems provides broad guidance to inform effective engagement strategies. We add to this evidence-based recommendations that are specific to issues at the intersection of genomics and the electronic health record. We describe stakeholder engagement strategies employed by the Electronic Medical Records and Genomics Network, a national consortium of US research institutions funded by the National Human Genome Research Institute to develop, disseminate, and apply approaches that combine genomic and electronic health record data. Through select examples drawn from sites of the Electronic Medical Records and Genomics Network, we illustrate a continuum of engagement strategies to inform genomic integration into commercial and homegrown electronic health records across a range of health-care settings. We frame engagement as activities to consult, involve, and partner with key stakeholder groups throughout specific phases of health information technology implementation. Our aim is to provide insights into engagement strategies to guide genomic integration based on our unique network experiences and lessons learned within the broader context of implementation research in biomedical informatics. On the basis of our collective experience, we describe key stakeholder practices, challenges, and considerations for successful genomic integration to support personalized medicine.
PMCID: PMC3909653  PMID: 24030437
electronic health records; genomics; health information technology; personalized medicine; stakeholder engagement; translational medical research
7.  CDKN2B-AS1 Genotype – Glaucoma Feature Correlations in Primary Open-Angle Glaucoma Patients from the United States 
American journal of ophthalmology  2012;155(2):342-353.e5.
To assess the association between single nucleotide polymorphisms (SNPs) of the gene region containing cyclin dependent kinase inhibitor 2B antisense noncoding RNA (CDKN2B-AS1) and glaucoma features among primary open-angle glaucoma (POAG) patients.
Retrospective observational case series.
We studied associations between ten CDKN2B-AS1 SNPs and glaucoma features among 976 POAG cases from the Glaucoma Genes and Environment (GLAUGEN) study and 1971 cases from the National Eye Institute Glaucoma Human Genetics Collaboration (NEIGHBOR) consortium. For each patient, we chose the feature from the eye with the higher value. We created cohort-specific multivariable models for glaucoma features and then meta-analyzed the results.
For nine of the ten protective CDKN2B-AS1 SNPs with minor alleles associated with reduced disease risk (e.g., the G allele at rs2157719), POAG patients carrying these minor alleles had smaller cup-disc ratio (0.05 units smaller per G allele at diagnosis; 95% CI: −0.08, −0.03; p=6.23E-05) despite having higher intraocular pressure (IOP) (0.70 mm Hg higher per G allele at DNA collection; 95% CI: 0.40, 1.00; P=5.45E-06). For the one adverse rs3217992 SNP with minor allele A associated with increased disease risk, POAG patients with A alleles had larger cup-disc ratio (0.05 units larger per A allele at diagnosis; 95% CI: 0.02, 0.07; P=4.74E-04) despite having lower IOP (−0.57 mm Hg per A allele at DNA collection; 95% CI: −0.84, −0.29; P=6.55E-05).
Alleles of CDKN2B-AS1 SNPs, which influence risk of developing POAG, also modulate optic nerve degeneration among POAG patients, underscoring the role of CDKN2B-AS1 in POAG.
PMCID: PMC3544983  PMID: 23111177
8.  Validation of PhenX measures in the personalized medicine research project for use in gene/environment studies 
The purpose of this paper is to describe the data collection efforts and validation of PhenX measures in the Personalized Medicine Research Project (PMRP) cohort.
Thirty-six measures were chosen from the PhenX Toolkit within the following domains: demographics; anthropometrics; alcohol, tobacco and other substances; cardiovascular; environmental exposures; cancer; psychiatric; neurology; and physical activity and physical fitness. Eligibility criteria for the current study were: being a living PMRP subject with a known address who had consented to future contact and was not currently living in a nursing home, and having GWAS data available from eMERGE I, in which age-related cataract, HDL, dementia and resistant hypertension were the primary phenotypes. These criteria biased the sample toward the older PMRP participants. The questionnaires were mailed twice. Data from the PhenX measures were compared with information from PMRP questionnaires and data from Marshfield Clinic electronic medical records.
Completed PhenX questionnaires were returned by 2271 subjects for a final response rate of 70%. The mean age reported on the PhenX questionnaire (73.1 years) was greater than that on the PMRP questionnaire (64.8 years) because the data were collected at different time points. The mean self-reported weight, and subsequently calculated BMI, were lower on the PhenX survey than the measured values at the time of enrollment into PMRP (PhenX means 173.5 pounds and BMI 28.2 kg/m2 versus PMRP 182.9 pounds and BMI 29.6 kg/m2). There was 95.3% agreement between the two questionnaires about having ever smoked at least 100 cigarettes. A total of 139 subjects (6.2%) indicated on the PhenX questionnaire that they had been told they had a stroke. Of them, only 15 (10.8%) had no electronic indication of a prior stroke or TIA. All of the age- and gender-specific 95% confidence limits around point estimates for major depressive episodes overlap and show that 31% of women aged 50–64 reported symptoms associated with a major depressive episode.
The approach employed resulted in a high response rate and valuable data for future gene/environment analyses. These results and high response rate highlight the utility of the PhenX Toolkit to collect valid phenotypic data that can be shared across groups to facilitate gene/environment studies.
PMCID: PMC3896802  PMID: 24423110
9.  KRAS Testing and Epidermal Growth Factor Receptor Inhibitor Treatment for Colorectal Cancer in Community Settings 
In metastatic colorectal cancer (mCRC), mutations in the KRAS gene predict poor response to epidermal growth factor receptor (EGFR) inhibitors. Clinical treatment guidelines now recommend KRAS testing if EGFR inhibitors are considered. Our study investigates the clinical uptake and utilization of KRAS testing.
We included 1,188 patients with mCRC diagnosed from 2004 to 2009, from seven integrated health care delivery systems with a combined membership of 5.5 million. We used electronic medical records and targeted manual chart review to capture the complexity and breadth of real-world clinical oncology care.
Overall, 428 patients (36%) received KRAS testing during their clinical care, and 266 (22%) were treated with EGFR inhibitors. Age at diagnosis (p=0.0034), comorbid conditions (p=0.0316), and survival time from diagnosis (p<0.0001) influence KRAS testing and EGFR inhibitor prescribing. The proportion who received KRAS testing increased from 7% to 97% for those treated in 2006 and 2010, respectively, and 83% of all treated patients had a KRAS wild type genotype. Most patients with a KRAS mutation (86%) were not treated with EGFR inhibitors. The interval between mCRC diagnosis and receipt of KRAS testing decreased from 26 months (2006) to 10 months (2009).
These findings demonstrate rapid uptake and incorporation of this predictive biomarker into clinical oncology care.
In this delivery setting, KRAS testing is widely used to guide treatment decisions with EGFR inhibitors in patients with mCRC. An important future research goal is to evaluate utilization of KRAS testing in other delivery settings in the US.
PMCID: PMC3567775  PMID: 23155138
biomarker; utilization; colorectal neoplasms; managed care programs
10.  Mechanistic Phenotypes: An Aggregative Phenotyping Strategy to Identify Disease Mechanisms Using GWAS Data 
PLoS ONE  2013;8(12):e81503.
A single mutation can alter cellular and global homeostatic mechanisms and give rise to multiple clinical diseases. We hypothesized that these disease mechanisms could be identified using low minor allele frequency (MAF<0.1) non-synonymous SNPs (nsSNPs) associated with “mechanistic phenotypes”, comprised of collections of related diagnoses. We studied two mechanistic phenotypes: (1) thrombosis, evaluated in a population of 1,655 African Americans; and (2) four groupings of cancer diagnoses, evaluated in 3,009 white European Americans. We tested associations between nsSNPs represented on GWAS platforms and mechanistic phenotypes ascertained from electronic medical records (EMRs), and sought enrichment in functional ontologies across the top-ranked associations. We used a two-step analytic approach whereby nsSNPs were first sorted by the strength of their association with a phenotype. We tested associations using two reverse genetic models and standard additive and recessive models. In the second step, we employed a hypothesis-free ontological enrichment analysis using the sorted nsSNPs to identify functional mechanisms underlying the diagnoses comprising the mechanistic phenotypes. The thrombosis phenotype was solely associated with ontologies related to blood coagulation (Fisher's p = 0.0001, FDR p = 0.03), driven by the F5, P2RY12 and F2RL2 genes. For the cancer phenotypes, the reverse genetics models were enriched in DNA repair functions (p = 2×10−5, FDR p = 0.03) (POLG/FANCI, SLX4/FANCP, XRCC1, BRCA1, FANCA, CHD1L) while the additive model showed enrichment related to chromatid segregation (p = 4×10−6, FDR p = 0.005) (KIF25, PINX1). We were able to replicate nsSNP associations for POLG/FANCI, BRCA1, FANCA and CHD1L in independent data sets. Mechanism-oriented phenotyping using collections of EMR-derived diagnoses can elucidate fundamental disease mechanisms.
PMCID: PMC3861317  PMID: 24349080
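The ontological enrichment step in the abstract above reports Fisher's p-values. A one-sided Fisher's exact test on a 2×2 table is equivalent to an upper-tail hypergeometric probability, which the sketch below computes directly. The table layout (top-ranked genes versus membership in an ontology category), the function name, and the numbers in the usage note are assumptions for illustration; the abstract does not describe how its tables were constructed.

```python
from math import comb


def enrichment_p_value(hits, selected, category_size, total):
    """Upper-tail hypergeometric probability P(X >= hits).

    total:          number of genes in the background set
    category_size:  number of background genes in the ontology category
    selected:       number of top-ranked genes examined
    hits:           number of selected genes that fall in the category
    """
    if not (0 <= hits <= selected <= total and 0 <= category_size <= total):
        raise ValueError("inconsistent counts")
    upper = min(selected, category_size)
    denominator = comb(total, selected)
    # Sum the probabilities of observing `hits` or more category members.
    numerator = sum(
        comb(category_size, i) * comb(total - category_size, selected - i)
        for i in range(hits, upper + 1)
    )
    return numerator / denominator
```

With a hypothetical background of 10 genes, 3 of them in the category, drawing 3 genes that all fall in the category has probability 1/120 under the null of no enrichment.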
11.  Oncologists' attitudes toward KRAS testing: a multisite study 
Cancer Medicine  2013;2(6):881-888.
Recent discoveries promise increasingly to help oncologists individually tailor anticancer therapy to their patients’ molecular tumor characteristics. One such promising molecular diagnostic is Kirsten ras (KRAS) tumor mutation testing for metastatic colorectal cancer (mCRC) patients. In the current study, we examined how and why physicians adopt KRAS testing and how they subsequently utilize the information when discussing treatment strategies with patients. We conducted 34 semi-structured in-person or telephone interviews with oncologists from seven different health plans. Each interview was audiotaped, transcribed, and coded using qualitative research methods. Information and salient themes relating to the research questions were summarized for each interview. All of the oncologists in this study reported using the KRAS test at the time of the interview. Most appeared to have adopted the test rapidly, within 6 months of the publication of National Clinical Guidelines. Oncologists chose to administer the test at various time points, although the majority ordered the test at the time their patient was diagnosed with mCRC. While oncologists expressed a range of opinions about the KRAS test, there was a general consensus that the test was useful and provided benefits to mCRC patients. The rapid adoption and enthusiasm for KRAS suggests that these types of tests may be filling an important informational need for oncologists when making treatment decisions. Future research should focus on the informational needs of patients around this test and whether patients feel informed or confident with their physicians’ use of these tests to determine treatment access.
PMCID: PMC3892392  PMID: 24403261
Cancer genetics; colorectal cancer; psychosocial studies
12.  The use of dietary supplements and their association with blood pressure in a large Midwestern cohort 
There have been numerous studies assessing the association of diet and blood pressure, but little is known about the association between less commonly used nutritional supplements and blood pressure. The purpose of this study was to quantify the use of dietary supplements and their potential association with blood pressure in a large population-based cohort of adults in the Midwest.
The Personalized Medicine Research Project cohort was the population source for the current study. The current study includes subjects with Dietary History Questionnaire (DHQ) data available as well as at least one clinical blood pressure measurement recorded in their electronic medical record. After excluding extreme outlying measurements, median systolic and diastolic blood pressure measurements were calculated for each individual and were compared for subjects who did and did not report taking one of a list of 37 different supplements listed on the DHQ more than once per week over the previous 12 months.
9,732 subjects had both blood pressure and DHQ data available. They ranged in age from 18 to 98 years (mean 56 years) and 3,625 (37%) were male. Nine of 37 supplements showed evidence for association with blood pressure: coenzyme Q10, fish oil, iron, bilberry, echinacea, evening primrose oil, garlic, goldenseal and milk thistle. With the exception of the mineral iron, mean systolic and diastolic blood pressures were higher for users of the specific supplements than non-users.
These results should not be interpreted as causal, nor can the direction of the association be assumed to be correct because the temporality of the association is unknown. We hope the observed significant associations will foster future research to evaluate blood pressure effects of dietary supplements.
PMCID: PMC3924237  PMID: 24283381
Blood pressure; Dietary supplements
13.  Lack of Association Between Polymorphisms in the Prostaglandin F2α Receptor (PTGFR) and Solute Carrier Organic Anion Transporter Family 2A1 (SLCO2A1) Genes and Intraocular Pressure Response to Prostaglandin Analogs 
Ophthalmic genetics  2011;33(2):10.3109/13816810.2011.628357.
To evaluate the association between variants in the prostaglandin F2α receptor (PTGFR) and solute carrier organic anion transporter family 2A1 (SLCO2A1) genes and IOP response to prostaglandin analogs.
The medical records of subjects with previously diagnosed open angle glaucoma or ocular hypertension were searched for intraocular pressure measurements before and after prescriptions of prostaglandin analogs. Stored DNA samples were genotyped for the following SNPs: rs3753380 (promoter region) and rs3766355 (intronic region) of the prostaglandin F2α receptor gene, and rs34550074 (Ala396Thr) of SLCO2A1. The mean change in IOP by genotype was measured.
Prostaglandin analogs were prescribed to 267 subjects; 242 (204 right eyes, 205 left eyes) met the inclusion/exclusion criteria for the current study. There was no significant association between genotype and IOP response to prostaglandin analogs (p=0.48, p=0.54, p=0.90).
In summary, we found no indication for an association between SNPs in the prostaglandin F2α receptor gene or SLCO2A1 and IOP response to prostaglandin analogs in a population of European descent.
PMCID: PMC3832133  PMID: 22060278
glaucoma; pharmacogenetics; intraocular pressure; prostaglandin analog
14.  Genetic Variants Improve Breast Cancer Risk Prediction on Mammograms 
Several recent genome-wide association studies have identified genetic variants associated with breast cancer. However, how much these genetic variants may help advance breast cancer risk prediction based on other clinical features, like mammographic findings, is unknown. We conducted a retrospective case-control study, collecting mammographic findings and high-frequency/low-penetrance genetic variants from an existing personalized medicine data repository. A Bayesian network was developed using Tree Augmented Naive Bayes (TAN) by training on the mammographic findings, with and without the 22 genetic variants collected. We analyzed the predictive performance using the area under the ROC curve, and found that the genetic variants significantly improved breast cancer risk prediction on mammograms. We also identified the interaction effect between the genetic variants and collected mammographic findings in an attempt to link genotype to mammographic phenotype to better understand disease patterns, mechanisms, and/or natural history.
PMCID: PMC3900221  PMID: 24551380
15.  The Electronic Medical Records and Genomics (eMERGE) Network: Past, Present and Future 
The Electronic Medical Records and Genomics (eMERGE) Network is a National Human Genome Research Institute (NHGRI)-funded consortium engaged in the development of methods and best-practices for utilizing the Electronic Medical Record (EMR) as a tool for genomic research. Now in its sixth year, its second funding cycle and comprising nine research groups and a coordinating center, the network has played a major role in validating the concept that clinical data derived from EMRs can be used successfully for genomic research. Current work is advancing knowledge in multiple disciplines at the intersection of genomics and healthcare informatics, particularly electronic phenotyping, genome-wide association studies, genomic medicine implementation and the ethical and regulatory issues associated with genomics research and returning results to study participants. Here we describe the evolution, accomplishments, opportunities and challenges of the network since its inception as a five-group consortium focused on genotype-phenotype associations for genomic discovery to its current form as a nine-group consortium pivoting towards implementation of genomic medicine.
PMCID: PMC3795928  PMID: 23743551
electronic medical records; personalized medicine; genome-wide association studies; genetics and genomics; collaborative research
16.  Prediction of breast cancer risk by genetic risk factors, overall and by hormone receptor status 
Journal of medical genetics  2012;49(9):601-608.
There is increasing interest in adding common genetic variants identified through genome wide association studies (GWAS) to breast cancer risk prediction models. First results from such models showed modest benefits in terms of risk discrimination. Heterogeneity of breast cancer as defined by hormone-receptor status has not been considered in this context. In this study we investigated the predictive capacity of 32 GWAS-detected common variants for breast cancer risk, alone and in combination with classical risk factors, and for tumors with different hormone receptor status.
Material and Methods
Within the Breast and Prostate Cancer Cohort Consortium (BPC3), we analyzed 6009 invasive breast cancer cases and 7827 matched controls of European ancestry, with data on classical breast cancer risk factors and 32 common gene variants identified through GWAS. Discriminatory ability with respect to breast cancer of specific hormone receptor-status was assessed with the age- and cohort-adjusted concordance statistic (AUROCa). Absolute risk scores were calculated with external reference data. Integrated discrimination improvement (IDI) was used to measure improvements in risk prediction.
We found a small but steady increase in discriminatory ability with increasing numbers of genetic variants included in the model (difference in AUROCa going from 2.7 to 4%). Discriminatory ability for all models varied strongly by hormone receptor status.
Discussion and Conclusion
Adding information on common polymorphisms provides small but statistically significant improvements in the quality of breast cancer risk prediction models. We consistently observed better performance for receptor positive cases, but the gain in discriminatory quality is not sufficient for clinical application.
PMCID: PMC3793888  PMID: 22972951
breast cancer; risk prediction; genetic factors; hormone receptor status
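The concordance statistic used as the discrimination measure in the abstract above can be sketched as the fraction of case-control pairs in which the case receives the higher risk score, with ties counted as one half. This sketch is the plain, unadjusted statistic: the age and cohort adjustment behind the study's AUROCa is not reproduced here, and the function name and the scores in the usage note are hypothetical.

```python
def concordance_statistic(case_scores, control_scores):
    """Unadjusted concordance statistic (area under the ROC curve).

    Compares every case score with every control score. A pair counts
    1 if the case scores higher, 0.5 if the scores tie, and 0 otherwise.
    """
    n_pairs = len(case_scores) * len(control_scores)
    if n_pairs == 0:
        raise ValueError("need at least one case and one control")

    concordant = 0.0
    for case in case_scores:
        for control in control_scores:
            if case > control:
                concordant += 1.0
            elif case == control:
                concordant += 0.5
    return concordant / n_pairs
```

For hypothetical case scores `[0.9, 0.8]` and control scores `[0.1, 0.8]`, three pairs are concordant and one is tied, giving 3.5 / 4 = 0.875. A value of 0.5 corresponds to no discrimination.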
17.  High Density GWAS for LDL Cholesterol in African Americans Using Electronic Medical Records Reveals a Strong Protective Variant in APOE 
Only one LDL-C GWAS has been reported in African Americans. We performed a GWAS of LDL-C in African Americans using data extracted from electronic medical records (EMR) in the eMERGE network. African Americans were genotyped on the Illumina 1M chip. All LDL-C measurements, prescriptions, and diagnoses of concomitant disease were extracted from EMR. We created two analytic datasets: one dataset having median LDL-C calculated after the exclusion of some lab values based on co-morbidities and medication (n = 618) and another dataset having median LDL-C calculated without any exclusions (n = 1249). The SNP rs7412 in APOE was strongly associated with LDL-C at levels of GWAS significance in both datasets (p < 5 × 10−8). In the dataset with exclusions, a decrease of 20.0 mg/dl per minor allele was observed. The effect size was attenuated (12.3 mg/dl) in the dataset without any lab values excluded. Although other signals in APOE have been detected in previous GWAS, this large and important SNP association has not been well detected in large GWAS because rs7412 was not included on many genotyping arrays. Use of median LDL-C extracted from EMR after exclusions for medications and co-morbidities increased the percentage of trait variance explained by genetic variation.
PMCID: PMC3521536  PMID: 23067351
GWAS; LDL; electronic medical records
18.  Use of an electronic medical record to characterize cases of intermediate statin-induced muscle toxicity 
Preventive cardiology  2009;12(2):88-94.
Statin use can be accompanied by a variety of musculoskeletal complaints. We describe the clinical characteristics of case subjects experiencing adverse statin-induced musculoskeletal symptoms within a large, population based cohort in Central Wisconsin. Case status was determined based upon elevated serum creatine kinase (CK) levels and the presence of at least one physician note reflecting an increased index of suspicion for statin intolerance. From the medical records of nearly 2 million unique patients, we identified more than 20,000 potential study subjects (∼1%) having CK data and at least one exposure to a statin drug. Manual screening was completed on 2,227 subjects with CK levels in the upper 10th percentile. Of those screened, 267 met inclusion criteria (12.0% eligibility), and 218 agreed to participate in a retrospective study characterizing the risk determinants of statin-induced muscle toxicity. Three categorical pain variables were graded retrospectively (distribution, location, and severity of pain). The presenting complaints of these case subjects were extremely heterogeneous. The number of subjects with a compelling pain syndrome (diffuse, proximal muscle pain of high intensity) increased at higher serum CK levels; the number of subjects with indeterminate pain variables decreased at higher serum CK levels. The lines reflecting these relationships cross at a CK level of approximately 1,175 U/l, approximately half the threshold level needed to make a clinical diagnosis of “myopathy” (i.e., CK > 10-fold upper limit).
PMCID: PMC3773543  PMID: 19476582
19.  Return of Individual Research Results from Genome-wide Association Studies: Experience of the Electronic Medical Records & Genomics (eMERGE) Network 
Return of individual genetic results to research participants, including participants in archives and biorepositories, is receiving increased attention. However, few groups have deliberated on specific results or weighed deliberations against relevant local contextual factors.
The Electronic Medical Records and GEnomics (eMERGE) network, which includes five biorepositories conducting genome-wide association studies, convened a Return of Results Oversight Committee (RROC) to identify potentially returnable results. Network-wide deliberations were then brought to local constituencies for final decision-making.
Defining results that should be considered for return required input from clinicians with relevant expertise and much deliberation. The RROC identified two sex chromosomal anomalies, Klinefelter Syndrome and Turner Syndrome, as well as homozygosity for Factor V Leiden, as findings that could warrant reporting. Views about returning HFE gene mutations associated with hemochromatosis were mixed due to low penetrance. Review of EMRs suggested that most participants with detected abnormalities were unaware of these findings. Local considerations relevant to return varied and, to date, four sites have elected not to return findings (return was not possible at one site).
The eMERGE experience reveals the complexity of return of results decision-making and provides a potential deliberative model for adoption in other collaborative contexts.
PMCID: PMC3723451  PMID: 22361898
Result return; biorepository; electronic medical records; deliberation; context
20.  Enhancing the Power of Genetic Association Studies through the Use of Silver Standard Cases Derived from Electronic Medical Records 
PLoS ONE  2013;8(6):e63481.
The feasibility of using imperfectly phenotyped “silver standard” samples identified from electronic medical record diagnoses is considered in genetic association studies when these samples might be combined with an existing set of samples phenotyped with a gold standard technique. An analytic expression is derived for the power of a chi-square test of independence using either research-quality case/control samples alone, or augmented with silver standard data. The subset of the parameter space where inclusion of silver standard samples increases statistical power is identified. A case study of dementia subjects identified from electronic medical records from the Electronic Medical Records and Genomics (eMERGE) network, combined with subjects from two studies specifically targeting dementia, verifies these results.
PMCID: PMC3677889  PMID: 23762230
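The trade-off described in this abstract can be illustrated with a rough numerical sketch. The code below is not the analytic expression derived in the paper. It is a normal approximation to the power of a 1-df allelic chi-square test, assuming that silver standard cases are a mixture of true cases and misclassified controls. The function name `allelic_test_power`, the `silver_ppv` parameter, and all allele frequencies and sample sizes are hypothetical choices made for illustration.

```python
from statistics import NormalDist

_STD_NORMAL = NormalDist()


def allelic_test_power(p_case, p_control, n_gold_cases, n_controls,
                       n_silver_cases=0, silver_ppv=1.0, alpha=0.05):
    """Approximate power of a two-sided 1-df allelic test (normal approximation).

    Assumption (not from the paper): a fraction `silver_ppv` of silver standard
    cases are true cases; the rest carry the control allele frequency.
    """
    # Allele frequency among silver standard "cases" is diluted by misclassification.
    p_silver = silver_ppv * p_case + (1.0 - silver_ppv) * p_control

    # Each subject contributes two alleles.
    alleles_cases = 2 * (n_gold_cases + n_silver_cases)
    alleles_controls = 2 * n_controls

    # Frequency in the combined (gold + silver) case group.
    p_cases = (2 * n_gold_cases * p_case
               + 2 * n_silver_cases * p_silver) / alleles_cases

    # Pooled frequency and standard error of the frequency difference.
    p_bar = ((alleles_cases * p_cases + alleles_controls * p_control)
             / (alleles_cases + alleles_controls))
    se = (p_bar * (1 - p_bar)
          * (1 / alleles_cases + 1 / alleles_controls)) ** 0.5

    shift = abs(p_cases - p_control) / se
    z_crit = _STD_NORMAL.inv_cdf(1 - alpha / 2)
    return _STD_NORMAL.cdf(shift - z_crit) + _STD_NORMAL.cdf(-shift - z_crit)


# Hypothetical scenario: 500 gold cases, 500 controls, 1000 silver cases.
gold_only = allelic_test_power(0.35, 0.30, 500, 500)
clean_silver = allelic_test_power(0.35, 0.30, 500, 500,
                                  n_silver_cases=1000, silver_ppv=1.0)
noisy_silver = allelic_test_power(0.35, 0.30, 500, 500,
                                  n_silver_cases=1000, silver_ppv=0.2)
print(gold_only, clean_silver, noisy_silver)
```

Under these assumed inputs, adding accurately labeled silver cases raises the approximate power, while adding heavily misclassified ones lowers it below the gold-only design. This mirrors the abstract's point that only part of the parameter space benefits from augmentation.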
21.  Development of an optical character recognition pipeline for handwritten form fields from an electronic health record 
Although the penetration of electronic health records is increasing rapidly, much of the historical medical record is only available in handwritten notes and forms, which require labor-intensive, human chart abstraction for some clinical research. The few previous studies on automated extraction of data from these handwritten notes have focused on monolithic, custom-developed recognition systems or third-party systems that require proprietary forms.
We present an optical character recognition processing pipeline, which leverages the capabilities of existing third-party optical character recognition engines, and provides the flexibility offered by a modular custom-developed system. The system was configured and run on a selected set of form fields extracted from a corpus of handwritten ophthalmology forms.
The processing pipeline allowed multiple configurations to be run, with the optimal configuration consisting of the Nuance and LEADTOOLS engines running in parallel with a positive predictive value of 94.6% and a sensitivity of 13.5%.
While limitations exist, preliminary experience from this project yielded insights on the generalizability and applicability of integrating multiple, inexpensive general-purpose third-party optical character recognition engines in a modular pipeline.
PMCID: PMC3392858  PMID: 21890871
Luke; bioinformatics
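For readers less familiar with the two metrics reported in this abstract, the short sketch below shows how positive predictive value and sensitivity are computed from confusion counts. The counts are invented for illustration and are not the study's data. They are chosen only to show that a pipeline can be highly precise while recovering a small share of the true field values.

```python
def positive_predictive_value(true_pos, false_pos):
    """Fraction of values the pipeline emitted that were correct."""
    return true_pos / (true_pos + false_pos)


def sensitivity(true_pos, false_neg):
    """Fraction of all true values that the pipeline recovered."""
    return true_pos / (true_pos + false_neg)


# Hypothetical counts (not from the paper): 35 correct extractions,
# 2 incorrect extractions, 215 fields the pipeline failed to extract.
ppv = positive_predictive_value(true_pos=35, false_pos=2)
sens = sensitivity(true_pos=35, false_neg=215)
print(f"PPV = {ppv:.3f}, sensitivity = {sens:.3f}")
```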
22.  Alcohol, Genetics and Risk of Breast Cancer in the Prostate, Lung, Colorectal and Ovarian Cancer (PLCO) Screening Trial 
We tested the hypothesis that genes involved in the alcohol oxidation pathway modify the association between alcohol intake and breast cancer.
Subjects were women aged 55–74 at baseline from the screening arm of the Prostate, Lung, Colorectal and Ovarian (PLCO) Cancer Screening Trial. Incident breast cancers were identified through annual health surveys. Controls were frequency matched to cases by age and year of entry into the trial. A self-administered food frequency questionnaire queried frequency and usual serving size of beer, wine or wine coolers and liquor. Three SNPs in genes in the alcohol metabolism pathway were genotyped: alcohol dehydrogenase 2, alcohol dehydrogenase 3 and CYP2E1.
The study included 1041 incident breast cancer cases and 1070 controls. In comparison to non-drinkers, the intake of any alcohol significantly increased the risk of breast cancer, and this risk increased with each category of daily alcohol intake (OR=2.01, 95% CL=1.14, 3.53 for women who drank three or more standard drinks per day). Stratification by genotype revealed significant gene/environment interactions. For the ADH1B gene, there were statistically significant associations between all levels of alcohol intake and risk of breast cancer (all OR>1.34 and all lower CL >1.01), while for women with the GA or AA genotype, there were no significant associations between alcohol intake and risk of breast cancer.
Alcohol intake, genes involved in alcohol metabolism and their interaction increase the risk of breast cancer in post-menopausal women.
This information could be useful for primary care providers to personalize information about breast cancer risk reduction.
PMCID: PMC3584637  PMID: 22331481
breast cancer; alcohol; metabolizing enzyme; genetics; risk factors
23.  High-Dimensional Structured Feature Screening Using Binary Markov Random Fields 
Feature screening is a useful feature selection approach for high-dimensional data when the goal is to identify all the features relevant to the response variable. However, common feature screening methods do not take into account the correlation structure of the covariate space. We propose the concept of a feature relevance network, a binary Markov random field to represent the relevance of each individual feature by potentials on the nodes, and represent the correlation structure by potentials on the edges. By performing inference on the feature relevance network, we can accordingly select relevant features. Our algorithm does not yield sparsity, which is different from the particular popular family of feature selection approaches based on penalized least squares or penalized pseudo-likelihood. We give one concrete algorithm under this framework and show its superior performance over common feature selection methods in terms of prediction error and recovery of the truly relevant features on real-world data and synthetic data.
PMCID: PMC3630518  PMID: 23606924
24.  Evaluation of polymorphisms in the sulfonamide detoxification genes NAT2, CYB5A, and CYB5R3 in patients with sulfonamide hypersensitivity 
Pharmacogenetics and genomics  2012;22(10):733-740.
To determine whether polymorphisms in the sulfonamide detoxification genes, CYB5A (encoding cytochrome b5), CYB5R3 (encoding cytochrome b5 reductase), or NAT2 (encoding N-acetyltransferase 2) were over-represented in patients with delayed sulfonamide drug hypersensitivity, compared to control patients that tolerated a therapeutic course of trimethoprim-sulfamethoxazole without adverse event.
DNA from 99 non-immunocompromised patients with sulfonamide hypersensitivity who were identified from the Personalized Medicine Research Project at the Marshfield Clinic, and from 99 age-, race-, and gender-matched drug-tolerant controls, was genotyped for four CYB5A and five CYB5R3 polymorphisms, and for all coding NAT2 SNPs.
CYB5A and CYB5R3 SNPs were found at low allele frequencies (less than 3–4%), which did not differ between hypersensitive and tolerant patients. NAT2 allele and haplotype frequencies, as well as inferred NAT2 phenotypes, also did not differ between groups (60% vs. 59% slow acetylators). Finally, no difference in NAT2 status was found in a subset of patients with more severe hypersensitivity signs (drug reaction with eosinophilia and systemic symptoms; DRESS) compared to tolerant patients.
We found no evidence for a substantial involvement of these 9 CYB5A or CYB5R3 polymorphisms in sulfonamide hypersensitivity risk, although minor effects cannot be completely ruled out. Despite careful medical record review and full re-sequencing of the NAT2 coding region, we found no association of NAT2 coding alleles with sulfonamide hypersensitivity (predominantly cutaneous eruptions) in this adult Caucasian population.
PMCID: PMC3619396  PMID: 22850190
sulfamethoxazole; potentiated sulfonamides; drug hypersensitivity; N-acetyltransferase; cytochrome b5; hydroxylamine
Investigating the association between biobank derived genomic data and the information of linked electronic health records (EHRs) is an emerging area of research for dissecting the architecture of complex human traits, where cases and controls for study are defined through the use of electronic phenotyping algorithms deployed in large EHR systems. For our study, 2580 cataract cases and 1367 controls were identified within the Marshfield Personalized Medicine Research Project (PMRP) Biobank and linked EHR, which is a member of the NHGRI-funded electronic Medical Records and Genomics (eMERGE) Network. Our goal was to explore potential gene-gene and gene-environment interactions within these data for 529,431 single nucleotide polymorphisms (SNPs) with minor allele frequency > 1%, in order to explore higher level associations with cataract risk beyond investigations of single SNP-phenotype associations. To build our SNP-SNP interaction models we utilized a prior-knowledge driven filtering method called Biofilter to minimize the multiple testing burden of exploring the vast array of interaction models possible from our extensive number of SNPs. Using the Biofilter, we developed 57,376 prior-knowledge directed SNP-SNP models to test for association with cataract status. We selected models that required 6 sources of external domain knowledge. We identified 5 statistically significant models with an interaction term with p-value < 0.05, as well as an overall model with p-value < 0.05 associated with cataract status. We also conducted gene-environment interaction analyses for all GWAS SNPs and a set of environmental factors from the PhenX Toolkit: smoking, UV exposure, and alcohol use; these environmental factors have been previously associated with the formation of cataracts. We found a total of 288 models that exhibit an interaction term with a p-value ≤ 1×10−4 associated with cataract status. 
Our results show these approaches enable advanced searches for epistasis and gene-environment interactions beyond GWAS, and that the EHR based approach provides an additional source of data for seeking these advanced explanatory models of the etiology of complex disease/outcome such as cataracts.
PMCID: PMC3615413  PMID: 23424120
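The scale of the multiple testing burden mentioned in this abstract can be made concrete with a little arithmetic. The sketch below compares the number of exhaustive pairwise SNP-SNP models for the 529,431 SNPs with the 57,376 Biofilter-directed models reported above. The variable names are illustrative, and the comparison counts models only; it says nothing about how Biofilter selects them.

```python
from math import comb

n_snps = 529_431            # SNPs with minor allele frequency > 1% (from the abstract)
biofilter_models = 57_376   # prior-knowledge directed SNP-SNP models (from the abstract)

# Every unordered pair of distinct SNPs would be one model in an exhaustive scan.
exhaustive_models = comb(n_snps, 2)

# How many times smaller the filtered search space is.
reduction_factor = exhaustive_models / biofilter_models

print(f"exhaustive pairwise models: {exhaustive_models:.3e}")
print(f"reduction factor from filtering: {reduction_factor:.3e}")
```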

