The promise of “personalized medicine” guided by an understanding of each individual’s genome has been fostered by increasingly powerful and economical methods to acquire clinically relevant features. We describe the operational implementation of prospective genotyping linked to an advanced clinical decision support system to guide individualized healthcare in a large academic health center. This approach to personalized medicine includes patient and healthcare provider engagement, identifying relevant genetic variation for implementation, assay reliability, point-of-care decision support, and necessary institutional investments. In one year, approximately 3,000 patients, most scheduled for cardiac catheterization, were genotyped on a multiplexed platform that included CYP2C19 variants modulating response to the widely used antiplatelet drug clopidogrel. These data are deposited into the electronic medical record, and point-of-care decision support is deployed when clopidogrel is prescribed for patients with variant genotypes. Establishing programs such as this is a first step toward implementing and evaluating strategies for personalized medicine.
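The point-of-care rule described above (fire advice when clopidogrel is prescribed for a patient with a variant genotype) can be sketched as a small function. This is a hypothetical illustration, not the deployed system: the function name, the diplotype representation, and the choice of CYP2C19 *2 and *3 as the only alleles that trigger advice are assumptions.

```python
from typing import Optional, Tuple

# Assumption: *2 and *3 are treated as the loss-of-function alleles that
# trigger advice; a deployed rule set may recognize more alleles.
LOSS_OF_FUNCTION = {"*2", "*3"}


def clopidogrel_advisory(drug: str, diplotype: Tuple[str, str]) -> Optional[str]:
    """Return an advisory if clopidogrel is ordered for a carrier of at least
    one CYP2C19 loss-of-function allele; otherwise return None."""
    if drug.strip().lower() != "clopidogrel":
        return None  # rule only fires on clopidogrel orders
    lof = sum(allele in LOSS_OF_FUNCTION for allele in diplotype)
    if lof == 0:
        return None  # no variant genotype, no advice
    phenotype = "poor metabolizer" if lof == 2 else "intermediate metabolizer"
    return (f"CYP2C19 {diplotype[0]}/{diplotype[1]} ({phenotype}): reduced "
            "clopidogrel activation expected; consider alternative therapy.")
```

For example, a *1/*2 diplotype yields an intermediate-metabolizer advisory, while *1/*1, or any drug other than clopidogrel, yields None. Keeping the genotype in the record and evaluating it only at ordering time mirrors the abstract's separation of genotype storage from point-of-care advice.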
Drug-Drug Interactions; Personalized Medicine; Pharmacogenetics; Translational Medicine; Adverse Drug Reactions
To identify common genetic variants influencing red blood cell (RBC) traits.
Patients and Methods
We performed a genomewide association study, from June 2008 through July 2011, of hemoglobin, hematocrit, RBC count, mean corpuscular volume, mean corpuscular hemoglobin, and mean corpuscular hemoglobin concentration in 12,486 patients of European ancestry from the electronic MEdical Records and Genomics (eMERGE) network. We developed an electronic medical record–based algorithm that included individuals with RBC measurements obtained during clinical care and excluded values measured in the setting of hematopoietic disorders, comorbid conditions, medications known to affect RBC production, or a recent history of blood loss.
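The inclusion and exclusion logic of such an EMR-based phenotype algorithm can be sketched as a filter over measurements. This is a minimal sketch under stated assumptions: the field names, the 90-day look-back window for "recent" drug exposure or blood loss, and summarizing each patient by the median of retained values are all hypothetical, not taken from the study.

```python
from dataclasses import dataclass, field
from datetime import date, timedelta
from statistics import median
from typing import Dict, List

# Assumed look-back window for "recent" medication exposure or blood loss.
LOOKBACK = timedelta(days=90)


@dataclass
class Measurement:
    patient_id: str
    trait: str  # e.g. "HGB", "HCT", "MCV"
    value: float
    taken_on: date


@dataclass
class PatientContext:
    # Hypothetical exclusion fields derived from codes, labs, and medication lists.
    hematopoietic_disorder: bool = False
    excluding_comorbidity: bool = False
    rbc_drug_dates: List[date] = field(default_factory=list)
    blood_loss_dates: List[date] = field(default_factory=list)


def recent(events: List[date], when: date) -> bool:
    """True if any event occurred on, or up to LOOKBACK before, `when`."""
    return any(timedelta(0) <= when - e <= LOOKBACK for e in events)


def eligible_values(measurements: List[Measurement],
                    contexts: Dict[str, PatientContext],
                    trait: str) -> Dict[str, float]:
    kept: Dict[str, List[float]] = {}
    for m in measurements:
        if m.trait != trait:
            continue
        ctx = contexts.get(m.patient_id, PatientContext())
        if ctx.hematopoietic_disorder or ctx.excluding_comorbidity:
            continue  # patient-level exclusion
        if recent(ctx.rbc_drug_dates, m.taken_on) or recent(ctx.blood_loss_dates, m.taken_on):
            continue  # value-level exclusion
        kept.setdefault(m.patient_id, []).append(m.value)
    # Assumption: one summary value per patient, here the median of retained values.
    return {pid: median(vals) for pid, vals in kept.items()}
```

Separating patient-level exclusions (disorders, comorbidities) from value-level exclusions (exposures near the measurement date) lets a patient contribute values from unaffected periods.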
We identified 4 new genetic loci and replicated 11 loci previously reported to be associated with one or more RBC traits in individuals of European ancestry. Notably, genes present in 3 of the 4 newly identified loci (THRB, PTPLAD1, CDT1) and in 6 of the 11 replicated loci (KLF1, ALDH8A1, CCND3, SPTA1, FBXO7, TFR2/EPO) are implicated in erythroid differentiation and regulation of cell cycle in hematopoietic stem cells.
Genes in the erythroid differentiation and cell cycle regulation pathways influence interindividual variation in RBC indices. Our results provide insights into the molecular basis underlying variation in RBC traits.
eMERGE, electronic MEdical Records and GEnomics; EMMAX, mixed-model association-expedited; EMR, electronic medical record; eQTL, expression quantitative trait locus; GHC, Group Health Cooperative–University of Washington; GWAS, genomewide association study; HCT, hematocrit; HGB, hemoglobin; IBS, identity-by-state; LD, linkage disequilibrium; MC, Marshfield Clinic; MCH, mean corpuscular hemoglobin; MCHC, mean corpuscular hemoglobin concentration; MCV, mean corpuscular volume; MIM, Mendelian Inheritance in Man; NU, Northwestern University; RBC, red blood cell; SNP, single-nucleotide polymorphism; VUMC, Vanderbilt University Medical Center
StarBRITE is a one-stop, web-based research portal designed to meet the day-to-day needs of the Vanderbilt University and Meharry Medical College research community during the planning and conduct of research studies. StarBRITE serves as the main online location for research support, addressing needs such as identifying and locating resources, identifying experts, guidance for regulatory applications and approvals, regulatory assistance, funding requests, and research data planning and collection; it also serves as a central repository for educational offerings. To date, StarBRITE has recorded 590,038 hits from 6,582 cumulative users. We present here StarBRITE's design objectives, details of its technical infrastructure and system components, a status report and activity metrics for the first 2.75 years of operation, and lessons learned while organizing, launching, and refining the portal.
Biomedical Informatics; Clinical Research; Translational Research; Scientific Portfolio Management; Researcher Portal; Research Services
Systematic study of clinical phenotypes is important for better understanding the genetic basis of human diseases and for more effective gene-based disease management. A key requirement for such studies is the standardized representation of phenotype data using common data elements (CDEs) and controlled biomedical vocabularies. In this study, the authors analyzed how amenable a limited subset of phenotypic data is to common definition and standardized collection, and how adopting such definitions in large-scale epidemiological and genome-wide studies can substantially facilitate cross-study analysis.
The authors mapped phenotype data dictionaries from five different eMERGE (Electronic Medical Records and Genomics) Network sites studying multiple diseases, such as peripheral arterial disease and type 2 diabetes. For mapping, standardized terminology and metadata repository resources, such as the caDSR (Cancer Data Standards Registry and Repository) and SNOMED CT (Systematized Nomenclature of Medicine Clinical Terms), were used. The mapping process comprised both lexical techniques (searching for relevant pre-coordinated concepts and data elements) and semantic techniques (post-coordination). Where feasible, new data elements were curated to improve coverage during mapping. A web-based application was also developed to uniformly represent and query the mapped data elements from different eMERGE studies.
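The lexical step, searching for pre-coordinated concepts whose names resemble a site's data element, can be illustrated with simple token overlap. This sketch is an assumption, not the authors' tool: it scores candidates by Jaccard similarity of normalized word tokens, and the concept identifiers and labels in the usage example are hypothetical placeholders rather than real caDSR or SNOMED CT entries.

```python
import re
from typing import Dict, Optional, Set, Tuple


def tokens(text: str) -> Set[str]:
    """Lowercase, drop punctuation, and split into a set of word tokens."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def jaccard(a: Set[str], b: Set[str]) -> float:
    """Size of the intersection divided by size of the union (0.0 if both empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def lexical_map(element_name: str, vocabulary: Dict[str, str],
                threshold: float = 0.5) -> Optional[Tuple[str, float]]:
    """Return (concept_id, score) for the best lexical match above threshold,
    or None when no candidate is close enough (left for semantic mapping)."""
    query = tokens(element_name)
    scored = ((cid, jaccard(query, tokens(label))) for cid, label in vocabulary.items())
    best = max(scored, key=lambda pair: pair[1], default=None)
    if best is None or best[1] < threshold:
        return None
    return best
```

With a hypothetical vocabulary such as `{"HYP-0002": "Systolic Blood Pressure"}`, the element "Systolic blood pressure (mmHg)" maps with a score of 0.75, whereas the abbreviation "systolic BP" scores 0.25 and falls through. Elements like that one are the kind that need curation or post-coordination.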
Approximately 60% of the target data elements (95 out of 157) could be mapped using simple lexical analysis of pre-coordinated terms and concepts, before eMERGE investigators began any additional curation of terminology and metadata resources. After curating 54 new caDSR CDEs and nine new NCI Thesaurus concepts and applying post-coordination, the authors were able to map the remaining 40% of data elements to caDSR and SNOMED CT. A web-based tool was also implemented to assist in semi-automatic mapping of data elements.
This study emphasizes the requirement for standardized representation of clinical research data using existing metadata and terminology resources and provides simple techniques and software for data element mapping using experiences from the eMERGE Network.
informatics; ontologies; knowledge representations; controlled terminologies and vocabularies; machine learning; terminologies; metadata; mapping; harmonization; eMERGE Network
Observational studies of health conditions and outcomes often combine clinical care data from many sites without explicitly assessing the accuracy and completeness of these data. To improve the quality of data in an international multi-site observational cohort of HIV-infected patients, the authors conducted on-site, Good Clinical Practice-based audits of the clinical care datasets submitted by participating HIV clinics. Discrepancies between data submitted for research and data in the clinical records were categorized using the audit codes published by the European Organisation for Research and Treatment of Cancer. Five of seven sites had error rates >10% in key study variables, notably laboratory data, weight measurements, and antiretroviral medications. All sites had significant discrepancies in medication start and stop dates. Clinical care data, particularly antiretroviral regimens and associated dates, are prone to substantial error. Verifying data against source documents through audits will improve the quality of databases and research, and audits can also serve as a means of retraining staff responsible for clinical data collection. The authors recommend that all participants in observational cohorts use data audits to assess and improve the quality of data and to guide future data collection and abstraction efforts at the point of care.
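The headline audit metric, a per-variable error rate compared against a 10% threshold, can be computed by checking submitted records against source documents. This is a simplified sketch: it counts any mismatch as an error rather than assigning the EORTC discrepancy codes used in the audits, and the record layout (dictionaries keyed by record id) is an assumption.

```python
from typing import Dict, Hashable, List


def audit_error_rates(submitted: Dict[Hashable, dict],
                      source: Dict[Hashable, dict],
                      variables: List[str]) -> Dict[str, float]:
    """Fraction of verifiable records whose submitted value differs from the
    value found in the source document, computed separately per variable."""
    rates = {}
    for var in variables:
        checked = errors = 0
        for rid, record in submitted.items():
            if rid not in source:
                continue  # record could not be verified against a source document
            checked += 1
            if record.get(var) != source[rid].get(var):
                errors += 1
        rates[var] = errors / checked if checked else 0.0
    return rates


def flag_variables(rates: Dict[str, float], threshold: float = 0.10) -> List[str]:
    """Variables whose error rate exceeds the threshold (10% by default)."""
    return sorted(var for var, rate in rates.items() if rate > threshold)
```

In practice dates and numeric values would need tolerance rules (for example, how many days of disagreement count as a discrepancy in a medication start date); exact equality is used here only to keep the sketch short.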
In 2008, 11 new fellows were elected to the American College of Medical Informatics, and were inducted into the College at a ceremony held in conjunction with the American Medical Informatics Association conference in Washington, DC on Nov 9, 2008. A brief synopsis of the background and accomplishments of each of the new fellows is provided here, in alphabetical order.
We describe a two-stage analytical approach for characterizing dissimilarity in morbidity profiles among patient cohorts using electronic medical records. We capture morbidities using International Statistical Classification of Diseases and Related Health Problems (ICD-9) codes. In the first stage, separate logistic regression analyses are conducted for each ICD-9 section (e.g., “hypertensive disease” or “appendicitis”), and the odds ratios describing adjusted differences in prevalence between two cohorts are displayed graphically. In the second stage, the results of the section-level analyses are combined into a general morbidity dissimilarity index (MDI). For illustration, we examine nine cohorts of patients representing six phenotypes (or controls) derived from five institutions, each a participant in the electronic MEdical REcords and GEnomics (eMERGE) network. The phenotypes studied include type II diabetes and type II diabetes controls, peripheral arterial disease and peripheral arterial disease controls, normal cardiac conduction as measured by electrocardiography, and senile cataracts.
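The two-stage structure can be sketched in simplified form. This is not the published MDI: stage one here uses an unadjusted odds ratio per ICD-9 section from 2×2 counts, with a 0.5 (Haldane–Anscombe) correction instead of covariate-adjusted logistic regression. Stage two uses a hypothetical combination, the mean absolute log odds ratio across sections, chosen only because it is zero when two cohorts have identical prevalence profiles.

```python
import math
from typing import Dict, Tuple


def section_odds_ratio(cases_a: int, n_a: int, cases_b: int, n_b: int) -> float:
    """Unadjusted odds ratio for one ICD-9 section (cohort A vs cohort B).
    Adds 0.5 to every cell so empty cells do not give 0 or infinity."""
    a, b = cases_a + 0.5, n_a - cases_a + 0.5
    c, d = cases_b + 0.5, n_b - cases_b + 0.5
    return (a / b) / (c / d)


def dissimilarity_index(counts: Dict[str, Tuple[int, int, int, int]]) -> float:
    """Hypothetical stage-two summary: mean absolute log odds ratio over sections.
    counts maps section name -> (cases_a, n_a, cases_b, n_b)."""
    logs = [abs(math.log(section_odds_ratio(*row))) for row in counts.values()]
    return sum(logs) / len(logs)
```

Taking absolute log odds ratios keeps excesses and deficits of a morbidity from cancelling each other when sections are combined. A faithful implementation would substitute the adjusted odds ratios from stage one and the paper's own combination rule.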
Electronic medical records; ICD-9; dissimilarity index; comorbidity index; population comparison; morbidity dissimilarity index
Recent genome-wide association studies (GWAS) using selected community populations have identified genomic signals in SCN10A influencing PR duration. The extent to which this can be demonstrated in cohorts derived from electronic medical records is unknown.
Methods and Results
We performed a GWAS on 2,334 European-American patients with normal ECGs and without evidence of prior heart disease from the Vanderbilt DNA databank, BioVU, which accrues subjects from routine patient care. Subjects were identified using combinations of natural language processing, laboratory, and billing code queries of de-identified medical record data. Subjects were 58% female, with mean (±SD) age 54±15 years and mean PR interval 158±18 milliseconds. Genotyping was performed using the Illumina Human660W-Quad platform. Our results identify four single nucleotide polymorphisms (rs6800541, rs6795970, rs6798015, rs7430477) linked to SCN10A associated with PR interval (p = 5.73 × 10⁻⁷ to 1.78 × 10⁻⁶).
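Each SNP in a GWAS of a quantitative trait such as PR interval is typically tested with a regression of the trait on allele dosage (0, 1, or 2 copies). The sketch below shows that per-SNP test in its simplest form. It makes two assumptions: no covariates are included (a real analysis would usually adjust for factors such as age, sex, and ancestry), and the two-sided p-value uses a normal approximation to the t distribution, which is reasonable at this sample size.

```python
import math
from typing import Sequence, Tuple


def additive_association(dosages: Sequence[float],
                         pr_ms: Sequence[float]) -> Tuple[float, float, float]:
    """Ordinary least squares of PR interval (ms) on allele dosage.
    Returns (beta in ms per allele, standard error, two-sided p-value)."""
    n = len(dosages)
    mx = sum(dosages) / n
    my = sum(pr_ms) / n
    sxx = sum((x - mx) ** 2 for x in dosages)
    sxy = sum((x - mx) * (y - my) for x, y in zip(dosages, pr_ms))
    beta = sxy / sxx
    alpha = my - beta * mx
    rss = sum((y - alpha - beta * x) ** 2 for x, y in zip(dosages, pr_ms))
    se = math.sqrt(rss / (n - 2) / sxx)
    z = beta / se
    p = math.erfc(abs(z) / math.sqrt(2))  # equals 2 * (1 - Phi(|z|))
    return beta, se, p
```

Running this test once per genotyped SNP and ranking by p-value produces the kind of SNP list reported above.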
This GWAS confirms a gene heretofore unimplicated in cardiac pathophysiology as a modulator of PR interval in humans. This study is one of the first replication GWAS performed using an electronic medical record-derived cohort, supporting the further use of such cohorts for genotype-phenotype analyses.
electronic medical records; atrioventricular conduction; genome-wide association study; natural language processing
The 1999 debate of the American College of Medical Informatics focused on the proposition that medical informatics and nursing informatics are distinctive disciplines that require their own core curricula, training programs, and professional identities. Proponents of this position emphasized that informatics training, technology applications, and professional identities are closely tied to the activities of the health professionals they serve and that, as nursing and medicine differ, so do the corresponding efforts in information science and technology. Opponents of the proposition asserted that informatics is built on a reusable and widely applicable set of methods that are common to all health science disciplines, and that “medical informatics” continues to be a useful name for a composite core discipline that should be studied by all students, regardless of their health profession orientation.
Combining genome-wide association study (GWAS) data with clinical information from the electronic medical record (EMR) provides unprecedented opportunities to identify genetic variants that influence susceptibility to common, complex diseases. While mining the vast contents of the EMR greatly expands the potential for conducting GWAS, non-standardized representation and wide variability of clinical data and phenotypes pose a major challenge to data integration and analysis. To address this challenge, we present experiences and methods developed to map phenotypic data elements from eMERGE (Electronic Medical Records and Genomics) to PhenX (Consensus Measures for Phenotypes and Exposures) and NCI’s Cancer Data Standards Registry and Repository (caDSR). Our results suggest that adopting multiple standards and biomedical terminologies will expose studies to a broader user community and enhance interoperability with a wider range of studies, in turn promoting cross-study pooling of data to detect both more subtle and more complex genotype-phenotype associations.
Significant research has been devoted to predicting diagnosis, prognosis, and response to treatment using high-throughput assays. Rapid translation into clinical results hinges upon efficient access to up-to-date and high-quality molecular medicine modalities.
We first explain why this goal is inadequately supported by existing databases and portals and then introduce a novel semantic indexing and information retrieval model for clinical bioinformatics. The formalism provides the means for indexing a variety of relevant objects (e.g. papers, algorithms, signatures, datasets) and includes a model of the research processes that creates and validates these objects in order to support their systematic presentation once retrieved.
We test the applicability of the model by constructing proof-of-concept encodings and visual presentations of evidence and modalities in molecular profiling and prognosis of: (a) diffuse large B-cell lymphoma (DLBCL) and (b) breast cancer.
information retrieval; molecular medicine; semantic model; clinical bioinformatics; predictive computational models
Warfarin pharmacogenomic algorithms reduce dosing error, but perform poorly in non-European–Americans. Electronic health record (EHR) systems linked to biobanks may allow for pharmacogenomic analysis, but they have not yet been used for this purpose.
Patients & methods
We used BioVU, the Vanderbilt EHR-linked DNA repository, to identify European–Americans (n = 1022) and African–Americans (n = 145) on stable warfarin therapy and evaluated the effect of 15 pharmacogenetic variants on stable warfarin dose.
Variants in VKORC1, CYP2C9 and CYP4F2 were associated with weekly dose in European–Americans, and additional variants in CYP2C9 and CALU were associated with weekly dose in African–Americans. Compared with traditional 5 mg/day dosing, implementing the US FDA recommendations or the International Warfarin Pharmacogenomics Consortium (IWPC) algorithm reduced the error in weekly dose in European–Americans (from 13.5 to 12.4 and 9.5 mg/week, respectively) but less so in African–Americans (from 15.2 to 15.0 and 13.8 mg/week, respectively). By further incorporating associated variants specific to European–Americans and African–Americans in an expanded algorithm, dose-prediction error was reduced to 9.1 mg/week (95% CI: 8.4–9.6) in European–Americans and 12.4 mg/week (95% CI: 10.0–13.2) in African–Americans. The expanded algorithm explained 41 and 53% of dose variation in African–Americans and European–Americans, respectively, compared with 29 and 50%, respectively, for the IWPC algorithm. Implementing these predictions via dispensable pill regimens similarly reduced dosing error.
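The dosing-error comparisons above are easier to follow with the metric written out. This sketch assumes the error is the mean absolute difference between predicted and stable weekly dose, the usual metric in warfarin algorithm comparisons; the function names are illustrative. The fixed-dose baseline is 5 mg/day, or 35 mg/week.

```python
from typing import Sequence

# Traditional fixed dosing: 5 mg/day = 35 mg/week for every patient.
FIXED_WEEKLY_DOSE = 5.0 * 7


def mean_absolute_error(predicted: Sequence[float], observed: Sequence[float]) -> float:
    """Mean absolute difference (mg/week) between predicted and stable doses."""
    if len(predicted) != len(observed) or not observed:
        raise ValueError("need equal-length, non-empty dose lists")
    return sum(abs(p - o) for p, o in zip(predicted, observed)) / len(observed)


def fixed_dose_error(observed: Sequence[float]) -> float:
    """Error of giving every patient the same 35 mg/week."""
    return mean_absolute_error([FIXED_WEEKLY_DOSE] * len(observed), observed)
```

For stable doses of 21, 35 and 52.5 mg/week, the fixed-dose error is (14 + 0 + 17.5) / 3 = 10.5 mg/week. An algorithm is judged by how far its own error falls below that baseline, as in the 13.5 to 9.5 mg/week reduction reported for European–Americans.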
These results validate EHR-linked DNA biorepositories as real-world resources for pharmacogenomic validation and discovery.
anticoagulants; bioinformatics; electronic health record; genes; pharmacogenomics; warfarin
New computer technologies have made it feasible to represent, store, and communicate high-resolution biomedical images electronically. Traditional two-dimensional medical images, such as those on printed pages, have been supplemented by three-dimensional images that can be rendered, rotated, and “dissected” from any point of view. The library of the future will provide electronic access not only to words and numbers, but also to pictures, sounds, and other nontextual information. Few widely accepted standards currently exist for the representation and communication of complex images, yet such standards will be critical to the feasibility and usefulness of digital image collections in the life sciences. The National Library of Medicine has embarked on a project to develop a complete digital volumetric representation of an adult human male and female. This “Visible Human Project” will address the issue of standards for computer representation of biological structure.