Many colleges and universities across the globe now offer bachelor's, master's, and doctoral degrees, along with certificate programs, in bioinformatics. While there is some consensus surrounding curricular competencies, programs vary greatly in their core foci, with some leaning heavily toward the biological sciences and others toward quantitative areas. This allows prospective students to choose a program that best fits their interests and career goals. In the digital age, most scientific fields are facing an enormous growth of data, and as a consequence, the goals and challenges of bioinformatics are rapidly changing; this requires that bioinformatics education also change. In this workshop, we seek to ascertain current trends in bioinformatics education by asking the question, “What are the core competencies all bioinformaticians should have at the end of their training, and how successful have programs been in placing students in desired careers?”
Tools such as genome resequencing and genome-wide association studies have recently been used to uncover a number of variants that affect drug toxicity and efficacy, as well as potential drug targets. But how much closer are we to incorporating pharmacogenomics into routine clinical practice? Five experts discuss how far we have come, and highlight the technological, informatics, educational and practical obstacles that stand in the way of realizing genome-driven medicine.
immunosuppressive agents; inosine monophosphate dehydrogenase; mycophenolate mofetil; mycophenolic acid; pharmacogenetics; pharmacogenomics
CYP2C8; CYP2C8*3; metabolism; pharmacogenetics; pharmacogenomics; pharmGKB
breast cancer; pathway; pharmacogenomics; tamoxifen
diuretics; hypertension; pathway; pharmacogenetic; pharmacogenomic
ABCB1; calcineurin; cyclosporine; CYP3A4; CYP3A5; pharmacodynamics; pharmacogenetics; pharmacokinetics; tacrolimus; transplantation
glucose-6-phosphate dehydrogenase; hemolytic anemia; methemoglobinemia; methylene blue; reduced nicotinamide adenine dinucleotide phosphate; oxidative stress; pentose phosphate pathway; pharmacodynamics; pharmacogenetics; red blood cells
The American College of Medical Genetics and Genomics (ACMG) recently released guidelines regarding the reporting of incidental findings in sequencing data. Given the availability of direct-to-consumer (DTC) genetic testing and the falling cost of whole exome and genome sequencing, individuals will increasingly have the opportunity to analyze their own genomic data. We have developed a web-based tool, PATH-SCAN, which annotates individual genomes and exomes for ClinVar-designated pathogenic variants found within the genes from the ACMG guidelines. Because mutations in these genes predispose individuals to conditions with actionable outcomes, our tool will allow individuals or researchers to identify potential risk variants in order to consult physicians or genetic counselors for further evaluation. Moreover, our tool allows individuals to anonymously submit their pathogenic burden, so that we can crowdsource the collection of quantitative information regarding the frequency of these variants. We tested our tool on 1092 publicly available genomes from the 1000 Genomes project, 163 genomes from the Personal Genome Project, and 15 genomes from a clinical genome sequencing research project. Excluding the most commonly seen variant in 1000 Genomes, about 20% of all genomes analyzed had a ClinVar-designated pathogenic variant that required further evaluation.
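The annotation step described above can be illustrated with a minimal sketch. It is not PATH-SCAN's implementation: the `Variant` record, the `flag_pathogenic` function, the three-gene subset, and the placeholder variant identifiers are all assumptions made for illustration. It only shows the idea of keeping variants that carry a pathogenic ClinVar label and fall within a listed gene.

```python
from dataclasses import dataclass

# Illustrative subset only; the full ACMG gene list is not reproduced here.
ACMG_GENES = {"BRCA1", "BRCA2", "MLH1"}


@dataclass(frozen=True)
class Variant:
    """Hypothetical record for one annotated variant in a personal genome."""
    gene: str
    variant_id: str            # placeholder identifier, not a real rsID
    clinvar_significance: str  # e.g. "pathogenic", "benign"


def flag_pathogenic(variants, genes=ACMG_GENES):
    """Keep variants labeled pathogenic that fall in one of the listed genes."""
    return [
        v for v in variants
        if v.gene in genes and v.clinvar_significance.lower() == "pathogenic"
    ]


# Toy genome with made-up entries.
genome = [
    Variant("BRCA1", "example_1", "Pathogenic"),  # listed gene, pathogenic
    Variant("BRCA2", "example_2", "benign"),      # listed gene, not pathogenic
    Variant("TTN", "example_3", "pathogenic"),    # pathogenic, gene not listed
]
flagged = flag_pathogenic(genome)
print([v.variant_id for v in flagged])
```

Only the first toy variant passes both filters; the other two each fail one of them.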
CYP2C19; CYP2D6; pharmacogenetics; serotonin–norepinephrine reuptake inhibitor; venlafaxine
Physics-based simulation provides a powerful framework for understanding biological form and function. Simulations can be used by biologists to study macromolecular assemblies and by clinicians to design treatments for diseases. Simulations help biomedical researchers understand the physical constraints on biological systems as they engineer novel drugs, synthetic tissues, medical devices, and surgical interventions. Although individual biomedical investigators make outstanding contributions to physics-based simulation, the field has been fragmented. Applications are typically limited to a single physical scale, and individual investigators usually must create their own software. These conditions created a major barrier to advancing simulation capabilities. In 2004, we established a National Center for Physics-Based Simulation of Biological Structures (Simbios) to help integrate the field and accelerate biomedical research. In 6 years, Simbios has become a vibrant national center, with collaborators in 16 states and eight countries. Simbios focuses on problems at both the molecular scale and the organismal level, with a long-term goal of uniting these in accurate multiscale simulations.
Simulation; dynamics; biomedical computation; physics-based; neuromuscular biomechanics; molecular dynamics; multibody dynamics; domain-specific languages; DSLs; neuroprosthetic dynamics; drug target dynamics; physics-based simulation
Adverse drug events (ADEs) are common, accounting for 770 000 injuries and deaths each year, and drug interactions account for as much as 30% of these ADEs. Spontaneous reporting systems routinely collect ADEs from patients on complex combinations of medications and provide an opportunity to discover unexpected drug interactions. Unfortunately, current algorithms for such “signal detection” are limited by underreporting of interactions that are not expected. We present a novel method to identify latent drug interaction signals in the case of underreporting.
Materials and Methods
We identified eight clinically significant adverse events. We used the FDA's Adverse Event Reporting System to build profiles for these adverse events based on the side effects of drugs known to produce them. We then looked for pairs of drugs that match these single-drug profiles in order to predict potential interactions. We evaluated these interactions in two independent data sets and also through a retrospective analysis of the Stanford Hospital electronic medical records.
We identified 171 novel drug interactions (for eight adverse event categories) that are significantly enriched for known drug interactions (p=0.0009), and we used the electronic medical record to independently test drug interaction hypotheses using multivariate statistical models with covariates.
Our method provides an option for detecting hidden interactions in spontaneous reporting systems by using side effect profiles to infer the presence of unreported adverse events.
Drug interactions; signal detection analysis; adverse effects; pharmacoepidemiology
Mental illness is the leading cause of disability in the USA, but boundaries between different mental illnesses are notoriously difficult to define. Electronic medical records (EMRs) have recently emerged as a powerful new source of information for defining the phenotypic signatures of specific diseases. We investigated how EMR-based text mining and statistical analysis could elucidate the phenotypic boundaries of three important neuropsychiatric illnesses—autism, bipolar disorder, and schizophrenia.
We analyzed the medical records of over 7000 patients at two facilities, using an automated text-processing pipeline to annotate the clinical notes with Unified Medical Language System codes and then searching for enriched codes, and associations among codes, that were representative of the three disorders. We applied dimensionality-reduction techniques to individual patient records to understand individual-level phenotypic variation within each disorder, as well as the degree of overlap among disorders.
We demonstrate that automated EMR mining can be used to extract relevant drugs and phenotypes associated with neuropsychiatric disorders and characteristic patterns of associations among them. Patient-level analyses suggest a clear separation between autism and the other disorders, while revealing significant overlap between schizophrenia and bipolar disorder. They also enable localization of individual patients within the phenotypic ‘landscape’ of each disorder.
Because EMRs reflect the realities of patient care rather than idealized conceptualizations of disease states, we argue that automated EMR mining can help define the boundaries between different mental illnesses, facilitate cohort building for clinical and genomic studies, and reveal how clear expert-defined disease boundaries are in practice.
Electronic Medical Records; Autism; Schizophrenia; Bipolar Disorder; Data Mining; Network Analysis
A review of 2010 research in translational bioinformatics provides much to marvel at. We have seen notable advances in personal genomics, pharmacogenetics, and sequencing. At the same time, the infrastructure for the field has burgeoned. While acknowledging that, according to researchers, the members of this field tend to be overly optimistic, the authors predict a bright future.
Translational bioinformatics; computational biology; genomics; electronic medical records
Gemcitabine; deoxycytidine analogs; pancreatic cancer; non-small cell lung cancer; breast cancer; pharmacogenomics
As public microarray repositories rapidly accumulate gene expression data, these resources contain increasingly valuable information about cellular processes in human biology. This presents a unique opportunity for intelligent data mining methods to extract information about the transcriptional modules underlying these biological processes. Modeling cellular gene expression as a combination of functional modules, we use independent component analysis (ICA) to derive 423 fundamental components of human biology from a 9,395-array compendium of heterogeneous expression data. Annotation using the Gene Ontology (GO) suggests that while some of these components represent known biological modules, others may describe biology not well characterized by existing manually-curated ontologies. In order to understand the biological functions represented by these modules, we investigate the mechanism of the preclinical anticancer drug parthenolide (PTL) by analyzing the differential expression of our fundamental components. Our method correctly identifies known pathways and predicts that N-glycan biosynthesis and T-cell receptor signaling may contribute to PTL response. The fundamental gene modules we describe have the potential to provide pathway-level insight into new gene expression datasets.
microarrays; independent component analysis; data mining; parthenolide; gene modules
There is debate about the utility of clinical data warehouses for research. Using a clinical warfarin dosing algorithm derived from research-quality data, we evaluated the data quality of both a general-purpose database and a coagulation-specific database. We evaluated the functional utility of these repositories by using data extracted from them to predict warfarin dose. We reasoned that high-quality clinical data would predict doses nearly as accurately as research data, while poor-quality clinical data would predict doses less accurately. We evaluated the Mean Absolute Error (MAE) in predicted weekly dose as a metric of data quality. The MAE was comparable between the clinical gold standard (10.1 mg/wk) and the specialty database (10.4 mg/wk), but the MAE for the clinical warehouse was 40% greater (14.1 mg/wk). Our results indicate that the research utility of clinical data collected in focused clinical settings is greater than that of data collected during general-purpose clinical care.
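The comparison above rests on simple arithmetic, sketched here under stated assumptions. The three reported MAE values come from the abstract; the function names and the toy dose lists are illustrative and are not the study's data.

```python
def mean_absolute_error(actual, predicted):
    """Average absolute difference between actual and predicted doses."""
    if not actual or len(actual) != len(predicted):
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def relative_increase(baseline, value):
    """Fractional increase of value over baseline."""
    return (value - baseline) / baseline


# MAE values (mg/wk) as reported in the abstract.
gold_standard_mae = 10.1
specialty_mae = 10.4
warehouse_mae = 14.1

# (14.1 - 10.1) / 10.1 is roughly 0.40, the "40% greater" figure.
warehouse_excess = relative_increase(gold_standard_mae, warehouse_mae)
# (10.4 - 10.1) / 10.1 is roughly 0.03, hence "comparable".
specialty_excess = relative_increase(gold_standard_mae, specialty_mae)

# Made-up weekly doses (mg/wk), only to show the MAE calculation itself.
toy_actual = [35.0, 28.0, 42.0]
toy_predicted = [30.0, 31.5, 40.0]
toy_mae = mean_absolute_error(toy_actual, toy_predicted)

print(f"warehouse excess: {warehouse_excess:.1%}")
print(f"specialty excess: {specialty_excess:.1%}")
print(f"toy MAE: {toy_mae:.2f} mg/wk")
```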
clinical; translational; database; warehouse; research; quality; warfarin; dosing; STRIDE; CoagClinic
AMP-activated protein kinase; diabetes mellitus; metformin; multidrug and toxin extrusion 1; OCT1; OCT2; pathway; pharmacodynamics; pharmacogenomic; pharmacokinetics; type 2 diabetes
Warfarin dosing remains challenging because of its narrow therapeutic window and large variability in dose response. We sought to analyze new factors involved in its dosing and to evaluate eight dosing algorithms, including two developed by the International Warfarin Pharmacogenetics Consortium (IWPC).
We enrolled 108 patients on chronic warfarin therapy and obtained complete clinical and pharmacy records; we genotyped single nucleotide polymorphisms relevant to the VKORC1, CYP2C9, and CYP4F2 genes using integrated fluidic circuits made by Fluidigm.
When applying the IWPC pharmacogenetic algorithm to our cohort of patients, the percentage of patients within 1 mg/d of the therapeutic warfarin dose increases to 63%, compared with 54% using clinical factors only or 38% using a fixed-dose approach. CYP4F2 adds 4% to the fraction of the variability in dose (R²) explained by the IWPC pharmacogenetic algorithm (P < 0.05). Importantly, we show that pooling rare variants substantially increases the R² for CYP2C9 (rare variants: P = 0.0065, R² = 6%; common variants: P = 0.0034, R² = 7%; rare and common variants: P = 0.00018, R² = 12%), indicating that relatively rare variants not genotyped in genome-wide association studies may be important. In addition, the IWPC pharmacogenetic algorithm and the Gage (2008) algorithm perform best (IWPC: R² = 50%; Gage: R² = 49%), and all pharmacogenetic algorithms outperform the IWPC clinical equation (R² = 22%). VKORC1 and CYP2C9 genotypes did not affect long-term variability in dose. Finally, the Fluidigm platform, a novel warfarin genotyping method, showed 99.65% concordance between different operators and instruments.
CYP4F2 and pooled rare variants of CYP2C9 significantly improve the ability to estimate warfarin dose.
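The two performance measures used in this comparison, the share of patients predicted within 1 mg/d of the therapeutic dose and the fraction of dose variability explained, can be sketched as follows. This is a minimal illustration with made-up doses. The coefficient of determination is computed as 1 − SS_res/SS_tot, which is an assumption and may differ from how the study derived its reported values.

```python
def fraction_within(actual, predicted, tolerance=1.0):
    """Share of patients whose predicted dose is within `tolerance` mg/d."""
    hits = sum(1 for a, p in zip(actual, predicted) if abs(a - p) <= tolerance)
    return hits / len(actual)


def r_squared(actual, predicted):
    """Coefficient of determination, 1 - SS_res / SS_tot."""
    mean_actual = sum(actual) / len(actual)
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    return 1.0 - ss_res / ss_tot


# Hypothetical daily doses (mg/d); not data from the study.
therapeutic = [5.0, 3.0, 7.5, 4.0]
predicted = [4.5, 4.5, 7.0, 4.0]

share = fraction_within(therapeutic, predicted)  # 3 of 4 patients
r2 = r_squared(therapeutic, predicted)

print(f"within 1 mg/d: {share:.0%}")
print(f"R squared: {r2:.3f}")
```

The two measures answer different questions: the first counts clinically acceptable predictions patient by patient, while the second summarizes how much of the overall spread in dose the algorithm captures.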
algorithms; CYP2C9; CYP4F2; dosing; IWPC; kinetics; pharmacogenetics; rare variants; VKORC1; warfarin
etoposide; pathway; pharmacogenetics; pharmacogenomics; pharmGKB
The PharmGKB (http://www.pharmgkb.org) is a publicly available online resource that aims to facilitate understanding of how genetic variation contributes to variation in drug response. It is not only a repository of pharmacogenomics primary data, but also provides fully curated knowledge including drug pathways, annotated pharmacogene summaries, and relationships among genes, drugs, and diseases. This unit describes how to navigate the PharmGKB website to retrieve detailed information on genes and important variants, as well as their relationship to drugs and diseases. It also includes protocols on our drug-centered pathways, annotated pharmacogene summaries, and our web services for downloading the underlying data. A workflow for using PharmGKB to facilitate the design of a pharmacogenomic study is also described in this unit.
Database; pharmacogenomics; pharmacogenetics; drug response; genetic variation; pathway analysis; SNP; polymorphisms; study design
anticancer; drug response; pathway; pharmacogenomics; platinum
Adverse drug reactions; allopurinol; rasburicase; uric acid; uricosurics; pharmacodynamics; pharmacogenetics
VKORC1 and CYP2C9 are important contributors to warfarin dose variability, but explain less variability for individuals of African descent than for those of European or Asian descent. We aimed to identify additional variants contributing to warfarin dose requirements in African Americans.
We did a genome-wide association study of discovery and replication cohorts. Samples from African-American adults (aged ≥18 years) who were taking a stable maintenance dose of warfarin were obtained at International Warfarin Pharmacogenetics Consortium (IWPC) sites and the University of Alabama at Birmingham (Birmingham, AL, USA). Patients enrolled at IWPC sites but who were not used for discovery made up the independent replication cohort. All participants were genotyped. We did a stepwise conditional analysis, conditioning first for VKORC1 −1639G→A, followed by the composite genotype of CYP2C9*2 and CYP2C9*3. We prespecified a genome-wide significance threshold of p<5×10⁻⁸ in the discovery cohort and p<0·0038 in the replication cohort.
The discovery cohort contained 533 participants and the replication cohort 432 participants. After the prespecified conditioning in the discovery cohort, we identified an association between a novel single nucleotide polymorphism in the CYP2C cluster on chromosome 10 (rs12777823) and warfarin dose requirement that reached genome-wide significance (p=1·51×10⁻⁸). This association was confirmed in the replication cohort (p=5·04×10⁻⁵); analysis of the two cohorts together produced a p value of 4·5×10⁻¹². Individuals heterozygous for the rs12777823 A allele need a dose reduction of 6·92 mg/week and those homozygous 9·34 mg/week. Regression analysis showed that the inclusion of rs12777823 significantly improves warfarin dose variability explained by the IWPC dosing algorithm (21% relative improvement).
A novel CYP2C single nucleotide polymorphism exerts a clinically relevant effect on warfarin dose in African Americans, independent of CYP2C9*2 and CYP2C9*3. Incorporation of this variant into pharmacogenetic dosing algorithms could improve warfarin dose prediction in this population.
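The reported per-genotype dose reductions can be expressed as a small lookup, sketched below. The two reduction values come from the results above. The genotype string encoding, the function names, the 40 mg/week baseline, and the choice to subtract the reduction from a baseline dose are assumptions for illustration only, and the sketch does not reproduce the IWPC dosing algorithm.

```python
# Reported weekly dose reductions (mg/week) by number of rs12777823 A alleles.
DOSE_REDUCTION_MG_PER_WEEK = {0: 0.0, 1: 6.92, 2: 9.34}


def count_a_alleles(genotype):
    """Count A alleles in a two-letter genotype string such as 'GA' (assumed encoding)."""
    if len(genotype) != 2:
        raise ValueError("genotype must have exactly two alleles, e.g. 'GA'")
    return genotype.upper().count("A")


def adjusted_weekly_dose(baseline_mg_per_week, genotype):
    """Subtract the reported reduction from a baseline dose, never going below zero."""
    reduction = DOSE_REDUCTION_MG_PER_WEEK[count_a_alleles(genotype)]
    return max(0.0, baseline_mg_per_week - reduction)


# Hypothetical baseline of 40 mg/week for a non-carrier.
baseline = 40.0
for genotype in ("GG", "GA", "AA"):
    print(genotype, round(adjusted_weekly_dose(baseline, genotype), 2))
```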
National Institutes of Health, American Heart Association, Howard Hughes Medical Institute, Wisconsin Network for Health Research, and the Wellcome Trust.