Adverse drug events (ADEs) are common, accounting for 770 000 injuries and deaths each year, and drug interactions account for as much as 30% of these ADEs. Spontaneous reporting systems routinely collect ADEs from patients on complex combinations of medications and provide an opportunity to discover unexpected drug interactions. Unfortunately, current algorithms for such “signal detection” are limited by underreporting of interactions that are not expected. We present a novel method to identify latent drug interaction signals in the case of underreporting.
Materials and Methods
We identified eight clinically significant adverse events. We used the FDA's Adverse Event Reporting System to build profiles for these adverse events based on the side effects of drugs known to produce them. We then looked for pairs of drugs that match these single-drug profiles in order to predict potential interactions. We evaluated these interactions in two independent data sets and also through a retrospective analysis of the Stanford Hospital electronic medical records.
We identified 171 novel drug interactions (for eight adverse event categories) that are significantly enriched for known drug interactions (p=0.0009) and used the electronic medical record for independently testing drug interaction hypotheses using multivariate statistical models with covariates.
Our method provides an option for detecting hidden interactions in spontaneous reporting systems by using side effect profiles to infer the presence of unreported adverse events.
Drug interactions; signal detection analysis; adverse effects; pharmacoepidemiology
AMP-activated protein kinase; diabetes mellitus; metformin; multidrug and toxin extrusion 1; OCT1; OCT2; pathway; pharmacodynamics; pharmacogenomics; pharmacokinetics; type 2 diabetes
pathway; pharmacodynamics; pharmacogenomics; pharmacokinetics; valproic acid
Epidermal growth factor receptor (EGFR); tyrosine kinase inhibitor; erlotinib; gefitinib; pharmacogenomics
We address the problem of assigning biological function to solved protein structures. Computational tools play a critical role in identifying potential active sites and informing screening decisions for further lab analysis. A critical parameter in the practical application of computational methods is the precision, or positive predictive value. Precision measures the level of confidence the user should have in a particular computed functional assignment. Low-precision annotations lead to futile laboratory investigations and waste scarce research resources. In this paper we describe an advanced version of the protein function annotation system FEATURE, which achieved 99% precision and average recall of 95% across 20 representative functional sites. The system uses a Support Vector Machine classifier operating on the microenvironment of physicochemical features around an amino acid. We also compared the performance of our method with that of the state-of-the-art sequence-level annotator Pfam in terms of precision, recall and localization. To our knowledge, no other functional site annotator has been rigorously evaluated against these key criteria. The software and predictive models are incorporated into the WebFEATURE service at http://feature.stanford.edu/wf4.0-beta.
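For readers less familiar with the two metrics quoted above, the following minimal sketch shows how precision (positive predictive value) and recall are computed from prediction counts. The function name and the counts are hypothetical and chosen for illustration only; they are not results from FEATURE or Pfam.

```python
def precision_recall(true_positives, false_positives, false_negatives):
    """Return (precision, recall) computed from raw prediction counts."""
    predicted = true_positives + false_positives   # everything the model flagged
    actual = true_positives + false_negatives      # everything that is truly a site
    precision = true_positives / predicted if predicted else 0.0
    recall = true_positives / actual if actual else 0.0
    return precision, recall


# Hypothetical counts for a single functional-site model (not from the paper).
p, r = precision_recall(true_positives=95, false_positives=1, false_negatives=5)
print(f"precision={p:.3f} recall={r:.3f}")
```

A high-precision annotator keeps `false_positives` small, which is what limits the futile laboratory follow-up that the abstract warns about.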
A review of 2010 research in translational bioinformatics provides much to marvel at. We have seen notable advances in personal genomics, pharmacogenetics, and sequencing. At the same time, the infrastructure for the field has burgeoned. While acknowledging that, according to researchers, the members of this field tend to be overly optimistic, the authors predict a bright future.
Translational bioinformatics; computational biology; genomics; electronic medical records
Many factors affect the risks for neurodevelopmental maladies such as autism spectrum disorders (ASD) and intellectual disability (ID). To compare environmental, phenotypic, socioeconomic and state-policy factors in a unified geospatial framework, we analyzed the spatial incidence patterns of ASD and ID using an insurance claims dataset covering nearly one third of the US population. Following epidemiologic evidence, we used the rate of congenital malformations of the reproductive system as a surrogate for environmental exposure of parents to unmeasured developmental risk factors, including toxins. Adjusted for gender, ethnic, socioeconomic, and geopolitical factors, the ASD incidence rates were strongly linked to population-normalized rates of congenital malformations of the reproductive system in males (an increase in ASD incidence by 283% for every percent increase in incidence of malformations, 95% CI: [91%, 576%], p<6×10⁻⁵). Such congenital malformations were barely significant for ID (94% increase, 95% CI: [1%, 250%], p = 0.0384). Other congenital malformations in males (excluding those affecting the reproductive system) appeared to significantly affect both phenotypes: 31.8% ASD rate increase (CI: [12%, 52%], p<6×10⁻⁵), and 43% ID rate increase (CI: [23%, 67%], p<6×10⁻⁵). Furthermore, the state-mandated rigor of diagnosis of ASD by a pediatrician or clinician for consideration in the special education system was predictive of a considerable decrease in ASD and ID incidence rates (98.6%, CI: [28%, 99.99%], p = 0.02475 and 99% CI: [68%, 99.99%], p = 0.00637 respectively). Thus, the observed spatial variability of both ID and ASD rates is associated with environmental and state-level regulatory factors; the magnitude of influence of compound environmental predictors was approximately three times greater than that of state-level incentives.
The estimated county-level random effects exhibited marked spatial clustering, strongly indicating existence of as yet unidentified localized factors driving apparent disease incidence. Finally, we found that the rates of ASD and ID at the county level were weakly but significantly correlated (Pearson product-moment correlation 0.0589, p = 0.00101), while for females the correlation was much stronger (0.197, p<2.26×10⁻¹⁶).
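As a reminder of what the county-level correlation reported above measures, the following minimal sketch computes a Pearson product-moment correlation in plain Python. The rate values are made-up placeholders, not data from the study, and the function name `pearson_r` is illustrative.

```python
import math


def pearson_r(xs, ys):
    """Pearson product-moment correlation between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sums of cross-products and squared deviations from the means.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)


# Placeholder per-county rates (hypothetical, for illustration only).
asd_rates = [1.0, 2.0, 3.0, 4.0]
id_rates = [2.0, 4.1, 5.9, 8.0]
r = pearson_r(asd_rates, id_rates)
print(f"r = {r:.4f}")
```

A coefficient near 0, like the 0.0589 reported above, can still be statistically significant when the number of counties is large; the coefficient describes the strength of the linear association, while the p-value also reflects sample size.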
Disease clusters are defined as geographically compact areas where a particular disease, such as a cancer, shows a significantly increased rate. It is presently unclear how common such clusters are for neurodevelopmental maladies, such as autism spectrum disorders (ASD) and intellectual disability (ID). In this study, examining data for one third of the whole US population, the authors show that (1) ASD and ID display strong clustering across US counties; (2) counties with high ASD rates also appear to have high ID rates, and (3) the spatial variation of both phenotypes appears to be driven by environmental, and, to a lesser extent, economic incentives at the state level.
Drug–drug interactions (DDIs) are an emerging threat to public health. Recent estimates indicate that DDIs cause nearly 74 000 emergency room visits and 195 000 hospitalizations each year in the USA. Current approaches to DDI discovery, which include Phase IV clinical trials and post-marketing surveillance, are insufficient for detecting many DDIs and do not alert the public to potentially dangerous DDIs before a drug enters the market. Recent work has applied state-of-the-art computational and statistical methods to the problem of DDIs. Here we review recent developments that encompass a range of informatics approaches in this domain, from the construction of databases for efficient searching of known DDIs to the prediction of novel DDIs based on data from electronic medical records, adverse event reports, scientific abstracts, and other sources. We also explore why DDIs are so difficult to detect and what the future holds for informatics-based approaches to DDI discovery.
As public microarray repositories rapidly accumulate gene expression data, these resources contain increasingly valuable information about cellular processes in human biology. This presents a unique opportunity for intelligent data mining methods to extract information about the transcriptional modules underlying these biological processes. Modeling cellular gene expression as a combination of functional modules, we use independent component analysis (ICA) to derive 423 fundamental components of human biology from a 9,395-array compendium of heterogeneous expression data. Annotation using the Gene Ontology (GO) suggests that while some of these components represent known biological modules, others may describe biology not well characterized by existing manually-curated ontologies. In order to understand the biological functions represented by these modules, we investigate the mechanism of the preclinical anticancer drug parthenolide (PTL) by analyzing the differential expression of our fundamental components. Our method correctly identifies known pathways and predicts that N-glycan biosynthesis and T-cell receptor signaling may contribute to PTL response. The fundamental gene modules we describe have the potential to provide pathway-level insight into new gene expression datasets.
microarrays; independent component analysis; data mining; parthenolide; gene modules
Transcription factors (TFs) are fundamental controllers of cellular regulation that function in a complex and combinatorial manner. Accurate identification of a transcription factor's targets is essential to understanding the role that factors play in disease biology. However, due to a high false positive rate, identifying coherent functional target sets is difficult. We have created an improved mapping of targets by integrating ChIP-Seq data with 423 functional modules derived from 9,395 human expression experiments. We identified 5,002 TF-module relationships, significantly improved TF target prediction, and found 30 high-confidence TF-TF associations, of which 14 are known. Importantly, we also connected TFs to diseases through these functional modules and identified 3,859 significant TF-disease relationships. As an example, we found a link between MEF2A and Crohn's disease, which we validated in an independent expression dataset. These results show the power of combining expression data and ChIP-Seq data to remove noise and better extract the associations between TFs, functional modules, and disease.
Transcription factors (TFs) are crucial to the precise regulation of many cellular processes and thus are responsible for many human phenotypes and diseases. Now that the ENCODE project has mapped hundreds of TFs to their genomic binding locations, extracting functional biological signals is the next step in understanding their role in disease. In this paper, we present a novel approach to identifying TF targets and use these targets to find regulatory relationships between TFs and diseases. We present a large open dataset of putative TF-TF interactions and TF-disease associations which includes known connections as well as novel ones. We validate one of our novel TF-disease associations, between MEF2A and Crohn's disease, suggesting that our approach generates testable disease association hypotheses. Integrating these datasets will be crucial for understanding phenotypes and complex diseases.
There is debate about the utility of clinical data warehouses for research. Using a clinical warfarin dosing algorithm derived from research-quality data, we evaluated the data quality of both a general-purpose database and a coagulation-specific database. We evaluated the functional utility of these repositories by using data extracted from them to predict warfarin dose. We reasoned that high-quality clinical data would predict doses nearly as accurately as research data, while poor-quality clinical data would predict doses less accurately. We evaluated the Mean Absolute Error (MAE) in predicted weekly dose as a metric of data quality. The MAE was comparable between the clinical gold standard (10.1 mg/wk) and the specialty database (10.4 mg/wk), but the MAE for the clinical warehouse was 40% greater (14.1 mg/wk). Our results indicate that the research utility of clinical data collected in focused clinical settings is greater than that of data collected during general-purpose clinical care.
clinical; translational; database; warehouse; research; quality; warfarin; dosing; STRIDE; CoagClinic
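The data-quality metric in the abstract above, mean absolute error (MAE) in predicted weekly dose, can be sketched as follows. The dose values are hypothetical placeholders rather than study data; the final lines only re-derive the "40% greater" figure from the two MAE values quoted in the abstract.

```python
def mean_absolute_error(predicted, actual):
    """Average absolute difference between predicted and actual values."""
    if len(predicted) != len(actual) or not actual:
        raise ValueError("need two non-empty sequences of equal length")
    return sum(abs(p - a) for p, a in zip(predicted, actual)) / len(actual)


# Hypothetical weekly warfarin doses in mg/wk (illustration only).
predicted_mg_per_week = [35.0, 28.0, 42.0, 21.0]
actual_mg_per_week = [30.0, 31.5, 49.0, 17.5]
mae = mean_absolute_error(predicted_mg_per_week, actual_mg_per_week)
print(f"MAE = {mae:.2f} mg/wk")

# Relative increase between the two MAE values quoted in the abstract.
relative_increase = (14.1 - 10.1) / 10.1
print(f"relative increase = {relative_increase:.1%}")
```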
Warfarin dosing remains challenging because of its narrow therapeutic window and large variability in dose response. We sought to analyze new factors involved in its dosing and to evaluate eight dosing algorithms, including two developed by the International Warfarin Pharmacogenetics Consortium (IWPC).
We enrolled 108 patients on chronic warfarin therapy and obtained complete clinical and pharmacy records; we genotyped single nucleotide polymorphisms relevant to the VKORC1, CYP2C9, and CYP4F2 genes using integrated fluidic circuits made by Fluidigm.
When applying the IWPC pharmacogenetic algorithm to our cohort of patients, the percentage of patients within 1 mg/d of the therapeutic warfarin dose increases to 63%, compared with 54% using clinical factors only and 38% using a fixed-dose approach. CYP4F2 adds 4% to the fraction of the variability in dose (R²) explained by the IWPC pharmacogenetic algorithm (P < 0.05). Importantly, we show that pooling rare variants substantially increases the R² for CYP2C9 (rare variants: P = 0.0065, R² = 6%; common variants: P = 0.0034, R² = 7%; rare and common variants: P = 0.00018, R² = 12%), indicating that relatively rare variants not genotyped in genome-wide association studies may be important. In addition, the IWPC pharmacogenetic algorithm and the Gage (2008) algorithm perform best (IWPC: R² = 50%; Gage: R² = 49%), and all pharmacogenetic algorithms outperform the IWPC clinical equation (R² = 22%). VKORC1 and CYP2C9 genotypes did not affect long-term variability in dose. Finally, the Fluidigm platform, a novel warfarin genotyping method, showed 99.65% concordance between different operators and instruments.
CYP4F2 and pooled rare variants of CYP2C9 significantly improve the ability to estimate warfarin dose.
algorithms; CYP2C9; CYP4F2; dosing; IWPC; kinetics; pharmacogenetics; rare variants; VKORC1; warfarin
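The two evaluation measures used in the warfarin abstract above, the share of patients whose predicted dose falls within 1 mg/d of the therapeutic dose and the fraction of dose variability explained (R squared), can be sketched as below. This is an illustration under stated assumptions: the doses are invented, the 1 mg/d bound is treated as inclusive, and R squared is computed as 1 minus the ratio of residual to total sum of squares, which may differ from the exact definition used in the study.

```python
def fraction_within(predicted, actual, tolerance=1.0):
    """Share of predictions within `tolerance` (inclusive) of the actual value."""
    hits = sum(1 for p, a in zip(predicted, actual) if abs(p - a) <= tolerance)
    return hits / len(actual)


def r_squared(predicted, actual):
    """Coefficient of determination: 1 - SS_residual / SS_total (assumed form)."""
    mean_actual = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for p, a in zip(predicted, actual))
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)
    return 1.0 - ss_res / ss_tot


# Hypothetical daily doses in mg/d (not study data).
predicted_dose = [4.0, 5.5, 3.0, 7.0, 2.5]
therapeutic_dose = [5.0, 5.0, 4.5, 6.5, 2.0]

within = fraction_within(predicted_dose, therapeutic_dose)
r2 = r_squared(predicted_dose, therapeutic_dose)
print(f"within 1 mg/d: {within:.0%}, R^2: {r2:.3f}")
```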
ADORA2A; caffeine; CYP1A2; pathway; pharmacogenomics
cardiovascular toxicity; colon cancer; COX-2; coxibs; celecoxib; CYP2C9; drug response; inflammation; nonsteroidal anti-inflammatory drugs; pathway; pharmacogenomics; selective COX-2 inhibitors
drug-induced oxidative stress; glucose-6-phosphate dehydrogenase deficiency; hemolytic anemia; pharmacodynamics; pharmacokinetics; polymorphic variants
The number of molecules with solved three-dimensional structure but unknown function is increasing rapidly. Particularly problematic are novel folds with little detectable similarity to molecules of known function. Experimental assays can determine the functions of such molecules, but are time-consuming and expensive. Computational approaches can identify potential functional sites; however, these approaches generally rely on single static structures and do not use information about dynamics. In fact, structural dynamics can enhance function prediction: we coupled molecular dynamics simulations with structure-based function prediction algorithms that identify Ca2+ binding sites. When applied to 11 challenging proteins, both methods showed substantial improvement in performance, revealing 22 more sites in one case and 12 more in the other, with a modest increase in apparent false positives. Thus, we show that treating molecules as dynamic entities improves the performance of structure-based function prediction methods.
CYP1A2; caffeine; pharmacogene; PharmGKB
ABCC4; ABCB1; HIV infection; UGT2B7; zidovudine
carbamazepine; cytochrome P450 metabolizing enzymes; HLA-B; pharmacogenomics; pharmacokinetics
citalopram; escitalopram; pharmacogenomics; pharmacokinetics; PharmGKB; selective serotonin reuptake inhibitor
The biomedical literature presents a uniquely challenging text mining problem. Sentences are long and complex, the subject matter is highly specialized with a distinct vocabulary, and producing annotated training data for this domain is time consuming and expensive. In this environment, unsupervised text mining methods that do not rely on annotated training data are valuable. Here we investigate the use of random indexing, an automated method for producing vector-space semantic representations of words from large, unlabeled corpora, to address the problem of term normalization in sentences describing drugs and genes. We show that random indexing produces similarity scores that capture some of the structure of PHARE, a manually curated ontology of pharmacogenomics concepts. We further show that random indexing can be used to identify likely word candidates for inclusion in the ontology, and can help localize these new labels among classes and roles within the ontology.
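To make the idea of random indexing above concrete, here is a minimal sketch of the general technique: each word receives a sparse random index vector of +1/-1 entries, and a word's context vector is the sum of the index vectors of its neighbours within a sliding window. The dimensionality, sparsity, window size, toy sentences, and function names are all assumptions chosen for illustration; this is not the authors' pipeline or corpus.

```python
import math
import random
from collections import defaultdict


def make_index_vector(dim, nonzero, rng):
    """Sparse ternary index vector: a few random +1/-1 entries, zeros elsewhere."""
    vec = [0] * dim
    for pos in rng.sample(range(dim), nonzero):
        vec[pos] = rng.choice((-1, 1))
    return vec


def random_indexing(sentences, dim=300, nonzero=10, window=2, seed=0):
    """Build context vectors by summing index vectors of neighbouring words."""
    rng = random.Random(seed)
    index_vectors = {}
    context_vectors = defaultdict(lambda: [0] * dim)
    for tokens in sentences:
        for tok in tokens:
            if tok not in index_vectors:
                index_vectors[tok] = make_index_vector(dim, nonzero, rng)
        for i, word in enumerate(tokens):
            lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
            ctx = context_vectors[word]
            for j in range(lo, hi):
                if j == i:
                    continue  # a word is not its own context
                for k, value in enumerate(index_vectors[tokens[j]]):
                    ctx[k] += value
    return dict(context_vectors)


def cosine(u, v):
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


# Toy sentences (hypothetical, for illustration only).
sentences = [
    "warfarin is metabolized by CYP2C9".split(),
    "phenytoin is metabolized by CYP2C9".split(),
    "caffeine is metabolized by CYP1A2".split(),
]
vectors = random_indexing(sentences)
print(cosine(vectors["warfarin"], vectors["phenytoin"]))
```

Words that occur in the same contexts accumulate the same index vectors, so their context vectors point in similar directions. That property is what makes the similarity scores usable for grouping variant terms without annotated training data.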
cyclooxygenase-2; coxibs; non-steroidal anti-inflammatory drugs; pharmacogenomics; PTGS2; rs20417; rs5275; rs689466
aspirin; clopidogrel; glycoprotein IIb–IIIa inhibitors; pharmacogenomics; PharmGKB; platelet activation; platelet aggregation; polymorphism
drug response; genetic variants; pharmacogenomics; vitamin D receptor
dopamine receptor D2; PharmGKB; rs1799732; rs1800497; rs6277; rs1801028