The methods we present here build upon the foundation of signal detection algorithms developed for drug safety surveillance. Using spontaneous reporting systems to identify ADEs is complicated by sampling variance and reporting biases (4). Modern signal detection algorithms address sampling variance by using shrinkage to down-weight drug-event associations with little supporting evidence (6). Stratification is designed to address reporting biases by dividing the data across covariate-defined strata. However, systematically stratifying on a fixed set of covariates reduces power because it divides the available data across unimportant strata (4). Our approach does not divide data across strata and can correct for the effects of confounders even if those variables are unknown or unmeasured. The key insight is that, at least for drugs, the indications for use and the other drugs taken capture most of the information in many important covariates. Although our approach is inspired by methods used in observational cohort analysis, it does not enable causal inference. Like other signal detection techniques, its goal is to generate high-quality hypotheses for follow-up analysis. Our method has a running time comparable to that of current techniques, making it suitable for systematic drug surveillance.
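To make the role of shrinkage concrete, the sketch below contrasts a raw observed-to-expected ratio with a shrunk information component, IC = log2((n_obs + 0.5) / (n_exp + 0.5)), a simplified form used in pharmacovigilance disproportionality analysis. It assumes a single drug-event pair summarized by report counts; the counts and function names are hypothetical, and this is not the specific signal detection algorithm evaluated in this work.

```python
import math


def expected_count(n_drug: int, n_event: int, n_total: int) -> float:
    """Expected co-reports of a drug and an event if they were independent."""
    return n_drug * n_event / n_total


def raw_log_ratio(n_obs: int, n_exp: float) -> float:
    """Unshrunk log2(observed / expected); unstable when counts are small."""
    return math.log2(n_obs / n_exp)


def shrunk_ic(n_obs: int, n_exp: float, k: float = 0.5) -> float:
    """Information component with additive shrinkage k, pulling small counts toward 0."""
    return math.log2((n_obs + k) / (n_exp + k))


# Hypothetical pairs with the same observed/expected ratio (10x) but very
# different amounts of evidence: 1 report versus 100 reports.
pairs = {"sparse": (1, 0.1), "well_supported": (100, 10.0)}
for name, (obs, exp) in pairs.items():
    print(name, round(raw_log_ratio(obs, exp), 2), round(shrunk_ic(obs, exp), 2))
```

Both pairs share a raw score of about 3.32, but shrinkage lowers the sparse pair to about 1.32 while the well-supported pair stays near 3.26. That is the down-weighting of weakly supported associations described above.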
The successful prediction of side effects before a drug enters clinical trials remains a tantalizing goal. Chemical informatics techniques can predict drug side effects by comparing the structural similarity of drugs (16). In an analogous manner, protein structural similarity can explain and predict drug side effects (18). More recently, network and chemical properties have been combined into predictive models of drug effects (19); these approaches all rely on a comprehensive database of known drug effects. Package inserts list drug side effects and could serve as a primary source of known side effects, but these data are limited. First, because clinical trials are conducted on relatively small patient populations, only common effects can be detected with sufficient confidence to be listed on a drug’s package insert. Second, effects observed during clinical trials may be incidental and not actually caused by the drug. Nonetheless, recent work in chemical biology has used SIDER (a text-mined database of drug package inserts) to good effect (12). Our Offsides database contains information complementary to that found in SIDER and improves the prediction of protein targets and drug indications. As a complement to Offsides, our Twosides database of mined putative DDIs also lists predicted adverse events. These databases will serve as valuable resources for chemical biology, drug discovery, and pharmacoepidemiology studies. Both are available in the Supplementary Materials and at http://PharmGKB.org.
Identification and prediction of DDIs is a critical activity for improved patient care (21). Clinical trials do not routinely investigate DDIs because they are focused on establishing the safety and efficacy of single-agent therapeutics. A wide range of methods, from text mining (27) to network modeling (29), can detect, explain, and predict DDIs. Recently, a systems pharmacology approach was presented to identify genes associated with adverse cardiovascular drug effects (31). Integrating these methods with Twosides may lead to a better understanding of the molecular etiology of these effects (figs. S9). We highlight one potentially clinically significant association: co-prescription of thiazides and SSRIs with QT interval prolongation. Prolonged QT is not a known interaction effect of thiazides and SSRIs. However, each drug class is individually implicated in causing hyponatremia (32), and the mechanisms that cause this side effect may interact synergistically. The EMR analyses we report are not full epidemiological studies. EMRs are incomplete and may be missing data on medical history and prescription orders. In addition, patients who take multiple drugs may have a higher rate of adverse events than less-medicated patients. Further analysis is needed to evaluate these potentially important drug interactions.
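One simple way to ask whether two drugs act synergistically on an adverse event is to compare event risks across four exposure groups (neither drug, drug A only, drug B only, both) and compute the additive interaction contrast, r11 - r10 - r01 + r00, which is zero when the joint effect is exactly additive and positive when it exceeds additivity. The sketch below uses hypothetical counts; it is not the Twosides scoring procedure, and, like the EMR analyses described above, it makes no adjustment for confounding such as the higher baseline risk of heavily medicated patients.

```python
from dataclasses import dataclass


@dataclass
class Group:
    """Patients in one exposure group and how many of them had the event."""

    n_patients: int
    n_events: int

    @property
    def risk(self) -> float:
        return self.n_events / self.n_patients


def interaction_contrast(neither: Group, a_only: Group, b_only: Group, both: Group) -> float:
    """Additive interaction contrast r11 - r10 - r01 + r00 (0 means purely additive)."""
    return both.risk - a_only.risk - b_only.risk + neither.risk


# Hypothetical counts for a hypothetical adverse event.
neither = Group(10_000, 200)  # risk 0.020
a_only = Group(2_000, 60)     # risk 0.030
b_only = Group(2_000, 70)     # risk 0.035
both = Group(500, 40)         # risk 0.080
print(round(interaction_contrast(neither, a_only, b_only, both), 4))
```

With these counts the co-exposed risk (0.08) exceeds the additive expectation (0.03 + 0.035 - 0.02 = 0.045), giving a contrast of about 0.035. An observational result of this kind would still be a hypothesis requiring the follow-up analysis called for above.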
Evaluating signal detection and side-effect prediction algorithms is, in general, not straightforward because no gold standard of known ADEs exists. In lieu of one, we evaluated our proposed methodology against three “silver” standards: (i) effects listed on the drug’s package inserts, (ii) ADEs reported after the original download date of September 2009, and (iii) ADEs reported to the Canadian spontaneous reporting system. We found that, when used in combination with modern signal detection algorithms, our method significantly improved performance. These standards, however, are biased toward more common effects, so the measured performance of our method may be less reliable for rare events. A publicly available resource of drug effects would enhance the evaluation of this and other predictive algorithms.
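Evaluating against a silver standard amounts to ranking drug-event pairs by their signal scores and asking how well that ranking separates pairs present in the reference set from pairs absent from it. The sketch below assumes the area under the ROC curve as the summary metric, computed by brute-force pairwise comparison (equivalent to the Mann-Whitney U statistic divided by the number of positive-negative pairs). The choice of metric, the scores, and the labels are assumptions for illustration only.

```python
def auroc(scores: list[float], labels: list[int]) -> float:
    """Probability that a random positive outscores a random negative (ties count 0.5)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative pair")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical signal scores for six drug-event pairs; label 1 means the pair
# appears in the silver standard (for example, it is listed on the package insert).
scores = [3.1, 2.4, 0.2, 0.7, 0.9, -0.5]
labels = [1, 1, 0, 1, 0, 0]
print(auroc(scores, labels))
```

Here 8 of the 9 positive-negative comparisons are ordered correctly, giving an AUROC of 8/9. The quadratic loop is kept for clarity; for large databases, a rank-based or library implementation would be preferable. Because the positives in such a silver standard skew toward common effects, a high AUROC says little about rare events, which is the caveat noted above.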
In summary, we present a new methodology for correcting for the effects of confounding variables in large clinical observational databases when those variables are unknown, unmeasured, or sparsely collected. The goals of this work parallel those of patient stratification; however, our presented methodology adapts to specific drug-event pairs, does not require data to be split across strata, and can implicitly correct for unmeasured covariates. The key assumption of the method is that many patient covariates will be represented by the concomitant drugs the patient is taking and indications for which the patient is being treated. The method improves the performance of modern signal detection techniques and is suitable for systematic and routine drug safety surveillance. Finally, we present two new resources of adverse drug effects and drug interactions for use in drug discovery, repositioning, chemical biology, and pharmacoepidemiology studies.