Clin Pharmacol Ther. Author manuscript; available in PMC 2013 December 2.
Published in final edited form as:
PMCID: PMC3846296
NIHMSID: NIHMS525847

Pharmacovigilance Using Clinical Notes

Abstract

With increasing adoption of electronic health records (EHRs), there is an opportunity to use the free-text portion of EHRs for pharmacovigilance. We present novel methods that annotate the unstructured clinical notes and transform them into a deidentified patient–feature matrix encoded using medical terminologies. We demonstrate the use of the resulting high-throughput data for detecting drug–adverse event associations and adverse events associated with drug–drug interactions. We show that these methods flag adverse events early (in most cases before an official alert), allow filtering of spurious signals by adjusting for potential confounding, and compile prevalence information. We argue that analyzing large volumes of free-text clinical notes enables drug safety surveillance using a yet untapped data source. Such data mining can be used for hypothesis generation and for rapid analysis of suspected adverse event risk.

Phase IV surveillance is a critical component of drug safety because not all safety issues associated with drugs are detected before market approval. Each year, drug-related events account for up to 50% of adverse events occurring in hospital stays,1 significantly increasing costs and length of stay in hospitals.2 As much as 30% of all drug reactions result from concomitant use—with an estimated 29.4% of elderly patients on six or more drugs.3

Efforts such as the Sentinel Initiative and the Observational Medical Outcomes Partnership4 envision the use of electronic health records (EHRs) for active pharmacovigilance.5–7 Complementing the current state of the art, which is based on reports of suspected adverse drug reactions, active surveillance aims to monitor drugs in near real time and potentially shorten the time that patients are at risk.

Coded discharge diagnoses and insurance claims data from EHRs have already been used for detecting safety signals.8–10 However, some experts argue that methods that rely on coded data could be missing >90% of the adverse events that actually occur, in part because of the nature of billing and claims data.1 Researchers have used discharge summaries (which summarize information from a care episode, including the final diagnosis and follow-up plan) for detecting a range of adverse events11 and for demonstrating the feasibility of using the EHR for pharmacovigilance by identifying known adverse events associated with seven drugs using 25,074 notes from 2004.12 Therefore, the clinical text can potentially play an important role in future pharmacovigilance,13,14 particularly if we can transform notes taken daily by doctors, nurses, and other practitioners into more accessible data-mining inputs.15–17

Two key barriers to using clinical notes are privacy and accessibility.16 Clinical notes contain identifying information, such as names, dates, and locations, that are difficult to redact automatically, so care organizations are reluctant to share clinical notes.

We describe an approach that computationally processes clinical text rapidly and accurately enough to serve use cases such as drug safety surveillance. Like other terminology-based systems, it deidentifies the data as part of the process.18 We rely on the “unreasonable effectiveness”24 of large data sets and, in exchange, sacrifice some individual note-level accuracy in the text processing. Given the large volumes of clinical notes, our method produces a patient–feature matrix encoded using standardized medical terminologies. We demonstrate the use of the resulting patient–feature matrix as a substrate for signal detection algorithms for drug–adverse event associations and drug–drug interactions.

RESULTS

Our results show that it is possible to detect drug safety signals using clinical notes transformed into a feature matrix encoded using medical terminologies. We evaluate the performance of the resulting data set for pharmacovigilance using curated reference sets of single-drug adverse events as well as adverse events related to drug–drug interactions. In addition, we show that we can simultaneously estimate the prevalence of adverse events resulting from drug–drug interactions. The reference set, described in the Methods section, contains 28 positive associations and 165 negative associations spanning 78 drugs and 12 different events for single drug–adverse event associations. For the drug–drug interactions, the reference set contains 466 positive and 466 negative associations spanning 333 drugs across 10 events.

Feasibility of detecting drug–adverse event associations

To demonstrate the feasibility of using free text–derived features for detecting drug–adverse event associations, we reproduce the well-known association between rofecoxib and myocardial infarction. Rofecoxib was taken off the market because of the increased risk of heart attack and stroke.19,20 We compute an association between rofecoxib and myocardial infarction, keeping track of the temporal order of the diagnosis of rheumatoid arthritis, exposure to the drug, and occurrence of an adverse event as described in the Methods section. Using data up to 2005, we obtain an odds ratio (OR) of 1.31 (95% confidence interval (CI): 1.16–1.45) for the association, which agrees with previously reported results.19,20 In a previous study, we compared using clinical notes with using the codes from the International Classification of Diseases, Ninth Revision (ICD-9), and found no association (OR: 1.71; 95% CI: 0.74–3.53) using the coded data.21 This is probably due to undercoding: for patients to be counted as exposed requires a prior arthritis indication, and approximately one-third of the patients meet that criterion.

Performance of detecting adverse drug events

Figure 1 shows the adjusted ORs and 95% CIs for the 28 true-positive associations from our single drug–adverse event reference set. As expected, the results vary across the adverse events.10 Figure 2a shows the overall performance for detecting associations between a single drug and its adverse event, with an area under the receiver operating characteristic curve (AUC) of 75.3% (unadjusted) and 80.4% (adjusted). A threshold of 1.0 (a commonly used cutoff) on the lower bound of the 95% CI of the adjusted ORs translates to 39% sensitivity and 97.5% specificity. Choosing a signaling threshold, defined using minimum specificity of 90%, based on the receiver operating characteristic curve, yields a cutoff of 1.18 (unadjusted) and 0.84 (adjusted) on the lower bound of the 95% CI. Supplementary Data S1 online lists all adjusted results, and Supplementary Data S2 online lists the AUC threshold data.

Figure 1
Adjusted odds ratios (ORs) for positive cases in the single drug–adverse event set. Results show some variability by event. The 28 positive cases include the following events: myocardial infarction (mi), rhabdomyolysis (rhabd), cardiovascular ...
Figure 2
Performance of adverse drug reactions and drug–drug interaction detection. Overall performance is measured using areas under the receiver operating characteristic curve (AUCs). (a) The unadjusted (blue) vs. adjusted (red) methods yield AUCs of ...
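The thresholding described above can be illustrated with a minimal sketch. It assumes that a drug–event pair is flagged when the lower bound of its 95% CI is strictly greater than the cutoff, and that the signaling threshold is the smallest observed cutoff meeting the minimum specificity. The lower bounds and labels are made-up toy values, not results from the study, and the function names are hypothetical.

```python
# Toy data (hypothetical): lower 95% CI bounds and whether the pair is a
# known positive in the reference set.
LOWER_BOUNDS = [1.4, 1.2, 0.9, 0.7, 1.1, 0.8, 0.6, 0.5, 0.4, 0.3]
LABELS = [True, True, True, False, False, False, False, False, False, False]


def sens_spec(lower_bounds, labels, threshold):
    """Sensitivity and specificity when pairs with lower bound > threshold are flagged."""
    tp = fp = tn = fn = 0
    for lb, is_positive in zip(lower_bounds, labels):
        flagged = lb > threshold  # assumption: strict inequality
        if flagged and is_positive:
            tp += 1
        elif flagged:
            fp += 1
        elif is_positive:
            fn += 1
        else:
            tn += 1
    # Assumes the reference set contains at least one positive and one negative.
    return tp / (tp + fn), tn / (tn + fp)


def lowest_cutoff(lower_bounds, labels, min_specificity=0.90):
    """Smallest observed cutoff whose specificity meets the minimum, or None."""
    for cutoff in sorted(set(lower_bounds)):
        sens, spec = sens_spec(lower_bounds, labels, cutoff)
        if spec >= min_specificity:
            return cutoff, sens, spec
    return None


sens, spec = sens_spec(LOWER_BOUNDS, LABELS, threshold=1.0)
print(f"cutoff 1.0: sensitivity={sens:.2f}, specificity={spec:.2f}")
print("cutoff for >=90% specificity:", lowest_cutoff(LOWER_BOUNDS, LABELS))
```

On real data the candidate cutoffs would come from the receiver operating characteristic curve rather than from a scan over observed values.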

Profiling drug–adverse event associations over time

Figure 3 shows the cumulative ORs and exposures over time based on the unadjusted associations for the 10 drugs in our reference set that have had an alert in the past decade. Using a threshold of 1.0 on the lower bound of the CI for the association, we would flag six of nine alerts earlier than the official date (we do not have enough data for one drug, troglitazone). By comparison, the propensity-adjusted method would catch three of the alerts early. The unadjusted associations can flag signals worth investigating, and the adjusted associations may reduce false alarms.

Figure 3
Cumulative (unadjusted) odds and exposure plots for 10 positive cases involving US Food and Drug Administration (FDA) intervention. Signals are flagged earlier than official alerts in six of nine cases (troglitazone excluded for lack of sufficient exposure). ...
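The cumulative profiling can be sketched as follows, under stated assumptions: yearly increments to the four cells of the 2 × 2 table are accumulated, and a signal is flagged in the first year in which the lower bound of the 95% CI of the cumulative unadjusted OR exceeds 1.0. The cell layout (exposed with event, exposed without event, unexposed with event, unexposed without event), the Woolf log-based interval, and all counts are illustrative assumptions.

```python
import math


def lower_bound(a, b, c, d, z=1.96):
    """Lower 95% CI bound of the odds ratio (Woolf log method, an assumption)."""
    log_or = math.log((a * d) / (b * c))
    se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    return math.exp(log_or - z * se)


def first_flag_year(yearly_counts, threshold=1.0):
    """First year in which the cumulative lower bound exceeds the threshold."""
    total = [0, 0, 0, 0]
    for year, counts in sorted(yearly_counts):
        total = [t + n for t, n in zip(total, counts)]
        if min(total) == 0:
            continue  # odds ratio undefined while any cell is empty
        if lower_bound(*total) > threshold:
            return year
    return None


# Hypothetical yearly increments: (year, (a, b, c, d)).
YEARLY = [
    (2000, (2, 50, 10, 400)),
    (2001, (3, 60, 12, 420)),
    (2002, (12, 70, 11, 430)),
]

print(first_flag_year(YEARLY))
```

Comparing the returned year with the date of an official alert gives the lead time discussed in the text.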

Performance of detecting adverse drug–drug interactions

Figure 2b shows the performance (AUC of 81.5%) for detecting known adverse events arising from drug–drug interactions. Adjusting the associations for potential confounding improves the signal detection capability (red curves in Figure 2b).22 In the drug–drug interaction scenario, we do not constrain by drug indications because of combinatorial complexity. We obtain 52% sensitivity at 91% specificity, using 1.0 as a threshold on the lower bound of the CI for the adjusted associations.

Estimating the prevalence of adverse events

Population-level prevalence data for adverse events are hard to come by. For single drugs, sources such as Side Effect Resource provide information on the frequency of specific adverse events from the drug product label. No such comparable resource exists for adverse events arising from drug–drug interactions.

While performing the drug–adverse event association calculations using data from a clinical data warehouse, we can in parallel estimate the prevalence of adverse events associated with drug–drug interactions. For example, we found that 42.8% (176 of 411) of patients on both levodopa and lorazepam experience parkinsonian symptoms, 19.8% (140 of 707) of patients on paclitaxel and trastuzumab experience neutropenia, and 17.8% (796 of 4,467) of patients on amiodarone and metoprolol experience bradycardia.
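The prevalence figures quoted above follow directly from the reported counts. A short check, with a hypothetical helper name:

```python
# Counts reported in the text: (patients with the event, patients on both drugs).
COUNTS = {
    ("levodopa", "lorazepam", "parkinsonian symptoms"): (176, 411),
    ("paclitaxel", "trastuzumab", "neutropenia"): (140, 707),
    ("amiodarone", "metoprolol", "bradycardia"): (796, 4467),
}


def prevalence_percent(with_event, exposed):
    """Share of co-exposed patients with the event, as a percentage to one decimal."""
    return round(100.0 * with_event / exposed, 1)


for (drug_a, drug_b, event), (with_event, exposed) in COUNTS.items():
    print(f"{drug_a} + {drug_b}: {prevalence_percent(with_event, exposed)}% {event}")
```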

DISCUSSION

We have demonstrated that adverse drug events as well as adverse events associated with drug–drug interactions can be detected using a deidentified patient–feature matrix extracted from free-text clinical documents. Blumenthal and others5 envision a scenario in which a new drug comes to market and a nationwide learning system monitors for safety signals. Our results show that deidentified clinical notes can be used to generate drug safety signals—taking a step toward such a scenario. In addition, the patient–feature matrix also provides prevalence data not available from other data sources (e.g., spontaneous reports). Having such prevalence information can assist in prioritizing actionable events and reducing alert fatigue.23

Our approach to processing clinical notes is simple in comparison with advanced natural language processing (NLP) systems that may have better accuracy in identifying nuanced attributions of disease conditions. We sacrifice some individual note-level accuracy in exchange for the ability to detect population-level trends against massive data sets. Our results, based on a reference set of known drug–event pairs, show that when exposure data are numerous enough, the use of relatively simple text mining with standard association strength tests for signal detection can work, reflecting the adage in the machine-learning community that “a dumb algorithm with lots of data beats a clever one with modest amounts of it.”24,25 When used in combination with other data sets, clinical notes may address cases that otherwise pass undetected. We sacrifice sensitivity for specificity because for a new approach, and a new data source (clinical notes), keeping false-discovery rates low is important, particularly in the initial stages of establishing feasibility.

We find that ontologies are an excellent source of features and allow systematic normalization and aggregation when the feature set needs reduction.15,26 For example, we can count all patients who experience cardiac arrhythmias as patients with arrhythmias because of the hierarchical relationships. Therefore, ontology hierarchies can organize a very large number of terms into a smaller feature set. Moreover, because names, dates, and locations are not present in the clinical terminologies, those are not extracted as features by dictionary-based methods.18,27
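The roll-up along the hierarchy can be sketched in a few lines. The toy hierarchy below is illustrative only and assumes a single parent per term, whereas real ontologies allow several parents; the names and patient identifiers are hypothetical.

```python
# Hypothetical child -> parent links (single parent per term for simplicity).
PARENT = {
    "atrial fibrillation": "cardiac arrhythmia",
    "ventricular tachycardia": "cardiac arrhythmia",
    "cardiac arrhythmia": "heart disease",
}

# Hypothetical (patient id, recognized term) pairs.
MENTIONS = [
    (1, "atrial fibrillation"),
    (2, "ventricular tachycardia"),
    (3, "gallstones"),
    (1, "cardiac arrhythmia"),
]


def ancestors_and_self(term):
    """The term followed by each of its ancestors, walking up the hierarchy."""
    chain = [term]
    while chain[-1] in PARENT:
        chain.append(PARENT[chain[-1]])
    return chain


def patients_with(concept, mentions):
    """Patients with a mention of the concept or of any term it subsumes."""
    return {patient for patient, term in mentions if concept in ancestors_and_self(term)}


print(patients_with("cardiac arrhythmia", MENTIONS))
```

Counting at the level of "cardiac arrhythmia" collapses the two specific arrhythmia terms into one feature, which is the kind of feature-set reduction described above.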

We believe that the information embedded in text is crucial for leveraging EHR data,10,13,14 particularly for rare events for which large amounts of data are needed. Our annotation-based approach produces a feature matrix that complements other structured data such as codes from the ICD-9. Of note, our methods are not dependent on any particular NLP tool (we contrast MGREP and UNITEX in the Methods section), and we expect results to improve given the availability of better and faster clinical NLP tools.28,29 We are currently collaborating with researchers at the Mayo Clinic to improve the speed of the clinical Text Analysis and Knowledge Extraction System,29 one of the state-of-the-art NLP tools available for clinical text. Broader availability of curated clinical NLP data sets and health outcome definitions would accelerate research and validation.

Our work has several limitations and opportunities for improvement. Not all conditions are equally identifiable from text using lexical approaches (Supplementary Data S3 online reports validation results by condition). Advanced NLP tools would improve accuracy in these cases. Biases in our reference set (although it is among the largest used for such a study) affect our performance estimation. A new reference standard covering four events has just recently been released by the Observational Medical Outcomes Partnership,4 and we are currently evaluating its utility. Some adverse drug events are dose dependent, and our methods currently ignore this information. The UNITEX tool, described in the Methods section, includes libraries for dosage extraction and thus is a logical next step. We do not distinguish between new users of drugs and existing or chronic ones. Our methods have a limited ability to define eras (durations of medication and illness). We are currently examining the annotation data for the utility of the last mention of a concept, sentence-level co-occurrences, and temporal density of mentions to address this question. The majority of our findings are based on the Stanford Hospital and Clinics, which is a tertiary-care center representing a skewed population. At the same time, this population has added utility for investigating rare events. Variations in signaling thresholds can also occur as a result of the prevalence or rarity of an event,10 and more research is needed to adapt detection algorithms accordingly. The prevalence data estimated in studies such as ours are an important step in this direction.10 Finally, we note that the Observational Medical Outcomes Partnership group suggests that no single method works best uniformly, that different methods be considered for each event and data source, and that profiling performance via receiver operating characteristic curves assists in understanding the utility of a method or data source.4

To conclude, our method extracts from textual clinical notes a deidentified patient–feature matrix encoded using standardized medical terminologies. We have demonstrated the use of the resulting patient–feature matrix as a substrate for detecting single drug–adverse event associations (AUC of 80.4%) and for detecting adverse events associated with drug–drug interactions (AUC of 81.5%), illustrating that clinical notes can be a source for detecting drug safety signals at scale.15 The patient–feature matrix can also be used to learn off-label usage30 and to discern drug adverse events from indications.31 Using the textual contents of the EHR complements efforts using billing and claims data or spontaneous reports4,8,14,32,33 and opens up new opportunities for leveraging observational data.

METHODS

Data sources

Our primary data source was the Stanford Translational Research Integrated Database Environment,34 which spans 18 years of patient data from 1.8 million patients; it contains 19 million encounters, 35 million coded ICD-9 diagnoses, and >11 million unstructured clinical notes, which are a combination of pathology, radiology, and transcription reports. The gender split is ~60% female; the average age is 44 with an SD of 25.

The reference standard

We created reference standards of known drug–adverse event associations for testing the performance of our methods in detecting drug safety signals from text. Supplementary Data S4 online lists the single drug–event reference set.

For the single-drug adverse events, our reference set included 12 distinct events worth monitoring35 and 78 distinct drugs, 28 positive cases, and 165 negative cases. We started with a validation set from the European Union adverse drug event project (EU-ADR)36 and to that set, we added 10 drug safety signals that involved US Food and Drug Administration intervention in the past decade, manually curating these from the literature and cross-referencing with the agency’s website. We established our false-discovery rate by generating a set of negative associations by creating all combinations of drugs and events and subtracting any known associations that were identified by any one of the EU-ADR filtering workflows,37 the Medi-Span (Wolters Kluwer Health, Indianapolis, IN) Adverse Drug Effects Database, or the Side Effect Resource database.38

For the two-drug case, known drug–drug interactions were extracted (and manually validated) from textual monographs in DrugBank and the Medi-Span Drug Therapy Monitoring System. In this case, we simulated the negative set by associating drug pairs with a randomly chosen event, removing any cases that were already known to be associated on the basis of external knowledge (DrugBank, Medi-Span, Drugs.com, Unified Medical Language System (UMLS), or Side Effect Resource). This reference set included 10 distinct events, 333 distinct drugs, 466 positive cases, and 466 negative cases.

Testing for drug safety signals

We followed a two-step process for detecting drug safety signals: first, we computed a raw association in the form of an unadjusted OR, followed by adjustment for potential confounders. The first step is useful for flagging putative signals, and the second step is useful in reducing false alarms.

In the first step, we computed unadjusted ORs and 95% CIs by constructing a 2 × 2 contingency table26,33 from the patient–feature matrix. On the basis of first mentions of drug, event, and indication and their temporal order, we assigned patients to specific cells of a 2 × 2 contingency table as shown in Figure 4 (see also Supplementary Data S5 online). The temporal information in the patient–feature matrix is critical for determining whether the event follows exposure.39 Patients having no mention of the indication at any time are excluded from the analysis (see Supplementary Data S6 online for those patients being excluded). Using data following the indication, and not counting repeat mentions, the ordering of the drug and event determined into which cell of a 2 × 2 matrix the patient fell. Because all unexposed patients have the indication, they could be on an alternative drug or other treatment, or none at all.

Figure 4
Assignment of patients to 2 × 2 contingency tables. Patients are assigned to cells a, b, c, and d of a 2 × 2 contingency table (C) on the basis of the patterns shown in parts (A) and (B). In the patterns, indications are abbreviated with ...
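As a minimal sketch of the first step, the unadjusted OR and its 95% CI can be computed from the four cell counts. The cell layout (a = exposed with event, b = exposed without event, c = unexposed with event, d = unexposed without event) and the Woolf log-based interval are assumptions made for illustration; the text does not state which interval method was used, and the counts are made up.

```python
import math


def odds_ratio_ci(a, b, c, d, z=1.96):
    """Unadjusted odds ratio with a log-based (Woolf) confidence interval.

    Assumed layout: a = exposed with event, b = exposed without event,
    c = unexposed with event, d = unexposed without event.
    All four cells must be nonzero.
    """
    odds_ratio = (a * d) / (b * c)
    se_log_or = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(odds_ratio) - z * se_log_or)
    upper = math.exp(math.log(odds_ratio) + z * se_log_or)
    return odds_ratio, lower, upper


# Hypothetical counts for one drug-event pair.
or_, lo, hi = odds_ratio_ci(a=30, b=70, c=20, d=80)
print(f"OR={or_:.2f}, 95% CI: {lo:.2f}-{hi:.2f}")
```

With these toy counts the interval spans 1.0, so the pair would not be flagged at the commonly used cutoff of 1.0 on the lower bound.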

In the second step, we adjusted for confounding by specific patient factors. We included age, gender, race, and comorbidity and coprescription frequency (as a surrogate for overall health status) in calculating the propensity score.9 The propensity score quantified the likelihood of a patient to be exposed to a drug. Patients with known indications were matched (exposed vs. unexposed) via the propensity score. Finally, we included the propensity score as a covariate in logistic regression to compute adjusted ORs and 95% CIs using the coefficients of the regression model. We used the Matching and Survival packages in R.40
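One way to write the adjustment step down, as a hedged sketch rather than the exact specification used here: let $Y_i$ indicate the event, $X_i$ the exposure, $Z_i$ the patient covariates (age, gender, race, comorbidity and coprescription frequency), and $e_i$ the propensity score. The symbols and the Wald-type interval are assumptions introduced for illustration.

```latex
\begin{align}
e_i &= \Pr(X_i = 1 \mid Z_i) \\
\operatorname{logit}\,\Pr(Y_i = 1 \mid X_i, e_i) &= \beta_0 + \beta_1 X_i + \beta_2 e_i \\
\widehat{\mathrm{OR}}_{\mathrm{adj}} &= \exp(\hat{\beta}_1) \\
95\%\ \mathrm{CI} &= \exp\bigl(\hat{\beta}_1 \pm 1.96\,\widehat{\mathrm{SE}}(\hat{\beta}_1)\bigr)
\end{align}
```

Under this reading, the adjusted OR is the exponentiated exposure coefficient from the regression fitted on the matched patients.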

For single drug–event associations, we identified the indications of the drug using the Medi-Span Drug Indications Database and the National Drug File–Reference Terminology. In the drug–drug interaction scenario, the key idea is to determine whether the association of the event with the combination of the two drugs outweighs any association of the event with either one of the drugs alone (or none at all). Including the indications adds a degree of combinatorial complexity, so we focused primarily on the temporal order of the two drugs and event (Figure 4b) without restricting by the indications of the drugs.

Generating the patient–feature matrix

Our annotator workflow, described previously,21,30 uses ~5.6 million strings from existing terminologies; filters unambiguous terms that are predominantly noun phrases representing drugs, diseases, devices, or procedures; uses the cleaned up lexicon for term recognition in the clinical notes to tag or annotate41 the text; excludes negated terms or terms that apply to family and medical history;42 normalizes all terms using the ontology hierarchies; and finally uses the time stamps of the note to produce a deidentified, temporally ordered patient–feature matrix. The process is summarized in Figure 5 and the individual steps are detailed below.

Figure 5
Generation of the patient–feature matrix. The workflow (1) starts by downloading ~5.6 million strings for every term in ontologies from both the Unified Medical Language System (UMLS) and BioPortal, as well as all trigger terms from NegEx and ...

Using biomedical ontologies for text annotation

We use existing ontologies as a source of (i) a lexicon of strings that are grouped together and linked to over a million concepts via synonymy (referred to as mappings) and (ii) a hierarchy of >14 million parent–child relationships among those concepts. We use the lexicon to recognize terms in the input text using a tool called MGREP,41 which also tracks the relative position at which each term occurs (Figures 5 and 6). In addition to clinical terms, based on the ConText system,42 we include terms corresponding to contextual cues called “triggers” in our lexicon. Cues such as “denies,” “no sign of,” and “father has a history of” are used in a postprocessing step to identify terms that are negated or that apply to family or medical history. Terms that correspond to mentions in these contexts are ignored; thus, the subsequent analysis relies on positive, present mentions of concepts.

Figure 6
Sample annotations. (a) A discharge summary is encoded internally using (b) a highly compressed, numerical representation. The strings in parenthesis are keyed to the first column of numbers and are included merely for illustration purposes. (c) The annotations ...
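The idea of lexicon lookup followed by trigger-based filtering can be illustrated with a deliberately crude sketch. It is not MGREP or the ConText algorithm: it assumes naive sentence splitting and treats any term that follows a trigger in the same sentence as out of scope. Apart from C0027051, which the text gives for myocardial infarction, the concept identifiers are placeholders.

```python
import re

# Tiny illustrative lexicon: string -> concept identifier.
LEXICON = {
    "myocardial infarction": "C0027051",
    "chest pain": "PLACEHOLDER:chest_pain",  # hypothetical identifier
    "diabetes": "PLACEHOLDER:diabetes",      # hypothetical identifier
}

# Example trigger cues quoted in the text.
TRIGGERS = ("denies", "no sign of", "father has a history of")


def annotate(note):
    """Return (concept, term) pairs for positive, present mentions only."""
    annotations = []
    for sentence in re.split(r"[.!?]\s*", note.lower()):
        trigger_starts = [
            match.start()
            for trigger in TRIGGERS
            for match in re.finditer(re.escape(trigger), sentence)
        ]
        first_trigger = min(trigger_starts, default=None)
        for term, concept in LEXICON.items():
            for match in re.finditer(r"\b" + re.escape(term) + r"\b", sentence):
                # Simplification: skip any term that appears after a trigger.
                if first_trigger is not None and first_trigger < match.start():
                    continue
                annotations.append((concept, term))
    return annotations


NOTE = (
    "Patient denies chest pain. History notable for myocardial infarction. "
    "Father has a history of diabetes."
)
print(annotate(NOTE))
```

Only the myocardial infarction mention survives: the chest pain mention is negated and the diabetes mention belongs to family history.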

The resulting annotations for the Stanford Translational Research Integrated Database Environment data set comprise ~3.75 billion records. It takes 1 hour to generate annotations from 3 million documents using a single computer workstation and ~2 hours to postprocess the data. MGREP can be substituted with other NLP tools: one such tool we have tested is UNITEX,43 which offers advanced functionality such as regular expressions for drug doses and morpheme-based matching at the cost of an additional 10–20% processing time.

Cleaning the lexicon

Motivated by previous work on identifying and removing noninformative terms,44,45 we apply a series of suppression rules that fall into two categories: syntactic and semantic. We keep terms that are predominantly noun phrases46 based on an analysis of over 20 million MEDLINE abstracts; we remove uninformative phrases based on term frequency analysis of >50 million clinical documents from the Mayo clinic;47 and we suppress terms having fewer than four characters by default because the majority of these tend to be ambiguous abbreviations. Finally, using frequency-based sorting, we manually identify ambiguous terms that belong to more than one semantic group (drug, disease, device, and procedure),47,48 and we suppress their least likely interpretation. For example, “clip” is more likely to be a device than a drug in clinical text, so we suppress the interpretation as “corticotropin-like intermediate lobe peptide” even though clip is listed as its synonym.
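Two of the suppression rules above, the minimum length and the removal of a term's least likely interpretation, can be sketched as a simple filter. The entries and the curated suppression set are illustrative; the noun-phrase and frequency-based rules are not modeled.

```python
# Hypothetical lexicon entries: (string, semantic group).
ENTRIES = [
    ("clip", "device"),
    ("clip", "drug"),        # synonym of corticotropin-like intermediate lobe peptide
    ("mi", "disease"),       # fewer than four characters
    ("rofecoxib", "drug"),
    ("stent", "device"),
]

# Manually curated interpretations to suppress (illustrative).
SUPPRESSED = {("clip", "drug")}


def clean_lexicon(entries, min_length=4, suppressed=SUPPRESSED):
    """Drop short strings and suppressed (string, semantic group) interpretations."""
    return [
        (term, group)
        for term, group in entries
        if len(term) >= min_length and (term, group) not in suppressed
    ]


print(clean_lexicon(ENTRIES))
```

Here "clip" survives only as a device, and the short string "mi" is dropped as a likely ambiguous abbreviation.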

Normalizing terms in the patient–feature matrix

Drug prescriptions are identified via the text processing and normalized into active ingredients using relationships from RxNorm (e.g., “tradename_of ”). Therefore, “rofecoxib 12.5 mg oral tablet” and “Vioxx” are normalized to the active ingredient rofecoxib. In addition, we map ingredients to the Anatomical Therapeutic Chemical Classification System, which enables four levels of aggregation, i.e., rofecoxib, celecoxib, and valdecoxib are all cyclooxygenase-2 inhibitors, which are nonsteroidal anti-inflammatory drugs, and so on.
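A minimal sketch of this normalization, assuming the RxNorm relationships and the classification levels have already been loaded into plain dictionaries. The dictionary contents are limited to the examples in the text, and the function and variable names are hypothetical.

```python
# Hypothetical lookup derived from RxNorm relationships such as "tradename_of".
INGREDIENT_OF = {
    "vioxx": "rofecoxib",
    "rofecoxib 12.5 mg oral tablet": "rofecoxib",
}

# Hypothetical aggregation path per ingredient, most specific class first.
CLASS_PATH = {
    "rofecoxib": ["cyclooxygenase-2 inhibitors", "nonsteroidal anti-inflammatory drugs"],
    "celecoxib": ["cyclooxygenase-2 inhibitors", "nonsteroidal anti-inflammatory drugs"],
    "valdecoxib": ["cyclooxygenase-2 inhibitors", "nonsteroidal anti-inflammatory drugs"],
}


def normalize(mention):
    """Map a drug mention to its active ingredient (falls back to the mention itself)."""
    key = mention.lower()
    return INGREDIENT_OF.get(key, key)


def aggregate(ingredient, level):
    """Return the ingredient (level 0) or its class at the requested level."""
    path = [ingredient] + CLASS_PATH.get(ingredient, [])
    return path[min(level, len(path) - 1)]


print(normalize("Vioxx"), "->", aggregate(normalize("Vioxx"), 1))
```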

Although drug normalization is fairly straightforward, diseases, devices, and procedures present a challenge. In what we call the two-hop method (Figure 7), we use a query-driven approach to normalize disease, device, and procedure concepts. We start with definitions from the EU-ADR project’s specifications and MedDRA standardized query definitions: for example, for myocardial infarction, we would start with the ICD-9 code 410 (acute myocardial infarction) and 18 different UMLS concept unique identifiers including C0027051 (myocardial infarction), C0340324 (silent myocardial infarction), and C0155626 (acute myocardial infarction). Starting with these “seed” concepts, we utilize mappings across ontologies and the hierarchical parent–child relationships to expand subsumed entities. Supplementary Data S7 online lists all seed queries and their full expansions.

Figure 7
Two-hop query expansion. The algorithm takes a set of concepts C (solid red) and derives all subconcepts C′ (all red) in each ontology O and then repeats the process only once more for all derived concepts C′ (solid blue) to obtain C′′ ...

We first precompute the transitive closure over all parent–child hierarchies, and we index it such that we can retrieve all ancestors or all descendants of a given concept. Second, the mappings among synonymous terms form an equivalence class to which we assign a unique identifier (similar to the UMLS Metathesaurus concept unique identifiers). Using these two resources, given concepts of interest as a seed query, for example, the 18 concepts for myocardial infarction, we use the mappings to find all canonical identifiers (first hop) and then use the transitive closure to include all subsumed concepts in the query. Next, we repeat the process once more with this expanded set of concepts (second hop). For myocardial infarction, the expansion process yields 470 unique strings. In principle, recursion with a least fixed-point semantics would apply; however, recursion does not work well in practice because of differing abstraction levels among ontologies, which induce cycles. We have found that two hops achieve an adequate balance between soundness and completeness for the current use case.
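The expansion can be sketched on a toy example. The code below assumes the hierarchy is given as a parent-to-children map and the mappings as sets of equivalent concepts; the concept names, prefixes, and function names are hypothetical, and the data are chosen so that a concept reachable only through a third hop is left out.

```python
def descendants_closure(children):
    """Precompute all descendants of every concept (tolerates cycles)."""
    nodes = set(children) | {c for kids in children.values() for c in kids}
    closure = {}
    for node in nodes:
        seen, stack = set(), list(children.get(node, ()))
        while stack:
            current = stack.pop()
            if current in seen:
                continue
            seen.add(current)
            stack.extend(children.get(current, ()))
        closure[node] = seen
    return closure


def build_classes(equivalences):
    """Assign one identifier per equivalence class of mapped concepts."""
    class_of, members = {}, {}
    for class_id, group in enumerate(equivalences):
        members[class_id] = set(group)
        for concept in group:
            class_of[concept] = class_id
    return class_of, members


def hop(concepts, class_of, members, closure):
    """One hop: follow mappings, then add all subsumed concepts."""
    expanded = set()
    for concept in concepts:
        expanded |= members[class_of[concept]] if concept in class_of else {concept}
    for concept in list(expanded):
        expanded |= closure.get(concept, set())
    return expanded


def two_hop(seeds, class_of, members, closure):
    return hop(hop(seeds, class_of, members, closure), class_of, members, closure)


# Toy hierarchies from three hypothetical ontologies O1, O2, and O3.
CHILDREN = {
    "O1:mi": ["O1:acute_mi"],
    "O2:acute_mi": ["O2:stemi"],
    "O3:stemi": ["O3:anterior_stemi"],
}
EQUIVALENCES = [{"O1:acute_mi", "O2:acute_mi"}, {"O2:stemi", "O3:stemi"}]

CLOSURE = descendants_closure(CHILDREN)
CLASS_OF, MEMBERS = build_classes(EQUIVALENCES)
print(sorted(two_hop({"O1:mi"}, CLASS_OF, MEMBERS, CLOSURE)))
```

Stopping after two hops keeps the expansion bounded; in this toy data the O3 concepts would only be reached by a third hop.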

Recognizing events and exposures

By combining the above procedures (seeding queries using established definitions, normalizing and aggregating terms, and using only positive, present mentions; see Supplementary Data S8 online), we are able to recognize events and exposures with enough accuracy for the drug safety use case. We determine the accuracy of the event identification using a gold-standard corpus from the 2008 i2b2 Obesity Challenge.49 This corpus has been manually annotated by two annotators for 16 conditions and was designed to evaluate the ability of NLP systems to identify a condition present for a patient given a textual note. We extended this corpus by manually annotating each of the events listed in Figures 1 and 3 (see Supplementary Data S3 online).

Using the set of terms corresponding to the definition of the event of interest (see Supplementary Data S7 online) and the set of terms recognized by our annotation workflow in the i2b2 notes, we evaluate the sensitivity and specificity of identifying each of the events (see Supplementary Data S3 online). Overall, our event identification has 74% sensitivity and 96% specificity. Accuracy varies by condition: for example, myocardial infarction has 63% sensitivity and 94% specificity, whereas gallstones have 15% sensitivity and 99% specificity.

Drug recognition is done in a similar manner using strings from RxNorm. An independent study at the University of Pittsburgh, which manually examined the annotations on 1,960 clinical notes,50 estimated over 84% recall and 84% precision for recognizing drugs (R. Boyce, personal communication).

Ordering the features

We use the time stamps for each note to induce a temporal ordering over the recognized concepts on a per-patient basis. We focus on first mentions of concepts and do not use exposure windows or eras. We keep positive, present mentions and ignore negated mentions and family and medical history mentions identified via trigger terms. Therefore, for every patient, the feature matrix contains a temporally ordered list of drugs, diseases, devices, and procedures mentioned in their medical record.
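A minimal sketch of this ordering step, assuming the annotations have already been filtered to positive, present mentions and arrive as (patient, note date, concept) tuples. The tuple layout, identifiers, and dates are hypothetical.

```python
from datetime import date


def first_mentions(annotations):
    """Per patient, the first mention date of each concept, in temporal order."""
    first_seen = {}
    for patient, note_date, concept in annotations:
        key = (patient, concept)
        if key not in first_seen or note_date < first_seen[key]:
            first_seen[key] = note_date
    rows = {}
    for (patient, concept), note_date in first_seen.items():
        rows.setdefault(patient, []).append((note_date, concept))
    return {patient: sorted(items) for patient, items in rows.items()}


# Hypothetical annotations; repeat mentions are collapsed to the earliest one.
ANNOTATIONS = [
    ("p1", date(2003, 5, 1), "rofecoxib"),
    ("p1", date(2002, 1, 15), "rheumatoid arthritis"),
    ("p1", date(2004, 3, 9), "myocardial infarction"),
    ("p1", date(2004, 8, 2), "rofecoxib"),
    ("p2", date(2001, 7, 4), "myocardial infarction"),
]

for patient, timeline in first_mentions(ANNOTATIONS).items():
    print(patient, [concept for _, concept in timeline])
```

For patient p1 the resulting order (indication, then drug, then event) is the kind of pattern used to assign a patient to a cell of the contingency table.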

Study Highlights

WHAT IS THE CURRENT KNOWLEDGE ON THE TOPIC?

  • The current state of the art in drug safety surveillance relies either on databases of reported adverse events (such as the Adverse Event Reporting System) or on longitudinal observational data, primarily claims and billing, derived from coded EHR sources.

WHAT QUESTION DID THIS STUDY ADDRESS?

  • In this study, we demonstrate the feasibility of using large amounts of free-text notes as a substrate for performing pharmacovigilance after transforming clinical notes into a deidentified patient–feature matrix coded with standard medical terminologies.

WHAT THIS STUDY ADDS TO OUR KNOWLEDGE

  • We show that by using a large corpus, we can detect single drug–adverse event associations and adverse events associated with drug–drug interactions with high accuracy.

HOW THIS MIGHT CHANGE CLINICAL PHARMACOLOGY AND THERAPEUTICS

  • We argue that drug safety surveillance can be advanced by using this yet untapped data source of clinical notes, which comprise the majority of EHR data available.

Acknowledgments

The authors acknowledge support from the National Institutes of Health grant U54-HG004028 for the National Center for Biomedical Ontology. NHS also acknowledges support from NIH grant U54-LM008748. The authors thank Cédrick Fairon for assistance in evaluating UNITEX and Richard Boyce for evaluating drug accuracy.

Footnotes

SUPPLEMENTARY MATERIAL is linked to the online version of the paper at http://www.nature.com/cpt

AUTHOR CONTRIBUTIONS

P.L., A.B.-M., S.V.I., and N.H.S. wrote the manuscript. P.L., S.V.I., A.B.-M., and N.H.S. designed the research. P.L., S.V.I., A.B.-M., J.M.M., and N.H.S. performed the research. P.L., S.V.I., A.B.-M., R.H., T.P., and T.A.F. analyzed the data. P.L., S.V.I., A.B.-M., and N.H.S. contributed new reagents/analytical tools.

CONFLICT OF INTEREST

The authors declared no conflict of interest.

References

1. Classen DC, et al. ‘Global trigger tool’ shows that adverse events in hospitals may be ten times greater than previously measured. Health Aff (Millwood). 2011;30:581–589.
2. Hug BL, Keohane C, Seger DL, Yoon C, Bates DW. The costs of adverse drug events in community hospitals. Jt Comm J Qual Patient Saf. 2012;38:120–126.
3. Bushardt RL, Massey EB, Simpson TW, Ariail JC, Simpson KN. Polypharmacy: misleading, but manageable. Clin Interv Aging. 2008;3:383–389.
4. Stang PE, et al. Advancing the science for active surveillance: rationale and design for the Observational Medical Outcomes Partnership. Ann Intern Med. 2010;153:600–606.
5. Friedman CP, Wong AK, Blumenthal D. Achieving a nationwide learning health system. Sci Transl Med. 2010;2(57):57cm29.
6. McClellan M. Drug safety reform at the FDA–pendulum swing or systematic improvement? N Engl J Med. 2007;356:1700–1702.
7. Avorn J, Schneeweiss S. Managing drug-risk information–what to do with all those new numbers. N Engl J Med. 2009;361:647–649.
8. Gagne JJ, et al. Active safety monitoring of newly marketed medications in a distributed data network: application of a semi-automated monitoring system. Clin Pharmacol Ther. 2012;92:80–86.
9. Schneeweiss S, Rassen JA, Glynn RJ, Avorn J, Mogun H, Brookhart MA. High-dimensional propensity score adjustment in studies of treatment effects using health care claims data. Epidemiology. 2009;20:512–522.
10. Coloma PM, et al. Electronic healthcare databases for active drug safety surveillance: is there enough leverage? Pharmacoepidemiol Drug Saf. 2012;21:611–621.
11. Melton GB, Hripcsak G. Automated detection of adverse events using natural language processing of discharge summaries. J Am Med Inform Assoc. 2005;12:448–457.
12. Wang X, Hripcsak G, Markatou M, Friedman C. Active computerized pharmacovigilance using natural language processing, statistics, and electronic health records: a feasibility study. J Am Med Inform Assoc. 2009;16:328–337.
13. Hennessy S, Flockhart DA. The need for translational research on drug-drug interactions. Clin Pharmacol Ther. 2012;91:771–773.
14. Harpaz R, DuMouchel W, Shah NH, Madigan D, Ryan P, Friedman C. Novel data-mining methodologies for adverse drug event discovery and analysis. Clin Pharmacol Ther. 2012;91:1010–1021.
15. Nadkarni PM. Drug safety surveillance using de-identified EMR and claims data: issues and challenges. J Am Med Inform Assoc. 2010;17:671–674.
16. Chapman WW, Nadkarni PM, Hirschman L, D’Avolio LW, Savova GK, Uzuner O. Overcoming barriers to NLP for clinical text: the role of shared tasks and the need for additional creative solutions. J Am Med Inform Assoc. 2011;18:540–543.
17. Radecki RP, Sittig DF. Application of electronic health records to the Joint Commission’s 2011 National Patient Safety Goals. JAMA. 2011;306:92–93.
18. Morrison FP, Li L, Lai AM, Hripcsak G. Repurposing the clinical record: can an existing natural language processing system de-identify clinical notes? J Am Med Inform Assoc. 2009;16:37–39.
19. Graham DJ, et al. Risk of acute myocardial infarction and sudden cardiac death in patients treated with cyclo-oxygenase 2 selective and non-selective non-steroidal anti-inflammatory drugs: nested case-control study. Lancet. 2005;365:475–481.
20. Brownstein JS, Sordo M, Kohane IS, Mandl KD. The tell-tale heart: population-based surveillance reveals an association of rofecoxib and celecoxib with myocardial infarction. PLoS ONE. 2007;2:e840.
21. LePendu P, Iyer SV, Fairon C, Shah NH. Annotation analysis for testing drug safety signals using unstructured clinical notes. J Biomed Semantics. 2012;3(suppl 1):S5.
22. Ryan PB, Madigan D, Stang PE, Overhage JM, Racoosin JA, Hartzema AG. Empirical assessment of methods for risk identification in healthcare data: results from the experiments of the Observational Medical Outcomes Partnership. Stat Med. 2012;31:4401–4415.
23. Phansalkar S, et al. Drug-drug interactions that should be non-interruptive in order to reduce alert fatigue in electronic health records. J Am Med Inform Assoc. 2012 e-pub ahead of print 25 September 2012.
24. Halevy A, Norvig P, Pereira F. The unreasonable effectiveness of data. IEEE Intell Syst. 2009;24(2):8–12.
25. Domingos P. A few useful things to know about machine learning. Commun ACM. 2012;55(10):78–87.
26. Bate A, Evans SJ. Quantitative signal detection using spontaneous ADR reporting. Pharmacoepidemiol Drug Saf. 2009;18:427–436.
27. Aronson AR, Lang FM. An overview of MetaMap: historical perspective and recent advances. J Am Med Inform Assoc. 2010;17:229–236.
28. D’Avolio LW, Nguyen TM, Goryachev S, Fiore LD. Automated concept-level information extraction to reduce the need for custom software and rules development. J Am Med Inform Assoc. 2011;18:607–613.
29. Savova GK, et al. Mayo clinical Text Analysis and Knowledge Extraction System (cTAKES): architecture, component evaluation and applications. J Am Med Inform Assoc. 2010;17:507–513.
30. LePendu P, Liu Y, Iyer S, Udell M, Shah NH. Analyzing patterns of drug use in clinical notes for patient safety. AMIA Summit on Clinical Research Informatics; San Francisco, CA. 21–23 March 2012.
31. Liu Y, LePendu P, Iyer S, Udell M, Shah NH. Using temporal patterns in medical records to discern adverse drug events from indications. AMIA Summit on Clinical Research Informatics; San Francisco, CA. 21–23 March 2012.
32. Schuemie MJ, et al. Using electronic health care records for drug safety signal detection: a comparative evaluation of statistical methods. Med Care. 2012;50:890–897.
33. Hauben M, Bate A. Decision support methods for the detection of adverse events in post-marketing data. Drug Discov Today. 2009;14:343–357.
34. Lowe HJ, Ferris TA, Hernandez PM, Weber SC. STRIDE–an integrated standards-based translational research informatics platform. AMIA Annu Symp Proc. 2009;2009:391–395.
35. Trifirò G, et al. EU-ADR group. Data mining on electronic health record databases for signal detection in pharmacovigilance: which events to monitor? Pharmacoepidemiol Drug Saf. 2009;18:1176–1184.
36. OMOP. Detection of Long Term Adverse Drug Reactions in Electronic Health Data. 2012. <http://omop.fnih.org/sites/default/files/Schuemie_Detection%20of%20ADRs_Protocol%202011.pdf>.
37. Bauer-Mehren A, et al. Automatic filtering and substantiation of drug safety signals. PLoS Comput Biol. 2012;8:e1002457.
38. Kuhn M, Campillos M, Letunic I, Jensen LJ, Bork P. A side effect resource to capture phenotypic effects of drugs. Mol Syst Biol. 2010;6:343.
39. Hanauer DA, Ramakrishnan N. Modeling temporal relationships in large scale clinical associations. J Am Med Inform Assoc. 2012;20:332–341.
40. Sekhon JS. Multivariate and propensity score matching software with automated balance optimization: the matching package for R. J Stat Softw. 2011;42(7):1–52.
41. Shah NH, Bhatia N, Jonquet C, Rubin D, Chiang AP, Musen MA. Comparison of concept recognizers for building the Open Biomedical Annotator. BMC Bioinformatics. 2009;10(suppl 9):S14.
42. Harkema H, Dowling JN, Thornblade T, Chapman WW. ConText: an algorithm for determining negation, experiencer, and temporal status from clinical reports. J Biomed Inform. 2009;42:839–851.
43. Paumier S. De la reconnaissance de formes linguistiques à l’analyse syntaxique. Université de Marne-la-Vallée; 2003.
44. Demner-Fushman D, Mork JG, Shooshan SE, Aronson AR. UMLS content views appropriate for NLP processing of the biomedical literature vs. clinical text. J Biomed Inform. 2010;43:587–594.
45. McCray AT, Bodenreider O, Malley JD, Browne AC. Evaluating UMLS strings for natural language processing. Proc AMIA Symp. 2001:448–452.
46. Xu R, Musen MA, Shah NH. A comprehensive analysis of five million UMLS metathesaurus terms using eighteen million MEDLINE citations. AMIA Annu Symp Proc. 2010;2010:907–911.
47. Wu ST, et al. UMLS term occurrences in clinical notes: a large scale corpus analysis. J Am Med Inform Assoc. 2012;19:e149–e156.
48. Bodenreider O, McCray AT. Exploring semantic groups through visual approaches. J Biomed Inform. 2003;36:414–432.
49. Uzuner O. Recognizing obesity and comorbidities in sparse data. J Am Med Inform Assoc. 2009;16(4):561–570.
50. Marshall MS, et al. Emerging practices for mapping and linking life sciences data using RDF—a case series. Web Semantics: Science, Services and Agents on the World Wide Web. 2012;14:2–13.