We have demonstrated that adverse drug events as well as adverse events associated with drug–drug interactions can be detected using a deidentified patient–feature matrix extracted from free-text clinical documents. Blumenthal and others5 envision a scenario in which a new drug comes to market and a nationwide learning system monitors for safety signals. Our results show that deidentified clinical notes can be used to generate drug safety signals, taking a step toward such a scenario. In addition, the patient–feature matrix provides prevalence data not available from other data sources (e.g., spontaneous reports). Having such prevalence information can assist in prioritizing actionable events and reducing alert fatigue.23
Our approach to processing clinical notes is simple in comparison with advanced natural language processing (NLP) systems that may have better accuracy in identifying nuanced attributions of disease conditions. We sacrifice some individual note-level accuracy in exchange for the ability to detect population-level trends against massive data sets. Our results, based on a reference set of known drug–event pairs, show that when exposure data are numerous enough, the use of relatively simple text mining with standard association strength tests for signal detection can work, reflecting the adage in the machine-learning community that “a dumb algorithm with lots of data beats a clever one with modest amounts of it.”24,25
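As an illustration of what a standard association strength test over a patient–feature matrix can look like, the following minimal sketch builds a 2×2 contingency table for one drug–event pair and computes an odds ratio with a Wald 95% confidence interval. The odds ratio is used here only as one common choice of association measure; the feature names (`drug_x`, `event_y`), the toy patients, and both function names are hypothetical, and the sketch ignores the temporal ordering of drug and event mentions.

```python
import math


def contingency(patients, drug, event):
    """Count patients by exposure and outcome.

    patients: dict mapping a patient id to the set of features
    (drug and condition mentions) found in that patient's notes.
    Returns (a, b, c, d): exposed with event, exposed without event,
    unexposed with event, unexposed without event.
    """
    a = b = c = d = 0
    for features in patients.values():
        exposed = drug in features
        has_event = event in features
        if exposed and has_event:
            a += 1
        elif exposed:
            b += 1
        elif has_event:
            c += 1
        else:
            d += 1
    return a, b, c, d


def odds_ratio(a, b, c, d):
    """Odds ratio with a Wald 95% confidence interval.

    Adds 0.5 to every cell when any cell is zero (Haldane-Anscombe
    correction) so that the ratio and its standard error stay finite.
    """
    if 0 in (a, b, c, d):
        a, b, c, d = (x + 0.5 for x in (a, b, c, d))
    ratio = (a * d) / (b * c)
    se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    low = math.exp(math.log(ratio) - 1.96 * se)
    high = math.exp(math.log(ratio) + 1.96 * se)
    return ratio, low, high


# Hypothetical toy data: eight patients, two features of interest.
toy_patients = {
    "p1": {"drug_x", "event_y"},
    "p2": {"drug_x", "event_y", "drug_z"},
    "p3": {"drug_x"},
    "p4": {"event_y"},
    "p5": {"drug_z"},
    "p6": set(),
    "p7": {"drug_z"},
    "p8": set(),
}

table = contingency(toy_patients, "drug_x", "event_y")
ratio, low, high = odds_ratio(*table)
print(table, round(ratio, 2), round(low, 2), round(high, 2))
```

With so few patients the confidence interval is very wide, which is consistent with the point above that simple tests of this kind depend on exposure data being numerous.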
When used in combination with other data sets, clinical notes may address cases that otherwise pass undetected. We sacrifice sensitivity for specificity because, for a new approach and a new data source (clinical notes), keeping false-discovery rates low is important, particularly in the initial stages of establishing feasibility.
We find that ontologies are an excellent source of features and allow systematic normalization and aggregation when the feature set needs reduction.15,26 For example, because of the hierarchical relationships, we can count all patients who experience any specific type of cardiac arrhythmia as patients with arrhythmias. Therefore, ontology hierarchies can organize a very large number of terms into a smaller feature set. Moreover, because names, dates, and locations are not present in the clinical terminologies, those are not extracted as features by dictionary-based methods.18,27
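The aggregation over an ontology hierarchy described above can be sketched as follows. This is a minimal illustration that assumes the hierarchy is available as a child-to-parents mapping; the three-level toy hierarchy, the patient data, and the function names are hypothetical and are not drawn from any particular terminology.

```python
def ancestors(term, parents):
    """Return every ancestor of `term` reachable through is-a links.

    parents: dict mapping a term to a list of its direct parent terms.
    """
    seen = set()
    stack = [term]
    while stack:
        current = stack.pop()
        for parent in parents.get(current, ()):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


def roll_up(patient_terms, parents):
    """Expand each patient's term set with all ancestor terms."""
    expanded = {}
    for patient_id, terms in patient_terms.items():
        closure = set(terms)
        for term in terms:
            closure |= ancestors(term, parents)
        expanded[patient_id] = closure
    return expanded


# Hypothetical toy hierarchy (child -> direct parents).
toy_parents = {
    "atrial fibrillation": ["cardiac arrhythmia"],
    "ventricular tachycardia": ["cardiac arrhythmia"],
    "cardiac arrhythmia": ["heart disease"],
}

# Hypothetical terms recognized in each patient's notes.
toy_patient_terms = {
    "p1": {"atrial fibrillation"},
    "p2": {"ventricular tachycardia"},
    "p3": {"hypertension"},
}

rolled = roll_up(toy_patient_terms, toy_parents)
arrhythmia_count = sum(
    1 for terms in rolled.values() if "cardiac arrhythmia" in terms
)
print(arrhythmia_count)
```

Expanding each patient's terms once, rather than walking the hierarchy at query time, lets any ancestor term be counted with a simple set-membership test.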
We believe that the information embedded in text is crucial for leveraging EHR data,10,13,14 particularly for rare events for which large amounts of data are needed. Our annotation-based approach produces a feature matrix that complements other structured data such as codes from the ICD-9. Of note, our methods are not dependent on any particular NLP tool (we contrast MGREP and UNITEX in the Methods section), and we expect results to improve given the availability of better and faster clinical NLP tools.28,29 We are currently collaborating with researchers at the Mayo Clinic to improve the speed of the clinical Text Analysis and Knowledge Extraction System,29 one of the state-of-the-art NLP tools available for clinical text. Broader availability of curated clinical NLP data sets and health outcome definitions would accelerate research and validation.
Our work has several limitations and opportunities for improvement. Not all conditions are equally identifiable from text using lexical approaches (Supplementary Data S3 online reports validation results by condition). Advanced NLP tools would improve accuracy in these cases. Biases in our reference set, although it is among the largest used for such a study, affect our performance estimation. A new reference standard covering four events has just recently been released by the Observational Medical Outcomes Partnership,4 and we are currently evaluating its utility. Some adverse drug events are dose dependent, and our methods currently ignore this information. The UNITEX tool, described in the Methods section, includes libraries for dosage extraction and thus is a logical next step. We do not distinguish between new users of drugs and existing or chronic ones. Our methods have a limited ability to define eras (durations of medication and illness). We are currently examining the annotation data for the utility of the last mention of a concept, sentence-level co-occurrences, and temporal density of mentions to address this question. The majority of our findings are based on the Stanford Hospital and Clinics, which is a tertiary-care center representing a skewed population. At the same time, this population has added utility for investigating rare events. Variations in signaling thresholds can also occur as a result of the prevalence or rarity of an event,10 and more research is needed to adapt detection algorithms accordingly. The prevalence data estimated in studies such as ours are an important step in this direction.10 Finally, we note that the Observational Medical Outcomes Partnership group suggests that no single method works best uniformly, that different methods be considered for each event and data source, and that profiling performance via receiver operating characteristic curves assists in understanding the utility of a method or data source.4
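To make the receiver operating characteristic profiling mentioned above concrete, the following minimal sketch computes the area under the curve (AUC) for a set of scored drug–event pairs against a reference set of known positives and negatives. It uses the pairwise (Mann–Whitney) formulation of the AUC; the scores, labels, and function name are hypothetical toy values chosen for illustration and are not results from this study.

```python
def roc_auc(scores, labels):
    """Area under the ROC curve via pairwise comparison.

    scores: association strengths, one per drug-event pair.
    labels: 1 if the pair is a known positive in the reference set,
            0 if it is a known negative.
    The AUC equals the fraction of (positive, negative) pairs in which
    the positive is scored higher; ties count as one half.
    """
    positives = [s for s, y in zip(scores, labels) if y]
    negatives = [s for s, y in zip(scores, labels) if not y]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Hypothetical association scores and reference labels.
toy_scores = [3.2, 1.1, 2.5, 0.8, 1.9, 1.4]
toy_labels = [1, 0, 1, 0, 0, 1]

auc = roc_auc(toy_scores, toy_labels)
print(round(auc, 3))
```

The pairwise form is quadratic in the number of scored pairs, which is acceptable for a reference set of modest size; a rank-based implementation would be preferable for much larger sets.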
To conclude, our method extracts from textual clinical notes a deidentified patient–feature matrix encoded using standardized medical terminologies. We have demonstrated the use of the resulting patient–feature matrix as a substrate for detecting single drug–adverse event associations (AUC of 80.4%) and for detecting adverse events associated with drug–drug interactions (AUC of 81.5%), illustrating that clinical notes can be a source for detecting drug safety signals at scale.15 The patient–feature matrix can also be used to learn off-label usage30 and to discern drug adverse events from indications.31 Using the textual contents of the EHR complements efforts using billing and claims data or spontaneous reports4,8,14,32,33 and opens up new opportunities for leveraging observational data.