Evaluating medications for potential adverse events is a time-consuming process, typically involving manual lookup of information by physicians. This process can be expedited by CDS systems that support dynamic retrieval and filtering of adverse drug events (ADE’s), but such systems require a source of semantically-coded ADE data. We created a two-component system that addresses this need. First we created a natural language processing application which extracts adverse events from Structured Product Labels and generates a standardized ADE knowledge base. We then built a decision support service that consumes a Continuity of Care Document and returns a list of patient-specific ADE’s. Our database currently contains 534,125 ADE’s from 5602 product labels. An NLP evaluation of 9529 ADE’s showed recall of 93% and precision of 95%. On a trial set of 30 CCD’s, the system provided adverse event data for 88% of drugs and returned these results in an average of 620ms.
Reviewing the potential adverse events of prescribed medications is a necessary and common clinical task for physicians.1–3 Performing this review can be time consuming, however, particularly when patients are taking multiple medications.4,5 Drug information resources typically require that medications be looked up manually (e.g., via search box) and will return the results as a text document. This process carries two notable limitations. First, data for multiple medications cannot be retrieved simultaneously, even though simultaneous retrieval has been shown to improve the efficiency of ADE review.6 Second, the adverse events are not machine-interpretable and thus cannot be easily integrated into clinical decision support systems.
At present, no freely available comprehensive source of semantically coded ADE data exists. The National Drug File – Reference Terminology (NDFRT) provides information on the “physiologic effect” of medications but does not offer comprehensive adverse event data.7 The recently released SIDER project has compiled side-effects in structured form using the Coding Symbols for a Thesaurus of Adverse Reaction Terms (COSTART) terminology.8
While a useful reference, SIDER has its roots in molecular biology and is not optimized for clinical use. It contains a limited set of medications and receives only periodic updates.9 Furthermore, it does not provide services for integration of its side-effect data into external applications.
Web services have proven an effective means of delivering clinical decision support within and across institutions.10–12 Such decision support services (DSS) typically consume patient information in a structured format, perform rule-based or other logical inferences, and return patient-specific information in a format consumable by the end-user or end-application. With their component-based architecture, DSS’s can be integrated into multiple applications without limitations as to platform or programming language.
In this paper, we introduce a knowledge base and decision support service designed to address the limitations of current ADE information resources. We first describe a natural language processing tool used to create and maintain this knowledge base. We then describe the implementation and performance evaluation of a decision support service that returns patient-specific adverse drug event data upon receipt of a Continuity of Care Document.
To create the ADE knowledge base, we developed a natural language processing tool known as the Structured Product Label Information Coder and ExtractoR (SPLICER). The application’s core functions are to extract adverse reaction terms from the Structured Product Label (SPL) and to map these terms to the Medical Dictionary of Regulatory Activities (MedDRA). SPLICER is a rule-based natural language processing (NLP) system that uses a series of regular expressions and algorithms to identify and extract data from SPL’s. Details of these algorithms are beyond the scope of this paper, but the tool’s general architecture is described below.
SPLICER is a Java application consisting of three modules, each designed to perform a subset of the overall task of extracting adverse event information from the label (Figure 1).
SPLICER’s first module cleans and preprocesses the SPL before ADE extraction begins. The SPL is an Extensible Markup Language (XML)-based document and contains a number of tags and formatting characters. The Parser module uses these tags to identify product label sections such as Adverse Reactions, Precautions, and Boxed Warnings. The module then deconstructs the tables and paragraphs within each section into a series of individual sentences and words. All unnecessary XML tags and formatting characters are removed.
The second module comprises three individual “processors” which extract raw ADE information from the SPL. The first processor looks at punctuation patterns to detect sentences likely containing lists of adverse events and captures the ADE’s within these lists. The second processor identifies tables based on XML tags and retrieves the adverse event information for each row in the table. This processor captures frequency data for both drug and placebo where available. The third processor performs a final review to check for any MedDRA terms that have not been captured by the previous approaches and filters out ADEs identified as indications from the SPL’s Indications section.
Once all raw ADE terms have been extracted from the label, SPLICER’s third module attempts to map these phrases to an appropriate MedDRA concept. The mapping is performed by a series of algorithms, beginning with a search for exact matching terms and synonyms. Should these initial mapping efforts fail, SPLICER will ‘stem’ the raw ADE (e.g. ‘rashes’ becomes ‘rash’) and compare with a similarly stemmed MedDRA dictionary. Terms are also tokenized to account for variations in word order. If no match has been found, the system performs transformations on the raw ADE’s to make them more compatible with conventions used in MedDRA. These adjustments include replacing phrases (e.g. “injection site redness” to “local redness”), switching term order and tense (e.g. “swollen legs” to “leg swelling”), and unifying variant spellings (e.g. “lupus erythematoses” to “lupus erythematosus”).
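The mapping cascade described above can be illustrated with a minimal Python sketch. It is not SPLICER's implementation (SPLICER is a Java application whose algorithms are not detailed here): the three-term dictionary, the crude `naive_stem` function, and the single phrase rewrite are assumptions chosen only to show the order of the steps (exact match, then stemmed and order-independent match, then a phrase transformation followed by a retry).

```python
from typing import Optional

# Illustrative subset only; the real MedDRA dictionary is far larger.
MEDDRA_TERMS = ["Rash", "Leg swelling", "Local redness"]

# One phrase replacement taken from the example in the text.
PHRASE_REWRITES = {"injection site redness": "local redness"}


def normalize(term: str) -> str:
    """Lower-case the term and collapse whitespace."""
    return " ".join(term.lower().split())


def naive_stem(word: str) -> str:
    """Crude plural stripping; a stand-in for a real stemmer (assumption)."""
    for suffix in ("es", "s"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def stem_key(term: str) -> str:
    """Stem every token and sort the tokens so word order is ignored."""
    return " ".join(sorted(naive_stem(tok) for tok in normalize(term).split()))


# Both lookup tables are built from the same dictionary, so the dictionary
# is stemmed in exactly the same way as the raw terms.
EXACT = {normalize(t): t for t in MEDDRA_TERMS}
STEMMED = {stem_key(t): t for t in MEDDRA_TERMS}


def map_to_meddra(raw: str) -> Optional[str]:
    """Return the matching dictionary term, or None if every step fails."""
    candidates = [raw]
    rewritten = PHRASE_REWRITES.get(normalize(raw))
    if rewritten:
        candidates.append(rewritten)  # tried only after the raw term fails
    for candidate in candidates:
        hit = EXACT.get(normalize(candidate)) or STEMMED.get(stem_key(candidate))
        if hit:
            return hit
    return None


print(map_to_meddra("rashes"))                  # stemmed match
print(map_to_meddra("swelling legs"))           # stemmed, order-independent match
print(map_to_meddra("injection site redness"))  # match after phrase rewrite
```

Because the stemmer only strips plural endings, this sketch would not resolve tense changes such as "swollen legs"; those would need explicit transformation rules of the kind the text mentions.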
Once all adverse events have been extracted and mapped to MedDRA lower-level terms, we invoke the MedDRA hierarchy to aggregate these concepts under preferred terms (e.g., “hyponatremia” and “sodium depletion” are both mapped to “hyponatraemia”). Use of preferred terms allows similar reactions under different names to be described consistently throughout the knowledge base. Additionally, we utilize the Unified Medical Language System (UMLS) Metathesaurus to generate mappings between extracted MedDRA terms and Systematized Nomenclature of Medicine—Clinical Terms (SNOMED-CT) concepts.
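As a rough illustration of this aggregation step, the Python sketch below collapses lower-level terms into preferred terms per drug. The `LLT_TO_PT` table contains only the example from the text, and the drug names are placeholders; the real hierarchy would come from the MedDRA distribution files.

```python
from collections import defaultdict

# Lower-level term -> preferred term; only the example pair from the text.
LLT_TO_PT = {
    "hyponatremia": "hyponatraemia",
    "sodium depletion": "hyponatraemia",
}


def aggregate_by_preferred_term(extracted):
    """Collapse (drug, lower-level term) pairs into a set of preferred terms per drug."""
    by_drug = defaultdict(set)
    for drug, llt in extracted:
        # Terms missing from the table are kept under their own name.
        by_drug[drug].add(LLT_TO_PT.get(llt, llt))
    return dict(by_drug)


# Placeholder drugs; two different lower-level terms describe the same reaction.
pairs = [
    ("drug A", "hyponatremia"),
    ("drug A", "sodium depletion"),
    ("drug B", "sodium depletion"),
]
print(aggregate_by_preferred_term(pairs))
```

After aggregation both placeholder drugs carry the same single preferred term, which is what makes comparison across drugs straightforward.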
To build the ADE knowledge base, we processed all available Structured Product Labels (n=5782) on DailyMed13 as of 12/17/2009. For evaluation of NLP performance, a random set of 100 labels was selected and sent for manual review by a third party. After comparison with the original SPL, the reviewer coded ADE’s as either true positives, false positives, or false negatives. System recall, precision, and F-measure were then calculated.
Structured Product Labels were mapped to RxNorm Concept Unique Identifiers (RxCUI’s) in order to facilitate matching medication lists to the extracted adverse event data. This mapping process relied on RxNorm14 and was done in two ways. First, we matched the SPL’s <setId root> to the SPL_SET_ID attribute found in RxNorm. Secondly, we mapped the SPL’s <containerPackagedProduct> code to RxNorm’s NDC codes. The resultant RxCUI’s from these mappings were then stored in a MySQL database for integration with the DSS.
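The two mapping routes can be sketched in Python as a union of lookups. The tables below are hypothetical stand-ins for data that would be loaded from RxNorm, and every identifier in them is a placeholder rather than a real set ID, NDC, or RxCUI.

```python
# Hypothetical lookup tables; in practice these would be loaded from RxNorm.
SETID_TO_RXCUI = {"setid-0001": {"111111"}}
NDC_TO_RXCUI = {"00000-0000-01": {"111111", "222222"}}


def rxcuis_for_label(set_id, ndc_codes):
    """Collect the RxCUIs reachable from a label by set ID or by any packaged-product NDC."""
    found = set(SETID_TO_RXCUI.get(set_id, ()))
    for ndc in ndc_codes:
        found |= NDC_TO_RXCUI.get(ndc, set())
    return found


print(sorted(rxcuis_for_label("setid-0001", ["00000-0000-01"])))
print(sorted(rxcuis_for_label("unknown-set", [])))  # no route succeeds
```

Taking the union of both routes is one possible design; a label is left unlinked only when neither its set ID nor any of its NDCs is known.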
The goal of our decision support service, known as ADESSA, was to allow rapid access to comprehensive adverse event data in a standards-based fashion. As such, we selected as our primary input the Continuity of Care Document (CCD), a patient summary specification based on the Health Level 7 (HL7) Clinical Document Architecture. CCD’s play an important role in health information exchange and are a common means of data transmission at our institution. Additionally, an increasing number of vendors, prompted by certification requirements, are supporting CCD generation and export from their medical record systems.
Despite the growing role of CCD’s, however, the documents are not yet ubiquitous. Furthermore, ADESSA requires a semantically coded (‘Level 3’15) CCD for processing, and many systems are not yet able to produce this more complex CCD format. Thus to support these clients, ADESSA also accepts medication lists as a simple string of RxCUI’s. This provision will allow even extremely lightweight applications to query ADESSA’s knowledge base.
In terms of the response returned by our system, we designed the data outputs to fit two potential roles in clinical decision support. The first role is as a simple conduit to display a list of side-effects to the end user. In this context, the MedDRA terminology itself, with its natural phrasing of adverse reactions, is the optimal output. The second role is that of a more integrated resource, connecting patient-level clinical data to adverse drug events. For this context, SNOMED-CT, as a standard terminology used in clinical settings, is preferable. Thus, we created a parameter to select either MedDRA or SNOMED-CT as the output vocabulary.
The DSS was built using the Simple Object Access Protocol (SOAP) specification, the protocol recommended for decision support services by the HL7 Healthcare Services Specification Project.16 The service was implemented in PHP with MySQL and runs on an Apache HTTP server. This initial implementation was designed for use only within our institution’s firewall and does not yet accept requests from outside clients.
CCD’s submitted to the server are traversed via XML Path Language, and drug RxCUI’s are extracted from the CCD’s Medications section (example shown in Figure 2). Alternatively, SOAP requests containing only RxCUI’s are parsed to their constituent components. In both cases, the final set of RxCUI’s is passed to the ADE knowledge base, where the adverse event list is generated based on the requested output format (MedDRA or SNOMED-CT).
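The extraction step can be sketched as follows. The production service is written in PHP; this Python version uses the standard library's limited XPath support and is only an approximation. It assumes that medications appear as `manufacturedMaterial/code` elements in the HL7 v3 namespace and that RxNorm codes carry the code system OID 2.16.840.1.113883.6.88. The sample document is a heavily trimmed, hypothetical fragment with placeholder codes, and the search covers the whole document rather than only the Medications section.

```python
import xml.etree.ElementTree as ET

HL7_NS = {"hl7": "urn:hl7-org:v3"}
RXNORM_OID = "2.16.840.1.113883.6.88"  # code system OID for RxNorm

# Hypothetical, heavily trimmed CCD fragment; codes are placeholders.
SAMPLE_CCD = """
<ClinicalDocument xmlns="urn:hl7-org:v3">
  <component><structuredBody><component><section>
    <entry><substanceAdministration><consumable><manufacturedProduct>
      <manufacturedMaterial>
        <code code="111111" codeSystem="2.16.840.1.113883.6.88" displayName="Example drug A"/>
      </manufacturedMaterial>
    </manufacturedProduct></consumable></substanceAdministration></entry>
    <entry><substanceAdministration><consumable><manufacturedProduct>
      <manufacturedMaterial>
        <code code="222222" codeSystem="2.16.840.1.113883.6.88" displayName="Example drug B"/>
      </manufacturedMaterial>
    </manufacturedProduct></consumable></substanceAdministration></entry>
    <entry><substanceAdministration><consumable><manufacturedProduct>
      <manufacturedMaterial>
        <!-- NDC-coded entry (OID 2.16.840.1.113883.6.69): no RxCUI to extract -->
        <code code="00000-0000-01" codeSystem="2.16.840.1.113883.6.69" displayName="Example drug C"/>
      </manufacturedMaterial>
    </manufacturedProduct></consumable></substanceAdministration></entry>
  </section></component></structuredBody></component>
</ClinicalDocument>
"""


def extract_rxcuis(ccd_xml: str) -> list[str]:
    """Return the distinct RxNorm codes found on medication entries, in document order."""
    root = ET.fromstring(ccd_xml.strip())
    xpath = f".//hl7:manufacturedMaterial/hl7:code[@codeSystem='{RXNORM_OID}']"
    rxcuis: list[str] = []
    for code in root.iterfind(xpath, HL7_NS):
        value = code.get("code")
        if value and value not in rxcuis:
            rxcuis.append(value)
    return rxcuis


print(extract_rxcuis(SAMPLE_CCD))
```

The third entry is skipped because it carries only an NDC, so a drug coded that way would never reach the knowledge base lookup in this sketch.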
Example output is shown in Figure 3. Each medication is returned with an RxNorm identifier and a display name, followed by a list of adverse events annotated with a code and code system identifier.
The DSS was tested using a sample set of 30 randomly selected anonymized CCD’s from our institution. We measured time to deliver results from submission of the document until the web service response was displayed. Time measurements were performed via the Firebug browser plug-in.17 The resultant adverse event list for each CCD was then compared with the original document to determine the accuracy of drug capture.
SPLICER successfully processed 5602 Structured Product Labels representing 1706 distinct ingredients. SPLICER was unable to process 180 SPL’s due to XML formatting variations. Average processing time was under 1 minute per label without use of parallel computing. In total 534,125 adverse events were extracted, comprising 8709 MedDRA lower level terms. These terms were aggregated into 3667 distinct MedDRA preferred terms.
The NLP evaluation set consisted of 100 labels from which SPLICER extracted and mapped 9529 ADE’s. Of these, 9064 were true positives (identified by the third party reviewer as present in the label) and 465 were false positives (either not found in the label or not a true ADE). The reviewer found an additional 706 adverse events that were present in the label but not found by SPLICER. Based on these results, SPLICER demonstrated a recall of 92.8% and a precision of 95.1%. F-measure was 0.94.
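These figures follow directly from the reviewer's counts. The short Python check below uses the standard definitions of precision, recall, and the balanced F-measure; the function name is ours.

```python
def precision_recall_f(tp: int, fp: int, fn: int) -> tuple[float, float, float]:
    """Standard precision, recall, and balanced F-measure from raw counts."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f_measure = 2 * precision * recall / (precision + recall)
    return precision, recall, f_measure


# Counts reported above: 9064 true positives, 465 false positives, 706 false negatives.
precision, recall, f_measure = precision_recall_f(tp=9064, fp=465, fn=706)
print(f"precision={precision:.3f} recall={recall:.3f} F={f_measure:.2f}")
# prints: precision=0.951 recall=0.928 F=0.94
```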
The most common causes of false negatives were: mapping error (raw term extracted correctly but unsuccessfully mapped to MedDRA), incorrect filtering out of true ADEs as indications, and missed directionality of laboratory abnormalities (e.g., “decreased potassium” reported only as “potassium”). The most common causes of false positives were: mapping error (assignment of a MedDRA concept beyond the meaning of the original ADE), identification of a patient descriptor as an ADE (e.g. “patients with diabetes” captured as “diabetes”), and homonyms (e.g., “fall in hemoglobin” captured as “falls”).
Performance of the decision support service showed a mean response time of 620 ms from CCD submission to results display. The number of drugs per CCD ranged from 4 to 37 with a mean of 12.2. In total, 181 distinct RxCUI’s were represented. The service captured 157 (88%) of the RxCUI’s correctly and missed 24 (12%). Review of the original CCD’s showed two causes for failed capture. First, 7 CCD’s contained only National Drug Codes (NDC’s) for medications and did not contain any RxCUI’s. The remaining failures were due to unsuccessful mapping of the extracted RxCUI to a Structured Product Label. In all cases, the SPL for the referenced drug was actually available in the knowledge base but could not be correctly linked to the RxCUI.
In this study, we demonstrated a successful method of extracting and delivering semantically coded adverse drug event data for use in clinical decision support. The developed service retrieves information rapidly and supports queries on multiple drugs simultaneously. Furthermore, it leverages the Continuity of Care Document, an existing standard for communication of patient summary data, thus supporting potential cross-institutional use. Once such functionality is implemented, ADESSA will serve as a resource to the broader informatics community by facilitating creation of ADE-related decision support tools.
Our system has several additional advantages. By utilizing a natural language processing solution, we are able to scale and maintain the knowledge base in a manner not possible with manual curation. At under a minute per label (less with parallel processing), SPLICER’s speed supports real-time updates of new or changed labels as they are announced by the FDA. Furthermore, the structured format of our data allows querying for patterns in adverse events along multiple dimensions such as medication class or type of reaction. A forthcoming study will explore these relationships further.
The decision support service offers simple integration of patient-specific adverse event data into clinical decision support. This integration may be manifested in several ways. At the most basic level, a CDSS may display a list of ADE’s associated with a patient’s medications. While not substantively different from manually reviewing a drug label, utilizing ADESSA allows these data to be retrieved automatically without the need for manual physician input. A more sophisticated application would allow physicians to perform dynamic filtering of ADE’s, looking for specific reactions or categories of interest. By aggregating events under MedDRA preferred terms and thus using consistent terminology, our system presents ADE’s in a manner efficient for comparison across drugs. Such aggregation also supports visualization of complex adverse event profiles as we have demonstrated in our previous research.6
Another use of ADESSA’s semantic ADE data is integration with patient-specific clinical information. By providing a SNOMED-CT code for each adverse event, ADESSA facilitates linking these conditions to the patient record. Should a patient develop hyponatremia for example, a CDS system using our service could automatically highlight those medications known to cause this abnormality. Similarly, if a patient with a prior seizure history is placed on a medication that has the potential to lower seizure threshold, the system could alert the physician to this increased risk.
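A CDS client could implement the hyponatremia scenario with a simple reverse lookup, sketched here in Python. The in-memory `KNOWLEDGE_BASE`, the RxCUI values, and the event codes are all placeholders; a real client would obtain SNOMED-CT codes from the service response rather than from a hard-coded table.

```python
# Hypothetical knowledge base: RxCUI -> adverse event codes.
# All identifiers are placeholders, not real RxCUIs or SNOMED-CT codes.
KNOWLEDGE_BASE = {
    "111111": {"SCT-HYPONATREMIA", "SCT-DIZZINESS"},
    "222222": {"SCT-RASH"},
    "333333": {"SCT-HYPONATREMIA"},
}


def drugs_associated_with(event_code, patient_rxcuis):
    """Return the patient's medications whose known adverse events include event_code."""
    return [
        rxcui
        for rxcui in patient_rxcuis
        if event_code in KNOWLEDGE_BASE.get(rxcui, set())
    ]


# A patient on three drugs develops hyponatremia; one drug is not in the knowledge base.
patient_meds = ["111111", "222222", "444444"]
print(drugs_associated_with("SCT-HYPONATREMIA", patient_meds))
```

Medications missing from the knowledge base are silently skipped here; a clinical application would likely want to tell the user that no data were available for them.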
We encountered several challenges in the development of this system that should inform future work. First, while our natural language processing application SPLICER demonstrated an overall high rate of precision and recall, given the database’s size, many errors will still exist. Thus, SPLICER’s results should be used with some caution. Additionally, we encountered 180 labels that could not be correctly processed. Manual review of these labels’ headings showed that they were largely generic duplicates of other medications already in the database. Nevertheless, we will be making adjustments to SPLICER to accommodate these SPL variants.
The decision support service is not yet fully optimized. Our study revealed that while the knowledge base was comprehensive, failures still occurred in linking medications in the patient record to their respective product labels. We relied on RxCUI’s as the mapping point between the Continuity of Care Document and our database, but encountered a number of mapping errors. Some of these errors can be traced back to the CCD-generator at our institution, which itself utilizes a map to transform NDC’s to RxCUI’s. In some cases, this generator may be unable to map a particular NDC or may apply an older RxCUI. Thus, the multiple rounds of mapping may have resulted in some information loss. An alternative strategy would be to use NDC’s rather than RxCUI’s as the primary link between the CCD and our knowledge base. This change would remove one layer of translation and hopefully improve medication capture rates.
The other limitation of our DSS is that its security infrastructure does not yet permit consumption of CCD’s from outside institutions. Other ongoing projects in cross-institutional decision support, such as the Clinical Decision Support Consortium18, utilize secure tunneling to pass patient data. While such solutions are not yet available for ADESSA, an interim approach would be to accept only RxCUI strings from outside institutions while maintaining capabilities for CCD consumption internally.
Adverse drug event data can be efficiently extracted from product labels and distributed in a semantically coded form. We have developed a service to provide these data in a real-time patient-specific manner. Future work will focus on improving the extraction process, optimizing medication mapping, and evaluating the use of this service in a clinical decision support system. Additionally, we intend to make ADESSA publicly available for cross-institutional use.
This work was performed at the Regenstrief Institute and supported by grant 5T 15 LM007117 from the National Library of Medicine.