Our findings demonstrate that the proposed system is feasible for pharmacovigilance. We identified known drug-ADE associations with a recall of 75% and a precision of 31%. More importantly, our historical rollback experiments indicated that the system can potentially detect new ADEs prospectively.
The current study incorporates several important features. First, while other studies have focused on structured and coded data, this work took a completely different route, using narrative data as the starting point for pharmacovigilance. The application of NLP unlocks rich information occurring in narrative reports. Data mining algorithms that use only structured data miss clinical information that is relevant for pharmacovigilance. Some important ADEs, such as “fever” and “feeling suicidal”, are generally available only in the narrative EHR reports. Some events, such as nose bleeding, may occasionally be available as structured data (ICD9 code: 784.7), but documentation for this symptom may be found more frequently in clinical notes. The ability of NLP systems to extract a broad variety of events from clinical reports provides valuable access to clinical information that would not be available otherwise. The work could be extended to combine structured with unstructured data. For example, structured data consisting of abnormal laboratory results and pharmacy orders would provide complementary information for detection of ADEs. Second, other studies analyzed associations in their databases using DPA, regression, or other methods, whereas this study focused on association statistics adjusted with volume tests, which have been shown to provide more clinically meaningful cutoffs for clinical associations. 48,51
Third, while current pharmacovigilance practice focuses on retrospective investigations, our system highlights the potential for prospective surveillance, which detects novel ADEs automatically and actively.
It is, however, a constant challenge to accurately identify and evaluate safety signals in a timely manner for pharmacovigilance. One factor affecting precision could be that some adverse events were detected correctly but have not yet become known. Another factor is the temporality and dependencies of clinical events. To infer causal associations between drugs and potential ADEs, it is important to recognize temporal sequences between drugs and these potential ADEs. In this investigation, we tackled the problem of temporality in text using an extremely simple contextual filter based on the sections of the discharge summary where the information was found. The strategy was somewhat successful, since we were able to detect 75% of the known ADEs, whereas without that strategy we almost always detected indication associations. By contrast, the precision was only 31% because 63% of the associations (30% of the indication associations and 33% of the remote indication associations) in our qualitative analysis should have been eliminated but were not. Most of the false positives classified as indications were caused by two types of confounding information. One type was related to the diseases and indications the patient had, because their associations with the medications were therapeutic associations and not side effects (e.g., “Paroxetine-dizziness”). This showed that our strategy for handling temporal information was not effective enough. Another type of confounder was due to indirect associations, which occurred when a medication used to treat a particular disease and manifestations of that disease formed statistically significant pairs (e.g., “Paroxetine-hallucinations”).
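A section-based contextual filter of the kind described above can be sketched as follows. This is only an illustration: the section names, the rule (drugs taken from admission-side sections, events taken from later sections), and the `candidate_pairs` helper are assumptions and are not taken from the study, which reports only that the filter was based on the discharge summary sections where the information was found.

```python
# Hypothetical section names and rule, chosen only for illustration.
DRUG_SECTIONS = {"medications on admission", "history of present illness"}
EVENT_SECTIONS = {"hospital course", "discharge diagnosis"}


def candidate_pairs(mentions):
    """Return (drug, event) pairs that pass the section filter.

    mentions: iterable of (kind, term, section) tuples from one discharge
    summary, where kind is "drug" or "event".
    """
    drugs = {term for kind, term, section in mentions
             if kind == "drug" and section in DRUG_SECTIONS}
    events = {term for kind, term, section in mentions
              if kind == "event" and section in EVENT_SECTIONS}
    return sorted((d, e) for d in drugs for e in events)


# Toy summary: the event documented before admission is filtered out.
summary = [
    ("drug", "drug_a", "medications on admission"),
    ("event", "prior_condition", "history of present illness"),
    ("event", "new_symptom", "hospital course"),
]
pairs = candidate_pairs(summary)
```

Under these assumptions, an event that is documented only in an admission-side section never forms a pair, which is a coarse proxy for requiring that the drug precede the event.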
We have experimented with drugs which are used to treat the same diseases but have different safety profiles to differentiate ADEs from possible indication confounders. Our preliminary data have shown that rosiglitazone was associated with heart failure symptoms while other diabetes drugs tested (metformin, glipizide) were not. This does not, however, confirm that these heart failure symptoms are actually ADEs rather than treatment indications. Rosiglitazone may be more likely to be prescribed to severe and late stage diabetic patients, and these patients might be likely to develop comorbidities, such as heart disease, more often than patients on the other drugs. Stratification may solve the problem but it may be challenging. In addition, variable selection may have to be decided externally by an expert.
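To illustrate how stratification could separate a drug effect from confounding by disease severity, the sketch below compares a crude odds ratio with a Mantel-Haenszel pooled odds ratio. The Mantel-Haenszel estimator is a standard technique chosen here for illustration and is not the method used in this study; the counts are invented toy data, and the severity strata are an assumed variable of the kind an expert would have to select.

```python
def odds_ratio(a, b, c, d):
    """Odds ratio for a 2x2 table.

    a: exposed with event,   b: exposed without event,
    c: unexposed with event, d: unexposed without event.
    """
    return (a * d) / (b * c)


def mantel_haenszel_or(strata):
    """Mantel-Haenszel pooled odds ratio over a list of (a, b, c, d) tables."""
    num = sum(a * d / (a + b + c + d) for a, b, c, d in strata)
    den = sum(b * c / (a + b + c + d) for a, b, c, d in strata)
    return num / den


# Toy counts: (mild disease stratum, severe disease stratum).
strata = [(1, 9, 10, 90), (40, 40, 10, 10)]

# Crude estimate ignores severity by summing the tables cell by cell.
crude = odds_ratio(*(sum(cell) for cell in zip(*strata)))
adjusted = mantel_haenszel_or(strata)
```

In this toy example the crude odds ratio is well above 1 while the stratified estimate equals 1, because the drug is given mostly to severe patients, who have more events regardless of treatment.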
Inspired by work in bioinformatics on characterizing interactions between genes, we applied mutual information (MI) and its data processing inequality (DPI) property to help differentiate direct from indirect associations between clinical entities. This information-theoretic approach using MI and DPI showed some promise for reducing false positives due to indirect associations, and is the focus of another paper. 52
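The idea behind using MI with the DPI can be sketched as follows. The DPI states that if X and Z are related only through Y, then I(X;Z) cannot exceed either I(X;Y) or I(Y;Z), so the weakest edge in a fully connected triplet is a candidate indirect association. The pruning rule, the function names, and the toy indicator data below are assumptions for illustration and do not reproduce the procedure of the cited paper.

```python
import math
from collections import Counter
from itertools import combinations


def mutual_information(xs, ys):
    """Empirical mutual information (bits) of two discrete sequences."""
    n = len(xs)
    px, py, pxy = Counter(xs), Counter(ys), Counter(zip(xs, ys))
    return sum((c / n) * math.log2((c * n) / (px[x] * py[y]))
               for (x, y), c in pxy.items())


def dpi_prune(columns):
    """Drop the weakest edge of every triplet; return the surviving edges.

    columns: dict mapping an entity name to a list of 0/1 indicators,
    one per document.
    """
    names = sorted(columns)
    mi = {frozenset(pair): mutual_information(columns[pair[0]], columns[pair[1]])
          for pair in combinations(names, 2)}
    removed = set()
    for a, b, c in combinations(names, 3):
        edges = [frozenset((a, b)), frozenset((a, c)), frozenset((b, c))]
        removed.add(min(edges, key=lambda e: mi[e]))
    return {e for e in mi if e not in removed and mi[e] > 0}


# Toy data: the symptom tracks the disease, and the drug treats the disease.
columns = {
    "disease": [1, 1, 1, 1, 0, 0, 0, 0],
    "drug":    [1, 1, 1, 0, 0, 0, 0, 0],
    "symptom": [0, 1, 1, 1, 0, 0, 0, 0],
}
direct = dpi_prune(columns)
```

Here the drug-symptom pair has the lowest MI of the three and is removed, leaving only the drug-disease and disease-symptom edges. A practical version would need a tolerance for near-ties and a significance threshold on MI.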
As a further line of research, more sophisticated statistical methods that account for the structure of a large database, and that extend the proposed methodology, can be devised to differentiate between the different types of associations observed. This will also involve the use of more sophisticated temporal models and of information from other sources of clinical data, such as medication mentions in prior notes and prescription information. Additionally, the use of other sources of knowledge, such as DXPlain or QMR, will be explored. 53,54
Another challenge related to our methods concerns the granularity of the codes corresponding to diseases and symptoms. The disadvantage of using highly granular terms is the dilution of signals among medically similar ADEs. For example, there are more than 150 codes for cough in the UMLS, corresponding to highly specific terms such as “cough on exercise”, “postural cough”, “brassy cough”, and “increased frequency of cough”. In this study, the modified version of MedLEE, which excluded some of the modifiers, worked well for symptoms, but this issue needs to be explored further. A commonly used coding system in the pharmacovigilance field is the Medical Dictionary for Regulatory Activities (MedDRA), which contains five hierarchical levels. 55,56
The level of “preferred term” is often used in pharmacovigilance systems because it is considered the appropriate level of granularity for that purpose. Similarly, granularity information in other knowledge sources in the UMLS (e.g., parent/child relations in MRREL, the UMLS knowledge source specifying relationships among entities) should also be helpful in meeting the challenge. Development of statistical approaches should also be considered. For example, Berry and Berry cleverly applied Bayesian methods to “borrow strength” and enhance diluted “signals” across multiple similar ADEs. 57
In subsequent studies, we will explore using MedDRA, the UMLS, and statistical approaches to help solve the granularity problem.
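The dilution problem and one simple remedy, rolling granular terms up to a coarser parent before computing association statistics, can be sketched as follows. The `PARENT` mapping is a hand-written stand-in for a real hierarchy such as MedDRA preferred terms or UMLS parent/child relations, and the counts are invented; neither comes from the study.

```python
from collections import Counter

# Hypothetical mapping from granular terms to a coarser parent term.
PARENT = {
    "cough on exercise": "cough",
    "postural cough": "cough",
    "brassy cough": "cough",
    "increased frequency of cough": "cough",
}


def roll_up(pair_counts, parent=PARENT):
    """Aggregate (drug, event) co-occurrence counts at the parent-term level.

    Events without an entry in the mapping are kept unchanged.
    """
    rolled = Counter()
    for (drug, event), count in pair_counts.items():
        rolled[(drug, parent.get(event, event))] += count
    return rolled


# Toy counts: each granular cough term is rare on its own.
pair_counts = {
    ("drug_a", "cough on exercise"): 2,
    ("drug_a", "postural cough"): 1,
    ("drug_a", "brassy cough"): 3,
    ("drug_a", "rash"): 4,
}
rolled = roll_up(pair_counts)
```

After the roll-up, the three sparse cough pairs are counted as a single pair with six co-occurrences, which is the count an association statistic would then be computed on.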
Our study had several limitations. Some of the limitations were caused by UMLS codes that were not well defined. For example, the code “thicken” was an entity that was frequently extracted by MedLEE and subsequently statistically associated with one of the drugs. A manual review indicated that some concepts, such as “thickened sigmoid” and “thickened valve leaflet”, were encoded correctly as combinations of “thicken” with body location qualifiers, but there were no individual UMLS codes that corresponded to either of the two complete concepts. Therefore, the single and poorly defined concept “thicken” was used. Another limitation is that, for our initial study, we included only narrative reports for inpatients. As a result, our findings should be understood in the context of a sick patient population, which affected our results in several ways. First, the level of detail in the documentation may be different. For example, “temperature increased slightly” might be documented less often for outpatients than for inpatients because an increased temperature could be perceived as being more burdensome for inpatients, who are sicker. Second, inpatients may be more prone to ADEs due to their weakened condition and because they are more likely to be taking multiple medications. However, this limitation is due to the type of reports we focused on and not the methodology. In subsequent studies we will adapt our methods to a corpus consisting of multiple outpatient visits as well as hospital admissions, so that the information relating to a patient will span a longer period and we will obtain a more varied patient population that includes healthier patients. However, that will likely introduce new challenges. A further limitation of this investigation is that the reference standard was obtained using only one expert, and only seven drugs were tested. A more comprehensive evaluation will be undertaken in our future work.
Although the methods discussed in this work focus on pharmacovigilance, the same methodology could readily be broadened to include adverse events associated with vaccines, devices, and procedures. Extension to these patient safety domains would involve using MedLEE to extract these types of events rather than just drug events, and using similar statistical processes to determine signals, although the thresholds would have to be optimized. In future research, we will experiment with such an extension.
While our study demonstrates the feasibility of our method, more work is needed to establish it as a surveillance system. The success of this system relies on two fundamental components of the proposed method: (1) use of NLP to generate useful data, and (2) creation of a statistical methodology that successfully deals with the data obtained. Both components are important and complement each other. In particular, the identification of the appropriate threshold(s) is at the heart of successfully extending this feasibility study to a real-time pharmacovigilance system. What we have used in this study, based on previously published work, is an operational threshold that provided reasonable results. In our future work, we will experiment with a variety of ideas for identifying and evaluating this threshold. If we are successful, precision should improve significantly.