We describe a novel experiment that we conducted with the Drug Interaction Knowledge-base (DIKB) to determine which combinations of evidence enable a rule-based theory of metabolic drug-drug interactions to make an optimal set of predictions. The focus of the experiment was a group of 16 drugs including six members of the HMG-CoA-reductase inhibitor family (statins). The experiment helped identify evidence-use strategies that enabled the DIKB to predict significantly more interactions present in a validation set than the most rigorous strategy developed by drug experts with no loss of accuracy. The best-performing strategies included evidence types that would normally be of lesser predictive value but that are often more accessible than more rigorous types. Our experimental methods represent a new approach to leveraging the available scientific evidence within a domain where important evidence is often missing or of questionable value for supporting important assertions.
Our research focuses on how to best utilize drug-mechanism knowledge to help drug-interaction knowledge-bases expand their coverage beyond what has been tested in clinical trials while avoiding prediction errors that occur when individual drug differences are not recognized. Drug mechanism knowledge presents difficult informatics challenges because it is often dynamic, frequently uncertain, and sometimes missing. A previous publication by our group describes a novel evidential knowledge-representation approach that we designed to address these issues. We have implemented the approach in a system called the Drug Interaction Knowledge Base (DIKB) and have used the system to represent drug-mechanism evidence for 16 drugs including six members of the HMG-CoA-reductase inhibitor family (statins).
An intriguing feature of the DIKB’s knowledge-representation approach is that it enables exploration of the empirical prediction accuracy of a rule-based theory using many sub-sets of a given body of evidence. We tested the usefulness of this feature for integrating information from basic science, clinical research, and authoritative statements for the purpose of making drug-drug interaction (DDI) predictions. This paper briefly reviews the knowledge-representation approach implemented in the DIKB, details the methods and results of our novel experiment, and then discusses its implications for informatics and drug safety.
The DIKB is a knowledge representation system designed to predict DDIs using drug mechanisms. A key component of the system is a rule-based theory of how drugs interact by metabolic inhibition. The system’s knowledge-base contains assertions about specific entities such as drugs, drug metabolites, and enzymes whose relationships with each other are more generally modeled in the rule-based theory. DIKB maintainers place evidence for, or against, each assertion in the system’s knowledge-base in an evidence-base that is kept current by an editorial board. Before each evidence item is entered into the system, maintainers identify its type from a biomedical evidence taxonomy designed to have sufficient coverage of the kinds of evidence relevant for supporting or refuting drug mechanism assertions. They then confirm that the evidence item meets explicitly defined inclusion criteria that apply to all evidence of its type.
The DIKB distinguishes between assertion instances and assertion types. An assertion instance is a clear statement of some property or relationship belonging to one or more entities while an assertion type is the general form of all such instances. For example, the generic (X substrate-of Y) is an assertion type describing a general relationship between an abstract drug and an abstract enzyme. Instances of this assertion type might include assertions such as (carbamazepine substrate-of CYP3A4) and (s-warfarin substrate-of CYP2C9). Expert users map their confidence in drug-mechanism assertions by first defining combinations of evidence types from the system’s evidence taxonomy that might support or refute instances of each assertion type. They then rank the evidence-type combinations by the relative amount of confidence that they would have in an assertion instance of the given assertion type if it were supported by the types of evidence present in the definition. We call such rank-ordered combinations of evidence types levels-of-evidence (LOEs).
For each assertion type in the system, expert users define two, possibly identical, sets of LOEs: one for the types of evidence that can support an assertion type, the other for the types of evidence that refute it. They then select one LOE from each set of LOEs as a belief criterion. Evidence against an assertion dominates evidence for it; hence, a query of the DIKB’s knowledge-base for valid drug-mechanism assertions will return only those assertions whose body of supporting evidence satisfies the belief criterion for supporting evidence and whose body of refuting evidence does not satisfy the belief criterion for refuting evidence. In this way, the system makes DDI predictions using only those assertions considered current by the system’s maintainers and believable by users.
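The query rule described above can be illustrated with a minimal Python sketch. It is not the DIKB's implementation. It assumes, for illustration only, that each LOE is reduced to an integer rank (1 is the highest-ranked level) and that a body of evidence is summarized by the best rank it reaches; the function names `satisfies` and `is_believed` are hypothetical.

```python
from typing import Optional


def satisfies(best_rank: Optional[int], criterion_rank: int) -> bool:
    """Return True if a body of evidence reaches the criterion LOE or better.

    Assumption of this sketch: ranks are integers where 1 is the
    highest-ranked LOE, and None means no evidence was recorded.
    """
    return best_rank is not None and best_rank <= criterion_rank


def is_believed(
    rank_for: Optional[int],
    rank_against: Optional[int],
    criterion_for: int,
    criterion_against: int,
) -> bool:
    """Decide whether an assertion would be returned by a query.

    Evidence against dominates evidence for: if the refuting evidence
    satisfies its belief criterion, the assertion is not believed,
    whatever the supporting evidence looks like.
    """
    if satisfies(rank_against, criterion_against):
        return False
    return satisfies(rank_for, criterion_for)
```

For example, `is_believed(2, None, 1, 1)` is false under the most stringent criterion but `is_believed(2, None, 2, 1)` is true once the supporting criterion is relaxed by one level.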
When drug experts define and rank LOEs or choose belief criteria, they are making subjective judgements about the inferential force of an abstract body of evidence. An important question is whether the experts’ choices have any relationship to empirical measures of the system’s performance. Figure 1 defines four metrics relevant to evaluating the DIKB — sensitivity, specificity, coverage, and accuracy. Our hypothesis was that the system would have poorer coverage but greater accuracy when using belief criteria that inspire complete confidence in a drug expert than when using criteria that the expert believes to be less trustworthy. We based this hypothesis on our intuition that the evidence types that experts consider less trustworthy are more readily available in the literature, but often contain less generalizable scientific findings, than types that the same experts view as more trustworthy.
We conducted an experiment to characterize the effect of varying belief criteria on the system’s accuracy and coverage of DDIs present in a reference set of interactions and non-interactions. The experiment was designed to answer the following research questions:
We collected sufficient evidence on the pharmacokinetic drug properties of 16 drugs and 19 drug metabolites to perform this experiment. Figure 2 lists the specific drugs and drug metabolites that we chose to represent in the DIKB. The methods used to collect, classify, and enter evidence into the DIKB are described in part I of this two-part series.
Once the evidence-base was complete except for minor revisions, we attempted to identify all known pairwise metabolic inhibition interactions and non-interactions between the 35 pharmaceutical entities in the DIKB’s evidence-base. Our intent was to use the interactions and non-interactions that we found as a validation set for determining the accuracy and coverage of the DIKB’s DDI predictions. The DIKB predicts interactions using knowledge of drug mechanisms and a rule-based theory of how interactions occur by metabolic inhibition. The method that we used to confirm interactions and non-interactions approximated an independent reference standard because it relied on quantitative observations and did not consider underlying mechanisms.
We considered a metabolic inhibition interaction to be independently confirmed if any one of the following criteria was satisfied:
We considered a metabolic inhibition non-interaction to be independently confirmed if all the following criteria were satisfied:
If neither a metabolic inhibition interaction nor a non-interaction could be confirmed for a pair of pharmaceutical entities in the DIKB, then we labeled the pair as having no known interaction or non-interaction.
We defined a statistically significant increase in AUC to be:
Where AUC is the baseline AUC for a DDI study’s object drug or drug metabolite and AUCi is the AUC for the object drug in the presence of the study’s precipitant drug or drug metabolite. Studies often do not provide p-values; in such cases we judged an AUC increase statistically significant if the study provided 95% confidence intervals for the AUC ratio (AUCi/AUC) that did not include 1.0. If the study’s results did not satisfy Equation 1, and/or the 95% confidence intervals for the AUC ratio (if available) included 1.0, then we defined the metabolic inhibition interaction to not be statistically significant.
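The confidence-interval fallback described above can be sketched as a small Python check. This is an illustration only: the function name is hypothetical, and the sketch adds the assumption that an interval lying wholly below 1.0 indicates a decrease and therefore is not counted as a significant increase.

```python
def auc_increase_significant(ci_lower: float, ci_upper: float) -> bool:
    """Judge an AUC increase from a 95% confidence interval for AUCi/AUC.

    The increase is treated as statistically significant when the interval
    excludes 1.0 on the high side, i.e. its lower bound is above 1.0.
    Treating intervals wholly below 1.0 as "not an increase" is an
    assumption of this sketch.
    """
    if ci_lower > ci_upper:
        raise ValueError("ci_lower must not exceed ci_upper")
    return ci_lower > 1.0
```

An interval of (1.2, 1.8) would count as a significant increase, while (0.9, 1.4) would not because it includes 1.0.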
We sought evidence for confirming interactions and non-interactions from three sources of pharmacokinetic data: published research articles, drug product labeling, and published observation-based case reports. If the data came from a research article, then the study must have satisfied the definition and inclusion criteria of the evidence type A DDI clinical trial or any of its sub-types in the DIKB evidence taxonomy (Appendices A and B, supplementary material). If we found the data in drug product labeling, then it must have met the inclusion criteria for the DIKB evidence type A non-traceable drug-label statement (Appendix B, supplementary material). Finally, case reports needed to meet the inclusion criteria for evidence of their type listed in Appendix B (supplementary material) and provide quantitative measurements of the systemic concentration of the purported object drug before and after administration of the purported precipitant drug.
We designed criteria for confirming or refuting a metabolic-inhibition DDI before we began collecting evidence. Once the DIKB’s evidence-base was complete, we began building the validation set. We started by enumerating all ordered pairwise combinations of the 35 drugs and drug metabolites in the DIKB’s final evidence-base. We used a single label in the validation set to represent both possible ways that an interaction might occur between a drug or drug metabolite pair. For example, assuming two drugs X and Y, we used a single label X - Y to represent two possible interactions: that X effects a change in the systemic concentration of Y, and vice versa. This resulted in a total of 595 labels (Equation 2) excluding same-compound combinations (e.g. simvastatin-simvastatin). Appendix C (supplementary material) lists 595 labels representing all 1190 potential pairwise interactions and non-interactions between the drugs and drug metabolites in the DIKB.
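The label count can be checked with a short calculation. This is a reconstruction from the figures given in the text (35 entities, 595 labels, 1190 ordered pairs) and is not necessarily the form in which Equation 2 was originally written.

```latex
\[
\binom{35}{2} = \frac{35 \times 34}{2} = 595,
\qquad
2 \times 595 = 35 \times 34 = 1190 .
\]
```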
Throughout the evidence collection process we often found clinical trials providing data that were relevant to both establishing the validity of an assertion about some metabolic property and confirming a metabolic interaction or non-interaction. To avoid biasing the validation set, we first entered evidence items that could be applied to both the DIKB’s evidence-base and the validation set in both places. Then, before assessing the system’s accuracy, we identified any evidence item that supported a claim made by the validation set and was also used to support a DIKB assertion that could lead to the same conclusion. We dropped from further analysis the interaction or non-interaction that such evidence supported. In total, seven pairs were dropped from further analysis for this reason. These seven drug/drug or drug/drug-metabolite pairs are shown in Table 1 (supplementary material) along with two other pairs that were accidentally excluded from the experiment described in this paper due to a transcription error. Excluding these nine pairs brought the total number of drug/drug and drug/drug-metabolite pairs used for characterizing the DIKB’s accuracy down to 586.
We searched the primary literature for clinical trials involving any of the 586 drug/drug and drug/metabolite pairs in Appendix C (supplementary material) by querying PubMed and the proprietary University of Washington Metabolism and Transport Drug Interaction Database. After completing an intensive search for all relevant clinical trials, we conducted a search in drug product labeling for statements mentioning a pharmacokinetic interaction or non-interaction involving any of the 586 pairs. We searched all labels written for each drug product whose only active pharmaceutical ingredient was a drug in the DIKB. The number of qualifying product labels for each drug ranged from one (atorvastatin, fluvastatin, and rosuvastatin) to 18 (diltiazem), but a large proportion of the statements in one product label were repeated in all of the other available labels. All identified statements were noted and then filtered so that only statements providing quantitative data were used to support validation-set interactions. All searches were done using product labeling in the NLM’s DailyMed database.
In total, we found 65 drug-product labeling statements that mentioned a pharmacokinetic interaction or non-interaction between the members of some drug or drug-metabolite pair included in our study. A small sample of these 65 statements is shown in Table 2 (supplementary material). Only 21 statements (31%) reported the quantitative results of a pharmacokinetic clinical trial. We approved these 21 for use in the validation set; the remaining 44 statements were retained so that drug-product labeling, clinical trial literature, and case reports could be compared for their agreement on validation set interactions and non-interactions at a later time.
Having completed searches within clinical trial literature and drug-product labeling, we then did an intensive search of the University of Washington Metabolism and Transport Drug Interaction Database and PubMed for published case reports claiming the occurrence of a DDI between any pair of the active ingredients or metabolites in our study. This search resulted in 35 relevant case reports. We evaluated the full-text article for each of the 35 reports to see if it met the pre-defined inclusion criteria. Unfortunately, none of the 35 reports was accepted for use in the validation set. We rejected most case reports because they did not provide quantitative measurements of the systemic concentration of the purported object drug before and after administration of the purported precipitant drug. Three case reports provided adequate measurements of systemic concentration but failed to meet inclusion criteria because they did not receive a causation rating of at least “probable” when co-investigator JH assessed them using the Drug Interaction Probability Scale. Table 3 (supplementary material) cites the reports and provides an explanation for their low causation score.
The interactions and non-interactions in the final validation set are shown in Tables 4 and 5. The validation set claims that some DDI will occur by metabolic inhibition for 41 drug/drug and drug/drug-metabolite pairs and that no DDI will occur by metabolic inhibition for seven pairs. No interaction or non-interaction could be identified for 538 pairs in the validation set (Appendix C, supplementary material). It is important to stress that many of these pairs might have clinically-relevant DDIs that were missed by our evidence collection process or that have not been reported in the sources we searched.
Once work on the evidence-base and validation set was complete, the two drug experts in our group (CC and JH) then defined one or more LOEs for each assertion type. This was a two-step process; co-investigator RB (an informaticist) first identified all evidence types from the DIKB’s evidence taxonomy (Appendix A, supplementary material) that were applicable to each assertion type. He then helped the two drug experts decide the degree of certainty they would have in an assertion instance whose evidence support consisted of the evidence items represented by each evidence type or combination of evidence types.
Table 6 shows the set of labels, or ranking categories, that the drug experts used to define LOEs. Some labels in Table 6 represent multiple evidence types that the drug experts believed would confer roughly the same degree of justification for drug mechanism assertions. The experts felt that several evidence types would not be useful for supporting or refuting particular assertion types. These evidence types were grouped into a special “not applicable” ranking category and inserted as the lowest-ranking level-of-evidence for the assertion type’s LOE.
While LOEs were designed separately for each assertion type, many of the resulting LOEs were alike. As a result, only 15 LOEs were defined for the 22 assertion types in the DIKB’s rule-based theory of metabolic drug-drug interactions. Table 7 enumerates the assertion types that share the same LOEs. The symbols for each LOE in Table 7 are used in Table 8 to show a concise overview of the ranking categories used for all LOEs. An explicit definition of each LOE is given in Table 9 (supplementary material).
Once they had defined a complete set of LOEs, the drug experts selected two LOEs for each assertion type as its belief criteria; one for evidence supporting an assertion and the other for evidence refuting an assertion (see Table 10). The experts’ belief criteria strategy used the top level-of-evidence from every LOE definition and, therefore, was the most stringent one tested.
We expanded the DIKB so that it could make interaction and non-interaction predictions using every belief criteria strategy possible with the expert-defined LOEs. We were now able to characterize the effect of varying belief criteria on the system’s accuracy and coverage.
A default assumption is a special kind of assertion that is considered justified with no evidence support and that can be asserted or retracted either manually by curators or automatically by the system as it proceeds with inference. Initially, we were going to generate all belief criteria strategies by varying the LOEs chosen as belief criteria for every assertion type that was not labeled as a default assumption. We excluded default assumptions because varying their belief criteria has no effect on the DIKB’s predictions. Table 11 lists the set of assertion types not labeled as default assumptions along with the number of LOEs that the evidence-board defined for each of them. As Equation 3 shows, the total number of belief criteria strategies that the DIKB would have generated for these assertion types is 576,000.
However, inspection of the DIKB’s evidence-base revealed that there were six assertion types for which all of the evidence items, for or against, belonged to the highest-ranked LOE for the type. This meant that varying the LOE chosen as a belief criterion for any of these types (see Table 11) would have no effect on the DIKB’s prediction performance. Excluding these six assertion types and the eight assertion types labeled as default assumptions meant that there were eight assertion types for which varying the LOE chosen as a belief criterion would have an effect on prediction performance. The DIKB generated all 36,000 different belief criteria strategies possible for the remaining eight assertion types by altering the LOEs chosen as belief criteria. The 14 remaining assertion types used the highest-ranking LOEs defined for their types as belief criteria.
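The exhaustive generation of belief criteria strategies amounts to taking a Cartesian product over the LOEs available for each assertion type. The Python sketch below illustrates the idea on a toy example. The assertion-type names come from the text, but the LOE counts are illustrative rather than the values in Table 11, and the sketch simplifies by choosing a single LOE rank per assertion type instead of separate criteria for supporting and refuting evidence.

```python
from itertools import product
from math import prod

# Illustrative LOE counts per assertion type (NOT the values from Table 11).
loe_counts = {"substrate-of": 3, "inhibits": 2, "controls-formation": 4}


def enumerate_strategies(counts):
    """Yield every strategy as a dict: assertion type -> chosen LOE rank.

    Rank 1 is the highest-ranked LOE for a type (an assumption of this
    sketch), so the all-ones strategy is the most stringent one.
    """
    types = sorted(counts)
    rank_ranges = [range(1, counts[t] + 1) for t in types]
    for choice in product(*rank_ranges):
        yield dict(zip(types, choice))


strategies = list(enumerate_strategies(loe_counts))

# The number of strategies is the product of the per-type LOE counts.
assert len(strategies) == prod(loe_counts.values())
```

With these toy counts the product is 3 × 2 × 4 = 24 strategies; the same multiplication over the real per-type counts is what yields the totals reported in the text.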
We used the R statistical language to calculate all descriptive and performance statistics. Bruno Falissard’s psy package was used to calculate a three-valued Cohen’s kappa score as a measure of the degree over random chance to which the DIKB and validation set agreed on interactions, non-interactions, and unknowns. Both R and the Python programming language were used extensively to write various programs that aided our analysis.
We began the experiment by testing the accuracy and coverage of the DIKB using the drug experts’ belief criteria strategy (Section 2.3.3). Using this strategy, the DIKB predicted that 15 drug/drug or drug/drug-metabolite pairs would interact by metabolic inhibition and that two would not (see Table 12). Fourteen interaction predictions were present in the validation set and so were considered true positives. The remaining interaction prediction and the two non-interaction predictions were classified as “unknown” because they were neither confirmed nor refuted by the validation set.
The predicted pharmacokinetic magnitude of all 14 confirmed predictions corresponded with levels observed in clinical trial data. While the system’s predictions and magnitude estimates using the drug experts’ strategy had perfect accuracy, its coverage of known interactions was poor. Only 14 (34%) of the 41 pairs known in the validation set to interact by metabolic inhibition were predicted to interact by the DIKB. Also, the system failed to predict any of the seven pairs known in the validation set to not interact by the same mechanism. The system’s poor coverage using the experts’ belief criteria strategy was expected since it was the most stringent strategy tested.
We then analyzed the computer-generated belief criteria strategies to see what influence they had on the accuracy and coverage of the DIKB’s predictions. The DIKB failed to make any predictions using one computer-generated strategy due to an unidentified error that occurred during the experiment. The results from this strategy were not used in further analysis. We analyzed the remaining 35,999 different strategies for accuracy, coverage, and agreement with the validation set. Table 13 shows summary statistics for each performance parameter we analyzed over all prediction sets.
The DIKB’s sensitivity ranged from 0.88 to 1.0 with 19,583 (54%) of the belief criteria strategies causing the system to operate at maximum sensitivity. The system’s specificity ranged from 0.0 to 1.0 with 6,912 (19%) of the belief criteria strategies causing the system to operate at maximum specificity. The system had excellent positive predictive value (range: 0.94 to 1) across all belief criteria strategies. However, we could not characterize the system’s negative predictive value in a meaningful way because the DIKB never predicted more than two of the seven validation set non-interactions using any belief criteria strategy. We also calculated three-valued kappa scores for every prediction set using Cohen’s kappa to see how the agreement between the DIKB and the validation set compared with agreement expected by random chance. The DIKB’s predictions across all prediction sets had moderate agreement (0.4 to 0.5) with the validation set and never reached levels typically considered indicative of significant agreement (≥ 0.7).
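For readers unfamiliar with the agreement statistic, the following Python sketch computes an unweighted Cohen's kappa over any number of categories, including the three used here (interaction, non-interaction, unknown). It follows the standard textbook formula and is not the psy package code used in the study; the label strings in the usage note are placeholders.

```python
from collections import Counter


def cohens_kappa(labels_a, labels_b):
    """Unweighted Cohen's kappa for two equal-length label sequences.

    kappa = (p_o - p_e) / (1 - p_e), where p_o is the observed proportion
    of agreement and p_e is the agreement expected by chance from the
    marginal label frequencies of the two raters.
    """
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("need two non-empty sequences of equal length")
    n = len(labels_a)
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    freq_a, freq_b = Counter(labels_a), Counter(labels_b)
    categories = freq_a.keys() | freq_b.keys()
    expected = sum(freq_a[c] * freq_b[c] for c in categories) / (n * n)
    if expected == 1.0:
        raise ValueError("kappa is undefined when chance agreement is 1")
    return (observed - expected) / (1 - expected)
```

As a toy check, `cohens_kappa(["I", "I", "N", "U", "U", "U"], ["I", "U", "N", "U", "U", "U"])` has observed agreement 5/6 and chance agreement 15/36, giving a kappa of 5/7.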
It would be informative to show the trade-off in the true-positive and false-positive rate of the DIKB over all belief criteria strategies using a receiver operating characteristic (ROC) curve. However, there is no way to create an ROC curve for the DIKB’s prediction sets because LOEs are rank-ordered. Another issue is that the false-positive rate of the system cannot be calculated for the 28,511 strategies for which the system made no true-negative or false-positive predictions. Figure 3 shows a scatter-plot of the DIKB’s true-positive rate versus its false-positive rate over the 7,488 belief criteria strategies for which a false-positive rate can be calculated. The plot shows that a number of strategies cause the system to make no false-positive predictions with high, or nearly perfect, sensitivity.
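The reason some rates cannot be calculated is easiest to see in code. The sketch below uses the standard confusion-matrix definitions, which we assume match those in Figure 1, and returns `None` whenever a denominator is zero. The function names are hypothetical.

```python
def safe_ratio(numerator, denominator):
    """Return numerator / denominator, or None when the ratio is undefined."""
    return numerator / denominator if denominator else None


def performance(tp, fp, tn, fn):
    """Standard rates from confusion-matrix counts.

    Only predictions that the validation set confirms or refutes are
    counted; pairs labeled unknown contribute to none of the four counts.
    """
    return {
        "sensitivity": safe_ratio(tp, tp + fn),
        "specificity": safe_ratio(tn, tn + fp),
        "false_positive_rate": safe_ratio(fp, fp + tn),
        "ppv": safe_ratio(tp, tp + fp),
    }
```

With 34 true positives and no false positives, true negatives, or false negatives, `performance(34, 0, 0, 0)` gives a sensitivity and positive predictive value of 1.0, while specificity and the false-positive rate are `None` because no true-negative or false-positive predictions exist.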
Figure 4 shows the accuracy and coverage of the DIKB as the proportion of higher-ranking LOEs present in a belief criteria strategy increases. To create the x axis for this plot, we first assigned an integer rank to each level-of-evidence in the LOEs for the eight assertion types for which varying belief criteria had an effect on prediction performance (see Section 2.3.4). The integer ranks were assigned such that the highest-ranking level-of-evidence was assigned a one, the next a two, and so-on. Then, for all 35,999 belief criteria strategies possible with the selected LOEs, the average integer-rank of each level-of-evidence used in the strategy was divided into 1.0. The resulting metric has a range of 0.26 (least stringent) to 1.0 (most stringent) for the set of strategies used in the current study. Figure 4 indicates that the most stringent belief criteria strategy resulted in very accurate predictions but that a large number of less-stringent strategies produced the same accuracy while covering a greater proportion of validation set interactions. Figure 5 shows this finding more clearly by plotting, at four different stringency levels, the proportion of belief criteria strategies that matched the most stringent strategy’s accuracy but had better coverage.
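The stringency metric on the x axis is the reciprocal of the mean integer rank of the LOEs chosen in a strategy. A minimal Python sketch, with a hypothetical function name, is:

```python
def stringency(chosen_ranks):
    """Reciprocal of the mean integer rank of the LOEs used in a strategy.

    Rank 1 is the highest-ranking level-of-evidence, so a strategy that
    uses only top-ranked LOEs has a stringency of exactly 1.0, and the
    value falls toward 0 as lower-ranked LOEs are chosen.
    """
    ranks = list(chosen_ranks)
    if not ranks:
        raise ValueError("a strategy must choose at least one LOE")
    if any(r < 1 for r in ranks):
        raise ValueError("ranks start at 1")
    return len(ranks) / sum(ranks)  # equals 1 / (sum / len)
```

For instance, a strategy that picks the top LOE for all eight assertion types scores `stringency([1] * 8) == 1.0`, whereas picking the second-ranked LOE everywhere scores 0.5.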
A notable result of this experiment is that 8,351 (23%) of the 35,999 tested strategies caused the DIKB to have equal or better performance in terms of sensitivity, positive predictive value, and agreement with the validation set than the most stringent strategy. Table 14 shows the performance characteristics for 1,152 (3%) strategies that performed at the top level in these three performance categories. It is important to note that neither the most stringent strategy nor any of these “best-performing” strategies predicted non-interactions that were confirmed or refuted by the validation set. This meant that we could not calculate the specificity of the DIKB using either expert or computer-generated strategies.
All of the “best-performing” strategies caused the DIKB to make the same set of 65 interaction and non-interaction predictions. These strategies predicted a metabolic inhibition interaction for 34 (83%) of the 41 interacting pairs in the validation set while making no false positive and no false negative predictions. The system’s coverage of the validation set using any of these “best-performing” strategies was incomplete — it made no predictions for seven interactions and seven non-interactions listed in the validation set. As Table 15 shows, the pharmacokinetic magnitude of 30 of the 34 confirmed (88%) predictions made using the best performing belief criteria strategies matched levels observed in clinical trial data.
Using the 1,152 “best-performing” belief criteria strategies, the system also predicted 31 metabolic inhibition interactions and nine non-interactions whose validity could not be determined from the validation set. These novel interaction predictions (Table 16, supplementary material) represent potentially interacting drug combinations that our review of the literature indicates have not been studied. We searched PubMed for clinical trials involving these pairs and could find only one clinical trial that was not already included in the validation set. Unfortunately, this study did not meet the inclusion criteria for its evidence type and so could not be used as evidence for, or against, any interactions.
Fifteen of the published case reports we had collected while constructing the validation set claimed the occurrence of a DDI that matched one of the 31 novel predictions. Each report was reviewed using the Drug Interaction Probability Scale (DIPS) by clinician co-investigator JH. The DIPS defines four qualitative levels (Doubtful, Possible, Probable, and Highly Probable) representing the degree to which the information provided by the report supports the proposition that a specific drug combination effected an adverse event or events. Six novel predictions were matched with case reports that met the DIPS Probable level, meaning that the predicted interactions were the likely cause of an adverse event occurring in a patient. Seven novel predictions were matched with reports that met the DIPS Possible level, meaning that the predicted interactions could not be excluded from consideration as the cause of an adverse event in a patient.
We could not do a quantitative comparison of the system’s predictions with drug-drug interaction statements from product labeling because a significant proportion of the validation set was constructed from labeling statements. We examined the 44 statements that were not used in the validation set and found that two of the seven novel predictions that were matched with reports meeting the DIPS Possible level could not be inferred from any of the product labeling statements (see Table 17). We believe that the combination of evidence from the case report literature supporting these interactions and the lack of discussion in drug product labeling makes these interactions especially important to investigate further.
We also found one product label statement declaring an interaction that is refuted by clinical trial data present in the validation set. This statement extrapolated an interaction observed between erythromycin and one or more HMG-CoA reductase inhibitors to all drugs in that class:
Erythromycin has been reported to increase concentrations of HMG-CoA reductase inhibitors (e.g., lovastatin and simvastatin). Rare reports of rhabdomyolysis have been reported in patients taking these drugs concomitantly.
The active ingredient rosuvastatin is among the HMG-CoA reductase inhibitors included in our study. The labeling statement indirectly declares a potential pharmacokinetic interaction between erythromycin and rosuvastatin that is refuted by a randomized clinical trial in the validation set. None of the interaction predictions made by the DIKB using the evidence board or the best performing belief criteria strategies were refuted by the validation set (i.e. false positives or false negatives). While the system made no prediction involving erythromycin and rosuvastatin with these strategies, it correctly predicted a non-interaction between erythromycin and rosuvastatin using other, lower specificity, belief criteria strategies. These results indicate that, depending on belief criteria strategies, DDI prediction using drug-mechanism knowledge can be accurate while avoiding the kinds of false predictions that occur when individual drug differences are not recognized.
Changing the LOEs selected as belief criteria altered the system’s prediction accuracy and coverage in the way that we had hypothesized. We found for this data set that, as the criteria for including assertions were relaxed, the DIKB predicted a larger number of true interactions; sometimes at the expense of also making more false predictions. By having the computer iterate through a large set of possible belief criteria strategies we found that a significant proportion (23%) of them caused the DIKB to have equal or better performance in terms of sensitivity, positive predictive value, and agreement with the validation set than the most stringent strategy designed by drug experts.
The experiment also found a particular family of belief criteria strategies that optimized the system’s prediction accuracy and coverage. Table 18 shows the range of LOEs used by the 1,152 “best performing” belief criteria strategies. To better understand the information in Table 18, it is useful to note the situations where a level-of-evidence can be selected as a belief criterion for an assertion type without having any effect on the system’s predictions.
In the current study, the types substrate-of and inhibits were key to making all interaction predictions. In contrast, one assertion type, primary-metabolic-clearance-enzyme, was used exclusively to predict the magnitude of an interaction. Varying this assertion’s belief criteria could not affect the system’s non-magnitude predictions and, therefore, had no effect on the study’s performance metrics. Indeed, Table 18 shows that all four of this assertion type’s LOEs were used by the “best-performing” strategies.
Table 18 shows that there were multiple cases where no evidence items were mapped to a particular level-of-evidence for an assertion type. For example, since none of the 17 evidence items that were linked to controls-formation instances map to the assertion type’s top two ranking LOEs (D: LOE-1 and D: LOE-2), the system’s predictions were not affected when either was chosen as a belief criterion. Similar situations held for the is-not-substrate-of and primary-metabolic-clearance-enzyme assertion types.
While the inhibits assertion type was relevant for making DDI predictions, Table 18 shows that both of its LOEs (I: LOE-1 and I: LOE-2) were used by the “best-performing” belief criteria strategies. This was because there was only one evidence item that mapped to LOE-2; an item supporting the assertion (ketoconazole inhibits CYP3A4). The same assertion had two other evidence items that mapped to LOE-1. As a result, the assertion was justified no matter which LOE the system chose as a belief criterion for the inhibits assertion type.
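A minimal sketch of why the choice of LOE made no difference for this assertion, under the assumption that a belief criterion acts as a threshold: an assertion is treated as justified when at least one supporting evidence item ranks at or above the chosen LOE. The function name `is_justified` and the integer encoding of LOEs are hypothetical.

```python
def is_justified(evidence_loes, criterion):
    """True if any supporting item ranks at or above the criterion (lower number = more rigorous)."""
    return any(loe <= criterion for loe in evidence_loes)

# (ketoconazole inhibits CYP3A4): two items at LOE-1 and one at LOE-2, as described in the text.
ketoconazole_inhibits_cyp3a4 = [1, 1, 2]

for criterion in (1, 2):
    print(criterion, is_justified(ketoconazole_inhibits_cyp3a4, criterion))
# 1 True
# 2 True
```

An assertion supported only by an LOE-2 item would, under the same assumption, be justified at criterion 2 but not at criterion 1, which is the situation in which the choice of criterion starts to matter.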
These observations suggest that the experiment’s “best-performing” belief criteria strategies are quite likely unique to the set of drugs and evidence items used in the current experiment. For example, further work on the evidence-base might identify new evidence items that map to LOE-2 for the inhibits assertion and that cause the system to make false predictions. Similarly, the effect of evidence items that map to LOE-1 or LOE-2 for the controls-formation assertion type is unknown because such evidence was not present in the current evidence base. We intend to explore whether the “best-performing” strategies perform as well for a different set of drugs and evidence items in future work.
So far in our discussion we have focused only on the performance of the DIKB using the “best-performing” strategies. However, 27,648 (77%) of the strategies that we tested caused the DIKB to predict at least one interaction or non-interaction considered invalid by the validation set. The maximum number of interaction or non-interaction predictions refuted by the validation set for any single strategy was three and included either two invalid interactions and one invalid non-interaction or vice versa. Table 19 shows the four invalid predictions that appeared in various combinations among the predictions made using a wide range of strategies. We now briefly discuss the conditions under which each of these predictions was made.
The itraconazole-fluvastatin interaction prediction occurred when the system used strategies that accept drug labeling statements as belief criteria because 1) the assertion (itraconazole inhibits CYP3A4) was a default assumption and 2) the evidence-base recorded one labeling statement (based on a non-cited in vitro study) proposing that CYP3A4 is a minor elimination pathway for fluvastatin (<20% of total clearance). This prediction was refuted in the validation set by a randomized controlled trial involving 10 healthy volunteers that showed no significant increase in the systemic concentration of fluvastatin in the presence of itraconazole.
The DIKB predicted the fluconazole-rosuvastatin interaction using strategies that allow statements in product labeling to justify the controls-formation and has-metabolite assertion types and non-randomized clinical trial data to justify the inhibits assertion type. In this case, the system inferred that rosuvastatin is a substrate of CYP2C9 because the two assertions (rosuvastatin has-metabolite N-desmethylrosuvastatin) and (CYP2C9 controls-formation-of N-desmethylrosuvastatin) were each supported by evidence items based on labeling information, and the assertion (fluconazole inhibits CYP2C9) was supported by a non-randomized clinical trial. This prediction was refuted in the validation set by a randomized controlled trial involving 14 healthy volunteers that found no statistically significant increase in the systemic concentration of rosuvastatin in the presence of fluconazole.
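The chain of inference behind this prediction can be sketched in a few lines of Python. This is a simplified illustration, not the DIKB's rule base: the function names and the tuple encoding of assertions are hypothetical, and the sketch assumes that all three assertions have already met their belief criteria.

```python
def infer_substrates(has_metabolite, controls_formation):
    """Derive (drug, enzyme) substrate-of pairs from the two metabolite assertion types."""
    return {
        (drug, enzyme)
        for drug, metabolite in has_metabolite
        for enzyme, formed in controls_formation
        if formed == metabolite
    }

def predict_interactions(inhibits, substrate_of):
    """Pair each inhibitor with every other drug that is a substrate of the inhibited enzyme."""
    return {
        (precipitant, object_drug, enzyme)
        for precipitant, enzyme in inhibits
        for object_drug, substrate_enzyme in substrate_of
        if enzyme == substrate_enzyme and precipitant != object_drug
    }

# Believed assertions taken from the case described in the text.
has_metabolite = {("rosuvastatin", "N-desmethylrosuvastatin")}
controls_formation = {("CYP2C9", "N-desmethylrosuvastatin")}
inhibits = {("fluconazole", "CYP2C9")}

substrate_of = infer_substrates(has_metabolite, controls_formation)
print(predict_interactions(inhibits, substrate_of))
# {('fluconazole', 'rosuvastatin', 'CYP2C9')}
```

Removing any one of the three assertions, for example by choosing a belief criterion that labeling statements do not meet, leaves the final set empty, which mirrors how stricter strategies avoided this refuted prediction.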
The DIKB predicted a non-interaction between itraconazole and rosuvastatin via CYP3A4 inhibition using strategies that allow statements in product labeling to justify the is-not-substrate-of assertion type. This was because the evidence-base contained one evidence item, based on a labeling statement, declaring CYP3A4 to not have a role in the metabolic clearance of rosuvastatin. The validation set contained a randomized clinical trial that found a small, but statistically significant, increase in the systemic concentration of rosuvastatin in the presence of itraconazole.
The system predicted a non-interaction between fluconazole and clarithromycin using strategies that considered 1) the inhibits assertion type justified by non-randomized clinical trial data, 2) the is-not-substrate-of assertion type justified by in vitro metabolism identification studies using human microsomes and chemical inhibitors, and 3) the substrate-of assertion type justified by any of the clinical trial types. These strategies caused the system to apply one evidence item to justify the assertion (fluconazole inhibits CYP2C9) and another item to justify the assertion (clarithromycin is-not-substrate-of CYP2C9).
This case is interesting because the invalid prediction was overruled when using a less stringent level-of-evidence for the substrate-of assertion type. The first two LOEs for the substrate-of assertion type are defined by various clinical trial types. Strategies that used the assertion type’s third-ranking LOE, in vitro metabolism identification experiments, as a belief criterion predicted a validated interaction between fluconazole and clarithromycin via CYP3A4 inhibition. We think that this case shows the value of examining the performance of the system at all possible belief criteria strategies.
Table 20 shows six interactions and four non-interactions present in the validation set that were never predicted by the DIKB using any belief criteria strategy. We now briefly examine why the system missed these interactions and non-interactions.
The system missed the non-interactions involving pravastatin because its evidence-base was incomplete with regard to pravastatin’s metabolic properties. Although pravastatin’s drug product label states that it is not metabolized by CYP3A4 “to any significant extent”, this evidence was never entered into the DIKB. Doing so would have caused the system to predict the three missing non-interactions involving pravastatin using some belief criteria strategies. This case shows that missing interactions and non-interactions can help identify errors made while building the DIKB’s evidence-base. We think that an analysis of missing interactions and non-interactions should be done as quality assurance every time a set of “best-performing” strategies is created for a new set of drugs. Once errors are corrected, the process of identifying “best-performing” strategies can be repeated.
It is important to note that entering the aforementioned product label evidence would have caused the system to predict non-interactions for the two missing interactions involving pravastatin (Table 20). If the non-interaction predictions are incorrect, then the interactions likely occur by mechanisms other than those modeled by the DIKB’s current DDI theory. If these mechanisms are known, then the DDI theory and evidence-base should be expanded to include them. Another possible explanation is that the non-interaction predictions are correct, but that the studies that support the interactions contain some limitation. In this situation, it would be important to identify the limitation and integrate it into the inclusion criteria used to collect evidence.
Three missing interactions and one non-interaction are accounted for by the fact that there were no assertions in the system indicating which enzymes do or do not metabolize 1′-OH-midazolam, ortho-OH-atorvastatin, 4-OH-alprazolam, or 14-OH-clarithromycin. Neither are there assertions indicating that these metabolites inhibit a drug-metabolizing enzyme present in the system. The DIKB’s rule-based theory is capable of predicting when inhibition of the parent compound will or will not affect the formation of these metabolites, but we did not include these kinds of predictions in the study.
There were two evidence items in the system that supported the assertion (erythromycin inhibits CYP3A4) [17,18] but no assertion or evidence in the system claiming that itraconazole is a substrate of that enzyme. Hence, the system did not predict an interaction between erythromycin and itraconazole with itraconazole as the affected drug.
Conversely, the system had three default assumptions that separately established itraconazole to be both an in vivo and in vitro selective inhibitor of CYP3A4 and erythromycin to be an in vitro probe substrate. However, the system did not predict that itraconazole would interact with erythromycin because it does not assume that all properties established in vitro will hold in vivo. We think that the decision to accept in vitro knowledge as sufficient for inferring an in vivo result should occur within the evidence-model, not the rule-base.
It is important to note that our experiment only looked at binary performance criteria: predictions were classified as “true” or “false” according to the validation set, and the goal was to maximize “true” predictions and minimize “false” predictions. It is likely that a different set of best-performing strategies would be found if our goal were to optimize the accuracy of the system’s magnitude predictions. This could be a worthwhile experiment because the system is capable of order-of-magnitude predictions for some drug combinations. As is shown in Section 3, the DIKB’s magnitude estimates for 14 interactions using the drug experts’ belief criteria strategy corresponded with levels reported in clinical trial data present in the validation set. A set of belief criteria strategies that focused on optimizing magnitude would seek to expand the DIKB’s coverage of known interactions past these 14 while still making correct magnitude predictions. However, this kind of analysis should also examine whether some features of the rule-based theory of metabolic inhibition interactions used by the DIKB could limit its ability to make accurate magnitude predictions. The DIKB’s current method for magnitude estimation makes a number of assumptions about many of the factors that can sometimes contribute to an increase in the AUC of an object drug in a pharmacokinetic DDI study. It might be desirable to apply a more sophisticated approach that makes fewer assumptions, such as that described by Ohno et al.
If a clinical trial is applied as support for an interaction or non-interaction in the validation set while a labeling statement echoing, but not citing, the study supports an assertion that the DIKB uses to predict the same interaction, an assessment of the system’s accuracy will be somewhat biased in favor of the system. The same bias will exist if a validation set interaction or non-interaction rests on only a single non-traceable statement and there are assertions in the DIKB used to predict the interaction or non-interaction that depend on the study that inspired the statement. Both situations have the same remedy: exclude the “dual-use” validation set interaction or non-interaction from calculations of the system’s accuracy. Unfortunately, we did not implement any strategy to avoid this kind of bias, so it is possible that some labeling data was used to support mechanistic assertions that led to predictions validated by the same data but appearing in a different source. Future work will examine whether this bias was present and, if so, what effect removing the affected interactions or non-interactions has on the calculations of DIKB accuracy.
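The remedy described above amounts to filtering the validation set before scoring. A minimal sketch follows, with hypothetical drug pairs and function names; the simple fraction of matching predictions stands in for whatever accuracy measure is actually used and is an assumption of this example.

```python
def exclude_dual_use(validation, dual_use_pairs):
    """Drop validation entries whose supporting data also underpins the prediction."""
    return {pair: outcome for pair, outcome in validation.items()
            if pair not in dual_use_pairs}

def fraction_correct(predicted, validation):
    """Share of validated pairs whose prediction matches the validation outcome."""
    scored = [pair for pair in predicted if pair in validation]
    if not scored:
        return None
    return sum(predicted[pair] == validation[pair] for pair in scored) / len(scored)

# Hypothetical data: True = interaction, False = non-interaction.
validation = {("A", "B"): True, ("C", "D"): False, ("E", "F"): True}
predicted = {("A", "B"): True, ("C", "D"): True, ("E", "F"): True}
dual_use = {("A", "B")}  # pair assumed to be validated by the same underlying study

print(fraction_correct(predicted, validation))                               # 2 of 3 correct
print(fraction_correct(predicted, exclude_dual_use(validation, dual_use)))   # 0.5
```

Comparing the two figures shows how much of the apparent accuracy depends on entries that the system could not fairly be tested against.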
Another limitation is that all case report evaluations were conducted using a causality assessment tool (DIPS) whose validity and reproducibility have not yet been formally evaluated. Furthermore, the reviewer (co-investigator JH) was both the developer of DIPS and a member of the DIKB evidence board. Future work should involve a more rigorous evaluation of the DIKB’s novel interactions designed to overcome any potential bias not addressed in this study.
Mechanism-based DDI prediction is currently part of pre-clinical drug development, where it is used to identify potential interactions between a new drug candidate and drugs currently on the market. These early-phase modeling efforts are geared towards identifying interactions between a new drug and drugs with which it might be co-administered early on, before much time and money is invested [21,22]. The predictions made using drug-mechanisms in this context are generally qualitative; they indicate that two drugs might interact via a mechanism but offer no estimate of the magnitude of the interaction. Scientists can use qualitative predictions to select the set of clinical trials necessary to establish a new drug’s safety profile.
Considerable effort has been invested in both industry and academia researching how to make quantitative estimates of in vivo DDIs from in vitro evidence. Unfortunately, there is currently no general method for making accurate quantitative estimates of the magnitude of a metabolic inhibition DDI using in vitro data. Despite these difficulties, software tools have been developed that combine in vitro and in vivo findings to help clinicians reason about potential metabolic DDIs [24,25]. These systems rely heavily on mathematical models to make quantitative estimates of the magnitude of their DDI predictions. In contrast, the DIKB’s theory of metabolic DDIs makes simple magnitude estimates based on qualitative assertions about the degree to which inhibition of an enzyme would affect the clearance of some drug. A comparison of the DIKB’s ability to accurately predict magnitude with that of other systems would be worthwhile and should be the topic of future work.
To the best of our knowledge, the DIKB’s computational model of evidence provides features that are unique among all current systems that represent drug-mechanism knowledge. While other drug knowledge-bases and knowledge-based systems have linked evidence to their drug facts (e.g. DRUGDEX®, Q-DIPS, PharmGKB, and BioCyc), only the DIKB classifies all evidence entered into the system using a biomedical evidence ontology oriented toward confidence assignment. This enables the system to provide customized views of a body of drug-mechanism knowledge to users who do not agree about the inferential value of particular evidence types. This feature also allowed us to develop the novel approach to identifying the optimal use of available evidence that we have presented in this paper.
The results of the experiment discussed in this paper suggest that the DIKB’s unique approach to representing evidence provides two advantages over the representation methods used in current drug knowledge-bases. One advantage of the method is that experts can prospectively map their confidence in each assertion type to some arrangement of one or more abstract evidence types rather than having to review every relevant evidence item in the system. The experts can then define belief criteria that will influence the prediction accuracy of the system in a predictable way. Relaxing belief criteria causes the system to predict a larger number of true interactions, sometimes at the expense of making more false predictions.
Another advantage of the method is that researchers can characterize the prediction accuracy of a DDI theory using many different sub-sets of a specific body of evidence. This process can identify which evidence-use strategies optimize the system’s prediction accuracy and coverage of known DDIs. While the resulting evidence-use strategies might be unique to the set of drugs and evidence items used to create them, it is reasonable to expect that the clinical relevance of any novel DDIs predicted using such “best-performing” strategies will be within the range present for the known interactions. This would have important implications for drug safety and is an area we intend to explore in future work.
The experiment has also helped to identify an opportunity for research into new computational methods to help support analysis of belief criteria strategies. Currently, the task is difficult because of the complex interplay between the kinds of evidence present in the knowledge-base, how it is linked to each assertion instance, and the relationship between each assertion type and the variables chosen for scoring the system’s prediction performance. In future work we hope to explore this research area as well as validate the DIKB’s approach on a broader range of drugs and drug-interaction mechanisms. We also hope to explore methods for quantifying the amount of confidence that expert users should have in DDI predictions the system makes using LOEs that do not meet belief criteria. Progress on this topic might allow DIKB predictions to be interpreted in terms of a normative decision-theoretic framework such as expected utility theory. Understanding the relationship between the DIKB’s rank-ordered LOEs and quantitative measures of evidential support such as those reviewed by Tentori et al. might be an important first step in this direction.
We envision that, over the next decade, a new generation of highly accurate tools will become available that use pharmacologic theory, drug mechanism knowledge, and patient-specific data to help clinicians assess the combined effect of multiple drugs, the effect of removing a drug from a patient’s drug regimen, and individual response to therapy due to enzyme polymorphisms. These tools will be a significant advance in medicine and a radical change from the functionality that current prescribing software offers. Our research on how to best represent drug mechanism knowledge for the purpose of making clinically relevant DDI predictions is a small, though important, step toward understanding how to build and deploy the highly accurate tools that we envision.
This project was funded in part by a National Library of Medicine Biomedical and Health Informatics Training Program grant (T15 LM07442) and an award from the Elmer M. Plein Endowment Research Fund (University of Washington School of Pharmacy). The DIKB ontology and evidence taxonomy were developed using the Protégé resource, which is supported by grant LM007885 from the United States National Library of Medicine. The author expresses sincere appreciation to Drs. Tom Hazlet and John Gennari for their comments during the course of this project.
4. The questionnaire shown in Appendix G (supplementary material) was used to help the experts reach consensus on the inductive strength of each evidence-type combination.