In this study, we conducted a large-scale ADR prediction of FDA-approved drugs and investigated three types of features: (1) chemical structures; (2) biological properties: protein targets, transporters, enzymes, and pathways; (3) phenotypic characteristics: indications and other known ADRs. Our evaluation showed that drug phenotypic information (when available) is informative for ADR prediction, indicating its potential use for early detection of post-market ADR signals. In addition, our study demonstrated that the combination of chemical and biological features improved the AUC as well as precision (~3% increase) and recall (~1%), suggesting that such a data fusion approach is promising for preclinical screening of potential ADRs. The combination of all three types of information (‘chem+bio+pheno’) had a lower global AUC than the ‘pheno’-only classifier (although the difference was not statistically significant), indicating that simple feature concatenation may not work well in this case. We then compared the true positive predictions of the classifiers that used individual feature sets (‘chem’, ‘bio’, or ‘pheno’) and measured the overlap between each pair of classifiers. As shown in , 5072 ADRs were detected by ‘chem’ or ‘bio’ but not by ‘pheno’, and 10 581 ADRs were detected by ‘pheno’ but not by ‘chem’ or ‘bio’, indicating that the ADRs predicted by each feature type are complementary and that higher performance could be achieved by developing more advanced methods for feature integration. We further analyzed the significance of the association between each of the 4276 features and each of the 1385 ADRs using χ² statistics, regarding a feature as informative for an ADR if p<0.05. The distribution of the informative features is shown in online supplementary table S1.
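The feature–ADR association test described above can be illustrated with a minimal sketch. It assumes binary features and a Pearson χ² test on a 2×2 contingency table without continuity correction; whether a correction was applied in this study is not stated, and the function names `chi2_2x2` and `is_informative` are hypothetical.

```python
import math


def chi2_2x2(a, b, c, d):
    """Pearson chi-square test (no continuity correction) on a 2x2 table.

    a: drugs with the feature and the ADR
    b: drugs with the feature but not the ADR
    c: drugs with the ADR but not the feature
    d: drugs with neither
    Returns (statistic, p_value) for 1 degree of freedom.
    """
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    if denom == 0:
        # A constant feature or a constant ADR column carries no information.
        return 0.0, 1.0
    stat = n * (a * d - b * c) ** 2 / denom
    # Survival function of chi-square with 1 dof: P(X > x) = erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(stat / 2.0))
    return stat, p_value


def is_informative(feature, adr, alpha=0.05):
    """Return True if a binary feature is associated with a binary ADR label.

    `feature` and `adr` are equal-length sequences of 0/1, one entry per drug.
    The threshold alpha=0.05 follows the text; everything else is an assumption.
    """
    a = sum(1 for f, y in zip(feature, adr) if f and y)
    b = sum(1 for f, y in zip(feature, adr) if f and not y)
    c = sum(1 for f, y in zip(feature, adr) if not f and y)
    d = sum(1 for f, y in zip(feature, adr) if not f and not y)
    _, p_value = chi2_2x2(a, b, c, d)
    return p_value < alpha
```

Applied to every feature–ADR pair, such a test yields a very large number of comparisons (4276 × 1385), so an uncorrected p<0.05 threshold should be read as a screening criterion rather than as strong evidence for any single association.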
During revision of this paper, Cami et al45 published a similar study, in which they proposed an integrative approach for predicting new ADRs by utilizing structural attributes of the network formed by known drug–ADR relationships from drug safety data, as well as specific drug information including Anatomical Therapeutic Chemical taxonomy, molecular descriptors, and Medical Dictionary for Regulatory Activities (MedDRA) taxonomy of adverse events. Thus we believe that the models built on large-scale approved drugs have the potential to detect clinically important ADRs at both preclinical and post-market phases for new drugs.
In a further analysis, we found that the contribution of phenotypic features was mostly due to other known ADRs rather than indications. A major reason that existing ADRs contributed significantly to performance could be the high correlation between ADRs. For instance, nausea and headache co-occurred in 596 of the 832 drugs, and 49 pairs of ADRs co-occurred in more than 400 drugs. As SIDER represents ADRs as Unified Medical Language System (UMLS)46 concept unique identifiers (CUIs), one side effect may be represented by a group of CUIs (see for seven concepts related to rhabdomyolysis). Predicting one ADR CUI by using other ADR CUIs in the same group may introduce bias and overestimate the performance of the model. Therefore, an appropriate grouping schema for ADRs will be investigated in the future. The drug indication information improved the AUC only slightly, from 0.9054 (ie, chemical structures only) to 0.9110 (ie, chemical structures + indications). One possible way to improve this is to build a better representation of the indication data. Currently, similar diseases are represented by different CUIs in the SIDER drug indications, for example, C0019693 for ‘HIV infection’ and C0019699 for ‘HIV positive’. Thus, for future work, it may be useful to group the indications.
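The co-occurrence counts mentioned above (for example, the number of ADR pairs shared by more than 400 drugs) can be obtained with a simple pair count. The following is a minimal sketch under the assumption that the data are available as a mapping from each drug to its set of ADR identifiers; the function names and the toy drug names are hypothetical.

```python
from collections import Counter
from itertools import combinations


def adr_cooccurrence(drug_to_adrs):
    """Count, for every unordered ADR pair, the number of drugs listing both.

    `drug_to_adrs` maps a drug identifier to an iterable of ADR identifiers
    (for example UMLS CUIs). Pairs are stored as alphabetically sorted tuples.
    """
    counts = Counter()
    for adrs in drug_to_adrs.values():
        for pair in combinations(sorted(set(adrs)), 2):
            counts[pair] += 1
    return counts


def frequent_pairs(counts, min_drugs):
    """Keep only the ADR pairs that co-occur in more than `min_drugs` drugs."""
    return {pair: n for pair, n in counts.items() if n > min_drugs}


# Toy data for illustration only; these are not drugs from the study.
toy = {
    "drugA": {"nausea", "headache"},
    "drugB": {"nausea", "headache", "rash"},
    "drugC": {"nausea"},
}
toy_counts = adr_cooccurrence(toy)
toy_frequent = frequent_pairs(toy_counts, min_drugs=1)
```

With the real data, `frequent_pairs(counts, 400)` would correspond to the threshold used in the text. Pairs with very high counts are also natural candidates to inspect when designing a grouping schema for ADR CUIs.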
The improvement produced by biological features was smaller than we initially expected, which may be the result of a few issues. First, the body's response to a drug is a complex process. When a drug enters the body and interacts with its intended targets, favorable effects are expected. However, at the same time, a drug often binds to other protein pockets with varying affinities (off-target interactions), leading to observed side effects. Furthermore, the biological features (ie, protein targets, transporters, enzymes, and pathways) used in this study are relatively simple and probably do not capture the details of the molecular processes associated with the drugs.
One problem with the proposed ADR prediction model is imbalanced samples. Of the 1385 ADRs in our dataset, 554 were observed to be associated with fewer than five drugs. Therefore, for these ADR predictions, the dataset has an approximate 1:166 positive to negative ratio, which causes a serious problem for classification algorithms. In an imbalanced classification problem such as this, the majority class (the negative class in this case) dominates the decision process, which biases the classifier toward that class. As a result, the precision for these ADR predictions would be close to 0%, but accuracy would be near 100%. To compare with the results reported in Pauwels et al,27 we followed their approach to report global AUC values. However, owing to the imbalance problem, the global AUC could be very high (over 0.9 in this task), but the actual ability to detect and predict positive samples (the ADRs) could be low. Therefore we reported precision and recall in addition to the AUC. As expected, although ‘chem’ features achieved over 0.9 AUC, precision and recall were <0.5 (). Furthermore, when the global AUC and accuracy are used, any improvements in the prediction accuracy of the common ADRs might be diluted by the 554 rare ADRs; thus the contribution of the feature addition could be severely underestimated. For example, after the inclusion of biological properties, the AUC remained relatively similar, but the precision improved from 43.37% to 46.23%, with recall remaining at about 50%. We also analyzed different feature sets by focusing only on ADRs associated with at least 50 drugs, so that sufficient positive samples were available. As expected, the results showed more significant contributions from each feature addition in terms of AUC, accuracy, precision and recall, because the rare ADRs that may distort the measures were excluded. For example, the improvement in AUC from adding biological properties was 0.02 for common ADRs as opposed to 0.004 for all ADRs.
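The gap between accuracy and precision under class imbalance can be made concrete with a small worked example. The sketch below assumes a hypothetical rare ADR with 5 positive drugs out of 832 (roughly the 1:166 ratio mentioned above) and a degenerate classifier that always predicts the negative class; it is an illustration, not a result from this study.

```python
def binary_metrics(y_true, y_pred):
    """Return (accuracy, precision, recall) for binary 0/1 labels.

    Precision and recall are defined as 0.0 when their denominator is zero,
    which is a convention chosen for this sketch.
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    accuracy = (tp + tn) / len(pairs)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return accuracy, precision, recall


# Hypothetical rare ADR: 5 associated drugs among 832 (assumed for illustration).
y_true = [1] * 5 + [0] * 827
# A classifier biased entirely toward the majority (negative) class.
y_pred = [0] * 832

accuracy, precision, recall = binary_metrics(y_true, y_pred)
```

Here the accuracy is 827/832, above 99%, while precision and recall are both zero, which is why accuracy and a pooled AUC alone can hide poor detection of rare ADRs.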
Different methods have been proposed to address the imbalanced classification problem.47–49 As a further analysis, we tested a simple method for addressing the sample imbalance problem by adjusting the class weights of the RF and SVM classifiers (ie, weight = 1 − (class samples/total samples)) and observed improvement in AUC only for RF (increased from 0.9491 to 0.9524). SVM did not improve with class weight adjustment because it is very sensitive to parameters; thus parameters must be reoptimized when weights are adjusted. In the future, we plan to explore other techniques such as feature selection and resampling algorithms as suggested previously.47–49
Furthermore, the clinical validation examples of Baycol and Vioxx support the utility of the model for detecting post-market adverse drug events using information from other medications in the database. For Baycol, the model based on ‘chem’ detected only one ADR related to rhabdomyolysis, while the use of ‘chem+bio’ was able to detect five of seven related ADRs, and the addition of ‘pheno’ did not result in more predictions. For Vioxx, ‘chem+bio+pheno’ was required to detect two of four ADRs related to heart attack. This highlights the utility of chemical and biological data for detecting and predicting likely adverse events, as well as the need for incorporating human adverse event data (phenotypic) as in SIDER to allow detection of other signals. These results suggest that our model has the potential to make clinically important ADR predictions early rather than waiting for sufficient post-market population response data to accumulate.
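The class-weight adjustment tested in the imbalance analysis above, weight = 1 − (class samples/total samples), can be written out as a short sketch. The function name `class_weights` and the example counts (5 positives among 832 drugs) are assumptions made for illustration; the text does not specify how the weights were passed to the RF and SVM implementations.

```python
from collections import Counter


def class_weights(labels):
    """Compute weight = 1 - (class samples / total samples) for each class.

    `labels` is a sequence of class labels, one per training sample.
    Rare classes receive weights close to 1, common classes close to 0.
    """
    total = len(labels)
    counts = Counter(labels)
    return {cls: 1.0 - n / total for cls, n in counts.items()}


# Hypothetical rare ADR: 5 positive drugs among 832 (assumed for illustration).
labels = [1] * 5 + [0] * 827
weights = class_weights(labels)
```

For a binary problem the two weights sum to 1, so the positive class here receives a weight of 827/832 and the negative class 5/832. Many classifier libraries accept such a mapping directly; for instance, scikit-learn's random forest and SVM classifiers take a `class_weight` dictionary, although whether that library was used in this study is not stated.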
The study has several limitations, and there is scope for much future work to be carried out. For one, we would like to investigate algorithms that have better interpretability, which can return important features associated with ADRs. Moreover, in this study, representation for phenotypic features was relatively simple. More sophisticated methods (eg, categorizing drug indications via ontologies) could be further examined. Furthermore, a drug acts by inducing perturbations to biological systems, which involve various molecular interactions such as protein–protein interactions, signaling pathways, and pathways of drug action and metabolism.50 Therefore, in future work, we also plan to incorporate more detailed features such as interaction networks and drug bioactivities into the integrative framework for identification of ADRs.