4.1 Relationship between chemical and pharmacological spaces with respect to drug targets
We investigated the relationship between the chemical space and the pharmacological space for the same set of drugs. Each panel of the figure below shows the scatter-plot of pharmacological effect similarity scores against chemical structure similarity scores for drugs targeting enzymes, ion channels, GPCRs and nuclear receptors, respectively. The Pearson correlation coefficients are 0.321, 0.420, 0.344 and 0.391, respectively (the corresponding P-value is almost zero in each case).
Scatter-plots of pharmacological effect similarity scores and chemical structure similarity scores for drugs targeting enzyme, ion channel, GPCR and nuclear receptor, respectively.
Chemical structure similarities are thus correlated with pharmacological effect similarities to some extent, but there are many exceptions. For example, many drug pairs share high structural similarity but do not have similar pharmacological effects. These results suggest that chemical structure similarity does not always correspond to pharmacological effect similarity.
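The correlation analysis above can be illustrated with a minimal sketch. It assumes the two similarity scores are available as paired lists, one entry per drug pair; the function name `pearson` and the example values are hypothetical and are not taken from the study.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length score lists."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length lists with at least two values")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / math.sqrt(var_x * var_y)

# Hypothetical similarity scores for five drug pairs (illustrative values only).
chemical = [0.10, 0.35, 0.50, 0.80, 0.90]
pharmacological = [0.05, 0.40, 0.20, 0.30, 0.85]
r = pearson(chemical, pharmacological)
print(f"Pearson r = {r:.3f}")
```

A moderate coefficient, like those reported above, is compatible with many individual pairs that are structurally similar but pharmacologically dissimilar.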
We investigated the relationship between the chemical space, the pharmacological space and the topology of drug–target interaction networks. We constructed the drug–target interaction network for each protein class using a bipartite graph representation (Yildirim et al.). In the bipartite graph, the heterogeneous nodes correspond to either drugs or target proteins, and edges correspond to interactions between them. An edge is placed between a drug node and a target node if the protein is a known target of the drug.
The figure below shows the distributions of chemical structure similarity scores and pharmacological effect similarity scores against the network distance for drugs targeting enzymes, ion channels, GPCRs and nuclear receptors. The top four panels show the box-plots of drug–drug chemical structure similarities, and the bottom four panels show the box-plots of drug–drug pharmacological similarities. The network distance is the length of the shortest path between two drugs on the bipartite graph representation of each drug–target interaction network. From the figure, we observe several tendencies.
Distributions of chemical structure similarity scores (top four panels) and pharmacological effect similarity scores (bottom four panels) against the network distance of drugs targeting enzymes, ion channels, GPCRs and nuclear receptors.
First, the larger the network distance between drugs, the smaller the variability of both the chemical structure similarities and the pharmacological similarities. Likewise, the larger the network distance between drugs, the lower both similarity scores tend to be. These observations suggest that two drugs sharing high chemical structure similarity or high pharmacological similarity tend to interact with similar target proteins.
Second, this tendency is much clearer for the pharmacological similarity than for the chemical structure similarity. Most pharmacological similarity scores are almost zero at larger distances, while many chemical similarity scores remain relatively high even at larger distances. The difference between the distributions at ‘distance 2’ and ‘distance 4’ is important, because ‘distance 2’ corresponds to drug–drug pairs that share at least one target protein, while ‘distance 4’ corresponds to drug–drug pairs that do not share a target protein. These observations suggest that pharmacological similarity is more strongly correlated with drug targets than chemical structure similarity is, and that pharmacological similarity is therefore a more useful source of information for drug–target identification.
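The network distance used above can be sketched as a breadth-first search on the bipartite graph. This is a minimal illustration under stated assumptions: the helper names `build_bipartite` and `network_distance` and the toy identifiers (`d1`, `t1`, ...) are hypothetical, and interactions are assumed to be given as (drug, target) pairs.

```python
from collections import deque

def build_bipartite(interactions):
    """Adjacency map of a bipartite graph from (drug, target) pairs.

    Nodes are tagged with their type so that a drug and a target
    with the same identifier cannot collide.
    """
    adj = {}
    for drug, target in interactions:
        d, t = ("drug", drug), ("target", target)
        adj.setdefault(d, set()).add(t)
        adj.setdefault(t, set()).add(d)
    return adj

def network_distance(adj, drug_a, drug_b):
    """Shortest-path length between two drugs, or None if unreachable."""
    start, goal = ("drug", drug_a), ("drug", drug_b)
    if start not in adj or goal not in adj:
        return None
    seen = {start}
    queue = deque([(start, 0)])
    while queue:
        node, dist = queue.popleft()
        if node == goal:
            return dist
        for nxt in adj[node]:
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, dist + 1))
    return None

# Toy network (hypothetical identifiers): d1 and d2 share t1; d2 and d3 share t2.
interactions = [("d1", "t1"), ("d2", "t1"), ("d2", "t2"), ("d3", "t2"), ("d4", "t3")]
adj = build_bipartite(interactions)
print(network_distance(adj, "d1", "d2"))  # drugs sharing a target
print(network_distance(adj, "d1", "d3"))  # drugs linked only through d2
```

Because every path between two drugs alternates between drug and target nodes, distances between drugs are always even, which is why the comparison above is between distance 2 and distance 4.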
4.2 Performance evaluation of the proposed method
We tested three different inputs, (i) chemical structure similarity, (ii) true pharmacological similarity and (iii) predicted pharmacological similarity, on their abilities to reconstruct four classes of drug–target interactions involving enzymes, ion channels, GPCRs and nuclear receptors. Note that input (i) corresponds to the previous method (Yamanishi et al.), and inputs (ii) and (iii) correspond to the method proposed in this study. Input (ii) reflects the situation where all compounds in the prediction set have pharmacological information, so the pharmacological effect prediction step can be skipped. Input (iii) reflects the situation where none of the compounds in the prediction set has any pharmacological information.
We performed the following 5-fold cross-validation procedure: drugs in the gold standard set were split into five subsets of roughly equal size, each subset was then taken in turn as a test set, and we performed the training on the remaining four sets. To obtain robust results and accurate comparison, we kept the same experimental conditions, where the same training drugs and test drugs are used across the three different inputs in each cross-validation. We repeated the above cross-validation experiment five times.
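The cross-validation protocol described above can be sketched as follows. This is an illustrative sketch, not the authors' implementation: the generator name `repeated_kfold`, the fixed random seed and the placeholder drug identifiers are assumptions. Generating the splits once and reusing them for every input is one simple way to keep the training and test drugs identical across the three inputs.

```python
import random

def repeated_kfold(drugs, k=5, repeats=5, seed=0):
    """Yield (repeat, fold, train, test) for repeated k-fold cross-validation."""
    rng = random.Random(seed)  # fixed seed so the splits are reproducible
    for rep in range(repeats):
        shuffled = list(drugs)
        rng.shuffle(shuffled)
        # Striding gives k folds whose sizes differ by at most one.
        folds = [shuffled[i::k] for i in range(k)]
        for i, test in enumerate(folds):
            train = [d for j, fold in enumerate(folds) if j != i for d in fold]
            yield rep, i, train, test

# Placeholder drug identifiers (hypothetical, for illustration only).
drugs = [f"drug_{n:03d}" for n in range(23)]

# Materialise the splits once, then reuse the same list for each input type.
splits = list(repeated_kfold(drugs))
print(len(splits), "train/test splits")
```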
The table below shows the averages of the AUC [area under the receiver operating characteristic (ROC) curve], sensitivity, specificity and PPV (positive predictive value). The ROC curve (Gribskov and Robinson, 1996) is the plot of true positives as a function of false positives over various thresholds, where true positives are correctly predicted interactions and false positives are predicted interactions that are not present in the gold standard interactions. The top 1% of prediction scores is used as the threshold for computing sensitivity, specificity and PPV, because high-confidence predictions are the most relevant in practical applications.
Statistics of the prediction performance
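The evaluation measures above can be sketched in a few lines. This is a minimal illustration under stated assumptions: the AUC is computed with the rank-based (Mann-Whitney) formulation, ties at the cut-off are broken by sort order, and the function names and toy scores are hypothetical. The toy example uses a top 20% cut-off only because it has ten pairs; the text above uses the top 1%.

```python
def auc(scores, labels):
    """Rank-based AUC: probability that a positive outscores a negative (ties count 0.5)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def metrics_at_top_fraction(scores, labels, fraction=0.01):
    """Sensitivity, specificity and PPV when the top-scoring fraction is predicted positive."""
    n_top = max(1, int(len(scores) * fraction))
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    predicted = set(order[:n_top])
    tp = sum(1 for i in predicted if labels[i] == 1)
    fp = n_top - tp
    fn = sum(labels) - tp
    tn = len(labels) - tp - fp - fn
    return {
        "sensitivity": tp / (tp + fn) if tp + fn else 0.0,
        "specificity": tn / (tn + fp) if tn + fp else 0.0,
        "ppv": tp / (tp + fp),
    }

# Toy prediction scores and gold-standard labels (1 = known interaction).
scores = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2, 0.1, 0.05]
labels = [1, 1, 0, 1, 0, 0, 0, 0, 0, 0]
print(auc(scores, labels))
print(metrics_at_top_fraction(scores, labels, fraction=0.2))
```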
The true pharmacological similarity-based method outperforms the chemical structure similarity-based method in all four protein classes. The use of pharmacological information is especially effective for the enzyme and ion channel data. The predicted pharmacological similarity-based method also outperforms the chemical similarity-based method, although its performance is slightly worse than that of the true pharmacological similarity-based method. In practical applications, detailed pharmacological information is rarely available for all compounds to be tested, so this result suggests that predicted pharmacological information is useful for identifying unknown drug–target interactions even when pharmacological information is not available for the compounds of interest. These results highlight the good performance of the proposed method.
We also made a simple check of the effectiveness of grouping the keywords into the five tag groups. The figure below shows the AUC scores of the predicted pharmacological similarity-based method for the five tag groups (caution, interaction, patient, pharmaceutical effect and property) and the combination of the five groups, where ‘caut’, ‘inte’, ‘pati’, ‘phar’, ‘prop’ and ‘comb’ indicate the five tag groups and the combination, respectively. The low predictive performance of the inte profile arises because far fewer drugs have inte keywords than have the other types of keywords. It is notable that the remaining four types of keywords (caut, pati, phar and prop) outperformed the comb profile, indicating the usefulness of discriminating the context of the keywords. It is natural, for example, that drugs for high blood pressure and drugs that cause high blood pressure have to be distinguished. These results suggest that appropriate selection of informative keywords and discrimination of their context will improve the predictive performance.
Barplot of AUC score for the five tag groups (caution, interaction, patient, pharmaceutical effect and property) and their combination.
4.3 Comprehensive prediction for unknown drug–target interactions
After confirming the usefulness of our method, we conducted a comprehensive prediction of interactions between all possible compounds and proteins for the four classes of target proteins studied: enzymes, ion channels, GPCRs and nuclear receptors. In the inference process for these predictions, we used all the known drugs and target proteins in the gold standard data as training data, and predicted potential interactions for all compounds in KEGG LIGAND and all the other drugs in KEGG DRUG (the drugs are absent from the gold standard data). Note that there remain many marketed drugs whose target proteins have not been identified yet. The total number of compounds including drugs in the prediction set is 15 383 in each case. Note that most of the compounds and drugs in the prediction set are not assigned any pharmacological information, so the pharmacological effect prediction is required. All the prediction results for each target protein class can be obtained from the web-supplement. Because of space limitations, we focused on the results for enzyme data below.
We focused on the top 1000 scoring predictions for the enzyme data. We investigated the validity of the predicted pairs using several databases (e.g. KEGG BRITE, SuperTarget, DrugBank), because they contain information about interactions involving compounds which do not have any pharmacological information. Recall that in Section 2 we constructed the gold standard set for drug–target interactions involving only drugs for which pharmacological information (from JAPIC package inserts) is available. As a result, we confirmed that 223 out of the top 1000 predictions are now annotated in at least one database. In contrast, for the comprehensive prediction based on chemical structure information only, we confirmed that 140 out of the top 1000 predictions are now annotated in at least one database. We take this result as strong evidence supporting the practical relevance of our approach. The table below shows 10 examples of high scoring compound–protein pairs which were predicted by pharmacological similarity but not by chemical structure similarity.
Examples of compound–protein pairs predicted by the proposed method for enzyme data
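The database check described above amounts to counting how many of the top-ranked predicted pairs appear in a set of annotated pairs. The following is a minimal sketch under stated assumptions: the function name `count_confirmed`, the tuple layout and the toy identifiers are hypothetical, and the annotated pairs are assumed to have been merged from the databases beforehand.

```python
def count_confirmed(predictions, annotated_pairs, top_n=1000):
    """Count top-N predictions that appear among annotated (compound, protein) pairs.

    predictions: iterable of (compound, protein, score) tuples.
    annotated_pairs: set of (compound, protein) tuples from external databases.
    """
    ranked = sorted(predictions, key=lambda p: p[2], reverse=True)[:top_n]
    return sum(1 for compound, protein, _ in ranked
               if (compound, protein) in annotated_pairs)

# Toy data with hypothetical identifiers (not taken from the study).
predictions = [
    ("cmpd_A", "prot_1", 0.95),
    ("cmpd_B", "prot_1", 0.90),
    ("cmpd_C", "prot_2", 0.40),
    ("cmpd_D", "prot_3", 0.85),
]
annotated_pairs = {("cmpd_A", "prot_1"), ("cmpd_C", "prot_2")}

# Only the three highest-scoring pairs are checked here.
print(count_confirmed(predictions, annotated_pairs, top_n=3))
```

Running the same count on the rankings produced by two different inputs gives a comparison of the kind reported above (223 versus 140 confirmed pairs out of 1000).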
Next, we manually investigated the validity of the predicted pairs which were not confirmed in the databases, based on the literature. We take some analgesic and antipyretic agents as examples, as shown in Fig. 4. Salicylamide (D01811) and acetaminophen (D00217) are both known to act on prostaglandin-endoperoxide synthase 1/2 (PTGS1/2) (Aronoff et al.). Based on these known interactions, some compounds are suggested to interact with PTGS1/2: ethenzamide (D01466), actarit (D01395), N-acetylphenylethylamine (C06746) and N-ethylphenylacetamide (C11487). Among these, D01466 is also an analgesic and antipyretic agent (Darias et al.), although we could not find its target in the databases we used. On the other hand, D01395 is an anti-rheumatic agent (Ye et al.). The JAPIC entry including D01811 describes that this drug also has an effect on rheumatism (Frankl, 1953). We could not find any information about the pharmaceutical effects of the other two compounds (C06746 and C11487), but they are structurally similar to the other drugs (D01811, D00217, D01466 and D01395). Therefore, it seems possible that these compounds act on PTGS1/2.
Fig. 4. Examples of the proposed drug–target interactions. Four boxes in the center of the figure are the target proteins, and bold lines indicate the known drug–target interactions. Solid lines represent the proposed interactions based on the ...
On the other hand, PTGS1 has some other interacting analgesic and antipyretic drugs, such as mofezolac (D01718) (Goto et al.), from which tangeretin (C10190) (Hirano et al.) is suggested as another potential drug. The only structural commonality between these two compounds seems to be that they both contain some O-methyl groups on aromatic rings, so this result might not be convincing. As another questionable example, sodium lactate (D02183) is suggested to act on PTGS2 based on the known interacting drug sodium salicylate (D00566), an analgesic agent (Preston et al.). However, this result is not convincing at all, because their only common substructures are the sodium ion and the carboxylate group, and D02183 is an electrolyte replenisher.
Another group of analgesic and antipyretic drugs may share a different target protein. Fluocinolone acetonide (D01825) (Emerit et al.) and fluocinonide (D00325) (Schlessinger et al.) are known to act on human cytosolic calcium-dependent phospholipase A2 (PLA2G4A), which is involved in lipid metabolism and related to various signal transduction pathways (Balsinde et al.). Based on their resemblance to these two drugs, triamcinolone acetonide (D00983) (Keele, 1969) and diflorasone diacetate (D01327) (Bluefarb et al.) are suggested to act on PLA2G4A. These four drugs are all corticosteroids, and are all known to act as analgesic and antipyretic drugs. We therefore consider these results convincing.
There are other possible drug–target interactions that belong to different therapeutic categories. For example, metildigoxin (D02587) is predicted to interact with a human Na+ transporting ATPase (ATP1A1), based on the reported interaction of digoxin (D00298). D00298 is a digitalis-like cardiotonic substance that acts directly on heart muscle (Cumberbatch et al.). D02587 is the methylated derivative of D00298, and many reports suggest that D02587 has no significant difference from D00298 in terms of its effects on heart function (Kaufmann et al.). It is therefore no surprise that the two compounds share the same target protein.