
Results 1-7 (7)

1.  Metabolome-scale prediction of intermediate compounds in multistep metabolic pathways with a recursive supervised approach 
Bioinformatics  2014;30(12):i165-i174.
Motivation: Metabolic pathway analysis is crucial not only in metabolic engineering but also in rational drug design. However, the biosynthetic/biodegradation pathways are known only for a small portion of metabolites, and a vast number of pathways remain uncharacterized. Therefore, an important challenge in metabolomics is the de novo reconstruction of potential reaction networks on a metabolome-scale.
Results: In this article, we develop a novel method to predict the multistep reaction sequences for de novo reconstruction of metabolic pathways in the reaction-filling framework. We propose a supervised approach to learn what we refer to as ‘multistep reaction sequence likeness’, i.e. whether a compound–compound pair is possibly converted to each other by a sequence of enzymatic reactions. In the algorithm, we propose a recursive procedure of using step-specific classifiers to predict the intermediate compounds in the multistep reaction sequences, based on chemical substructure fingerprints/descriptors of compounds. We further demonstrate the usefulness of our proposed method on the prediction of enzymatic reaction networks from a metabolome-scale compound set and discuss characteristic features of the extracted chemical substructure transformation patterns in multistep reaction sequences. Our comprehensively predicted reaction networks help to fill the metabolic gap and to infer new reaction sequences in metabolic pathways.
Availability and implementation: Materials are available for free at
Supplementary information: Supplementary data are available at Bioinformatics online.
PMCID: PMC4058936  PMID: 24931980
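The recursive use of step-specific classifiers described in entry 1 can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the toy fingerprints, the rule-based `make_toy_scorer` that stands in for a trained k-step classifier, and the greedy choice of a single best intermediate. It is not the authors' implementation.

```python
from typing import Callable, Dict, FrozenSet, List, Optional

Scorer = Callable[[str, str], float]

# Toy fingerprints (hypothetical): each compound is a set of substructure ids.
FINGERPRINTS: Dict[str, FrozenSet[int]] = {
    "A": frozenset({1}),
    "B": frozenset({1, 2}),
    "C": frozenset({1, 2, 3}),
    "D": frozenset({1, 2, 3, 4}),
    "X": frozenset({9}),
}


def make_toy_scorer(steps: int) -> Scorer:
    """Stand-in for a trained k-step classifier (assumption, not a real model).

    Returns 1.0 when exactly `steps` substructures differ between the two
    compounds, and 0.0 otherwise.
    """
    def score(a: str, b: str) -> float:
        return 1.0 if len(FINGERPRINTS[a] ^ FINGERPRINTS[b]) == steps else 0.0
    return score


def predict_path(start: str, goal: str, steps: int,
                 scorers: Dict[int, Scorer], pool: List[str],
                 threshold: float = 0.5) -> Optional[List[str]]:
    """Recursively predict the intermediates of a `steps`-step sequence."""
    if scorers[steps](start, goal) < threshold:
        return None  # the pair does not look like a `steps`-step sequence
    if steps == 1:
        return [start, goal]
    best, best_score = None, 0.0
    for candidate in pool:
        if candidate in (start, goal):
            continue
        # A good intermediate is one step from `start` and steps-1 from `goal`.
        s = min(scorers[1](start, candidate),
                scorers[steps - 1](candidate, goal))
        if s > best_score:
            best, best_score = candidate, s
    if best is None or best_score < threshold:
        return None
    rest = predict_path(best, goal, steps - 1, scorers, pool, threshold)
    return None if rest is None else [start] + rest


SCORERS = {k: make_toy_scorer(k) for k in (1, 2, 3)}
print(predict_path("A", "D", 3, SCORERS, list(FINGERPRINTS)))
```

In this sketch each recursion level fixes one intermediate and hands the shorter remaining pair to the classifier for one step fewer. A real system would score candidates with classifiers trained on known reaction sequences and would likely keep several candidates per step instead of one.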
2.  Supervised de novo reconstruction of metabolic pathways from metabolome-scale compound sets 
Bioinformatics  2013;29(13):i135-i144.
Motivation: The metabolic pathway is an important biochemical reaction network involving enzymatic reactions among chemical compounds. However, it is assumed that a large number of metabolic pathways remain unknown, and many reactions are still missing even in known pathways. Therefore, the most important challenge in metabolomics is the automated de novo reconstruction of metabolic pathways, which includes the elucidation of previously unknown reactions to bridge the metabolic gaps.
Results: In this article, we develop a novel method to reconstruct metabolic pathways from a large compound set in the reaction-filling framework. We define feature vectors representing the chemical transformation patterns of compound–compound pairs in enzymatic reactions using chemical fingerprints. We apply a sparsity-induced classifier to learn what we refer to as ‘enzymatic-reaction likeness’, i.e. whether compound pairs are possibly converted to each other by enzymatic reactions. The originality of our method lies in the search for potential reactions among many compounds at a time, in the extraction of reaction-related chemical transformation patterns and in the large-scale applicability owing to the computational efficiency. In the results, we demonstrate the usefulness of our proposed method on the de novo reconstruction of 134 metabolic pathways in Kyoto Encyclopedia of Genes and Genomes (KEGG). Our comprehensively predicted reaction networks of 15 698 compounds enable us to suggest many potential pathways and to increase research productivity in metabolomics.
Availability: Software is available on request. Supplementary materials are available at
PMCID: PMC3694648  PMID: 23812977
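As a rough illustration of how a compound–compound pair might be turned into a feature vector from chemical fingerprints (entry 2), the sketch below encodes shared and changed substructures and scores the pair with a sparse linear model. The encoding, the function names and the weights are hypothetical; the abstract does not specify the exact feature construction or classifier.

```python
from typing import FrozenSet, List, Sequence


def pair_features(fp_a: FrozenSet[int], fp_b: FrozenSet[int],
                  n_bits: int) -> List[int]:
    """Encode an unordered compound pair as a 2 * n_bits binary vector.

    First n_bits: substructures present in both compounds (conserved).
    Last n_bits:  substructures present in exactly one compound (changed).
    The encoding is symmetric, so (a, b) and (b, a) give the same vector.
    """
    common = fp_a & fp_b
    changed = fp_a ^ fp_b
    return ([int(i in common) for i in range(n_bits)]
            + [int(i in changed) for i in range(n_bits)])


def linear_score(features: Sequence[int], weights: Sequence[float],
                 bias: float = 0.0) -> float:
    """Score a pair with a linear model; a sparse model has mostly zero weights."""
    return bias + sum(w * x for w, x in zip(weights, features))


# Hypothetical 4-bit fingerprints and hand-picked sparse weights.
fp_a = frozenset({0, 1, 3})
fp_b = frozenset({1, 2, 3})
features = pair_features(fp_a, fp_b, n_bits=4)
weights = [0.0, 0.0, 0.0, 0.5, 1.25, 0.0, 0.0, 0.0]
print(features, linear_score(features, weights, bias=-1.0))
```

With a sparsity-inducing classifier, the few non-zero weights point to the substructure changes that matter for the prediction, which is what makes the extracted transformation patterns readable.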
3.  Relating drug–protein interaction network with drug side effects 
Bioinformatics  2012;28(18):i522-i528.
Motivation: Identifying the emergence and underlying mechanisms of drug side effects is a challenging task in the drug development process. This underscores the importance of system-wide approaches for linking different scales of drug actions, namely drug–protein interactions (molecular scale) and side effects (phenotypic scale), toward side effect prediction for uncharacterized drugs.
Results: We performed a large-scale analysis to extract correlated sets of targeted proteins and side effects, based on the co-occurrence of drugs in protein-binding profiles and side effect profiles, using sparse canonical correlation analysis. The analysis of 658 drugs with the two profiles for 1368 proteins and 1339 side effects led to the extraction of 80 correlated sets. Enrichment analyses using KEGG and Gene Ontology showed that most of the correlated sets were significantly enriched with proteins that are involved in the same biological pathways, even if their molecular functions are different. This allowed for a biologically relevant interpretation regarding the relationship between drug–targeted proteins and side effects. The extracted side effects can be regarded as possible phenotypic outcomes by drugs targeting the proteins that appear in the same correlated set. The proposed method is expected to be useful for predicting potential side effects of new drug candidate compounds based on their protein-binding profiles.
Supplementary information: Datasets and all results are available at
Availability: Software is available at the above supplementary website.
PMCID: PMC3436810  PMID: 22962476
4.  Drug target prediction using adverse event report systems: a pharmacogenomic approach 
Bioinformatics  2012;28(18):i611-i618.
Motivation: Unexpected drug activities derived from off-targets are usually undesired and harmful; however, they can occasionally be beneficial for different therapeutic indications. There are many uncharacterized drugs whose target proteins (including the primary target and off-targets) remain unknown. The identification of all potential drug targets has become an important issue in drug repositioning to reuse known drugs for new therapeutic indications.
Results: We defined pharmacological similarity for all possible drugs using the US Food and Drug Administration's (FDA's) adverse event reporting system (AERS) and developed a new method to predict unknown drug–target interactions on a large scale from the integration of pharmacological similarity of drugs and genomic sequence similarity of target proteins in the framework of a pharmacogenomic approach. The proposed method was applicable to a large number of drugs and it was useful especially for predicting unknown drug–target interactions that could not be expected from drug chemical structures. We made a comprehensive prediction for potential off-targets of 1874 drugs with known targets and potential target profiles of 2519 drugs without known targets, which suggests many potential drug–target interactions that were not predicted by previous chemogenomic or pharmacogenomic approaches.
Availability: Software is available upon request.
Supplementary Information: Datasets and all results are available at
PMCID: PMC3436840  PMID: 22962489
5.  Drug-target interaction prediction from chemical, genomic and pharmacological data in an integrated framework 
Bioinformatics  2010;26(12):i246-i254.
Motivation: In silico prediction of drug–target interactions from heterogeneous biological data is critical in the search for drugs and therapeutic targets for known diseases such as cancers. There is therefore a strong incentive to develop new methods capable of detecting these potential drug–target interactions efficiently.
Results: In this article, we investigate the relationship between the chemical space, the pharmacological space and the topology of drug–target interaction networks, and show that drug–target interactions are more correlated with pharmacological effect similarity than with chemical structure similarity. We then develop a new method to predict unknown drug–target interactions from chemical, genomic and pharmacological data on a large scale. The proposed method consists of two steps: (i) prediction of pharmacological effects from chemical structures of given compounds and (ii) inference of unknown drug–target interactions based on the pharmacological effect similarity in the framework of supervised bipartite graph inference. The originality of the proposed method lies in the prediction of potential pharmacological similarity for any drug candidate compounds and in the integration of chemical, genomic and pharmacological data in a unified framework. In the results, we make predictions for four classes of important drug–target interactions involving enzymes, ion channels, GPCRs and nuclear receptors. Our comprehensively predicted drug–target interaction networks enable us to suggest many potential drug–target interactions and to increase research productivity toward genomic drug discovery.
Supplementary information: Datasets and all prediction results are available at
Availability: Software is available upon request.
PMCID: PMC2881361  PMID: 20529913
6.  E-zyme: predicting potential EC numbers from the chemical transformation pattern of substrate-product pairs 
Bioinformatics  2009;25(12):i179-i186.
Motivation: The IUBMB's Enzyme Nomenclature system, commonly known as the Enzyme Commission (EC) numbers, plays key roles in classifying enzymatic reactions and in linking enzyme genes or proteins to reactions in metabolic pathways. Numerous reactions are known to be present in various pathways but lack official EC numbers, and most of them are unlikely to be assigned one because no enzyme assay articles have been published for them.
Results: In this article, we propose a new method to predict potential EC numbers for given reactant pairs (substrates and products) or uncharacterized reactions, and a web server named E-zyme as an application. This technology is based on our original biochemical transformation pattern, which we call an ‘RDM pattern’, and consists of three steps: (i) graph alignment of a query reactant pair (substrates and products) for computing the query RDM pattern, (ii) multi-layered partial template matching by comparing the query RDM pattern with template patterns related to known EC numbers and (iii) a weighted majority voting scheme for selecting appropriate EC numbers. Cross-validation experiments show that the proposed method achieves both high coverage and high prediction accuracy at a practical level, and consistently outperforms the previous method.
Availability: The E-zyme system is available at
PMCID: PMC2687977  PMID: 19477985
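Step (iii) of entry 6, the weighted voting over matched templates, can be sketched generically as follows. The match list, the weights and the function name `weighted_vote` are illustrative assumptions; how E-zyme actually derives its weights from template matching is not described in the abstract.

```python
from collections import defaultdict
from typing import Dict, List, Tuple


def weighted_vote(matches: List[Tuple[str, float]],
                  top_n: int = 3) -> List[Tuple[str, float]]:
    """Sum the weights of all template matches per EC label and rank them.

    `matches` holds (ec_label, weight) pairs, one per matched template.
    Ties are broken alphabetically so the ranking is deterministic.
    """
    totals: Dict[str, float] = defaultdict(float)
    for ec_label, weight in matches:
        totals[ec_label] += weight
    ranked = sorted(totals.items(), key=lambda kv: (-kv[1], kv[0]))
    return ranked[:top_n]


# Hypothetical template matches for one query pattern.
matches = [
    ("1.1.1", 1.0),
    ("2.7.1", 0.5),
    ("1.1.1", 0.5),
    ("2.7.1", 0.25),
    ("3.5.1", 0.25),
]
print(weighted_vote(matches))
```

Summing weights instead of counting matches lets a few close template matches outweigh many loose ones, which is the usual reason to prefer a weighted vote.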
7.  GeneRegionScan: a Bioconductor package for probe-level analysis of specific, small regions of the genome 
Bioinformatics  2009;25(15):1978-1979.
Summary: Whole-genome microarrays allow us to interrogate the entire transcriptome of a cell. Affymetrix microarrays are constructed using several probes that match to different regions of a gene and a summarization step reduces this complexity into a single value, representing the expression level of the gene or the expression level of an exon in the case of exon arrays. However, this simplification eliminates information that might be useful when focusing on specific genes of interest. To address these limitations, we present a software package for the R platform that allows detailed analysis of expression at the probe level. The package matches the probe sequences against a target gene sequence (either mRNA or DNA) and shows the expression levels of each probe along the gene. It also features functions to fit a linear regression based on several genetic models that enables study of the relationship between gene expression and genotype.
Availability and implementation: The software is implemented as a platform-independent R package available through the Bioconductor repository at It is licensed as GPL 2.0.
Supplementary Information: Supplementary data are available at Bioinformatics online.
PMCID: PMC2712334  PMID: 19398447
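The regression of probe-level expression on genotype mentioned in entry 7 can be sketched as below. This is not the GeneRegionScan API, which is an R package. The three genotype codings (additive, dominant, recessive) are a common choice assumed here for illustration, and the data are invented.

```python
from typing import List, Tuple


def encode(genotype: str, minor: str, model: str) -> float:
    """Code a two-letter genotype (e.g. "AG") under a genetic model."""
    n_minor = genotype.count(minor)
    if model == "additive":    # number of minor alleles: 0, 1 or 2
        return float(n_minor)
    if model == "dominant":    # at least one minor allele
        return float(n_minor >= 1)
    if model == "recessive":   # two minor alleles
        return float(n_minor == 2)
    raise ValueError(f"unknown model: {model}")


def fit_line(x: List[float], y: List[float]) -> Tuple[float, float]:
    """Ordinary least squares for y = intercept + slope * x."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    if sxx == 0:
        raise ValueError("genotype coding has no variation")
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


# Invented example: one probe's expression in six samples.
genotypes = ["AA", "AG", "GG", "AG", "AA", "GG"]
expression = [1.0, 2.0, 3.0, 2.0, 1.0, 3.0]
for model in ("additive", "dominant", "recessive"):
    coded = [encode(g, "G", model) for g in genotypes]
    print(model, fit_line(coded, expression))
```

Fitting the same probe under each coding and comparing the fits is one simple way to see which genetic model best explains the expression differences.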
