Related Articles

1.  Integrating Statistical Predictions and Experimental Verifications for Enhancing Protein-Chemical Interaction Predictions in Virtual Screening 
PLoS Computational Biology  2009;5(6):e1000397.
Predictions of interactions between target proteins and potential leads are of great benefit in the drug discovery process. We present a comprehensively applicable statistical prediction method for interactions between any proteins and chemical compounds, which requires only protein sequence data and chemical structure data and uses support vector machines (SVMs) for statistical learning. Because comprehensive predictions can involve many false positives, we propose two approaches for reducing them: (i) efficient use of multiple statistical prediction models in the framework of a two-layer SVM and (ii) careful design of the negative data used to construct the statistical prediction models. In the two-layer SVM, the outputs of the first-layer SVM models, which are constructed with different negative samples and reflect different aspects of the classification, are used as inputs to the second-layer SVM. To design negative data that produce fewer false-positive predictions, we iteratively construct SVM models, or classification boundaries, from positive and tentative negative samples and select additional negative sample candidates according to predetermined rules. Moreover, to fully exploit the advantages of statistical learning methods, we propose a strategy for effectively feeding experimental results back into computational predictions while taking the biological effects of interest into account. We show the usefulness of our approach by predicting potential ligands binding to human androgen receptors from more than 19 million chemical compounds and verifying these predictions with in vitro binding assays. We then use this experimental validation as feedback to enhance subsequent computational predictions, and experimentally validate these predictions again.
Iterating in silico prediction and in vitro or in vivo experimental verification, with sufficient feedback between the two, enabled us to identify novel ligand candidates that are distant from known ligands in chemical space.
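The two-layer SVM idea can be sketched as a stacked ensemble: one first-layer model per negative sample set, whose decision values become the input features of a second-layer model. This is a minimal pure-Python sketch under stated assumptions: a linear SVM fitted by Pegasos-style subgradient descent stands in for whatever kernel and solver the authors used, all function names are hypothetical, and the second layer is trained on in-sample first-layer outputs for brevity (held-out outputs would be the more careful choice).

```python
from typing import List, Sequence, Tuple

Vector = List[float]


def train_linear_svm(X: Sequence[Vector], y: Sequence[int],
                     lam: float = 0.01, epochs: int = 50) -> Vector:
    """Hinge-loss + L2 linear SVM fitted with Pegasos-style subgradient steps.

    A constant 1.0 feature is appended so the last weight acts as a bias.
    Samples are visited in order (no random sampling) to keep the sketch deterministic.
    """
    Xb = [list(x) + [1.0] for x in X]
    w = [0.0] * len(Xb[0])
    t = 0
    for _ in range(epochs):
        for xi, yi in zip(Xb, y):
            t += 1
            eta = 1.0 / (lam * t)
            margin = yi * sum(wj * xj for wj, xj in zip(w, xi))
            w = [(1.0 - eta * lam) * wj for wj in w]   # shrink (regularization step)
            if margin < 1.0:                            # hinge loss is active
                w = [wj + eta * yi * xj for wj, xj in zip(w, xi)]
    return w


def decision(w: Vector, x: Sequence[float]) -> float:
    """Signed SVM score for x (bias weight is the last entry of w)."""
    return sum(wj * xj for wj, xj in zip(w, list(x) + [1.0]))


def fit_two_layer(pos: Sequence[Vector], negative_sets: Sequence[Sequence[Vector]],
                  lam: float = 0.01) -> Tuple[List[Vector], Vector]:
    # First layer: one SVM per negative sample set, all sharing the same positives.
    first = []
    for neg in negative_sets:
        X = list(pos) + list(neg)
        y = [1] * len(pos) + [-1] * len(neg)
        first.append(train_linear_svm(X, y, lam))
    # Second layer: features are the first-layer decision values.
    all_neg = [x for neg in negative_sets for x in neg]
    samples = list(pos) + all_neg
    X2 = [[decision(w, x) for w in first] for x in samples]
    y2 = [1] * len(pos) + [-1] * len(all_neg)
    return first, train_linear_svm(X2, y2, lam)


def predict_two_layer(first: List[Vector], second: Vector, x: Vector) -> int:
    z = [decision(w, x) for w in first]
    return 1 if decision(second, z) > 0 else -1
```

Training each first-layer model against a different negative set lets the second layer combine several decision boundaries rather than trusting any single choice of negatives, which is the false-positive-reduction rationale given in the abstract.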
Author Summary
This work describes a statistical method that identifies chemical compounds binding to a target protein given the sequence of the target, or distinguishes the proteins to which a small molecule binds given the chemical structure of the molecule. Because our method can be used for virtual screening, which searches for lead compounds in drug discovery, we demonstrated its usefulness by applying it to the comprehensive prediction of ligands binding to human androgen receptors and verifying its predictions experimentally in vitro. In contrast to most previous virtual screening studies, which predict chemical compounds of interest mainly with 3D structure-based methods and then verify them experimentally, we proposed a strategy for effectively feeding experimental results back into subsequent predictions, and applied this strategy to a second round of predictions followed by a second experimental verification. This feedback strategy makes full use of statistical learning methods and, in practical terms, yielded a ligand candidate of interest that differs structurally from known drugs. We hope that this paper will encourage reevaluation of statistical learning methods in virtual screening and that the use of statistical methods with efficient feedback strategies will help accelerate drug discovery.
doi:10.1371/journal.pcbi.1000397
PMCID: PMC2685987  PMID: 19503826
2.  Analysis of multiple compound–protein interactions reveals novel bioactive molecules 
The authors use machine learning of compound-protein interactions to explore drug polypharmacology and to efficiently identify bioactive ligands, including novel scaffold-hopping compounds for two pharmaceutically important protein families: G-protein coupled receptors and protein kinases.
We have demonstrated that machine learning of multiple compound–protein interactions is useful for efficient ligand screening and for assessing drug polypharmacology. This approach successfully identified novel scaffold-hopping compounds for two pharmaceutically important protein families: G-protein-coupled receptors and protein kinases. These bioactive compounds were not detected by existing computational ligand-screening methods in comparative studies. The results of this study indicate that data derived from chemical genomics can be highly useful for exploring chemical space, and this systems biology perspective could accelerate drug discovery processes.
The discovery of novel bioactive molecules advances our systems-level understanding of biological processes and is crucial for innovation in drug development. Perturbing biological systems with chemical probes is useful not only for analyzing complex systems but also for deliberately manipulating them. Nevertheless, the lack of well-characterized chemical modulators has limited their use. Recently, chemical genomics has emerged as a promising area of research for exploring novel bioactive molecules, and researchers are currently striving to identify all possible ligands for all target protein families (Wang et al, 2009). Chemical genomics studies have shown that patterns of compound–protein interactions (CPIs) are too diverse to be understood as simple one-to-one events. There is an urgent need to develop appropriate data mining methods for characterizing and visualizing the full complexity of interactions between chemical space and biological systems. However, no existing screening approach has so far succeeded in identifying novel bioactive compounds using multiple interactions among compounds and target proteins.
High-throughput screening (HTS) and computational screening have greatly aided the identification of early lead compounds for drug discovery. However, the large number of assays HTS requires to identify drugs that target multiple proteins renders this process very costly and time-consuming. Therefore, interest in in silico screening strategies has increased. The most common computational approaches, ligand-based virtual screening (LBVS) and structure-based virtual screening (SBVS; Oprea and Matter, 2004; Muegge and Oloff, 2006; McInnes, 2007; Figure 1A), have been used in practical drug development. LBVS aims to identify molecules that are very similar to known active molecules and generally has difficulty identifying compounds with novel structural scaffolds that differ from reference molecules. The other popular strategy, SBVS, is constrained by the number of available three-dimensional crystallographic structures. To circumvent these limitations, we have shown that a new computational screening strategy, chemical genomics-based virtual screening (CGBVS), has the potential to identify novel, scaffold-hopping compounds and assess their polypharmacology by using a machine-learning method to recognize conserved molecular patterns in comprehensive CPI data sets.
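The dependence of LBVS on similarity to known actives can be illustrated with a nearest-neighbour Tanimoto search. A minimal sketch, assuming each compound is represented as a hypothetical set of "on" fingerprint bit indices (a real pipeline would compute these with a cheminformatics toolkit); the source does not say which similarity measure its LBVS baseline used.

```python
from typing import Dict, List, Set, Tuple


def tanimoto(a: Set[int], b: Set[int]) -> float:
    """Tanimoto (Jaccard) similarity of two fingerprints given as sets of on-bits."""
    if not a and not b:
        return 0.0
    shared = len(a & b)
    return shared / (len(a) + len(b) - shared)


def rank_by_max_similarity(candidates: Dict[str, Set[int]],
                           actives: List[Set[int]]) -> List[Tuple[str, float]]:
    """Score each candidate by its best similarity to any known active, highest first."""
    scores = {name: max(tanimoto(fp, ref) for ref in actives)
              for name, fp in candidates.items()}
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
```

A candidate with a new scaffold that shares no bits with any reference scores 0.0 and falls to the bottom of the ranking, which is the limitation the paragraph attributes to LBVS.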
The CGBVS strategy used in this study consisted of five steps: CPI data collection, descriptor calculation, representation of interaction vectors, predictive model construction using training data sets, and prediction on test data (Figure 1A). Importantly, step 1, the construction of a data set of chemical structures and protein sequences for known CPIs, did not require the three-dimensional protein structures needed for SBVS. In step 2, compound structures and protein sequences were converted into numerical descriptors. These descriptors were used to construct chemical or biological spaces in which decreasing distance between vectors corresponded to increasing similarity of compound structures or protein sequences. In step 3, we represented multiple CPI patterns by concatenating these chemical and protein descriptors. Using these interaction vectors, we could quantify the similarity of molecular interactions for compound–protein pairs, even though the ligand and protein similarity maps differed substantially. In step 4, concatenated vectors for CPI pairs (positive samples) and non-interacting pairs (negative samples) were input into an established machine-learning method. In the final step, the classifier constructed from the training sets was applied to test data.
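Steps 2 to 4 can be sketched as follows. This is an illustration only: amino acid composition is a hypothetical stand-in for the protein descriptors actually used, compound descriptors are assumed to be precomputed numeric lists, and the helper names are invented for this example. The resulting `X, y` pair is what step 4 would pass to a classifier.

```python
from typing import Dict, List, Sequence, Tuple

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def aa_composition(seq: str) -> List[float]:
    """20-dimensional amino acid composition (a simple, hypothetical protein descriptor)."""
    seq = seq.upper()
    counts = [seq.count(aa) for aa in AMINO_ACIDS]
    total = sum(counts)
    return [c / total if total else 0.0 for c in counts]


def interaction_vector(compound_desc: Sequence[float], protein_seq: str) -> List[float]:
    """Step 3: concatenate compound and protein descriptors into one CPI vector."""
    return list(compound_desc) + aa_composition(protein_seq)


def build_training_set(positive_pairs: List[Tuple[str, str]],
                       negative_pairs: List[Tuple[str, str]],
                       compounds: Dict[str, List[float]],
                       proteins: Dict[str, str]) -> Tuple[List[List[float]], List[int]]:
    """Step 4 input: labelled interaction vectors (+1 interacting, -1 non-interacting)."""
    X, y = [], []
    for pairs, label in ((positive_pairs, 1), (negative_pairs, -1)):
        for compound_id, protein_id in pairs:
            X.append(interaction_vector(compounds[compound_id], proteins[protein_id]))
            y.append(label)
    return X, y
```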
To evaluate the predictive value of CGBVS, we first compared its performance with that of LBVS by fivefold cross-validation. CGBVS performed with considerably higher accuracy (91.9%) than did LBVS (84.4%; Figure 1B). We next compared CGBVS and SBVS in a retrospective virtual screening based on the human β2-adrenergic receptor (ADRB2). Figure 1C shows that CGBVS provided higher hit rates than did SBVS. These results suggest that CGBVS is more successful than conventional approaches for prediction of CPIs.
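Fivefold cross-validation, as used for the accuracy comparison above, can be sketched in pure Python. The fold assignment here is contiguous and unshuffled for determinism (in practice samples would be shuffled, and often stratified, first), and `fit`/`predict` are hypothetical callables standing in for any classifier.

```python
from typing import Any, Callable, Iterator, List, Sequence, Tuple


def kfold_indices(n: int, k: int = 5) -> Iterator[Tuple[List[int], List[int]]]:
    """Yield (train, test) index lists; the first n % k folds get one extra sample."""
    sizes = [n // k + (1 if i < n % k else 0) for i in range(k)]
    start = 0
    for size in sizes:
        test = list(range(start, start + size))
        train = [i for i in range(n) if i < start or i >= start + size]
        start += size
        yield train, test


def accuracy(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def cross_val_accuracy(X: Sequence[Any], y: Sequence[int],
                       fit: Callable[[List[Any], List[int]], Any],
                       predict: Callable[[Any, Any], int], k: int = 5) -> float:
    """Pool out-of-fold predictions, then score them all at once."""
    preds: List[int] = [0] * len(y)
    for train, test in kfold_indices(len(y), k):
        model = fit([X[i] for i in train], [y[i] for i in train])
        for i in test:
            preds[i] = predict(model, X[i])
    return accuracy(y, preds)
```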
We then evaluated the ability of the CGBVS method to predict the polypharmacology of ADRB2 by attempting to identify novel ADRB2 ligands from a group of G-protein-coupled receptor (GPCR) ligands. We ranked the prediction scores for the interactions of 826 reported GPCR ligands with ADRB2 and then analyzed the 50 highest-ranked compounds in greater detail. Of 21 commercially available compounds, 11 showed ADRB2-binding activity and had not previously been reported as ADRB2 ligands. These compounds included ligands not only for aminergic receptors but also for neuropeptide Y type 1 receptors (NPY1R), which have low protein homology to ADRB2. Most of the ligands we identified were detected by neither LBVS nor SBVS, which suggests that, among the methods compared, only CGBVS could identify this unexpected cross-reaction of a ligand developed against a peptidergic receptor.
The true value of CGBVS in drug discovery must be tested by assessing whether this method can identify scaffold-hopping lead compounds from a structurally more diverse set of compounds. To assess this ability, we analyzed 11 500 commercially available compounds to predict compounds likely to bind to two GPCRs and two protein kinases. Functional assays revealed that nine ADRB2 ligands, three NPY1R ligands, five epidermal growth factor receptor (EGFR) inhibitors, and two cyclin-dependent kinase 2 (CDK2) inhibitors were concentrated in the top-ranked compounds (hit rates of 30, 15, 25, and 10%, respectively). We also evaluated the extent of scaffold hopping achieved in identifying these novel ligands. One ADRB2 ligand, two NPY1R ligands, and one CDK2 inhibitor exhibited scaffold hopping (Figure 4), indicating that CGBVS can rationally predict novel lead compounds, a crucial and very difficult step in drug discovery. This feature distinguishes CGBVS from existing predictive methods such as LBVS, which depend on similarity between test and reference ligands and focus on a single protein or on highly homologous proteins. In particular, CGBVS is useful for targets whose ligands have not been defined, because it can use CPIs involving target proteins with lower levels of homology.
In summary, we have demonstrated that data mining of multiple CPIs is of great practical value for exploring chemical space. As a predictive model, CGBVS could be an important step toward discovering multi-target drugs by identifying the group of proteins targeted by a particular ligand, leading to innovation in pharmaceutical research.
The discovery of novel bioactive molecules advances our systems-level understanding of biological processes and is crucial for innovation in drug development. For this purpose, the emerging field of chemical genomics is currently focused on accumulating large assay data sets describing compound–protein interactions (CPIs). Although new target proteins for known drugs have recently been identified through mining of CPI databases, using these resources to identify novel ligands remains unexplored. Herein, we demonstrate that machine learning of multiple CPIs can not only assess drug polypharmacology but can also efficiently identify novel bioactive scaffold-hopping compounds. Through a machine-learning technique that uses multiple CPIs, we have successfully identified novel lead compounds for two pharmaceutically important protein families, G-protein-coupled receptors and protein kinases. These novel compounds were not identified by existing computational ligand-screening methods in comparative studies. The results of this study indicate that data derived from chemical genomics can be highly useful for exploring chemical space, and this systems biology perspective could accelerate drug discovery processes.
doi:10.1038/msb.2011.5
PMCID: PMC3094066  PMID: 21364574
chemical genomics; data mining; drug discovery; ligand screening; systems chemical biology
3.  Identification of Drosophila MicroRNA Targets 
PLoS Biology  2003;1(3):e60.
MicroRNAs (miRNAs) are short RNA molecules that regulate gene expression by binding to target messenger RNAs and by controlling protein production or causing RNA cleavage. To date, functions have been assigned to only a few of the hundreds of identified miRNAs, in part because of the difficulty of identifying their targets. The short length of miRNAs and the fact that their complementarity to target sequences is imperfect mean that target identification in animal genomes is not possible with standard sequence comparison methods. Here we screen conserved 3′ UTR sequences from the Drosophila melanogaster genome for potential miRNA targets. The screening procedure combines a sequence search with an evaluation of the predicted miRNA–target heteroduplex structures and energies. We show that this approach successfully identifies the five previously validated let-7, lin-4, and bantam targets from a large database, and we use it to predict new targets for Drosophila miRNAs. Our target predictions reveal striking clusters of functionally related targets among the top predictions for specific miRNAs. These include Notch target genes for miR-7, proapoptotic genes for the miR-2 family, and enzymes from a metabolic pathway for miR-277. We experimentally verified three predicted targets each for miR-7 and the miR-2 family, doubling the number of validated targets for animal miRNAs. Statistical analysis indicates that the best single predicted target sites are at the border of significance; thus, target predictions should be considered tentative until experimentally validated. We identify features shared by all validated targets that can be used to evaluate target predictions for animal miRNAs. Our initial evaluation and experimental validation of target predictions suggest functions for two miRNAs. For others, the screen suggests plausible functions, such as a role for miR-277 as a metabolic switch controlling amino acid catabolism.
Cross-genome comparison proved essential, as it allows reduction of the sequence search space. Improvements in genome annotation and increased availability of cDNA sequences from other genomes will allow more sensitive screens. An increase in the number of confirmed targets is expected to reveal general structural features that can be used to improve their detection. While the screen is likely to miss some targets, our study shows that valid targets can be identified from sequence alone.
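The sequence-search part of such a screen can be illustrated with a simple seed-complementarity scan. This is a deliberately simplified, swapped-in heuristic, not the authors' procedure: it looks only for perfect Watson-Crick matches to miRNA nucleotides 2 to 8, ignores G:U pairs, and omits the heteroduplex structure and energy evaluation that the abstract describes. The function names are hypothetical.

```python
from typing import List

_PAIR = {"A": "U", "U": "A", "G": "C", "C": "G"}


def to_rna(seq: str) -> str:
    """Normalize DNA or RNA input to uppercase RNA."""
    return seq.upper().replace("T", "U")


def reverse_complement(rna: str) -> str:
    return "".join(_PAIR[base] for base in reversed(rna))


def seed_match_sites(mirna: str, utr: str, start: int = 2, end: int = 8) -> List[int]:
    """0-based positions in `utr` perfectly complementary to miRNA nucleotides
    start..end (1-based, inclusive)."""
    seed = to_rna(mirna)[start - 1:end]
    site = reverse_complement(seed)      # what the target must contain
    target = to_rna(utr)
    width = len(site)
    return [i for i in range(len(target) - width + 1) if target[i:i + width] == site]
```

For let-7 (`UGAGGUAGUAGGUUGUAUAGUU`), nucleotides 2 to 8 are `GAGGUAG`, so the scan looks for `CUACCUC` in the 3′ UTR. A scan like this produces many chance hits, which is why the abstract stresses cross-genome conservation and heteroduplex energies as additional filters.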
A bioinformatic approach suggests many new target genes for Drosophila microRNAs, a number of which are validated experimentally.
doi:10.1371/journal.pbio.0000060
PMCID: PMC270017  PMID: 14691535
4.  Code-Assisted Discovery of TAL Effector Targets in Bacterial Leaf Streak of Rice Reveals Contrast with Bacterial Blight and a Novel Susceptibility Gene 
PLoS Pathogens  2014;10(2):e1003972.
Bacterial leaf streak of rice, caused by Xanthomonas oryzae pv. oryzicola (Xoc), is an increasingly important yield constraint in this staple crop. A mesophyll colonizer, Xoc differs from X. oryzae pv. oryzae (Xoo), which invades xylem to cause bacterial blight of rice. Both produce multiple distinct TAL effectors, type III-delivered proteins that transactivate effector-specific host genes. A TAL effector finds its target(s) via a partially degenerate code whereby the modular effector amino acid sequence identifies nucleotide sequences to which the protein binds. Virulence contributions of some Xoo TAL effectors have been shown, and their relevant targets, susceptibility (S) genes, have been identified, but the role of TAL effectors in leaf streak is uncharacterized. We used host transcript profiling to compare leaf streak to blight and to probe functions of Xoc TAL effectors. We found that Xoc and Xoo induce almost completely different host transcriptional changes. Roughly one in three genes upregulated by the pathogens is preceded by a candidate TAL effector binding element. Experimental analysis of the 44 such genes predicted to be Xoc TAL effector targets verified nearly half, and identified most others as false predictions. None of the Xoc targets is a known bacterial blight S gene. Mutational analysis revealed that Tal2g, which activates two genes, contributes to lesion expansion and bacterial exudation. Use of designer TAL effectors identified a sulfate transporter gene as the S gene. Across all targets, basal expression tended to be higher than the genome average, and induction was moderate. Finally, machine learning applied to real vs. falsely predicted targets yielded a classifier that recalled 92% of the real targets with 88% precision, providing a tool for better target prediction in the future. Our study expands the number of known TAL effector targets, identifies a new class of S gene, and improves our ability to predict functional targeting.
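The classifier's reported performance (92% recall at 88% precision) follows the standard definitions sketched below; the gene identifiers in the check are hypothetical placeholders, not data from the study.

```python
from typing import Set, Tuple


def precision_recall(predicted: Set[str], actual: Set[str]) -> Tuple[float, float]:
    """Precision = TP / (TP + FP); recall = TP / (TP + FN)."""
    true_positives = len(predicted & actual)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(actual) if actual else 0.0
    return precision, recall
```

Recall measures how many real targets the classifier keeps; precision measures how many of its calls are real, so the two figures together describe how much experimental follow-up a prediction list would save.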
Author Summary
Many crop and ornamental plants suffer losses due to bacterial pathogens in the genus Xanthomonas. Pathogen manipulation of host gene expression by injected proteins called TAL effectors is important in many of these diseases. A TAL effector finds its gene target(s) by virtue of structural repeats in the protein that differ from one another at two amino acids that together identify one DNA base. The number of repeats and those amino acids thereby encode the DNA sequence the protein binds. This code allows both target prediction and the engineering of TAL effectors for custom gene activation. By combining genome-wide analysis of gene expression with TAL effector binding site prediction and verification using designer TAL effectors, we identified 19 targets of TAL effectors in bacterial leaf streak of rice, a disease of growing importance worldwide caused by X. oryzae pv. oryzicola. Among these was a sulfate transporter gene that plays a major role. Comparison of true vs. false predictions using machine learning yielded a classifier that will streamline TAL effector target identification in the future. Probing the diversity and functions of such plant genes is critical to expand our knowledge of disease and defense mechanisms and to open new avenues for effective disease control.
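The repeat-to-base code can be sketched as a simple pattern scan. Assumptions: only a handful of common repeat-variable diresidues (RVDs) are encoded, using the widely cited associations NI→A, HD→C, NG→T, NN→G or A, and NS→any base; unlisted RVDs are treated as wildcards; and a thymine immediately 5′ of the site is required. Counting mismatches is a simplification; dedicated prediction tools use weighted scores, so this only illustrates the code itself. The names are hypothetical.

```python
from typing import List, Sequence, Tuple

# Simplified RVD -> accepted DNA bases (an illustrative subset of the TAL effector code).
RVD_BASES = {"NI": "A", "HD": "C", "NG": "T", "NN": "GA", "NS": "ACGT"}


def find_binding_elements(rvds: Sequence[str], dna: str, max_mismatches: int = 0,
                          require_t0: bool = True) -> List[Tuple[int, int]]:
    """Return (start, mismatches) for windows of `dna` read by the RVD array.

    `start` is the 0-based index of the base read by the first repeat.
    """
    dna = dna.upper()
    n = len(rvds)
    first = 1 if require_t0 else 0  # leave room for the 5' T when it is required
    hits = []
    for i in range(first, len(dna) - n + 1):
        if require_t0 and dna[i - 1] != "T":
            continue
        # Unlisted RVDs fall back to "ACGT", i.e. they accept any base.
        mismatches = sum(dna[i + j] not in RVD_BASES.get(rvd, "ACGT")
                         for j, rvd in enumerate(rvds))
        if mismatches <= max_mismatches:
            hits.append((i, mismatches))
    return hits
```

For example, the array HD-NI-NG-NN reads `CATG` (or `CATA`) and, with the 5′ T requirement, matches `TCATG` at the position of the `C`.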
doi:10.1371/journal.ppat.1003972
PMCID: PMC3937315  PMID: 24586171
5.  Towards Complete Sets of Farnesylated and Geranylgeranylated Proteins 
PLoS Computational Biology  2007;3(4):e66.
Three different prenyltransferases attach isoprenyl anchors to C-terminal motifs in substrate proteins. These lipid anchors serve for membrane attachment or protein–protein interactions in many pathways. Although well-tolerated selective prenyltransferase inhibitors are clinically available, their mode of action remains unclear since the known substrate sets of the various prenyltransferases are incomplete. The Prenylation Prediction Suite (PrePS) has been used for large-scale prediction of prenylated proteins. To prioritize targets for experimental verification, we rank the predictions by their functional importance, estimated by the evolutionary conservation of the prenylation motifs within protein families. The ranked lists of predictions are accessible as PRENbase (http://mendel.imp.univie.ac.at/sat/PrePS/PRENbase) and can be queried for verification status, type of modifying enzymes (anchor type), and taxonomic distribution. Our results highlight a large group of plant metal-binding chaperones as well as several newly predicted proteins involved in ubiquitin-mediated protein degradation, enriching the known functional repertoire of prenylated proteins. Furthermore, we identify two possibly prenylated proteins in Mimivirus. The section HumanPRENbase provides complete lists of predicted prenylated human proteins, for example, the list of farnesyltransferase targets that cannot become substrates of geranylgeranyltransferase 1 and, therefore, are especially affected by farnesyltransferase inhibitors (FTIs) used in cancer and anti-parasite therapy. We report direct experimental evidence verifying the prediction of the human proteins Prickle1, Prickle2, the BRO1 domain–containing FLJ32421 (termed BROFTI), and Rab28 (short isoform) as exclusive farnesyltransferase targets. We introduce PRENbase, a database of large-scale predictions of protein prenylation substrates ranked by evolutionary conservation of the motif.
Experimental evidence is presented for the selective farnesylation of targets with an evolutionarily conserved modification site.
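The C-terminal motif used by farnesyltransferase and geranylgeranyltransferase 1 is the CaaX box. The sketch below is a crude regular-expression stand-in, not PrePS: it checks for a C-terminal CaaX box, applies the textbook rule of thumb that a terminal Ser, Met, Gln, Ala, or Cys favors farnesylation while Leu or Phe favors geranylgeranylation, and scores a hypothetical protein family by the fraction of members carrying the motif, loosely mirroring the conservation-based ranking described above. Rab geranylgeranyltransferase motifs are ignored entirely, and all names are hypothetical.

```python
import re
from typing import Optional, Sequence

CAAX = re.compile(r"C[A-Z]{3}$")   # Cys followed by exactly three residues at the C-terminus
FTASE_X = set("SMQAC")             # rule-of-thumb terminal residues favouring farnesylation
GGTASE1_X = set("LF")              # ... and favouring geranylgeranylation


def caax_box(seq: str) -> Optional[str]:
    """Return the C-terminal CaaX box, if any (a trailing '*' stop symbol is ignored)."""
    match = CAAX.search(seq.upper().rstrip("*"))
    return match.group(0) if match else None


def guess_anchor(seq: str) -> Optional[str]:
    box = caax_box(seq)
    if box is None:
        return None
    x = box[-1]
    if x in FTASE_X:
        return "farnesyl"
    if x in GGTASE1_X:
        return "geranylgeranyl"
    return "unclear"


def motif_conservation(family_seqs: Sequence[str]) -> float:
    """Fraction of family members with a CaaX box (a hypothetical ranking score)."""
    return sum(caax_box(s) is not None for s in family_seqs) / len(family_seqs)
```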
Author Summary
Various cellular functions require reversible membrane localization of proteins. This is often facilitated by attaching lipids to the respective proteins, thus anchoring them to the membrane. For example, addition of prenyl lipid anchors (prenylation) is directed by a motif in the protein sequence that can be predicted using a recently developed method. We describe the prediction of protein prenylation in all currently known proteins. The annotated results are available as an online database: PRENbase. A ranking of the predictions is introduced, assuming that the existence of a prenylation sequence motif in related proteins from different species (evolutionary conservation) reflects the functional importance of the lipid anchor. We present experimental evidence for high-ranked human proteins predicted to be affected by anticancer drugs inhibiting prenylation.
doi:10.1371/journal.pcbi.0030066
PMCID: PMC1847700  PMID: 17411337
6.  A systematic screen for protein–lipid interactions in Saccharomyces cerevisiae 
Lipids are important cellular metabolites with a wide range of structural and functional diversity. Many operate as signaling molecules. However, lipids have rarely been studied in large-scale interaction screens and are poorly represented in current biological networks. Here, we describe the use of miniaturized lipid arrays for the large-scale study of protein–lipid interactions. In yeast, we show general feasibility with a systematic screen involving 172 proteins. We report 530 protein–lipid associations, the majority of which are novel; several were validated using other techniques. The screen uncovers numerous insights into lipid function in yeast and equivalent systems in humans. It revealed (i) previously undetected cryptic lipid-binding domains, (ii) a series of new cellular targets for sphingolipids, and (iii) new ligands for some PH domains, which can cooperatively bind additional lipids and work as coincidence sensors to integrate phosphatidylinositol phosphate and sphingolipid signaling pathways. The significant number of biological insights uncovered shows that even major classes of metabolites have been insufficiently studied. This illustrates the general relevance of such systematic screens and calls for further system-wide analyses.
Deciphering the molecular mechanisms behind cellular processes requires the systematic charting of the multitude of interactions between all cellular components. While protein–protein and protein–DNA networks have been the subject of many systematic surveys, other critically important cellular components, such as lipids, have to date rarely been studied in large-scale interaction screens. Growing numbers of lipids are known to operate as signaling molecules. The importance of protein–lipid interactions is evident from the variety of protein domains that have evolved to bind particular lipids (Lemmon, 2008) and from the large list of disorders, such as cancer and bipolar disorder, arising from altered protein–lipid interactions. The current understanding of protein–lipid recognition comes from the study of a limited number of lipids, principally PtdInsPs (Zhu et al, 2001), and lipid-binding domains (LBDs) in isolation (Dowler et al, 2000; Yu and Lemmon, 2001; Yu et al, 2004). For other signaling lipids, such as sphingolipids, intracellular targets and molecular mechanisms are only partially understood (Hannun and Obeid, 2008). The importance of lipids in biological processes and their under-representation in current biological networks suggest the need for systematic, unbiased biochemical screens.
To systematically study protein–lipid interactions, we developed miniaturized arrays containing sets of 56 lipids covering the main lipid classes in yeast. We used the arrays to determine the binding profiles of 172 soluble proteins. The selection included proteins containing one or several predicted LBDs, lipid-regulated proteins, and enzymes involved in lipid metabolism (Figure 1). We obtained 530 protein–lipid interactions (accuracy and coverage: 61 and 60%, respectively). More than half were supported by additional experimental evidence obtained from a large validation effort using a variety of biochemical and cell biology approaches, and from the integration of a data set of genetic interactions (Figure 1). As a substantial fraction (45%) of the analyzed proteins are conserved in humans, the protein–lipid data set will have functional implications for higher eukaryotes and thus for human biology.
Overall, 68% of all interactions were novel or unexpected based on either protein sequences or known LBD specificities. We discovered previously undetected cryptic LBDs in Ecm25 (a RhoGAP) and Ira2 (a RasGAP). We also identified a set of proteins that bound sphingolipids, a class of bioactive lipids that play important signaling roles in yeast and higher eukaryotes. The exact mode of action of these lipids remains elusive, and the data set points to a series of new cellular targets. We identified 63 proteins, involved in endocytosis, cell polarity, and lipid metabolism, that interacted with sphingoid long-chain bases (LCBs), ceramides, or phosphorylated LCBs (Figure 5).
Despite the importance of sphingolipids in signaling processes, only a few domains, such as START or Saposins, have been reported to bind these lipids specifically in higher eukaryotes, and none of them has been found in yeast. Interestingly, almost 60% of proteins binding to phosphorylated LCBs in our assay also contained a pleckstrin homology (PH) domain and bound PtdInsPs (Figure 5). This suggests that some PH domains might have unanticipated ligands and also function in sphingolipid recognition. We showed, using a variety of biochemical and cell-based assays, that the PH domain of Slm1, a component of the TORC2 signaling pathway (Fadri et al, 2005), can bind PtdIns(4,5)P2 and sphingolipids cooperatively. The structure of Slm1-PH, which we solved by X-ray crystallography at 2 Å resolution, suggests the presence of two positively charged binding pockets for anionic lipids. These results indicate that the PH domain of Slm1 might work as a coincidence sensor to integrate PtdInsP and sphingolipid signaling pathways. This reinforces the emerging notion that cooperative mechanisms are important for PH domain function (Maffucci and Falasca, 2001). These mechanisms, initially described between PtdInsPs and proteins, can now be extended to new lipid classes, illustrating the benefit of unbiased and systematic analyses.
This work shows the feasibility and benefits of large-scale analyses combining biochemical arrays and live-cell imaging for charting protein–lipid interactions. Accurate representations of biological processes require systematic charting of the physical and functional links between all cellular components. There is a clear need to expand the molecular interaction space from proteome-wide to metabolome-wide efforts and to classify bioactive molecules systematically based on their binding profiles. The data provided here represent a valuable resource for improving our understanding of lipid function in eukaryotic systems.
Protein–metabolite networks are central to biological systems, but are incompletely understood. Here, we report a screen to catalog protein–lipid interactions in yeast. We used arrays of 56 metabolites to measure lipid-binding fingerprints of 172 proteins, including 91 with predicted lipid-binding domains. We identified 530 protein–lipid associations, the majority of which are novel. To show the data set's biological value, we further studied several novel interactions with sphingolipids, a class of conserved bioactive lipids with an elusive mode of action. Integration of live-cell imaging suggests new cellular targets for these molecules, including several with pleckstrin homology (PH) domains. Validated interactions with Slm1, a regulator of actin polarization, show that PH domains can have unexpected lipid-binding specificities and can act as coincidence sensors for both phosphatidylinositol phosphates and phosphorylated sphingolipids.
doi:10.1038/msb.2010.87
PMCID: PMC3010107  PMID: 21119626
interactome; lipid–array; network; pleckstrin homology domains; sphingolipids
7.  Chemical combination effects predict connectivity in biological systems 
Chemical synergies can be novel probes of biological systems. Simulated response shapes depend on target connectivity in a pathway. Experiments with yeast and cancer cells confirm simulated effects. Profiles across many combinations yield target location information.
Living organisms are built of interacting components, whose function and dysfunction can be described through dynamic network models (Davidson et al, 2002). Systems Biology involves the iterative construction of such models (Ideker et al, 2001), and may eventually improve the understanding of diseases using in silico simulations. Such simulations may eventually permit drugs to be prioritized for clinical trials, reducing potential risks and increasing the likelihood of successful outcomes. Given the complexity of biological systems, constructing realistic models will require large and diverse sets of connectivity data.
Chemical combinations provide a new window into biological connectivity. Information gleaned from targeted combinations, such as paired mutations (Tong et al, 2004), has proven to be especially useful for revealing functional interactions between components. We have been screening chemical combinations for therapeutic synergies (Borisy et al, 2003; Zimmermann et al, 2007), collecting full-dose matrices where combinations are tested in all possible pairings of serially diluted single agent doses (Figure 1). Such screens yield a variety of response surfaces with distinct shapes for combinations that work through different known mechanisms, suggesting that combination effects may contain information on the nature of functional connections between drug targets.
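A full-dose matrix, and one simple way of scoring it, can be sketched as below. The dilution scheme is generic, and the Bliss-independence excess used here is a swapped-in, illustrative synergy measure; the source does not state that this model underlies the authors' synergy scores. Function names are hypothetical.

```python
from itertools import product
from typing import List, Sequence, Tuple


def serial_dilution(top_dose: float, factor: float, n: int) -> List[float]:
    """n doses starting at top_dose, each `factor`-fold lower than the previous one."""
    return [top_dose / factor ** i for i in range(n)]


def dose_matrix(doses_a: Sequence[float], doses_b: Sequence[float]) -> List[Tuple[float, float]]:
    """Every pairing of single-agent doses, i.e. the cells of a full-dose matrix."""
    return list(product(doses_a, doses_b))


def bliss_excess(inhib_a: Sequence[float], inhib_b: Sequence[float],
                 observed: Sequence[Sequence[float]]) -> List[List[float]]:
    """observed[i][j] minus the Bliss expectation ia + ib - ia * ib.

    Inhibitions are fractions in [0, 1]; positive excess suggests synergy,
    negative excess suggests antagonism.
    """
    return [[observed[i][j] - (ia + ib - ia * ib) for j, ib in enumerate(inhib_b)]
            for i, ia in enumerate(inhib_a)]
```

Scoring each cell separately preserves the shape of the response surface, which is the information the authors argue reflects how the two targets are connected.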
Simulations of biological pathways predict synergistic responses to inhibitors that depend on target connectivity. We explored theoretical predictions by simulating a metabolic pathway with pairs of inhibitors aimed at different targets with varying doses. We found that the shape of each combination response depended on how the inhibitor pair's targets were connected in the pathway (Figure 2). The predicted response shapes were robust to plausible variations in the simulated pathway that did not affect the network topology (e.g., kinetic assumptions, parameter values, and nonlinear response functions), but were very sensitive to topological alterations in the modelled network (e.g., feedback regulation or changing the type of junction at a branch point). These findings suggest that connectivity of the inhibitor targets has a major influence on combination response morphology.
The predicted shapes were experimentally confirmed in yeast combination experiments. The proliferation experiment used drugs targeting the sterol biosynthesis pathway, which is mostly linear between the targets covered in this study and is known to be regulated by negative feedback (Gardner et al, 2001). Combinations of sterol inhibitors confirmed the expectations from our simulations: pairs targeting the same enzyme gave dose-additive responses, whereas pairs targeting different enzymes gave strong synergies with the shape our simulations predicted for linear pathways under negative feedback. Combinations across pathways showed much more variable responses, with a trend towards less synergy on average.
Further experimental support was obtained from human cells. A combination screen of 90 annotated drugs in a human tumour cell line (HCT116) proliferation assay produced strong synergies for combinations within pathways and more variable effects between targeted functions. Synergy profiles (sets of all synergy scores involving each drug) also showed a greater degree of similarity for pairs of drugs with related targets. Finally, the most extreme outliers were dominated by inhibitors of kinases that are especially critical for HCT116 proliferation (Awwad et al, 2003), with effects that are consistent across mechanistic replicates, showing that chemical combinations can highlight biologically relevant cellular processes.
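Synergy-profile similarity can be sketched as a Pearson correlation between two drugs' synergy scores against every shared partner drug. The dictionary keyed by `frozenset` pairs, the helper names, and the choice of Pearson correlation are assumptions for illustration, not the study's exact similarity measure.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def synergy_profile(scores, drug, partners):
    """Scores of `drug` with each partner; `scores` maps frozenset pairs to synergy."""
    return [scores[frozenset((drug, p))] for p in partners]


def profile_similarity(scores, drug_a, drug_b):
    """Correlate the two drugs' profiles over all other drugs in the screen."""
    drugs = {d for pair in scores for d in pair}
    partners = sorted(drugs - {drug_a, drug_b})
    return pearson(synergy_profile(scores, drug_a, partners),
                   synergy_profile(scores, drug_b, partners))
```

Drugs with related targets would be expected to score closer to +1 under this measure; a full analysis would compute it for every drug pair and compare the distributions for related and unrelated targets.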
This study demonstrates the potential of chemical combinations for exploring functional connectivity in biological systems. This information complements genetic studies by providing more details through variable dosing, by directly targeting single domains of multi-domain proteins, and by probing cell types that are not amenable to mutagenesis. Responses from large chemical combination screens can be used to identify molecular targets through chemical–genetic profiling (Macdonald et al, 2006), or to directly constrain network models by means of a prediction-validation procedure (Ideker et al, 2001). This initial exploration can be extended to cover a wider range of response shapes and network topologies, as well as to combinations of three or more chemical agents. Moreover, this approach may even be applicable to non-biological systems where responses to targeted perturbations can be measured.
Efforts to construct therapeutically useful models of biological systems require large and diverse sets of data on functional connections between their components. Here we show that cellular responses to combinations of chemicals reveal how their biological targets are connected. Simulations of pathways with pairs of inhibitors at varying doses predict distinct response surface shapes that are reproduced in a yeast experiment, with further support from a larger screen using human tumour cells. The response morphology yields detailed connectivity constraints between nearby targets, and synergy profiles across many combinations show relatedness between targets in the whole network. Constraints from chemical combinations complement genetic studies, because they probe different cellular components and can be applied to disease models that are not amenable to mutagenesis. Chemical probes also offer increased flexibility, as they can be continuously dosed, temporally controlled, and readily combined. After extending this initial study to cover a wider range of combination effects and pathway topologies, chemical combinations may be used to refine network models or to identify novel targets. This response surface methodology may even apply to non-biological systems where responses to targeted perturbations can be measured.
doi:10.1038/msb4100116
PMCID: PMC1828746  PMID: 17332758
chemical genetics; combinations and synergy; metabolic and regulatory networks; simulation and data analysis
8.  A Discovery Funnel for Nucleic Acid Binding Drug Candidates 
Drug development research  2011;72(2):178-186.
Computational approaches are becoming increasingly popular for the discovery of drug candidates against a target of interest. Proteins have historically been the primary targets of many virtual screening efforts. While in silico screens targeting proteins have proven successful, other classes of targets, in particular DNA, remain largely unexplored using virtual screening methods. With the realization of the functional importance of many non-canonical DNA structures such as G-quadruplexes, increased efforts are underway to discover new small molecules that can bind selectively to DNA structures. Here, we describe efforts to build an integrated in silico and in vitro platform for discovering compounds that may bind to a chosen DNA target. Millions of compounds are initially screened in silico for selective binding to a particular structure and ranked to identify several hundred best hits. An important element of our strategy is the inclusion of an array of possible competing structures in the in silico screen. The best hundred or so hits are validated experimentally for binding to the actual target structure by a high-throughput 96-well thermal denaturation assay to yield the top ten candidates. Finally, these most promising candidates are thoroughly characterized for binding to their DNA target by rigorous biophysical methods, including isothermal titration calorimetry, differential scanning calorimetry, spectroscopy and competition dialysis. This platform was validated using quadruplex DNA as a target, and a new quadruplex-binding compound with possible anti-cancer activity was discovered. Some considerations when embarking on virtual screening and in silico experiments are also discussed.
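The role of competing structures in the in silico stage can be sketched as a selectivity margin: a compound's score for the target minus its best score against any competitor, used to rank and truncate the library. This assumes higher docking scores mean tighter predicted binding; the function names, structure names and scores are hypothetical, and the platform's actual ranking criteria may differ.

```python
def selectivity(scores, target, competitors):
    """Target score minus the best competing-structure score (higher = more selective).

    Assumes higher docking score = tighter predicted binding.
    """
    return scores[target] - max(scores[c] for c in competitors)


def funnel(compounds, target, competitors, keep):
    """Rank compounds by selectivity and return the ids of the top `keep`."""
    ranked = sorted(
        compounds,
        key=lambda c: selectivity(c["scores"], target, competitors),
        reverse=True,
    )
    return [c["id"] for c in ranked[:keep]]


# Hypothetical library: c1 binds the target best but is barely selective.
library = [
    {"id": "c1", "scores": {"quadruplex": 8.0, "duplex": 7.5, "triplex": 6.0}},
    {"id": "c2", "scores": {"quadruplex": 6.0, "duplex": 3.0, "triplex": 2.0}},
    {"id": "c3", "scores": {"quadruplex": 7.0, "duplex": 2.0, "triplex": 1.0}},
]
hits = funnel(library, "quadruplex", ["duplex", "triplex"], keep=2)
```

Ranking on the margin rather than the raw target score is what lets the competing structures demote promiscuous binders such as `c1` before the experimental stages.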
doi:10.1002/ddr.20414
PMCID: PMC3090163  PMID: 21566705
drug discovery; in silico screening; SURFLEX-DOCK; DNA; G-quadruplex; high-throughput screening
9.  Genome-Wide Prediction and Validation of Peptides That Bind Human Prosurvival Bcl-2 Proteins 
PLoS Computational Biology  2014;10(6):e1003693.
Programmed cell death is regulated by interactions between pro-apoptotic and prosurvival members of the Bcl-2 family. Pro-apoptotic family members contain a weakly conserved BH3 motif that can adopt an alpha-helical structure and bind to a groove on prosurvival partners Bcl-xL, Bcl-w, Bcl-2, Mcl-1 and Bfl-1. Peptides corresponding to roughly 13 reported BH3 motifs have been verified to bind in this manner. Due to their short lengths and low sequence conservation, BH3 motifs are not detected using standard sequence-based bioinformatics approaches. Thus, it is possible that many additional proteins harbor BH3-like sequences that can mediate interactions with the Bcl-2 family. In this work, we used structure-based and data-based Bcl-2 interaction models to find new BH3-like peptides in the human proteome. We used peptide SPOT arrays to test candidate peptides for interaction with one or more of the prosurvival proteins Bcl-xL, Bcl-w, Bcl-2, Mcl-1 and Bfl-1. For the 36 most promising array candidates, we quantified binding to all five human receptors using direct and competition binding assays in solution. All 36 peptides showed evidence of interaction with at least one prosurvival protein, and 22 peptides bound at least one prosurvival protein with a dissociation constant between 1 and 500 nM; many peptides had specificity profiles not previously observed. We also screened the full-length parent proteins of a subset of array-tested peptides for binding to Bcl-xL and Mcl-1. Finally, we used the peptide binding data, in conjunction with previously reported interactions, to assess the affinity and specificity prediction performance of different models.
Author Summary
Bcl-2 family proteins regulate key cell death vs. survival decisions and are implicated in the development of many cancers. To understand the roles of Bcl-2 family proteins in both normal and diseased cells, it is important to map the interaction network of the family. Low sequence conservation in known Bcl-2 interaction motifs precludes easy identification of possible binding partners, but we developed computational models based on structure and experimental mutation data that show good predictive performance. We used our models to search the human proteome for new Bcl-2 interaction partners. We predicted and experimentally validated more than twice as many tight-binding peptides as were previously known.
doi:10.1371/journal.pcbi.1003693
PMCID: PMC4072508  PMID: 24967846
10.  In Silico Molecular Comparisons of C. elegans and Mammalian Pharmacology Identify Distinct Targets That Regulate Feeding 
PLoS Biology  2013;11(11):e1001712.
This paper takes advantage of similarities between the C. elegans and human pharmacopeia to identify and validate pharmacological targets that regulate C. elegans feeding rates.
Phenotypic screens can identify molecules that are at once penetrant and active on the integrated circuitry of a whole cell or organism. These advantages are offset by the need to identify the targets underlying the phenotypes. Additionally, logistical considerations limit screening for certain physiological and behavioral phenotypes to organisms such as zebrafish and C. elegans. This further raises the challenge of elucidating whether compound-target relationships found in model organisms are preserved in humans. To address these challenges we searched for compounds that affect feeding behavior in C. elegans and sought to identify their molecular mechanisms of action. Here, we applied predictive chemoinformatics to small molecules previously identified in a C. elegans phenotypic screen likely to be enriched for feeding regulatory compounds. Based on the predictions, 16 of these compounds were tested in vitro against 20 mammalian targets. Of these, nine were active, with affinities ranging from 9 nM to 10 µM. Four of these nine compounds were found to alter feeding. We then verified the in vitro findings in vivo through genetic knockdowns, the use of previously characterized compounds with high affinity for the four targets, and chemical genetic epistasis, which is the effect of combined chemical and genetic perturbations on a phenotype relative to that of each perturbation in isolation. Our findings reveal four previously unrecognized pathways that regulate feeding in C. elegans with strong parallels in mammals. Together, our study addresses three inherent challenges in phenotypic screening: the identification of the molecular targets from a phenotypic screen, the confirmation of the in vivo relevance of these targets, and the evolutionary conservation and relevance of these targets to their human orthologs.
Author Summary
Many beneficial pharmacological interventions were first discovered by observing the effects of perturbation of intact biological systems by small organic molecules without a priori knowledge of their targets. This forward pharmacological approach has the advantage of directly identifying new pharmacological agents that are active on complex biological processes. However, for reasons of experimental feasibility, systematic application of this approach is generally limited to small animals such as the roundworm C. elegans and zebrafish, raising the question of whether use of these animals could identify compounds that act on orthologous mammalian targets. A significant challenge in addressing this question is the determination of the molecular identities of the compounds' targets responsible for the desired phenotypic outcomes. Here we describe a computational approach for target identification based on structural similarities of newly identified compounds to known ligand interactions with mostly mammalian targets. For several of the compounds emerging from a C. elegans phenotypic screen, we predict and confirm mammalian targets using in vitro binding assays. Using genetic and pharmacological assays, we then demonstrate that a subset of these compounds alter C. elegans feeding rates through the C. elegans counterparts of the predicted mammalian targets.
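Target prediction from structural similarity can be sketched with fingerprints represented as sets of feature indices and a Tanimoto coefficient. The nearest-neighbour rule, the 0.5 threshold and the function names are simplifying assumptions, not necessarily the chemoinformatic method used in the study.

```python
def tanimoto(a, b):
    """Tanimoto (Jaccard) similarity of two fingerprint bit sets."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def predict_targets(query_fp, reference, threshold=0.5):
    """Rank targets by the best similarity of any known ligand to the query.

    reference: list of (fingerprint_set, target_name) pairs for known ligands.
    Returns (target, best_similarity) pairs at or above `threshold`, best first.
    """
    best = {}
    for fp, target in reference:
        s = tanimoto(query_fp, fp)
        if s >= threshold and s > best.get(target, -1.0):
            best[target] = s
    return sorted(best.items(), key=lambda kv: kv[1], reverse=True)
```

Each predicted target would then still need in vitro binding and in vivo follow-up, as the paragraph above describes.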
doi:10.1371/journal.pbio.1001712
PMCID: PMC3833878  PMID: 24260022
11.  Target Inhibition Networks: Predicting Selective Combinations of Druggable Targets to Block Cancer Survival Pathways 
PLoS Computational Biology  2013;9(9):e1003226.
A recent trend in drug development is to identify drug combinations or multi-target agents that effectively modify multiple nodes of disease-associated networks. Such polypharmacological effects may reduce the risk of emerging drug resistance by attacking the disease networks through synergistic and synthetic lethal interactions. However, due to the exponentially increasing number of potential drug and target combinations, systematic approaches are needed for prioritizing the most potent multi-target alternatives on a global network level. We took a functional systems pharmacology approach toward the identification of selective target combinations for specific cancer cells by combining large-scale screening data on drug treatment efficacies and drug-target binding affinities. Our model-based prediction approach, named TIMMA, takes advantage of the polypharmacological effects of drugs and infers combinatorial drug efficacies through system-level target inhibition networks. Case studies in MCF-7 and MDA-MB-231 breast cancer and BxPC-3 pancreatic cancer cells demonstrated how the target inhibition modeling allows systematic exploration of functional interactions between drugs and their targets to maximally inhibit multiple survival pathways in a given cancer type. The TIMMA prediction results were experimentally validated by systematic siRNA-mediated silencing of the selected targets and their pairwise combinations, showing an increased ability to identify not only druggable kinase targets that are essential for cancer survival, either individually or in combination, but also synergistic interactions indicative of non-additive drug efficacies. These system-level analyses were enabled by a novel model construction method utilizing maximization and minimization rules, as well as a model selection algorithm based on sequential forward floating search.
Compared with an existing computational solution, TIMMA showed both higher prediction accuracy in cross-validation and a significant reduction in computation time. Such cost-effective computational-experimental design strategies have the potential to greatly speed up drug testing efforts by prioritizing the interventions and interactions that warrant further study in individual cancer cases.
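Sequential forward floating search (SFFS) is a generic subset-selection procedure: forward steps add the best feature, and floating steps remove features whenever that beats the best subset previously found at the smaller size. The sketch below shows only that search loop with a hypothetical scoring function; it does not reproduce TIMMA's maximization/minimization model, and the function name is illustrative.

```python
def sffs(features, score, k):
    """Generic sequential forward floating search up to subset size k.

    score: callable mapping a list of features to a number (higher = better).
    Returns {size: (best_score, best_subset)} for every size visited.
    """
    if k > len(features):
        raise ValueError("k exceeds the number of features")
    selected = []
    best = {}
    while len(selected) < k:
        # Forward step: add the feature whose inclusion scores highest.
        candidates = [f for f in features if f not in selected]
        f_add = max(candidates, key=lambda f: score(selected + [f]))
        selected = selected + [f_add]
        s = score(selected)
        if len(selected) not in best or s > best[len(selected)][0]:
            best[len(selected)] = (s, list(selected))
        # Floating step: drop a feature while that strictly beats the best
        # subset already recorded at the smaller size.
        while len(selected) > 2:
            f_rm = max(selected, key=lambda f: score([g for g in selected if g != f]))
            reduced = [g for g in selected if g != f_rm]
            s_red = score(reduced)
            if s_red > best[len(reduced)][0]:
                selected = reduced
                best[len(reduced)] = (s_red, list(reduced))
            else:
                break
    return best


# Hypothetical scores where greedy forward selection picks "a" first,
# although the best pair is {"b", "c"}.
table = {
    frozenset("a"): 3, frozenset("b"): 2, frozenset("c"): 2, frozenset("d"): 0,
    frozenset("ab"): 3.5, frozenset("ac"): 3.5, frozenset("ad"): 3,
    frozenset("bc"): 5, frozenset("bd"): 2, frozenset("cd"): 2,
    frozenset("abc"): 5.5, frozenset("abd"): 4, frozenset("acd"): 4, frozenset("bcd"): 4,
}
result = sffs(list("abcd"), lambda s: table[frozenset(s)], 3)
```

The floating step is what lets the search recover the pair `{"b", "c"}` after a plain forward search had committed to `"a"`.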
Author Summary
Selective inhibition of specific panels of multiple protein targets provides an unprecedented potential for improving therapeutic efficacy of anticancer agents. We introduce a computational systems pharmacology strategy, which uses the concept of target inhibition networks to predict effective multi-target combinations for treating specific cancer types. The strategy is based on integration of two complementary information sources, drug treatment efficacies and drug-target binding affinities, which are readily available in drug screening labs. Compared with cancer sequencing efforts, which often result in a huge number of non-targetable genetic alterations, the target combinations from our strategy are druggable, by definition, hence enabling more straightforward translation toward clinically actionable treatment strategies. The model predictions were experimentally validated using siRNA-mediated target silencing screens in three case studies involving MDA-MB-231 and MCF-7 breast cancer and BxPC-3 pancreatic cancer cells. In more general terms, the cancer cell-specific target inhibition networks provided additional insights into the drugs' mechanisms of action, for instance, how the cancer cell survival pathways can be targeted by synergistic and synthetic lethal interactions through multi-target perturbations. These results demonstrate that the principles introduced here offer the possibility of moving toward more systematic prediction and evaluation of the most effective drug target combinations.
doi:10.1371/journal.pcbi.1003226
PMCID: PMC3772058  PMID: 24068907
12.  Cross-species chemogenomic profiling reveals evolutionarily conserved drug mode of action 
Chemogenomic screens were performed in both budding and fission yeasts, allowing for a cross-species comparison of drug–gene interaction networks.
Drug–module interactions were more conserved than individual drug–gene interactions.
Combination of data from both species can improve drug–module predictions and helps identify a compound's mode of action.
Understanding the molecular effects of chemical compounds in living cells is an important step toward rational therapeutics. Drug discovery aims to find compounds that will target a specific pathway or pathogen with minimal side effects. However, even when an effective drug is found, its mode of action (MoA) is typically not well understood. The lack of knowledge regarding a drug's MoA makes the drug discovery process slow and rational therapeutics incredibly difficult. More recently, different high-throughput methods have been developed that attempt to discern how a compound exerts its effects in cells. One of these methods, commonly referred to as chemogenomics (Wuster and Babu, 2008), relies on measuring the growth of cells carrying different mutations in the presence of the compounds of interest. The differential growth of the mutants provides clues as to what the compounds target in the cell (Figure 2). For example, if a drug inhibits one branch of a vital two-branch pathway, then mutations in the second branch might result in cell death if the mutants are grown in the presence of the drug (Figure 2C). As these compound–mutant functional interactions are expected to be relatively rare, one can assume that the growth rate of a mutant–drug combination should generally be equal to the product of the growth rate of the untreated mutant and the growth rate of the drug-treated wild type. This expectation is defined as the neutral model, and deviations from it provide a quantitative score that allows us to make informed predictions regarding a drug's MoA (Figure 2B; Parsons et al, 2006).
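The neutral model described above reduces to a product of relative growth rates. A minimal sketch, assuming growth values are already normalized to untreated wild type and using a plain observed-minus-expected difference; the study's D-score may include additional normalization, and the function and mutant names are hypothetical.

```python
def expected_growth(g_mutant, g_drug_wt):
    """Neutral model: untreated-mutant growth times drug-treated wild-type growth,
    both relative to untreated wild type."""
    return g_mutant * g_drug_wt


def deviation(g_observed, g_mutant, g_drug_wt):
    """Observed minus expected: negative = drug-sensitive, positive = resistant."""
    return g_observed - expected_growth(g_mutant, g_drug_wt)


def rank_mutants(measurements, g_drug_wt):
    """measurements: mutant -> (untreated growth, drug-treated growth), relative to WT.
    Returns (mutant, deviation) pairs, most sensitive first."""
    scored = [
        (mutant, deviation(treated, untreated, g_drug_wt))
        for mutant, (untreated, treated) in measurements.items()
    ]
    return sorted(scored, key=lambda pair: pair[1])


# Hypothetical data: the drug halves wild-type growth.
measurements = {"mutA": (0.9, 0.1), "mutB": (1.0, 0.5), "mutC": (0.8, 0.7)}
ranking = rank_mutants(measurements, g_drug_wt=0.5)
```

Here `mutA` grows far worse under the drug than 0.9 × 0.5 would predict, flagging a possible functional link to the drug's MoA, while `mutB` behaves exactly as the neutral model expects.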
The availability of these high-throughput approaches now allows us to perform cross-species studies of functional interactions between compounds and genes. In this study, we have performed a quantitative analysis of compound–gene interactions for two fungal species (budding yeast (S. cerevisiae) and fission yeast (S. pombe)) that diverged from each other approximately 500–700 million years ago. A collection of 2957 compounds from the National Cancer Institute (NCI) was screened in both species for inhibition of wild-type cell growth. Of these, 132 were bioactive in both fungi; 9 of them, along with 12 additional well-characterized drugs, were selected for subsequent screening. Mutant libraries of 727 and 438 gene deletions were used for S. cerevisiae and S. pombe, respectively; these were selected based on the availability of genetic interaction data from previous studies (Collins et al, 2007; Roguev et al, 2008; Fiedler et al, 2009) and share 190 one-to-one orthologs that can be directly compared. Deviations from the neutral expectation were quantified as drug–gene interaction scores (D-scores) for the 21 compounds against the deletion libraries. Replicates of both screens showed very high correlations (S. cerevisiae r=0.72, S. pombe r=0.76) and reproduced previously known compound–gene interactions well (Supplementary information). We then compared the D-scores for the 190 one-to-one orthologs present in the data sets of both species. Despite the high reproducibility, we observed very poor conservation of these compound–gene interaction scores across the two species (r=0.13, Figure 4A).
Previous work had shown that, across these same species, genetic interactions within protein complexes were much more conserved than average genetic interactions (Roguev et al, 2008). Similarly, we observed higher cross-species conservation of compound–module (complex or pathway) interactions than of overall compound–gene interactions. Specifically, the fission yeast data were a poor predictor of S. cerevisiae drug–gene interactions, but a good predictor of budding yeast compound–module connections (Figure 4B). Also, a combined score from both species improved the prediction of compound–module interactions beyond the accuracy obtained with the S. cerevisiae information alone, but this improvement was not observed for the prediction of drug–gene interactions (Figure 4B). Data from both species were used to predict drug–module interactions, and one specific interaction (compound NSC-207895 interacting with DNA repair complexes) was experimentally verified by showing that the compound activates the DNA damage repair pathway in three species (S. cerevisiae, S. pombe and H. sapiens).
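Why module-level scores can be more conserved than gene-level scores can be seen with synthetic numbers: if the members of a complex that carry the strongest interaction differ between species, gene-level correlation collapses while module averages stay aligned. The module assignments, scores and helper names below are invented for illustration and do not come from the study.

```python
from math import sqrt
from statistics import mean


def pearson(x, y):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def module_means(scores, modules):
    """Average gene-level scores within each module (order follows the dict)."""
    return [mean(scores[g] for g in genes) for genes in modules.values()]


# Hypothetical orthologous genes grouped into three complexes.
modules = {"M1": ["g1", "g2"], "M2": ["g3", "g4"], "M3": ["g5", "g6"]}
species_a = {"g1": -4, "g2": 0, "g3": -1, "g4": -1, "g5": 1, "g6": 1}
species_b = {"g1": 0, "g2": -4, "g3": -2, "g4": 0, "g5": 2, "g6": 0}

genes = sorted(species_a)
gene_r = pearson([species_a[g] for g in genes], [species_b[g] for g in genes])
module_r = pearson(module_means(species_a, modules), module_means(species_b, modules))
```

With these numbers the gene-level correlation is close to zero while the module-level correlation is essentially 1.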
To understand why the combination of chemogenomic data from two species might improve drug–module interaction predictions, we also analyzed previously published cross-species genetic–interaction data. We observed a significant correlation between the conservation of drug–gene and gene–gene interactions among the one-to-one orthologs (r=0.28, P-value=0.0078). Additionally, the strongest interactions of benomyl (a microtubule inhibitor) were to complexes that also had strong and conserved genetic interactions with microtubules (Figure 4C). We hypothesize that a significant number of the compound–gene interactions obtained from chemogenomic studies are not direct interactions with the physical target of the compounds, but include many indirect interactions with genes that genetically interact with the main target(s). This would explain why the compound interaction networks show evolutionary patterns similar to those of the genetic interaction networks.
In summary, these results shed some light on the interplay between the evolution of genetic networks and the evolution of drug response. Understanding how genetic variability across different species might result in different sensitivity to drugs should improve our capacity to design treatments. Concretely, we hope that this line of research might one day help us create drugs and drug combinations that specifically affect a pathogen or diseased tissue, but not the host.
We present a cross-species chemogenomic screening platform using libraries of haploid deletion mutants from two yeast species, Saccharomyces cerevisiae and Schizosaccharomyces pombe. We screened a set of compounds of known and unknown mode of action (MoA) and derived quantitative drug scores (or D-scores), identifying mutants that are either sensitive or resistant to particular compounds. We found that compound–functional module relationships are more conserved than individual compound–gene interactions between these two species. Furthermore, we observed that combining data from both species allows for more accurate prediction of MoA. Finally, using this platform, we identified a novel small molecule that acts as a DNA damaging agent and demonstrate that its MoA is conserved in human cells.
doi:10.1038/msb.2010.107
PMCID: PMC3018166  PMID: 21179023
chemogenomics; evolution; modularity
13.  Adverse Drug Reaction Prediction Using Scores Produced by Large-Scale Drug-Protein Target Docking on High-Performance Computing Machines 
PLoS ONE  2014;9(9):e106298.
Late-stage or post-market identification of adverse drug reactions (ADRs) is a significant public health issue and a source of major economic liability for drug development. Thus, reliable in silico screening of drug candidates for possible ADRs would be advantageous. In this work, we introduce a computational approach that predicts ADRs by combining the results of molecular docking with known ADR information from DrugBank and SIDER. We employed a recently parallelized version of AutoDock Vina (VinaLC) to dock 906 small-molecule drugs to a virtual panel of 409 DrugBank protein targets. L1-regularized logistic regression models were trained on the resulting docking scores of a 560-compound subset of the initial 906 compounds to predict 85 side effects, grouped into 10 ADR phenotype groups. Only 21% (87 out of 409) of the drug-protein binding features involve known targets of the drug subset, providing a significant probe of off-target effects. As a control, associations of this drug subset with the 555 annotated targets of these compounds, as reported in DrugBank, were used as features to train a separate group of models. The Vina off-target models and the DrugBank on-target models yielded comparable median areas under the receiver operating characteristic curve (AUCs) during 10-fold cross-validation (0.60–0.69 and 0.61–0.74, respectively). Evidence was found in the PubMed literature to support several putative ADR-protein associations identified by our analysis. Among them, several associations between neoplasm-related ADRs and known tumor suppressor and tumor invasiveness marker proteins were found. A dual role for interstitial collagenase in both neoplasms and aneurysm formation was also identified. These associations all involve off-target proteins and could not have been found using available drug/on-target interaction data.
This study illustrates a path forward to comprehensive ADR virtual screening that can potentially scale with an increasing number of CPUs to tens of thousands of protein targets and millions of potential drug candidates.
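The AUC values quoted above can be computed with the rank (Mann–Whitney) formulation: the probability that a randomly chosen positive drug is scored above a randomly chosen negative one, counting ties as one half. The round-robin fold split is a simple stand-in for whatever 10-fold partitioning was actually used, and both function names are hypothetical.

```python
def roc_auc(labels, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly; ties count 0.5."""
    pos = [s for label, s in zip(labels, scores) if label]
    neg = [s for label, s in zip(labels, scores) if not label]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def kfold_indices(n, k):
    """Assign sample indices 0..n-1 to k folds in round-robin order."""
    return [list(range(i, n, k)) for i in range(k)]
```

In a cross-validation loop, each fold in turn would be held out, a model fitted on the rest, and `roc_auc` evaluated on the held-out predictions; the pairwise loop is quadratic, which is fine at the scale of a few hundred drugs.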
doi:10.1371/journal.pone.0106298
PMCID: PMC4156361  PMID: 25191698
14.  Predicting New Indications for Approved Drugs Using a Proteo-Chemometric Method 
Journal of medicinal chemistry  2012;55(15):6832-6848.
The most effective way to move from target identification to the clinic is to identify already approved drugs with the potential for activating or inhibiting unintended targets (repurposing or repositioning). This is usually achieved by high-throughput chemical screening, transcriptome matching or simple in silico ligand docking. We now describe a novel rapid computational proteo-chemometric method called “Train, Match, Fit, Streamline” (TMFS) to map new drug-target interaction space and predict new uses. The TMFS method combines shape, topology and chemical signatures, including docking score and functional contact points of the ligand, to predict potential drug-target interactions with remarkable accuracy. Using the TMFS method, we performed extensive molecular fit computations on 3,671 FDA approved drugs across 2,335 human protein crystal structures. The TMFS method predicts drug-target associations with 91% accuracy for the majority of drugs. Over 58% of the known best ligands for each target were correctly predicted as top ranked, followed by 66%, 76%, 84% and 91% for agents ranked in the top 10, 20, 30 and 40, respectively, out of all 3,671 drugs. Drugs ranked in the top 1–40 that have not been experimentally validated for a particular target now become candidates for repositioning. Furthermore, we used the TMFS method to discover that mebendazole, an anti-parasitic with recently discovered and unexpected anti-cancer properties, has the structural potential to inhibit VEGFR2. We confirmed experimentally that mebendazole inhibits VEGFR2 kinase activity as well as angiogenesis at doses comparable with its known effects on hookworm. TMFS also predicted, and surface plasmon resonance confirmed, that dimethyl celecoxib and the anti-inflammatory agent celecoxib can bind cadherin-11, an adhesion molecule important in rheumatoid arthritis and poor-prognosis malignancies for which no targeted therapies exist.
We anticipate that expanding our TMFS method to the >27,000 clinically active agents available worldwide across all targets will be most useful in the repositioning of existing drugs for new therapeutic targets.
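The top-k figures above correspond to a recall-at-k calculation: for each target, check whether its known best ligand appears among the first k ranked drugs. A minimal sketch, assuming each target has a single known best ligand and a complete best-first ranking; the function name and data layout are hypothetical.

```python
def top_k_recall(rankings, known_best, ks):
    """Fraction of targets whose known best ligand is ranked within the top k.

    rankings: target -> list of drugs ordered best-first.
    known_best: target -> the single known best ligand for that target.
    """
    result = {}
    for k in ks:
        hits = sum(1 for target, best in known_best.items() if best in rankings[target][:k])
        result[k] = hits / len(known_best)
    return result


# Hypothetical example: the known best ligand sits at rank 1, 2 and 3 respectively.
rankings = {"T1": ["d1", "d2", "d3"], "T2": ["d2", "d1", "d3"], "T3": ["d3", "d2", "d1"]}
known = {"T1": "d1", "T2": "d1", "T3": "d1"}
recall = top_k_recall(rankings, known, [1, 2, 3])
```

Recall at k can only grow with k, which matches the rising 58%, 66%, 76%, 84% and 91% sequence reported for increasing cut-offs.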
doi:10.1021/jm300576q
PMCID: PMC3419493  PMID: 22780961
15.  Structural Analysis of microRNA-Target Interaction by Sequential Seed Mutagenesis and Stem-Loop 3' RACE 
PLoS ONE  2013;8(11):e81427.
Background
As a consequence of recent RNAseq efforts, miRNAomes of diverse tissues and species are available. However, most interactions between microRNAs and regulated mRNAs are still to be deciphered. While in silico analysis of microRNAs predicts hundreds of potential targets, bona fide interactions have to be verified, e.g., by luciferase reporter assays using fused target sites as well as controls incorporating mutated seed sequences. The aim of this study was the development of a straightforward approach for sequential mutation of multiple target sites within a given 3’ UTR.
Methodology/Principal Findings
The established protocol is based on Seed Mutagenesis Assembly PCR (SMAP), allowing rapid identification of microRNA target sites. Using this approach, we identified the transcription factor NKX3.1 as a genuine target of miR-155. Sequential mutagenesis of multiple microRNA target sites was examined using miR-29a-mediated CASP7 regulation, which revealed one of two predicted target sites as the predominant site of interaction. Since 3’ UTR sequences of non-model organisms are either lacking in databases or only computationally predicted, we developed a Stem-Loop 3’ UTR RACE PCR (SLURP) for efficient generation of the required 3’ UTR sequence data. The stem-loop primer allows first-strand cDNA synthesis followed by nested PCR amplification of the 3’ UTR. Among other applications, the SLURP method was used to obtain porcine CASP7 3’ UTR sequence data for evaluating the evolutionary conservation of the studied interaction.
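The seed-site logic behind such mutagenesis can be sketched in a few lines. It assumes canonical 7mer sites complementary to miRNA nucleotides 2–8, the RNA alphabet (U rather than T), and a hypothetical mutation scheme that replaces each site with its complement; the miRNA and UTR sequences and the function names are made up. SMAP itself is a wet-lab PCR protocol, so this only mirrors the construct-design step.

```python
COMPLEMENT = {"A": "U", "U": "A", "G": "C", "C": "G"}


def reverse_complement(rna):
    return "".join(COMPLEMENT[b] for b in reversed(rna))


def seed_sites(mirna, utr):
    """0-based start positions of 7mer sites matching miRNA nucleotides 2-8."""
    site = reverse_complement(mirna[1:8])
    return [i for i in range(len(utr) - len(site) + 1) if utr[i:i + len(site)] == site]


def mutate_site(utr, start, length=7):
    """Hypothetical scheme: replace the site with its complement to break seed pairing."""
    mutated = "".join(COMPLEMENT[b] for b in utr[start:start + length])
    return utr[:start] + mutated + utr[start + length:]


def sequential_mutants(utr, sites, length=7):
    """Cumulative constructs: site 1 mutated, then sites 1 and 2, and so on."""
    constructs = []
    current = utr
    for start in sites:
        current = mutate_site(current, start, length)
        constructs.append(current)
    return constructs


# Made-up miRNA and 3' UTR containing two copies of its seed site.
mirna = "AUCGAUGCAUGCAUGCAUGCAU"
site = reverse_complement(mirna[1:8])
utr = "AAAA" + site + "CCCC" + site + "AA"
mutants = sequential_mutants(utr, seed_sites(mirna, utr))
```

Testing each cumulative construct in a reporter assay is what lets one site be singled out as the predominant point of interaction, as with the two predicted CASP7 sites.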
Conclusions/Significance
Sequential seed mutation of microRNA targets based on the SMAP approach allows for rapid structural analysis of several target sites within a given 3’ UTR. The combination of both methods (SMAP and SLURP) enables targeted analysis of microRNA binding sites in hitherto unknown mRNA 3’ UTRs within a few days.
doi:10.1371/journal.pone.0081427
PMCID: PMC3839922  PMID: 24282594
16.  Prioritization of gene regulatory interactions from large-scale modules in yeast 
BMC Bioinformatics  2008;9:32.
Background
The identification of groups of co-regulated genes and their transcription factors, called transcriptional modules, has been a focus of many studies about biological systems. While methods have been developed to derive numerous modules from genome-wide data, individual links between regulatory proteins and target genes still need experimental verification. In this work, we aim to prioritize regulator-target links within transcriptional modules based on three types of large-scale data sources.
Results
Starting with putative transcriptional modules from ChIP-chip data, we first derive modules in which target genes show both expression and function coherence. The most reliable regulatory links between transcription factors and target genes are established by identifying the intersection of target genes in coherent modules for each enriched functional category. Using a combination of genome-wide yeast data in normal growth conditions and two different reference datasets, we show that our method predicts regulatory interactions with significantly higher predictive power than ChIP-chip binding data alone. A comparison with results from other studies highlights that our approach provides a reliable and complementary set of regulatory interactions. Based on our results, we can also identify functionally interacting target genes, for instance, a group of co-regulated proteins related to cell wall synthesis. Furthermore, we report novel conserved binding sites of a glycoprotein-encoding gene, CIS3, regulated by Swi6-Swi4 and Ndd1-Fkh2-Mcm1 complexes.
Conclusion
We provide a simple method to prioritize individual TF-gene interactions from large-scale transcriptional modules. In comparison with other published works, we predict a complementary set of regulatory interactions which yields a similar or higher prediction accuracy at the expense of sensitivity. Therefore, our method can serve as an alternative approach to prioritization for further experimental studies.
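The intersection step can be sketched with sets. The module representation (a dict with `tfs`, `targets` and `categories` keys), the requirement that at least two coherent modules support a link, and the names are assumptions for illustration rather than the published procedure.

```python
def prioritized_links(modules, tf, min_modules=2):
    """For one TF, intersect the target sets of its coherent modules per category.

    modules: list of dicts with 'tfs', 'targets' and 'categories' (coherent modules only).
    Returns {category: targets shared by all supporting modules}, keeping only
    categories backed by at least `min_modules` modules.
    """
    by_category = {}
    for module in modules:
        if tf not in module["tfs"]:
            continue
        for category in module["categories"]:
            by_category.setdefault(category, []).append(set(module["targets"]))
    return {
        category: set.intersection(*target_sets)
        for category, target_sets in by_category.items()
        if len(target_sets) >= min_modules
    }


# Hypothetical coherent modules.
modules = [
    {"tfs": {"TF1"}, "targets": {"A", "B", "C"}, "categories": {"cell wall"}},
    {"tfs": {"TF1", "TF2"}, "targets": {"B", "C", "D"}, "categories": {"cell wall"}},
    {"tfs": {"TF1"}, "targets": {"E"}, "categories": {"DNA repair"}},
]
links = prioritized_links(modules, "TF1")
```

Intersecting rather than uniting target sets trades sensitivity for precision, consistent with the accuracy-versus-sensitivity trade-off noted above.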
doi:10.1186/1471-2105-9-32
PMCID: PMC2244593  PMID: 18211684
17.  Reconstruction and flux-balance analysis of the Plasmodium falciparum metabolic network 
In the paper we present a metabolic reconstruction and flux-balance analysis (FBA) of Plasmodium falciparum, the primary agent of malaria. The compartmentalized metabolic network of the parasite accounts for 1001 reactions and 616 metabolites. Enzyme–gene associations were established for 366 genes and 75% of all enzymatic reactions. The model was able to reproduce phenotypes of experimental gene knockout and drug inhibition assays with up to 90% accuracy. The model also can be used to efficiently integrate mRNA-expression data to improve the accuracy of metabolic predictions. Using FBA of the reconstructed metabolic network, we identified 40 enzymatic drug targets (i.e. in silico essential genes) with no or very low sequence identity to human proteins. We experimentally tested one of the identified drug targets, nicotinate mononucleotide adenylyltransferase, using a recently discovered small-molecule inhibitor.
Malaria remains one of the most severe public health challenges worldwide (WHO, 2008). Although several available drugs have been successful in controlling malaria in the past, most of them are rapidly losing efficacy due to acquired drug resistance in the most lethal causative agent, Plasmodium falciparum (Mackinnon and Marsh, 2010). This creates an urgent need for new drugs and treatments, as well as a better understanding of the parasite physiology. With this in mind, we built a genome-scale flux-balance model of P. falciparum metabolism. Given the complex life cycle of Plasmodium, the flux-balance model is of direct relevance to the ongoing search for new therapeutic drug targets. The model can be used to explore diverse metabolic states of the parasite and identify essential metabolic genes in the context of known alternative pathways (Oberhardt et al, 2009).
The reconstructed model, which is based on Plasmodium-specific databases, genomic annotations, and literature reports, includes 366 genes, 1001 reactions, 616 metabolic species, and 4 cellular compartments. We applied flux-balance analysis (FBA) (Orth et al, 2010) to identify the genes and reactions that are required to produce a set of necessary biomass components. Interestingly, compared with the yeast metabolic network (Duarte et al, 2004), a model eukaryote with a similar genome size, the Plasmodium network has a significantly higher proportion of essential genes; we confirmed this result using a comparative analysis of known gene knockouts in the two microbes. This low level of genetic robustness, which is likely due to the parasitic lifestyle, suggests that many metabolic genes of the parasite can be used as effective drug targets. Indeed, based on the in silico analysis we identified 40 essential P. falciparum genes with no or very little sequence identity to their human homologs.
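For readers unfamiliar with FBA, the standard formulation (as reviewed by Orth et al, 2010) and the in silico knockout test behind "essential genes" can be written as below. Here S is the stoichiometric matrix (metabolites × reactions), v the flux vector, l and u its lower and upper bounds, and c selects the biomass reaction. The notation R(g) (reactions that lose every catalyst when gene g is deleted, according to the gene–reaction associations) and the lethality threshold τ are illustrative assumptions; the exact cutoff used for P. falciparum is not stated here.

```latex
\begin{aligned}
Z_{\mathrm{WT}} &= \max_{\mathbf{v}} \; \mathbf{c}^{\top}\mathbf{v}
  \quad \text{s.t.} \quad S\mathbf{v} = \mathbf{0}, \;\;
  \mathbf{l} \le \mathbf{v} \le \mathbf{u}, \\
Z_{\Delta g} &= \max_{\mathbf{v}} \; \mathbf{c}^{\top}\mathbf{v}
  \quad \text{s.t.} \quad S\mathbf{v} = \mathbf{0}, \;\;
  \mathbf{l} \le \mathbf{v} \le \mathbf{u}, \;\;
  v_j = 0 \;\; \forall j \in R(g), \\
g \text{ is essential} &\iff Z_{\Delta g} < \tau \, Z_{\mathrm{WT}}, \qquad 0 < \tau < 1 .
\end{aligned}
```

A network with little redundancy has many genes for which no alternative flux route satisfies the biomass demand, so many genes pass this test.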
We used a recently described small-molecule inhibitor (compound 1_03; Sorci et al, 2009) to experimentally verify one of the enzymes identified as essential: nicotinate mononucleotide adenylyltransferase (NMNAT; Figure 2A). This enzyme, and the corresponding NAD synthesis and recycling pathway, have been recently used for anti-microbial development (Magni et al, 2009). However, to the best of our knowledge, they have not been used against P. falciparum. The compound 1_03 was able to completely block host cell escape and reinvasion by arresting parasites in the trophozoite growth stage (Figure 2B). These results demonstrate that the inhibitory compound may be a good starting lead for new anti-malarials.
Importantly, the metabolic model of the parasite can also be used to integrate various genomic data, such as gene expression (Oberhardt et al, 2009). To illustrate these possibilities, we applied gene-expression data as constraints for the flux-balance model (Colijn et al, 2009) in order to predict changes in metabolic exchange fluxes. We found that the model was able to correctly predict the changes in external metabolite concentrations (Olszewski et al, 2009) with about 70% accuracy (Figure 3). The availability of a human metabolic network reconstruction (Duarte et al, 2007) would, in the future, allow analysis of the combined parasite–host network, deepening our understanding of P. falciparum metabolic vulnerabilities.
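Colijn et al. (2009) proposed E-Flux, which turns expression levels into reaction flux bounds. The exact mapping used for the P. falciparum model is not given here, so the following is only a minimal sketch of one common convention, not necessarily the rule of Colijn et al. or of the authors: a complex (genes joined by AND) is limited by its least-expressed subunit, isozymes (joined by OR) add up, and the result is scaled to a hypothetical maximum flux `vmax`. The reaction and gene IDs are hypothetical.

```python
def expression_bounds(gpr, expression, vmax=1000.0):
    """Map gene expression onto reaction upper bounds (an E-Flux-style rule).

    gpr: reaction ID -> list of alternative complexes (isozymes, OR),
         each complex being a list of gene IDs (subunits, AND).
    expression: gene ID -> non-negative expression level; every gene named
         in gpr is assumed to have a measurement.
    Returns reaction ID -> upper bound in [0, vmax]; reactions without a
    gene association are not constrained and therefore not returned.
    """
    levels = {}
    for reaction, complexes in gpr.items():
        # AND within a complex: capacity set by the least expressed subunit.
        # OR across isozymes: capacities of alternative enzymes add up.
        levels[reaction] = sum(min(expression[g] for g in complex_) for complex_ in complexes)
    top = max(levels.values(), default=0.0)
    if top == 0:
        return {r: 0.0 for r in levels}
    return {r: vmax * level / top for r, level in levels.items()}


# Hypothetical toy data: R1 needs both g1 and g2; R2 is catalyzed by g3 or g4.
gpr = {"R1": [["g1", "g2"]], "R2": [["g3"], ["g4"]]}
expression = {"g1": 2.0, "g2": 4.0, "g3": 1.0, "g4": 3.0}
print(expression_bounds(gpr, expression))
```

The returned values would replace the upper bounds u in the FBA problem (and, for reversible reactions, their negatives the lower bounds l) before re-optimizing.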
Future improvements of the presented P. falciparum metabolic model, for example incorporation of missing activities and yet undiscovered pathways, will lead to a better understanding of parasite physiology. Ultimately, the improved understanding should significantly accelerate the identification and development of desperately needed new drugs against this devastating disease.
Genome-scale metabolic reconstructions can serve as important tools for hypothesis generation and high-throughput data integration. Here, we present a metabolic network reconstruction and flux-balance analysis (FBA) of Plasmodium falciparum, the primary agent of malaria. The compartmentalized metabolic network accounts for 1001 reactions and 616 metabolites. Enzyme–gene associations were established for 366 genes and 75% of all enzymatic reactions. Compared with other microbes, the P. falciparum metabolic network contains a relatively high number of essential genes, suggesting little redundancy of the parasite metabolism. The model was able to reproduce phenotypes of experimental gene knockout and drug inhibition assays with up to 90% accuracy. Moreover, using constraints based on gene-expression data, the model was able to predict the direction of concentration changes for external metabolites with 70% accuracy. Using FBA of the reconstructed network, we identified 40 enzymatic drug targets (i.e. in silico essential genes), with no or very low sequence identity to human proteins. To demonstrate that the model can be used to make clinically relevant predictions, we experimentally tested one of the identified drug targets, nicotinate mononucleotide adenylyltransferase, using a recently discovered small-molecule inhibitor.
doi:10.1038/msb.2010.60
PMCID: PMC2964117  PMID: 20823846
flux-balance analysis; Plasmodium falciparum metabolism; systems biology
18.  Nonsense-Mediated mRNA Decay Immunity Can Help Identify Human Polycistronic Transcripts 
PLoS ONE  2014;9(3):e91535.
Eukaryotic polycistronic transcription units are rare and only a few examples are known, mostly the outcome of serendipitous discovery. We claim that a nonsense-mediated mRNA decay (NMD)-immune structure is a common characteristic of polycistronic transcripts, and that this immunity is an emergent property derived from all functional CDSs. The human RefSeq transcriptome was computationally screened for transcripts capable of eliciting NMD that also contain one or more additional ORFs potentially capable of rescuing the transcript from NMD. Transcripts were further analyzed using domain-based strategies to estimate the potential of the candidate ORF to encode a functional protein. Consequently, we predict the existence of forty-nine novel polycistronic transcripts.
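The abstract does not state the criterion used to decide whether a transcript elicits NMD. A minimal sketch, assuming the widely cited 50–55 nt rule (a stop codon ending more than about 50 nt upstream of the last exon–exon junction marks the mRNA for NMD), with transcript coordinates measured in nucleotides from the 5′ end. The `nmd_immune` rule, under which a transcript escapes NMD if at least one of its ORFs terminates near or downstream of the last junction, is a hypothetical simplification of the rescue idea, not the authors' screening pipeline.

```python
def last_junction(exon_lengths):
    """Transcript coordinate (nt) of the last exon-exon junction, or None
    for single-exon transcripts, which have no junction."""
    if len(exon_lengths) < 2:
        return None
    return sum(exon_lengths[:-1])


def triggers_nmd(exon_lengths, stop_end, threshold=50):
    """50-nt rule: a stop codon ending more than `threshold` nt upstream of
    the last exon-exon junction leaves a junction complex behind the
    terminating ribosome and is predicted to trigger NMD."""
    junction = last_junction(exon_lengths)
    return junction is not None and junction - stop_end > threshold


def nmd_immune(exon_lengths, orf_stop_ends, threshold=50):
    """Hypothetical simplification: the transcript escapes NMD if at least one
    of its ORFs (e.g. a downstream, rescuing ORF) does not trigger NMD."""
    return any(not triggers_nmd(exon_lengths, s, threshold) for s in orf_stop_ends)


# Hypothetical three-exon transcript: last junction at 200 + 150 = 350 nt.
exons = [200, 150, 300]
print(triggers_nmd(exons, 120))        # first ORF ends 230 nt before the junction
print(nmd_immune(exons, [120, 600]))   # a second ORF ends inside the last exon
```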
Experimental verification was carried out using two different types of analyses. First, five Gene Expression Omnibus (GEO) datasets from published NMD-inhibition studies were used, aiming to explore whether a given mRNA is indeed insensitive to NMD. All known bicistronic transcripts and eleven of the twelve predicted genes that were analyzed displayed NMD insensitivity using various NMD inhibitors. For three genes, a mixed expression pattern was observed, presenting both NMD sensitivity and insensitivity in different cell types. Second, we used published global translation initiation sequencing data from HEK293 cells to verify the existence of translation initiation sites in our predicted polycistronic genes. In five of our genes, the predicted rescuing uORFs are indeed identified as translation initiation sites, and in two additional genes, one of two predicted rescuing uORFs is verified. These results validate our computational analysis and reinforce the possibility that NMD-immune architecture is a parameter by which polycistronic genes can be identified. Moreover, we present evidence for NMD-mediated regulation controlling the production of one or more proteins encoded in the polycistronic transcript.
doi:10.1371/journal.pone.0091535
PMCID: PMC3951408  PMID: 24621851
19.  Non-canonical peroxisome targeting signals: identification of novel PTS1 tripeptides and characterization of enhancer elements by computational permutation analysis 
BMC Plant Biology  2012;12:142.
Background
High-accuracy prediction tools are essential in the post-genomic era to define organellar proteomes in their full complexity. We recently applied a discriminative machine learning approach to predict plant proteins carrying peroxisome targeting signals (PTS) type 1 from genome sequences. For Arabidopsis thaliana 392 gene models were predicted to be peroxisome-targeted. The predictions were extensively tested in vivo, resulting in a high experimental verification rate of Arabidopsis proteins previously not known to be peroxisomal.
Results
In this study, we experimentally validated the predictions in greater depth by focusing on the most challenging Arabidopsis proteins with unknown non-canonical PTS1 tripeptides and prediction scores close to the threshold. By in vivo subcellular targeting analysis, three novel PTS1 tripeptides (QRL>, SQM>, and SDL>) and two novel tripeptide residues (Q at position −3 and D at position −2) were identified. To understand why, among many Arabidopsis proteins carrying the same C-terminal tripeptides, these proteins were specifically predicted as peroxisomal, the residues upstream of the PTS1 tripeptide were computationally permuted and the changes in prediction scores were analyzed. The newly identified Arabidopsis proteins were found to contain four to five amino acid residues with high predicted targeting-enhancing properties at positions −4 to −12 in front of the non-canonical PTS1 tripeptide. The identity of the predicted targeting-enhancing residues was unexpectedly diverse, comprising, besides basic residues, also proline, hydroxylated (Ser, Thr), hydrophobic (Ala, Val), and even acidic residues.
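The permutation analysis can be illustrated with a minimal sketch: substitute every amino acid at each upstream position −4 to −12 and record the mean change in prediction score relative to the wild-type sequence. The real predictor is a trained machine-learning model that is not reproduced here; `toy_score`, which simply rewards basic residues (K, R) in that window, is a hypothetical stand-in, and the example sequence is made up.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def toy_score(seq):
    """Hypothetical stand-in for a trained PTS1 predictor: counts basic
    residues (K, R) at positions -12..-4 relative to the C-terminus."""
    window = seq[-12:-3]  # position -1 is the last residue
    return sum(1.0 for aa in window if aa in "KR")


def permutation_profile(seq, score_fn, positions=range(4, 13)):
    """For each upstream position -p, substitute all 20 amino acids and return
    the mean score change relative to the wild-type sequence. A strongly
    negative value marks a residue whose loss lowers the score, i.e. a
    predicted targeting enhancer."""
    base = score_fn(seq)
    profile = {}
    for p in positions:
        idx = len(seq) - p
        deltas = [score_fn(seq[:idx] + aa + seq[idx + 1:]) - base for aa in AMINO_ACIDS]
        profile[-p] = sum(deltas) / len(deltas)
    return profile


# Hypothetical C-terminus ending in the non-canonical tripeptide SQM, with K at -5.
profile = permutation_profile("GGGGGGGGGGKGSQM", toy_score)
print(profile[-5], profile[-4])
```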
Conclusions
Our computational and experimental analyses demonstrate that the plant PTS1 tripeptide motif is more diverse than previously thought, including an increasing number of non-canonical sequences and allowed residues. Specific targeting enhancing elements can be predicted for particular sequences of interest and are far more diverse in amino acid composition and positioning than previously assumed. Machine learning methods become indispensable to predict which specific proteins, among numerous candidate proteins carrying the same non-canonical PTS1 tripeptide, contain sufficient enhancer elements in terms of number, positioning and total strength to cause peroxisome targeting.
doi:10.1186/1471-2229-12-142
PMCID: PMC3487989  PMID: 22882975
20.  In silico selection of RNA aptamers 
Nucleic Acids Research  2009;37(12):e87.
In vitro selection of RNA aptamers that bind to a specific ligand usually begins with a random pool of RNA sequences. We propose a computational approach for designing a starting pool of RNA sequences for the selection of RNA aptamers for specific analyte binding. Our approach consists of three steps: (i) selection of RNA sequences based on their secondary structure, (ii) generating a library of three-dimensional (3D) structures of RNA molecules and (iii) high-throughput virtual screening of this library to select aptamers with binding affinity to a desired small molecule. We developed a set of criteria that allows one to select a sequence with potential binding affinity from a pool of random sequences and developed a protocol for RNA 3D structure prediction. As verification, we tested the performance of in silico selection on a set of six known aptamer–ligand complexes. The structures of the native sequences for the ligands in the testing set were among the top 5% of the selected structures. The proposed approach reduces the RNA sequences search space by four to five orders of magnitude—significantly accelerating the experimental screening and selection of high-affinity aptamers.
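Step (i), selecting sequences from a random pool by their secondary structure, can be sketched as follows. The paper's actual selection criteria are not given in the abstract, so this is only an illustration under stated assumptions: Nussinov base-pair maximization (with G-U wobble pairs and a minimum hairpin loop of 3 nt) stands in for a thermodynamic folding algorithm, and the `min_pairs` cutoff and pool parameters are hypothetical.

```python
import random

# Watson-Crick pairs plus G-U wobble pairs.
PAIRS = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}


def nussinov_max_pairs(seq, min_loop=3):
    """Maximum number of nested base pairs (Nussinov dynamic programming),
    requiring at least `min_loop` unpaired nucleotides inside each hairpin."""
    n = len(seq)
    if n == 0:
        return 0
    dp = [[0] * n for _ in range(n)]
    for span in range(min_loop + 1, n):
        for i in range(n - span):
            j = i + span
            best = max(dp[i + 1][j], dp[i][j - 1])  # i or j left unpaired
            if (seq[i], seq[j]) in PAIRS:
                best = max(best, dp[i + 1][j - 1] + 1)  # i pairs with j
            for k in range(i + 1, j):  # split into two independent substructures
                best = max(best, dp[i][k] + dp[k + 1][j])
            dp[i][j] = best
    return dp[0][n - 1]


def random_pool(size, length, seed=0):
    """Hypothetical random starting pool of RNA sequences."""
    rng = random.Random(seed)
    return ["".join(rng.choice("ACGU") for _ in range(length)) for _ in range(size)]


def select_structured(pool, min_pairs):
    """Keep sequences predicted to form at least `min_pairs` base pairs."""
    return [s for s in pool if nussinov_max_pairs(s) >= min_pairs]


candidates = select_structured(random_pool(200, 30, seed=42), min_pairs=10)
print(len(candidates))
```

Only the survivors of such a filter would go on to 3D structure generation and docking, which is where the reduction of the search space comes from.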
doi:10.1093/nar/gkp408
PMCID: PMC2709588  PMID: 19465396
21.  AMMOS: Automated Molecular Mechanics Optimization tool for in silico Screening 
BMC Bioinformatics  2008;9:438.
Background
Virtual or in silico ligand screening combined with other computational methods is one of the most promising methods to search for new lead compounds, thereby greatly assisting the drug discovery process. Despite considerable progress in virtual screening methodologies, available computer programs do not easily address problems such as structural optimization of compounds in a screening library, receptor flexibility/induced fit, and accurate prediction of protein-ligand interactions. It has been shown that structural optimization of chemical compounds and post-docking optimization in multi-step structure-based virtual screening approaches help to further improve the overall efficiency of the methods. To address some of these points, we developed the program AMMOS for refining both the 3D structures of the small molecules present in chemical libraries and the predicted receptor-ligand complexes, allowing partial to full atom flexibility through molecular mechanics optimization.
Results
The program AMMOS carries out an automatic procedure that allows for the structural refinement of compound collections and energy minimization of protein-ligand complexes using the open source program AMMP. The performance of our package was evaluated by comparing the structures of small chemical entities minimized by AMMOS with those minimized with the Tripos and MMFF94s force fields. Next, AMMOS was used for fully flexible minimization of protein-ligand complexes obtained from a multi-step virtual screening. Enrichment studies of the selected pre-docked complexes containing 60% of the initially added inhibitors were carried out with or without final AMMOS minimization on two protein targets having different binding pocket properties. AMMOS was able to improve the enrichment after the pre-docking stage, with 40 to 60% of the initially added active compounds found in the top 3% to 5% of the entire compound collection.
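AMMOS delegates the actual minimization to AMMP and its force field, which the sketch below does not reproduce. As a toy illustration of what energy minimization of atomic coordinates means, assuming a bare Lennard-Jones pair potential with hypothetical parameters ε = σ = 1 and fixed-step steepest descent (a simpler optimizer than production codes typically use):

```python
import math


def lj_energy_and_gradient(coords, epsilon=1.0, sigma=1.0):
    """Total Lennard-Jones energy E = sum 4*eps*((s/r)^12 - (s/r)^6) over atom
    pairs, and its gradient with respect to each atom's Cartesian coordinates."""
    n = len(coords)
    energy = 0.0
    grad = [[0.0, 0.0, 0.0] for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            d = [coords[i][k] - coords[j][k] for k in range(3)]
            r = math.sqrt(sum(c * c for c in d))
            sr6 = (sigma / r) ** 6
            energy += 4 * epsilon * (sr6 * sr6 - sr6)
            # dE/dr for this pair, then the chain rule dr/dx_i = d / r.
            dedr = 4 * epsilon * (-12 * sr6 * sr6 + 6 * sr6) / r
            for k in range(3):
                g = dedr * d[k] / r
                grad[i][k] += g
                grad[j][k] -= g
    return energy, grad


def steepest_descent(coords, step=0.005, tol=1e-9, max_iter=10000):
    """Move every atom against its energy gradient until the largest gradient
    component falls below `tol`; returns the final coordinates and energy."""
    coords = [list(c) for c in coords]
    for _ in range(max_iter):
        _, grad = lj_energy_and_gradient(coords)
        if max(abs(g) for row in grad for g in row) < tol:
            break
        for atom, g in zip(coords, grad):
            for k in range(3):
                atom[k] -= step * g[k]
    return coords, lj_energy_and_gradient(coords)[0]


final, energy = steepest_descent([(0.0, 0.0, 0.0), (1.3, 0.0, 0.0)])
print(math.dist(final[0], final[1]), energy)
```

For two atoms, the minimum lies at a separation of 2^(1/6) σ ≈ 1.122 σ with energy −ε, which gives a simple check that the descent converged.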
Conclusion
The open source AMMOS program can be helpful in a broad range of in silico drug design studies such as optimization of small molecules or energy minimization of pre-docked protein-ligand complexes. Our enrichment study suggests that AMMOS, designed to minimize a large number of ligands pre-docked in a protein target, can successfully be applied in a final post-processing step and that it can take into account some receptor flexibility within the binding site area.
doi:10.1186/1471-2105-9-438
PMCID: PMC2588602  PMID: 18925937
22.  JRC GMO-Matrix: a web application to support Genetically Modified Organisms detection strategies 
BMC Bioinformatics  2014;15(1):417.
Background
The polymerase chain reaction (PCR) is the current state-of-the-art technique for DNA-based detection of Genetically Modified Organisms (GMOs). A typical control strategy starts by analyzing a sample for the presence of target sequences (GM-elements) known to be present in many GMOs. Positive findings from this “screening” are then confirmed with GM (event) specific test methods. A reliable knowledge of which GMOs are detected by combinations of GM-detection methods is thus crucial to minimize the verification efforts.
Description
In this article, we describe a novel platform that links the information of two unique databases built and maintained by the European Union Reference Laboratory for Genetically Modified Food and Feed (EU-RL GMFF) at the Joint Research Centre (JRC) of the European Commission, one containing the sequence information of known GM-events and the other validated PCR-based detection and identification methods. The new platform compiles in silico determinations of the detection of a wide range of GMOs by the available detection methods using existing scripts that simulate PCR amplification and, when present, probe binding. The correctness of the information has been verified by comparing the in silico conclusions to experimental results for a subset of forty-nine GM events and six methods.
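The JRC scripts that simulate PCR amplification are not described here, so the following is not those scripts. It is a minimal exact-match sketch of the idea: find the forward primer and the reverse complement of the reverse primer downstream of it on either strand, keep products up to a hypothetical maximum length `max_len`, and, if a probe is given, require it to bind inside the amplicon in either orientation. Real in silico PCR tools also tolerate mismatches, which this sketch does not.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    return seq.translate(COMPLEMENT)[::-1]


def find_all(seq, sub):
    """Start positions of every (possibly overlapping) exact occurrence."""
    positions, start = [], seq.find(sub)
    while start != -1:
        positions.append(start)
        start = seq.find(sub, start + 1)
    return positions


def simulate_pcr(template, fwd, rev, probe=None, max_len=1000):
    """Return the predicted amplicons (5'->3' on the strand where the forward
    primer matches) for exact primer matches on either template strand."""
    fwd, rev = fwd.upper(), rev.upper()
    rev_site = reverse_complement(rev)  # where the reverse primer anneals
    top = template.upper()
    products = []
    for strand in (top, reverse_complement(top)):
        for f in find_all(strand, fwd):
            for r in find_all(strand, rev_site):
                end = r + len(rev_site)
                if r >= f + len(fwd) and end - f <= max_len:
                    amplicon = strand[f:end]
                    if probe is None or any(
                        p in amplicon for p in (probe.upper(), reverse_complement(probe.upper()))
                    ):
                        products.append(amplicon)
    return products


# Hypothetical toy template, primers and probe.
template = "TTTTACGTACGGGGCCCCTTGCAGAAAA"
print(simulate_pcr(template, "ACGTAC", "CTGCAA", probe="GGGGCCCC"))
```

Running every method against every GM-event sequence in this way yields a detection matrix of the kind the platform compiles.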
Conclusions
The JRC GMO-Matrix is unique in its reliance on DNA sequence data and its flexibility in integrating novel GMOs and new detection methods. Users can mine the database through a set of web interfaces, which provide valuable support to GMO control laboratories in planning and evaluating their GMO screening strategies. The platform is accessible at http://gmo-crl.jrc.ec.europa.eu/jrcgmomatrix/.
doi:10.1186/s12859-014-0417-8
PMCID: PMC4310036  PMID: 25547877
Genetically Modified Organism; Matrix approach; Screening; qPCR
23.  FINDSITEX: A structure based, small molecule virtual screening approach with application to all identified human GPCRs 
Molecular Pharmaceutics  2012;9(6):1775-1784.
We have developed FINDSITEX, an extension of FINDSITE, a protein threading-based algorithm for the inference of protein binding sites, biochemical function and virtual ligand screening. FINDSITEX removes the requirement that holo protein structures (those containing bound ligands) be solved for a sufficiently large set of proteins distantly evolutionarily related to the target; instead, predicted protein structures and experimental ligand binding information are employed. To provide the predicted protein structures, a fast and accurate version of our recently developed TASSERVMT, TASSERVMT-lite, for template-based protein structural modeling of proteins of up to 1000 residues, is developed and tested, with performance comparable to the top CASP9 servers. Then, a hybrid approach that combines structure alignments with an evolutionary similarity score for identifying functional relationships between the target and proteins with binding data has been developed. By way of illustration, FINDSITEX is applied to 998 identified human G-protein coupled receptors (GPCRs). First, TASSERVMT-lite provides updates of all human GPCR structures previously modeled in our lab. We then use these structures and the new function similarity detection algorithm to screen all human GPCRs against the ZINC8 non-redundant (TC<0.7) ligand set combined with ligands from the GLIDA database (a total of 88,949 compounds). When tested on a set of 168 human GPCRs with known binding data (excluding from the binding data library any GPCR with > 30% sequence identity to the target), the average enrichment factor in the top 1% of the compound library (EF0.01) is 22.7, compared with 7.1 for FINDSITE. When only the target and its native ligands are excluded, the average EF0.01 reaches 41.4. We also analyze off-target interactions for the 168-protein test set.
All predicted structures, virtual screening data and off-target interactions for the 998 human GPCRs are available at http://cssb.biology.gatech.edu/skolnick/webservice/gpcr/index.html.
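The EF0.01 values quoted above follow the usual definition of an enrichment factor: the hit rate among the top-ranked fraction x of the library divided by the hit rate of the whole library, so random ranking gives EF ≈ 1. A minimal sketch of that definition; how the authors handled ties or rounding of the top-x cutoff is not stated, so truncating to at least one compound is an assumption, and the compound IDs are hypothetical.

```python
def enrichment_factor(ranked_ids, actives, fraction=0.01):
    """EF_x = (actives in top x / size of top x) / (all actives / library size).

    ranked_ids: compound IDs sorted best-first by the screening score.
    actives: set of IDs known to be active.
    """
    n_total = len(ranked_ids)
    n_actives = sum(1 for cid in ranked_ids if cid in actives)
    if n_total == 0 or n_actives == 0:
        raise ValueError("need a non-empty library containing at least one active")
    n_top = max(1, int(n_total * fraction))  # assumption: truncate, keep >= 1
    hits_top = sum(1 for cid in ranked_ids[:n_top] if cid in actives)
    return (hits_top / n_top) / (n_actives / n_total)


# Hypothetical library of 100 ranked compounds containing 10 actives.
ranked = [f"c{i}" for i in range(100)]
actives = {"c0", "c2", "c4", "c6", "c8", "c50", "c60", "c70", "c80", "c90"}
print(enrichment_factor(ranked, actives, 0.01), enrichment_factor(ranked, actives, 0.10))
```

With 10% of the library active, the maximum possible EF0.01 here is 10; the ceiling grows as actives become rarer, which is why EF values are only comparable across benchmarks with similar active fractions.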
doi:10.1021/mp3000716
PMCID: PMC3396429  PMID: 22574683
TASSERVMT; FINDSITE; GPCR modeling; template-based modeling; virtual screening
24.  In silico Target Fishing for Rationalized Ligand Discovery Exemplified on Constituents of Ruta graveolens 
Planta medica  2008;75(3):195-204.
The identification of targets whose interaction is likely to result in the successful treatment of a disease is of growing interest for natural product scientists. In the current study we performed an exemplary application of a virtual parallel screening approach to identify potential targets for 16 secondary metabolites isolated and identified from the aerial parts of the medicinal plant Ruta graveolens L. Low energy conformers of the isolated constituents were simultaneously screened against a set of 2208 pharmacophore models generated in-house for the in silico prediction of putative biological targets, i. e., target fishing. Based on the predicted ligand-target interactions, we focused on three biological targets, namely acetylcholinesterase (AChE), the human rhinovirus (HRV) coat protein and the cannabinoid receptor type-2 (CB2). For a critical evaluation of the applied parallel screening approach, virtual hits and non-hits were assayed on the respective targets. For AChE the highest scoring virtual hit, arborinine, showed the best inhibitory in vitro activity on AChE (IC50 34.7 μM). Determination of the anti-HRV-2 effect revealed 6,7,8-trimethoxycoumarin and arborinine to be the most active antiviral constituents with IC50 values of 11.98 μM and 3.19 μM, respectively. Of these, arborinine was predicted virtually. Of all the molecules subjected to parallel screening, one virtual CB2 ligand was obtained, i.e., rutamarin. Interestingly, in experimental studies only this compound showed a selective activity to the CB2 receptor (Ki of 7.4 μM) by using a radioligand displacement assay. The applied parallel screening paradigm with constituents of R. graveolens on three different proteins has shown promise as an in silico tool for rational target fishing and pharmacological profiling of extracts and single chemical entities in natural product research.
doi:10.1055/s-0028-1088397
PMCID: PMC3525952  PMID: 19096995
Ruta graveolens L; Rutaceae; pharmacophore modelling; virtual parallel screening; acetylcholinesterase; cannabinoid receptor 2; human rhinovirus coat protein
25.  Imperfect Duplicate Insertions Type of Mutations in Plasmepsin V Modulates Binding Properties of PEXEL Motifs of Export Proteins in Indian Plasmodium vivax 
PLoS ONE  2013;8(3):e60077.
Introduction
Plasmepsin V (PM-V) has functionally conserved orthologues across the Plasmodium genus; its binding and antigenic processing at the PEXEL motifs, required for the export of about 200–300 essential proteins, is important for the virulence and viability of the causative Plasmodium species. This study was undertaken to characterize the PEXEL-motif export pathway of P. vivax plasmepsin V Ind (PvPM-V-Ind) for the export of pathogenicity-related proteins/antigens, and its potential to alter the Plasmodium exportome during erythrocytic stages.
Method
We identify and characterize the Plasmodium vivax plasmepsin-V-Ind (mutant) gene by cloning, sequence analysis, in silico bioinformatic protocols, and structural modeling predictions based on docking studies of its capacity to bind and process PEXEL motifs, in terms of binding and accessibility of export proteins.
Results
Cloning and sequence analysis for genetic diversity demonstrate that the PvPM-V-Ind (mutant) gene is highly conserved among all isolates from different geographical regions of India. Imperfect duplicate insertion types of mutations (SVSE at amino acids 246–249 and SLSE at amino acids 266–269) were identified among all Indian isolates in comparison to the P. vivax Sal-1 (PvPM-V-Sal 1) isolate. In silico bioinformatics interaction studies of the PEXEL peptide and the active enzyme reveal that PvPM-V-Ind (mutant) is only active in the endoplasmic reticulum lumen and that membrane embedding is essential for activation of plasmepsin V. Structural modeling predictions based on docking studies with the PEXEL motif show significant variation in substrate protein binding of these imperfect mutations with data-mined PEXEL sequences. The predicted variation in the docking score and interacting amino acids of PvPM-V-Ind (mutant) proteins with PEXEL and lopinavir suggests a modulation in the activity of PvPM-V in terms of binding and accessibility at these sites.
Conclusion/Significance
Our functional modeled validation of PvPM-V-Ind (mutant) imperfect duplicate insertions with data-mined PEXEL sequences, which lead to altered binding and substrate accessibility of the enzyme, makes it a plausible target for investigating export mechanisms through in silico virtual screening and novel pharmacophore design.
doi:10.1371/journal.pone.0060077
PMCID: PMC3612065  PMID: 23555891
