Related Articles

1.  Bioinformatics in microbial biotechnology – a mini review 
The revolutionary growth in computation speed and memory storage capability has fueled a new era in the analysis of biological data. Hundreds of microbial genomes and many eukaryotic genomes, including a cleaner draft of the human genome, have been sequenced, raising the expectation of better control of microorganisms. The goals are as lofty as the development of rational drugs and antimicrobial agents, development of new enhanced bacterial strains for bioremediation and pollution control, development of better and easy-to-administer vaccines, the development of protein biomarkers for various bacterial diseases, and better understanding of host-bacteria interaction to prevent bacterial infections. In the last decade the development of many new bioinformatics techniques and integrated databases has facilitated the realization of these goals. Current research in bioinformatics can be classified into: (i) genomics – sequencing and comparative study of genomes to identify gene and genome functionality, (ii) proteomics – identification and characterization of protein-related properties and reconstruction of metabolic and regulatory pathways, (iii) cell visualization and simulation to study and model cell behavior, and (iv) application to the development of drugs and anti-microbial agents. In this article, we will focus on the techniques and their limitations in genomics and proteomics. Bioinformatics research can be classified under three major approaches: (1) analysis based upon the available experimental wet-lab data, (2) the use of mathematical modeling to derive new information, and (3) an integrated approach that integrates search techniques with mathematical modeling.
The major impact of bioinformatics research has been automation: of genome sequencing, of the development of integrated genomics and proteomics databases, of genome comparisons to identify genome function, and of the derivation of metabolic pathways. It has also contributed gene expression analysis to derive regulatory pathways; statistical, clustering, and data mining techniques to derive protein-protein and protein-DNA interactions; modeling of the 3D structure of proteins and 3D docking between proteins and biochemicals for rational drug design; difference analysis between pathogenic and non-pathogenic strains to identify candidate genes for vaccines and anti-microbial agents; and whole-genome comparison to understand microbial evolution. The development of bioinformatics techniques has enhanced the pace of biological discovery through automated analysis of a large number of microbial genomes. We are on the verge of using all this knowledge to understand cellular mechanisms at the systemic level. The developed bioinformatics techniques have the potential to facilitate (i) the discovery of causes of diseases, (ii) vaccine and rational drug design, and (iii) improved cost-effective agents for bioremediation by pruning out the dead ends. Despite the fast-paced global effort, the current analysis is limited by the lack of available gene-functionality from the wet-lab data, the lack of computer algorithms to explore vast amounts of data with unknown functionality, limited availability of protein-protein and protein-DNA interactions, and the lack of knowledge of temporal and transient behavior of genes and pathways.
doi:10.1186/1475-2859-4-19
PMCID: PMC1182391  PMID: 15985162
2.  Analysis of multiple compound–protein interactions reveals novel bioactive molecules 
The authors use machine learning of compound-protein interactions to explore drug polypharmacology and to efficiently identify bioactive ligands, including novel scaffold-hopping compounds for two pharmaceutically important protein families: G-protein coupled receptors and protein kinases.
We have demonstrated that machine learning of multiple compound–protein interactions is useful for efficient ligand screening and for assessing drug polypharmacology. This approach successfully identified novel scaffold-hopping compounds for two pharmaceutically important protein families: G-protein-coupled receptors and protein kinases. These bioactive compounds were not detected by existing computational ligand-screening methods in comparative studies. The results of this study indicate that data derived from chemical genomics can be highly useful for exploring chemical space, and this systems biology perspective could accelerate drug discovery processes.
The discovery of novel bioactive molecules advances our systems-level understanding of biological processes and is crucial for innovation in drug development. Perturbations of biological systems by chemical probes provide broader applications not only for analysis of complex systems but also for intentional manipulations of these systems. Nevertheless, the lack of well-characterized chemical modulators has limited their use. Recently, chemical genomics has emerged as a promising area of research applicable to the exploration of novel bioactive molecules, and researchers are currently striving toward the identification of all possible ligands for all target protein families (Wang et al, 2009). Chemical genomics studies have shown that patterns of compound–protein interactions (CPIs) are too diverse to be understood as simple one-to-one events. There is an urgent need to develop appropriate data mining methods for characterizing and visualizing the full complexity of interactions between chemical space and biological systems. However, no existing screening approach has so far succeeded in identifying novel bioactive compounds using multiple interactions among compounds and target proteins.
High-throughput screening (HTS) and computational screening have greatly aided in the identification of early lead compounds for drug discovery. However, the large number of assays required for HTS to identify drugs that target multiple proteins renders this process very costly and time-consuming. Therefore, interest in using in silico strategies for screening has increased. The most common computational approaches, ligand-based virtual screening (LBVS) and structure-based virtual screening (SBVS; Oprea and Matter, 2004; Muegge and Oloff, 2006; McInnes, 2007; Figure 1A), have been used for practical drug development. LBVS aims to identify molecules that are very similar to known active molecules and generally has difficulty identifying compounds with novel structural scaffolds that differ from reference molecules. The other popular strategy, SBVS, is constrained by the number of three-dimensional crystallographic structures available. To circumvent these limitations, we have shown that a new computational screening strategy, chemical genomics-based virtual screening (CGBVS), has the potential to identify novel, scaffold-hopping compounds and assess their polypharmacology by using a machine-learning method to recognize conserved molecular patterns in comprehensive CPI data sets.
The CGBVS strategy used in this study was made up of five steps: CPI data collection, descriptor calculation, representation of interaction vectors, predictive model construction using training data sets, and predictions from test data (Figure 1A). Importantly, step 1, the construction of a data set of chemical structures and protein sequences for known CPIs, did not require the three-dimensional protein structures needed for SBVS. In step 2, compound structures and protein sequences were converted into numerical descriptors. These descriptors were used to construct chemical or biological spaces in which decreasing distance between vectors corresponded to increasing similarity of compound structures or protein sequences. In step 3, we represented multiple CPI patterns by concatenating these chemical and protein descriptors. Using these interaction vectors, we could quantify the similarity of molecular interactions for compound–protein pairs, despite the fact that the ligand and protein similarity maps differed substantially. In step 4, concatenated vectors for CPI pairs (positive samples) and non-interacting pairs (negative samples) were input into an established machine-learning method. In the final step, the classifier constructed using training sets was applied to test data.
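The five steps above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: the two-dimensional descriptors, the compound and protein names, and the choice of logistic regression are all assumptions standing in for real chemical and sequence descriptors and for the unnamed "established machine-learning method".

```python
import math

def interaction_vector(compound_desc, protein_desc):
    """Step 3: concatenate compound and protein descriptors into one vector."""
    return list(compound_desc) + list(protein_desc)

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def train(samples, labels, epochs=500, lr=0.5):
    """Step 4: fit a logistic-regression classifier by stochastic gradient descent.

    Logistic regression is a stand-in chosen for brevity (an assumption).
    """
    w = [0.0] * len(samples[0])
    b = 0.0
    for _ in range(epochs):
        for x, y in zip(samples, labels):
            err = sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b) - y
            w = [wi - lr * err * xi for wi, xi in zip(w, x)]
            b -= lr * err
    return w, b

def score(model, x):
    """Step 5: predicted interaction probability for one compound-protein pair."""
    w, b = model
    return sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)

# Step 2 would compute descriptors from structures and sequences;
# these toy values are hypothetical.
compounds = {"A": [1.0, 0.0], "B": [0.0, 1.0], "C": [0.9, 0.1]}
proteins = {"P": [1.0, 0.0], "Q": [0.0, 1.0]}

# Step 1: known interacting (1) and non-interacting (0) pairs (made up).
pairs = [("A", "P", 1), ("C", "P", 1), ("B", "Q", 0), ("B", "P", 0), ("A", "Q", 0)]
X = [interaction_vector(compounds[c], proteins[p]) for c, p, _ in pairs]
y = [label for _, _, label in pairs]
model = train(X, y)

# An unseen compound D resembles A and C, so D-P should score above D-Q.
d = [0.95, 0.05]
score_dp = score(model, interaction_vector(d, proteins["P"]))
score_dq = score(model, interaction_vector(d, proteins["Q"]))
print(f"D-P: {score_dp:.2f}  D-Q: {score_dq:.2f}")
```

A linear model on concatenated descriptors can only add a compound term to a protein term, so a realistic data set would likely need a nonlinear classifier; the toy data here were chosen to be linearly separable.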
To evaluate the predictive value of CGBVS, we first compared its performance with that of LBVS by fivefold cross-validation. CGBVS performed with considerably higher accuracy (91.9%) than did LBVS (84.4%; Figure 1B). We next compared CGBVS and SBVS in a retrospective virtual screening based on the human β2-adrenergic receptor (ADRB2). Figure 1C shows that CGBVS provided higher hit rates than did SBVS. These results suggest that CGBVS is more successful than conventional approaches for prediction of CPIs.
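For readers unfamiliar with fivefold cross-validation, the sketch below shows how such a split is typically built: the data are divided into five folds, and each fold serves once as the held-out test set while the other four are used for training. The function names are hypothetical, and the study's actual splitting procedure (for example shuffling or stratification) is not described in this summary.

```python
def k_fold_splits(n_samples, k=5):
    """Yield (train_indices, test_indices) for each of k contiguous folds."""
    indices = list(range(n_samples))
    # Spread any remainder over the first folds so sizes differ by at most one.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        test = indices[start:start + size]
        train = indices[:start] + indices[start + size:]
        yield train, test
        start += size

def accuracy(predicted, actual):
    """Fraction of predictions that match the true labels."""
    return sum(p == a for p, a in zip(predicted, actual)) / len(actual)

splits = list(k_fold_splits(12, k=5))
for train_idx, test_idx in splits:
    print("test fold:", test_idx)
```

The reported accuracy would then be the mean of `accuracy` over the five held-out folds.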
We then evaluated the ability of the CGBVS method to predict the polypharmacology of ADRB2 by attempting to identify novel ADRB2 ligands from a group of G-protein-coupled receptor (GPCR) ligands. We ranked the prediction scores for the interactions of 826 reported GPCR ligands with ADRB2 and then analyzed the 50 highest-ranked compounds in greater detail. Of 21 commercially available compounds, 11 showed ADRB2-binding activity and were not previously reported to be ADRB2 ligands. These compounds included ligands not only for aminergic receptors but also for neuropeptide Y-type 1 receptors (NPY1R), which have low protein homology to ADRB2. Most ligands we identified were not detected by LBVS and SBVS, which suggests that only CGBVS could identify this unexpected cross-reaction for a ligand developed to target a peptidergic receptor.
The true value of CGBVS in drug discovery must be tested by assessing whether this method can identify scaffold-hopping lead compounds from a set of compounds that is structurally more diverse. To assess this ability, we analyzed 11 500 commercially available compounds to predict compounds likely to bind to two GPCRs and two protein kinases. Functional assays revealed that nine ADRB2 ligands, three NPY1R ligands, five epidermal growth factor receptor (EGFR) inhibitors, and two cyclin-dependent kinase 2 (CDK2) inhibitors were concentrated in the top-ranked compounds (hit rates of 30, 15, 25, and 10%, respectively). We also evaluated the extent of scaffold hopping achieved in the identification of these novel ligands. One ADRB2 ligand, two NPY1R ligands, and one CDK2 inhibitor exhibited scaffold hopping (Figure 4), indicating that CGBVS can rationally predict novel lead compounds, a crucial and very difficult step in drug discovery. This feature of CGBVS is critically different from existing predictive methods, such as LBVS, which depend on similarities between test and reference ligands, and focus on a single protein or highly homologous proteins. In particular, CGBVS is useful for targets with undefined ligands because this method can use CPIs with target proteins that exhibit lower levels of homology.
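The hit rates quoted above can be cross-checked with simple arithmetic. The sketch below assumes that hit rate means confirmed hits divided by compounds assayed for each target; the number of compounds assayed is not stated in this summary, so the values it prints are inferred, not reported.

```python
# (confirmed hits, reported hit rate) per target, taken from the text above.
reported = {
    "ADRB2": (9, 0.30),
    "NPY1R": (3, 0.15),
    "EGFR": (5, 0.25),
    "CDK2": (2, 0.10),
}

def implied_assayed(hits, hit_rate):
    """Number of assayed compounds implied by hits / hit_rate (an inference)."""
    return round(hits / hit_rate)

implied = {target: implied_assayed(h, r) for target, (h, r) in reported.items()}
for target, n in implied.items():
    print(f"{target}: about {n} compounds assayed")
```

Under that assumption the figures are mutually consistent with roughly 30 top-ranked compounds tested for ADRB2 and 20 for each of the other three targets.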
In summary, we have demonstrated that data mining of multiple CPIs is of great practical value for exploration of chemical space. As a predictive model, CGBVS could provide an important step in the discovery of multi-target drugs by identifying the group of proteins targeted by a particular ligand, leading to innovation in pharmaceutical research.
The discovery of novel bioactive molecules advances our systems-level understanding of biological processes and is crucial for innovation in drug development. For this purpose, the emerging field of chemical genomics is currently focused on accumulating large assay data sets describing compound–protein interactions (CPIs). Although new target proteins for known drugs have recently been identified through mining of CPI databases, using these resources to identify novel ligands remains unexplored. Herein, we demonstrate that machine learning of multiple CPIs can not only assess drug polypharmacology but can also efficiently identify novel bioactive scaffold-hopping compounds. Through a machine-learning technique that uses multiple CPIs, we have successfully identified novel lead compounds for two pharmaceutically important protein families, G-protein-coupled receptors and protein kinases. These novel compounds were not identified by existing computational ligand-screening methods in comparative studies. The results of this study indicate that data derived from chemical genomics can be highly useful for exploring chemical space, and this systems biology perspective could accelerate drug discovery processes.
doi:10.1038/msb.2011.5
PMCID: PMC3094066  PMID: 21364574
chemical genomics; data mining; drug discovery; ligand screening; systems chemical biology
3.  Protein-Protein Docking with Dynamic Residue Protonation States 
PLoS Computational Biology  2014;10(12):e1004018.
Protein-protein interactions depend on a host of environmental factors. Local pH conditions influence the interactions through the protonation states of the ionizable residues that can change upon binding. In this work, we present a pH-sensitive docking approach, pHDock, that can sample side-chain protonation states of five ionizable residues (Asp, Glu, His, Tyr, Lys) on-the-fly during the docking simulation. pHDock produces successful local docking funnels in approximately half (79/161) the protein complexes, including 19 cases where standard RosettaDock fails. pHDock also performs better than the two control cases comprising docking at pH 7.0 or using fixed, predetermined protonation states. On average, the top-ranked pHDock structures have lower interface RMSDs and recover more native interface residue-residue contacts and hydrogen bonds compared to RosettaDock. Addition of backbone flexibility using a computationally-generated conformational ensemble further improves native contact and hydrogen bond recovery in the top-ranked structures. Although pHDock is designed to improve docking, it also successfully predicts a large pH-dependent binding affinity change in the Fc–FcRn complex, suggesting that it can be exploited to improve affinity predictions. The approaches in the study contribute to the goal of structural simulations of whole-cell protein-protein interactions including all the environmental factors, and they can be further expanded for pH-sensitive protein design.
Author Summary
Protein-protein interactions are fundamental for biological function and are strongly influenced by their local environment. Cellular pH is tightly controlled and is one of the critical environmental factors that regulates protein-protein interactions. Three-dimensional structures of the protein complexes can help us understand the mechanism of the interactions. Since experimental determination of the structures of protein-protein complexes is expensive and time-consuming, computational docking algorithms are helpful to predict the structures. However, none of the current protein-protein docking algorithms account for the critical environmental pH effects. So we developed a pH-sensitive docking algorithm that can dynamically pick the favorable protonation states of the ionizable amino-acid residues. Compared to our previous standard docking algorithm, the new algorithm improves docking accuracy and generates higher-quality predictions over a large dataset of protein-protein complexes. We also use a case study to demonstrate efficacy of the algorithm in predicting a large pH-dependent binding affinity change that cannot be captured by the other methods that neglect pH effects. In principle, the approaches in the study can be used for rational design of pH-dependent protein inhibitors or industrial enzymes that are active over a wide range of pH values.
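To see why protonation states matter near physiological pH, it helps to compute how the protonated fraction of each ionizable side chain changes with pH using the Henderson–Hasselbalch relation. The sketch below is background chemistry only and is not the pHDock sampling algorithm; the pKa values are approximate textbook figures for isolated side chains, assumed here for illustration, and real values shift with the protein environment.

```python
# Approximate side-chain pKa values (assumed for illustration only).
APPROX_PKA = {"Asp": 4.0, "Glu": 4.4, "His": 6.3, "Tyr": 10.0, "Lys": 10.4}

def fraction_protonated(pka, ph):
    """Henderson-Hasselbalch: fraction of the group carrying its proton at a given pH."""
    return 1.0 / (1.0 + 10.0 ** (ph - pka))

for ph in (6.0, 7.4):
    fractions = {res: fraction_protonated(pka, ph) for res, pka in APPROX_PKA.items()}
    summary = ", ".join(f"{res} {frac:.2f}" for res, frac in fractions.items())
    print(f"pH {ph}: {summary}")
```

With these assumed values, histidine is the residue whose protonated fraction changes most between pH 6.0 and pH 7.4, which is why fixed protonation states chosen at pH 7.0 can misrepresent an interface at a different local pH.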
doi:10.1371/journal.pcbi.1004018
PMCID: PMC4263365  PMID: 25501663
4.  SnugDock: Paratope Structural Optimization during Antibody-Antigen Docking Compensates for Errors in Antibody Homology Models 
PLoS Computational Biology  2010;6(1):e1000644.
High-resolution structures of antibody-antigen complexes are useful for analyzing the binding interface and for making rational choices in antibody engineering. When a crystallographic structure of a complex is unavailable, the structure must be predicted using computational tools. In this work, we illustrate a novel approach, named SnugDock, to predict high-resolution antibody-antigen complex structures by simultaneously structurally optimizing the antibody-antigen rigid-body positions, the relative orientation of the antibody light and heavy chains, and the conformations of the six complementarity determining region loops. This approach is especially useful when the crystal structure of the antibody is not available, requiring allowances for inaccuracies in an antibody homology model which would otherwise frustrate rigid-backbone docking predictions. Local docking using SnugDock with the lowest-energy RosettaAntibody homology model produced more accurate predictions than standard rigid-body docking. SnugDock can be combined with ensemble docking to mimic conformer selection and induced fit, resulting in increased sampling of diverse antibody conformations. The combined algorithm produced four medium (Critical Assessment of PRediction of Interactions, CAPRI, rating) and seven acceptable lowest-interface-energy predictions in a test set of fifteen complexes. Structural analysis shows that diverse paratope conformations are sampled, but docked paratope backbones are not necessarily closer to the crystal structure conformations than the starting homology models. The accuracy of SnugDock predictions suggests a new genre of general docking algorithms with flexible binding interfaces targeted towards making homology models useful for further high-resolution predictions.
Author Summary
Antibodies are proteins that are key elements of the immune system and increasingly used as drugs. Antibodies bind tightly and specifically to antigens to block their activity or to mark them for destruction. Three-dimensional structures of the antibody-antigen complexes are useful for understanding their mechanism and for designing improved antibody drugs. Experimental determination of structures is laborious and not always possible, so we have developed tools to predict structures of antibody-antigen complexes computationally. Computer-predicted models of antibodies, or homology models, typically have errors which can frustrate algorithms for prediction of protein-protein interfaces (docking), and result in incorrect predictions. Here, we have created and tested a new docking algorithm which incorporates flexibility to overcome structural errors in the antibody structural model. The algorithm allows both intramolecular and interfacial flexibility in the antibody during docking, resulting in improved accuracy approaching that when using experimentally determined antibody structures. Structural analysis of the predicted binding region of the complex will enable the protein engineer to make rational choices for better antibody drug designs.
doi:10.1371/journal.pcbi.1000644
PMCID: PMC2800046  PMID: 20098500
5.  Inhibition of the NEMO/IKKβ association complex formation, a novel mechanism associated with the NF-κB activation suppression by Withania somnifera’s key metabolite withaferin A 
BMC Genomics  2010;11(Suppl 4):S25.
Background
Nuclear Factor kappa B (NF-κB) is a transcription factor involved in the regulation of cell signaling responses and is a key regulator of cellular processes involved in the immune response, differentiation, cell proliferation, and apoptosis. The constitutive activation of NF-κB contributes to multiple cellular outcomes and pathophysiological conditions such as rheumatoid arthritis, asthma, inflammatory bowel disease, AIDS and cancer. Inhibition of the NF-κB signalling pathway therefore holds great therapeutic potential against these chronic ailments. Withania somnifera, a reputed herb in ayurvedic medicine, contains a large number of steroidal lactones known as withanolides, which show a plethora of pharmacological activities, including anti-inflammatory, antitumor, antibacterial, antioxidant, anticonvulsive, and immunosuppressive effects. Though a few studies have reported the effect of WA (withaferin A) on suppression of NF-κB activation, the mechanism behind it remains unclear. The study conducted here is an attempt to explore the NF-κB signalling pathway modulating capability of Withania somnifera’s major constituent WA and to elucidate its possible mode of action using molecular docking and molecular dynamics simulation studies.
Results
Formation of the active IKK (IκB kinase) complex comprising NEMO (NF-κB Essential Modulator) and IKKβ subunits is one of the essential steps of the NF-κB signalling pathway, and its non-assembly can help prevent the above-mentioned disorders. As observed from our semi-flexible docking analysis, WA forms strong intermolecular interactions with the NEMO chains, thus building steric as well as thermodynamic barriers to the incoming IKKβ subunits, which in turn pave way to naive complex formation capability of NEMO with IKKβ. Docking of WA into the active NEMO/IKKβ complex using flexible docking, in which key residues of the complex were kept flexible, also suggests disruption of the active complex. Thus the molecular docking analysis of WA into NEMO and the active NEMO/IKKβ complex conducted in this study provides significant evidence in support of the proposed mechanism: suppression of NF-κB activation through inhibition or disruption of active NEMO/IKKβ complex formation, that is, non-assembly of the catalytically active NEMO/IKKβ complex. Results from the molecular dynamics simulations in water show that the trajectories of the native protein and the protein complexed with WA are stable over the simulated period of 2.6 ns.
Conclusions
NF-κB is one of the most attractive topics in current biological, biochemical, and pharmacological research, and in recent years the number of studies focusing on its inhibition/regulation has increased manifold. Small ligands (both natural and synthetic) are gaining particular attention in this context. Our computational analysis provided a rationalization of the ability of naturally occurring withaferin A to alter the NF-κB signalling pathway along with its proposed mode of inhibition of the pathway. The absence of the active IKK multisubunit complex would prevent degradation of IκB proteins, as the IκB proteins would not get phosphorylated by IKK. This would ultimately prevent the release of NF-κB and its subsequent translocation to the nucleus, thus arresting its harmful effects. In conclusion, our results strongly suggest that withaferin A is a potent anticancer agent, as indicated by its potent NF-κB modulating capability. Moreover, the present MD simulations made clear the dynamic structural stability of NEMO/IKKβ in complex with the drug WA, together with the inhibitory mechanism.
doi:10.1186/1471-2164-11-S4-S25
PMCID: PMC3005936  PMID: 21143809
6.  High performance transcription factor-DNA docking with GPU computing 
Proteome Science  2012;10(Suppl 1):S17.
Background
Protein-DNA docking is a very challenging problem in structural bioinformatics and has important implications in a number of applications, such as structure-based prediction of transcription factor binding sites and rational drug design. Protein-DNA docking is computationally very demanding due to the high cost of energy calculation and the statistical nature of conformational sampling algorithms. More importantly, experiments show that the docking quality depends on the coverage of the conformational sampling space. It is therefore desirable to accelerate the computation of the docking algorithm, not only to reduce computing time, but also to improve docking quality.
Methods
In an attempt to accelerate the sampling process and to improve the docking performance, we developed a graphics processing unit (GPU)-based protein-DNA docking algorithm. The algorithm employs a potential-based energy function to describe the binding affinity of a protein-DNA pair, and integrates Monte-Carlo simulation and a simulated annealing method to search through the conformational space. Algorithmic techniques were developed to improve the computation efficiency and scalability on GPU-based high performance computing systems.
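The combination of Monte Carlo moves with simulated annealing described above can be illustrated on a toy problem. The sketch below is a generic CPU version under stated assumptions: `toy_energy` is a hypothetical two-parameter function standing in for the knowledge-based protein-DNA potential, a "pose" is just a pair of numbers, and the geometric cooling schedule is one common choice, not necessarily the one the authors used.

```python
import math
import random

def toy_energy(pose):
    """Hypothetical stand-in for a docking energy; its minimum is at (1.0, -2.0)."""
    x, y = pose
    return (x - 1.0) ** 2 + (y + 2.0) ** 2

def metropolis_accept(delta, temperature, rng):
    """Accept downhill moves always, uphill moves with Boltzmann probability."""
    return delta <= 0 or rng.random() < math.exp(-delta / temperature)

def anneal(energy, start, steps=5000, t_start=5.0, t_end=0.01, step_size=0.5, seed=0):
    """Monte Carlo search with a geometrically decreasing temperature."""
    rng = random.Random(seed)
    cooling = (t_end / t_start) ** (1.0 / steps)
    current, current_e = start, energy(start)
    best, best_e = current, current_e
    temperature = t_start
    for _ in range(steps):
        # Propose a random perturbation of the current pose.
        candidate = tuple(c + rng.uniform(-step_size, step_size) for c in current)
        candidate_e = energy(candidate)
        if metropolis_accept(candidate_e - current_e, temperature, rng):
            current, current_e = candidate, candidate_e
            if current_e < best_e:
                best, best_e = current, current_e
        temperature *= cooling
    return best, best_e

best_pose, best_energy = anneal(toy_energy, start=(8.0, 8.0))
print(best_pose, best_energy)
```

Because independent runs started from different seeds or poses do not depend on one another, this kind of search parallelizes naturally, which is one plausible reason it maps well onto GPU hardware.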
Results
The effectiveness of our approach was tested on a non-redundant set of 75 TF-DNA complexes and a newly developed TF-DNA docking benchmark. We demonstrated that the GPU-based docking algorithm can significantly accelerate the simulation process and thereby improve the chance of finding near-native TF-DNA complex structures. This study also suggests that further improvement in protein-DNA docking research would require efforts in two integral aspects: improvement in computation efficiency and energy function design.
Conclusions
We present a high performance computing approach for improving the prediction accuracy of protein-DNA docking. The GPU-based docking algorithm accelerates the search of the conformational space and thus increases the chance of finding more near-native structures. To the best of our knowledge, this is the first ad hoc effort of applying GPU or GPU clusters to the protein-DNA docking problem.
doi:10.1186/1477-5956-10-S1-S17
PMCID: PMC3380734  PMID: 22759575
7.  Membrane protein structure determination — The next generation 
Biochimica et Biophysica Acta  2014;1838(1):78-87.
The field of Membrane Protein Structural Biology has grown significantly since its first landmark in 1985 with the first three-dimensional atomic resolution structure of a membrane protein. Nearly twenty-six years later, the crystal structure of the beta2 adrenergic receptor in complex with G protein has contributed to another landmark in the field leading to the 2012 Nobel Prize in Chemistry. At present, more than 350 unique membrane protein structures solved by X-ray crystallography (http://blanco.biomol.uci.edu/mpstruc/exp/list, Stephen White Lab at UC Irvine) are available in the Protein Data Bank. The advent of genomics and proteomics initiatives combined with high-throughput technologies, such as automation, miniaturization, integration and third-generation synchrotrons, has enhanced membrane protein structure determination rate. X-ray crystallography is still the only method capable of providing detailed information on how ligands, cofactors, and ions interact with proteins, and is therefore a powerful tool in biochemistry and drug discovery. Yet the growth of membrane protein crystals suitable for X-ray diffraction studies amazingly remains a fine art and a major bottleneck in the field. It is often necessary to apply as many innovative approaches as possible. In this review we draw attention to the latest methods and strategies for the production of suitable crystals for membrane protein structure determination. In addition we also highlight the impact that third-generation synchrotron radiation has made in the field, summarizing the latest strategies used at synchrotron beamlines for screening and data collection from such demanding crystals. This article is part of a Special Issue entitled: Structural and biophysical characterisation of membrane protein-ligand binding.
Highlights
• Overview of the most recent advances regarding the growth of membrane protein crystals
• Rational design of new crystallization screens for membrane proteins
• New automated method for dehydration of membrane proteins
• High-throughput approach in seeding of membrane protein crystals
• Recent developments in membrane protein structure determination
doi:10.1016/j.bbamem.2013.07.010
PMCID: PMC3898769  PMID: 23860256
Membrane protein; Crystal dehydration; Crystal seeding; Macromolecular crystallography; In situ data collection; XFEL
8.  SCOWLP update: 3D classification of protein-protein, -peptide, -saccharide and -nucleic acid interactions, and structure-based binding inferences across folds 
BMC Bioinformatics  2011;12:398.
Background
Protein interactions are essential for coordinating cellular functions. Proteomic studies have already elucidated a huge number of protein-protein interactions that require detailed functional analysis. Understanding the structural basis of each individual interaction through structure determination is necessary, yet unfeasible. Therefore, computational tools able to predict protein binding regions and recognition modes are required to rationalize putative molecular functions for proteins. With this aim, we previously created SCOWLP, a structural classification of protein binding regions at protein family level, based on the information obtained from high-resolution 3D protein-protein and protein-peptide complexes.
Description
We present here a new version of SCOWLP that has been enhanced by the inclusion of protein-nucleic acid and protein-saccharide interactions. SCOWLP takes interfacial solvent into account for a detailed characterization of protein interactions. In addition, the binding regions obtained per protein family have been enriched by the inclusion of predicted binding regions, which have been inferred from structurally related proteins across all existing folds. These inferences might become very useful to suggest novel recognition regions and compare structurally similar interfaces from different families.
Conclusions
The updated SCOWLP has new functionalities that allow both detection and comparison of protein regions recognizing different types of ligands, which include other proteins, peptides, nucleic acids and saccharides, within a solvated environment. Currently, SCOWLP allows the analysis of predicted protein binding regions based on structure-based inferences across fold space. These predictions may have a unique potential in assisting protein docking, in providing insights into protein interaction networks, and in guiding rational engineering of protein ligands. The newly designed SCOWLP web application has an improved user-friendly interface that facilitates its usage, and is available at http://www.scowlp.org.
doi:10.1186/1471-2105-12-398
PMCID: PMC3210135  PMID: 21992011
9.  Towards the prediction of protein interaction partners using physical docking 
Prediction of physical protein-protein interactions represents a key challenge in computational systems biology. This study provides a proof-of-principle that high-throughput in silico protein docking results can be used to predict interaction partners.
Deciphering the whole network of protein interactions for a given proteome (‘interactome’) is the goal of many experimental and computational efforts in Systems Biology. Separately, the prediction of the structure of protein complexes by docking methods is a well-established scientific area. To date, docking programs have not been used to predict interaction partners. We provide a proof of principle for such an approach. Using a set of protein complexes representing known interactors in their unbound form, we show that a standard docking program can distinguish the true interactors from a background of 922 non-redundant potential interactors. We additionally show that true interactions can be distinguished from non-likely interacting proteins within the same structural family. Our approach may be put in the context of the proposed ‘funnel-energy model’; the docking algorithm may not find the native complex, but it distinguishes binding partners because of the higher probability of favourable models compared with a collection of non-binders. The potential exists to develop this proof of principle into new approaches for predicting interaction partners and reconstructing biological networks.
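The idea that true partners yield a higher proportion of favourable docked models can be sketched as a simple ranking procedure. Everything in the example below is hypothetical: the scores are made up, lower scores are assumed to be better, and the fixed threshold is an illustrative choice, since the statistic actually used in the study is not given in this summary.

```python
def favourable_fraction(model_scores, threshold):
    """Fraction of docked models scoring at or below the threshold (lower is better)."""
    return sum(s <= threshold for s in model_scores) / len(model_scores)

def rank_partners(scores_by_candidate, threshold):
    """Order candidate partners from most to least enriched in favourable models."""
    return sorted(
        scores_by_candidate,
        key=lambda name: favourable_fraction(scores_by_candidate[name], threshold),
        reverse=True,
    )

# Made-up docking scores for one receptor against three candidate partners.
docking_scores = {
    "true_partner": [-12.0, -9.5, -8.0, -3.0, -1.0],
    "decoy_1": [-6.0, -4.0, -3.5, -2.0, -1.0],
    "decoy_2": [-8.5, -5.0, -2.5, -2.0, -0.5],
}

ranking = rank_partners(docking_scores, threshold=-8.0)
print(ranking)
```

Note that the ranking uses the whole distribution of models for each candidate, so a partner can be identified even when no single docked model reproduces the native complex.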
doi:10.1038/msb.2011.3
PMCID: PMC3063693  PMID: 21326236
interactome; protein docking; protein–protein interaction
10.  Pushing Structural Information into the Yeast Interactome by High-Throughput Protein Docking Experiments 
PLoS Computational Biology  2009;5(8):e1000490.
The last several years have seen the consolidation of high-throughput proteomics initiatives to identify and characterize protein interactions and macromolecular complexes in model organisms. In particular, more than 10,000 high-confidence protein-protein interactions have been described between the roughly 6,000 proteins encoded in the budding yeast genome (Saccharomyces cerevisiae). Unfortunately, high-resolution three-dimensional structures are available for fewer than one hundred of these interacting pairs. Here, we expand this structural information on yeast protein interactions by running the first-ever high-throughput docking experiment with some of the best state-of-the-art methodologies, according to our benchmarks. To increase the coverage of the interaction space, we also explore the possibility of using homology models of varying quality in the docking experiments, instead of experimental structures, and assess how it would affect the global performance of the methods. In total, we have applied the docking procedure to 217 experimental structures and 1,023 homology models, providing putative structural models for over 3,000 protein-protein interactions in the yeast interactome. Finally, we analyze in detail the structural models obtained for the interaction between SAM1-anthranilate synthase complex and the MET30-RNA polymerase III to illustrate how our predictions can be straightforwardly used by the scientific community. The results of our experiment will be integrated into the general 3D-Repertoire pipeline, a European initiative to solve the structures of as many protein complexes in yeast as possible at the best possible resolution. All docking results are available at http://gatealoy.pcb.ub.es/HT_docking/.
Author Summary
Proteins are the main perpetrators of most biological processes. However, they seldom act alone, and most cellular functions are, in fact, carried out by large macromolecular complexes and regulated through intricate protein-protein interaction networks. Consequently, large efforts have been devoted to unveil protein interrelationships in a high-throughput manner, and the last several years have seen the consecution of the first interactome drafts for several model organisms. Unfortunately, these studies only reveal whether two proteins interact, but not the molecular bases of these interactions. A full comprehension of how proteins bind and form complexes can only come from high-resolution, three-dimensional (3D) structures, since they provide the key quasi-atomic details necessary to understand how the individual components in a complex or pathway are assembled and coordinated to function as a molecular unit. Here, we use protein docking experiments, in a high-throughput manner, to predict the 3D structure of over 3,000 interactions in yeast, which will be used to complement the complex structures obtained within the 3D-Repertoire pan-European initiative (http://www.3drepertoire.org).
doi:10.1371/journal.pcbi.1000490
PMCID: PMC2722787  PMID: 19714207
11.  Computational Discovery of Putative Leads for Drug Repositioning through Drug-Target Interaction Prediction 
PLoS Computational Biology  2016;12(11):e1005219.
De novo experimental drug discovery is an expensive and time-consuming task. It requires the identification of drug-target interactions (DTIs) towards targets of biological interest, either to inhibit or enhance a specific molecular function. Dedicated computational models for protein simulation and DTI prediction are crucial for speed and to reduce the costs associated with DTI identification. In this paper we present a computational pipeline that enables the discovery of putative leads for drug repositioning that can be applied to any microbial proteome, as long as the interactome of interest is at least partially known. Network metrics calculated for the interactome of the bacterial organism of interest were used to identify putative drug-targets. Then, a random forest classification model for DTI prediction was constructed using known DTI data from publicly available databases, resulting in an area under the ROC curve of 0.91 for classification of out-of-sampling data. A drug-target network was created by combining 3,081 unique ligands and the expected ten best drug targets. This network was used to predict new DTIs and to calculate the probability of the positive class, allowing the scoring of the predicted instances. Molecular docking experiments were performed on the best scoring DTI pairs and the results were compared with those of the same ligands with their original targets. The results obtained suggest that the proposed pipeline can be used in the identification of new leads for drug repositioning. The proposed classification model is available at http://bioinformatics.ua.pt/software/dtipred/.
Author Summary
The emergence of multi-resistant bacterial strains and the existing void in the discovery and development of new classes of antibiotics is a growing concern. Indeed, some bacterial strains are now resistant to last-line antibiotics and considered untreatable. Drug repositioning has been suggested as a strategy to reduce the time and cost required for a drug to reach the market, compared to traditional drug design. Drug-target interactions (DTIs) are the basis of rational drug design and thus, we proposed a computational approach to predict DTIs solely based on the primary sequence of the protein and the simplified molecular-input line-entry system of the ligand. In addition, network metrics are used to identify vital putative drug-targets in bacteria. Molecular docking experiments were performed to compare the binding affinities between a given ligand and a putative drug-target with those between the same ligand and its original target. According to the docking results, the predicted DTIs show binding affinities similar to or better than those of the ligands with their original targets, supporting the validity of the proposed model.
doi:10.1371/journal.pcbi.1005219
PMCID: PMC5125559  PMID: 27893735
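The area under the ROC curve reported above (0.91) can be read as the probability that a randomly chosen true interaction is scored higher than a randomly chosen non-interaction. The following is a minimal sketch of that calculation using the pairwise (Mann-Whitney) formulation; the labels and scores are made-up illustration data, not results from the paper, and the function name `roc_auc` is hypothetical.

```python
def roc_auc(labels, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    Ties between a positive and a negative score count as half a win.
    """
    positives = [s for l, s in zip(labels, scores) if l == 1]
    negatives = [s for l, s in zip(labels, scores) if l == 0]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Hypothetical held-out predictions: 1 = known DTI, 0 = non-interacting pair.
labels = [1, 1, 0, 0, 1, 0]
scores = [0.9, 0.8, 0.7, 0.3, 0.6, 0.2]  # predicted probability of the positive class
auc = roc_auc(labels, scores)
print(f"AUC = {auc:.3f}")
```

The pairwise form is quadratic in the number of examples, which is fine for illustration; a rank-based implementation would be preferable for large test sets.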
12.  Calculating an optimal box size for ligand docking and virtual screening against experimental and predicted binding pockets 
Background
Computational approaches have emerged as an instrumental methodology in modern research. For example, virtual screening by molecular docking is routinely used in computer-aided drug discovery. One of the critical parameters for ligand docking is the size of a search space used to identify low-energy binding poses of drug candidates. Currently available docking packages often come with a default protocol for calculating the box size; however, many of these procedures have not been systematically evaluated.
Methods
In this study, we investigate how the docking accuracy of AutoDock Vina is affected by the selection of a search space. We propose a new procedure for calculating the optimal docking box size that maximizes the accuracy of binding pose prediction against a non-redundant and representative dataset of 3,659 protein-ligand complexes selected from the Protein Data Bank. Subsequently, we use the Directory of Useful Decoys, Enhanced to demonstrate that the optimized docking box size also yields an improved ranking in virtual screening. Binding pockets in both datasets are derived from the experimental complex structures and, additionally, predicted by eFindSite.
Results
A systematic analysis of ligand binding poses generated by AutoDock Vina shows that the highest accuracy is achieved when the dimensions of the search space are 2.9 times larger than the radius of gyration of a docking compound. Subsequent virtual screening benchmarks demonstrate that this optimized docking box size also improves compound ranking. For instance, using predicted ligand binding sites, the average enrichment factor calculated for the top 1 % (10 %) of the screening library is 8.20 (3.28) for the optimized protocol, compared to 7.67 (3.19) for the default procedure. Depending on the evaluation metric, the optimal docking box size gives better ranking in virtual screening for about two-thirds of target proteins.
Conclusions
This fully automated procedure can be used to optimize docking protocols in order to improve the ranking accuracy in production virtual screening simulations. Importantly, the optimized search space systematically yields better results than the default method not only for experimental pockets, but also for those predicted from protein structures. A script for calculating the optimal docking box size is freely available at www.brylinski.org/content/docking-box-size.
Graphical Abstract
We developed a procedure to optimize the box size in molecular docking calculations. Left panel shows the predicted binding pose of NADP (green sticks) compared to the experimental complex structure of human aldose reductase (blue sticks) using a default protocol. Right panel shows the docking accuracy using an optimized box size.
doi:10.1186/s13321-015-0067-5
PMCID: PMC4468813  PMID: 26082804
Molecular docking; AutoDock Vina; Docking protocols; Ligand binding site prediction; Ligand virtual screening; Docking box size; Search space
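The rule reported in the Results (search-space dimensions of 2.9 times the ligand's radius of gyration) can be illustrated with a short calculation. This is a sketch under stated assumptions, not the authors' script: it uses an unweighted radius of gyration over atom coordinates, a cubic box, and hypothetical function names and coordinates.

```python
import math


def radius_of_gyration(coords):
    """Unweighted radius of gyration (same units as the input, e.g. angstroms)."""
    n = len(coords)
    cx = sum(p[0] for p in coords) / n
    cy = sum(p[1] for p in coords) / n
    cz = sum(p[2] for p in coords) / n
    mean_sq = sum(
        (x - cx) ** 2 + (y - cy) ** 2 + (z - cz) ** 2 for x, y, z in coords
    ) / n
    return math.sqrt(mean_sq)


def docking_box_edge(coords, scale=2.9):
    """Edge length of a cubic search space as scale * radius of gyration.

    The factor 2.9 is the value reported in the abstract; using one edge
    length for all three dimensions is an assumption of this sketch.
    """
    return scale * radius_of_gyration(coords)


# Hypothetical ligand: four atoms placed 1 angstrom from the origin.
ligand = [(1.0, 0.0, 0.0), (-1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (0.0, -1.0, 0.0)]
edge = docking_box_edge(ligand)
print(f"Rg = {radius_of_gyration(ligand):.2f}, box edge = {edge:.2f}")
```

In practice the coordinates would come from the ligand's structure file, and the resulting edge length would be passed to the docking program as the size of the search space centred on the binding pocket.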
14.  Choosing the Optimal Rigid Receptor for Docking and Scoring in the CSAR 2013/2014 Experiment 
The 2013/2014 Community Structure–Activity Resource (CSAR) challenge was designed to prospectively validate advancement in the field of docking and scoring receptor–small molecule interactions. Purely computational methods have been found to be quite limiting. Thus, the challenges assessed methods that combined both experimental data and computational approaches. Here, we describe our contribution to solve three important challenges in rational drug discovery: rank-ordering protein primary sequences based on affinity to a compound, determining close-to-native bound conformations out of a set of decoy poses, and rank-ordering sets of congeneric compounds based on affinity to a given protein. We showed that the most significant contribution to a meaningful enrichment of native-like models was the identification of the best receptor structure for docking and scoring. Depending on the target, the optimal receptor for cross-docking and scoring was identified by a self-consistent docking approach that used the Vina scoring function, by aligning compounds to the closest cocrystal or by selecting the cocrystal receptor with the largest pocket. For tRNA (m1G37) methyltransferase (TRMD), ranking a set of 31 congeneric binding compounds cross-docked to the optimal receptor resulted in an R² of 0.67, whereas using any other of the 13 receptor structures led to almost no enrichment of native-like complex structures. Furthermore, although redocking predicted lower RMSDs relative to the bound structures, the ranking based on multiple receptor structures did not improve the correlation coefficient. Our predictions highlight the role of rational structure-based modeling in maximizing the outcome of virtual screening, as well as the limitations of scoring multiple receptors.
doi:10.1021/acs.jcim.5b00338
PMCID: PMC4744803  PMID: 26222931
15.  Insights into the Interactions of Fasciola hepatica Cathepsin L3 with a Substrate and Potential Novel Inhibitors through In Silico Approaches 
PLoS Neglected Tropical Diseases  2015;9(5):e0003759.
Background
Fasciola hepatica is the causative agent of fascioliasis, a disease affecting grazing animals, causing economic losses in global agriculture and currently being an important human zoonosis. Overuse of chemotherapeutics against fascioliasis has increased the populations of drug resistant parasites. F. hepatica cathepsin L3 is a protease that plays important roles during the life cycle of fluke. Due to its particular collagenolytic activity it is considered an attractive target against the infective phase of F. hepatica.
Methodology/Principal Findings
Starting with a three dimensional model of FhCL3 we performed a structure-based design of novel inhibitors through a computational study that combined virtual screening, molecular dynamics simulations, and binding free energy (ΔGbind) calculations. Virtual screening was carried out by docking inhibitors obtained from the MYBRIDGE-HitFinder database inside FhCL3 and human cathepsin L substrate-binding sites. On the basis of dock-scores, five compounds were predicted as selective inhibitors of FhCL3. Molecular dynamic simulations were performed and, subsequently, an end-point method was employed to predict ΔGbind values. Two compounds with the best ΔGbind values (-10.68 kcal/mol and -7.16 kcal/mol), comparable to that of the positive control (-10.55 kcal/mol), were identified. A similar approach was followed to structurally and energetically characterize the interface of FhCL3 in complex with a peptidic substrate. Finally, through pair-wise and per-residue free energy decomposition we identified residues that are critical for the substrate/ligand binding and for the enzyme specificity.
Conclusions/Significance
The present study is the first computer-aided drug design approach against F. hepatica cathepsins. Here we predict the principal determinants of binding of FhCL3 in complex with a natural substrate by detailed energetic characterization of protease interaction surface. We also propose novel compounds as FhCL3 inhibitors. Overall, these results will foster the future rational design of new inhibitors against FhCL3, as well as other F. hepatica cathepsins.
Author Summary
Fascioliasis is considered an emerging disease in humans, causing important losses in global agriculture through the infection of livestock animals. The emergence of resistant parasites has intensified the search for new drugs which may contribute to disease control. In recent decades, Fasciola cathepsins (FhCs) have been defined as the principal virulence factors of this parasite. Despite being in the same protein family, they have different specificities and, thus, distinct roles throughout the fluke life cycle. Differences in specificity have been attributed to a few variations in the sequence of key FhCs subsites. Currently, the structure-based drug design of inhibitors against Fasciola cathepsin Ls (FhCLs) with unknown structures is possible due to the availability of the three-dimensional structure of FhCL1. Our detailed structural analysis of the major infective juvenile enzyme (FhCL3) identifies the molecular determinants for protein binding. Also, novel potential inhibitors against FhCL3 are proposed, which might reduce host invasion and penetration processes. These compounds are predicted to interact with the binding site of the enzyme and could therefore prevent substrate processing by competitive inhibition. The structure-based drug design strategy described here will be useful for the development of new potent and selective inhibitors against other FhCs.
doi:10.1371/journal.pntd.0003759
PMCID: PMC4433193  PMID: 25978322
16.  An Integrated Framework Advancing Membrane Protein Modeling and Design 
PLoS Computational Biology  2015;11(9):e1004398.
Membrane proteins are critical functional molecules in the human body, constituting more than 30% of open reading frames in the human genome. Unfortunately, a myriad of difficulties in overexpression and reconstitution into membrane mimetics severely limit our ability to determine their structures. Computational tools are therefore instrumental to membrane protein structure prediction, consequently increasing our understanding of membrane protein function and their role in disease. Here, we describe a general framework facilitating membrane protein modeling and design that combines the scientific principles for membrane protein modeling with the flexible software architecture of Rosetta3. This new framework, called RosettaMP, provides a general membrane representation that interfaces with scoring, conformational sampling, and mutation routines that can be easily combined to create new protocols. To demonstrate the capabilities of this implementation, we developed four proof-of-concept applications for (1) prediction of free energy changes upon mutation; (2) high-resolution structural refinement; (3) protein-protein docking; and (4) assembly of symmetric protein complexes, all in the membrane environment. Preliminary data show that these algorithms can produce meaningful scores and structures. The data also suggest needed improvements to both sampling routines and score functions. Importantly, the applications collectively demonstrate the potential of combining the flexible nature of RosettaMP with the power of Rosetta algorithms to facilitate membrane protein modeling and design.
Author Summary
Over 30% of the human proteome consists of proteins embedded in biological membranes. These proteins are critical in many processes such as transport of materials in and out of the cell and transmitting signals to other cells in the body. They are implicated in a large number of diseases; in fact, they are targeted by over 50% of pharmaceutical drugs on the market. Since the membrane environment makes experimental structure determination extremely difficult, there is a need for alternative, computational approaches. Here, we describe a new framework, RosettaMP, for computational modeling and design of membrane protein structures, integrated in the Rosetta3 software suite. This framework includes a set of tools for representing the membrane bilayer, moving the protein, altering its sequence, and estimating free energies. We demonstrate tools to predict the effects of mutations, refine atomic details of protein structures, simulate protein binding, and assemble symmetric complexes, all in the membrane bilayer. Taken together, these applications demonstrate the potential of RosettaMP to facilitate membrane protein structure prediction and design, enabling us to understand the function of these proteins and their role in human disease.
doi:10.1371/journal.pcbi.1004398
PMCID: PMC4556676  PMID: 26325167
17.  CPORT: A Consensus Interface Predictor and Its Performance in Prediction-Driven Docking with HADDOCK 
PLoS ONE  2011;6(3):e17695.
Background
Macromolecular complexes are the molecular machines of the cell. Knowledge at the atomic level is essential to understand and influence their function. However, their number is huge and a significant fraction is extremely difficult to study using classical structural methods such as NMR and X-ray crystallography. Therefore, the importance of large-scale computational approaches in structural biology is evident. This study combines two of these computational approaches, interface prediction and docking, to obtain atomic-level structures of protein-protein complexes, starting from their unbound components.
Methodology/Principal Findings
Here we combine six interface prediction web servers into a consensus method called CPORT (Consensus Prediction Of interface Residues in Transient complexes). We show that CPORT gives more stable and reliable predictions than each of the individual predictors on its own. A protocol was developed to integrate CPORT predictions into our data-driven docking program HADDOCK. For cases where experimental information is limited, this prediction-driven docking protocol presents an alternative to ab initio docking, the docking of complexes without the use of any information. Prediction-driven docking was performed on a large and diverse set of protein-protein complexes in a blind manner. Our results indicate that the performance of the HADDOCK-CPORT combination is competitive with ZDOCK-ZRANK, a state-of-the-art ab initio docking/scoring combination. Finally, the original interface predictions could be further improved by interface post-prediction (contact analysis of the docking solutions).
Conclusions/Significance
The current study shows that blind, prediction-driven docking using CPORT and HADDOCK is competitive with ab initio docking methods. This is encouraging since prediction-driven docking represents the absolute bottom line for data-driven docking: any additional biological knowledge will greatly improve the results obtained by prediction-driven docking alone. Finally, the fact that original interface predictions could be further improved by interface post-prediction suggests that prediction-driven docking has not yet been pushed to the limit. A web server for CPORT is freely available at http://haddock.chem.uu.nl/services/CPORT.
doi:10.1371/journal.pone.0017695
PMCID: PMC3064578  PMID: 21464987
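The idea of merging several interface predictors into one consensus can be sketched with a simple vote count over residue numbers. This is only an illustration: the abstract does not state how CPORT combines its six servers, so the vote threshold, the server names and the residue lists below are all assumptions.

```python
from collections import Counter


def consensus_interface(predictions, min_votes):
    """Return residues predicted as interface by at least `min_votes` predictors.

    `predictions` maps a predictor name to the residue numbers it flags.
    A plain vote threshold is an assumption of this sketch, not the
    published CPORT combination scheme.
    """
    votes = Counter()
    for residues in predictions.values():
        votes.update(set(residues))  # each predictor votes once per residue
    return sorted(res for res, count in votes.items() if count >= min_votes)


# Hypothetical output of three interface prediction servers.
predictions = {
    "server_a": [10, 11, 12, 45],
    "server_b": [11, 12, 13],
    "server_c": [12, 45, 80],
}
consensus = consensus_interface(predictions, min_votes=2)
print(consensus)
```

A consensus list of this kind is what a data-driven docking program would then take as restraints on which residues should be at the interface.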
18.  Large-Scale Off-Target Identification Using Fast and Accurate Dual Regularized One-Class Collaborative Filtering and Its Application to Drug Repurposing 
PLoS Computational Biology  2016;12(10):e1005135.
Target-based screening is one of the major approaches in drug discovery. Besides the intended target, unexpected drug off-target interactions often occur, and many of them have not been recognized and characterized. The off-target interactions can be responsible for either therapeutic or side effects. Thus, identifying the genome-wide off-targets of lead compounds or existing drugs will be critical for designing effective and safe drugs, and providing new opportunities for drug repurposing. Although many computational methods have been developed to predict drug-target interactions, they are either less accurate than the one that we are proposing here or computationally too intensive, thereby limiting their capability for large-scale off-target identification. In addition, the performances of most machine learning based algorithms have been mainly evaluated to predict off-target interactions in the same gene family for hundreds of chemicals. It is not clear how these algorithms perform in terms of detecting off-targets across gene families on a proteome scale. Here, we are presenting a fast and accurate off-target prediction method, REMAP, which is based on a dual regularized one-class collaborative filtering algorithm, to explore continuous chemical space, protein space, and their interactome on a large scale. When tested in a reliable, extensive, and cross-gene family benchmark, REMAP outperforms the state-of-the-art methods. Furthermore, REMAP is highly scalable. It can screen a dataset of 200 thousand chemicals against 20 thousand proteins within 2 hours. Using the reconstructed genome-wide target profile as the fingerprint of a chemical compound, we predicted that seven FDA-approved drugs can be repurposed as novel anti-cancer therapies. The anti-cancer activity of six of them is supported by experimental evidence. Thus, REMAP is a valuable addition to the existing in silico toolbox for drug target identification, drug repurposing, phenotypic screening, and side effect prediction. The software and benchmark are available at https://github.com/hansaimlim/REMAP.
Author Summary
High-throughput techniques have generated vast amounts of diverse omics and phenotypic data. However, these sets of data have not yet been fully explored to improve the effectiveness and efficiency of drug discovery, a process which has traditionally adopted a one-drug-one-gene paradigm. Consequently, the cost of bringing a drug to market is astounding and the failure rate is daunting. The failure of the target-based drug discovery is in large part due to the fact that a drug rarely interacts only with its intended receptor, but also generally binds to other receptors. To rationally design potent and safe therapeutics, we need to identify all the possible cellular proteins interacting with a drug in an organism. Existing experimental techniques are not sufficient to address this problem, and will benefit from computational modeling. However, it is a daunting task to reliably screen millions of chemicals against hundreds of thousands of proteins. Here, we introduce a fast and accurate method REMAP for large-scale predictions of drug-target interactions. REMAP outperforms state-of-the-art algorithms in terms of both speed and accuracy, and has been successfully applied to drug repurposing. Thus, REMAP may have broad applications in drug discovery.
doi:10.1371/journal.pcbi.1005135
PMCID: PMC5055357  PMID: 27716836
19.  DOCLASP - Docking ligands to target proteins using spatial and electrostatic congruence extracted from a known holoenzyme and applying simple geometrical transformations 
F1000Research  2016;3:262.
The ability to accurately and effectively predict the interaction between proteins and small drug-like compounds has long intrigued researchers for pedagogic, humanitarian and economic reasons. Protein docking methods (AutoDock, GOLD, DOCK, FlexX and Glide to name a few) rank a large number of possible conformations of protein-ligand complexes using fast algorithms. Previously, it has been shown that structural congruence leading to the same enzymatic function necessitates the congruence of electrostatic properties (CLASP). The current work presents a methodology for docking a ligand into a target protein, provided that there is at least one known holoenzyme with ligand bound - DOCLASP (Docking using CLASP). The contact points of the ligand in the holoenzyme define a motif, which is used to query the target enzyme using CLASP. If there are significant matches, the holoenzyme and the target protein are superimposed based on congruent atoms. The same linear and rotational transformations are also applied to the ligand, thus creating a unified coordinate framework having the holoenzyme, the ligand and the target enzyme. In the current work, the dipeptidyl peptidase-IV inhibitor vildagliptin was docked to the PI-PLC structure complexed with myo-inositol using DOCLASP. Also, corroboration of the docking of phenylthiourea to the modelled structure of polyphenol oxidase (JrPPO1) from walnut is provided based on the subsequently solved structure of JrPPO1 (PDBid:5CE9). Analysis of the binding of the antitrypanosomal drug suramin to nine non-homologous proteins in the PDB database shows a diverse set of binding motifs, and multiple binding sites in the phospholipase A2-like proteins from the Bothrops genus of pitvipers. The conformational changes in the suramin molecule on binding highlight the challenges in docking flexible ligands into an already ’plastic’ binding site. Thus, DOCLASP presents a method for ’soft docking’ ligands to proteins with low computational requirements.
doi:10.12688/f1000research.5145.3
PMCID: PMC4934513  PMID: 27429737
protein, docking ligand, congruence
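The step in which the same rotation and translation used to superimpose the holoenzyme onto the target are also applied to the ligand can be sketched as follows. This is a minimal illustration, assuming the rigid transform has already been derived from the superposition; the rotation about the z axis, the translation vector and the ligand coordinates are hypothetical, and the helper names are not part of DOCLASP.

```python
import math


def rotation_about_z(angle_deg):
    """3x3 rotation matrix for a rotation of `angle_deg` degrees about the z axis."""
    a = math.radians(angle_deg)
    return [
        [math.cos(a), -math.sin(a), 0.0],
        [math.sin(a), math.cos(a), 0.0],
        [0.0, 0.0, 1.0],
    ]


def apply_rigid_transform(points, rotation, translation):
    """Rotate each point, then translate it: p' = R p + t."""
    transformed = []
    for p in points:
        transformed.append(
            tuple(
                sum(rotation[i][j] * p[j] for j in range(3)) + translation[i]
                for i in range(3)
            )
        )
    return transformed


# Hypothetical transform obtained from superimposing holoenzyme onto target.
R = rotation_about_z(90.0)
t = (1.0, 0.0, 0.0)

# Hypothetical ligand atoms in the holoenzyme's coordinate frame.
ligand_in_holo = [(1.0, 0.0, 0.0), (0.0, 2.0, 0.5)]
ligand_in_target = apply_rigid_transform(ligand_in_holo, R, t)
print(ligand_in_target)
```

Because the ligand moves rigidly with the holoenzyme, its internal geometry is unchanged; only its placement relative to the target protein is new.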
20.  Text Mining for Protein Docking 
PLoS Computational Biology  2015;11(12):e1004630.
The rapidly growing amount of publicly available information from biomedical research is readily accessible on the Internet, providing a powerful resource for predictive biomolecular modeling. The accumulated data on experimentally determined structures transformed structure prediction of proteins and protein complexes. Instead of exploring the enormous search space, predictive tools can simply proceed to the solution based on similarity to the existing, previously determined structures. A similar major paradigm shift is emerging due to the rapidly expanding amount of information, other than experimentally determined structures, which still can be used as constraints in biomolecular structure prediction. Automated text mining has been widely used in recreating protein interaction networks, as well as in detecting small ligand binding sites on protein structures. Combining and expanding these two well-developed areas of research, we applied text mining to structural modeling of protein-protein complexes (protein docking). Protein docking can be significantly improved when constraints on the docking mode are available. We developed a procedure that retrieves published abstracts on a specific protein-protein interaction and extracts information relevant to docking. The procedure was assessed on protein complexes from Dockground (http://dockground.compbio.ku.edu). The results show that correct information on binding residues can be extracted for about half of the complexes. The amount of irrelevant information was reduced by conceptual analysis of a subset of the retrieved abstracts, based on the bag-of-words (features) approach. Support Vector Machine models were trained and validated on the subset. The remaining abstracts were filtered by the best-performing models, which decreased the irrelevant information for ~25% of the complexes in the dataset. The extracted constraints were incorporated in the docking protocol and tested on the Dockground unbound benchmark set, significantly increasing the docking success rate.
Author Summary
Protein interactions are central to many cellular processes. Physical characterization of these interactions is essential for understanding life processes and for applications in biology and medicine. Because of the inherent limitations of experimental techniques and rapid development of computational power and methodology, computer modeling is a tool of choice in many studies. Publicly available information from biomedical research is readily accessible on the Internet, providing a powerful resource for modeling of proteins and protein complexes. A major paradigm shift in modeling of protein complexes is emerging due to the rapidly expanding amount of such information, which can be used as modeling constraints. Text mining has been widely used in recreating networks of protein interactions, as well as in detecting small molecule binding sites on proteins. Combining and expanding these two well-developed areas of research, we applied text mining to physical modeling of protein complexes (protein docking). Our procedure retrieves published abstracts on a protein-protein interaction and extracts the relevant information. The results show that correct information on binding can be obtained for about half of protein complexes. The extracted constraints were incorporated in a modeling procedure, significantly improving its performance.
doi:10.1371/journal.pcbi.1004630
PMCID: PMC4674139  PMID: 26650466
21.  Design of Multi-Specificity in Protein Interfaces 
PLoS Computational Biology  2007;3(8):e164.
Interactions in protein networks may place constraints on protein interface sequences to maintain correct and avoid unwanted interactions. Here we describe a “multi-constraint” protein design protocol to predict sequences optimized for multiple criteria, such as maintaining sets of interactions, and apply it to characterize the mechanism and extent to which 20 multi-specific proteins are constrained by binding to multiple partners. We find that multi-specific binding is accommodated by at least two distinct patterns. In the simplest case, all partners share key interactions, and sequences optimized for binding to either single or multiple partners recover only a subset of native amino acid residues as optimal. More interestingly, for signaling interfaces functioning as network “hubs,” we identify a different, “multi-faceted” mode, where each binding partner prefers its own subset of wild-type residues within the promiscuous binding site. Here, integration of preferences across all partners results in sequences much more “native-like” than seen in optimization for any single binding partner alone, suggesting these interfaces are substantially optimized for multi-specificity. The two strategies make distinct predictions for interface evolution and design. Shared interfaces may be better small molecule targets, whereas multi-faceted interactions may be more “designable” for altered specificity patterns. The computational methodology presented here is generalizable for examining how naturally occurring protein sequences have been selected to satisfy a variety of positive and negative constraints, as well as for rationally designing proteins to have desired patterns of altered specificity.
Author Summary
Computational methods have recently led to remarkable successes in the design of molecules with novel functions. These approaches offer great promise for creating highly selective molecules to accurately control biological processes. However, to reach these goals modeling procedures are needed that are able to define the optimal “fitness” of a protein to function correctly within complex biological networks and in the context of many possible interaction partners. To make progress toward these goals, we describe a computational design procedure that predicts protein sequences optimized to bind not only to a single protein but also to a set of target interaction partners. Application of the method to characterize “hub” proteins in cellular interaction networks gives insights into the mechanisms nature has used to tune protein surfaces to recognize multiple correct partner proteins. Our study also provides a starting point to engineer designer molecules that could modulate or replace naturally occurring protein interaction networks to combat misregulation in disease or to build new sets of protein interactions for synthetic biology.
doi:10.1371/journal.pcbi.0030164
PMCID: PMC1950952  PMID: 17722975
22.  Extrapolating the effect of deleterious nsSNPs in the binding adaptability of flavopiridol with CDK7 protein: a molecular dynamics approach 
Human Genomics  2013;7(1):10.
Background
Recent reports suggest that nonsynonymous single nucleotide polymorphisms (nsSNPs) in the cyclin-dependent kinase 7 (CDK7) gene are associated with defects in the DNA repair mechanism that may contribute to cancer risk. Among the various inhibitors developed so far, flavopiridol proved to be a potential antitumor drug in the phase-III clinical trial for chronic lymphocytic leukemia. Here, we describe a theoretical assessment for the discovery of new drugs or drug targets in the CDK7 protein, based on the changes caused by deleterious nsSNPs.
Methods
Three nsSNPs (I63R, H135R, and T285M) were predicted to have an impact on protein function by SIFT, PolyPhen2, I-Mutant3, PANTHER, SNPs&GO, PhD-SNP, and screening for non-acceptable polymorphisms (SNAP). Furthermore, we analyzed the native and proposed mutant models in atomic-level 10 ns simulations using the molecular dynamics (MD) approach. Finally, with the aid of Autodock 4.0 and PatchDock, we analyzed the binding efficacy of flavopiridol with the CDK7 protein with respect to the deleterious mutations.
Results
By comparing the results of all seven prediction tools, three nsSNPs (I63R, H135R, and T285M) were predicted to have an impact on protein function. The protein stability analysis indicated that I63R and H135R exhibited a lower root mean square deviation than the native and T285M proteins. The flexibility of all three mutant models of the CDK7 protein differs from that of the native protein. Following that, the docking study revealed a change in the active site residues and a decrease in the binding affinity of flavopiridol with the mutant proteins.
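The comparison across prediction tools can be pictured as a simple vote. The following is a minimal sketch under stated assumptions, not the study's actual pipeline: the function name `consensus_deleterious`, the tool labels, and the variant labels are placeholders, and the abstract does not say how the seven tool outputs were combined. Here a variant is kept only if at least `min_votes` tools flag it, with agreement of all tools as the default.

```python
def consensus_deleterious(predictions, min_votes=None):
    """Return variants flagged as deleterious by at least `min_votes` tools.

    predictions: mapping of tool name -> set of variants that tool flags.
    min_votes:   required number of agreeing tools (default: every tool).
    """
    if min_votes is None:
        min_votes = len(predictions)
    votes = {}
    for flagged in predictions.values():
        for variant in flagged:
            votes[variant] = votes.get(variant, 0) + 1
    return sorted(v for v, n in votes.items() if n >= min_votes)


# Placeholder tool and variant labels; these are not results from the study.
toy_predictions = {
    "tool_a": {"X1", "X2"},
    "tool_b": {"X1", "X2"},
    "tool_c": {"X1", "X3"},
}
unanimous = consensus_deleterious(toy_predictions)
majority = consensus_deleterious(toy_predictions, min_votes=2)
```

Requiring unanimity is the strictest choice; lowering `min_votes` trades specificity for sensitivity.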
Conclusion
This theoretical approach is entirely based on computational methods, which has the ability to identify the disease-related SNPs in complex disorders by contrasting their costs and capabilities with those of the experimental methods. The identification of disease related SNPs by computational methods has the potential to create personalized tools for the diagnosis, prognosis, and treatment of diseases.
Lay abstract
The cell cycle regulatory protein CDK7 is linked with the DNA repair mechanism, which can contribute to cancer risk. The main aim of this study is to extrapolate the relationship between the nsSNPs and their effects on drug-binding capability. In this work, we propose a new methodology which (1) efficiently identified, using computational tools, the deleterious nsSNPs that tend to affect protein function upon mutation, (2) analyzed the native protein and proposed mutant models at the atomic level using the MD approach, and (3) investigated the protein-ligand interactions to analyze the binding ability by docking analysis. This theoretical approach is entirely based on computational methods, which has the ability to identify the disease-related SNPs in complex disorders by contrasting their costs and capabilities with those of the experimental methods. Overall, this approach has the potential to create personalized tools for the diagnosis, prognosis, and treatment of diseases.
doi:10.1186/1479-7364-7-10
PMCID: PMC3726351  PMID: 23561625
nsSNPs; CDK7; Flavopiridol; Molecular dynamics; Docking
23.  High-Performance Drug Discovery: Computational Screening by Combining Docking and Molecular Dynamics Simulations 
PLoS Computational Biology  2009;5(10):e1000528.
Virtual compound screening using molecular docking is widely used in the discovery of new lead compounds for drug design. However, this method is not completely reliable and therefore unsatisfactory. In this study, we used massive molecular dynamics simulations of protein-ligand conformations obtained by molecular docking in order to improve the enrichment performance of molecular docking. Our screening approach employed the molecular mechanics/Poisson-Boltzmann and surface area method to estimate the binding free energies. For the top-ranking 1,000 compounds obtained by docking to a target protein, approximately 6,000 molecular dynamics simulations were performed using multiple docking poses in about a week. As a result, the enrichment performance of the top 100 compounds by our approach was improved by 1.6–4.0 times that of the enrichment performance of molecular dockings. This result indicates that the application of molecular dynamics simulations to virtual screening for lead discovery is both effective and practical. However, further optimization of the computational protocols is required for screening various target proteins.
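The abstract does not define its enrichment measure, so the sketch below assumes one common definition, the enrichment factor: the fraction of actives among the top-ranked compounds divided by the fraction of actives in the whole library. The function name `enrichment_factor` and the toy ranking are illustrative assumptions and do not reproduce the paper's data.

```python
def enrichment_factor(ranked_is_active, top_n):
    """Enrichment factor for the first `top_n` entries of a ranked compound list.

    ranked_is_active: list of booleans, best-scored compound first;
                      True marks a known active compound.
    """
    total = len(ranked_is_active)
    actives = sum(ranked_is_active)
    if actives == 0 or not 0 < top_n <= total:
        raise ValueError("need at least one active and 0 < top_n <= library size")
    hit_rate_top = sum(ranked_is_active[:top_n]) / top_n
    hit_rate_all = actives / total
    return hit_rate_top / hit_rate_all


# Toy ranking of 10 compounds with 3 actives (made-up values).
toy_ranking = [True, True, False, False, False, False, False, True, False, False]
ef_top2 = enrichment_factor(toy_ranking, top_n=2)
```

Under this definition, comparing two scoring methods amounts to taking the ratio of their enrichment factors at the same cutoff, for example the top 100 compounds.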
Author Summary
Lead discovery is one of the most important processes in rational drug design. To improve the rate of the detection of lead compounds, various technologies such as high-throughput screening and combinatorial chemistry have been introduced into the pharmaceutical industry. However, since these technologies alone may not improve lead productivity, computational screening has become important. A central method for computational screening is molecular docking. This method generally docks many flexible ligands to a rigid protein and predicts the binding affinity for each ligand in a practical time. However, its ability to detect lead compounds is less reliable. In contrast, molecular dynamics simulations can treat both proteins and ligands in a flexible manner, directly estimate the effect of explicit water molecules, and provide more accurate binding affinity, although their computational costs and times are significantly greater than those of molecular docking. Therefore, we developed a special purpose computer “MDGRAPE-3” for molecular dynamics simulations and applied it to computational screening. In this paper, we report an effective method for computational screening; this method is a combination of molecular docking and massive-scale molecular dynamics simulations. The proposed method showed a higher and more stable enrichment performance than the molecular docking method used alone.
doi:10.1371/journal.pcbi.1000528
PMCID: PMC2746282  PMID: 19816553
24.  Designing Inhibitors of M2 Proton Channel against H1N1 Swine Influenza Virus 
PLoS ONE  2010;5(2):e9388.
Background
The M2 proton channel of the H1N1 influenza A virus is the target protein of the anti-flu drugs amantadine and rimantadine. However, these two once-powerful adamantane-based drugs have lost 90% of their bioactivity because of viral mutations over the past twenty years. The NMR structure of the M2 channel protein determined by Schnell and Chou (Nature, 2008, 451, 591–595) may help to solve the drug-resistance problem and to develop more powerful new drugs against the H1N1 influenza virus.
Methodology
Docking calculations are performed to build the complex structures between the M2 proton channel receptor and ligands, including the existing drugs amantadine and rimantadine and two newly designed inhibitors. Computer-aided drug design methods are used to calculate the binding free energies, and computational biology techniques are used to analyze the interactions between the M2 proton channel and adamantane-based inhibitors.
Conclusions
1) The NMR structure of the M2 proton channel provides a reliable structural basis for rational drug design against influenza virus. 2) The channel gating mechanism and the inhibiting mechanism of the M2 proton channel, revealed by the NMR structure, provide new ideas for channel inhibitor design. 3) The newly designed adamantane-based inhibitors, based on the modeled structure of the H1N1-M2 proton channel, have two pharmacophore groups, which act like a “barrel hoop”, holding two adjacent helices of the H1N1-M2 tetramer from outside the channel. 4) Inhibitors with such a binding mechanism may overcome the resistance of influenza A virus to the adamantane-based drugs.
doi:10.1371/journal.pone.0009388
PMCID: PMC2826421  PMID: 20186344
25.  WISDOM-II: Screening against multiple targets implicated in malaria using computational grid infrastructures 
Malaria Journal  2009;8:88.
Background
Despite continuous efforts of the international community to reduce the impact of malaria on developing countries, no significant progress has been made in recent years, and the discovery of new drugs is needed more than ever. Out of the many proteins involved in the metabolic activities of the Plasmodium parasite, some are promising targets for rational drug discovery.
Motivation
Recent years have witnessed the emergence of grids, which are highly distributed computing infrastructures particularly well fitted for embarrassingly parallel computations like docking. In 2005, a first attempt at using grids for large-scale virtual screening focused on plasmepsins and ended up in the identification of previously unknown scaffolds, which were confirmed in vitro to be active plasmepsin inhibitors. Following this success, a second deployment took place in the fall of 2006 focussing on one well known target, dihydrofolate reductase (DHFR), and on a new promising one, glutathione-S-transferase.
Methods
In silico drug design, especially vHTS, is a widely used and well-accepted technology in lead identification and lead optimization. This approach therefore builds upon the progress made in computational chemistry to achieve more accurate in silico docking and in information technology to design and operate large-scale grid infrastructures.
Results
On the computational side, a sustained infrastructure has been developed: docking at large scale, different strategies for result analysis, on-the-fly storage of the results in MySQL databases, and application of molecular dynamics refinement with MM-PBSA and MM-GBSA rescoring. The modeling results obtained are very promising. Based on the modeling results, in vitro tests are underway for all the targets against which screening was performed.
Conclusion
The current paper describes the rational drug discovery activity at large scale, especially molecular docking using FlexX software on computational grids, in finding hits against three different targets (PfGST, PfDHFR, and PvDHFR (wild type and mutant forms)) implicated in malaria. A grid-enabled virtual screening approach is proposed to produce focused compound libraries for other biological targets relevant to fighting the infectious diseases of the developing world.
doi:10.1186/1475-2875-8-88
PMCID: PMC2691744  PMID: 19409081
