A total of 74 morphologically distinct bacterial colonies were selected during the isolation of bacteria from different parts of the tomato plant (rhizoplane, phylloplane and rhizosphere) as well as from nearby bulk soil. The isolates were screened for plant growth promoting (PGP) traits such as production of indole acetic acid, siderophores, chitinase and hydrogen cyanide, as well as phosphate solubilization. Seven isolates, viz., NR4, NR6, RP3, PP1, RS4, RP6 and NR1, that exhibited multiple PGP traits were identified by morphological, biochemical and 16S rRNA gene sequence analysis as species belonging to four genera: Aeromonas, Pseudomonas, Bacillus and Enterobacter. All seven isolates were positive for 1-aminocyclopropane-1-carboxylate deaminase. Isolate NR6 was antagonistic to Fusarium solani and Fusarium moniliforme, and isolates PP1 and RP6 were both antagonistic to F. moniliforme. All isolates except RP6 adhered significantly to a glass surface, suggestive of biofilm formation. Seed bacterization of tomato, groundnut, sorghum and chickpea with the seven bacterial isolates resulted in varied growth responses in a laboratory assay on half-strength Murashige and Skoog medium. Most of the tomato isolates positively influenced tomato growth, whereas the growth response of groundnut, sorghum and chickpea was either neutral or negative. Overall, the results suggest that bacteria with PGP traits do not positively influence the growth of all plants, and that certain PGP bacteria may exhibit host specificity. Among the isolates that positively influenced tomato growth (NR1, RP3, PP1, RS4 and RP6), only RS4 was isolated from the tomato rhizosphere. Therefore, effective PGP bacteria can also be isolated from zones other than the rhizosphere or rhizoplane of a plant.
Electronic supplementary material
The online version of this article (doi:10.1007/s12088-014-0470-z) contains supplementary material, which is available to authorized users.
PGPR; Host specificity; Rhizosphere; Tomato; Biofilm
Rapid evolution and high sequence diversity enable Human Immunodeficiency Virus (HIV) populations to acquire mutations that escape antiretroviral drugs and host immune responses, and are thus major obstacles to the control of the pandemic. One strategy to overcome this problem is to focus drugs and vaccines on regions of the viral genome in which mutations are likely to cripple function through destabilization of viral proteins. Studies relying on sequence conservation alone have had only limited success in identifying critically important regions. We tested the ability of two structure-based computational models to identify sites in the HIV-1 capsid protein (CA) that would be refractory to mutational change. The destabilizing mutations predicted by these models were rarely found in a database of 5811 HIV-1 CA coding sequences, and none was present at a frequency greater than 2%. Furthermore, 90% of the variants with low predicted stability (from a set of 184 CA variants whose replication fitness or infectivity has been studied in vitro) had aberrant capsid structures and reduced viral infectivity. Based on the predicted stability, we identified 45 CA sites prone to destabilizing mutations. More than half of these sites are targets of one or more known CA inhibitors. The CA regions enriched with these sites also overlap with peptides shown to induce cellular immune responses associated with lower viral loads in infected individuals. Lastly, a joint scoring metric that takes into account both sequence conservation and protein structural stability performed better at identifying deleterious mutations than either sequence conservation or structural stability information alone. The computational sequence–structure stability approach proposed here might therefore be useful for identifying immutable sites in a protein for experimental validation as potential targets for drug and vaccine development.
HIV-1; capsid protein (CA); point mutation modeling; protein structural stability prediction; tolerated sequence space; destabilizing mutations; J0101
The Computational Analysis of Novel Drug Opportunities (CANDO) platform (http://protinfo.org/cando) uses the similarity of compound–proteome interaction signatures to infer homology of compound/drug behavior. We constructed interaction signatures for 3733 human ingestible compounds against 48,278 protein structures, mapping to 2030 indications, using basic science methodologies developed by us and others to predict and analyze protein structure, function, and interactions. Our signature comparison and ranking approach yielded benchmarking accuracies of 12–25% for the 1439 indications with at least two approved compounds. We prospectively validated 49 of 82 ‘high value’ predictions from nine studies covering seven indications, with activity comparable to or better than that of existing drugs; these serve as novel repurposed therapeutics. Our approach may be generalized to compounds beyond those approved by the FDA, and it can also consider mutations in protein structures to enable personalization. Our platform provides a holistic multiscale modeling framework of complex atomic, molecular, and physiological systems with broader applications in medicine and engineering.
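The signature comparison and benchmarking idea can be illustrated with a minimal sketch. It assumes that each compound's signature is a list of predicted interaction scores over the same ordered set of proteins, compares signatures by root-mean-square difference, and counts a benchmark hit when another compound approved for the same indication ranks within the top k. The function names, the distance metric, and the toy data are hypothetical and are not taken from the CANDO implementation.

```python
import math

def signature_distance(a, b):
    """Root-mean-square difference between two interaction signatures
    (equal-length lists of predicted interaction scores, one per protein)."""
    if len(a) != len(b) or not a:
        raise ValueError("signatures must be non-empty and cover the same proteins")
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)) / len(a))

def rank_similar(query, signatures):
    """Other compounds ordered from most to least similar to `query`."""
    ref = signatures[query]
    others = [name for name in signatures if name != query]
    return sorted(others, key=lambda name: signature_distance(ref, signatures[name]))

def topk_accuracy(signatures, indications, k):
    """Fraction of (compound, indication) pairs in which another compound
    approved for the same indication appears among the top-k most similar.
    Indications with fewer than two approved compounds are skipped."""
    hits = total = 0
    for drugs in indications.values():
        if len(drugs) < 2:
            continue
        for drug in drugs:
            top = set(rank_similar(drug, signatures)[:k])
            total += 1
            if any(other in top for other in drugs if other != drug):
                hits += 1
    return hits / total if total else 0.0

# Toy data: four hypothetical compounds scored against three proteins
sigs = {
    "A": [0.90, 0.10, 0.80],
    "B": [0.85, 0.15, 0.75],
    "C": [0.10, 0.90, 0.20],
    "D": [0.20, 0.80, 0.10],
}
inds = {"ind1": ["A", "B"], "ind2": ["C", "D"], "ind3": ["A"]}
print(topk_accuracy(sigs, inds, k=1))  # 1.0
```

Because accuracy is computed only for indications with at least two approved compounds, the sketch mirrors the restriction stated above; a real benchmark would also vary k.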
An established paradigm in current drug development is (i) to identify a single protein target whose inhibition is likely to result in the successful treatment of a disease of interest; (ii) to experimentally assay large libraries of small-molecule compounds in vitro and in vivo to identify promising inhibitors in model systems; and (iii) to determine whether the findings are extensible to humans. This complex process, which is largely based on trial and error, is risky, time-consuming and costly. Computational (virtual) screening of drug-like compounds simultaneously against the atomic structures of multiple protein targets, taking into account protein–inhibitor dynamics, might help to identify lead inhibitors more efficiently, particularly for complex drug-resistant diseases. Here we discuss the potential benefits of this approach, using HIV-1 and Plasmodium falciparum infections as examples. We propose a virtual drug discovery ‘pipeline’ that will not only identify lead inhibitors efficiently but also help minimize side effects and toxicity, thereby increasing the likelihood of successful therapies.
The 1,000 plants (1KP) project is an international multi-disciplinary consortium that has generated transcriptome data from over 1,000 plant species, with exemplars for all of the major lineages across the Viridiplantae (green plants) clade. Here, we describe how to access the data used in a phylogenomics analysis of the first 85 species, and how to visualize our gene and species trees. Users can develop computational pipelines to analyse these data, in conjunction with data of their own that they can upload. Computationally estimated protein-protein interactions and biochemical pathways can be visualized at another site. Finally, we comment on our future plans and how they fit within this scalable system for the dissemination, visualization, and analysis of large multi-species data sets.
Viridiplantae; Biodiversity; Transcriptomes; Phylogenomics; Interactions; Pathways
Motivation: fast_protein_cluster is a fast, parallel and memory-efficient package used to cluster 60 000 sets of protein models (with up to 550 000 models per set) generated by the Nutritious Rice for the World project.
Results: fast_protein_cluster is an optimized and extensible toolkit that supports Root Mean Square Deviation after optimal superposition (RMSD) and Template Modeling score (TM-score) as metrics. RMSD calculations using a laptop CPU are 60× faster than qcprot and 3× faster than current graphics processing unit (GPU) implementations. New GPU code further increases the speed of RMSD and TM-score calculations. fast_protein_cluster provides novel k-means and hierarchical clustering methods that are up to 250× and 2000× faster, respectively, than Clusco, and identify significantly more accurate models than Spicker and Clusco.
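For reference, the RMSD-after-optimal-superposition metric can be written with the Kabsch algorithm: centre both coordinate sets, obtain the optimal rotation from the singular value decomposition of their covariance matrix, then measure the residual. This is a plain NumPy sketch of the metric only, assuming (N, 3) arrays of matched atoms; it is not the SIMD- or GPU-optimized code in fast_protein_cluster, and `kabsch_rmsd` is a hypothetical name.

```python
import numpy as np

def kabsch_rmsd(P, Q):
    """RMSD between two (N, 3) coordinate arrays after optimal superposition
    (Kabsch algorithm)."""
    P = np.asarray(P, dtype=float)
    Q = np.asarray(Q, dtype=float)
    if P.shape != Q.shape or P.ndim != 2 or P.shape[1] != 3:
        raise ValueError("expected two arrays of shape (N, 3)")
    # Remove translation by centring both sets on their centroids
    P = P - P.mean(axis=0)
    Q = Q - Q.mean(axis=0)
    H = P.T @ Q                          # 3x3 covariance matrix
    U, S, Vt = np.linalg.svd(H)
    # Flip the last axis if needed so R is a proper rotation (det = +1), not a reflection
    d = 1.0 if np.linalg.det(Vt.T @ U.T) >= 0 else -1.0
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T   # rotation mapping P onto Q
    diff = P @ R.T - Q
    return float(np.sqrt((diff ** 2).sum() / len(P)))

# Usage: a rotated and translated copy of a random structure has RMSD ~ 0
rng = np.random.default_rng(0)
A = rng.normal(size=(10, 3))
theta = 0.7
Rz = np.array([[np.cos(theta), -np.sin(theta), 0.0],
               [np.sin(theta),  np.cos(theta), 0.0],
               [0.0, 0.0, 1.0]])
B = A @ Rz.T + np.array([1.0, -2.0, 3.0])
print(kabsch_rmsd(A, B))  # close to 0, up to floating-point error
```

The determinant check matters: without it, the SVD can return an improper rotation (a reflection), which would understate the RMSD for chiral structures such as proteins.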
Availability and implementation: fast_protein_cluster is written in C++ using OpenMP for multi-threading support. Custom Streaming SIMD Extensions (SSE) and Advanced Vector Extensions (AVX) intrinsics code accelerates CPU calculations, and OpenCL kernels support AMD and Nvidia GPUs. fast_protein_cluster is available under the MIT license (http://software.compbio.washington.edu/fast_protein_cluster).
Supplementary data are available at Bioinformatics online.
Enamel matrix self-assembly has long been suggested as the driving force behind aligned nanofibrous hydroxyapatite formation. We tested whether amelogenin, the main enamel matrix protein, can self-assemble into ribbon-like structures in physiologic solutions. Ribbons 17 nm wide were observed to grow several microns in length, requiring calcium, phosphate, and pH 4.0–6.0. The pH range suggests that the formation of ion bridges through protonated histidine residues is essential to self-assembly, a conclusion supported by a statistical analysis of 212 phosphate-binding proteins that predicted twelve phosphate-binding histidines. Thermophoretic analysis verified the importance of calcium and phosphate in self-assembly. X-ray scattering characterized amelogenin dimers with dimensions fitting the cross-section of the amelogenin ribbon, leading to the hypothesis that antiparallel dimers are the building blocks of the ribbons. Over 5–7 days, ribbons self-organized into bundles composed of aligned ribbons mimicking the structure of enamel crystallites in enamel rods. These observations confirm reports of filamentous organic components in developing enamel and provide a new model for matrix-templated enamel mineralization.
Enamel; amelogenin; self-assembly; protonated histidine; biomineralization
Cytochrome P450 2C9 (CYP2C9) plays a crucial role in the elimination of commonly prescribed drugs. However, changes in metabolic activity caused by CYP2C9 polymorphisms can result in adverse drug effects. CYP2C9*2 and *3 are prevalent in Caucasian populations, whereas CYP2C9*13 is notable in Asian populations. The single amino acid substitutions caused by these mutations are located outside the catalytic cavity but affect the kinetic activities of the mutants compared with the wild-type enzyme. To relate the distal effects of these mutations to defective drug metabolism, simulations of CYP2C9 bound to the anticoagulant (S)-warfarin were performed as a model system. Representative (S)-warfarin-bound forms of the wild-type and mutant enzymes were selected and assessed with a knowledge-based scoring function. Interatomic interactions with (S)-warfarin were predicted to be less favorable in the mutant structures, correlating with a larger distance between the hydroxylation site of (S)-warfarin and the reactive oxyferryl heme than in the wild-type structure. This computational approach could help delineate the complications of CYP polymorphisms in the management of drug therapy.
Protein structure information is essential to understand protein function. Computational methods to accurately predict protein structure from sequence have primarily been evaluated on protein sequences representing full-length native proteins. Here, we demonstrate that top-performing structure prediction methods can accurately predict the partial structures of proteins encoded by sequences that contain approximately 50% or more of the full-length protein sequence. We hypothesize that structure prediction may be useful for predicting the functions of proteins whose corresponding genes are represented by mapped expressed sequence tags (ESTs) encoding partial-length amino acid sequences. Additionally, we identify a confidence score representing the quality of a predicted structure as a useful means of predicting the likelihood that an arbitrary polypeptide sequence represents a portion of a foldable protein sequence (“foldability”). This work has ramifications for the prediction of protein structure with limited or noisy sequence information, as well as for genome annotation.
protein structure prediction; EST; expressed sequence tag; protein folding; protein design
The protein encoded by GmRLK18-1 (Glyma_18_02680 on chromosome 18) is a receptor-like kinase (RLK) encoded within the soybean (Glycine max L. Merr.) Rhg1/Rfs2 locus. The locus underlies resistance to the soybean cyst nematode (SCN) Heterodera glycines (I.) and to Fusarium virguliforme (Aoki), the causal agent of sudden death syndrome (SDS). The leucine-rich repeat (LRR) domain had previously been expressed in Escherichia coli.
The aims here were to evaluate the ability of the LRR domain to homo-dimerize, to bind larger proteins, and to bind small peptides. Western analysis suggested that homo-dimers could form after protein extraction from roots. The purified LRR domain, spanning residues 131–485, formed a mixture of monomers and homo-dimers in vitro. Cross-linking experiments in vitro showed that the H274N region was close (<11.1 Å) to the highly conserved cysteine residue C196 on the second homo-dimer subunit. Binding constants of 20–142 nM were found for peptides present in plant and nematode secretions. Effects on plant phenotypes, including wilting, stem bending and resistance to infection by SCN, were observed when roots were treated with 50 pM of the peptides. Far-Western analyses followed by MS showed that methionine synthase and cyclophilin bound strongly to the LRR domain. A second LRR from GmRLK08-1 (Glyma_08_g11350) did not show these strong interactions.
The LRR domain of the GmRLK18-1 protein formed both a monomer and a homo-dimer. The LRR domain bound avidly to 4 different CLE peptides, a cyclophilin and a methionine synthase. The CLE peptides GmTGIF, GmCLE34, GmCLE3 and HgCLE were previously reported to be involved in root growth inhibition, but here GmTGIF and HgCLE were shown to alter stem morphology and resistance to SCN. One of several models from homology and ab initio modeling was partially validated by cross-linking. The 3 amino acid replacements present among RLK allotypes, A87V, Q115K and H274N, were predicted to alter domain stability and function. Therefore, the LRR domain of GmRLK18-1 might underlie both root development and disease resistance in soybean, and it provides an avenue to develop new variants and ligands that might help reduce losses to SCN.
Receptor; Leucine-rich repeat; Ligand; Peptide; Cross-link; Predicted
We introduce the concept of metaconsensus and employ it to make high-confidence predictions of early enzyme functions and the metabolic properties that they may have produced. Several independent studies have used comparative bioinformatics methods to identify taxonomically broad features of genomic sequence data, protein structure data, and metabolic pathway data in order to predict physiological features that were present in early, ancestral life forms. However, each such method carries some level of technical bias. Here, we cross-reference the results of these previous studies to determine enzyme functions predicted to be ancient by multiple methods. We survey modern metabolic pathways to identify those that maintain the highest frequency of metaconsensus enzymes. Using the full set of modern reactions catalyzed by these metaconsensus enzyme functions, we reconstruct a representative metabolic network that may reflect the core metabolism of early life forms. Our results show that ten enzyme functions (four hydrolases, three transferases, one oxidoreductase, one lyase, and one ligase) are determined by metaconsensus to have been present at least as late as the last universal common ancestor. Subnetworks within central metabolic processes related to sugar and starch metabolism, amino acid biosynthesis, phospholipid metabolism, and CoA biosynthesis have high frequencies of these enzyme functions. We demonstrate that a large metabolic network can be generated from this small number of enzyme functions.
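The cross-referencing step can be sketched as a vote across studies: a function counts as metaconsensus if at least a chosen number of independent methods predict it to be ancient, and each modern pathway is then scored by the fraction of its enzyme functions that pass. The threshold, method names, EC-style labels, and pathway below are hypothetical placeholders, not the data or criteria used in this work.

```python
from collections import Counter

def metaconsensus(predictions, min_methods):
    """Function labels predicted to be ancient by at least `min_methods`
    independent studies. `predictions` maps study name -> iterable of labels."""
    votes = Counter(label for labels in predictions.values() for label in set(labels))
    return {label for label, n in votes.items() if n >= min_methods}

def pathway_frequency(pathways, consensus):
    """For each pathway, the fraction of its distinct enzyme functions
    that are metaconsensus functions."""
    freq = {}
    for name, functions in pathways.items():
        funcs = set(functions)
        freq[name] = len(funcs & consensus) / len(funcs) if funcs else 0.0
    return freq

# Hypothetical placeholder predictions from three independent methods
predictions = {
    "method_genomic": {"3.1.3.1", "2.7.7.7", "6.3.2.1"},
    "method_structural": {"3.1.3.1", "2.7.7.7", "4.1.1.1"},
    "method_pathway": {"3.1.3.1", "6.3.2.1", "1.1.1.1"},
}
core = metaconsensus(predictions, min_methods=2)
print(sorted(core))  # ['2.7.7.7', '3.1.3.1', '6.3.2.1']
print(pathway_frequency({"toy_pathway": ["3.1.3.1", "4.1.1.1", "1.1.1.1", "2.7.7.7"]}, core))
# {'toy_pathway': 0.5}
```

Raising `min_methods` trades coverage for confidence: fewer functions survive, but each is supported by more independent lines of evidence, which is the point of cross-referencing methods with different technical biases.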
Motivation: Accurate comparisons of different protein structures play important roles in structural biology, structure prediction and functional annotation. The root-mean-square deviation (RMSD) after optimal superposition is the predominant measure of similarity due to the ease and speed of computation. However, global RMSD depends on the length of the protein and can be dominated by divergent loops that obscure local regions of similarity. A more sophisticated measure of structural similarity, the Template Modeling score (TM-score), avoids these problems and is one of the measures used in the community-wide Critical Assessment of protein Structure Prediction (CASP) experiments to compare predicted models with experimental structures. TM-score calculations are, however, much slower than RMSD calculations. We have therefore implemented a very fast version of TM-score for Graphical Processing Units (TM-score-GPU), using a novel hybrid Kabsch/quaternion method for calculating the optimal superposition and RMSD that is designed for parallel applications. This acceleration allows TM-score to be used efficiently in computationally intensive applications such as the clustering of protein models and genome-wide comparisons of structure.
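The TM-score term itself is simple to write down. For L_target residues in the reference structure and distances d_i between aligned residue pairs after superposition, TM-score = max over superpositions of (1/L_target) Σ_i 1/(1 + (d_i/d0)²), with d0 = 1.24 (L_target − 15)^(1/3) − 1.8 Å. The sketch below evaluates this sum for a single, already-applied superposition only, so it gives a lower bound on the true TM-score; the search over superpositions, which is the expensive part that TM-score-GPU accelerates, is omitted. The 0.5 Å floor on d0 for short chains follows the common convention and should be treated as an assumption, as should the function names.

```python
import math

def tm_d0(l_target):
    """Length-dependent distance scale d0 (in Å) used in the TM-score."""
    if l_target > 15:
        d0 = 1.24 * (l_target - 15) ** (1.0 / 3.0) - 1.8
    else:
        d0 = 0.5
    return max(d0, 0.5)  # assumed 0.5 Å floor for very short chains

def tm_score_fixed(model, native, l_target=None):
    """TM-score sum for ONE fixed superposition.

    `model` and `native` are equal-length lists of (x, y, z) tuples for aligned
    residue pairs, already superposed. The real TM-score maximizes this value
    over many trial superpositions, so the result is a lower bound on it."""
    if len(model) != len(native):
        raise ValueError("aligned residue lists must have equal length")
    length = l_target if l_target is not None else len(native)
    d0 = tm_d0(length)
    total = sum(1.0 / (1.0 + (math.dist(a, b) / d0) ** 2)
                for a, b in zip(model, native))
    return total / length

# Toy example: an idealized straight Cα trace compared with itself
native = [(3.8 * i, 0.0, 0.0) for i in range(100)]
print(tm_score_fixed(native, native))  # 1.0
```

Unlike RMSD, each residue's contribution is bounded between 0 and 1, so a few badly placed loops can lower the score only by their share of L_target rather than dominating it.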
Results: TM-score-GPU was applied to six sets of models from Nutritious Rice for the World, for a total of 3 million comparisons. On average, TM-score-GPU running on an ATI 5870 GPU is 68 times faster than the original single-threaded CPU implementation running on an AMD Phenom II 810 quad-core processor.
Availability and implementation: The complete source, including the GPU code and the hybrid RMSD subroutine, can be downloaded and used without restriction at http://software.compbio.washington.edu/misc/downloads/tmscore/. The implementation is in C++/OpenCL.
Supplementary data are available at Bioinformatics online.
Cementum is the outer mineralized tissue covering the tooth root and an essential part of the periodontal tissue system that anchors the tooth to the bone. Periodontal disease results from the destructive behavior of the host elicited by an infectious biofilm adhering to the tooth root and, if left untreated, may lead to tooth loss. We describe a novel protocol for identifying peptide sequences from native proteins with the potential to repair damaged dental tissues by controlling hydroxyapatite biomineralization. Using amelogenin as a case study and a bioinformatics scoring matrix, we identified regions within amelogenin that are shared with a set of hydroxyapatite-binding peptides (HABPs) previously selected by phage display. One 22-amino-acid peptide region, referred to as amelogenin-derived peptide 5 (ADP5), was shown to facilitate cell-free formation of a cementum-like hydroxyapatite mineral layer on demineralized human root dentin that, in turn, supported attachment of periodontal ligament cells in vitro. Our findings have several implications for peptide-assisted mineral formation that mimics biomineralization. By further elaborating the mechanism of protein control over the biomineral formed, we afford new insights into the evolution of protein–mineral interactions. By exploiting small peptide domains of native proteins, our understanding of the structure–function relationships of biomineralizing proteins can be extended, and these peptides can be used to engineer mineral formation. Finally, the cementomimetic layer formed by ADP5 has potential clinical application in repairing diseased root surfaces, so as to promote the regeneration of periodontal tissues and thereby reduce the morbidity associated with tooth loss.
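The search for amelogenin regions shared with phage-display HABPs can be illustrated with an ungapped sliding-window comparison: each peptide is scored against every same-length window of the protein, and residues covered by high-scoring windows accumulate hits. The three-level scoring rule (identity, same residue class, otherwise), the residue classes, the threshold, and the sequences below are toy assumptions standing in for the bioinformatics scoring matrix used in the study; they are not amelogenin or the actual HABPs.

```python
# Toy residue classes (an assumption; not a published substitution matrix)
CLASSES = {}
for group, members in {"hydrophobic": "AVLIMFWYC", "polar": "STNQGP",
                       "acidic": "DE", "basic": "KRH"}.items():
    for aa in members:
        CLASSES[aa] = group

def pair_score(a, b):
    """+2 for identity, +1 for the same toy class, -1 otherwise."""
    if a == b:
        return 2
    if a in CLASSES and CLASSES.get(a) == CLASSES.get(b):
        return 1
    return -1

def window_scores(protein, peptide):
    """Ungapped score of `peptide` against every same-length window of `protein`."""
    w = len(peptide)
    return [sum(pair_score(x, y) for x, y in zip(protein[i:i + w], peptide))
            for i in range(len(protein) - w + 1)]

def coverage_profile(protein, peptides, threshold):
    """For each protein residue, count the peptide windows scoring >= threshold that cover it."""
    cover = [0] * len(protein)
    for pep in peptides:
        for start, s in enumerate(window_scores(protein, pep)):
            if s >= threshold:
                for j in range(start, start + len(pep)):
                    cover[j] += 1
    return cover

# Hypothetical sequences, for illustration only
protein = "AAAAKKLLPPHQAAAA"
print(coverage_profile(protein, ["PPHQ"], threshold=8))
# [0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 0, 0, 0, 0]
```

With many peptides, stretches of the protein with high coverage are the candidate shared regions from which a peptide such as ADP5 could be excised for experimental testing.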
amelogenin; amelogenin-derived peptides; bioinformatics; biomineralization; cementomimetics; cementum; demineralization; remineralization
The genetic contribution to the pathogenesis of isolated single suture craniosynostosis is poorly understood. The role of mutations in genes known to be associated with syndromic synostosis appears to be limited. We present our findings from a candidate gene resequencing approach to identify rare variants associated with the most common forms of isolated craniosynostosis. Resequencing of the coding regions, splice junction sites, and 5′ and 3′ untranslated regions of 27 candidate genes in 186 cases of isolated nonsyndromic single suture synostosis revealed three novel and two rare sequence variants (R406H, R595H, N857S, P190S, M446V) in insulin-like growth factor I receptor (IGF1R) that are enriched relative to control samples. Mapping the resulting amino acid changes to the modeled homodimer protein structure suggests a structural basis for segregation between these and other disease-associated mutations found in IGF1R. These data suggest that IGF1R mutations may contribute to the risk of, and in some cases cause, single suture craniosynostosis.
craniosynostosis; IGF1R; non-syndromic; isolated; simple; sagittal; coronal; metopic; resequencing; non-synonymous SNP
Advancements in sequencing techniques place personalized genomic medicine on the horizon, bringing with them the responsibility of clinicians to understand the likelihood that a mutation causes disease, and of scientists to separate etiology from nonpathologic variability. Pathogenicity is discernible from patterns of interactions between a missense mutation, the surrounding protein structure, and intermolecular interactions. Physicochemical stability calculations require structures, which are unavailable for the vast majority of human proteins, so diagnostic accuracy remains in its infancy. To model the effects of missense mutations on functional stability without structure, we combine novel protein sequence analysis algorithms that discern spatial distributions of sequence, evolutionary, and physicochemical conservation, using a new approach to optimize component selection. Novel components include a combinatory substitution matrix and two heuristic algorithms that detect positions conferring structural support to interaction interfaces. The method reaches 0.91 AUC in ten-fold cross-validation when predicting alteration of function for 6,392 in vitro mutations. For clinical utility, we trained the method on 7,022 disease-associated missense mutations from Online Mendelian Inheritance in Man (OMIM) alongside a larger randomized set. In a blinded prospective test to distinguish mutations unique to 186 patients with craniosynostosis from those in the 95 highly variant Coriell controls and 1000 age-matched controls, we achieved roughly 1/3 sensitivity and perfect specificity. The component algorithms retained during machine learning constitute novel protein sequence analysis techniques to describe environments supporting neutrality or pathology of mutations. This approach to pathogenetics enables new insight into the mechanistic relationship of missense mutations to disease phenotypes in our patients.
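The evaluation statistics quoted above can be computed as follows. This sketch, with hypothetical function names, computes ROC AUC through its Mann–Whitney interpretation (the probability that a random positive outscores a random negative), sensitivity and specificity from binary calls, and a simple round-robin split into ten folds; it illustrates the metrics only and does not reproduce the method's feature components or training.

```python
def roc_auc(scores, labels):
    """ROC AUC as the probability that a random positive outscores a random
    negative (Mann-Whitney U / (n_pos * n_neg)); ties count one half."""
    pos = [s for s, y in zip(scores, labels) if y]
    neg = [s for s, y in zip(scores, labels) if not y]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for q in neg:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5
    return wins / (len(pos) * len(neg))

def sensitivity_specificity(predicted, labels):
    """Sensitivity TP/(TP+FN) and specificity TN/(TN+FP) for binary calls."""
    tp = sum(1 for p, y in zip(predicted, labels) if p and y)
    fn = sum(1 for p, y in zip(predicted, labels) if not p and y)
    tn = sum(1 for p, y in zip(predicted, labels) if not p and not y)
    fp = sum(1 for p, y in zip(predicted, labels) if p and not y)
    return tp / (tp + fn), tn / (tn + fp)

def kfold_indices(n, k=10):
    """Round-robin assignment of example indices to k folds (shuffle beforehand in practice)."""
    return [list(range(i, n, k)) for i in range(k)]

print(roc_auc([0.9, 0.7, 0.6, 0.2], [1, 0, 1, 0]))  # 0.75
```

Perfect specificity with partial sensitivity, as in the blinded test, corresponds to a conservative decision threshold: no control mutation is called pathogenic, at the cost of missing some patient mutations.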
Computational biology; protein stability; machine learning; missense mutation; nonsynonymous SNP; sequence analysis
The diversity of characterized protein functions found among experimentally interrogated proteins suggests that a vast array of unknown functions remains undiscovered. These protein functions are imparted by specific geometric distributions of amino acid residue chemical moieties, each contributing a functional interaction. We hypothesize that individual residue function contributions are predictable through knowledge-based sequence analysis algorithms, and that they can be recombined to understand composite protein function by predicting spatial relationships in tertiary structure. We assess the former by training a meta-functional signature algorithm to specifically predict calcium ion binding residues from protein sequence. We estimate the latter by testing for a match between the predictive contribution of positions in predicted secondary structures and the patterns of side-chain proximity imposed by secondary structure elements. Specific training for calcium binding results in 83% area under the receiver operating characteristic curve added value over random (AUCoR) and p < 10⁻³⁰⁰ significance, as measured by Kendall’s τ in ten-fold cross-validation, for parallel sets of 811 residues in 336 proteins and 696 residues in 299 proteins. Training for generalized function results in 63% AUCoR and p ≅ 10⁻²²¹ for the same tests. Including inference of side-chain proximity consistently improves predictive ability by 2% AUCoR. The results demonstrate that protein meta-functional signatures can be trained to predict specific protein functions by considering amino acid identity and structural features accessible from sequence, laying the groundwork for composite sequence-based function site prediction.
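Kendall’s τ, used above as the significance measure, compares the ordering of two rankings by counting concordant and discordant pairs. A minimal pure-Python sketch of the tau-a variant follows; the abstract does not state which variant or p-value procedure was used, and in practice `scipy.stats.kendalltau` (which defaults to tau-b and also returns a p-value) would be the usual choice. The function name is hypothetical.

```python
def kendall_tau_a(x, y):
    """Kendall's tau-a: (concordant - discordant pairs) / (n choose 2).
    Pairs tied in either variable count as neither concordant nor discordant."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    concordant = discordant = 0
    for i in range(n):
        for j in range(i + 1, n):
            s = (x[i] - x[j]) * (y[i] - y[j])
            if s > 0:
                concordant += 1
            elif s < 0:
                discordant += 1
    return (concordant - discordant) / (n * (n - 1) / 2)

# Two concordant pairs and one discordant pair out of three
print(kendall_tau_a([1, 2, 3], [1, 3, 2]))  # about 0.333
```

The O(n²) double loop is fine for illustration; for hundreds of residues per set it is still fast, and library implementations use O(n log n) algorithms for larger inputs.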
Protein sequence analysis; Protein function prediction; Calcium; Protein binding site; Functional signature
Successful protein structure prediction requires accurate low-resolution scoring functions so that protein main chain conformations that are close to the native can be identified. Once that is accomplished, a more detailed and time-consuming treatment to produce all-atom models can be undertaken. The earliest low-resolution scoring functions used simple distance-based "contact potentials," but more recently the relative orientations of interacting amino acids have been taken into account to improve performance.
We developed a new knowledge-based scoring function, LoCo, that locates the interaction partners of each individual residue within a local coordinate system based only on the positions of its main chain N, Cα and C atoms. LoCo was trained on a large set of experimentally determined structures and optimized using standard sets of modeled structures, or "decoys." No structure used to train or optimize the function was included among those used to test it. When tested against 29 other published main chain functions on a group of 77 commonly used decoy sets, our function outperformed all others in Cα RMSD rank of the best-scoring decoy, with statistically significant p-values (< 0.05) for 26 of the 29 other functions considered. LoCo is fast, requiring on average less than 6 microseconds per residue to identify interactions and compute scores on commonly used computer hardware.
Our function demonstrates an unmatched combination of accuracy, speed, and simplicity and shows excellent promise for protein structure prediction. Broader applications may include protein-protein interactions and protein design.
In this review, we provide an overview of the methods employed in four recent studies that described novel methods for the computational prediction of secreted effectors from type III and IV secretion systems in Gram-negative bacteria. We present the results of these studies in terms of their performance at accurately predicting secreted effectors and the similarities found between secretion signals that may reflect biologically relevant features for recognition. We discuss the Web-based tools for secreted effector prediction described in these studies and announce the availability of our tool, the SIEVE server (http://www.sysbep.org/sieve). Finally, we assess the accuracies of the three type III effector prediction methods on a small set of proteins, unknown when these tools were developed, that we recently discovered and validated using both experimental and computational approaches. Our comparison shows that all methods use similar approaches and, in general, arrive at similar conclusions. We discuss the possibility of an order-dependent motif in the secretion signal, which was a point of disagreement among the studies. Our results show that there may be classes of effectors in which the signal has a loosely defined motif and others in which secretion depends only on compositional biases. Computational prediction of secreted effectors from protein sequences represents an important step toward a better understanding of the interaction between pathogens and hosts.
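The contrast between an order-dependent motif and a purely compositional signal can be made concrete: a composition feature vector of the N-terminal region, where type III secretion signals are generally thought to reside, ignores residue order entirely, so any permutation of the same residues yields the same features. The sketch below computes such a vector; the 30-residue window and the function name are hypothetical choices, not parameters of SIEVE or of the reviewed methods.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

def nterm_composition(seq, window=30):
    """Fraction of each standard amino acid among the first `window` residues.
    Residue order inside the window is ignored by construction."""
    head = seq[:window].upper()
    if not head:
        raise ValueError("empty sequence")
    return [head.count(aa) / len(head) for aa in AMINO_ACIDS]

a = nterm_composition("MSSSTPQ")
b = nterm_composition("QPTSSSM")  # same residues, reversed order
print(a == b)  # True: composition features cannot see order
```

A classifier trained only on such features can capture compositional biases but, by design, not an ordered motif; detecting the latter requires position-aware features such as per-position residue identities.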
Calculation of the root mean square deviation (RMSD) between the atomic coordinates of two optimally superposed structures is a basic component of structural comparison techniques. We describe a quaternion based method, GPU-Q-J, that is stable with single precision calculations and suitable for graphics processor units (GPUs). The application was implemented on an ATI 4770 graphics card in C/C++ and Brook+ in Linux where it was 260 to 760 times faster than existing unoptimized CPU methods. Source code is available from the Compbio website http://software.compbio.washington.edu/misc/downloads/st_gpu_fit/ or from the author LHH.
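The quaternion formulation reduces optimal superposition to an eigenvalue problem: from the 3×3 correlation matrix of the centred coordinates one builds a symmetric 4×4 key matrix K, and with λ_max its largest eigenvalue, RMSD = sqrt((G_A + G_B − 2λ_max)/N), where G_A and G_B are the sums of squared norms of the two centred coordinate sets. The NumPy sketch below follows this standard construction in double precision; it does not reproduce the single-precision safeguards or GPU layout of GPU-Q-J, and `quaternion_rmsd` is a hypothetical name.

```python
import numpy as np

def quaternion_rmsd(X, Y):
    """Minimal RMSD between two (N, 3) coordinate sets via the quaternion
    eigenvalue method: RMSD = sqrt((G_A + G_B - 2 * lambda_max) / N)."""
    X = np.asarray(X, dtype=float)
    Y = np.asarray(Y, dtype=float)
    if X.shape != Y.shape or X.ndim != 2 or X.shape[1] != 3:
        raise ValueError("expected two arrays of shape (N, 3)")
    X = X - X.mean(axis=0)
    Y = Y - Y.mean(axis=0)
    S = X.T @ Y  # S[a, b] = sum_i X[i, a] * Y[i, b]
    (sxx, sxy, sxz), (syx, syy, syz), (szx, szy, szz) = S
    # Symmetric 4x4 key matrix whose top eigenvector is the optimal rotation quaternion
    K = np.array([
        [sxx + syy + szz, syz - szy,        szx - sxz,        sxy - syx],
        [syz - szy,       sxx - syy - szz,  sxy + syx,        szx + sxz],
        [szx - sxz,       sxy + syx,       -sxx + syy - szz,  syz + szy],
        [sxy - syx,       szx + sxz,        syz + szy,       -sxx - syy + szz],
    ])
    lam_max = np.linalg.eigvalsh(K)[-1]  # eigenvalues are returned in ascending order
    ga = (X ** 2).sum()
    gb = (Y ** 2).sum()
    # Clamp tiny negative values caused by floating-point round-off
    return float(np.sqrt(max(ga + gb - 2.0 * lam_max, 0.0) / len(X)))

# Usage: a rotated and translated copy should give RMSD close to 0
rng = np.random.default_rng(1)
A = rng.normal(size=(12, 3))
c, s = np.cos(1.1), np.sin(1.1)
Rx = np.array([[1.0, 0.0, 0.0], [0.0, c, -s], [0.0, s, c]])
B = A @ Rx.T + np.array([0.5, 4.0, -1.0])
print(quaternion_rmsd(A, B))  # close to 0 (floating-point round-off only)
```

Because the RMSD is obtained from an eigenvalue difference rather than from explicitly rotated coordinates, near-zero values lose relative precision; this cancellation is one reason single-precision GPU implementations need extra care.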
The Nutritious Rice for the World Project (NRW) on World Community Grid predicted de novo the structures of over 62,000 small proteins and protein domains, returning a total of 10 billion candidate structures. Clustering ensembles of structures on this scale requires the calculation of large similarity matrices consisting of the RMSDs between each pair of structures in the set. As a real-world test, we calculated the matrices for 6 different ensembles from NRW. The GPU method was 260 times faster than the fastest existing CPU-based method and over 500 times faster than the method that had been previously used.
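Clustering from such a matrix can be sketched generically: fill a symmetric all-pairs matrix (only N(N − 1)/2 distances are needed, and this quadratic number of pairs is what makes the step expensive at NRW scale), then repeatedly take the structure with the most unassigned neighbours within a cutoff as a cluster centre. The greedy rule, the cutoff, and the function names are illustrative assumptions, not the NRW clustering pipeline; in real use `dist` would be an RMSD function applied to coordinate sets.

```python
from itertools import combinations

def distance_matrix(items, dist):
    """Symmetric all-pairs matrix; only the n*(n-1)/2 upper-triangle entries are computed."""
    n = len(items)
    m = [[0.0] * n for _ in range(n)]
    for i, j in combinations(range(n), 2):
        m[i][j] = m[j][i] = dist(items[i], items[j])
    return m

def greedy_clusters(m, cutoff):
    """Repeatedly pick the unassigned item with the most unassigned neighbours
    within `cutoff` (itself included) as a centre; the centre and those
    neighbours form one cluster. Ties go to the lowest index."""
    unassigned = set(range(len(m)))
    clusters = []
    while unassigned:
        centre = max(sorted(unassigned),
                     key=lambda i: sum(1 for j in unassigned if m[i][j] <= cutoff))
        members = sorted(j for j in unassigned if m[centre][j] <= cutoff)
        clusters.append((centre, members))
        unassigned -= set(members)
    return clusters

# Toy 1-D "structures" with absolute difference standing in for RMSD
items = [0.0, 0.1, 0.2, 5.0, 5.1]
m = distance_matrix(items, lambda a, b: abs(a - b))
print(greedy_clusters(m, cutoff=0.5))  # [(0, [0, 1, 2]), (3, [3, 4])]
```

The centre of the largest cluster is a natural choice of representative model, since it has the most structurally similar neighbours in the ensemble.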
GPU-Q-J is a significant advance over previous CPU methods. It relieves a major bottleneck in the clustering of large numbers of structures for NRW. It also has applications in structure comparison methods that involve multiple superposition and RMSD determination steps, particularly when such methods are applied on a proteome and genome wide scale.
Immunologic responses of the tooth to caries begin with odontoblasts recognizing carious bacteria. Inflammatory propagation eventually leads to pulp necrosis and endangers health. The present study aims to determine the cytokine gene expression profiles generated within human teeth in response to dental caries in vivo and to build a mechanistic model of these responses and the downstream signaling network.
We demonstrate profound differential up-regulation of inflammatory genes in the odontoblast layer (ODL) of human teeth with caries in vivo, while the pulp remains largely unchanged. Interleukins, chemokines, and all tested receptors thereof were differentially up-regulated in the ODL of carious teeth, by well over 100-fold for 35 of 84 genes. By interrogating reconstructed protein interaction networks corresponding to the differentially up-regulated genes, we develop the hypothesis that the pro-inflammatory cytokines IL-1β, IL-1α, and TNF-α, which are highly expressed in the ODL of carious teeth, carry the converged inflammatory signal. We show that IL-1β amplifies antimicrobial peptide production in odontoblasts in vitro 100-fold more than lipopolysaccharide does, in a manner matching subsequent in vivo measurements.
Our data suggest that the ODL dramatically amplifies bacterial signals by self-feedback cytokine–chemokine signal–receptor cycling and by signal convergence through IL1R1 and possibly other receptors, increasing defensive capacity, including antimicrobial peptide production, to protect the tooth and contain the battle against carious bacteria within the dentin.
We examine the ability of current state-of-the-art methods in protein structure prediction to discriminate topologically distant folds encoded by highly similar (>90% sequence identity) designed proteins in blind protein structure prediction experiments. We detail the corresponding prognosis for the protein fold recognition field and highlight the features of the methodologies that successfully deciphered this folding riddle.
Several novel and established knowledge-based discriminatory function formulations and reference state derivations have been evaluated to identify parameter sets capable of distinguishing native and near-native biomolecular interactions from incorrect ones. We developed the r·m·r function, a novel atomic-level radial distribution function with a mean reference state that averages over all pairwise atom types from a reduced atom type composition, using experimentally determined intermolecular complexes in the Cambridge Structural Database (CSD) and the Protein Data Bank (PDB) as the information sources. We demonstrate that r·m·r had the best discriminatory accuracy and power for protein-small molecule and protein-DNA interactions, regardless of whether the native complex was included in or excluded from the test set. The superior performance of the r·m·r discriminatory function, compared with seventeen alternative functions evaluated on publicly available test sets for protein-small molecule and protein-DNA interactions, indicated that the function was not over-optimized through back-testing on a single class of biomolecular interactions. The initial success of the reduced composition, and the superior performance obtained with the CSD rather than the PDB as the distribution set, imply that further improvements and generality of the function are possible by deriving probabilities from subsets of the CSD, using structures that consist only of the atom types to be considered for a given class of biomolecular interactions. The method is available as a web server module at http://protinfo.compbio.washington.edu.
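The idea of a radial distribution function with a mean reference state can be sketched as a distance-binned statistical potential: for each atom-type pair, the observed distance distribution is compared with the distribution averaged over all pairs, and the score is the negative log of their ratio, so lower totals are more native-like. Bin width, cut-off, pseudocount, atom type labels, and function names below are hypothetical; the reduced atom types and CSD-derived counts of r·m·r are not reproduced.

```python
import math
from collections import defaultdict

def mean_reference_potential(observations, bin_width=1.0, max_dist=10.0, pseudocount=1.0):
    """Distance-binned statistical potential with a mean reference state.

    observations: iterable of (atom_type_a, atom_type_b, distance) for
    intermolecular atom pairs in known complexes. Returns {(pair, bin): score}
    with score = -ln(P_obs(bin | pair) / P_ref(bin)), where P_ref is the
    per-pair distribution averaged over all observed atom-type pairs."""
    nbins = int(round(max_dist / bin_width))
    counts = defaultdict(lambda: [0.0] * nbins)
    for a, b, d in observations:
        if 0.0 <= d < max_dist:
            k = min(int(d / bin_width), nbins - 1)
            counts[tuple(sorted((a, b)))][k] += 1.0
    if not counts:
        raise ValueError("no observations within max_dist")
    # Per-pair distributions; pseudocounts keep every probability positive
    p_obs = {}
    for pair, c in counts.items():
        total = sum(c) + pseudocount * nbins
        p_obs[pair] = [(x + pseudocount) / total for x in c]
    # Mean reference state: average over all atom-type pairs, bin by bin
    p_ref = [sum(p[k] for p in p_obs.values()) / len(p_obs) for k in range(nbins)]
    return {(pair, k): -math.log(p[k] / p_ref[k])
            for pair, p in p_obs.items() for k in range(nbins)}

def score_complex(potential, pairs, bin_width=1.0):
    """Sum the potential over (type_a, type_b, distance) contacts of one pose;
    lower totals are predicted to be more native-like. Unknown pairs/bins score 0."""
    return sum(potential.get((tuple(sorted((a, b))), int(d / bin_width)), 0.0)
               for a, b, d in pairs)

# Hypothetical training data: C-N contacts cluster near 3.5 Å, C-C near 7.5 Å
obs = [("C", "N", 3.5)] * 20 + [("C", "C", 7.5)] * 20
pot = mean_reference_potential(obs)
near = score_complex(pot, [("N", "C", 3.4)])
far = score_complex(pot, [("N", "C", 7.4)])
print(near < far)  # True
```

Discriminating a native pose from decoys then amounts to scoring every candidate complex with `score_complex` and ranking them, with the native expected near the top if the potential is informative.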
discriminatory function; knowledge-based; protein-small molecule; protein-DNA; protein-ligand; complexes; biomolecular interactions
Viral fusogenic envelope proteins are important targets for the development of inhibitors of viral entry. We report an approach for the computational design of peptide inhibitors of the dengue 2 virus (DENV-2) envelope (E) protein using high-resolution structural data from a pre-entry dimeric form of the protein. By using predictive strategies together with computational optimization of binding “pseudoenergies”, we were able to design multiple peptide sequences that showed low micromolar viral entry inhibitory activity. The two most active peptides, DN57opt and 1OAN1, were designed to displace regions in the domain II hinge and the first domain I/domain II beta-sheet connection, respectively, and showed fifty percent inhibitory concentrations of 8 and 7 µM, respectively, in a focus forming unit assay. The antiviral peptides were shown to interfere with virus:cell binding, to interact directly with the E proteins (by biolayer interferometry), and to cause changes to the viral surface (by cryo-electron microscopy). These peptides may be useful for characterizing intermediate states in the membrane fusion process, for investigating DENV receptor molecules, and as lead compounds for drug discovery.
Virus surface proteins mediate interactions with target cells during the initial events in the process of infection. Inhibiting these proteins is therefore a major goal in the development of antiviral drugs. However, there is a very large number of different viruses, each with its own distinct surface proteins, and, with just a few exceptions, it is not clear how to build novel molecules to inhibit them. Here we applied a computational binding optimization strategy to an atomic resolution structure of the dengue virus serotype 2 envelope protein to generate peptide sequences that should interact strongly with this protein. We picked dengue virus as a target because it is the causative agent of the most important mosquito-transmitted viral disease. Out of a small number of candidates designed and tested, we identified two different highly inhibitory peptides. To verify our results, we showed that these peptides block virus:cell binding, interfere with a step during viral entry, alter the surface structure of dengue viral particles, and interact directly with the dengue virus envelope protein. We expect that our approach may be generally applicable to other viral surface proteins for which a high-resolution structure is available.
The principal bottleneck in protein structure prediction is the refinement of models from lower accuracies to the resolution observed by experiment. We developed a novel constraints-based refinement method that identifies a large number of accurate input constraints from initial models and uses them to rebuild the models with restrained torsion angle dynamics (rTAD). We previously created a Bayesian statistics-based residue-specific all-atom probability discriminatory function (RAPDF) to discriminate native-like models by measuring the probability of accuracy for atom type distances within a given model. Here, we exploit RAPDF to score (i.e., filter) constraints from initial predictions that may or may not be close to a native-like state, obtain a consensus of top-scoring constraints among five initial models, and compile sets with no redundant residue pair constraints. We find that this method consistently produces a large and highly accurate set of distance constraints from which to build refinement models. We further optimize the balance between accuracy and coverage of constraints by producing multiple structure sets using different constraint distance cutoffs, and note that the cutoff governs spatially near versus distant effects in model generation. This complete procedure of deriving distance constraints for rTAD simulations significantly improves the quality of initial predictions in all cases we evaluated. Our procedure represents a significant step toward solving the protein structure prediction and refinement problem, by enabling the use of consensus constraints, RAPDF, and rTAD for protein structure modeling and refinement.
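The constraint-derivation steps described above (extract distances from each initial model, keep the top-scoring ones, take a consensus across the five models, and merge into one nonredundant constraint per residue pair) can be sketched as follows. This is a minimal illustration under stated assumptions: `score_fn` is a hypothetical stand-in for RAPDF (lower is better), residues are represented by single coordinates such as C-alpha atoms, and `top_fraction`, `min_support`, `min_separation` and averaging the supporting distances are illustrative choices, not the published protocol.

```python
import math
from collections import defaultdict
from statistics import mean


def model_constraints(coords, cutoff, min_separation=3):
    """Return {(i, j): distance} for residue pairs within `cutoff` Å.

    coords: one (x, y, z) per residue (assumed C-alpha positions).
    """
    out = {}
    n = len(coords)
    for i in range(n):
        for j in range(i + min_separation, n):
            d = math.dist(coords[i], coords[j])
            if d <= cutoff:
                out[(i, j)] = d
    return out


def consensus_constraints(models, score_fn, cutoff=8.0, top_fraction=0.5,
                          min_support=3, min_separation=3):
    """Keep residue pairs whose top-scoring constraint recurs in >= min_support models.

    score_fn(pair, distance) is a hypothetical placeholder for RAPDF scoring.
    Keying the result by residue pair guarantees no redundant constraints.
    """
    support = defaultdict(list)
    for coords in models:
        cons = model_constraints(coords, cutoff, min_separation)
        ranked = sorted(cons.items(), key=lambda item: score_fn(*item))
        keep = ranked[: max(1, int(len(ranked) * top_fraction))]
        for pair, d in keep:
            support[pair].append(d)
    # Assumption: merge supporting distances by averaging them.
    return {pair: mean(ds) for pair, ds in support.items() if len(ds) >= min_support}
```

Varying `cutoff` across several runs (for example 6, 8 and 10 Å) would mimic producing multiple constraint sets that trade accuracy against coverage; the resulting dictionaries would then be passed to an rTAD engine, which is outside the scope of this sketch.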
protein structure prediction; refinement; knowledge-based functions