1.  fast_protein_cluster: parallel and optimized clustering of large-scale protein modeling data 
Bioinformatics  2014;30(12):1774-1776.
Motivation: fast_protein_cluster is a fast, parallel and memory efficient package used to cluster 60 000 sets of protein models (with up to 550 000 models per set) generated by the Nutritious Rice for the World project.
Results: fast_protein_cluster is an optimized and extensible toolkit that supports Root Mean Square Deviation after optimal superposition (RMSD) and Template Modeling score (TM-score) as metrics. RMSD calculations using a laptop CPU are 60× faster than qcprot and 3× faster than current graphics processing unit (GPU) implementations. New GPU code further increases the speed of RMSD and TM-score calculations. fast_protein_cluster provides novel k-means and hierarchical clustering methods that are up to 250× and 2000× faster, respectively, than Clusco, and identify significantly more accurate models than Spicker and Clusco.
Availability and implementation: fast_protein_cluster is written in C++ using OpenMP for multi-threading support. Custom streaming Single Instruction Multiple Data (SIMD) extensions and advanced vector extension intrinsics code accelerate CPU calculations, and OpenCL kernels support AMD and Nvidia GPUs. fast_protein_cluster is available under the M.I.T. license. (http://software.compbio.washington.edu/fast_protein_cluster)
Contact: lhhung@compbio.washington.edu
Supplementary information: Supplementary data are available at Bioinformatics online.
doi:10.1093/bioinformatics/btu098
PMCID: PMC4058946  PMID: 24532722
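As a rough illustration of the hierarchical clustering mentioned in the abstract above, the following is a minimal sketch of naive average-linkage agglomerative clustering over a precomputed pairwise distance matrix (for example, RMSD values). It is not the fast_protein_cluster implementation; the function name `average_linkage` and the toy matrix are assumptions made for illustration only.

```python
def average_linkage(dist, n_clusters):
    """Naive agglomerative clustering on a symmetric distance matrix.

    dist: square list of lists, dist[i][j] = distance between models i and j.
    n_clusters: stop merging once this many clusters remain.
    Hypothetical helper for illustration; cost grows quickly with set size.
    """
    clusters = [[i] for i in range(len(dist))]
    while len(clusters) > n_clusters:
        best = None
        # Find the pair of clusters with the smallest average inter-cluster distance.
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                total = sum(dist[i][j] for i in clusters[a] for j in clusters[b])
                avg = total / (len(clusters[a]) * len(clusters[b]))
                if best is None or avg < best[0]:
                    best = (avg, a, b)
        _, a, b = best
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]
    return [sorted(c) for c in clusters]


# Toy distances (made-up values): models 0/1 are similar, models 2/3 are similar.
toy = [
    [0.0, 1.0, 8.0, 9.0],
    [1.0, 0.0, 8.5, 9.5],
    [8.0, 8.5, 0.0, 1.2],
    [9.0, 9.5, 1.2, 0.0],
]
groups = average_linkage(toy, 2)
```

The triple loop makes the cost of this sketch obvious, which is why optimized and parallel implementations matter for sets with hundreds of thousands of models.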
2.  Self-assembly of Filamentous Amelogenin Requires Calcium and Phosphate: From Dimers via Nanoribbons to Fibrils 
Biomacromolecules  2012;13(11):3494-3502.
Enamel matrix self-assembly has long been suggested as the driving force behind aligned nanofibrous hydroxyapatite formation. We tested if amelogenin, the main enamel matrix protein, can self-assemble into ribbon-like structures in physiologic solutions. Ribbons 17 nm wide were observed to grow several microns in length, requiring calcium, phosphate, and pH 4.0–6.0. The pH range suggests that the formation of ion bridges through protonated histidine residues is essential to self-assembly, supported by a statistical analysis of 212 phosphate-binding proteins predicting twelve phosphate-binding histidines. Thermophoretic analysis verified the importance of calcium and phosphate in self-assembly. X-ray scattering characterized amelogenin dimers with dimensions fitting the cross-section of the amelogenin ribbon, leading to the hypothesis that antiparallel dimers are the building blocks of the ribbons. Over 5–7 days, ribbons self-organized into bundles composed of aligned ribbons mimicking the structure of enamel crystallites in enamel rods. These observations confirm reports of filamentous organic components in developing enamel and provide a new model for matrix-templated enamel mineralization.
doi:10.1021/bm300942c
PMCID: PMC3496023  PMID: 22974364
Enamel; amelogenin; self-assembly; protonated histidine; biomineralization
3.  Correction: Distal Effect of Amino Acid Substitutions in CYP2C9 Polymorphic Variants Causes Differences in Interatomic Interactions against (S)-Warfarin 
PLoS ONE  2013;8(9):10.1371/annotation/416be1ef-f439-445a-96f8-b1d2f01c6957.
doi:10.1371/annotation/416be1ef-f439-445a-96f8-b1d2f01c6957
PMCID: PMC3796585  PMID: 24137505
4.  Distal Effect of Amino Acid Substitutions in CYP2C9 Polymorphic Variants Causes Differences in Interatomic Interactions against (S)-Warfarin 
PLoS ONE  2013;8(9):e74053.
Cytochrome P450 2C9 (CYP2C9) is crucial in the excretion of commonly prescribed drugs. However, changes in metabolic activity caused by CYP2C9 polymorphisms inevitably result in adverse drug effects. CYP2C9*2 and *3 are prevalent in Caucasian populations whereas CYP2C9*13 is notable in Asian populations. The single amino acid substitutions caused by these mutations are located outside the catalytic cavity but affect the kinetic activities of the mutants compared to the wild-type enzyme. To relate the distal effects of these mutations to defective drug metabolism, simulations of CYP2C9 binding to the anti-coagulant (S)-warfarin were performed as a model system. Representative (S)-warfarin-bound forms of the wild type and mutants were sorted and assessed through a knowledge-based scoring function. Interatomic interactions towards (S)-warfarin were predicted to be less favorable in the mutant structures, correlating with a larger distance between the hydroxylation site of (S)-warfarin and the reactive oxyferryl heme than in the wild-type structure. A computational approach could help delineate the complications of CYP polymorphism in the management of drug therapy.
doi:10.1371/journal.pone.0074053
PMCID: PMC3759441  PMID: 24023924
5.  Structure Prediction of Partial-Length Protein Sequences 
Protein structure information is essential to understand protein function. Computational methods to accurately predict protein structure from the sequence have primarily been evaluated on protein sequences representing full-length native proteins. Here, we demonstrate that top-performing structure prediction methods can accurately predict the partial structures of proteins encoded by sequences that contain approximately 50% or more of the full-length protein sequence. We hypothesize that structure prediction may be useful for predicting functions of proteins whose corresponding genes are mapped expressed sequence tags (ESTs) that encode partial-length amino acid sequences. Additionally, we identify a confidence score representing the quality of a predicted structure as a useful means of predicting the likelihood that an arbitrary polypeptide sequence represents a portion of a foldable protein sequence (“foldability”). This work has ramifications for the prediction of protein structure with limited or noisy sequence information, as well as genome annotation.
doi:10.3390/ijms140714892
PMCID: PMC3742278  PMID: 23867606
protein structure prediction; EST; expressed sequence tag; protein folding; protein design
6.  Homo-dimerization and ligand binding by the leucine-rich repeat domain at RHG1/RFS2 underlying resistance to two soybean pathogens 
BMC Plant Biology  2013;13:43.
Background
The protein encoded by GmRLK18-1 (Glyma_18_02680 on chromosome 18) was a receptor like kinase (RLK) encoded within the soybean (Glycine max L. Merr.) Rhg1/Rfs2 locus. The locus underlies resistance to the soybean cyst nematode (SCN) Heterodera glycines (I.) and causal agent of sudden death syndrome (SDS) Fusarium virguliforme (Aoki). Previously the leucine rich repeat (LRR) domain was expressed in Escherichia coli.
Results
The aims here were to evaluate the LRR's ability to homo-dimerize, bind larger proteins, and bind small peptides. Western analysis suggested homo-dimers could form after protein extraction from roots. The purified LRR domain, from residue 131–485, was seen to form a mixture of monomers and homo-dimers in vitro. Cross-linking experiments in vitro showed the H274N region was close (<11.1 Å) to the highly conserved cysteine residue C196 on the second homo-dimer subunit. Binding constants of 20–142 nM were found for peptides present in plant and nematode secretions. Effects on plant phenotypes including wilting, stem bending and resistance to infection by SCN were observed when roots were treated with 50 pM of the peptides. Far-Western analyses followed by MS showed methionine synthase and cyclophilin bound strongly to the LRR domain. A second LRR from GmRLK08-1 (Glyma_08_g11350) did not show these strong interactions.
Conclusions
The LRR domain of the GmRLK18-1 protein formed both a monomer and a homo-dimer. The LRR domain bound avidly to 4 different CLE peptides, a cyclophilin and a methionine synthase. The CLE peptides GmTGIF, GmCLE34, GmCLE3 and HgCLE were previously reported to be involved in root growth inhibition but here GmTGIF and HgCLE were shown to alter stem morphology and resistance to SCN. One of several models from homology and ab-initio modeling was partially validated by cross-linking. The 3 amino acid replacements present among RLK allotypes, A87V, Q115K and H274N, were predicted to alter domain stability and function. Therefore, the LRR domain of GmRLK18-1 might underlie both root development and disease resistance in soybean and provide an avenue to develop new variants and ligands that might promote reduced losses to SCN.
doi:10.1186/1471-2229-13-43
PMCID: PMC3626623  PMID: 23497186
Receptor; Leucine-rich repeat; Ligand; Peptide; Cross-link; Predicted
7.  The Enzymatic and Metabolic Capabilities of Early Life 
PLoS ONE  2012;7(9):e39912.
We introduce the concept of metaconsensus and employ it to make high confidence predictions of early enzyme functions and the metabolic properties that they may have produced. Several independent studies have used comparative bioinformatics methods to identify taxonomically broad features of genomic sequence data, protein structure data, and metabolic pathway data in order to predict physiological features that were present in early, ancestral life forms. But all such methods carry with them some level of technical bias. Here, we cross-reference the results of these previous studies to determine enzyme functions predicted to be ancient by multiple methods. We survey modern metabolic pathways to identify those that maintain the highest frequency of metaconsensus enzymes. Using the full set of modern reactions catalyzed by these metaconsensus enzyme functions, we reconstruct a representative metabolic network that may reflect the core metabolism of early life forms. Our results show that ten enzyme functions, four hydrolases, three transferases, one oxidoreductase, one lyase, and one ligase, are determined by metaconsensus to be present at least as late as the last universal common ancestor. Subnetworks within central metabolic processes related to sugar and starch metabolism, amino acid biosynthesis, phospholipid metabolism, and CoA biosynthesis, have high frequencies of these enzyme functions. We demonstrate that a large metabolic network can be generated from this small number of enzyme functions.
doi:10.1371/journal.pone.0039912
PMCID: PMC3438178  PMID: 22970111
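The cross-referencing step described in the abstract above (keeping enzyme functions that several independent studies predict to be ancient) can be sketched as a simple support count. This is only an illustration: the function name `metaconsensus`, the `min_support` threshold, and the placeholder identifiers are assumptions, and the abstract does not state how many studies must agree.

```python
from collections import Counter


def metaconsensus(predictions, min_support=2):
    """Return functions predicted by at least `min_support` studies.

    predictions: dict mapping a study label to an iterable of function IDs.
    min_support: assumed agreement threshold (not taken from the source).
    """
    support = Counter()
    for functions in predictions.values():
        # set() ensures each study votes at most once per function.
        support.update(set(functions))
    return sorted(f for f, n in support.items() if n >= min_support)


# Placeholder labels and IDs, not real study results.
example = {
    "study_a": ["F1", "F2"],
    "study_b": ["F2", "F3"],
    "study_c": ["F2", "F1", "F1"],
}
shared = metaconsensus(example, min_support=2)
```

Raising `min_support` trades coverage for confidence, which mirrors the idea that agreement across methods reduces the technical bias of any single one.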
8.  Accelerated protein structure comparison using TM-score-GPU 
Bioinformatics  2012;28(16):2191-2192.
Motivation: Accurate comparisons of different protein structures play important roles in structural biology, structure prediction and functional annotation. The root-mean-square-deviation (RMSD) after optimal superposition is the predominant measure of similarity due to the ease and speed of computation. However, global RMSD is dependent on the length of the protein and can be dominated by divergent loops that can obscure local regions of similarity. A more sophisticated measure of structure similarity, Template Modeling (TM)-score, avoids these problems, and it is one of the measures used by the community-wide experiments of critical assessment of protein structure prediction to compare predicted models with experimental structures. TM-score calculations are, however, much slower than RMSD calculations. We have therefore implemented a very fast version of TM-score for Graphical Processing Units (TM-score-GPU), using a novel hybrid Kabsch/quaternion method for calculating the optimal superposition and RMSD that is designed for parallel applications. This acceleration in speed allows TM-score to be used efficiently in computationally intensive applications such as clustering of protein models and genome-wide comparisons of structure.
Results: TM-score-GPU was applied to six sets of models from Nutritious Rice for the World for a total of 3 million comparisons. TM-score-GPU is 68 times faster on an ATI 5870 GPU, on average, than the original CPU single-threaded implementation on an AMD Phenom II 810 quad-core processor.
Availability and implementation: The complete source, including the GPU code and the hybrid RMSD subroutine, can be downloaded and used without restriction at http://software.compbio.washington.edu/misc/downloads/tmscore/. The implementation is in C++/OpenCL.
Contact: ram@compbio.washington.edu
Supplementary Information: Supplementary data are available at Bioinformatics online.
doi:10.1093/bioinformatics/bts345
PMCID: PMC3413391  PMID: 22718788
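To make the contrast with RMSD concrete, here is a minimal sketch of the TM-score sum for a single, already fixed superposition, using the standard length-dependent scale d0 = 1.24 (L - 15)^(1/3) - 1.8. The full TM-score additionally searches for the superposition that maximizes this sum, which is the expensive part and is not shown. The function name `tm_score` and the 0.5 Å floor on d0 for short chains are assumptions of this sketch, not details taken from TM-score-GPU.

```python
def tm_score(distances, target_length):
    """TM-score-style sum for one fixed superposition.

    distances: distances (in angstroms) between aligned residue pairs.
    target_length: length of the reference (target) protein, used for
    normalisation so that unaligned residues contribute zero.
    """
    if target_length > 15:
        d0 = 1.24 * (target_length - 15) ** (1.0 / 3.0) - 1.8
    else:
        d0 = 0.5  # assumed floor for very short chains
    d0 = max(d0, 0.5)
    # Each aligned pair contributes between 0 and 1; large deviations saturate
    # instead of dominating, unlike in RMSD.
    total = sum(1.0 / (1.0 + (d / d0) ** 2) for d in distances)
    return total / target_length


perfect = tm_score([0.0] * 100, 100)
```

Because every term is bounded by 1, a few badly placed loop residues lower the score only slightly, whereas they can dominate a global RMSD.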
9.  Cementomimetics—constructing a cementum-like biomineralized microlayer via amelogenin-derived peptides 
Cementum is the outer mineralized tissue covering the tooth root and an essential part of the system of periodontal tissue that anchors the tooth to the bone. Periodontal disease results from the destructive behavior of the host elicited by an infectious biofilm adhering to the tooth root and, left untreated, may lead to tooth loss. We describe a novel protocol for identifying peptide sequences from native proteins with the potential to repair damaged dental tissues by controlling hydroxyapatite biomineralization. Using amelogenin as a case study and a bioinformatics scoring matrix, we identified regions within amelogenin that are shared with a set of hydroxyapatite-binding peptides (HABPs) previously selected by phage display. One 22-amino-acid-long peptide region, referred to as amelogenin-derived peptide 5 (ADP5), was shown to facilitate cell-free formation of a cementum-like hydroxyapatite mineral layer on demineralized human root dentin that, in turn, supported attachment of periodontal ligament cells in vitro. Our findings have several implications in peptide-assisted mineral formation that mimics biomineralization. By further elaborating the mechanism for protein control over the biomineral formed, we afford new insights into the evolution of protein–mineral interactions. By exploiting small peptide domains of native proteins, our understanding of structure–function relationships of biomineralizing proteins can be extended and these peptides can be utilized to engineer mineral formation. Finally, the cementomimetic layer formed by ADP5 has the potential clinical application to repair diseased root surfaces so as to promote the regeneration of periodontal tissues and thereby reduce the morbidity associated with tooth loss.
doi:10.1038/ijos.2012.40
PMCID: PMC3412665  PMID: 22743342
amelogenin; amelogenin-derived peptides; bioinformatics; biomineralization; cementomimetics; cementum; demineralization; remineralization
10.  IGF1R Variants Associated with Isolated Single Suture Craniosynostosis 
The genetic contribution to the pathogenesis of isolated single suture craniosynostosis is poorly understood. The role of mutations in genes known to be associated with syndromic synostosis appears to be limited. We present our findings of a candidate gene resequencing approach to identify rare variants associated with the most common forms of isolated craniosynostosis. Resequencing of the coding regions, splice junction sites, and 5′ and 3′ untranslated regions of 27 candidate genes in 186 cases of isolated nonsyndromic single suture synostosis revealed three novel and two rare sequence variants (R406H, R595H, N857S, P190S, M446V) in insulin-like growth factor I receptor (IGF1R) that are enriched relative to control samples. Mapping the resultant amino acid changes to the modeled homodimer protein structure suggests a structural basis for segregation between these and other disease-associated mutations found in IGF1R. These data suggest that IGF1R mutations may contribute to the risk and in some cases cause single suture craniosynostosis.
doi:10.1002/ajmg.a.33781
PMCID: PMC3059230  PMID: 21204214
craniosynostosis; IGF1R; non-syndromic; isolated; simple; sagittal; coronal; metopic; resequencing; non-synonymous SNP
11.  Disease Risk of Missense Mutations Using Structural Inference from Predicted Function 
Current protein & peptide science  2010;11(7):573-588.
Advancements in sequencing techniques place personalized genomic medicine upon the horizon, bringing along the responsibility of clinicians to understand the likelihood for a mutation to cause disease, and of scientists to separate etiology from nonpathologic variability. Pathogenicity is discernable from patterns of interactions between a missense mutation, the surrounding protein structure, and intermolecular interactions. Physicochemical stability calculations are not accessible without structures, as is the case for the vast majority of human proteins, so diagnostic accuracy remains in its infancy. To model the effects of missense mutations on functional stability without structure, we combine novel protein sequence analysis algorithms to discern spatial distributions of sequence, evolutionary, and physicochemical conservation, through a new approach to optimize component selection. Novel components include a combinatory substitution matrix and two heuristic algorithms that detect positions which confer structural support to interaction interfaces. The method reaches 0.91 AUC in ten-fold cross-validation to predict alteration of function for 6,392 in vitro mutations. For clinical utility we trained the method on 7,022 disease-associated missense mutations within Online Mendelian Inheritance in Man amongst a larger randomized set. In a blinded prospective test to delineate mutations unique to 186 patients with craniosynostosis from those in the 95 highly variant Coriell controls and 1000 age-matched controls, we achieved roughly 1/3 sensitivity and perfect specificity. The component algorithms retained during machine learning constitute novel protein sequence analysis techniques to describe environments supporting neutrality or pathology of mutations. This approach to pathogenetics enables new insight into the mechanistic relationship of missense mutations to disease phenotypes in our patients.
PMCID: PMC3095817  PMID: 20887259
Computational biology; protein stability; machine learning; missense mutation; nonsynonymous SNP; sequence analysis
12.  A protein sequence meta-functional signature for calcium binding residue prediction 
Pattern recognition letters  2010;31(14):2103-2112.
The diversity of characterized protein functions found amongst experimentally interrogated proteins suggests that a vast array of unknown functions remains undiscovered. These protein functions are imparted by specific geometric distributions of amino acid residue chemical moieties, each contributing a functional interaction. We hypothesize that individual residue function contributions are predictable through sequence analytic knowledge based algorithms, and that they can be recombined to understand composite protein function by predicting spatial relation in tertiary structure. We assess the former by training a meta-functional signature algorithm to specifically predict calcium ion binding residues from protein sequence. We estimate the latter by testing for match between predictive contribution of positions in predicted secondary structures and patterns of side chain proximity forced by secondary structure moieties. Specific training for calcium binding results in 83% area under the receiver operator characteristic curve added value over random (AUCoR) and p < 10^−300 significance as measured by Kendall’s τ in ten fold cross validation for parallel sets of 811 residues in 336 proteins and 696 residues in 299 proteins. Training for generalized function results in 63% AUCoR and p ≅ 10^−221 for the same tests. Including inference of side chain proximity improves predictive ability by 2% AUCoR consistently. The results demonstrate that protein meta-functional signatures can be trained to predict specific protein functions by considering amino acid identity and structural features accessible from sequence, laying the groundwork for composite sequence based function site prediction.
doi:10.1016/j.patrec.2010.04.012
PMCID: PMC2932634  PMID: 20824111
Protein sequence analysis; Protein function prediction; Calcium; Protein binding site; Functional signature
13.  LoCo: a novel main chain scoring function for protein structure prediction based on local coordinates 
BMC Bioinformatics  2011;12:368.
Background
Successful protein structure prediction requires accurate low-resolution scoring functions so that protein main chain conformations that are close to the native can be identified. Once that is accomplished, a more detailed and time-consuming treatment to produce all-atom models can be undertaken. The earliest low-resolution scoring used simple distance-based "contact potentials," but more recently, the relative orientations of interacting amino acids have been taken into account to improve performance.
Results
We developed a new knowledge-based scoring function, LoCo, that locates the interaction partners of each individual residue within a local coordinate system based only on the position of its main chain N, Cα and C atoms. LoCo was trained on a large set of experimentally determined structures and optimized using standard sets of modeled structures, or "decoys." No structure used to train or optimize the function was included among those used to test it. When tested against 29 other published main chain functions on a group of 77 commonly used decoy sets, our function outperformed all others in Cα RMSD rank of the best-scoring decoy, with statistically significant p-values < 0.05 for 26 out of the 29 other functions considered. LoCo is fast, requiring on average less than 6 microseconds per residue for interaction and scoring on commonly-used computer hardware.
Conclusions
Our function demonstrates an unmatched combination of accuracy, speed, and simplicity and shows excellent promise for protein structure prediction. Broader applications may include protein-protein interactions and protein design.
doi:10.1186/1471-2105-12-368
PMCID: PMC3184297  PMID: 21920038
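The idea of locating interaction partners in a residue-local coordinate system built only from the main chain N, Cα and C atoms can be sketched as follows. The axis convention used here (first axis along Cα to C, third axis perpendicular to the N-Cα-C plane) is an assumption chosen for illustration; the abstract does not specify LoCo's actual convention, and the function names are hypothetical.

```python
import math


def _sub(a, b):
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def _dot(a, b):
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def _unit(a):
    length = math.sqrt(_dot(a, a))
    return (a[0] / length, a[1] / length, a[2] / length)


def local_coordinates(n_atom, ca_atom, c_atom, partner):
    """Express `partner` in an orthonormal frame centred on the Cα atom.

    Assumed convention: e1 along Cα->C, e3 normal to the N-Cα-C plane,
    e2 completing a right-handed frame.
    """
    e1 = _unit(_sub(c_atom, ca_atom))
    e3 = _unit(_cross(e1, _sub(n_atom, ca_atom)))
    e2 = _cross(e3, e1)
    rel = _sub(partner, ca_atom)
    return (_dot(rel, e1), _dot(rel, e2), _dot(rel, e3))


# Made-up coordinates: the same geometry before and after a rigid rotation
# of 90 degrees about z plus no translation gives identical local coordinates.
before = local_coordinates((0, 1, 0), (0, 0, 0), (1, 0, 0), (2, 3, 4))
after = local_coordinates((-1, 0, 0), (0, 0, 0), (0, 1, 0), (-3, 2, 4))
```

Because the frame moves with the residue, the local coordinates of a partner are unchanged by rigid rotation or translation of the whole structure, which is what makes statistics gathered in such a frame comparable across proteins.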
14.  Computational Prediction of Type III and IV Secreted Effectors in Gram-Negative Bacteria 
Infection and Immunity  2010;79(1):23-32.
In this review, we provide an overview of the methods employed in four recent studies that described novel methods for computational prediction of secreted effectors from type III and IV secretion systems in Gram-negative bacteria. We present the results of these studies in terms of performance at accurately predicting secreted effectors and similarities found between secretion signals that may reflect biologically relevant features for recognition. We discuss the Web-based tools for secreted effector prediction described in these studies and announce the availability of our tool, the SIEVE server (http://www.sysbep.org/sieve). Finally, we assess the accuracies of the three type III effector prediction methods on a small set of proteins not known prior to the development of these tools that we recently discovered and validated using both experimental and computational approaches. Our comparison shows that all methods use similar approaches and, in general, arrive at similar conclusions. We discuss the possibility of an order-dependent motif in the secretion signal, which was a point of disagreement in the studies. Our results show that there may be classes of effectors in which the signal has a loosely defined motif and others in which secretion is dependent only on compositional biases. Computational prediction of secreted effectors from protein sequences represents an important step toward better understanding the interaction between pathogens and hosts.
doi:10.1128/IAI.00537-10
PMCID: PMC3019878  PMID: 20974833
15.  GPU-Q-J, a fast method for calculating root mean square deviation (RMSD) after optimal superposition 
BMC Research Notes  2011;4:97.
Background
Calculation of the root mean square deviation (RMSD) between the atomic coordinates of two optimally superposed structures is a basic component of structural comparison techniques. We describe a quaternion based method, GPU-Q-J, that is stable with single precision calculations and suitable for graphics processor units (GPUs). The application was implemented on an ATI 4770 graphics card in C/C++ and Brook+ in Linux where it was 260 to 760 times faster than existing unoptimized CPU methods. Source code is available from the Compbio website http://software.compbio.washington.edu/misc/downloads/st_gpu_fit/ or from the author LHH.
Findings
The Nutritious Rice for the World Project (NRW) on World Community Grid predicted, de novo, the structures of over 62,000 small proteins and protein domains, returning a total of 10 billion candidate structures. Clustering ensembles of structures on this scale requires calculation of large similarity matrices consisting of RMSDs between each pair of structures in the set. As a real-world test, we calculated the matrices for 6 different ensembles from NRW. The GPU method was 260 times faster than the fastest existing CPU based method and over 500 times faster than the method that had been previously used.
Conclusions
GPU-Q-J is a significant advance over previous CPU methods. It relieves a major bottleneck in the clustering of large numbers of structures for NRW. It also has applications in structure comparison methods that involve multiple superposition and RMSD determination steps, particularly when such methods are applied on a proteome and genome wide scale.
doi:10.1186/1756-0500-4-97
PMCID: PMC3087690  PMID: 21453553
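For readers unfamiliar with quaternion-based superposition, the sketch below computes the RMSD after optimal superposition on the CPU in plain Python. It builds the 4x4 symmetric matrix of the quaternion formulation from the cross-covariance of the centred coordinates and takes its largest eigenvalue, from which the minimum RMSD follows without constructing the rotation. The eigenvalue is found here by shifted power iteration, which is a choice made for brevity and is not claimed to be what GPU-Q-J does; the function name `superposed_rmsd` and the test coordinates are assumptions.

```python
import math


def superposed_rmsd(a, b, iterations=500):
    """Minimum RMSD between two equal-length lists of (x, y, z) points."""
    n = len(a)
    ca = [sum(p[k] for p in a) / n for k in range(3)]
    cb = [sum(p[k] for p in b) / n for k in range(3)]
    x = [[p[k] - ca[k] for k in range(3)] for p in a]
    y = [[p[k] - cb[k] for k in range(3)] for p in b]
    # g = sum of squared norms of both centred coordinate sets.
    g = sum(v * v for p in x for v in p) + sum(v * v for p in y for v in p)
    s = [[sum(x[i][r] * y[i][c] for i in range(n)) for c in range(3)]
         for r in range(3)]
    (sxx, sxy, sxz), (syx, syy, syz), (szx, szy, szz) = s
    key = [
        [sxx + syy + szz, syz - szy, szx - sxz, sxy - syx],
        [syz - szy, sxx - syy - szz, sxy + syx, szx + sxz],
        [szx - sxz, sxy + syx, -sxx + syy - szz, syz + szy],
        [sxy - syx, szx + sxz, syz + szy, -sxx - syy + szz],
    ]
    # Shifting by g/2 makes every eigenvalue non-negative, so power
    # iteration converges towards the largest eigenvalue of `key`.
    shift = g / 2.0
    v = [1.0, 0.9, 0.8, 0.7]
    for _ in range(iterations):
        w = [sum(key[r][c] * v[c] for c in range(4)) + shift * v[r]
             for r in range(4)]
        norm = math.sqrt(sum(t * t for t in w))
        if norm == 0.0:
            break
        v = [t / norm for t in w]
    kv = [sum(key[r][c] * v[c] for c in range(4)) for r in range(4)]
    lam = sum(v[r] * kv[r] for r in range(4)) / sum(t * t for t in v)
    return math.sqrt(max(0.0, (g - 2.0 * lam) / n))


# Made-up points: `moved` is `ref` rotated 90 degrees about z and translated.
ref = [(0, 0, 0), (1, 0, 0), (0, 2, 0), (0, 0, 3)]
moved = [(-p[1] + 5, p[0] - 1, p[2] + 2) for p in ref]
```

A full similarity matrix over n structures needs n(n-1)/2 such calls, which is why a fast, numerically stable kernel matters at the scale described above.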
16.  Caries induced cytokine network in the odontoblast layer of human teeth 
BMC Immunology  2011;12:9.
Background
Immunologic responses of the tooth to caries begin with odontoblasts recognizing carious bacteria. Inflammatory propagation eventually leads to tooth pulp necrosis and danger to health. The present study aims to determine cytokine gene expression profiles generated within human teeth in response to dental caries in vivo and to build a mechanistic model of these responses and the downstream signaling network.
Results
We demonstrate profound differential up-regulation of inflammatory genes in the odontoblast layer (ODL) in human teeth with caries in vivo, while the pulp remains largely unchanged. Interleukins, chemokines, and all tested receptors thereof were differentially up-regulated in ODL of carious teeth, well over one hundred-fold for 35 of 84 genes. By interrogating reconstructed protein interaction networks corresponding to the differentially up-regulated genes, we develop the hypothesis that pro-inflammatory cytokines highly expressed in ODL of carious teeth, IL-1β, IL-1α, and TNF-α, carry the converged inflammatory signal. We show that IL1β amplifies antimicrobial peptide production in odontoblasts in vitro 100-fold more than lipopolysaccharide, in a manner matching subsequent in vivo measurements.
Conclusions
Our data suggest that ODL amplifies bacterial signals dramatically by self-feedback cytokine-chemokine signal-receptor cycling, and signal convergence through IL1R1 and possibly others, to increase defensive capacity including antimicrobial peptide production to protect the tooth and contain the battle against carious bacteria within the dentin.
doi:10.1186/1471-2172-12-9
PMCID: PMC3036664  PMID: 21261944
17.  Diversity of protein structures and difficulties in fold recognition: the curious case of protein G 
We examine the ability of current state-of-the-art methods in protein structure prediction to discriminate topologically distant folds encoded by highly similar (>90% sequence identity) designed proteins in blind protein structure prediction experiments. We detail the corresponding prognosis for the protein fold recognition field and highlight the features of the methodologies that successfully deciphered this folding riddle.
doi:10.3410/B1-69
PMCID: PMC2832337  PMID: 20209018
18.  A Generalized Knowledge-Based Discriminatory Function for Biomolecular Interactions 
Proteins  2009;76(1):115-128.
Several novel and established knowledge-based discriminatory function formulations and reference state derivations have been evaluated to identify parameter sets capable of distinguishing native and near-native biomolecular interactions from incorrect ones. We developed the r·m·r function, a novel atomic level radial distribution function with mean reference state that averages over all pairwise atom types from a reduced atom type composition, using experimentally determined intermolecular complexes in the Cambridge Structural Database (CSD) and the Protein Data Bank (PDB) as the information sources. We demonstrate that r·m·r had the best discriminatory accuracy and power for protein-small molecule and protein-DNA interactions, regardless of whether the native complex was included or excluded from the test set. The superior performance of the r·m·r discriminatory function compared to seventeen alternative functions evaluated on publicly available test sets for protein-small molecule and protein-DNA interactions indicated that the function was not over optimized through back testing on a single class of biomolecular interactions. The initial success of the reduced composition and superior performance with the CSD as the distribution set over the PDB implies that further improvements and generality of the function are possible by deriving probabilities from subsets of the CSD, using structures that consist of only the atom types to be considered for given biomolecular interactions. The method is available as a web server module at http://protinfo.compbio.washington.edu.
doi:10.1002/prot.22323
PMCID: PMC2891153  PMID: 19127590
discriminatory function; knowledge-based; protein-small molecule; protein-DNA; protein-ligand; complexes; biomolecular interactions
19.  Structural Optimization and De Novo Design of Dengue Virus Entry Inhibitory Peptides 
Viral fusogenic envelope proteins are important targets for the development of inhibitors of viral entry. We report an approach for the computational design of peptide inhibitors of the dengue 2 virus (DENV-2) envelope (E) protein using high-resolution structural data from a pre-entry dimeric form of the protein. By using predictive strategies together with computational optimization of binding “pseudoenergies”, we were able to design multiple peptide sequences that showed low micromolar viral entry inhibitory activity. The two most active peptides, DN57opt and 1OAN1, were designed to displace regions in the domain II hinge, and the first domain I/domain II beta sheet connection, respectively, and show fifty percent inhibitory concentrations of 8 and 7 µM respectively in a focus forming unit assay. The antiviral peptides were shown to interfere with virus:cell binding, interact directly with the E proteins and also cause changes to the viral surface using biolayer interferometry and cryo-electron microscopy, respectively. These peptides may be useful for characterization of intermediate states in the membrane fusion process, investigation of DENV receptor molecules, and as lead compounds for drug discovery.
Author Summary
Virus surface proteins mediate interactions with target cells during the initial events in the process of infection. Inhibiting these proteins is therefore a major target for the development of antiviral drugs. However, there are a very large number of different viruses, each with their own distinct surface proteins and, with just a few exceptions, it is not clear how to build novel molecules to inhibit them. Here we applied a computational binding optimization strategy to an atomic resolution structure of dengue virus serotype 2 envelope protein to generate peptide sequences that should interact strongly with this protein. We picked dengue virus as a target because it is the causative agent for the most important mosquito transmitted viral disease. Out of a small number of candidates designed and tested, we identified two different highly inhibitory peptides. To verify our results, we showed that these peptides block virus:cell binding, interfere with a step during viral entry, alter the surface structure of dengue viral particles, and that they interact directly with dengue virus envelope protein. We expect that our approach may be generally applicable to other viral surface proteins where a high resolution structure is available.
doi:10.1371/journal.pntd.0000721
PMCID: PMC2889824  PMID: 20582308
20.  A novel method for predicting and using distance constraints of high accuracy for refining protein structure prediction 
Proteins  2009;77(1):220-234.
The principal bottleneck in protein structure prediction is the refinement of models from lower accuracies to the resolution observed by experiment. We developed a novel constraints-based refinement method that identifies a high number of accurate input constraints from initial models and rebuilds them using restrained torsion angle dynamics (rTAD). We previously created a Bayesian statistics-based residue-specific all-atom probability discriminatory function (RAPDF) to discriminate native-like models by measuring the probability of accuracy for atom type distances within a given model. Here, we exploit RAPDF to score (i.e., filter) constraints from initial predictions that may or may not be close to a native-like state, obtain consensus of top scoring constraints amongst five initial models, and compile sets with no redundant residue pair constraints. We find that this method consistently produces a large and highly accurate set of distance constraints from which to build refinement models. We further optimize the balance between accuracy and coverage of constraints by producing multiple structure sets using different constraint distance cutoffs, and note that the cutoff governs spatially near versus distant effects in model generation. This complete procedure of deriving distance constraints for rTAD simulations improves the quality of initial predictions significantly in all cases evaluated by us. Our procedure represents a significant step in solving the protein structure prediction and refinement problem, by enabling the use of consensus constraints, RAPDF, and rTAD for protein structure modeling and refinement.
doi:10.1002/prot.22434
PMCID: PMC2874729  PMID: 19422061
protein structure prediction; refinement; knowledge-based functions
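The consensus step described in this abstract (keeping only distance constraints that several initial models agree on, with one entry per residue pair) can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the 8.0 Å cutoff, and the `max_spread` agreement threshold are assumptions made for illustration, and a plain distance-agreement test stands in for RAPDF scoring, which is not reproduced here.

```python
from itertools import combinations
from math import dist
from statistics import mean, pstdev


def pair_distances(coords, cutoff):
    """Map each residue pair (i, j), i < j, to its distance if within cutoff.

    `coords` is a list of (x, y, z) tuples, one representative atom per residue.
    """
    return {
        (i, j): dist(coords[i], coords[j])
        for i, j in combinations(range(len(coords)), 2)
        if dist(coords[i], coords[j]) <= cutoff
    }


def consensus_constraints(models, cutoff=8.0, max_spread=1.0):
    """Keep residue pairs found in every model whose distances agree.

    Hypothetical stand-in for the scoring step: a pair is accepted when the
    population standard deviation of its distances across models is at most
    `max_spread`. The returned dict has one entry per residue pair, so the
    set contains no redundant pair constraints.
    """
    per_model = [pair_distances(m, cutoff) for m in models]
    shared = set(per_model[0]).intersection(*per_model[1:])
    constraints = {}
    for pair in sorted(shared):
        values = [d[pair] for d in per_model]
        if pstdev(values) <= max_spread:
            constraints[pair] = mean(values)
    return constraints
```

Calling `consensus_constraints` with several values of `cutoff` would produce multiple constraint sets, loosely mirroring the accuracy-versus-coverage trade-off the abstract mentions.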
21.  The evolution and functional repertoire of translation proteins following the origin of life 
Biology Direct  2010;5:15.
Background
The RNA world hypothesis posits that the earliest genetic system consisted of informational RNA molecules that directed the synthesis of modestly functional RNA molecules. Further evidence suggests that it was within this RNA-based genetic system that life developed the ability to synthesize proteins by translating genetic code. Here we investigate the early development of the translation system through an evolutionary survey of protein architectures associated with modern translation.
Results
Our analysis reveals a structural expansion of translation proteins immediately following the RNA world and well before the establishment of the DNA genome. Subsequent functional annotation shows that representatives of the ten most ancestral protein architectures are responsible for all of the core protein functions found in modern translation.
Conclusions
We propose that this early robust translation system evolved by virtue of a positive feedback cycle in which the system was able to create increasingly complex proteins to further enhance its own function.
Reviewers
This article was reviewed by Janet Siefert, George Fox, and Antonio Lazcano (nominated by Laura Landweber).
doi:10.1186/1745-6150-5-15
PMCID: PMC2873265  PMID: 20377891
22.  Diversity of protein structures and difficulties in fold recognition: the curious case of protein G 
We examine the ability of current state-of-the-art methods in protein structure prediction to discriminate topologically distant folds encoded by highly similar (>90% sequence identity) designed proteins in blind protein structure prediction experiments. We detail the corresponding prognosis for the protein fold recognition field and highlight the features of the methodologies that successfully deciphered this folding riddle.
doi:10.3410/B1-69
PMCID: PMC2832337  PMID: 20209018
23.  Comprehensive computational analysis of Hmd enzymes and paralogs in methanogenic Archaea 
Background
Methanogenesis is the sole means of energy production in methanogenic Archaea. H2-forming methylenetetrahydromethanopterin dehydrogenase (Hmd) catalyzes a step in the hydrogenotrophic methanogenesis pathway in class I methanogens. At least one hmd paralog has been identified in nine of the eleven complete genome sequences of class I hydrogenotrophic methanogens. The products of these paralog genes have thus far eluded any detailed functional characterization.
Results
Here we present a thorough computational analysis of Hmd enzymes and paralogs that includes state-of-the-art phylogenetic inference, structure prediction, and functional site prediction techniques. We determine that the Hmd enzymes are phylogenetically distinct from Hmd paralogs but share a common overall structure. We predict that the active site of the Hmd enzyme is conserved as a functional site in Hmd paralogs and use this observation to propose possible molecular functions of the paralog that are consistent with previous experimental evidence. We also identify an uncharacterized site in the N-terminal domains of both proteins that is predicted by our methods to directly impart function.
Conclusion
This study contributes to our understanding of the evolutionary history, structural conservation, and functional roles of the Hmd enzymes and paralogs. The results of our phylogenetic and structural analysis constitute datasets that will aid in the future study of the Hmd protein family. Our functional site predictions generate several testable hypotheses that will guide further experimental characterization of the Hmd paralog. This work also represents a novel approach to protein function prediction in which multiple computational methods are integrated to achieve a detailed characterization of proteins that are not well understood.
doi:10.1186/1471-2148-9-199
PMCID: PMC2739858  PMID: 19671178
24.  Protinfo PPC: A web server for atomic level prediction of protein complexes 
Nucleic Acids Research  2009;37(Web Server issue):W519-W525.
‘Protinfo PPC’ (Prediction of Protein Complex) is a web server that predicts atomic level structures of interacting proteins from their amino-acid sequences. It uses the interolog method to search for experimental protein complex structures that are homologous to the input sequences submitted by a user. These structures are then used as starting templates to generate protein complex models, which are returned to the user in Protein Data Bank format via email. The server supports modeling of both homo and hetero multimers and generally produces full atomic level models (including insertion/deletion regions) of protein complexes as long as at least one putative homologous template for the query sequences is found. The modeling pipeline behind Protinfo PPC has been rigorously benchmarked and proven to produce highly accurate protein complex models. The fully automated all atom comparative modeling service for protein complexes provided by Protinfo PPC server offers wide capabilities ranging from prediction of protein complex interactions to identification of possible interaction sites, which will be useful for researchers studying these topics. The Protinfo PPC web server is available at http://protinfo.compbio.washington.edu/ppc/
doi:10.1093/nar/gkp306
PMCID: PMC2703994  PMID: 19420059
25.  Accurate Prediction of Secreted Substrates and Identification of a Conserved Putative Secretion Signal for Type III Secretion Systems 
PLoS Pathogens  2009;5(4):e1000375.
The type III secretion system is an essential component for virulence in many Gram-negative bacteria. Though components of the secretion system apparatus are conserved, its substrates—effector proteins—are not. We have used a novel computational approach to confidently identify new secreted effectors by integrating protein sequence-based features, including evolutionary measures such as the pattern of homologs in a range of other organisms, G+C content, amino acid composition, and the N-terminal 30 residues of the protein sequence. The method was trained on known effectors from the plant pathogen Pseudomonas syringae and validated on a set of effectors from the animal pathogen Salmonella enterica serovar Typhimurium (S. Typhimurium) after eliminating effectors with detectable sequence similarity. We show that this approach can predict known secreted effectors with high specificity and sensitivity. Furthermore, by considering a large set of effectors from multiple organisms, we computationally identify a common putative secretion signal in the N-terminal 20 residues of secreted effectors. This signal can be used to discriminate 46 out of 68 total known effectors from both organisms, suggesting that it is a real, shared signal applicable to many type III secreted effectors. We use the method to make novel predictions of secreted effectors in S. Typhimurium, some of which have been experimentally validated. We also apply the method to predict secreted effectors in the genetically intractable human pathogen Chlamydia trachomatis, identifying the majority of known secreted proteins in addition to providing a number of novel predictions. This approach provides a new way to identify secreted effectors in a broad range of pathogenic bacteria for further experimental characterization and provides insight into the nature of the type III secretion signal.
Author Summary
Pathogenic bacteria release a number of different proteins that function to interfere with host defenses and allow bacterial invasion, persistence, and replication in the host. In many bacterial pathogens, the type III secretion system is used to inject these virulence factors directly into the cytoplasm of the host cell. The secreted proteins do not have well-conserved sequences and do not have any kind of common identifiable signal sequence to target them for secretion. This makes it very difficult to identify secreted proteins of this kind without experimental investigation, unlike in other secretion systems, where this can be done. In this study, we develop a computational approach to detect secreted virulence factors from genomic protein sequences. We use this method to compare the N-terminal regions of proteins from S. Typhimurium and a plant pathogen, P. syringae, and show that this approach is the most effective method of computational identification of type III secreted proteins to date. We further use this approach to identify a sequence pattern in these proteins that presumably helps direct virulence proteins to the type III secretion apparatus. We provide novel predictions of secreted proteins in these two organisms, as well as in the human pathogen C. trachomatis. Better understanding of secreted virulence factors in pathogens will lead to new ways of combating important infectious diseases and provide understanding of the complex interaction between pathogen and host.
doi:10.1371/journal.ppat.1000375
PMCID: PMC2668754  PMID: 19390620
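One of the sequence-based features named in this abstract, the amino acid composition of the N-terminal 30 residues, can be sketched as follows. This is a hedged illustration only: the function name and the handling of non-standard residues are assumptions, and the sketch covers a single feature type, not the classifier or the evolutionary and G+C features the authors combined.

```python
from collections import Counter

# The 20 standard amino acids in one-letter code.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def n_terminal_composition(sequence, window=30):
    """Return the fraction of each standard amino acid in the N-terminal window.

    Characters outside the 20 standard residues (for example 'X') are ignored
    when computing fractions; this is an assumption of the sketch. An empty or
    fully non-standard window yields all zeros.
    """
    region = sequence.upper()[:window]
    counts = Counter(region)
    total = sum(counts[aa] for aa in AMINO_ACIDS)
    if total == 0:
        return {aa: 0.0 for aa in AMINO_ACIDS}
    return {aa: counts[aa] / total for aa in AMINO_ACIDS}
```

The resulting 20-value dictionary could serve as one block of a feature vector; setting `window=20` would restrict it to the shorter N-terminal region in which the abstract reports the putative secretion signal.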
