Bacteriophages encode endolysins to lyse their host cell and allow escape of their progeny. Endolysins are also active against Gram-positive bacteria when applied from the outside and are thus attractive anti-bacterial agents. LysK, an endolysin from staphylococcal phage K, contains an N-terminal cysteine-histidine dependent amido-hydrolase/peptidase domain (CHAPK), a central amidase domain and a C-terminal SH3b cell wall-binding domain. CHAPK cleaves bacterial peptidoglycan between the tetra-peptide stem and the penta-glycine bridge.
The CHAPK domain of LysK was crystallized and high-resolution diffraction data were collected from both a native protein crystal and a methylmercury chloride-derivatized crystal. The anomalous signal contained in the derivative data allowed the location of heavy atom sites and phase determination. The resulting structures were completed, refined and analyzed. The presence of calcium and zinc ions in the structure was confirmed by X-ray fluorescence emission spectroscopy. Zymogram analysis was performed on the enzyme and selected site-directed mutants.
The structure of CHAPK revealed a papain-like topology with a hydrophobic cleft, where the catalytic triad is located. Ordered buffer molecules present in this groove may mimic the peptidoglycan substrate. When compared to previously solved CHAP domains, CHAPK contains an additional lobe in its N-terminal domain, with a structural calcium ion, coordinated by residues Asp45, Asp47, Tyr49, His51 and Asp56. The presence of a zinc ion in the active site was also apparent, coordinated by the catalytic residue Cys54 and a possible substrate analogue. Site-directed mutagenesis was used to demonstrate that residues involved in calcium binding and of the proposed active site were important for enzyme activity.
The high-resolution structure of the CHAPK domain of LysK was determined, suggesting the location of the active site, the substrate-binding groove and revealing the presence of a structurally important calcium ion. A zinc ion was found more loosely bound. Based on the structure, we propose a possible reaction mechanism. Future studies will be aimed at co-crystallizing CHAPK with substrate analogues and elucidating its role in the complete LysK protein. This, in turn, may lead to the design of site-directed mutants with altered activity or substrate specificity.
Bacteriophage; Calcium; Crystallography; Endolysin; Peptidoglycan; Protease; Staphylococcus; Zinc
The endolysin LysK, derived from staphylococcal phage K, has previously been shown to have two enzymatic domains, one of which is an N-acetylmuramoyl-L-alanine amidase and the other a cysteine/histidine-dependent amidohydrolase/peptidase designated CHAPk. The latter, when cloned as a single-domain truncated enzyme, is conveniently overexpressed in a highly soluble form. This enzyme was shown to be highly active in vitro against live cell suspensions of S. aureus. In the current study, the IVIS imaging system was used to demonstrate the effective elimination of a lux-labeled S. aureus strain from the nares of BALB/c mice.
Staphylococcus; decolonization; lysin; bacteriophage; nasal
New antibacterial agents are urgently needed for the elimination of biofilm-forming bacteria that are highly resistant to traditional antimicrobial agents. Proliferation of such bacteria can lead to significant economic losses in the agri-food sector. This study demonstrates the potential of the bacteriophage-derived peptidase, CHAPK, as a biocidal agent for the rapid disruption of biofilm-forming staphylococci, commonly associated with bovine mastitis. Purified CHAPK applied to biofilms of Staphylococcus aureus DPC5246 completely eliminated the staphylococcal biofilms within 4 h. In addition, CHAPK was able to prevent biofilm formation by this strain. The CHAPK lysin also reduced S. aureus in a skin decolonization model. Our data demonstrate the potential of CHAPK as a biocidal agent for the prevention and treatment of biofilm-associated staphylococcal infections or as a decontaminating agent in the food and healthcare sectors.
Genome sequencing projects have deciphered millions of protein sequences, which require knowledge of their structure and function to improve the understanding of their biological role. Although experimental methods can provide detailed information for a small fraction of these proteins, computational modeling is needed for the majority of protein molecules, which are experimentally uncharacterized. The I-TASSER server is an on-line workbench for high-resolution modeling of protein structure and function. Given a protein sequence, a typical output from the I-TASSER server includes secondary structure prediction, predicted solvent accessibility of each residue, homologous template proteins detected by threading and structure alignments, up to five full-length tertiary structural models, and structure-based functional annotations for enzyme classification, Gene Ontology terms and protein-ligand binding sites. All the predictions are tagged with a confidence score that estimates how accurate the predictions are in the absence of experimental data. To accommodate special requests from end users, the server provides channels to accept user-specified inter-residue distance and contact maps to interactively change the I-TASSER modeling; it also allows users to specify any protein as a template, or to exclude any template proteins during the structure assembly simulations. Users can collect this structural information from experimental evidence or biological insights with the purpose of improving the quality of I-TASSER predictions. The server was evaluated as the best program for protein structure and function predictions in the recent community-wide CASP experiments. There are currently >20,000 registered scientists from over 100 countries who are using the on-line I-TASSER server.
Predicting 3-dimensional protein structures from amino-acid sequences is an important unsolved problem in computational structural biology. The problem becomes relatively easier if close homologous proteins have been solved, as high-resolution models can be built by aligning target sequences to the solved homologous structures. However, for sequences without similar folds in the Protein Data Bank (PDB) library, the models have to be predicted from scratch. Progress in ab initio structure modeling is slow. The aim of this study was to extend the TASSER (threading/assembly/refinement) method to ab initio modeling and examine systematically its ability to fold small single-domain proteins.
We developed I-TASSER by iteratively implementing the TASSER method and tested it by folding three benchmark sets of small proteins. First, data on 16 small proteins (< 90 residues) were used to generate I-TASSER models, which had an average Cα-root mean square deviation (RMSD) of 3.8 Å, with 6 of them having a Cα-RMSD < 2.5 Å. The overall result was comparable with the all-atom ROSETTA simulation, but the central processing unit (CPU) time required by I-TASSER was much shorter (5 CPU hours vs. 150 CPU days for ROSETTA). Second, data on 20 small proteins (< 120 residues) were used. I-TASSER folded four of them with a Cα-RMSD < 2.5 Å. The average Cα-RMSD of the I-TASSER models was 3.9 Å, whereas it was 5.9 Å using TOUCHSTONE-II software. Finally, 20 non-homologous small proteins (< 120 residues) were taken from the PDB library. An average Cα-RMSD of 3.9 Å was obtained for the third benchmark, with seven cases having a Cα-RMSD < 2.5 Å.
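The Cα-RMSD values quoted above measure the average deviation between matched Cα atoms of a model and the native structure. The following is a minimal sketch of that calculation, assuming the two structures have already been optimally superposed (the superposition step itself is not shown); the function name `ca_rmsd` and the toy coordinates are hypothetical and are not taken from the I-TASSER code.

```python
import math


def ca_rmsd(model, native):
    """Return the RMSD (in the input units, e.g. Å) between two lists of
    (x, y, z) C-alpha coordinates.

    Assumption: the structures are already superposed and residue i in
    `model` corresponds to residue i in `native`.
    """
    if len(model) != len(native) or not model:
        raise ValueError("coordinate lists must be non-empty and of equal length")
    # Sum of squared coordinate differences over all atoms and axes.
    squared = sum(
        (a - b) ** 2
        for p, q in zip(model, native)
        for a, b in zip(p, q)
    )
    return math.sqrt(squared / len(model))


# Hypothetical three-residue example (coordinates in Å).
native = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0)]
model = [(0.0, 0.0, 0.0), (3.8, 3.0, 0.0), (7.6, 0.0, 4.0)]
rmsd = ca_rmsd(model, native)
print(f"C-alpha RMSD: {rmsd:.2f} Å")
```

A model would count among the "Cα-RMSD < 2.5 Å" cases reported above if this value, computed after superposition on the native structure, falls below 2.5.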
Our simulation results show that I-TASSER can consistently predict the correct folds and sometimes high-resolution models for small single-domain proteins. Compared with other ab initio modeling methods such as ROSETTA and TOUCHSTONE-II, the average performance of I-TASSER is either much better or similar at a lower computational cost. These data, together with the strong performance of the automated I-TASSER server (the Zhang-Server) in the 'free modeling' section of the recent Critical Assessment of Structure Prediction (CASP)7 experiment, demonstrate new progress in automated ab initio model generation. The I-TASSER server is freely available for academic users.
Template-based protein structure modeling is commonly used for protein structure prediction. Based on the observation that multiple template-based methods often perform better than single template-based methods, we further explore the use of a variable number of multiple templates for a given target in the latest variant of TASSER, TASSERVMT. We first develop an algorithm that improves the target-template alignment for a given template. The improved alignment, called the SP3 alternative alignment, is generated by a parametric alignment method coupled with short TASSER refinement on models selected using knowledge-based scores. The refined top model is then structurally aligned to the template to produce the SP3 alternative alignment. Templates identified using SP3 threading are combined with the SP3 alternative and HHSEARCH alignments to provide target alignments to each template. These template models are then grouped into sets containing a variable number of template/alignment combinations. For each set, we run short TASSER simulations to build full-length models. Then, the models from all sets of templates are pooled, and the top 20–50 models are selected using the FTCOM ranking method. These models are then subjected to a single longer TASSER refinement run for final prediction. We benchmarked our method by comparison with our previously developed approach, pro-sp3-TASSER, on a set with 874 Easy and 318 Hard targets. The average GDT-TS score improvements for the first model are 3.5% and 4.3% for Easy and Hard targets, respectively. When tested on the 112 CASP9 targets, our method improves the average GDT-TS scores as compared to pro-sp3-TASSER by 8.2% and 9.3% for the 80 Easy and 32 Hard targets, respectively. It also shows slightly better results than the top ranked CASP9 Zhang-Server, QUARK and HHpredA methods. The program is available for download at http://cssb.biology.gatech.edu/.
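The GDT-TS score used in this benchmark is conventionally the mean, over distance cutoffs of 1, 2, 4 and 8 Å, of the percentage of residues whose Cα atom lies within the cutoff of its native position. The sketch below illustrates that average under a simplifying assumption: it takes per-residue distances from one fixed superposition, whereas the full GDT procedure searches over superpositions to maximize the count at each cutoff. The function name `gdt_ts` and the distances are hypothetical.

```python
def gdt_ts(distances, cutoffs=(1.0, 2.0, 4.0, 8.0)):
    """Approximate GDT-TS (0-100) from per-residue C-alpha distances in Å.

    Assumption: `distances` come from a single, fixed superposition of the
    model on the native structure; the real GDT algorithm optimizes the
    superposition separately for each cutoff.
    """
    if not distances:
        raise ValueError("at least one distance is required")
    n = len(distances)
    # Fraction of residues within each cutoff.
    fractions = [sum(1 for d in distances if d <= c) / n for c in cutoffs]
    return 100.0 * sum(fractions) / len(cutoffs)


# Hypothetical five-residue model.
example_distances = [0.5, 1.5, 3.0, 6.0, 10.0]
score = gdt_ts(example_distances)
print(f"GDT-TS: {score:.1f}")
```

In this toy case 1, 2, 3 and 4 of the five residues fall within the 1, 2, 4 and 8 Å cutoffs, so the four fractions average to one half.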
template-based modeling; threading; alignment; SP3; TASSER
The I-TASSER algorithm for protein 3D structure prediction was tested in CASP8, with the procedure fully automated in both the Server and Human sections. The quality of the server models is close to that of the human ones, but incorporating more diverse templates from other servers improves the results of human predictions in the distant homology category. For the first time, the sequence-based contact predictions from machine learning techniques are found helpful for both template-based modeling (TBM) and template-free modeling (FM). In TBM, although the average accuracy of the sequence-based contact predictions is lower than that of the template-based ones, the novel contacts in the sequence-based predictions, which are complementary to the threading templates in the weakly aligned or unaligned regions, are important for improving the global and local packing of these regions. Moreover, the newly developed atomic structural refinement algorithm was tested in CASP8 and found to improve the hydrogen-bonding networks and the overall TM-score, which is mainly due to its ability to remove steric clashes so that the models can be generated from cluster centroids. Nevertheless, one of the major issues of the I-TASSER pipeline is model selection, where the best models could not be appropriately recognized when the correct templates were detected by only a minority of the threading algorithms. There are also problems related to domain splitting and mirror image recognition, which mainly influence the performance of I-TASSER modeling in the FM-based structure predictions.
Protein structure prediction; threading; I-TASSER; CASP8; contact prediction; free modeling
Prediction of 3-dimensional protein structures from amino acid sequences represents one of the most important problems in computational structural biology. The community-wide Critical Assessment of Structure Prediction (CASP) experiments have been designed to obtain an objective assessment of the state-of-the-art of the field, where I-TASSER was ranked as the best method in the server section of the recent 7th CASP experiment. Our laboratory has since then received numerous requests about the public availability of the I-TASSER algorithm and the usage of the I-TASSER predictions.
An on-line version of I-TASSER is developed at the KU Center for Bioinformatics which has generated protein structure predictions for thousands of modeling requests from more than 35 countries. A scoring function (C-score) based on the relative clustering structural density and the consensus significance score of multiple threading templates is introduced to estimate the accuracy of the I-TASSER predictions. A large-scale benchmark test demonstrates a strong correlation between the C-score and the TM-score (a structural similarity measurement with values in [0, 1]) of the first models with a correlation coefficient of 0.91. Using a C-score cutoff > -1.5 for the models of correct topology, both false positive and false negative rates are below 0.1. Combining C-score and protein length, the accuracy of the I-TASSER models can be predicted with an average error of 0.08 for TM-score and 2 Å for RMSD.
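The two quantities compared in this benchmark can be illustrated with a short sketch. The first function uses the standard TM-score formula, TM = (1/L) Σ 1/(1 + (d_i/d0)²) with d0 = 1.24·(L − 15)^(1/3) − 1.8, evaluated for one fixed superposition. The second applies the C-score cutoff of −1.5 from the text to label models and counts false positives and false negatives. Several details are assumptions rather than statements from the abstract: a TM-score above 0.5 is taken to mean "correct topology", the rates are normalized by the numbers of actual negatives and positives, the 0.5 Å floor on d0 is only a guard for short chains, and all names and toy values are hypothetical.

```python
def tm_score(distances, target_length):
    """TM-score for a fixed superposition.

    distances: Å between aligned residue pairs; target_length: length L of
    the target protein. Assumption: the superposition is already given
    (the full method searches for the superposition maximizing the score).
    """
    if target_length <= 15:
        raise ValueError("d0 is undefined for target_length <= 15")
    d0 = max(1.24 * (target_length - 15) ** (1.0 / 3.0) - 1.8, 0.5)  # floor is a guard
    return sum(1.0 / (1.0 + (d / d0) ** 2) for d in distances) / target_length


def error_rates(models, c_cutoff=-1.5, tm_cutoff=0.5):
    """Return (false_positive_rate, false_negative_rate).

    models: list of (c_score, tm_score) pairs. A model is predicted correct
    when c_score > c_cutoff and is actually correct when tm_score > tm_cutoff
    (the 0.5 threshold is an assumption of this sketch).
    """
    negatives = [m for m in models if m[1] <= tm_cutoff]
    positives = [m for m in models if m[1] > tm_cutoff]
    if not negatives or not positives:
        raise ValueError("need at least one correct and one incorrect model")
    false_pos = sum(1 for c, _ in negatives if c > c_cutoff)
    false_neg = sum(1 for c, _ in positives if c <= c_cutoff)
    return false_pos / len(negatives), false_neg / len(positives)


# Hypothetical (C-score, TM-score) pairs for six first models.
toy_models = [(0.8, 0.85), (-0.5, 0.62), (-1.2, 0.45),
              (-2.0, 0.55), (-2.5, 0.30), (-3.1, 0.25)]
fp_rate, fn_rate = error_rates(toy_models)
print(f"false positive rate {fp_rate:.2f}, false negative rate {fn_rate:.2f}")
```

With real benchmark data, sweeping `c_cutoff` over a range of values would show how the two error rates trade off against each other around the cutoff reported above.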
The I-TASSER server has been developed to generate automated full-length 3D protein structural predictions where the benchmarked scoring system helps users to obtain quantitative assessments of the I-TASSER models. The output of the I-TASSER server for each query includes up to five full-length models, the confidence score, the estimated TM-score and RMSD, and the standard deviation of the estimations. The I-TASSER server is freely available to the academic community at .
We developed BSP-SLIM, a new method for ligand-protein blind docking using low-resolution protein structures. For a given sequence, protein structures are first predicted by I-TASSER; putative ligand binding sites are transferred from holo-template structures which are analogous to the I-TASSER models; ligand-protein docking conformations are then constructed by shape and chemical match of the ligand with the negative image of the binding pockets. BSP-SLIM was tested on 71 ligand-protein complexes from the Astex diverse set, where the protein structures were predicted by I-TASSER with an average RMSD of 2.92 Å on the binding residues. Using I-TASSER models, the median ligand RMSD of BSP-SLIM docking is 3.99 Å, which is 5.94 Å lower than that by AutoDock; the median binding-site error by BSP-SLIM is 1.77 Å, which is 6.23 Å lower than that by AutoDock and 3.43 Å lower than that by LIGSITECSC. Compared to the models using crystal protein structures, the median ligand RMSD by BSP-SLIM using I-TASSER models increases by 0.87 Å, while that by AutoDock increases by 8.41 Å; the median binding-site error by BSP-SLIM increases by 0.69 Å, while that by AutoDock and LIGSITECSC increases by 7.31 Å and 1.41 Å, respectively. As case studies, BSP-SLIM was used in virtual screening for six target proteins, which prioritized 25% and 50% of the actives in the top 9.2% and 17% of the library on average, respectively. These results demonstrate the usefulness of the template-based coarse-grained algorithms in low-resolution ligand-protein docking and drug screening. An on-line BSP-SLIM server is freely available at http://zhanglab.ccmb.med.umich.edu/BSP-SLIM.
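The virtual-screening result above is expressed as the fraction of the ranked library that must be examined to recover a given fraction of the known actives. A minimal sketch of that bookkeeping follows; the function name `library_fraction_for_recall` and the 20-compound ranking are hypothetical and do not reproduce the BSP-SLIM data.

```python
import math


def library_fraction_for_recall(ranked_is_active, recall):
    """Fraction of a ranked library needed to recover `recall` of the actives.

    ranked_is_active: list of booleans ordered from best to worst docking
    score (True marks a known active). recall: target fraction, 0 < recall <= 1.
    """
    if not 0.0 < recall <= 1.0:
        raise ValueError("recall must be in (0, 1]")
    total_actives = sum(ranked_is_active)
    if total_actives == 0:
        raise ValueError("the library contains no actives")
    needed = math.ceil(recall * total_actives)
    found = 0
    for rank, is_active in enumerate(ranked_is_active, start=1):
        found += bool(is_active)
        if found >= needed:
            return rank / len(ranked_is_active)
    raise AssertionError("unreachable: needed never exceeds total_actives")


# Hypothetical library of 20 compounds with actives at ranks 2, 5, 11 and 18.
ranking = [rank in (2, 5, 11, 18) for rank in range(1, 21)]
frac_25 = library_fraction_for_recall(ranking, 0.25)
frac_50 = library_fraction_for_recall(ranking, 0.50)
print(f"25% of actives in top {frac_25:.0%}, 50% in top {frac_50:.0%}")
```

Averaging these fractions over several targets would give figures of the same kind as the 9.2% and 17% reported in the abstract.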
Blind ligand-protein docking; protein structure prediction; low-resolution docking
Thioredoxin reductase 1 (TXNRD1) and thioredoxin interacting protein (TXNIP), also known as thioredoxin binding protein 2 or vitamin D3-upregulated protein 1, are key players in oxidative stress control. Thioredoxin (TRX) is one of the major components of the thiol reducing system and plays multiple roles in cellular processes. The expression of TXNRD1, TXNIP and TRX has not previously been analyzed computationally in relation to the prognosis of breast cancer. High expression of TXNRD1 and low expression of TXNIP are associated with a worse prognosis in breast cancer.
Using bioinformatics applications we studied sequence analysis, molecular modeling, template and fold recognition, docking and scoring of thioredoxin as a target.
The resulting model was validated based on the templates from the I-TASSER server, and binding site residues were predicted. The predicted model was used for threading and fold recognition and was optimized using GROMACS. The generated model was validated using PROCHECK, Ramachandran plot analysis, Verify-3D and the Errat value from the SAVES server, and the results show that the model is reliable. Next, we obtained small molecules from PubChem and ChemBank, which are databases for selecting suitable ligands for our modeled target. These molecules were screened by docking with GOLD, using ChemScore as the scoring function.
This study predicted the interactions between the energy-minimized protein model and the four top-scoring ligands among about 500 molecules screened. These were 3-hydroxy-2,3-diphenylbutanoic acid, 4-amino-3-pentadecylphenol, 3-(hydroxyimino)-2,4-diphenylbutanenitrile and 2-ethyl-1,2-diphenylbutyl carbamate, which are proposed as possible hit molecules for the drug discovery and development process.
Breast cancer; Chemotherapy; Sequence analysis; Thioredoxins
We develop and test a new pipeline in CASP10 to predict protein structures based on an interplay of I-TASSER and QUARK for both free-modeling (FM) and template-based modeling (TBM) targets. The most noteworthy observation is that sorting through the threading template pool using the QUARK-based ab initio models as probes allows the detection of distant-homology templates which might be ignored by the traditional sequence profile-based threading alignment algorithms. Further template assembly refinement by I-TASSER resulted in successful folding of two medium-sized FM targets with >150 residues. For TBM, the multiple threading alignments from LOMETS are, for the first time, incorporated into the ab initio QUARK simulations, which were further refined by I-TASSER assembly refinement. Compared with the traditional threading assembly refinement procedures, the inclusion of the threading-constrained ab initio folding models can consistently improve the quality of the full-length models as assessed by the GDT-HA and hydrogen-bonding scores. Despite the success, significant challenges still exist in domain boundary prediction and consistent folding of medium-size proteins (especially beta-proteins) for nonhomologous targets. Further developments of sensitive fold-recognition and ab initio folding methods are critical for solving these problems.
protein structure prediction; CASP10; threading; ab initio folding; I-TASSER; QUARK
Staphylococcus aureus is a food-borne pathogen and the most common cause of infections in hospitalized patients. The increase in the resistance of this pathogen to antibacterials has made necessary the development of new anti-staphylococcal agents. In this context, bacteriophage lytic enzymes such as endolysins and structural peptidoglycan (PG) hydrolases have received considerable attention as possible antimicrobials against gram-positive bacteria.
S. aureus bacteriophage vB_SauS-phiIPLA88 (phiIPLA88) contains a virion-associated muralytic enzyme (HydH5) encoded by orf58, which is located in the morphogenetic module. Comparative bioinformatic analysis revealed that HydH5 significantly resembled other peptidoglycan hydrolases encoded by staphylococcal phages. The protein consists of 634 amino acid residues. Two putative lytic domains were identified: an N-terminal CHAP (cysteine, histidine-dependent amidohydrolase/peptidase) domain (135 amino acid residues), and a C-terminal LYZ2 (lysozyme subfamily 2) domain (147 amino acid residues). These domains were also found when a predicted three-dimensional structure of HydH5 was made which provided the basis for deletion analysis. The complete HydH5 protein and truncated proteins containing only each catalytic domain were overproduced in E. coli and purified from inclusion bodies by subsequent refolding. Truncated and full-length HydH5 proteins were all able to bind and lyse S. aureus Sa9 cells as shown by binding assays, zymogram analyses and CFU reduction analysis. HydH5 demonstrated high antibiotic activity against early exponential cells, at 45°C and in the absence of divalent cations (Ca2+, Mg2+, Mn2+). Thermostability assays showed that HydH5 retained 72% of its activity after 5 min at 100°C.
The virion-associated PG hydrolase HydH5 has lytic activity against S. aureus, which makes it attractive as an antimicrobial for food biopreservation and anti-staphylococcal therapy.
I-TASSER is an automated pipeline for protein tertiary structure prediction using multiple threading alignments and iterative structure assembly simulations. In CASP9 experiments, two new algorithms, QUARK and FG-MD, were added to the I-TASSER pipeline for improving the structural modeling accuracy. QUARK is a de novo structure prediction algorithm used for structure modeling of proteins that lack detectable template structures. For distantly homologous targets, QUARK models are found useful as a reference structure for selecting good threading alignments and guiding the I-TASSER structure assembly simulations. FG-MD is an atomic-level structural refinement program that uses structural fragments collected from the PDB structures to guide molecular dynamics simulation and improve the local structure of predicted model, including hydrogen-bonding networks, torsion angles and steric clashes. Despite considerable progress in both the template-based and template-free structure modeling, significant improvements on protein target classification, domain parsing, model selection, and ab initio folding of beta-proteins are still needed to further improve the I-TASSER pipeline.
protein structure prediction; threading; contact prediction; ab initio folding; CASP
Expresso is a multiple sequence alignment server that aligns sequences using structural information. The user only needs to provide sequences. The server runs BLAST to identify close homologues of the sequences within the PDB database. These PDB structures are used as templates to guide the alignment of the original sequences using structure-based sequence alignment methods like SAP or Fugue. The final result is a multiple sequence alignment of the original sequences based on the structural information of the templates. An advanced mode makes it possible to either upload private structures or specify which PDB templates should be used to model each sequence. Provided that suitable structural information is available, Expresso delivers sequence alignments with accuracy comparable with structure-based alignments. The server is available on .
In a variety of threading methods, often poorly ranked (low z-score) templates have good alignments. Here, a new method, TASSER_low-zsc that identifies these low z-score ranked templates to improve protein structure prediction accuracy is described. The approach consists of clustering of threading templates by affinity propagation on the basis of structural similarity (thread_cluster) followed by TASSER modeling, with final models selected using a TASSER_QA variant. To establish generality of the approach, templates provided by two threading methods, SP3 and SPARKS2, are examined. The SP3 and SPARKS2 benchmark datasets consist of 351 and 357 medium/hard proteins (those with moderate to poor quality templates and/or alignments) of length ≤ 250 residues respectively. For SP3 medium and hard targets, using thread_cluster, the TM-scores of the best template improve by ~4% and ~9% over the original set (without low z-score templates) respectively; after TASSER modeling/refinement and ranking, the best model improves by ~7% and ~9% over the best model generated with the original template set. Moreover, TASSER_low-zsc generates 22% (43%) more foldable medium (hard) targets. Similar improvements are observed with low ranked templates from SPARKS2. The template clustering approach could be applied to other modeling methods that utilize multiple templates to improve structure prediction.
Structure prediction; threading; TASSER; tertiary structure
During the past few years, a significant rise in aspergillosis caused by the filamentous fungus Aspergillus fumigatus has been recorded, particularly in immunocompromised patients. At present, only a limited number of antifungal agents are available to combat these infections, and the situation has become more complex due to the emergence of antifungal resistance and the side effects of antifungal drugs. These circumstances have increased the demand for novel drug targets. Recent studies have revealed that β-1,3-endoglucanase (ENGL1) plays an essential role in cell wall remodeling, which is absolutely required during growth and morphogenesis of filamentous fungi, and is thus a promising target for the development of antifungal agents. Unfortunately, no structural information on fungal β-glucanases is yet available in the Protein Data Bank (PDB). Therefore, in the present study, the 3D structure of β-(1,3)-endoglucanase (ENGL1) was modeled using the I-TASSER server and validated with PROCHECK and VERIFY 3D. The best model was selected, energy minimized and used to analyze the structure-function relationship with the substrate β-(1,3)-glucan by C-DOCKER (Accelrys DS 2.0). The results indicated that the amino acids GLU 380, GLN 383, ASP 384, TYR 395, SER 712 and ARG 713 of the β-1,3-endoglucanase receptor are of core importance for binding and form strong hydrogen bond interactions with β-(1,3)-glucan. The predicted model and docking studies permit initial inferences about the unexplored 3D structure of β-(1,3)-endoglucanase and may promote the rational design of molecules for structure-function studies.
Homology modeling; β-(1,3)-endoglucanase; Aspergillus fumigatus; Docking; β-(1,3)-glucan
The I-TASSER server is an integrated platform for automated protein structure and function prediction based on the sequence-to-structure-to-function paradigm. Starting from an amino acid sequence, I-TASSER first generates three-dimensional atomic models from multiple threading alignments and iterative structural assembly simulations. The function of the protein is then inferred by structurally matching the 3D models with other known proteins. The output from a typical server run contains full-length secondary and tertiary structure predictions, and functional annotations on ligand-binding sites, Enzyme Commission numbers and Gene Ontology terms. An estimate of accuracy of the predictions is provided based on the confidence score of the modeling. This protocol provides new insights and guidelines for designing of on-line server systems for the state-of-the-art protein structure and function predictions. The server is available at http://zhang.bioinformatics.ku.edu/I-TASSER.
I-TASSER; protein structure prediction; protein function prediction
Osteopontin (Eta, secreted sialoprotein 1, opn) is secreted from different cell types including cancer cells. Three splice variant forms, namely osteopontin-a, osteopontin-b and osteopontin-c, have been identified. The most striking feature is that osteopontin-c is found to be elevated in almost all types of cancer cells. This was the main reason for selecting it for sequence analysis and structure prediction, which offer ample opportunities for prognostic, therapeutic and preventive cancer research.
The osteopontin-c gene sequence was determined from a breast cancer sample and was translated to a protein sequence. It was then analyzed using various software and web tools for binding pockets, docking and druggability analysis. Due to the lack of homologous templates, the tertiary structure was predicted using the ab initio method server I-TASSER and was evaluated after refinement using web tools. The refined structure was compared with the known electron microscopic structure of bone sialoprotein and docked with CD44 for binding analysis, and binding pockets were identified for drug design.
A signal sequence of about sixteen amino acid residues was identified using signal sequence prediction servers. Due to the absence of known structures of similar proteins, the three-dimensional structure of osteopontin-c was predicted using the I-TASSER server. The predicted structure was refined with the help of the SUMMA server and was validated using the SAVES server. Molecular dynamics analysis was carried out using GROMACS software. The final model was built and was used for docking with CD44. Druggable pockets were identified using pocket energies.
The tertiary structure of osteopontin-c was predicted successfully using the ab initio method, and the predictions showed that osteopontin-c is of a fibrous nature comparable to fibronectin. Docking studies showed significant similarities in the role of the QSAET motif in the interaction with CD44 between the normal and splice variant forms of osteopontin, and binding pocket analyses revealed several pockets, which led to the identification of a druggable pocket.
Methylobacteria are ubiquitous in the biosphere and are capable of growing on C1 compounds such as formate, formaldehyde, methanol and methylamine, as well as on a wide range of multi-carbon growth substrates such as C2, C3 and C4 compounds, owing to the methylotrophic enzyme methanol dehydrogenase (MDH). MDH performs these functions with the help of a key protein, mxaF. Unfortunately, detailed structural analysis and homology modeling of mxaF have not yet been reported. Hence, the objective of this research is the characterization and three-dimensional modeling of the mxaF protein from three different methylotrophs using the I-TASSER server. The predicted models were further optimized and validated with the Profile 3D, Errat, Verify3D and PROCHECK servers. The best-evaluated predicted models have been deposited in the PMDB database with PMDB IDs PM0077505, PM0077506 and PM0077507. Active site identification revealed 11, 13 and 14 putative functional site residues in the respective models. These residues may play a major role in protein-protein and protein-cofactor interactions. This study provides detailed ab initio information for understanding the structure, mechanism of action and regulation of the mxaF protein.
Methylobacteria; mxaF protein; homology modeling; functional site
Protein tertiary structure prediction is a fundamental problem in computational biology and identifying the most native-like model from a set of predicted models is a key sub-problem. Consensus methods work well when the redundant models in the set are the most native-like, but fail when the most native-like model is unique. In contrast, structure-based methods score models independently and can be applied to model sets of any size and redundancy level. Additionally, structure-based methods have a variety of important applications including analogous fold recognition, refinement of sequence-structure alignments, and de novo prediction. The purpose of this work was to develop a structure-based model selection method based on predicted structural features that could be applied successfully to any set of models.
Here we introduce SELECTpro, a novel structure-based model selection method derived from an energy function comprising physical, statistical, and predicted structural terms. Novel and unique energy terms include predicted secondary structure, predicted solvent accessibility, predicted contact map, β-strand pairing, and side-chain hydrogen bonding.
SELECTpro participated in the new model quality assessment (QA) category in CASP7, submitting predictions for all 95 targets and achieved top results. The average difference in GDT-TS between models ranked first by SELECTpro and the most native-like model was 5.07. This GDT-TS difference was less than 1% of the GDT-TS of the most native-like model for 18 targets, and less than 10% for 66 targets. SELECTpro also ranked the single most native-like first for 15 targets, in the top five for 39 targets, and in the top ten for 53 targets, more often than any other method. Because the ranking metric is skewed by model redundancy and ignores poor models with a better ranking than the most native-like model, the BLUNDER metric is introduced to overcome these limitations. SELECTpro is also evaluated on a recent benchmark set of 16 small proteins with large decoy sets of 12500 to 20000 models for each protein, where it outperforms the benchmarked method (I-TASSER).
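The evaluation above rests on two per-target quantities: the GDT-TS difference between the model ranked first and the most native-like model, and the rank assigned to the most native-like model. The sketch below computes both for one target. It assumes that a lower energy means a better predicted model; the function name `selection_summary` and the toy values are hypothetical and are not SELECTpro output, and the BLUNDER metric mentioned in the text is not implemented here because the abstract does not define it.

```python
def selection_summary(energies, gdt_scores):
    """Summarize model selection for one target.

    energies: predicted energy per model (assumption: lower is better).
    gdt_scores: true GDT-TS per model, same order.
    Returns (gdt_loss, relative_loss, rank_of_best), where rank 1 is the
    model ranked first by energy.
    """
    if len(energies) != len(gdt_scores) or not energies:
        raise ValueError("inputs must be non-empty and of equal length")
    order = sorted(range(len(energies)), key=lambda i: energies[i])
    top_index = order[0]
    best_index = max(range(len(gdt_scores)), key=lambda i: gdt_scores[i])
    gdt_loss = gdt_scores[best_index] - gdt_scores[top_index]
    relative_loss = gdt_loss / gdt_scores[best_index]
    rank_of_best = order.index(best_index) + 1
    return gdt_loss, relative_loss, rank_of_best


# Hypothetical decoy set of four models for a single target.
toy_energies = [-120.0, -135.5, -110.2, -128.9]
toy_gdt = [61.0, 58.5, 40.0, 63.5]
loss, rel_loss, best_rank = selection_summary(toy_energies, toy_gdt)
print(f"GDT-TS loss {loss:.1f} ({rel_loss:.1%}), best model ranked {best_rank}")
```

Averaging `gdt_loss` over all targets gives a figure comparable in kind to the 5.07 reported above, and counting targets with `rank_of_best` of 1, at most 5, or at most 10 gives the corresponding ranking tallies.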
SELECTpro is an effective model selection method that scores models independently and is appropriate for use on any model set. SELECTpro is available for download as a standalone application at: . SELECTpro is also available as a public server at the same site.
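The two evaluation quantities reported for SELECTpro above, the GDT-TS difference between the top-ranked model and the most native-like model, and the rank assigned to the most native-like model, can be illustrated with a minimal sketch. The function name `selection_loss`, the model names, and all numbers are hypothetical and chosen only for illustration. The sketch also assumes that a higher selector score means a better predicted model, which need not match the sign convention of an energy function.

```python
from typing import Dict, Tuple


def selection_loss(models: Dict[str, Tuple[float, float]]) -> Tuple[str, float, int]:
    """Evaluate one target's model set.

    `models` maps a model name to (selector_score, gdt_ts).
    Assumption: a higher selector score means a better predicted model.
    Returns (top-ranked model, GDT-TS loss relative to the most
    native-like model, 1-based rank given to the most native-like model).
    """
    # Rank models by selector score, best first.
    ranked = sorted(models, key=lambda name: models[name][0], reverse=True)
    # The most native-like model is the one with the highest GDT-TS.
    best = max(models, key=lambda name: models[name][1])
    top = ranked[0]
    loss = models[best][1] - models[top][1]
    return top, loss, ranked.index(best) + 1


# Made-up values for illustration only; not taken from CASP7.
example = {
    "model_a": (0.82, 61.0),
    "model_b": (0.79, 66.5),
    "model_c": (0.40, 35.0),
}

top, loss, rank_of_best = selection_loss(example)
print(top, loss, rank_of_best)
```

Averaging the loss over all targets would give a figure comparable in kind to the reported average GDT-TS difference, and counting targets with a rank of 1, at most 5, or at most 10 would give the ranking tallies.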
Although residue-residue contact maps dictate the topology of proteins, sequence-based ab initio contact predictions have found little use in actual structure prediction because of their low accuracy. We developed a composite set of nine SVM-based contact predictors that are used in I-TASSER simulations in combination with sparse template contact restraints. When the strategy was tested on 273 non-homologous targets, remarkable improvements in I-TASSER models were observed for both easy and hard targets, with P-values by Student's t-test below 0.00001 and 0.001, respectively. In several cases, the TM-score increases by >30%, which essentially converts “non-foldable” targets into “foldable” ones. In CASP9, I-TASSER employed ab initio contact predictions and generated models for 26 FM targets with a GDT-score 16% and 44% higher than the second and third best servers from other groups, respectively. These findings demonstrate a new avenue to improve the accuracy of protein structure prediction, especially for free-modeling targets.
protein structure prediction; ab initio folding; contact prediction; threading
No 3-D structure of a eukaryotic sialyltransferase (SiaT) has been determined so far. Sequence alignment algorithms such as BLAST and PSI-BLAST could not detect a homolog of these enzymes in the Protein Data Bank. SiaTs thus belong to the hard/medium target category in the CASP experiments. The objective of the current work is to model the 3-D structures of the human SiaTs that transfer sialic acid in α2,3-linkage, viz., ST3Gal I, II, III, IV, V, and VI, using fold-recognition and comparative modeling methods. The pair-wise sequence similarity among these six enzymes ranges from 41 to 63%.
Unlike the sequence similarity servers, fold-recognition servers identified CstII, an α2,3/8 dual-activity SiaT from Campylobacter jejuni, as the homolog of all six ST3Gals; the level of sequence similarity between CstII and the ST3Gals is only 15–20%, and the similarity is restricted to the well-characterized motif regions of the ST3Gals. Deriving template-target sequence alignments for the entire ST3Gal sequence was not straightforward: the fold-recognition servers could not find a template for the region preceding the L-motif or for the region between the L- and S-motifs. Multiple structural templates were identified to model these regions, and template identification, modeling, and evaluation had to be performed iteratively to choose the most appropriate templates. The modeled structures have acceptable stereochemical properties and also provide qualitative rationalizations for some of the site-directed mutagenesis results reported in the literature. Apart from the predicted models, an unexpected but valuable finding from this study is the sequence and structural relatedness of family GT42 and family GT29 SiaTs.
The modeled 3-D structures can be used for docking and other modeling studies and for the rational identification of residues to be mutated to impart desired properties such as altered stability, substrate specificity, etc. Several studies in the literature have focused on the development of tools and/or servers for the large-scale/automated modeling of 3-D structures of proteins. In contrast, the present study focuses on modeling the 3-D structure of a specific protein of interest to a biochemist and illustrates the associated difficulties. It also establishes a sequence/structure relationship between sialyltransferases of two distinct families.
G protein–coupled receptors (GPCRs), encoded by about 5% of human genes, comprise the largest family of integral membrane proteins and act as cell surface receptors responsible for the transduction of endogenous signals into cellular responses. Although tertiary structural information is crucial for function annotation and drug design, there are few experimentally determined GPCR structures. To address this issue, we employ the recently developed threading assembly refinement (TASSER) method to generate structure predictions for all 907 putative GPCRs in the human genome. Unlike traditional homology modeling approaches, TASSER modeling does not require solved homologous template structures; moreover, it often refines the structures closer to native. These features are essential for the comprehensive modeling of all human GPCRs when close homologous templates are absent. Based on a benchmarked confidence score, approximately 820 predicted models should have the correct folds. The majority of GPCR models share the characteristic seven-transmembrane helix topology, but 45 ORFs are predicted to have different structures. This is due to GPCR fragments that are predominantly from extracellular or intracellular domains as well as database annotation errors. Our preliminary validation includes the automated modeling of bovine rhodopsin, the only solved GPCR in the Protein Data Bank. With homologous templates excluded, the final model built by TASSER has a global Cα root-mean-squared deviation from native of 4.6 Å, with a root-mean-squared deviation in the transmembrane helix region of 2.1 Å. Models of several representative GPCRs are compared with mutagenesis and affinity labeling data, and consistent agreement is demonstrated. Structure clustering of the predicted models shows that GPCRs with similar structures tend to belong to a similar functional class even when their sequences are diverse.
These results demonstrate the usefulness and robustness of the in silico models for GPCR functional analysis. All predicted GPCR models are freely available for noncommercial users on our Web site (http://www.bioinformatics.buffalo.edu/GPCR).
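The global and transmembrane-region Cα root-mean-squared deviations quoted above follow the standard RMSD definition, the square root of the mean squared distance between corresponding atoms. The sketch below is a minimal illustration of that definition, not the procedure used in the study: it assumes the two coordinate sets are already optimally superposed (no fitting step such as the Kabsch algorithm is performed), and the function name `ca_rmsd`, the `residues` parameter, and the coordinates are hypothetical.

```python
import math
from typing import Optional, Sequence, Tuple

Point = Tuple[float, float, float]


def ca_rmsd(model: Sequence[Point], native: Sequence[Point],
            residues: Optional[Sequence[int]] = None) -> float:
    """RMSD in the input units between matched C-alpha coordinates.

    Assumption: `model` and `native` are already superposed and have a
    one-to-one residue correspondence. `residues` optionally restricts
    the calculation to a subset of 0-based residue indices, for example
    a transmembrane helix region.
    """
    if len(model) != len(native):
        raise ValueError("coordinate lists must have the same length")
    indices = range(len(model)) if residues is None else residues
    squared = [
        sum((m - n) ** 2 for m, n in zip(model[i], native[i]))
        for i in indices
    ]
    if not squared:
        raise ValueError("no residues selected")
    return math.sqrt(sum(squared) / len(squared))


# Made-up coordinates (in angstroms) for illustration only.
native_ca = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0), (11.4, 0.0, 0.0)]
model_ca = [(0.0, 0.0, 1.0), (3.8, 0.0, 1.0), (7.6, 0.0, 1.0), (11.4, 0.0, 3.0)]

print(ca_rmsd(model_ca, native_ca))                      # all residues
print(ca_rmsd(model_ca, native_ca, residues=[0, 1, 2]))  # subset only
```

Restricting the index set is one way a regional value can be smaller than the global one, as with the two figures reported for bovine rhodopsin.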
G protein–coupled receptors (GPCRs) are a large superfamily of integral membrane proteins that transduce signals across the cell membrane. Because of the breadth and importance of the physiological roles undertaken by the GPCR family, many of its members are important pharmacological targets. Although knowledge of a protein's native structure can provide important insight into its function and into the design of new drugs, the experimental determination of the three-dimensional structure of GPCR membrane proteins has proved to be very difficult. This is demonstrated by the fact that there is only one solved GPCR structure (from bovine rhodopsin) deposited in the Protein Data Bank library, and there are no human GPCR structures in the Protein Data Bank. To address the need for the tertiary structures of human GPCRs, using just sequence information, the authors use a newly developed threading-assembly-refinement method to generate models for all 907 registered GPCRs in the human genome. About 820 GPCRs are anticipated to have correct topology and transmembrane helix arrangement. A subset of the resulting models is validated by comparison with mutagenesis experimental data, and consistent agreement is demonstrated.
The world population is aging more each year. Superoxide dismutase 2 (Mn-SOD or SOD2) protects against oxidative stress, a main factor influencing cellular longevity. Polymorphisms in SOD2 have been associated with the development of neurodegenerative diseases, such as Alzheimer’s and Parkinson’s disease, as well as psychiatric disorders, such as schizophrenia, depression and bipolar disorder. In this study, all of the described natural variants (S10I, A16V, E66V, G76R, I82T and R156W) of SOD2 were subjected to in silico analysis using eight different algorithms: SNPeffect, PolyPhen-2, PhD-SNP, PMUT, SIFT, SNAP, SNPs&GO and nsSNPAnalyzer. This analysis revealed disparate results for a few of the algorithms. The results showed that, according to at least one algorithm, each amino acid substitution appears to harmfully affect the protein. Theoretical structural models were created for the variants through comparative modelling, performed using the MHOLline server (which includes MODELLER and PROCHECK), and ab initio modelling, using the I-TASSER server. The predicted models were evaluated using TM-align, and the results show that the models were constructed with high accuracy. The RMSD values of the modelled mutants indicated likely pathogenicity for all missense mutations. Structural phylogenetic analysis using ConSurf revealed that human SOD2 is highly conserved. As a result, a human-curated database was generated that enables biologists and clinicians to explore SOD2 nsSNPs, including predictions of their effects and visualisation of the alignment of both the wild-type and mutant structures. The database is freely available at http://bioinfogroup.com/database and will be regularly updated.
This article introduces a new interface for T-Coffee, a consistency-based multiple sequence alignment program. This interface provides easy and intuitive access to the most popular functions of the package. These include the default T-Coffee mode for protein and nucleic acid sequences, the M-Coffee mode that allows combining the output of any other aligners, and template-based modes of T-Coffee that deliver high-accuracy alignments while using structural or homology-derived templates. The three available template modes are Expresso for the alignment of proteins with known 3D structures, R-Coffee to align RNA sequences with conserved secondary structures, and PSI-Coffee to accurately align distantly related sequences using homology extension. The new server benefits from recent improvements to the T-Coffee algorithm, can align up to 150 sequences of up to 10 000 residues each, and is available from both http://www.tcoffee.org and its main mirror http://tcoffee.crg.cat.