The Msh4–Msh5 protein complex in eukaryotes is involved in stabilizing Holliday junctions and their progenitors to facilitate crossing over during Meiosis I. These functions of the Msh4–Msh5 complex are essential for proper chromosomal segregation during the first meiotic division. The Msh4/5 proteins are homologous to the bacterial mismatch repair protein MutS and other MutS homologs (Msh2, Msh3, Msh6). Saccharomyces cerevisiae msh4/5 point mutants were identified recently that show a two-fold reduction in crossing over compared to wild type, without affecting chromosome segregation. Three distinct classes of msh4/5 point mutations could be sorted based on their meiotic phenotypes. These include msh4/5 mutations that have a) crossover and viability defects similar to msh4/5 null mutants; b) intermediate defects in crossing over and viability; and c) defects only in crossing over. The absence of a crystal structure for the Msh4–Msh5 complex has hindered an understanding of the structural aspects of Msh4–Msh5 function as well as a molecular explanation for the meiotic defects observed in msh4/5 mutants. To address this problem, we generated a structural model of the S. cerevisiae Msh4–Msh5 complex using homology modeling. Further, structural analysis combined with evolutionary information is used to predict sites with potentially critical roles in Msh4–Msh5 complex formation and DNA binding, and to explain asymmetry within the Msh4–Msh5 complex. We also provide a structural rationale for the meiotic defects observed in the msh4/5 point mutants. The mutations are likely to affect stability of the Msh4/5 proteins and/or interactions with DNA. The Msh4–Msh5 model will facilitate the design and interpretation of new mutational data as well as structural studies of this important complex involved in meiotic chromosome segregation.
Formylglycinamide ribonucleotide amidotransferase (FGAR-AT) is a 140 kDa bi-functional enzyme involved in a coupled reaction, where the glutaminase active site produces ammonia that is subsequently utilized to convert FGAR to its corresponding amidine in an ATP-assisted fashion. The structure of FGAR-AT has been previously determined in an inactive state and the mechanism of activation remains largely unknown. In the current study, hydrophobic cavities were used as markers to identify regions involved in domain movements that facilitate catalytic coupling and subsequent activation of the enzyme. Three internal hydrophobic cavities were located by xenon trapping experiments on FGAR-AT crystals, and these cavities were then perturbed via site-directed mutagenesis. Biophysical characterization of the mutants demonstrated that two of these three voids are crucial for stability and function of the protein, despite being ∼20 Å from the active centers. Interestingly, correlation analysis corroborated the experimental findings, and revealed that amino acids lining the functionally important cavities form correlated sets (co-evolving residues) that connect these regions to the amidotransferase active center. It was further proposed that the first cavity is transient and allows for breathing motion to occur and thereby serves as an allosteric hotspot. In contrast, the third cavity, which lacks correlated residues, was found to be highly plastic and accommodated steric congestion by local adjustment of the structure without affecting either stability or activity.
Antimicrobial peptides represent one of the most promising future strategies for combating infections and microbial drug resistance. Tritrpticin is a 13mer tryptophan-rich cationic antimicrobial peptide with a broad spectrum of activity whose application in antimicrobial therapy has been hampered by ambiguity about its biological target and consequently the molecular interactions necessary for its antimicrobial activity. The present study provides clues about the mechanism of action of tritrpticin by using a unique monoclonal antibody (mAb) as a ‘physiological’ structural scaffold. A pool of mAbs was generated against tritrpticin and, based on its high affinity and ability to bind tritrpticin analogs, mAb 6C6D7 was selected and characterized further. In a screening of phage-displayed random peptides, this antibody was able to identify a novel antimicrobial peptide with low sequence homology to tritrpticin, suggesting that the mAb possessed the physico-chemical characteristics mimicking the natural receptor. Subsequently, thermodynamics and molecular modeling identified a core group of hydrophobic residues in tritrpticin arranged in a distorted ‘S’-shaped conformation as critical for antibody binding. Comparison of the mAb-induced conformation with the micelle-bound structure of tritrpticin reveals how a common motif may be able to interact with multiple classes of biomolecules, thus extending the target range of this innate immune peptide. Based on the concurrence between thermodynamic and structural data, our results reveal a template that can be used to design novel antimicrobial pharmacophores while simultaneously demonstrating at a more fundamental level the potential of mAbs to act as receptor surrogates.
We highlight an unrecognized physiological role for the Greek key motif, an evolutionarily conserved super-secondary structural topology of the βγ-crystallins. These proteins constitute the bulk of the human eye lens, packed at very high concentrations in a compact, globular, short-range order, generating transparency. Congenital cataract (affecting 400,000 newborns yearly worldwide), associated with 54 mutations in βγ-crystallins, occurs in two major phenotypes: nuclear cataract, which blocks the central visual axis, hampering the development of the growing eye and demanding earliest intervention, and the milder peripheral progressive cataract, where surgery can wait. In order to understand this phenotypic dichotomy at the molecular level, we have studied the structural and aggregation features of representative mutations.
Wild type and several representative mutant proteins were cloned, expressed and purified and their secondary and tertiary structural details, as well as structural stability, were compared in solution, using spectroscopy. Their tendencies to aggregate in vitro and in cellulo were also compared. In addition, we analyzed their structural differences by molecular modeling in silico.
Based on their properties, mutants are seen to fall into two classes. Mutants A36P, L45PL54P, R140X, and G165fs display lowered solubility and structural stability, expose several buried residues to the surface, aggregate in vitro and in cellulo, and disturb/distort the Greek key motif; they are associated with nuclear cataract. In contrast, mutants P24T and R77S, associated with peripheral cataract, behave quite similarly to the wild type molecule, and do not affect the Greek key topology.
When a mutation distorts even one of the four Greek key motifs, the protein readily self-aggregates and precipitates, consistent with the phenotype of nuclear cataract, while mutations not affecting the motif display ‘native state aggregation’, leading to peripheral cataract, thus offering a protein structural rationale for the cataract phenotypic dichotomy: “distort motif, lose central vision”.
Protein structure alignment is a crucial step in protein structure–function analysis. Despite the advances in protein structure alignment algorithms, some of the local conformationally similar regions are mislabeled as structurally variable regions (SVRs). These regions are not well superimposed because of differences in their spatial orientations. The Database of Structural Alignments (DoSA) addresses this gap in identification of local structural similarities obscured in global protein structural alignments by realigning SVRs using an algorithm based on protein blocks. A set of protein blocks is a structural alphabet that abstracts protein structures into 16 unique local structural motifs. DoSA provides unique information about 159 780 conformationally similar and 56 140 conformationally dissimilar SVRs in 74 705 pairwise structural alignments of homologous proteins. The information provided on conformationally similar and dissimilar SVRs can be helpful to model loop regions. It is also conceivable that conformationally similar SVRs with conserved residues could potentially contribute toward functional integrity of homologues, and hence identifying such SVRs could be helpful in understanding the structural basis of protein function.
While phosphotyrosine modification is an established regulatory mechanism in eukaryotes, it is less well characterized in bacteria due to low prevalence. To gain insight into the extent and biological importance of tyrosine phosphorylation in Escherichia coli, we used immunoaffinity-based phosphotyrosine peptide enrichment combined with high resolution mass spectrometry analysis to comprehensively identify tyrosine phosphorylated proteins and accurately map phosphotyrosine sites. We identified a total of 512 unique phosphotyrosine sites on 342 proteins in E. coli K12 and the human pathogen enterohemorrhagic E. coli (EHEC) O157:H7, representing the largest phosphotyrosine proteome reported to date in bacteria. This large number of tyrosine phosphorylation sites allowed us to define five phosphotyrosine site motifs. Tyrosine phosphorylated proteins belong to various functional classes such as metabolism, gene expression and virulence. We demonstrate for the first time that proteins of a type III secretion system (T3SS), required for the attaching and effacing (A/E) lesion phenotype characteristic for intestinal colonization by certain EHEC strains, are tyrosine phosphorylated by bacterial kinases. Yet, A/E lesion and metabolic phenotypes were unaffected by the mutation of the two currently known tyrosine kinases, Etk and Wzc. Substantial residual tyrosine phosphorylation present in an etk wzc double mutant strongly indicated the presence of hitherto unknown tyrosine kinases in E. coli. We assess the functional importance of tyrosine phosphorylation and demonstrate that the phosphorylated tyrosine residue of the regulator SspA positively affects expression and secretion of T3SS proteins and formation of A/E lesions. Altogether, our study reveals that tyrosine phosphorylation in bacteria is more prevalent than previously recognized, and suggests the involvement of phosphotyrosine-mediated signaling in a broad range of cellular functions and virulence.
While phosphotyrosine modification is established in eukaryote cell signaling, it is less characterized in bacteria. Although deletion of bacterial tyrosine kinases is known to affect various cellular functions and virulence of bacterial pathogens, few phosphotyrosine proteins are currently known. To gain insight into the extent and biological function of tyrosine phosphorylation in E. coli, we carried out an in-depth phosphotyrosine protein profiling using a mass spectrometry-based proteomics approach. Our study on E. coli K12 and the human pathogen enterohemorrhagic E. coli (EHEC) O157:H7, which is a common cause of food-borne outbreaks of diarrhea, hemorrhagic colitis and hemolytic uremic syndrome, reveals that tyrosine phosphorylation is far more prevalent than previously recognized. Target proteins are involved in a broad range of cellular functions and virulence. Proteins of the type III secretion system (T3SS), required for the attaching and effacing (A/E) lesion phenotype characteristic for intestinal colonization by EHEC, are tyrosine phosphorylated. The expression of these T3SS proteins and A/E lesion formation are affected by a tyrosine phosphorylated residue on the regulator SspA. Also, our data indicate the presence of hitherto unknown E. coli tyrosine kinases. Overall, tyrosine phosphorylation seems to be involved in controlling cellular core processes and virulence of bacteria.
Coiled coils are well suited to drive subunit oligomerization and are widely used in applications ranging from basic research to medicine. The optimization of these applications requires a detailed understanding of the molecular determinants that control coiled-coil formation. Although many of these determinants have been identified and characterized in great detail, a puzzling observation is that their presence does not necessarily correlate with the oligomerization state of a given coiled-coil structure. Thus, other determinants must play a key role. To address this issue, we recently investigated the unrelated coiled-coil domains from GCN4, ATF1 and cortexillin-1 as model systems. We found that well-known trimer-specific oligomerization-state determinants, such as the distribution of isoleucine residues at heptad-repeat core positions or the trimerization motif Arg-h-x-x-h-Glu (where h = hydrophobic amino acid; x = any amino acid), switch the peptide’s topology from a dimer to a trimer only when inserted into the trigger sequence, a site indispensable for coiled-coil formation. Because high-resolution structural information could not be obtained for the full-length, three-stranded cortexillin-1 coiled coil, we here report the detailed biophysical and structural characterization of a shorter variant spanning the trigger sequence using circular dichroism, analytical ultracentrifugation and X-ray crystallography. We show that the peptide forms a stable α-helical trimer in solution. We further determined the crystal structure of an optimised variant at a resolution of 1.65 Å, revealing that the peptide folds into a parallel, three-stranded coiled coil. The two complemented R-IxxIE trimerization motifs and the additional hydrophobic core isoleucine residue adopt the conformations seen in other extensively characterized parallel, three-stranded coiled coils.
These findings not only confirm the structural basis for the switch in oligomerization state from a dimer to a trimer observed for the full-length cortexillin-1 coiled-coil domain, but also provide further evidence for a general link between oligomerization-state specificity and trigger-sequence function.
3′,5′-cyclic adenosine monophosphate (cAMP) dependent protein kinase or protein kinase A (PKA) has served as a prototype for the large family of protein kinases that are crucially important for signal transduction in eukaryotic cells. The PKA catalytic subunits Cα and Cβ, encoded by the two genes PRKACA and PRKACB, respectively, are among the best understood and characterized human kinases. Here we have studied the evolution of this gene family in chordates, arthropods, mollusks and other animals employing probabilistic methods and show that Cα and Cβ arose by duplication of an ancestral PKA catalytic subunit in a common ancestor of vertebrates. The two genes have subsequently been duplicated in teleost fishes. The evolution of the PRKACG retroposon in simians was also investigated. Although the degree of sequence conservation in the PKA Cα/Cβ kinase family is exceptionally high, a small set of signature residues defining Cα and Cβ subfamilies were identified. These conserved residues might be important for functions that are unique to the Cα or Cβ clades. This study also provides a good example of a seemingly simple phylogenetic problem which, due to a very high degree of sequence conservation and corresponding weak phylogenetic signals, combined with problematic nonphylogenetic signals, is nontrivial for state-of-the-art probabilistic phylogenetic methods.
The precise prediction of protein domains, which are the structural, functional and evolutionary units of proteins, has been a research focus in recent years. Although many methods have been presented for predicting protein domains and boundaries, the accuracy of predictions could be improved.
In this study we present a novel approach, DomHR, which is an accurate predictor of protein domain boundaries based on a creative hinge region strategy. A hinge region was defined as a segment of amino acids that covers part of a domain region and a boundary region. We developed a strategy to construct profiles of domain-hinge-boundary (DHB) features generated by sequence-domain/hinge/boundary alignment against a database of known domain structures. The DHB features had three elements: normalized domain, hinge, and boundary probabilities. The DHB features were used as input to identify domain boundaries in a sequence. DomHR used a nonredundant dataset as the training set, the DHB and predicted shape string as features, and a conditional random field as the classification algorithm. In predicted hinge regions, a residue was determined to be a domain or a boundary according to a decision threshold. After decision thresholds were optimized, DomHR was evaluated by cross-validation, large-scale prediction, independent test and CASP (Critical Assessment of Techniques for Protein Structure Prediction) tests. All results confirmed that DomHR outperformed other well-established, publicly available domain boundary predictors for prediction accuracy.
DomHR is available at http://cal.tongji.edu.cn/domain/.
The presence of energetically less favourable cis peptides in protein structures has been observed to be strongly associated with their structural integrity and function. Inter-conversion between the cis and trans conformations also has an important role in the folding process. In this study, we analyse the extent of conservation of cis peptides among similar folds. We look at both the amino acid preferences and local structural changes associated with such variations.
Nearly 34% of the Xaa-Proline cis bonds are not conserved in structural relatives; Proline also has a high tendency to get replaced by another amino acid in the trans conformer. At both positions bounding the peptide bond, Glycine has a higher tendency to lose the cis conformation. The cis conformation of more than 30% of β turns of type VIb and IV is not found to be conserved in similar structures. A different view, using a Protein Block based description of backbone conformation, suggests that many of the local conformational changes are highly different from the general local structural variations observed among structurally similar proteins.
Changes between cis and trans conformations are found to be associated with the evolution of new functions facilitated by local structural changes. This is most frequent in enzymes where new catalytic activity emerges with local changes in the active site. Cis-trans changes are also seen to facilitate inter-domain and inter-protein interactions. As in the case of folding, cis-trans conversions have been used as an important driving factor in evolution.
folds; cis peptides; omega dihedral; cis-trans isomerization; structural alignment; structural alphabet; Protein Blocks; Protein Data Bank
The molybdenum cofactor (Moco) biosynthesis pathway is an evolutionarily conserved pathway found in almost all kingdoms of life, including in the pathogenic species Mycobacterium tuberculosis. This pathway comprises several novel reactions, including the initial formation of precursor Z from guanosine triphosphate (GTP), catalysed by two enzymes, MoaA and MoaC. Although Moco biosynthesis is well understood, the first step is still not clear. In M. tuberculosis H37Rv, three orthologous genes of MoaC have been annotated: moaC1 (Rv3111), moaC2 (Rv0864) and moaC3 (Rv3324c). Rv0864 (MoaC2) is a 17.5 kDa protein and is reported to be down-regulated ∼3-fold in the nutrient starvation model for Mycobacterium tuberculosis. The crystal structure of Moco-biosynthesis protein MoaC2 from Mycobacterium tuberculosis (2.20 Å resolution, space group P213) has been determined. Based on a comparative analysis of structures of homologous proteins, conserved residues were identified and are implicated in structural and functional roles. Molecular docking studies with probable ligands, carried out in order to identify its ligand, suggest pteridinebenzomonophosphate as the most likely ligand. A sequence-based interaction study identified MoaA1 as an interaction partner of MoaC2. A homology model of MoaA1 was then complexed with MoaC2, and the protein–protein interactions are also discussed.
Bcl-XL is a member of Bcl-2 family of proteins involved in the regulation of intrinsic pathway of apoptosis. Its overexpression in many human cancers makes it an important target for anti-cancer drugs. Bcl-XL interacts with the BH3 domain of several pro-apoptotic Bcl-2 partners. This helical bundle protein has a pronounced hydrophobic groove which acts as a binding region for the BH3 domains. Eight independent molecular dynamics simulations of the apo/holo forms of Bcl-XL were carried out to investigate the behavior of solvent-exposed hydrophobic groove. The simulations used either a twin-range cut-off or particle mesh Ewald (PME) scheme to treat long-range interactions. Destabilization of the BH3 domain-containing helix H2 was observed in all four twin-range cut-off simulations. Most of the other major helices remained stable. The unwinding of H2 can be related to the ability of Bcl-XL to bind diverse BH3 ligands. The loss of helical character can also be linked to the formation of homo- or hetero-dimers in Bcl-2 proteins. Several experimental studies have suggested that exposure of BH3 domain is a crucial event before they form dimers. Thus unwinding of H2 seems to be functionally very important. The four PME simulations, however, revealed a stable helix H2. It is possible that the H2 unfolding might occur in PME simulations at longer time scales. Hydrophobic residues in the hydrophobic groove are involved in stable interactions among themselves. The solvent accessible surface areas of bulky hydrophobic residues in the groove are significantly buried by the loop LB connecting the helix H2 and subsequent helix. These observations help to understand how the hydrophobic patch in Bcl-XL remains stable in the solvent-exposed state. We suggest that both the destabilization of helix H2 and the conformational heterogeneity of loop LB are important factors for binding of diverse ligands in the hydrophobic groove of Bcl-XL.
Development of sensitive sequence search procedures for the detection of distant relationships between proteins at superfamily/fold level is still a big challenge. The intermediate sequence search approach is the most frequently employed way of identifying remote homologues effectively. In this study, serine proteases of the prolyl oligopeptidase, rhomboid and subtilisin protein families were examined using plant serine proteases from two genomes, A. thaliana and O. sativa, as queries, together with 13 other families of unrelated folds, to identify the distant homologues which could not be obtained using PSI-BLAST.
We have proposed to start with multiple queries of classical serine protease members to identify remote homologues in families, using a rigorous approach like Cascade PSI-BLAST. We found that classical sequence-based approaches, like PSI-BLAST, showed very low sequence coverage in identifying plant serine proteases. The algorithm was applied on an enriched sequence database of homologous domains and we obtained an overall average coverage of 88% at family level and 77% at superfamily or fold level, along with a specificity of ∼100% and a Matthews correlation coefficient of 0.91. A similar approach was also implemented on 13 other protein families representing every structural class in the SCOP database. Further investigation with statistical tests, like jackknifing, helped us to better understand the influence of neighbouring protein families.
Our study suggests that employment of multiple queries of a family for the Cascade PSI-BLAST searches is useful for predicting distant relationships effectively even at superfamily level. We have proposed a generalized strategy to cover all the distant members of a particular family using multiple query sequences. Our findings reveal that prior selection of sequences as query and the presence of neighbouring families can be important for covering the search space effectively in minimal computational time. This study also provides an understanding of the ‘bridging’ role of related families.
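The coverage, specificity and Matthews correlation coefficient quoted above are all derived from a confusion matrix of search hits. The following is a minimal sketch of those standard formulas; the function names and the example counts are illustrative assumptions and are not taken from the study.

```python
import math


def coverage(tp: int, fn: int) -> float:
    """Fraction of true family members recovered (sensitivity / recall)."""
    return tp / (tp + fn)


def specificity(tn: int, fp: int) -> float:
    """Fraction of non-members correctly rejected."""
    return tn / (tn + fp)


def matthews_corrcoef(tp: int, tn: int, fp: int, fn: int) -> float:
    """Matthews correlation coefficient from confusion-matrix counts.

    Returns 0.0 when any marginal total is zero, a common convention
    for the otherwise undefined case.
    """
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        return 0.0
    return (tp * tn - fp * fn) / denom


# Hypothetical counts, chosen only to illustrate the calculation.
tp, fn, tn, fp = 88, 12, 900, 0
print(f"coverage    = {coverage(tp, fn):.2f}")
print(f"specificity = {specificity(tn, fp):.2f}")
print(f"MCC         = {matthews_corrcoef(tp, tn, fp, fn):.2f}")
```

Because the coefficient uses all four cells of the confusion matrix, it stays informative when non-members greatly outnumber members, which is the usual situation in database searches.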
Calmodulin (CaM) is a highly conserved eukaryotic protein that binds specifically to more than 100 target proteins in response to calcium (Ca2+) signal. CaM adopts a considerable degree of structural plasticity to accomplish this physiological role; however, the nature and extent of this plasticity remain to be fully understood. Here, we report the crystal structure of a novel trans conformation of ligand-free CaM where the relative disposition of two lobes of CaM is different, a conformation to-date not reported. While no major structural changes were observed in the independent N- and C-lobes as compared with previously reported structures of Ca2+/CaM, the central helix was tilted by ∼90° at Arg75. This is the first crystal structure of CaM to show a drastic conformational change in the central helix, and reveals one of several possible conformations of CaM to engage with its binding partner.
Scanning through genomes for potential transcription factor binding sites (TFBSs) is becoming increasingly important in this post-genomic era. The position weight matrix (PWM) is the standard representation of TFBSs utilized when scanning through sequences for potential binding sites. However, many transcription factor (TF) motifs are short and highly degenerate, and methods utilizing PWMs to scan for sites are plagued by false positives. Furthermore, many important TFs do not have well-characterized PWMs, making identification of potential binding sites even more difficult. One approach to the identification of sites for these TFs has been to use the 3D structure of the TF to predict the DNA structure around the TF and then to generate a PWM from the predicted 3D complex structure. However, this approach is dependent on the similarity of the predicted structure to the native structure. We introduce here a novel approach to identify TFBSs utilizing structure information that can be applied to TFs without characterized PWMs, as long as a 3D complex structure (TF/DNA) exists. This approach utilizes an energy function that is uniquely trained on each structure. Our approach leads to increased prediction accuracy and robustness compared with those using a more general energy function. The software is freely available upon request.
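For readers unfamiliar with the baseline that this abstract contrasts itself against, the sketch below shows conventional PWM scanning: a log-odds matrix is built from aligned sites and slid along a sequence. It is not the structure-based energy function introduced by the authors; the toy binding sites, the uniform background, the pseudocount and the threshold are all assumptions made for illustration.

```python
import math

BASES = "ACGT"


def build_pwm(sites, pseudocount=1.0, background=0.25):
    """Build a log-odds position weight matrix from equal-length aligned sites.

    Assumes every site contains only the characters A, C, G and T.
    """
    width = len(sites[0])
    pwm = []
    for i in range(width):
        counts = {b: pseudocount for b in BASES}
        for site in sites:
            counts[site[i]] += 1
        total = sum(counts.values())
        pwm.append({b: math.log2((counts[b] / total) / background) for b in BASES})
    return pwm


def scan(sequence, pwm, threshold):
    """Return (start, window, score) for every window scoring at or above threshold."""
    width = len(pwm)
    hits = []
    for start in range(len(sequence) - width + 1):
        window = sequence[start:start + width]
        score = sum(column[base] for column, base in zip(pwm, window))
        if score >= threshold:
            hits.append((start, window, score))
    return hits


# Hypothetical aligned binding sites and target sequence.
toy_sites = ["TATAAT", "TATAAT", "TATGAT", "TACAAT"]
toy_pwm = build_pwm(toy_sites)
for start, window, score in scan("GGTATAATCC", toy_pwm, threshold=5.0):
    print(start, window, round(score, 2))
```

With short, degenerate motifs the gap between true sites and background windows narrows, so any fixed threshold admits many spurious hits across a genome; this is the false-positive problem the abstract describes.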
Many of the most important functions in the cell are carried out by proteins organized in large molecular machines. Cryo-electron microscopy (cryo-EM) is increasingly being used to obtain low resolution density maps of these large assemblies. A new method, ATTRACT-EM, for the computational assembly of molecular assemblies from their components has been developed. Based on concepts from the protein-protein docking field, it utilizes cryo-EM density maps to assemble molecular subunits at near atomic detail, starting from millions of initial subunit configurations. The search efficiency was further enhanced by recombining partial solutions, the inclusion of symmetry information, and refinement using a molecular force field. The approach was tested on the GroES-GroEL system, using an experimental cryo-EM map at 23.5 Å resolution, and on several smaller complexes. Inclusion of experimental information on the symmetry of the systems and the application of a new gradient vector matching algorithm allowed the efficient identification of docked assemblies in close agreement with experiment. Application to the GroES-GroEL complex resulted in a top ranked model with a deviation of 4.6 Å (and a 2.8 Å model within the top 10) from the GroES-GroEL crystal structure, a significant improvement over existing methods.
Efforts to increase affinity in the design of new therapeutic molecules have tended to lead to greater lipophilicity, a factor that is generally agreed to be contributing to the low success rate of new drug candidates. Our aim is to provide a structural perspective to the study of lipophilic efficiency and to compare molecular interactions created over evolutionary time with those designed by humans. We show that natural complexes typically engage in more polar contacts than synthetic molecules bound to proteins. The synthetic molecules also have a higher proportion of unmatched heteroatoms at the interface than the natural sets. These observations suggest that there are lessons to be learnt from Nature, which could help us to improve the characteristics of man-made molecules. In particular, it is possible to increase the density of polar contacts without increasing lipophilicity and this is best achieved early in discovery while molecules remain relatively small.
The correlation of genetic distances between pairs of protein sequence alignments has been used to infer protein-protein interactions. It has been suggested that these correlations are based on the signal of co-evolution between interacting proteins. However, although mutations in different proteins associated with maintaining an interaction clearly occur (particularly in binding interfaces and neighbourhoods), many other factors contribute to correlated rates of sequence evolution. Proteins in the same genome are usually linked by shared evolutionary history and so it would be expected that there would be topological similarities in their phylogenetic trees, whether they are interacting or not. For this reason the underlying species tree is often corrected for. Moreover, factors such as expression level are known to affect evolutionary rates. However, it has been argued that the correlated rates of evolution used to predict protein interaction explicitly include shared evolutionary history; here we test this hypothesis.
In order to identify the evolutionary mechanisms giving rise to the correlations between interacting proteins, we use phylogenetic methods to distinguish similarities in tree topologies from similarities in genetic distances. We use a range of datasets of interacting and non-interacting proteins from Saccharomyces cerevisiae. We find that the signal of correlated evolution between interacting proteins is predominantly a result of shared evolutionary rates, rather than similarities in tree topology, independent of evolutionary divergence.
Since interacting proteins do not have tree topologies that are more similar than the control group of non-interacting proteins, it is likely that coevolution contributes little, if anything, to the observed correlations.
Co-evolution; Correlated evolution; Protein evolution; Phylogenetic; Protein-protein complexes; Protein-protein interactions
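The distance-correlation signal discussed in this abstract can be illustrated with a minimal sketch: the pairwise genetic distances of two protein families over the same species are flattened and compared with a Pearson correlation. The species labels, the distance values and the function names below are hypothetical, and the sketch omits the species-tree correction and the topology comparisons that the study uses.

```python
import math
from itertools import combinations


def flatten(distances, species):
    """List pairwise distances in a fixed order of species pairs."""
    return [distances[pair] for pair in combinations(species, 2)]


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant distances")
    return cov / math.sqrt(var_x * var_y)


species = ["A", "B", "C", "D"]

# Hypothetical pairwise distances for protein X over four species.
dist_x = {
    ("A", "B"): 0.1, ("A", "C"): 0.3, ("A", "D"): 0.5,
    ("B", "C"): 0.3, ("B", "D"): 0.5, ("C", "D"): 0.4,
}
# Protein Y evolves twice as fast on the same underlying tree.
dist_y = {pair: 2 * d for pair, d in dist_x.items()}

r = pearson(flatten(dist_x, species), flatten(dist_y, species))
print(f"distance correlation r = {r:.3f}")
```

In this toy case the two families share a tree and differ only by a rate factor, yet the correlation is maximal, which shows why a high distance correlation by itself cannot separate co-evolution from shared history and correlated rates.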
The ability to catalytically cleave protein substrates after synthesis is fundamental for all forms of life. Accordingly, site-specific proteolysis is one of the most important post-translational modifications. The key to understanding the physiological role of a protease is to identify its natural substrate(s). Knowledge of the substrate specificity of a protease can dramatically improve our ability to predict its target protein substrates, but this information must be utilized in an effective manner in order to efficiently identify protein substrates by in silico approaches. To address this problem, we present PROSPER, an integrated feature-based server for in silico identification of protease substrates and their cleavage sites for twenty-four different proteases. PROSPER utilizes established specificity information for these proteases (derived from the MEROPS database) with a machine learning approach to predict protease cleavage sites by using different, but complementary sequence and structure characteristics. Features used by PROSPER include local amino acid sequence profile, predicted secondary structure, solvent accessibility and predicted native disorder. Thus, for proteases with known amino acid specificity, PROSPER provides a convenient, pre-prepared tool for use in identifying protein substrates for the enzymes. Systematic prediction analysis for the twenty-four proteases thus far included in the database revealed that the features we have included in the tool strongly improve performance in terms of cleavage site prediction, as evidenced by their contribution to performance improvement in terms of identifying known cleavage sites in substrates for these enzymes. In comparison with two state-of-the-art prediction tools, PoPS and SitePrediction, PROSPER achieves greater accuracy and coverage. 
To our knowledge, PROSPER is the first comprehensive server capable of predicting cleavage sites of multiple proteases within a single substrate sequence using machine learning techniques. It is freely available at http://lightning.med.monash.edu.au/PROSPER/.
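As an illustration of the "local amino acid sequence profile" feature, the following is a minimal sketch of how candidate cleavage sites could be turned into fixed-length sequence windows before any classifier is applied. It is not PROSPER's implementation: the function name `cleavage_windows`, the flank of four residues on each side of the scissile bond (P4 to P4'), and the `-` padding character are all assumptions made for the example.

```python
def cleavage_windows(sequence, flank=4, pad="-"):
    """Yield (p1_position, window) for every peptide bond in `sequence`.

    p1_position is the 1-based index of the residue immediately before the
    bond (P1). The window holds `flank` residues on each side of the bond;
    positions that fall off either end of the sequence are filled with `pad`.
    The flank size and padding character are illustrative assumptions.
    """
    padded = pad * flank + sequence + pad * flank
    # Bond i lies between residues i and i + 1 (0-based) of the original sequence.
    for i in range(len(sequence) - 1):
        start = i + 1  # index in `padded` of the residue `flank - 1` before P1
        yield i + 1, padded[start:start + 2 * flank]


# Hypothetical toy peptide, used only to show the window layout.
windows = list(cleavage_windows("ACDEFGHIK"))
for position, window in windows:
    print(position, window)
```

Each window could then be encoded (for example, one-hot or with a position-specific scoring profile) and combined with predicted structural features; that encoding step is deliberately left out here.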
Hexachlorocyclohexane dehydrochlorinase (LinA) mediates dehydrochlorination of γ-HCH to 1,3,4,6-tetrachloro-1,4-cyclohexadiene, which constitutes the first step of the aerobic degradation pathway. We report the 3.5 Å crystal structure of a thermostable LinA-type2 protein, obtained from a soil metagenome, in the hexagonal space group P6322 with unit cell parameters a = b = 162.5 Å, c = 186.3 Å. The structure was solved by molecular replacement using the co-ordinates of LinA-type1, which exhibits mesophile-like properties. Structural comparison of the LinA-type2 and -type1 proteins suggests that the thermostability of LinA-type2 might arise partly from a higher number of ionic interactions, along with a 4% increase in the intersubunit buried surface area. Mutational analysis of the residues that differ between the -type1 and -type2 proteins, circular dichroism experiments and functional assays suggest that Q20 and G23 are determinants of stability for LinA-type2. It was earlier reported that LinA-type1 exhibits enantioselectivity for the (−) enantiomer of α-HCH. In contrast, we found that the -type2 protein prefers the (+) enantiomer of α-HCH. Structural analysis and molecular docking experiments suggest that the changed residues K20Q, L96C and A131G, located near the active site, are probably responsible for the altered enantioselectivity of LinA-type2. Overall, the study has identified features responsible for the thermostability and enantioselectivity of LinA-type2 that can be exploited for the design of variants for specific biotechnological applications.
Mammalian methionine adenosyltransferase II (MAT II) is the only hetero-oligomer in this family of enzymes that synthesize S-adenosylmethionine using methionine and ATP as substrates. Binding of regulatory β subunits to catalytic α2 dimers is known to increase the affinity for methionine, although little additional information about this interaction is available. This work reports the use of recombinant α2 and β subunits to produce oligomers showing kinetic parameters comparable to those of MAT II purified from several tissues. According to isothermal titration calorimetry data and densitometric scanning of the stained hetero-oligomer bands on denaturing gels, the composition of these oligomers is that of a hetero-trimer with α2 dimers associated with single β subunits. Additionally, the regulatory subunit is able to bind NADP+ with a 1∶1 stoichiometry, and the cofactor enhances the binding affinity of β for the α2 dimer. Mutants lacking residues involved in NADP+ binding and N-terminal truncations of the β subunit were able to oligomerize with α2 dimers, although the kinetic properties appeared altered. Together, these data suggest a role for both parts of the sequence in the regulatory role exerted by the β subunit on catalysis. Moreover, preparation of a structural model for the hetero-oligomer, using the available crystal data, allowed prediction of the regions involved in the β to α2-dimer interaction. Finally, the implications that the presence of different N-termini in the β subunit could have on MAT II behavior are discussed in light of the recent identification of several splicing forms of this subunit in hepatoma cells.
Interactions of non-structural protein 5A (NS5A) of Hepatitis C virus (HCV) with the human kinases casein kinase 1α (ck1α) and protein kinase R (PKR) have different functional implications, namely regulation of viral replication and evasion of the interferon-induced immune response, respectively. Understanding the structural and molecular basis of the interactions of the viral protein with these two human kinases can be useful in developing treatment strategies against HCV.
Serine 232 of NS5A is known to be phosphorylated by human ck1α. A structural model of an NS5A peptide containing the phosphoacceptor residue Serine 232 bound to ck1α has been generated using the known 3-D structures of kinase-peptide complexes. The substrate-interacting residues in ck1α have been identified from the model and are found to be well conserved in the ck1 family. The ck1α-substrate peptide complex has also been used to understand the structural basis of association between ck1α and its other viral-stress-induced substrate, the tumour suppressor p53 transactivation domain, for which a crystal structure is available.
Interaction of NS5A with another human kinase, PKR, is primarily genotype specific. NS5A from genotype 1b has been shown to interact with and inhibit PKR, whereas NS5A from genotypes 2a/3a is unable to bind and inhibit PKR efficiently. This is one of the main reasons for the varied response to interferon therapy in HCV patients across different genotypes. Using the PKR crystal structure, sequence alignment and evolutionary trace analysis, some of the critical residues responsible for the interaction of NS5A 1b with PKR have been identified.
The substrate-interacting residues in ck1α have been identified using the structural model of the kinase-substrate peptide complex. The PKR-interacting residues of NS5A 1b have also been predicted using the PKR crystal structure and NS5A sequence analysis along with known experimental results. The functional significance and nature of the interaction of the interferon sensitivity determining region and variable region 3 of NS5A in different genotypes with PKR, shown experimentally, are also supported by the findings of evolutionary trace analysis. Designing inhibitors to prevent this interaction could enable patients infected with HCV genotype 1 to respond well to interferon therapy.
Casein kinase 1α; Hepatitis C virus; Interferon therapy; Kinase-substrate complex; Non-structural protein 5A; Protein kinase R
Prediction of protein catalytic residues provides useful information for studies of protein function. Most existing methods combine structure and sequence information but rely heavily on sequence conservation from multiple sequence alignments. The contribution of structure information is usually smaller than that of sequence conservation in existing methods. We found a novel structure feature, residue side chain orientation, which is the first structure-based feature that achieves prediction results comparable to those of evolutionary sequence conservation. We developed a structure-based method, Enzyme Catalytic residue SIde-chain Arrangement (EXIA), which is based on residue side chain orientations and backbone flexibility of the protein structure. Prediction using EXIA outperforms existing structure-based features. The prediction quality of combining EXIA and sequence conservation exceeds that of the state-of-the-art prediction methods. EXIA is designed to predict catalytic residues from a single protein structure without needing sequence or structure alignments. It provides valuable information when there is insufficient or unreliable homology information for the target protein. We found that catalytic residues have a distinctive side chain orientation and designed the EXIA method based on this newly discovered feature. It was also found that EXIA performs well for a dataset of enzymes without any bound ligand in their crystallographic structures.
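The abstract does not define how side chain orientation is measured, so the following is only a minimal sketch of one plausible descriptor: the angle between the vector from a residue's Cα atom to its side chain centroid and the vector from the Cα atom to the protein's geometric center. The function names, the choice of reference point, and the toy coordinates are assumptions and should not be read as the EXIA definition.

```python
import math


def subtract(a, b):
    """Component-wise difference a - b of two 3-D points."""
    return tuple(x - y for x, y in zip(a, b))


def centroid(points):
    """Geometric center of a non-empty list of 3-D points."""
    n = len(points)
    return tuple(sum(p[i] for p in points) / n for i in range(3))


def angle_degrees(u, v):
    """Angle between two non-zero 3-D vectors, in degrees."""
    dot = sum(x * y for x, y in zip(u, v))
    norm_u = math.sqrt(sum(x * x for x in u))
    norm_v = math.sqrt(sum(x * x for x in v))
    if norm_u == 0 or norm_v == 0:
        raise ValueError("orientation is undefined for a zero-length vector")
    # Clamp to [-1, 1] to guard against floating-point drift before acos.
    cosine = max(-1.0, min(1.0, dot / (norm_u * norm_v)))
    return math.degrees(math.acos(cosine))


def side_chain_orientation(ca, side_chain_atoms, protein_center):
    """Angle between CA->side-chain-centroid and CA->protein-center (assumed descriptor)."""
    side_vector = subtract(centroid(side_chain_atoms), ca)
    center_vector = subtract(protein_center, ca)
    return angle_degrees(side_vector, center_vector)


# Hypothetical coordinates: a side chain pointing along +x from a CA at the origin.
ca = (0.0, 0.0, 0.0)
side_chain = [(1.0, 0.5, 0.0), (1.0, -0.5, 0.0)]
inward = side_chain_orientation(ca, side_chain, (5.0, 0.0, 0.0))
outward = side_chain_orientation(ca, side_chain, (-5.0, 0.0, 0.0))
```

Under this assumed descriptor, small angles correspond to side chains pointing toward the protein interior and large angles to side chains pointing away from it.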
The ability to improve protein thermostability via protein engineering is of great scientific interest and also has significant practical value. In this report we present PROTS-RF, a robust model based on the Random Forest algorithm capable of predicting thermostability changes induced not only by single-point mutations, but also by double- or multiple-point mutations. The model is built using 41 features including evolutionary information, secondary structure, solvent accessibility and a set of fragment-based features. It achieves accuracies of 0.799, 0.782 and 0.787, and areas under receiver operating characteristic (ROC) curves of 0.873, 0.868 and 0.862 for single-, double- and multiple-point mutation datasets, respectively. Contrary to previous suggestions, our results clearly demonstrate that a robust predictive model trained to predict thermostability changes induced by single-point mutations can also predict the effects of double- and multiple-point mutations. It also shows high levels of robustness in tests using hypothetical reverse mutations. We demonstrate that testing datasets created based on physical principles can be highly useful for testing the robustness of predictive models.
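For readers unfamiliar with the two metrics reported above, the following minimal sketch shows how accuracy and the area under the ROC curve can be computed from binary labels and classifier scores. The toy labels, scores and the 0.5 decision threshold are hypothetical and are not taken from the PROTS-RF datasets; the binary stabilizing/destabilizing framing is likewise an assumption for illustration.

```python
def accuracy(labels, predictions):
    """Fraction of predictions that match the true labels."""
    correct = sum(1 for y, p in zip(labels, predictions) if y == p)
    return correct / len(labels)


def roc_auc(labels, scores):
    """AUC as the probability that a random positive outscores a random negative.

    Ties between a positive and a negative count as half a win, which makes
    this equal to the area under the ROC curve.
    """
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Hypothetical toy data: 1 = stabilizing mutation, 0 = destabilizing mutation.
labels = [1, 0, 1, 1, 0, 0]
scores = [0.9, 0.4, 0.35, 0.8, 0.2, 0.6]
predictions = [1 if s >= 0.5 else 0 for s in scores]  # assumed 0.5 threshold

print(accuracy(labels, predictions))
print(roc_auc(labels, scores))
```

Accuracy depends on the chosen threshold, whereas the AUC summarizes ranking quality across all thresholds, which is why the two are usually reported together.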
The specificity of protein-protein interactions is encoded in those parts of the sequence that compose the binding interface. Therefore, understanding how changes in protein sequence influence interaction specificity, and possibly the phenotype, requires knowing the location of binding sites in those sequences. However, large-scale detection of protein interfaces remains a challenge. Here, we present a sequence- and interactome-based approach to mine interaction motifs from the recently published Arabidopsis thaliana interactome. The resultant proteome-wide predictions are available via www.ab.wur.nl/sliderbio and set the stage for further investigations of protein-protein binding sites. To assess our method, we first show that, by using a priori information calculated from protein sequences, such as evolutionary conservation and residue surface accessibility, we improve the performance of interface prediction compared to using only interactome data. Next, we present evidence for the functional importance of the predicted sites, which are under stronger selective pressure than the rest of the protein sequence. We also observe a tendency for compensatory mutations in the binding sites of interacting proteins. Subsequently, we interrogate the interactome data to formulate testable hypotheses for the molecular mechanisms underlying the effects of protein sequence mutations. Examples include proteins relevant for various developmental processes. Finally, by analysing pairs of paralogs, we observe a correlation between functional divergence and sequence divergence in interaction sites. This analysis suggests that large-scale prediction of binding sites can cast light on evolutionary processes that shape protein-protein interaction networks.