The function of a protein can be deciphered with higher accuracy from its structure than from its amino acid sequence. Due to the huge gap between the available protein sequence and structural space, tools that can generate functionally homogeneous clusters using only sequence information hold great importance. For this, traditional alignment-based tools work well in most cases, and clustering is performed on the basis of sequence similarity. However, in the case of multi-domain proteins, the alignment quality might be poor due to varied lengths of the proteins, domain shuffling or circular permutations. Multi-domain proteins are ubiquitous in nature; hence alignment-free tools, which overcome the shortcomings of alignment-based protein comparison methods, are required. Further, existing tools classify proteins using only domain-level information and hence miss out on the information encoded in the tethered regions or accessory domains. Our method, on the other hand, takes into account the full-length sequence of a protein, consolidating the complete sequence information to understand a given protein better.
Our web-server, CLAP (Classification of Proteins), is one such alignment-free software for automatic classification of protein sequences. It utilizes a pattern-matching algorithm that assigns local matching scores (LMS) to residues that are a part of the matched patterns between two sequences being compared. CLAP works on full-length sequences and does not require prior domain definitions.
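To illustrate the general idea of scoring residues that fall inside patterns shared by two sequences, the following is a minimal sketch. It is not the CLAP algorithm: it substitutes a simple shared k-mer lookup for the pattern matcher, and the function names, the choice of k = 3 and the averaging of the two coverage fractions are all assumptions made for illustration.

```python
def covered_positions(seq_a, seq_b, k=3):
    """Positions in seq_a that lie inside a k-mer also present in seq_b."""
    kmers_b = {seq_b[i:i + k] for i in range(len(seq_b) - k + 1)}
    covered = set()
    for i in range(len(seq_a) - k + 1):
        if seq_a[i:i + k] in kmers_b:
            covered.update(range(i, i + k))
    return covered


def pattern_similarity(seq_a, seq_b, k=3):
    """Symmetric score in [0, 1]: mean fraction of residues covered by shared k-mers.

    Hypothetical stand-in for a local matching score; no alignment is computed,
    so the order of the shared segments along the sequences does not matter.
    """
    if len(seq_a) < k or len(seq_b) < k:
        return 0.0
    frac_a = len(covered_positions(seq_a, seq_b, k)) / len(seq_a)
    frac_b = len(covered_positions(seq_b, seq_a, k)) / len(seq_b)
    return (frac_a + frac_b) / 2
```

Because only shared segments are counted, a circularly permuted toy sequence such as "GHIKACDEF" still scores highly against "ACDEFGHIK", which is the kind of case where a global alignment would struggle.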
Pilot studies undertaken previously on protein kinases and immunoglobulins have shown that CLAP yields clusters that have high functional and domain architectural similarity. Moreover, parsing at a statistically determined cut-off resulted in clusters that corroborated the sub-family level classification of that particular domain family.
CLAP is a useful protein-clustering tool, independent of domain assignment, domain order, sequence length and domain diversity. Our method can be used for any set of protein sequences, yielding functionally relevant clusters with high domain architectural homogeneity. The CLAP web server is freely available for academic use at http://nslab.mbu.iisc.ernet.in/clap/.
Alignment-free comparison; Domain architectures; Multi-domain proteins; Protein classification
The highly modular nature of protein kinases generates diverse functional roles mediated by evolutionary events such as domain recombination, insertion and deletion of domains. Usually, the domain architecture of a kinase is related to the subfamily to which the kinase catalytic domain belongs. However, outlier kinases with unusual domain architectures serve in the expansion of the functional space of the protein kinase family. For example, Src kinases are made up of SH2 and SH3 domains in addition to the kinase catalytic domain. A kinase which lacks these two domains but retains sequence characteristics within the kinase catalytic domain is an outlier that is likely to have modes of regulation different from classical Src kinases. This study defines two types of outlier kinases, hybrids and rogues, depending on the nature of domain recombination. Hybrid kinases are those where the catalytic kinase domain belongs to one kinase subfamily but the domain architecture is typical of another kinase subfamily. Rogue kinases are those with a kinase catalytic domain characteristic of a kinase subfamily but a domain architecture typical of neither that subfamily nor any other kinase subfamily. This report provides a consolidated set of such hybrid and rogue kinases gleaned from six eukaryotic genomes (S. cerevisiae, D. melanogaster, C. elegans, M. musculus, T. rubripes and H. sapiens) and discusses their functions. The presence of such kinases necessitates a revisiting of the classification scheme of the protein kinase family using full-length sequences, in addition to the classical classification using solely the sequences of kinase catalytic domains. The study of these kinases provides good insight into engineering signalling pathways for a desired output. Lastly, identification of hybrids and rogues in pathogenic protozoa such as P. falciparum sheds light on possible strategies in host-pathogen interactions.
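The hybrid and rogue definitions above amount to a small decision rule, sketched below. The function name, the representation of an architecture as a tuple of domain names and the toy lookup table are assumptions for illustration. Only the Src entry reflects the text, and its domain order is a placeholder; "SubfamilyB" and "DomainX" are hypothetical.

```python
def classify_kinase(catalytic_subfamily, architecture, typical_architectures):
    """Label a kinase as 'typical', 'hybrid' or 'rogue'.

    catalytic_subfamily   -- subfamily assigned from the catalytic domain sequence
    architecture          -- domain names of the full-length protein, in order
    typical_architectures -- dict: subfamily -> set of architectures typical of it
    """
    arch = tuple(architecture)
    if arch in typical_architectures.get(catalytic_subfamily, set()):
        return "typical"
    # Architecture typical of a different subfamily: hybrid.
    for subfamily, archs in typical_architectures.items():
        if subfamily != catalytic_subfamily and arch in archs:
            return "hybrid"
    # Architecture typical of no known subfamily: rogue.
    return "rogue"


# Toy lookup table (illustrative only; SubfamilyB and DomainX are hypothetical).
TYPICAL = {
    "Src": {("SH3", "SH2", "Kinase")},
    "SubfamilyB": {("DomainX", "Kinase")},
}
```

With this table, a protein whose catalytic domain is Src-like but whose architecture is `("DomainX", "Kinase")` is labelled a hybrid, while one with an architecture absent from the table is labelled a rogue.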
Hepatitis C virus (HCV) is a causative agent of end-stage liver disease. Advances in anti-HCV treatment strategies over the last decade have dramatically increased the viral clearance rate. However, several limitations remain, which warrant novel, safe and selective drugs against HCV infection. Towards this objective, we explored highly potent and selective small molecule inhibitors, the ellagitannins, from the crude extract of pomegranate (Punica granatum) fruit peel. The pure compounds punicalagin, punicalin and ellagic acid, isolated from the extract, specifically blocked the HCV NS3/4A protease activity in vitro. Structural analysis using a computational approach also showed that the ligand molecules interact with the catalytic and substrate binding residues of NS3/4A protease, leading to inhibition of the enzyme activity. Further, punicalagin and punicalin significantly reduced HCV replication in a cell culture system. More importantly, these compounds are well tolerated ex vivo, and a 'no observed adverse effect level' (NOAEL) was established up to an acute dose of 5000 mg/kg in BALB/c mice. Additionally, a pharmacokinetics study showed that the compounds are bioavailable. Taken together, our study provides a proof-of-concept for the potential use of antiviral and non-toxic principle ellagitannins from pomegranate in the prevention and control of HCV-induced complications.
We hypothesized that the AAV2 vector is targeted for destruction in the cytoplasm by the host cellular kinase/ubiquitination/proteasomal machinery and that modification of their targets on the AAV2 capsid may improve its transduction efficiency. In vitro analysis with pharmacological inhibitors of cellular serine/threonine kinases (protein kinase A, protein kinase C, casein kinase II) showed an increase (20–90%) in AAV2-mediated gene expression. The three-dimensional structure of the AAV2 capsid was then analyzed to predict the sites of ubiquitination and phosphorylation. Three phosphodegrons, which are the phosphorylation sites recognized as degradation signals by ubiquitin ligases, were identified. Mutation targets comprising eight serine (S) or seven threonine (T) or nine lysine (K) residues were selected in and around phosphodegrons on the basis of their solvent accessibility, overlap with the receptor binding regions, overlap with interaction interfaces of capsid proteins, and their evolutionary conservation across AAV serotypes. AAV2-EGFP vectors with the wild-type (WT) capsid or mutant capsids (15 S/T→alanine [A] or 9 K→arginine [R] single mutant or 2 double K→R mutants) were then evaluated in vitro. The transduction efficiencies of 11 S/T→A and 7 K→R vectors were significantly higher (∼63–90%) than the AAV2-WT vectors (∼30–40%). Further, hepatic gene transfer of these mutant vectors in vivo resulted in higher vector copy numbers (up to 4.9-fold) and transgene expression (up to 14-fold) than observed from the AAV2-WT vector. One of the mutant vectors, S489A, generated ∼8-fold fewer antibodies that could be cross-neutralized by AAV2-WT. This study thus demonstrates the feasibility of the use of these novel AAV2 capsid mutant vectors in hepatic gene therapy.
Gabriel and colleagues examine the in vitro and in vivo efficacy of novel AAV2 vectors, which are modified at critical serine/threonine/lysine residues of the vector capsid. In vitro, they find that the transduction efficiencies of 11 S/T → A and 7 K → R vectors are significantly higher than the AAV2-wild type (WT) vectors. In vivo, they find that hepatic gene transfer of these mutant vectors results in higher vector copy numbers (up to 4.9-fold) and transgene expression (up to 14-fold) than observed from the AAV2-WT vector.
Recombinant adeno-associated virus vectors based on serotype 8 (AAV8) have shown significant promise for liver-directed gene therapy. However, to overcome the vector dose-dependent immunotoxicity seen with AAV8 vectors, it is important to develop better AAV8 vectors that provide enhanced gene expression at significantly low vector doses. Since it is known that AAV vectors during intracellular trafficking are targeted for destruction in the cytoplasm by the host cellular kinase/ubiquitination/proteasomal machinery, we modified specific serine/threonine kinase or ubiquitination targets on the AAV8 capsid to augment its transduction efficiency. Point mutations at specific serine (S)/threonine (T)/lysine (K) residues were introduced in the AAV8 capsid at positions equivalent to those of the effective AAV2 mutants generated successfully earlier. Extensive structure analysis was carried out subsequently to evaluate the structural equivalence between the two serotypes. scAAV8 vectors with the wild-type (WT) and each one of the S/T→Alanine (A) or K→Arginine (R) mutant capsids were evaluated for their liver transduction efficiency in C57BL/6 mice in vivo. Two of the AAV8-S→A mutants (S279A and S671A), and a K137R mutant vector, demonstrated significantly higher enhanced green fluorescent protein (EGFP) transcript levels (∼9- to 46-fold) in the liver compared to animals that received WT-AAV8 vectors alone. The best performing AAV8 mutant (K137R) vector also had significantly reduced ubiquitination of the viral capsid, reduced activation of markers of innate immune response, and a concomitant two-fold reduction in the levels of neutralizing antibody formation in comparison to WT-AAV8 vectors. Vector biodistribution studies revealed that the K137R mutant had a significantly higher and preferential transduction of the liver (106 vs. 7.7 vector copies/mouse diploid genome) when compared to WT-AAV8 vectors.
To further study the utility of the K137R-AAV8 mutant in therapeutic gene transfer, we delivered human coagulation factor IX (h.FIX) under the control of liver-specific promoters (LP1 or hAAT) into C57BL/6 mice. The circulating levels of h.FIX:Ag were higher in all the K137R-AAV8 treated groups up to 8 weeks post-hepatic gene transfer. These studies demonstrate the feasibility of the use of this novel AAV8 vector for potential gene therapy of hemophilia B.
Sen and colleagues generated AAV8 capsid point mutants by replacing specific serine/threonine kinase or ubiquitination target residues. Two of the mutants yielded significantly higher transgene expression over AAV8 when injected into mice, and the best performing vector also exhibited significantly reduced capsid ubiquitination, innate immune response activation, and neutralizing antibody formation.
The Msh4–Msh5 protein complex in eukaryotes is involved in stabilizing Holliday junctions and their progenitors to facilitate crossing over during Meiosis I. These functions of the Msh4–Msh5 complex are essential for proper chromosomal segregation during the first meiotic division. The Msh4/5 proteins are homologous to the bacterial mismatch repair protein MutS and other MutS homologs (Msh2, Msh3, Msh6). Saccharomyces cerevisiae msh4/5 point mutants were identified recently that show a two-fold reduction in crossing over compared to wild type, without affecting chromosome segregation. Three distinct classes of msh4/5 point mutations could be sorted based on their meiotic phenotypes. These include msh4/5 mutations that have a) crossover and viability defects similar to msh4/5 null mutants; b) intermediate defects in crossing over and viability; and c) defects only in crossing over. The absence of a crystal structure for the Msh4–Msh5 complex has hindered an understanding of the structural aspects of Msh4–Msh5 function as well as a molecular explanation for the meiotic defects observed in msh4/5 mutants. To address this problem, we generated a structural model of the S. cerevisiae Msh4–Msh5 complex using homology modeling. Further, structural analysis combined with evolutionary information is used to predict sites with potentially critical roles in Msh4–Msh5 complex formation and DNA binding, and to explain asymmetry within the Msh4–Msh5 complex. We also provide a structural rationale for the meiotic defects observed in the msh4/5 point mutants. The mutations are likely to affect stability of the Msh4/5 proteins and/or interactions with DNA. The Msh4–Msh5 model will facilitate the design and interpretation of new mutational data as well as structural studies of this important complex involved in meiotic chromosome segregation.
We highlight an unrecognized physiological role for the Greek key motif, an evolutionarily conserved super-secondary structural topology of the βγ-crystallins. These proteins constitute the bulk of the human eye lens, packed at very high concentrations in a compact, globular, short-range order, generating transparency. Congenital cataract (affecting 400,000 newborns yearly worldwide), associated with 54 mutations in βγ-crystallins, occurs in two major phenotypes: nuclear cataract, which blocks the central visual axis, hampering the development of the growing eye and demanding the earliest intervention, and the milder peripheral progressive cataract, where surgery can wait. In order to understand this phenotypic dichotomy at the molecular level, we have studied the structural and aggregation features of representative mutations.
Wild type and several representative mutant proteins were cloned, expressed and purified and their secondary and tertiary structural details, as well as structural stability, were compared in solution, using spectroscopy. Their tendencies to aggregate in vitro and in cellulo were also compared. In addition, we analyzed their structural differences by molecular modeling in silico.
Based on their properties, mutants are seen to fall into two classes. Mutants A36P, L45PL54P, R140X, and G165fs display lowered solubility and structural stability, expose several buried residues to the surface, aggregate in vitro and in cellulo, and disturb/distort the Greek key motif; they are associated with nuclear cataract. In contrast, mutants P24T and R77S, associated with peripheral cataract, behave quite similarly to the wild type molecule and do not affect the Greek key topology.
When a mutation distorts even one of the four Greek key motifs, the protein readily self-aggregates and precipitates, consistent with the phenotype of nuclear cataract, while mutations not affecting the motif display ‘native state aggregation’, leading to peripheral cataract. This offers a protein structural rationale for the cataract phenotypic dichotomy: “distort motif, lose central vision”.
Protein structure alignment is a crucial step in protein structure–function analysis. Despite the advances in protein structure alignment algorithms, some of the local conformationally similar regions are mislabeled as structurally variable regions (SVRs). These regions are not well superimposed because of differences in their spatial orientations. The Database of Structural Alignments (DoSA) addresses this gap in identification of local structural similarities obscured in global protein structural alignments by realigning SVRs using an algorithm based on protein blocks. A set of protein blocks is a structural alphabet that abstracts protein structures into 16 unique local structural motifs. DoSA provides unique information about 159 780 conformationally similar and 56 140 conformationally dissimilar SVRs in 74 705 pairwise structural alignments of homologous proteins. The information provided on conformationally similar and dissimilar SVRs can be helpful to model loop regions. It is also conceivable that conformationally similar SVRs with conserved residues could potentially contribute toward functional integrity of homologues, and hence identifying such SVRs could be helpful in understanding the structural basis of protein function.
While phosphotyrosine modification is an established regulatory mechanism in eukaryotes, it is less well characterized in bacteria due to low prevalence. To gain insight into the extent and biological importance of tyrosine phosphorylation in Escherichia coli, we used immunoaffinity-based phosphotyrosine peptide enrichment combined with high resolution mass spectrometry analysis to comprehensively identify tyrosine phosphorylated proteins and accurately map phosphotyrosine sites. We identified a total of 512 unique phosphotyrosine sites on 342 proteins in E. coli K12 and the human pathogen enterohemorrhagic E. coli (EHEC) O157:H7, representing the largest phosphotyrosine proteome reported to date in bacteria. This large number of tyrosine phosphorylation sites allowed us to define five phosphotyrosine site motifs. Tyrosine phosphorylated proteins belong to various functional classes such as metabolism, gene expression and virulence. We demonstrate for the first time that proteins of a type III secretion system (T3SS), required for the attaching and effacing (A/E) lesion phenotype characteristic for intestinal colonization by certain EHEC strains, are tyrosine phosphorylated by bacterial kinases. Yet, A/E lesion and metabolic phenotypes were unaffected by the mutation of the two currently known tyrosine kinases, Etk and Wzc. Substantial residual tyrosine phosphorylation present in an etk wzc double mutant strongly indicated the presence of hitherto unknown tyrosine kinases in E. coli. We assess the functional importance of tyrosine phosphorylation and demonstrate that the phosphorylated tyrosine residue of the regulator SspA positively affects expression and secretion of T3SS proteins and formation of A/E lesions. Altogether, our study reveals that tyrosine phosphorylation in bacteria is more prevalent than previously recognized, and suggests the involvement of phosphotyrosine-mediated signaling in a broad range of cellular functions and virulence.
While phosphotyrosine modification is established in eukaryotic cell signaling, it is less characterized in bacteria. Although deletion of bacterial tyrosine kinases is known to affect various cellular functions and virulence of bacterial pathogens, few phosphotyrosine proteins are currently known. To gain insight into the extent and biological function of tyrosine phosphorylation in E. coli, we carried out in-depth phosphotyrosine protein profiling using a mass spectrometry-based proteomics approach. Our study on E. coli K12 and the human pathogen enterohemorrhagic E. coli (EHEC) O157:H7, which is a common cause of food-borne outbreaks of diarrhea, hemorrhagic colitis and hemolytic uremic syndrome, reveals that tyrosine phosphorylation is far more prevalent than previously recognized. Target proteins are involved in a broad range of cellular functions and virulence. Proteins of the type III secretion system (T3SS), required for the attaching and effacing (A/E) lesion phenotype characteristic of intestinal colonization by EHEC, are tyrosine phosphorylated. The expression of these T3SS proteins and A/E lesion formation are affected by a tyrosine phosphorylated residue on the regulator SspA. Also, our data indicate the presence of hitherto unknown E. coli tyrosine kinases. Overall, tyrosine phosphorylation seems to be involved in controlling cellular core processes and virulence of bacteria.
The presence of energetically less favourable cis peptides in protein structures has been observed to be strongly associated with its structural integrity and function. Inter-conversion between the cis and trans conformations also has an important role in the folding process. In this study, we analyse the extent of conservation of cis peptides among similar folds. We look at both the amino acid preferences and local structural changes associated with such variations.
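As background for the cis/trans distinction, the sketch below computes the omega dihedral (CA(i), C(i), N(i+1), CA(i+1)) from atomic coordinates and labels the peptide bond. The 30 degree tolerance around 0 and 180 degrees is a commonly used convention assumed here, not a threshold taken from this study, and the function names are illustrative.

```python
import math


def _sub(a, b):
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def _dot(a, b):
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def omega_degrees(ca_i, c_i, n_next, ca_next):
    """Dihedral angle CA(i)-C(i)-N(i+1)-CA(i+1) in degrees, in [-180, 180]."""
    b0 = _sub(ca_i, c_i)
    b1 = _sub(n_next, c_i)
    b2 = _sub(ca_next, n_next)
    norm = math.sqrt(_dot(b1, b1))
    b1 = (b1[0] / norm, b1[1] / norm, b1[2] / norm)
    # Project the outer bonds onto the plane perpendicular to the C-N bond.
    v = _sub(b0, tuple(_dot(b0, b1) * c for c in b1))
    w = _sub(b2, tuple(_dot(b2, b1) * c for c in b1))
    return math.degrees(math.atan2(_dot(_cross(b1, v), w), _dot(v, w)))


def classify_peptide_bond(omega, cutoff=30.0):
    """'cis' near 0 degrees, 'trans' near 180 degrees (cutoff is an assumed tolerance)."""
    if abs(omega) <= cutoff:
        return "cis"
    if abs(omega) >= 180.0 - cutoff:
        return "trans"
    return "distorted"
```

Comparing the labels of equivalent peptide bonds in two structurally aligned proteins is then a matter of checking whether one is "cis" and the other "trans".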
Nearly 34% of the Xaa-Proline cis bonds are not conserved in structural relatives; Proline also has a high tendency to get replaced by another amino acid in the trans conformer. At both positions bounding the peptide bond, Glycine has a higher tendency to lose the cis conformation. The cis conformation of more than 30% of β turns of type VIb and IV is not found to be conserved in similar structures. A different view, using a Protein Block based description of backbone conformation, suggests that many of the local conformational changes are highly different from the general local structural variations observed among structurally similar proteins.
Changes between cis and trans conformations are found to be associated with the evolution of new functions facilitated by local structural changes. This is most frequent in enzymes where new catalytic activity emerges with local changes in the active site. Cis-trans changes are also seen to facilitate inter-domain and inter-protein interactions. As in the case of folding, cis-trans conversions have been used as an important driving factor in evolution.
folds; cis peptides; omega dihedral; cis-trans isomerization; structural alignment; structural alphabet; Protein Blocks; Protein Data Bank
Development of sensitive sequence search procedures for the detection of distant relationships between proteins at superfamily/fold level is still a big challenge. The intermediate sequence search approach is the most frequently employed means of identifying remote homologues effectively. In this study, serine proteases of the prolyl oligopeptidase, rhomboid and subtilisin protein families were examined using plant serine proteases from two genomes, A. thaliana and O. sativa, as queries, along with 13 other families of unrelated folds, to identify distant homologues which could not be obtained using PSI-BLAST.
We have proposed to start with multiple queries of classical serine protease members to identify remote homologues in families, using a rigorous approach like Cascade PSI-BLAST. We found that classical sequence-based approaches, like PSI-BLAST, showed very low sequence coverage in identifying plant serine proteases. The algorithm was applied on an enriched sequence database of homologous domains, and we obtained an overall average coverage of 88% at family level and 77% at superfamily or fold level, along with a specificity of ∼100% and a Matthews correlation coefficient of 0.91. A similar approach was also implemented on 13 other protein families representing every structural class in the SCOP database. Further investigation with statistical tests, like jackknifing, helped us to better understand the influence of neighbouring protein families.
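The reported performance measures can be computed from a confusion matrix of search hits, as in the sketch below. It assumes that coverage corresponds to sensitivity (the fraction of true family members retrieved); the function name and the example counts are illustrative and are not data from the study.

```python
import math


def search_metrics(tp, fp, tn, fn):
    """Coverage (sensitivity), specificity and Matthews correlation coefficient.

    tp/fn: true members found / missed; tn/fp: non-members rejected / wrongly hit.
    """
    coverage = tp / (tp + fn) if (tp + fn) else 0.0
    specificity = tn / (tn + fp) if (tn + fp) else 0.0
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # MCC is undefined when any marginal total is zero; return 0.0 by convention.
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return coverage, specificity, mcc


# Illustrative counts only.
coverage, specificity, mcc = search_metrics(tp=88, fp=0, tn=100, fn=12)
```

Unlike coverage or specificity alone, the Matthews correlation coefficient uses all four counts, so it stays informative when members and non-members are unevenly represented.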
Our study suggests that employment of multiple queries of a family for the Cascade PSI-BLAST searches is useful for predicting distant relationships effectively even at superfamily level. We have proposed a generalized strategy to cover all the distant members of a particular family using multiple query sequences. Our findings reveal that prior selection of sequences as query and the presence of neighbouring families can be important for covering the search space effectively in minimal computational time. This study also provides an understanding of the ‘bridging’ role of related families.
The correlation of genetic distances between pairs of protein sequence alignments has been used to infer protein-protein interactions. It has been suggested that these correlations are based on the signal of co-evolution between interacting proteins. However, although mutations in different proteins associated with maintaining an interaction clearly occur (particularly in binding interfaces and neighbourhoods), many other factors contribute to correlated rates of sequence evolution. Proteins in the same genome are usually linked by shared evolutionary history, and so it would be expected that there would be topological similarities in their phylogenetic trees, whether they are interacting or not. For this reason the underlying species tree is often corrected for. Moreover, factors such as expression level are known to affect evolutionary rates. However, it has been argued that the correlated rates of evolution used to predict protein interaction explicitly include shared evolutionary history; here we test this hypothesis.
In order to identify the evolutionary mechanisms giving rise to the correlations between interacting proteins, we use phylogenetic methods to distinguish similarities in tree topologies from similarities in genetic distances. We use a range of datasets of interacting and non-interacting proteins from Saccharomyces cerevisiae. We find that the signal of correlated evolution between interacting proteins is predominantly a result of shared evolutionary rates, rather than similarities in tree topology, independent of evolutionary divergence.
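The distance-based signal discussed here can be sketched as a Pearson correlation between the pairwise genetic distances of two protein families measured over the same set of species. This is a generic illustration rather than the exact procedure of the study: the function names are assumptions, and no correction for the species tree is applied.

```python
import math


def upper_triangle(matrix):
    """Flatten the strict upper triangle of a square distance matrix."""
    n = len(matrix)
    return [matrix[i][j] for i in range(n) for j in range(i + 1, n)]


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    if sd_x == 0 or sd_y == 0:
        raise ValueError("correlation is undefined for constant distances")
    return cov / (sd_x * sd_y)


def matrix_correlation(dist_a, dist_b):
    """Correlate two distance matrices whose rows/columns list the same species."""
    if len(dist_a) != len(dist_b):
        raise ValueError("matrices must cover the same species")
    return pearson(upper_triangle(dist_a), upper_triangle(dist_b))
```

A high value from `matrix_correlation` shows only that the two families have similar pairwise distances; on its own it does not distinguish shared evolutionary rates from similar tree topologies, which is the distinction the study sets out to test.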
Since interacting proteins do not have tree topologies that are more similar than those of the control group of non-interacting proteins, it is likely that coevolution contributes little, if anything, to the observed correlations.
Co-evolution; Correlated evolution; Protein evolution; Phylogenetic; Protein-protein complexes; Protein-protein interactions
Interactions of non-structural protein 5A (NS5A) of Hepatitis C virus (HCV) with the human kinases casein kinase 1α (ck1α) and protein kinase R (PKR) have different functional implications, such as regulation of viral replication and evasion of the interferon-induced immune response, respectively. Understanding the structural and molecular basis of interactions of the viral protein with two different human kinases can be useful in developing strategies for treatment against HCV.
Serine 232 of NS5A is known to be phosphorylated by human ck1α. A structural model of an NS5A peptide containing the phosphoacceptor residue Serine 232 bound to ck1α has been generated using the known 3-D structures of kinase-peptide complexes. The substrate interacting residues in ck1α have been identified from the model, and these are found to be well conserved in the ck1 family. The ck1α – substrate peptide complex has also been used to understand the structural basis of association between ck1α and its other viral stress induced substrate, the tumour suppressor p53 transactivation domain, which has a crystal structure available.
Interaction of NS5A with another human kinase, PKR, is primarily genotype specific. NS5A from genotype 1b has been shown to interact with and inhibit PKR, whereas NS5A from genotypes 2a/3a is unable to bind and inhibit PKR efficiently. This is one of the main reasons for the varied response to interferon therapy in HCV patients across different genotypes. Using the PKR crystal structure, sequence alignment and evolutionary trace analysis, some of the critical residues responsible for the interaction of NS5A 1b with PKR have been identified.
The substrate interacting residues in ck1α have been identified using the structural model of the kinase - substrate peptide complex. The PKR interacting NS5A 1b residues have also been predicted using the PKR crystal structure and NS5A sequence analysis along with known experimental results. The experimentally shown functional significance and nature of interaction of the interferon sensitivity determining region and variable region 3 of NS5A in different genotypes with PKR are also supported by the findings of evolutionary trace analysis. Designing inhibitors to prevent this interaction could enable HCV genotype 1 infected patients to respond well to interferon therapy.
Casein kinase 1α; Hepatitis C virus; Interferon therapy; Kinase-substrate complex; Non-structural protein 5A; Protein kinase R
The constant increase in the number of solved protein structures is of great help in understanding the basic principles behind protein folding and evolution. 3-D structural knowledge is valuable in designing and developing methods for comparison, modelling and prediction of protein structures. These approaches for structure analysis can be directly applied in studying protein function and in drug design. The backbone of a protein structure favours certain local conformations, which include α-helices, β-strands and turns. Libraries of a limited number of local conformations (Structural Alphabets) were developed in the past to obtain a useful categorization of backbone conformation. Protein Blocks (PBs) are one such Structural Alphabet that gave a reasonable structure approximation of 0.42 Å. In this study, we use the PB description of local structures to analyse conformations that are preferred sites for structural variations and insertions among groups of related folds. This knowledge can be utilized in improving tools for structure comparison that work by analysing local structure similarities. Conformational differences between homologous proteins are known to occur often in the regions comprising turns and loops. Interestingly, these differences are found to have specific preferences depending upon the structural classes of proteins. Such class-specific preferences are mainly seen in the all-β class, with changes involving short helical conformations and hairpin turns. A test carried out on a benchmark dataset also indicates that the use of knowledge on the class-specific variations can improve the performance of a PB-based structure comparison approach. The preference for the indel sites also seems to be confined to a few backbone conformations involving β-turns and helix C-caps. These are mainly associated with short loops joining the regular secondary structures that mediate a reversal in the chain direction.
Rare β-turns of type I’ and II’ are also identified as preferred sites for insertions.
Most homodimeric proteins have a symmetric structure. Although symmetry is known to confer structural and functional advantages, asymmetric organization is also observed. Using a non-redundant dataset of 223 high-resolution crystal structures of biologically relevant homodimers, we address questions on the prevalence and significance of asymmetry. We used two measures to quantify global and interface asymmetry, and assessed the correlation of several molecular and structural parameters with asymmetry. We have identified rare cases (11/223) of biologically relevant homodimers with pronounced global asymmetry. Asymmetry serves as a means to bring about 2∶1 binding between the homodimer and another molecule; it also enables cellular signalling arising from asymmetric macromolecular ligands such as DNA. Analysis of these cases reveals two possible mechanisms by which infinite array formation is prevented. In the case of homodimers associating via non-topologically equivalent surfaces in their tertiary structures, ligand-dependent mechanisms are used. For stable dimers binding via large surfaces, ligand-dependent structural change regulates polymerisation/depolymerisation; for unstable dimers binding via smaller surfaces that are not evolutionarily well conserved, dimerisation occurs only in the presence of the ligand. In the case of homodimers associating via interaction surfaces with parts of the surfaces topologically equivalent in the tertiary structures, steric hindrance serves as the mechanism preventing infinite array formation. We also find that homodimers exhibiting grossly symmetric organization rarely exhibit either perfect local symmetry or high local asymmetry. Binding of small ligands at the interface does not cause any significant variation in interface asymmetry.
However, identification of biologically relevant interface asymmetry in grossly symmetric homodimers is confounded by the presence of similar small-magnitude changes caused by artefacts of crystallisation. Our study provides new insights regarding accommodation of asymmetry in homodimers.
Most signalling and regulatory proteins participate in transient protein-protein interactions during biological processes. They usually serve as key regulators of various cellular processes and are often stable in both protein-bound and unbound forms. Availability of high-resolution structures of their unbound and bound forms provides an opportunity to understand the molecular mechanisms involved. In this work, we have addressed the question “What is the nature, extent, location and functional significance of structural changes which are associated with formation of protein-protein complexes?”
A database of 76 non-redundant sets of high resolution 3-D structures of protein-protein complexes, representing diverse functions, and corresponding unbound forms, has been used in this analysis. Structural changes associated with protein-protein complexation have been investigated using structural measures and Protein Blocks description. Our study highlights that significant structural rearrangement occurs on binding at the interface as well as at regions away from the interface to form a highly specific, stable and functional complex. Notably, predominantly unaltered interfaces interact mainly with interfaces undergoing substantial structural alterations, revealing the presence of at least one structural regulatory component in every complex.
Interestingly, about one-half of the complexes, comprising largely signalling proteins, show substantial localized structural change at surfaces away from the interface. Normal mode analysis and available information on the functions of some of these complexes suggest that many of these changes are allosteric. This change is largely manifest in the proteins whose interfaces are altered upon binding, implicating structural change as the possible trigger of the allosteric effect. Although large-scale studies of allostery induced by small-molecule effectors are available in the literature, this is, to our knowledge, the first study indicating the prevalence of allostery induced by protein effectors.
The enrichment of allosteric sites in signalling proteins, whose mutations commonly lead to diseases such as cancer, provides support for the usage of allosteric modulators in combating these diseases.
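One simple way to picture a Protein Blocks based comparison of unbound and bound forms is sketched below. It is a minimal illustration, not the measure used in the study: it assumes that both forms have already been encoded as Protein Blocks strings of equal length with residue-by-residue correspondence, and the function name `pb_change_fraction` and the example strings are hypothetical.

```python
def pb_change_fraction(unbound_pbs, bound_pbs, positions=None):
    """Fraction of positions whose Protein Block letter differs between two forms.

    Assumes both strings have the same length and that index i refers to the
    same residue in both forms (an assumption of this sketch).
    `positions` optionally restricts the comparison, e.g. to interface residues.
    """
    if len(unbound_pbs) != len(bound_pbs):
        raise ValueError("PB strings must have the same length")
    indices = list(range(len(unbound_pbs)) if positions is None else positions)
    if not indices:
        return 0.0  # nothing to compare
    changed = sum(1 for i in indices if unbound_pbs[i] != bound_pbs[i])
    return changed / len(indices)


# Hypothetical PB strings for the same chain in unbound and bound forms.
unbound = "mmmmdddd"
bound = "mmmmdfkl"
print(pb_change_fraction(unbound, bound))                # whole chain
print(pb_change_fraction(unbound, bound, [0, 1, 2, 3]))  # a chosen subset of residues
```

Restricting `positions` to interface or non-interface residues gives separate change fractions for the two regions, which is the kind of contrast the analysis above draws.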
Transient protein-protein interactions play crucial roles in all facets of cellular physiology. Here, using an analysis on known 3-D structures of transient protein-protein complexes, their corresponding uncomplexed forms and energy calculations we seek to understand the roles of protein-protein interfacial residues in the unbound forms. We show that there are conformationally near invariant and evolutionarily conserved interfacial residues which are rigid and they account for ∼65% of the core interface. Interestingly, some of these residues contribute significantly to the stabilization of the interface structure in the uncomplexed form. Such residues have strong energetic basis to perform dual roles of stabilizing the structure of the uncomplexed form as well as the complex once formed while they maintain their rigid nature throughout. This feature is evolutionarily well conserved at both the structural and sequence levels. We believe this analysis has general bearing in the prediction of interfaces and understanding molecular recognition.
In eukaryotic organisms clathrin-coated vesicles are instrumental in the processes of endocytosis as well as intracellular protein trafficking. Hence, it is important to understand how these vesicles have evolved across eukaryotes to carry cargo molecules of varied shapes and sizes. The intricate nature and functional diversity of the vesicles are maintained by numerous interacting protein partners of the vesicle system. However, delineating functionally important residues participating in protein-protein interactions of the assembly is a daunting task, as no high-resolution structures of the intact assembly are available. The two cryoEM structures closely representing the intact assembly were determined at very low resolution and provide positions of Cα atoms alone. In the present study, using the method developed by us earlier, we predict the protein-protein interface residues in the clathrin assembly, taking guidance from the available low-resolution structures. The conservation status of these interfaces, when investigated across eukaryotes, revealed a radial distribution of evolutionary constraints. If the members of the clathrin vesicular assembly are imagined to be arranged in a spherical manner, with the cargo at the center and clathrins at the periphery, detailed phylogenetic analysis indicates high residue variation in the members closer to the cargo and high conservation in clathrins and other proteins at the periphery of the vesicle. This points to the strategy adopted by nature to package diverse proteins but transport them through a highly conserved mechanism.
The cell cycle phase at starvation influences post-starvation differentiation and morphogenesis in Dictyostelium discoideum. We found that when expressed in Saccharomyces cerevisiae, a D. discoideum cDNA that encodes the ribosomal protein S4 (DdS4) rescues mutations in the cell cycle genes cdc24, cdc42 and bem1. The products of these genes affect morphogenesis in yeast via a coordinated moulding of the cytoskeleton during bud site selection. D. discoideum cells that over- or under-expressed DdS4 did not show detectable changes in protein synthesis but displayed similar developmental aberrations whose intensity was graded with the extent of over- or under-expression. This suggested that DdS4 might influence morphogenesis via a stoichiometric effect – specifically, by taking part in a multimeric complex similar to the one involving Cdc24p, Cdc42p and Bem1p in yeast. In support of the hypothesis, the S. cerevisiae proteins Cdc24p, Cdc42p and Bem1p as well as their D. discoideum cognates could be co-precipitated with antibodies to DdS4. Computational analysis and mutational studies explained these findings: a C-terminal domain of DdS4 is the functional equivalent of an SH3 domain in the yeast scaffold protein Bem1p that is central to constructing the bud site selection complex. Thus in addition to being part of the ribosome, DdS4 has a second function, also as part of a multi-protein complex. We speculate that the existence of the second role can act as a safeguard against perturbations to ribosome function caused by spontaneous variations in DdS4 levels.
Evolutionarily divergent proteins have been shown to change their interacting partners. The RNA polymerase assembly is one of the rare cases that retains its component proteins in the course of evolution. This ubiquitous molecular assembly, involved in transcription, consists of four core subunits (alpha, beta, betaprime, and omega), which assemble to form the core enzyme. Remarkably, the orientation of the four subunits in the complex is conserved from prokaryotes to eukaryotes although their sequence similarity is low. We have studied how the sequence divergence of the core subunits of RNA polymerase is accommodated in the formation of the multi-molecular assembly, with special reference to eubacterial species. Analysis of domain composition and order of the core subunits in >85 eubacterial species indicates complete conservation. However, sequence analysis indicates that interface residues of alpha and omega subunits are more divergent than those of beta, betaprime, and sigma70 subunits. Although beta and betaprime are generally well-conserved, residues involved in interaction with divergent subunits are not conserved. Insertions/deletions are also observed near interacting regions even in the most conserved subunits, beta and betaprime. Homology modelling of three divergent RNA polymerase complexes, from Helicobacter pylori, Mycoplasma pulmonis and Onion yellows phytoplasma, indicates that insertions/deletions can be accommodated near the interface as they generally occur at the periphery. Evaluation of the modeled interfaces indicates that they are physico-chemically similar to the template interfaces in Thermus thermophilus, indicating that nature has evolved to retain the obligate complex in spite of substantial substitutions and insertions/deletions.
RNA polymerase; eubacteria; homology modeling; obligate interactions; protein-protein interactions; sequence conservation
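A common way to quantify how divergent a set of alignment positions is, for example interface residues of one subunit versus another, is per-column Shannon entropy. The sketch below is illustrative only: the abstract does not state which conservation measure was used, and the function names and toy alignment are hypothetical.

```python
import math
from collections import Counter


def column_entropy(column):
    """Shannon entropy (bits) of the residues in one alignment column.

    Gap characters ('-') are ignored; an all-gap column returns 0.0
    (a choice made for this sketch). 0.0 means fully conserved.
    """
    residues = [c for c in column if c != "-"]
    if not residues:
        return 0.0
    total = len(residues)
    counts = Counter(residues)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


def alignment_entropies(aligned_seqs):
    """Entropy of every column of an alignment given as equal-length strings."""
    return [column_entropy(col) for col in zip(*aligned_seqs)]


# Toy alignment (hypothetical sequences): column 0 is conserved, column 1 is not.
toy = ["AK", "AR", "AK", "AR"]
print(alignment_entropies(toy))
```

Averaging the entropies over the columns that map to interface residues of each subunit would give one number per subunit to compare.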
Multi-domain proteins have many advantages with respect to stability and folding inside cells. Here we attempt to understand the intricate relationship between the domain-domain interactions and the stability of domains in isolation. We provide quantitative treatment and proof for prevailing intuitive ideas on the strategies employed by nature to stabilize otherwise unstable domains. We find that domains incapable of independent stability are stabilized by favourable interactions with tethered domains in the multi-domain context. Stability of such folds to exist independently is optimized by evolution. Specific residue mutations in the sites equivalent to inter-domain interface enhance the overall solvation, thereby stabilizing these domain folds independently. A few naturally occurring variants at these sites alter communication between domains and affect stability leading to disease manifestation. Our analysis provides safe guidelines for mutagenesis which have attractive applications in obtaining stable fragments and domain constructs essential for structural studies by crystallography and NMR.
Sensitive remote homology detection and accurate alignments, especially in the midnight zone of sequence similarity, are needed for better function annotation and structural modeling of proteins. An algorithm for HMM-HMM alignment, AlignHUSH, has been developed which is capable of recognizing distantly related domain families. The method uses structural information, in the form of predicted secondary structure probabilities, and hydrophobicity of amino acids to align HMMs of two sets of aligned sequences. The effect of using information from adjoining column(s) has also been investigated and is found to increase the sensitivity of HMM-HMM alignments and remote homology detection.
We have assessed the performance of AlignHUSH using known evolutionary relationships available in SCOP. AlignHUSH performs better than the best HMM-HMM alignment methods and is observed to be even more sensitive at higher error rates. Accuracy of the alignments obtained using AlignHUSH has been assessed using the structure-based alignments available in BaliBASE. The alignment length and the alignment quality are found to be appropriate for homology modeling and function annotation. The alignment accuracy is found to be comparable to existing methods for profile-profile alignments.
A new method to align HMMs has been developed and is shown to have better sensitivity at error rates of 10% and above when compared to other available programs. The proposed method could effectively aid in obtaining clues to the functions of proteins of as yet unknown function.
A web-server incorporating the AlignHUSH method is available at http://crick.mbu.iisc.ernet.in/~alignhush/
Protein structures are classically described in terms of secondary structures. Even if the regular secondary structures have relevant physical meaning, their recognition from atomic coordinates has some important limitations, such as uncertainties in the assignment of boundaries of helical and β-strand regions. Further, on average about 50% of all residues are assigned to an irregular state, i.e., the coil. Thus different research teams have focused on abstracting the conformation of the protein backbone in localized short stretches. Using different geometric measures, local stretches in protein structures are clustered into a chosen number of states. A prototype representative of the local structures in each cluster is generally defined. These libraries of local structure prototypes are called “structural alphabets”. We have developed a structural alphabet, named Protein Blocks, not only to approximate protein structures, but also to predict them from sequence. Since its development, we and other teams have explored numerous new research fields using this structural alphabet. We review here some of the most interesting applications.
protein structures; biochemistry; amino acids; secondary structures; propensities; structural alphabet; structure prediction; structural superimposition; mutation; binding site; Bayes theorem; Support Vector Machines.
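The idea of assigning a local backbone stretch to the closest prototype of a structural alphabet can be sketched as follows. This is a minimal illustration: it assumes that each stretch and each prototype is described by a vector of backbone dihedral angles in degrees, and that closeness is measured by the root mean square deviation on angles. The two prototypes are toy values chosen for illustration and are not the published Protein Blocks prototypes; all function names are hypothetical.

```python
import math


def angular_diff(a, b):
    """Signed difference between two angles in degrees, wrapped to [-180, 180)."""
    return (a - b + 180.0) % 360.0 - 180.0


def rmsda(angles, prototype):
    """Root mean square deviation on angles between a stretch and a prototype."""
    squared = [angular_diff(x, y) ** 2 for x, y in zip(angles, prototype)]
    return math.sqrt(sum(squared) / len(squared))


def assign_state(angles, prototypes):
    """Return the name of the prototype with the smallest RMSDA to `angles`."""
    return min(prototypes, key=lambda name: rmsda(angles, prototypes[name]))


# Toy prototypes (NOT the real Protein Blocks): eight alternating psi/phi
# angles, as for a five-residue stretch.
TOY_PROTOTYPES = {
    "helix_like": [-45.0, -60.0] * 4,
    "strand_like": [135.0, -120.0] * 4,
}

stretch = [-40.0, -65.0, -50.0, -58.0, -42.0, -63.0, -47.0, -61.0]
print(assign_state(stretch, TOY_PROTOTYPES))
```

Sliding such a window along a chain and recording the assigned state at each position turns a 3D structure into a 1D string over the alphabet.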
With the immense growth in the number of available protein structures, fast and accurate structure comparison has become essential. We propose an efficient method for structure comparison, based on a structural alphabet. Protein Blocks (PBs) is a widely used structural alphabet with 16 pentapeptide conformations that can fairly approximate a complete protein chain. Thus a 3D structure can be translated into a 1D sequence of PBs. With a simple Needleman–Wunsch approach and a raw PB substitution matrix, PB-based structural alignments were better than many popular methods. The iPBA web server presents an improved alignment approach using (i) specialized PB Substitution Matrices (SM) and (ii) anchor-based alignment methodology. With these developments, the quality of ∼88% of alignments was improved. iPBA alignments were also better than DALI, MUSTANG and GANGSTA+ in >80% of the cases. The web server is designed for both pairwise comparisons and database searches. Outputs are given as sequence alignments and superposed 3D structures displayed using PyMol and Jmol. A local alignment option for detecting sub-structural similarity is also embedded. As a fast and efficient ‘sequence-based’ structure comparison tool, we believe that it will be quite useful to the scientific community. iPBA can be accessed at http://www.dsimb.inserm.fr/dsimb_tools/ipba/.
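The "simple Needleman–Wunsch approach" on PB sequences can be sketched as below. This is a generic global alignment with a linear gap penalty, not the iPBA implementation: the scoring function `toy_pb_score` stands in for a real PB substitution matrix, and the gap penalty and example PB strings are assumptions made for illustration.

```python
def needleman_wunsch(seq_a, seq_b, score, gap=-2):
    """Global alignment of two PB strings with a linear gap penalty.

    Returns (best_score, aligned_a, aligned_b), with '-' marking gaps.
    """
    n, m = len(seq_a), len(seq_b)
    # dp[i][j] = best score for aligning seq_a[:i] with seq_b[:j]
    dp = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        dp[i][0] = i * gap
    for j in range(1, m + 1):
        dp[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            dp[i][j] = max(
                dp[i - 1][j - 1] + score(seq_a[i - 1], seq_b[j - 1]),  # (mis)match
                dp[i - 1][j] + gap,                                    # gap in seq_b
                dp[i][j - 1] + gap,                                    # gap in seq_a
            )
    # Trace back from the bottom-right corner to recover one optimal alignment.
    out_a, out_b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and dp[i][j] == dp[i - 1][j - 1] + score(seq_a[i - 1], seq_b[j - 1]):
            out_a.append(seq_a[i - 1])
            out_b.append(seq_b[j - 1])
            i, j = i - 1, j - 1
        elif i > 0 and dp[i][j] == dp[i - 1][j] + gap:
            out_a.append(seq_a[i - 1])
            out_b.append("-")
            i -= 1
        else:
            out_a.append("-")
            out_b.append(seq_b[j - 1])
            j -= 1
    return dp[n][m], "".join(reversed(out_a)), "".join(reversed(out_b))


def toy_pb_score(x, y):
    """Placeholder for a PB substitution matrix: +2 for identical PBs, -1 otherwise."""
    return 2 if x == y else -1


# Hypothetical PB strings (PB letters run from 'a' to 'p').
best, aligned_a, aligned_b = needleman_wunsch("mmmdfklmm", "mmmfklmm", toy_pb_score)
print(best)
print(aligned_a)
print(aligned_b)
```

Replacing `toy_pb_score` with a lookup into a 16 × 16 substitution matrix is the only change needed to use matrix-based scores in this sketch.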
Structure comparison tools can be used to align related protein structures to identify structurally conserved and variable regions and to infer functional and evolutionary relationships. While the conserved regions often superimpose well, the variable regions appear non-superimposable. Differences in homologous protein structures are thought to reflect the plasticity needed to accommodate diverged sequences during evolution. One kind of difference between 3-D structures of homologous proteins is rigid body displacement. A striking example is equivalent α-helical regions of homologous proteins that adopt different spatial orientations and therefore do not superimpose well. In a rigid body superimposition, these regions would appear variable although they may contain local similarity. Also, due to high spatial deviation in the variable region, one-to-one correspondence at the residue level cannot be determined accurately. Another kind of difference is conformational variability, the most common example being topologically equivalent loops of two homologues with different conformations. In the current study, we present a refined view of the “structurally variable” regions, which may contain local similarity obscured in global alignment of homologous protein structures. Because a structural alphabet such as Protein Blocks describes local protein structures precisely, conformational similarity could be identified in a substantial number of ‘variable’ regions in a large data set of protein structural alignments; optimal residue-residue equivalences could be achieved on the basis of Protein Blocks, which led to improved local alignments. Also, through an example, we have demonstrated how the additional information on local backbone structures provided by Protein Blocks can aid in comparative modeling of a loop region. In addition, understanding of sequence-structure relationships can be enhanced through our approach.
This has been illustrated through examples where the equivalent regions in homologous protein structures share sequence similarity to varying extents but do not preserve local structure.