In the post-genomic era, in which sequences are being determined at a rapid rate, we rely heavily on computational methods for their tentative biochemical characterization. The Pfam database currently contains 3,786 families corresponding to “Domains of Unknown Function” (DUF) or “Uncharacterized Protein Families” (UPF), of which 3,087 have no reported three-dimensional structure; these families, which constitute almost one-fourth of known protein families, still await elucidation of both structure and function.
We applied a ‘computational structural genomics’ approach, using five state-of-the-art remote similarity detection methods, to detect relationships between uncharacterized DUFs and domain families of known structure. Association with a structural domain family can serve as a starting point for elucidating the function of a DUF. Among these five methods, searches against the SCOP-NrichD database are applied here for the first time. Predictions were classified into high-, medium- and low-confidence categories based on the consensus of results from the various approaches, and were also annotated with enzyme and Gene Ontology terms. In total, 614 uncharacterized DUFs could be associated with a known structural domain; high-confidence predictions, supported by at least four methods, were made for 54 of these families. The structure–function relationships for these 614 DUF families can be accessed online at http://proline.biochem.iisc.ernet.in/RHD_DUFS/. For potential enzymes in this set, we assessed their compatibility with the associated fold and performed detailed structural and functional annotation by examining alignments and the extent of conservation of functional residues. Interesting assignments for DUF3050, DUF1636, DUF1572, DUF2092 and DUF659 are discussed in detail.
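The consensus-based confidence tiers described above can be sketched as a simple vote count. This is a minimal illustration, not the study's pipeline: the method names (`m1` to `m5`) and family labels (`fam_X`, `fam_Y`) are hypothetical placeholders, and only the rule that high confidence requires agreement of at least four methods comes from the text; the medium and low cut-offs are assumptions.

```python
from __future__ import annotations

from collections import Counter


def classify_prediction(method_hits: dict[str, str | None]) -> tuple[str | None, str]:
    """Return (consensus structural family, confidence tier) for one DUF.

    method_hits maps each (hypothetical) method name to the structural
    family it assigned to the DUF, or None if that method found no hit.
    """
    votes = Counter(fam for fam in method_hits.values() if fam is not None)
    if not votes:
        return None, "none"
    family, support = votes.most_common(1)[0]
    if support >= 4:      # from the text: high confidence needs >= 4 methods
        tier = "high"
    elif support >= 2:    # assumed cut-off for medium confidence
        tier = "medium"
    else:                 # assumed: a single supporting method is low confidence
        tier = "low"
    return family, tier


hits = {"m1": "fam_X", "m2": "fam_X", "m3": "fam_X", "m4": "fam_X", "m5": None}
print(classify_prediction(hits))  # ('fam_X', 'high')
```

Counting agreement on the same family, rather than merely counting methods that return any hit, reflects the idea that convergence of complementary methods on one assignment is what raises confidence.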
This study provides insights into the structure and potential function of nearly 20% of the DUFs. Because the different computational approaches are often complementary, using them together enables us to recognize distant relationships reliably, especially when they converge on a common assignment. We observe that while pointers to the structural domain can offer the right clues to a protein's function, recognizing its precise functional role remains non-trivial, as many DUF domains conserve only some of the critical residues. It is not clear whether these are functional vestiges or instances involving alternate substrates and interacting partners.
This article was reviewed by Drs Eugene Koonin, Frank Eisenhaber and Srikrishna Subramanian.
Electronic supplementary material
The online version of this article (doi:10.1186/s13062-015-0069-2) contains supplementary material, which is available to authorized users.
Domain of Unknown Function (DUF); Fold assignment; Function annotation; Remote similarity; Homology detection and protein evolution
During 11–12 August 2014, a Protein Bioinformatics and Community Resources Retreat was held at the Wellcome Trust Genome Campus in Hinxton, UK. This meeting brought together the principal investigators of several specialized protein resources (such as CAZy, TCDB and MEROPS) as well as those from protein databases at the large bioinformatics centres (including UniProt and RefSeq). The retreat was divided into five sessions: (1) key challenges, (2) the databases represented, (3) best practices for maintenance and curation, (4) information flow to and from large data centers and (5) communication and funding. An important outcome of this meeting was the creation of a Specialist Protein Resource Network that we believe will improve coordination of the activities of its member resources. We invite further protein database resources to join the network and continue the dialogue.
In this study, we combine available high-resolution structural information on eukaryotic ribosomes with low-resolution cryo-EM data on the complex between the Hepatitis C viral RNA (IRES) and the human ribosome. Aided further by the prediction of RNA–protein interactions and restrained docking studies, we gain insights into their interaction at the residue level. We identified the components involved at the major and minor contact regions, and propose that there are energetically favorable local interactions between 40S ribosomal proteins and IRES domains. Domain II of the IRES interacts with ribosomal proteins S5 and S25, while the pseudoknot and the downstream domain IV region bind to ribosomal proteins S26, S28 and S5. We also used UV cross-linking studies to validate the proposed interaction between S5 and IRES domains II and IV. We found that domain IIIe makes contact with ribosomal protein S3a (S1e). Our model also suggests that ribosomal protein S27 interacts with domain IIIc, while S7 makes a weak contact with a single-base RNA bulge between junction IIIabc and IIId. The interacting residues are highly conserved among mammalian homologs, whereas the IRES RNA bases involved in contact are not strictly conserved. The IRES RNA binding sites for S25 and S3a show the best conservation among related viral IRESs. The newly identified contacts between ribosomal proteins and RNA are consistent with previous independent studies of the RNA-binding properties of ribosomal proteins, although those studies did not provide residue-level information.
RNA-Protein interactions; Cryo electron microscopy; Hepatitis C; Protein modeling
We have developed an integrated database for Mycobacterium tuberculosis H37Rv (Mtb) that collates information on protein sequences, domain assignments, functional annotation and 3D structures, along with protein–protein and protein–small molecule interactions. SInCRe (Structural Interactome Computational Resource) was developed out of the CamBan (Cambridge and Bangalore) collaboration. The motivation for developing this database is to provide an integrated platform that allows easy access to, and interpretation of, the data and results obtained by all the CamBan groups in the field of Mtb informatics. In-house algorithms and databases developed independently by various academic groups in CamBan are used to generate Mtb-specific datasets, which are integrated in this database to provide a structural dimension to studies on tuberculosis. The SInCRe database readily provides information on the identification of functional domains, genome-scale modelling of Mtb protein structures and characterization of small-molecule binding sites within Mtb. The resource also provides structure-based function annotation and information on small-molecule binders, including FDA (Food and Drug Administration)-approved drugs, on protein–protein interactions (PPIs), and on natural compounds that potentially bind to pathogen proteins and could weaken or eliminate host–pathogen protein–protein interactions. Together, these data provide the prerequisites for identifying off-target binding.
As the volume of protein data increases, researchers rely more and more on the analysis of published data, which makes good access to these data increasingly important. Such data range from the supplemental material of individual papers all the way to major reference databases with professional staff and long-term funding. Specialist protein resources fill an important middle ground, providing interactive web interfaces to databases on a focused topic or family of proteins and using specialised approaches that are not feasible in the major reference databases. Many are labours of love, run by a single lab with little or no dedicated funding, and building and maintaining them poses many challenges. This perspective arose from a meeting of several specialist protein resources and major reference databases held at the Wellcome Trust Genome Campus (Cambridge, UK) on 11 and 12 August 2014. During this meeting, key challenges common to creating and maintaining such resources were discussed, along with various approaches to address them. In laying out these challenges, we aim to inform users about how these issues affect our resources and to illustrate ways in which working together could enhance the accuracy, currency and overall value of these resources.
Specialist Protein Resource; Key Challenges; Biocuration; Longevity; Big data; Mis-annotation
Translation initiation in Hepatitis C Virus (HCV) is mediated by the Internal Ribosome Entry Site (IRES), which is cap-independent and uses a limited number of canonical initiation factors. During translation initiation, formation of the IRES–40S complex depends on high-affinity interaction of the IRES with ribosomal proteins. It was previously shown that ribosomal protein S5 (RPS5) interacts with the HCV IRES. Here, we have extensively characterized the HCV IRES–RPS5 interaction and demonstrated its role in IRES function. Computational modelling and RNA–protein interaction studies demonstrated that the beta hairpin structure within RPS5 is critically required for binding to domains II and IV. Mutations disrupting the IRES–RPS5 interaction drastically reduced 80S complex formation and the corresponding IRES activity. Computational analysis and UV cross-linking experiments using various IRES mutants revealed an interplay between domains II and IV mediated by RPS5. In addition, the present study demonstrated that the RPS5 interaction is unique to the HCV IRES and is not involved in the 40S–3′ UTR interaction. Further, partial silencing of RPS5 preferentially inhibited HCV RNA translation, whereas global translation was only marginally affected. Taken together, the results provide novel molecular insights into the IRES–RPS5 interaction and reveal its functional significance in mediating internal initiation of translation.
Protein structures are valuable tools for understanding protein function. Nonetheless, proteins are often considered rigid macromolecules, whereas their structures exhibit specific flexibility that is essential to their functions. Analyses of protein structures and dynamics are often performed with a simplified three-state description, i.e., the classical secondary structures. A more precise and complete description of protein backbone conformation can be obtained using libraries of small protein fragments that are able to approximate every part of a protein structure. These libraries, called structural alphabets (SAs), have been widely used in structure analysis, from the definition of ligand binding sites to the superimposition of protein structures. SAs are also well suited to analyzing the dynamics of protein structures. Here, we review innovative approaches that investigate protein flexibility based on SA descriptions. Coupled with various sources of experimental data (e.g., B-factors) and computational methods (e.g., molecular dynamics simulations), SAs turn out to be powerful tools for analyzing protein dynamics, e.g., to examine allosteric mechanisms in large sets of structures of complexes, or to identify order/disorder transitions. SAs have also been shown to be quite efficient at predicting protein flexibility from amino acid sequence. Finally, we illustrate the value of SAs for studying flexibility with several cases of proteins implicated in pathologies and diseases.
protein structures; disorder; secondary structure; structural alphabet; protein folding; allostery; protein complexes; protein–DNA interactions
The function of a protein can be deciphered with higher accuracy from its structure than from its amino acid sequence. Because of the huge gap between the available protein sequence and structure spaces, tools that can generate functionally homogeneous clusters using only sequence information are of great importance. Traditional alignment-based tools work well in most cases, with clustering performed on the basis of sequence similarity. However, for multi-domain proteins, alignment quality may be poor owing to differences in protein length, domain shuffling or circular permutations. Because multi-domain proteins are ubiquitous in nature, alignment-free tools that overcome these shortcomings of alignment-based comparison are required. Further, existing tools classify proteins using only domain-level information and hence miss the information encoded in the tethered regions or accessory domains. Our method, in contrast, takes into account the full-length sequence of a protein, consolidating the complete sequence information to understand a given protein better.
Our web server, CLAP (Classification of Proteins), is an alignment-free tool for the automatic classification of protein sequences. It uses a pattern-matching algorithm that assigns local matching scores (LMS) to residues that are part of the patterns matched between the two sequences being compared. CLAP works on full-length sequences and does not require prior domain definitions.
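The idea of scoring residues by shared patterns, without computing an alignment, can be illustrated as follows. This is a simplified stand-in, not CLAP's actual algorithm: a 'pattern' is assumed to be an exact shared k-mer (k = 3 by default), each residue's score is assumed to be the number of shared k-mers covering it, and the hypothetical `similarity` function is simply the fraction of residues covered.

```python
from __future__ import annotations


def local_matching_scores(seq_a: str, seq_b: str, k: int = 3) -> tuple[list[int], list[int]]:
    """Per-residue scores: how many shared k-mers (hypothetical 'patterns')
    cover each residue. A k-mer counts as shared if it occurs anywhere in
    the other sequence, so pattern order and position do not matter."""
    kmers_a = {seq_a[i:i + k] for i in range(len(seq_a) - k + 1)}
    kmers_b = {seq_b[i:i + k] for i in range(len(seq_b) - k + 1)}
    shared = kmers_a & kmers_b

    def score(seq: str) -> list[int]:
        lms = [0] * len(seq)
        for i in range(len(seq) - k + 1):
            if seq[i:i + k] in shared:
                for j in range(i, i + k):
                    lms[j] += 1
        return lms

    return score(seq_a), score(seq_b)


def similarity(seq_a: str, seq_b: str, k: int = 3) -> float:
    """Fraction of residues, over both sequences, covered by a shared k-mer."""
    lms_a, lms_b = local_matching_scores(seq_a, seq_b, k)
    total = len(seq_a) + len(seq_b)
    if total == 0:
        return 0.0
    covered = sum(1 for s in lms_a + lms_b if s > 0)
    return covered / total


# Two "domains" in swapped order still score as fully similar.
print(similarity("ACDEFGHIKLMNPQRSTV", "LMNPQRSTVACDEFGHIK"))  # 1.0
```

Because shared patterns are counted regardless of their order, swapping two segments leaves the score unchanged, which is the property that makes alignment-free comparison attractive for multi-domain proteins with shuffled domains.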
Pilot studies previously undertaken on protein kinases and immunoglobulins have shown that CLAP yields clusters with high functional and domain-architectural similarity. Moreover, partitioning at a statistically determined cut-off yielded clusters that agreed with the subfamily-level classification of the corresponding domain family.
CLAP is a useful protein-clustering tool, independent of domain assignment, domain order, sequence length and domain diversity. Our method can be used for any set of protein sequences, yielding functionally relevant clusters with high domain architectural homogeneity. The CLAP web server is freely available for academic use at http://nslab.mbu.iisc.ernet.in/clap/.
Alignment-free comparison; Domain architectures; Multi-domain proteins; Protein classification
NrichD (http://proline.biochem.iisc.ernet.in/NRICHD/) is a database of computationally designed protein-like sequences which, when augmented into natural sequence databases, allow searches to perform hops in protein sequence space and thereby assist in the detection of remote relationships. Establishing protein relationships in the absence of structural evidence or of natural ‘intermediately related sequences’ is a challenging task. Recently, we demonstrated that the computational design of artificial intermediary sequences/linkers is an effective approach to fill naturally occurring voids in protein sequence space. Through a large-scale assessment, we showed that such sequences can be plugged into commonly employed search databases to improve the performance of routinely used sequence search methods in detecting remote relationships. Since such data sets are expected to be employed to establish protein relationships, two databases that already capture these relationships at the structural and functional domain level, namely SCOP and Pfam, have been ‘enriched’ with these artificial intermediary sequences. The NrichD database currently contains 3 611 010 artificial sequences generated between 27 882 pairs of families from 374 SCOP folds. The data sets are freely available for download. Additional features include the design of artificial sequences between any two protein families of interest to the user.
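Why designed intermediates help can be pictured as graph reachability: each sequence is a node, and an edge means that a search with one sequence detects the other. The sketch below is a toy model with hypothetical sequence identifiers (`query_A`, `seq_A2`, `art_AB`, `seq_B`); it ignores scoring, E-values and profile building, and only shows how adding one artificial sequence can create a path between two families that natural sequences leave disconnected.

```python
from __future__ import annotations

from collections import deque


def hops_to(start: str, target: str, hits: dict[str, set[str]]) -> int | None:
    """Breadth-first search over a 'who detects whom' graph.

    hits[x] is the set of sequences detected when x is used as a query.
    Returns the minimum number of search hops from start to target,
    or None if target cannot be reached.
    """
    queue = deque([(start, 0)])
    seen = {start}
    while queue:
        node, hops = queue.popleft()
        if node == target:
            return hops
        for nxt in hits.get(node, set()):
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, hops + 1))
    return None


# Hypothetical natural database: family A sequences only find each other.
natural = {"query_A": {"seq_A2"}, "seq_A2": {"query_A"}}
# The same database enriched with a designed intermediate between A and B.
enriched = {**natural, "query_A": {"seq_A2", "art_AB"}, "art_AB": {"seq_B"}}

print(hops_to("query_A", "seq_B", natural))   # None
print(hops_to("query_A", "seq_B", enriched))  # 2
```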
The highly modular nature of protein kinases gives rise to diverse functional roles, mediated by evolutionary events such as domain recombination and the insertion and deletion of domains. Usually, the domain architecture of a kinase is related to the subfamily to which its catalytic domain belongs. However, outlier kinases with unusual domain architectures expand the functional space of the protein kinase family. For example, Src kinases are made up of SH2 and SH3 domains in addition to the kinase catalytic domain. A kinase that lacks these two domains but retains Src-like sequence characteristics within its catalytic domain is an outlier that is likely to be regulated differently from classical Src kinases. This study defines two types of outlier kinases, hybrids and rogues, depending on the nature of domain recombination. Hybrid kinases are those whose catalytic domain belongs to one kinase subfamily but whose domain architecture is typical of another subfamily. Rogue kinases are those whose catalytic domain is characteristic of one subfamily but whose domain architecture is typical of neither that subfamily nor any other kinase subfamily. This report provides a consolidated set of such hybrid and rogue kinases gleaned from six eukaryotic genomes (S. cerevisiae, D. melanogaster, C. elegans, M. musculus, T. rubripes and H. sapiens) and discusses their functions. The existence of such kinases calls for revisiting the classification scheme of the protein kinase family using full-length sequences, in addition to the classical classification based solely on kinase catalytic domain sequences. The study of these kinases also provides insight into engineering signalling pathways for a desired output. Lastly, the identification of hybrids and rogues in pathogenic protozoa such as P. falciparum sheds light on possible strategies in host–pathogen interactions.
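The hybrid and rogue definitions above can be written as a simple decision rule. In this sketch the reference table of typical architectures is a hypothetical input (ordered, N- to C-terminal tuples of domain names per subfamily); only the Src architecture (SH3, SH2, kinase domain) reflects the text, and `SubfamilyX` is a placeholder.

```python
from __future__ import annotations


def classify_kinase(catalytic_subfamily: str,
                    architecture: tuple[str, ...],
                    typical: dict[str, set[tuple[str, ...]]]) -> str:
    """Label a kinase 'typical', 'hybrid' or 'rogue'.

    typical maps each subfamily to the domain architectures (ordered,
    N- to C-terminal) considered characteristic of it.
    """
    if architecture in typical.get(catalytic_subfamily, set()):
        return "typical"
    # Architecture characteristic of a different subfamily: hybrid.
    for subfamily, archs in typical.items():
        if subfamily != catalytic_subfamily and architecture in archs:
            return "hybrid"
    # Architecture characteristic of no known subfamily: rogue.
    return "rogue"


# Hypothetical reference table; SubfamilyX is a placeholder.
typical = {
    "Src": {("SH3", "SH2", "Kinase")},
    "SubfamilyX": {("PH", "Kinase")},
}
print(classify_kinase("Src", ("PH", "Kinase"), typical))  # hybrid
print(classify_kinase("Src", ("Kinase",), typical))       # rogue
```

Note that the outcome depends entirely on how complete the reference table is: an architecture that looks rogue against a small table may turn out to be hybrid once more subfamilies are described.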
Hepatitis C virus (HCV) is a major causative agent of end-stage liver disease. Advances in anti-HCV treatment strategies over the last decade have dramatically increased the viral clearance rate. However, several limitations remain, creating a great need for novel, safe and selective drugs against HCV infection. Towards this objective, we explored highly potent and selective small-molecule inhibitors, the ellagitannins, from the crude extract of pomegranate (Punica granatum) fruit peel. The pure compounds punicalagin, punicalin and ellagic acid isolated from the extract specifically blocked HCV NS3/4A protease activity in vitro. Structural analysis using a computational approach also showed that the ligand molecules interact with the catalytic and substrate-binding residues of the NS3/4A protease, leading to inhibition of enzyme activity. Further, punicalagin and punicalin significantly reduced HCV replication in a cell culture system. More importantly, these compounds were well tolerated ex vivo, and a ‘no observed adverse effect level’ (NOAEL) was established up to an acute dose of 5000 mg/kg in BALB/c mice. Additionally, a pharmacokinetic study showed that the compounds are bioavailable. Taken together, our study provides a proof of concept for the potential use of these antiviral, non-toxic ellagitannins from pomegranate in the prevention and control of HCV-induced complications.
We hypothesized that the AAV2 vector is targeted for destruction in the cytoplasm by the host cellular kinase/ubiquitination/proteasomal machinery, and that modifying the targets of this machinery on the AAV2 capsid may improve its transduction efficiency. In vitro analysis with pharmacological inhibitors of cellular serine/threonine kinases (protein kinase A, protein kinase C, casein kinase II) showed an increase (20–90%) in AAV2-mediated gene expression. The three-dimensional structure of the AAV2 capsid was then analyzed to predict the sites of ubiquitination and phosphorylation. Three phosphodegrons, which are phosphorylation sites recognized as degradation signals by ubiquitin ligases, were identified. Mutation targets comprising eight serine (S), seven threonine (T) and nine lysine (K) residues were selected in and around the phosphodegrons on the basis of their solvent accessibility, overlap with the receptor binding regions, overlap with the interaction interfaces of capsid proteins, and evolutionary conservation across AAV serotypes. AAV2-EGFP vectors with the wild-type (WT) capsid or mutant capsids (15 S/T→alanine [A] or 9 K→arginine [R] single mutants, or 2 double K→R mutants) were then evaluated in vitro. The transduction efficiencies of 11 S/T→A and 7 K→R vectors were significantly higher (∼63–90%) than those of the AAV2-WT vectors (∼30–40%). Further, hepatic gene transfer of these mutant vectors in vivo resulted in higher vector copy numbers (up to 4.9-fold) and transgene expression (up to 14-fold) than observed with the AAV2-WT vector. One of the mutant vectors, S489A, generated ∼8-fold fewer antibodies that could be cross-neutralized by AAV2-WT. This study thus demonstrates the feasibility of using these novel AAV2 capsid mutant vectors in hepatic gene therapy.
Gabriel and colleagues examine the in vitro and in vivo efficacy of novel AAV2 vectors, which are modified at critical serine/threonine/lysine residues of the vector capsid. In vitro, they find that the transduction efficiencies of 11 S/T → A and 7 K → R vectors are significantly higher than the AAV2-wild type (WT) vectors. In vivo, they find that hepatic gene transfer of these mutant vectors results in higher vector copy numbers (up to 4.9-fold) and transgene expression (up to 14-fold) than observed from the AAV2-WT vector.
Recombinant adeno-associated virus vectors based on serotype 8 (AAV8) have shown significant promise for liver-directed gene therapy. However, to overcome the vector dose-dependent immunotoxicity seen with AAV8 vectors, it is important to develop better AAV8 vectors that provide enhanced gene expression at significantly lower vector doses. Since AAV vectors are known to be targeted for destruction in the cytoplasm by the host cellular kinase/ubiquitination/proteasomal machinery during intracellular trafficking, we modified specific serine/threonine kinase or ubiquitination targets on the AAV8 capsid to augment its transduction efficiency. Point mutations at specific serine (S)/threonine (T)/lysine (K) residues were introduced into the AAV8 capsid at positions equivalent to those of the effective AAV2 mutants generated successfully earlier. Extensive structural analysis was subsequently carried out to evaluate the structural equivalence between the two serotypes. scAAV8 vectors with the wild-type (WT) capsid and each of the S/T→alanine (A) or K→arginine (R) mutant capsids were evaluated for their liver transduction efficiency in C57BL/6 mice in vivo. Two of the AAV8 S→A mutants (S279A and S671A), and a K137R mutant vector, yielded significantly higher enhanced green fluorescent protein (EGFP) transcript levels (∼9- to 46-fold) in the liver compared with animals that received WT-AAV8 vectors alone. The best-performing AAV8 mutant (K137R) vector also showed significantly reduced ubiquitination of the viral capsid, reduced activation of markers of the innate immune response, and a concomitant two-fold reduction in neutralizing antibody formation in comparison with WT-AAV8 vectors. Vector biodistribution studies revealed that the K137R mutant had significantly higher and preferential transduction of the liver (106 vs. 7.7 vector copies/mouse diploid genome) compared with WT-AAV8 vectors.
To further study the utility of the K137R-AAV8 mutant in therapeutic gene transfer, we delivered human coagulation factor IX (h.FIX) under the control of liver-specific promoters (LP1 or hAAT) into C57BL/6 mice. Circulating levels of h.FIX:Ag were higher in all the K137R-AAV8-treated groups for up to 8 weeks after hepatic gene transfer. These studies demonstrate the feasibility of using this novel AAV8 vector for potential gene therapy of hemophilia B.
Sen and colleagues generated AAV8 capsid point mutants by replacing specific serine/threonine kinase or ubiquitination target residues. Two of the mutants yielded significantly higher transgene expression over AAV8 when injected into mice, and the best performing vector also exhibited significantly reduced capsid ubiquitination, innate immune response activation, and neutralizing antibody formation.
The Msh4–Msh5 protein complex in eukaryotes is involved in stabilizing Holliday junctions and their precursors to facilitate crossing over during meiosis I. These functions of the Msh4–Msh5 complex are essential for proper chromosome segregation during the first meiotic division. The Msh4/5 proteins are homologous to the bacterial mismatch repair protein MutS and to other MutS homologs (Msh2, Msh3, Msh6). Saccharomyces cerevisiae msh4/5 point mutants that show a two-fold reduction in crossing over compared with wild type, without affecting chromosome segregation, were identified recently. Three distinct classes of msh4/5 point mutations could be sorted based on their meiotic phenotypes: msh4/5 mutations with a) crossover and viability defects similar to msh4/5 null mutants; b) intermediate defects in crossing over and viability; and c) defects only in crossing over. The absence of a crystal structure for the Msh4–Msh5 complex has hindered an understanding of the structural aspects of Msh4–Msh5 function, as well as a molecular explanation for the meiotic defects observed in msh4/5 mutants. To address this problem, we generated a structural model of the S. cerevisiae Msh4–Msh5 complex using homology modeling. Further, structural analysis combined with evolutionary information was used to predict sites with potentially critical roles in Msh4–Msh5 complex formation and DNA binding, and to explain asymmetry within the Msh4–Msh5 complex. We also provide a structural rationale for the meiotic defects observed in the msh4/5 point mutants. The mutations are likely to affect the stability of the Msh4/5 proteins and/or their interactions with DNA. The Msh4–Msh5 model will facilitate the design and interpretation of new mutational data, as well as structural studies of this important complex involved in meiotic chromosome segregation.
We highlight an unrecognized physiological role for the Greek key motif, an evolutionarily conserved super-secondary structural topology of the βγ-crystallins. These proteins constitute the bulk of the human eye lens, packed at very high concentrations in a compact, globular, short-range order that generates transparency. Congenital cataract (affecting 400,000 newborns yearly worldwide), associated with 54 mutations in βγ-crystallins, occurs in two major phenotypes: nuclear cataract, which blocks the central visual axis, hampers the development of the growing eye and demands the earliest intervention; and the milder, peripheral progressive cataract, where surgery can wait. To understand this phenotypic dichotomy at the molecular level, we studied the structural and aggregation features of representative mutations.
Wild type and several representative mutant proteins were cloned, expressed and purified and their secondary and tertiary structural details, as well as structural stability, were compared in solution, using spectroscopy. Their tendencies to aggregate in vitro and in cellulo were also compared. In addition, we analyzed their structural differences by molecular modeling in silico.
Based on their properties, the mutants fall into two classes. Mutants A36P, L45PL54P, R140X and G165fs display lowered solubility and structural stability, expose several buried residues to the surface, aggregate in vitro and in cellulo, and disturb or distort the Greek key motif; these mutants are associated with nuclear cataract. In contrast, mutants P24T and R77S, associated with peripheral cataract, behave much like the wild-type molecule and do not affect the Greek key topology.
When a mutation distorts even one of the four Greek key motifs, the protein readily self-aggregates and precipitates, consistent with the phenotype of nuclear cataract, while mutations not affecting the motif display ‘native state aggregation’, leading to peripheral cataract, thus offering a protein structural rationale for the cataract phenotypic dichotomy “distort motif, lose central vision”.
Protein structure alignment is a crucial step in protein structure–function analysis. Despite advances in protein structure alignment algorithms, some local conformationally similar regions are mislabeled as structurally variable regions (SVRs). These regions are not well superimposed because of differences in their spatial orientations. The Database of Structural Alignments (DoSA) addresses this gap in identifying local structural similarities obscured in global protein structural alignments by realigning SVRs using an algorithm based on protein blocks. Protein blocks are a structural alphabet that abstracts protein structures into 16 unique local structural motifs. DoSA provides unique information about 159 780 conformationally similar and 56 140 conformationally dissimilar SVRs in 74 705 pairwise structural alignments of homologous proteins. The information on conformationally similar and dissimilar SVRs can be helpful in modeling loop regions. It is also conceivable that conformationally similar SVRs with conserved residues contribute to the functional integrity of homologues, so identifying such SVRs could help in understanding the structural basis of protein function.
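Realigning an SVR on its protein-block (PB) string can be sketched with a standard global (Needleman–Wunsch) alignment. As an assumption, a plain match/mismatch score stands in for the PB substitution matrix that a real PB-based method would use, and the short PB strings below are made up; the 16 PBs are conventionally labelled with the letters a to p.

```python
def align_pb(s1: str, s2: str, match: int = 2, mismatch: int = -1, gap: int = -2):
    """Needleman-Wunsch global alignment of two protein-block strings.

    Returns (score, aligned_s1, aligned_s2); '-' marks a gap.
    """
    def sub(a: str, b: str) -> int:
        # Assumed identity scoring in place of a PB substitution matrix.
        return match if a == b else mismatch

    n, m = len(s1), len(s2)
    # dp[i][j] = best score for aligning s1[:i] with s2[:j]
    dp = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        dp[i][0] = i * gap
    for j in range(1, m + 1):
        dp[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            dp[i][j] = max(dp[i - 1][j - 1] + sub(s1[i - 1], s2[j - 1]),
                           dp[i - 1][j] + gap,
                           dp[i][j - 1] + gap)

    # Traceback from the bottom-right cell, preferring diagonal moves.
    out1, out2 = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and dp[i][j] == dp[i - 1][j - 1] + sub(s1[i - 1], s2[j - 1]):
            out1.append(s1[i - 1])
            out2.append(s2[j - 1])
            i, j = i - 1, j - 1
        elif i > 0 and dp[i][j] == dp[i - 1][j] + gap:
            out1.append(s1[i - 1])
            out2.append("-")
            i -= 1
        else:
            out1.append("-")
            out2.append(s2[j - 1])
            j -= 1
    return dp[n][m], "".join(reversed(out1)), "".join(reversed(out2))


print(align_pb("mdm", "mm"))  # (2, 'mdm', 'm-m')
```

Comparing regions through their PB strings rather than their superimposed coordinates is what lets two loops with the same local conformation but different spatial orientations be recognized as similar.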
While phosphotyrosine modification is an established regulatory mechanism in eukaryotes, it is less well characterized in bacteria owing to its low prevalence. To gain insight into the extent and biological importance of tyrosine phosphorylation in Escherichia coli, we used immunoaffinity-based phosphotyrosine peptide enrichment combined with high-resolution mass spectrometry to comprehensively identify tyrosine-phosphorylated proteins and accurately map phosphotyrosine sites. We identified a total of 512 unique phosphotyrosine sites on 342 proteins in E. coli K12 and the human pathogen enterohemorrhagic E. coli (EHEC) O157:H7, representing the largest phosphotyrosine proteome reported to date in bacteria. This large number of tyrosine phosphorylation sites allowed us to define five phosphotyrosine site motifs. Tyrosine-phosphorylated proteins belong to various functional classes such as metabolism, gene expression and virulence. We demonstrate for the first time that proteins of a type III secretion system (T3SS), required for the attaching and effacing (A/E) lesion phenotype characteristic of intestinal colonization by certain EHEC strains, are tyrosine-phosphorylated by bacterial kinases. Yet, A/E lesion and metabolic phenotypes were unaffected by mutation of the two currently known tyrosine kinases, Etk and Wzc. Substantial residual tyrosine phosphorylation in an etk wzc double mutant strongly indicated the presence of hitherto unknown tyrosine kinases in E. coli. We assessed the functional importance of tyrosine phosphorylation and demonstrate that the phosphorylated tyrosine residue of the regulator SspA positively affects expression and secretion of T3SS proteins and formation of A/E lesions. Altogether, our study reveals that tyrosine phosphorylation in bacteria is more prevalent than previously recognized, and suggests the involvement of phosphotyrosine-mediated signaling in a broad range of cellular functions and virulence.
While phosphotyrosine modification is established in eukaryotic cell signaling, it is less well characterized in bacteria. Although deletion of bacterial tyrosine kinases is known to affect various cellular functions and the virulence of bacterial pathogens, few phosphotyrosine proteins are currently known. To gain insight into the extent and biological function of tyrosine phosphorylation in E. coli, we carried out in-depth phosphotyrosine protein profiling using a mass spectrometry-based proteomics approach. Our study of E. coli K12 and the human pathogen enterohemorrhagic E. coli (EHEC) O157:H7, a common cause of food-borne outbreaks of diarrhea, hemorrhagic colitis and hemolytic uremic syndrome, reveals that tyrosine phosphorylation is far more prevalent than previously recognized. Target proteins are involved in a broad range of cellular functions and virulence. Proteins of the type III secretion system (T3SS), required for the attaching and effacing (A/E) lesion phenotype characteristic of intestinal colonization by EHEC, are tyrosine-phosphorylated. The expression of these T3SS proteins and A/E lesion formation are affected by a phosphorylated tyrosine residue on the regulator SspA. Our data also indicate the presence of hitherto unknown E. coli tyrosine kinases. Overall, tyrosine phosphorylation appears to be involved in controlling core cellular processes and virulence in bacteria.
The presence of energetically less favourable cis peptides in protein structures has been observed to be strongly associated with structural integrity and function. Interconversion between the cis and trans conformations also plays an important role in folding. In this study, we analyse the extent to which cis peptides are conserved among similar folds, looking at both the amino acid preferences and the local structural changes associated with such variations.
Nearly 34% of the Xaa-Pro cis bonds are not conserved in structural relatives; proline also has a high tendency to be replaced by another amino acid in the trans conformer. At both positions flanking the peptide bond, glycine has a higher tendency to lose the cis conformation. For β-turns of types VIb and IV, the cis conformation is not conserved in similar structures in more than 30% of cases. A different view, using a Protein Block-based description of backbone conformation, suggests that many of the local conformational changes differ markedly from the general local structural variations observed among structurally similar proteins.
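Detecting cis peptides rests on the ω dihedral of each peptide bond, conventionally measured over CA(i), C(i), N(i+1) and CA(i+1). A minimal pure-Python sketch follows, assuming a |ω| < 30° cutoff for cis (a common convention, not necessarily the one used here) and a precomputed residue alignment between two structural relatives given as index pairs; `unconserved_cis` is a hypothetical helper.

```python
import math

def _sub(a, b):
    return tuple(x - y for x, y in zip(a, b))

def _dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def dihedral(p0, p1, p2, p3):
    """Signed dihedral angle in degrees for four 3-D points."""
    b0 = _sub(p0, p1)
    b1 = _sub(p2, p1)
    b2 = _sub(p3, p2)
    length = math.sqrt(_dot(b1, b1))
    b1 = tuple(c / length for c in b1)
    # Project the outer bonds onto the plane perpendicular to the central bond.
    v = _sub(b0, tuple(_dot(b0, b1) * c for c in b1))
    w = _sub(b2, tuple(_dot(b2, b1) * c for c in b1))
    x = _dot(v, w)
    y = _dot(_cross(b1, v), w)
    return math.degrees(math.atan2(y, x))

def is_cis(omega, cutoff=30.0):
    """Treat a peptide bond as cis when |omega| (degrees) is below the cutoff."""
    return abs(omega) < cutoff

def unconserved_cis(omegas_a, omegas_b, alignment):
    """Indices of cis bonds in protein A whose aligned bond in B is not cis."""
    return [i for i, j in alignment
            if is_cis(omegas_a[i]) and not is_cis(omegas_b[j])]

# Toy omega values (degrees) for three aligned peptide bonds.
print(unconserved_cis([180.0, 5.0, -175.0], [179.0, 170.0, 178.0],
                      [(0, 0), (1, 1), (2, 2)]))  # [1]
```

The fraction of non-conserved cis bonds is then `len(unconserved_cis(...))` divided by the number of cis bonds in A.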
Changes between the cis and trans conformations are found to be associated with the evolution of new functions, facilitated by local structural changes. This is most frequent in enzymes, where new catalytic activity emerges with local changes in the active site. Cis-trans changes are also seen to facilitate inter-domain and inter-protein interactions. As in folding, cis-trans conversions appear to have acted as an important driving factor in evolution.
folds; cis peptides; omega dihedral; cis-trans isomerization; structural alignment; structural alphabet; Protein Blocks; Protein Data Bank
Developing sensitive sequence search procedures for detecting distant relationships between proteins at the superfamily/fold level is still a major challenge. The intermediate sequence search approach is the most frequently employed way of identifying remote homologues effectively. In this study, serine proteases of the prolyl oligopeptidase, rhomboid and subtilisin families were examined using plant serine proteases from two genomes, A. thaliana and O. sativa, as queries, together with 13 other families of unrelated folds, to identify distant homologues that could not be detected using PSI-BLAST.
We propose starting with multiple queries of classical serine protease members to identify remote homologues in families, using a rigorous approach such as Cascade PSI-BLAST. We found that classical sequence-based approaches, such as PSI-BLAST, showed very low sequence coverage in identifying plant serine proteases. The algorithm was applied to an enriched sequence database of homologous domains, and we obtained an overall average coverage of 88% at the family level and 77% at the superfamily or fold level, with a specificity of ∼100% and a Matthews correlation coefficient of 0.91. A similar approach was also applied to 13 other protein families representing every structural class in the SCOP database. Further investigation with statistical tests, such as jackknifing, helped us better understand the influence of neighbouring protein families.
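Coverage, specificity and the Matthews correlation coefficient are standard confusion-matrix summaries. A minimal sketch, assuming coverage means the fraction of true family members recovered (sensitivity) and using made-up counts; the function names are illustrative.

```python
import math

def coverage(tp, fn):
    """Fraction of true family members recovered (sensitivity / recall)."""
    return tp / (tp + fn)

def specificity(tn, fp):
    """Fraction of non-members correctly left out."""
    return tn / (tn + fp)

def matthews_cc(tp, fp, tn, fn):
    """Matthews correlation coefficient; 0.0 when any marginal total is zero."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0

# Hypothetical counts: 88 of 100 members found, no false positives among 900 non-members.
tp, fn, fp, tn = 88, 12, 0, 900
print(coverage(tp, fn), specificity(tn, fp))  # 0.88 1.0
print(round(matthews_cc(tp, fp, tn, fn), 3))
```

Unlike coverage or specificity alone, the MCC uses all four cells, so it stays informative when true members are a small minority of the database.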
Our study suggests that using multiple queries of a family for Cascade PSI-BLAST searches is useful for predicting distant relationships effectively, even at the superfamily level. We propose a generalized strategy to cover all the distant members of a particular family using multiple query sequences. Our findings reveal that the prior choice of query sequences and the presence of neighbouring families can be important for covering the search space effectively in minimal computational time. This study also provides an understanding of the ‘bridging’ role of related families.
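The multiple-query cascade can be pictured as a breadth-first expansion in which every new hit is itself used as a query. A minimal sketch, assuming a hypothetical `search` function that stands in for one PSI-BLAST run and returns hit identifiers; E-value thresholds, profile construction and database details are deliberately left out, and the toy hit graph is invented.

```python
from collections import deque

def cascade_search(queries, search, max_rounds=3):
    """Expand a set of starting queries by re-using every new hit as a query.

    `search` is a hypothetical callable (query -> iterable of hit ids) standing in
    for a single PSI-BLAST run; `max_rounds` caps how many hops a hit may be from
    an original query.
    """
    found = set(queries)
    frontier = deque((q, 0) for q in queries)
    while frontier:
        query, depth = frontier.popleft()
        if depth == max_rounds:
            continue  # do not search from hits at the maximum cascade depth
        for hit in search(query):
            if hit not in found:
                found.add(hit)
                frontier.append((hit, depth + 1))
    return found

# Toy "hit graph": q1 and q2 are two queries from the same family.
neighbours = {"q1": ["a"], "a": ["b"], "b": ["c"], "q2": ["d"]}

def search(seq_id):
    return neighbours.get(seq_id, [])

print(sorted(cascade_search(["q1"], search, max_rounds=1)))        # ['a', 'q1']
print(sorted(cascade_search(["q1", "q2"], search, max_rounds=1)))  # ['a', 'd', 'q1', 'q2']
```

Adding a second query reaches members (here `d`) that no number of rounds from `q1` alone would find, which is the motivation for starting from several family members.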
The correlation of genetic distances between pairs of protein sequence alignments has been used to infer protein-protein interactions. It has been suggested that these correlations reflect a signal of co-evolution between interacting proteins. However, although mutations in different proteins associated with maintaining an interaction clearly occur (particularly in binding interfaces and their neighbourhoods), many other factors contribute to correlated rates of sequence evolution. Proteins in the same genome are usually linked by shared evolutionary history, so topological similarities in their phylogenetic trees would be expected whether they interact or not. For this reason, the underlying species tree is often corrected for. Moreover, factors such as expression level are known to affect evolutionary rates. However, it has been argued that the correlated rates of evolution used to predict protein interactions explicitly include shared evolutionary history; here we test this hypothesis.
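The correlation of genetic distances described here (often called the mirrortree approach) compares the pairwise distances of two protein families over the same set of species. A minimal sketch, assuming nested-dict distance matrices keyed by hypothetical species names and a plain Pearson correlation with no species-tree correction.

```python
import math
from itertools import combinations

def from_pairs(pairs):
    """Build a symmetric nested-dict distance matrix from {(taxon_a, taxon_b): d}."""
    matrix = {}
    for (a, b), d in pairs.items():
        matrix.setdefault(a, {})[b] = d
        matrix.setdefault(b, {})[a] = d
    return matrix

def pearson(xs, ys):
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

def mirror_correlation(dist_a, dist_b):
    """Pearson r between the upper triangles of two distance matrices,
    restricted to species present in both."""
    taxa = sorted(set(dist_a) & set(dist_b))
    pairs = list(combinations(taxa, 2))
    return pearson([dist_a[x][y] for x, y in pairs],
                   [dist_b[x][y] for x, y in pairs])

# Hypothetical distances; protein B evolves uniformly twice as fast as protein A.
dists = {("s1", "s2"): 0.10, ("s1", "s3"): 0.30, ("s1", "s4"): 0.40,
         ("s2", "s3"): 0.35, ("s2", "s4"): 0.45, ("s3", "s4"): 0.20}
protein_a = from_pairs(dists)
protein_b = from_pairs({k: 2 * v for k, v in dists.items()})
print(round(mirror_correlation(protein_a, protein_b), 6))  # 1.0
```

The toy case shows why the correlation alone cannot separate co-evolution from shared history: two non-interacting proteins on the same species tree with proportional rates already give r = 1.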
To identify the evolutionary mechanisms giving rise to the correlations between interacting proteins, we use phylogenetic methods to distinguish similarities in tree topologies from similarities in genetic distances. We use a range of datasets of interacting and non-interacting proteins from Saccharomyces cerevisiae. We find that the signal of correlated evolution between interacting proteins is predominantly a result of shared evolutionary rates, rather than similarities in tree topology, independent of evolutionary divergence.
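One standard way to compare tree topologies independently of branch lengths is the Robinson-Foulds distance: the number of bipartitions found in one tree but not the other. It is shown here only to illustrate the topology half of the comparison; the abstract does not say which topology measure the study used. Trees are assumed to be nested tuples of leaf names and are treated as unrooted.

```python
def leaf_set(tree):
    """All leaf names below a node (a leaf is a str, an internal node a tuple)."""
    if isinstance(tree, str):
        return frozenset([tree])
    return frozenset().union(*(leaf_set(child) for child in tree))

def splits(tree):
    """Non-trivial bipartitions of an unrooted tree, each stored as one canonical side."""
    taxa = leaf_set(tree)
    ref = min(taxa)  # canonical side = the side that does not contain `ref`
    found = set()

    def walk(node):
        if isinstance(node, str):
            return frozenset([node])
        clade = frozenset().union(*(walk(child) for child in node))
        side = taxa - clade if ref in clade else clade
        if 1 < len(side) < len(taxa) - 1:  # skip trivial (single-leaf) splits
            found.add(side)
        return clade

    walk(tree)
    return found

def robinson_foulds(t1, t2):
    """Number of bipartitions present in exactly one of the two trees."""
    if leaf_set(t1) != leaf_set(t2):
        raise ValueError("trees must share the same leaf set")
    return len(splits(t1) ^ splits(t2))

t1 = ((("A", "B"), "C"), ("D", "E"))
t2 = ((("A", "C"), "B"), ("D", "E"))
print(robinson_foulds(t1, t2))  # 2
```

Because branch lengths never enter the calculation, two trees with identical topology score 0 however different their evolutionary rates are.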
Since interacting proteins do not have tree topologies that are more similar than those of the control group of non-interacting proteins, coevolution is likely to contribute little, if anything, to the observed correlations.
Co-evolution; Correlated evolution; Protein evolution; Phylogenetic; Protein-protein complexes; Protein-protein interactions
The interactions of non-structural protein 5A (NS5A) of Hepatitis C virus (HCV) with two human kinases, casein kinase 1α (ck1α) and protein kinase R (PKR), have different functional implications: regulation of viral replication and evasion of the interferon-induced immune response, respectively. Understanding the structural and molecular basis of the interactions of the viral protein with these two human kinases can be useful in developing strategies for treating HCV infection.
Serine 232 of NS5A is known to be phosphorylated by human ck1α. A structural model of an NS5A peptide containing the phosphoacceptor residue Serine 232 bound to ck1α has been generated using known 3-D structures of kinase-peptide complexes. The substrate-interacting residues in ck1α have been identified from the model and are found to be well conserved in the ck1 family. The ck1α-substrate peptide complex has also been used to understand the structural basis of the association between ck1α and its other viral-stress-induced substrate, the transactivation domain of the tumour suppressor p53, for which a crystal structure is available.
The interaction of NS5A with the other human kinase, PKR, is primarily genotype specific. NS5A from genotype 1b has been shown to interact with and inhibit PKR, whereas NS5A from genotypes 2a and 3a is unable to bind and inhibit PKR efficiently. This is one of the main reasons for the varied response to interferon therapy in HCV patients across genotypes. Using the PKR crystal structure, sequence alignment and evolutionary trace analysis, some of the critical residues responsible for the interaction of NS5A 1b with PKR have been identified.
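Evolutionary trace analysis ranks alignment positions by how their conservation behaves across subgroups of a phylogeny. The following is a much simplified, single-partition sketch, assuming sequences that are already aligned and grouped (here by genotype) and toy sequences that are not real NS5A data. A column is invariant if identical everywhere, class-specific if each group is internally invariant but the groups are not all identical, and variable otherwise. The `trace_columns` name is hypothetical.

```python
def trace_columns(groups):
    """Label each alignment column as invariant, class-specific or variable.

    `groups` maps a group label to a list of equal-length aligned sequences.
    """
    seqs = [s for members in groups.values() for s in members]
    length = len(seqs[0])
    if any(len(s) != length for s in seqs):
        raise ValueError("sequences must be aligned to equal length")
    labels = []
    for i in range(length):
        if len({s[i] for s in seqs}) == 1:
            labels.append("invariant")
        elif all(len({s[i] for s in members}) == 1 for members in groups.values()):
            # conserved within every group, but the groups disagree
            labels.append("class-specific")
        else:
            labels.append("variable")
    return labels

toy = {"1b": ["ACDE", "ACDF"], "2a": ["AKDE", "AKDW"]}
print(trace_columns(toy))
# ['invariant', 'class-specific', 'invariant', 'variable']
```

Class-specific columns are the kind of position that would be candidates for genotype-dependent behaviour; the full method repeats this test at every level of the tree rather than for a single grouping.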
The substrate-interacting residues in ck1α have been identified using the structural model of the kinase-substrate peptide complex. The PKR-interacting NS5A 1b residues have also been predicted using the PKR crystal structure and NS5A sequence analysis, together with known experimental results. The evolutionary trace analysis also supports the experimentally demonstrated functional significance and nature of the interaction of PKR with the interferon sensitivity determining region and variable region 3 of NS5A from different genotypes. Designing inhibitors that prevent this interaction could help HCV genotype 1-infected patients respond well to interferon therapy.
Casein kinase 1α; Hepatitis C virus; Interferon therapy; Kinase-substrate complex; Non-structural protein 5A; Protein kinase R
The constant increase in the number of solved protein structures is of great help in understanding the basic principles behind protein folding and evolution. 3-D structural knowledge is valuable in designing and developing methods for the comparison, modelling and prediction of protein structures. These approaches to structure analysis can be applied directly to studying protein function and to drug design. The backbone of a protein structure favours certain local conformations, which include α-helices, β-strands and turns. Libraries of a limited number of local conformations (Structural Alphabets) were developed in the past to obtain a useful categorization of backbone conformation. Protein Blocks (PBs) form one such Structural Alphabet, giving a reasonable structure approximation of 0.42 Å. In this study, we use the PB description of local structures to analyse conformations that are preferred sites for structural variations and insertions among groups of related folds. This knowledge can be used to improve structure comparison tools that work by analysing local structure similarities. Conformational differences between homologous proteins are known to occur often in regions comprising turns and loops. Interestingly, these differences are found to have specific preferences depending on the structural class of the protein. Such class-specific preferences are seen mainly in the all-β class, with changes involving short helical conformations and hairpin turns. A test carried out on a benchmark dataset also indicates that using knowledge of the class-specific variations can improve the performance of a PB-based structure comparison approach. The preferences for indel sites also seem to be confined to a few backbone conformations involving β-turns and helix C-caps. These are mainly associated with short loops joining regular secondary structures that mediate a reversal in chain direction.
Rare β-turns of type I’ and II’ are also identified as preferred sites for insertions.
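Given two structures whose backbones have been encoded as PB strings (the 16 PBs are conventionally labelled a to p) and aligned, the substitutions and indel sites discussed above can be tallied directly. A minimal sketch, assuming an alignment with '-' gaps and, as one simple convention, recording the PB immediately preceding each gap run as the indel site; the PB strings and function names are made up.

```python
from collections import Counter

def pb_substitutions(aligned_a, aligned_b, gap="-"):
    """Count (PB in A, PB in B) pairs at aligned, ungapped positions that differ."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned PB strings must have equal length")
    return Counter((a, b) for a, b in zip(aligned_a, aligned_b)
                   if a != gap and b != gap and a != b)

def indel_sites(aligned_a, aligned_b, gap="-"):
    """Count the PB immediately before each gap run, in whichever string has the gap."""
    sites = Counter()
    for seq in (aligned_a, aligned_b):
        for i, pb in enumerate(seq):
            if pb == gap and i > 0 and seq[i - 1] != gap:  # start of a gap run
                sites[seq[i - 1]] += 1
    return sites

# Hypothetical PB alignment: B lacks two positions (n, o) present in A.
a = "mmmnopacdd"
b = "mmm--pacfd"
print(pb_substitutions(a, b))  # Counter({('d', 'f'): 1})
print(indel_sites(a, b))       # Counter({'m': 1})
```

Summed over many aligned pairs and normalised by PB frequency, such counts give the kind of preference profile described above; splitting them by SCOP class would expose class-specific patterns.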
Most homodimeric proteins have symmetric structures. Although symmetry is known to confer structural and functional advantages, asymmetric organization is also observed. Using a non-redundant dataset of 223 high-resolution crystal structures of biologically relevant homodimers, we address questions on the prevalence and significance of asymmetry. We used two measures to quantify global and interface asymmetry, and assessed the correlation of several molecular and structural parameters with asymmetry. We have identified rare cases (11/223) of biologically relevant homodimers with pronounced global asymmetry. Asymmetry serves as a means to bring about 2∶1 binding between the homodimer and another molecule; it also enables cellular signalling arising from asymmetric macromolecular ligands such as DNA. Analysis of these cases reveals two possible mechanisms by which the formation of infinite arrays is prevented. For homodimers associating via surfaces that are not topologically equivalent in their tertiary structures, ligand-dependent mechanisms are used. For stable dimers binding via large surfaces, ligand-dependent structural change regulates polymerisation/depolymerisation; for unstable dimers binding via smaller surfaces that are not evolutionarily well conserved, dimerisation occurs only in the presence of the ligand. For homodimers associating via interaction surfaces that are partly topologically equivalent in the tertiary structures, steric hindrance prevents infinite array formation. We also find that homodimers with grossly symmetric organization rarely exhibit either perfect local symmetry or high local asymmetry. Binding of small ligands at the interface does not cause any significant variation in interface asymmetry.
However, identifying biologically relevant interface asymmetry in grossly symmetric homodimers is confounded by similar small-magnitude changes caused by crystallisation artefacts. Our study provides new insights into how asymmetry is accommodated in homodimers.
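A superposition-free way to quantify global asymmetry is to compare the internal distance matrices of the two protomers (distance RMSD). The abstract does not specify its two measures, so this is only one plausible choice, assuming equivalent Cα coordinates for the two chains listed in the same residue order; the coordinates below are toy values.

```python
import math
from itertools import combinations

def drmsd(chain_a, chain_b):
    """Distance RMSD between two chains given as equal-length lists of (x, y, z).

    Compares every intra-chain Calpha-Calpha distance in A with the equivalent
    distance in B, so no superposition is needed; 0.0 means the two protomers
    have identical internal geometry.
    """
    if len(chain_a) != len(chain_b) or len(chain_a) < 2:
        raise ValueError("need two equal-length chains with at least two residues")
    squared = [(math.dist(chain_a[i], chain_a[j]) - math.dist(chain_b[i], chain_b[j])) ** 2
               for i, j in combinations(range(len(chain_a)), 2)]
    return math.sqrt(sum(squared) / len(squared))

# Toy protomers: B is A rotated 90 degrees about z and translated, so identical in shape.
chain_a = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0)]
chain_b = [(5.0, 5.0, 5.0), (5.0, 6.0, 5.0), (4.0, 5.0, 5.0)]
print(drmsd(chain_a, chain_b))  # 0.0
```

Restricting the residue list to interface residues turns the same function into a rough interface-asymmetry score; small non-zero values are where crystallisation artefacts become hard to distinguish from genuine asymmetry.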
Most signalling and regulatory proteins participate in transient protein-protein interactions during biological processes. They usually serve as key regulators of various cellular processes and are often stable in both protein-bound and unbound forms. Availability of high-resolution structures of their unbound and bound forms provides an opportunity to understand the molecular mechanisms involved. In this work, we have addressed the question “What is the nature, extent, location and functional significance of structural changes which are associated with formation of protein-protein complexes?”
A database of 76 non-redundant sets of high resolution 3-D structures of protein-protein complexes, representing diverse functions, and corresponding unbound forms, has been used in this analysis. Structural changes associated with protein-protein complexation have been investigated using structural measures and Protein Blocks description. Our study highlights that significant structural rearrangement occurs on binding at the interface as well as at regions away from the interface to form a highly specific, stable and functional complex. Notably, predominantly unaltered interfaces interact mainly with interfaces undergoing substantial structural alterations, revealing the presence of at least one structural regulatory component in every complex.
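The Protein Blocks comparison of unbound and bound forms can be split by region to separate changes at the interface from those away from it. A minimal sketch, assuming PB strings of equal length for the two forms and a set of 0-based interface positions; all inputs and the function name are hypothetical.

```python
def pb_change_by_region(unbound_pbs, bound_pbs, interface):
    """Fraction of residues whose PB differs between unbound and bound forms,
    reported separately for interface and non-interface positions."""
    if len(unbound_pbs) != len(bound_pbs):
        raise ValueError("PB strings must have equal length")
    changed = {"interface": 0, "other": 0}
    total = {"interface": 0, "other": 0}
    for i, (u, b) in enumerate(zip(unbound_pbs, bound_pbs)):
        region = "interface" if i in interface else "other"
        total[region] += 1
        changed[region] += int(u != b)
    return {r: changed[r] / total[r] if total[r] else 0.0 for r in total}

# Toy example: residues 2 and 3 form the interface.
result = pb_change_by_region("mmmmccdd", "mmnmccfd", {2, 3})
print(result["interface"])  # 0.5
```

A high "other" fraction in such a profile is what flags candidate changes away from the interface for follow-up, for example with normal mode analysis.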
Interestingly, about half of the complexes, largely comprising signalling proteins, show substantial localized structural change at surfaces away from the interface. Normal mode analysis and the available functional information on some of these complexes suggest that many of these changes are allosteric. This change is largely manifest in proteins whose interfaces are altered upon binding, implicating structural change as the possible trigger of the allosteric effect. Although large-scale studies of allostery induced by small-molecule effectors are available in the literature, this is, to our knowledge, the first study indicating the prevalence of allostery induced by protein effectors.
The enrichment of allosteric sites in signalling proteins, whose mutations commonly lead to diseases such as cancer, provides support for the usage of allosteric modulators in combating these diseases.
Transient protein-protein interactions play crucial roles in all facets of cellular physiology. Here, using known 3-D structures of transient protein-protein complexes and their corresponding uncomplexed forms, together with energy calculations, we seek to understand the roles of interfacial residues in the unbound forms. We show that there are conformationally near-invariant and evolutionarily conserved interfacial residues which are rigid and account for ∼65% of the core interface. Interestingly, some of these residues contribute significantly to stabilizing the interface structure in the uncomplexed form. Such residues have a strong energetic basis to play dual roles, stabilizing the structure of the uncomplexed form as well as that of the complex once formed, while remaining rigid throughout. This feature is evolutionarily well conserved at both the structural and sequence levels. We believe this analysis has general bearing on the prediction of interfaces and the understanding of molecular recognition.