Despite numerous attempts over many years to develop an HIV vaccine based on classical strategies, none has convincingly succeeded to date. A number of approaches are being pursued in the field, including building upon possible efficacy indicated by the recent RV144 clinical trial, which combined two HIV vaccines. Here, we argue for an approach based, in part, on understanding the HIV envelope spike and its interaction with broadly neutralizing antibodies (bnAbs) at the molecular level and using this understanding to design immunogens as possible vaccines. BnAbs can protect against virus challenge in animal models and many such antibodies have been isolated recently. We further propose that studies focused on how best to provide T cell help to B cells that produce bnAbs are crucial for optimal immunization strategies. The synthesis of rational immunogen design and immunization strategies, together with iterative improvements, offers great promise for advancing toward an HIV vaccine.
Proteins are known to be dynamic in nature, changing from one conformation to another while performing vital cellular tasks. It is important to understand these movements in order to better understand protein function. At the same time, experimental techniques provide us with only single snapshots of the whole ensemble of available conformations. Computational protein morphing provides a visualization of a protein structure transitioning from one conformation to another by producing a series of intermediate conformations.
We present a novel, efficient morphing algorithm, Morph-Pro, based on linear interpolation. We also show that, apart from visualization, morphing can be used to provide plausible intermediate structures. We test this by using the intermediate structures of a c-Jun N-terminal kinase (JNK1) conformational change in a virtual docking experiment. The structures are shown to dock with higher scores to known JNK1-binding ligands than structures solved using X-ray crystallography. This experiment demonstrates the potential applications of the intermediate structures in modeling or virtual screening efforts.
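As an illustration of the general idea of morphing by linear interpolation, the following Python sketch generates intermediate coordinate sets between two conformations. It is a minimal sketch under stated assumptions, not the Morph-Pro implementation: it assumes that both conformations list the same atoms in the same order, that they have already been superimposed, and that interpolation is done directly on Cartesian coordinates. The function name `linear_morph` and the toy coordinates are hypothetical.

```python
from typing import List, Sequence, Tuple

Coord = Tuple[float, float, float]


def linear_morph(start: Sequence[Coord], end: Sequence[Coord],
                 n_intermediates: int) -> List[List[Coord]]:
    """Return n_intermediates frames evenly spaced between start and end.

    Assumption: start and end hold the same atoms in the same order and
    are already superimposed; interpolation is plain Cartesian.
    """
    if len(start) != len(end):
        raise ValueError("both conformations need the same number of atoms")
    frames: List[List[Coord]] = []
    for k in range(1, n_intermediates + 1):
        t = k / (n_intermediates + 1)  # fraction of the way from start to end
        frame = [
            (p[0] + t * (q[0] - p[0]),
             p[1] + t * (q[1] - p[1]),
             p[2] + t * (q[2] - p[2]))
            for p, q in zip(start, end)
        ]
        frames.append(frame)
    return frames


# Toy two-atom example (hypothetical coordinates).
conf_a = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
conf_b = [(0.0, 2.0, 0.0), (1.0, 0.0, 4.0)]
midpoint = linear_morph(conf_a, conf_b, 1)[0]
```

Plain Cartesian interpolation can distort bond lengths and angles in the intermediate frames, so a practical tool would normally add some form of geometry correction; that step is outside the scope of this sketch.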
Visualization of protein conformational changes is important for characterization of protein function. Furthermore, the intermediate structures produced by our algorithm are good approximations to true structures. We believe there is great potential for these computationally predicted structures in protein-ligand docking experiments and virtual screening. The Morph-Pro web server can be accessed at http://morph-pro.bioinf.spbau.ru.
Protein morphing; Molecular docking; Virtual screening
The tad (tight adherence) locus encodes a protein translocation system that produces a novel variant of type IV pili. The pilus assembly protein TadZ (called CpaE in Caulobacter crescentus) is ubiquitous in tad loci, but is absent in other type IV pilus biogenesis systems. The crystal structure of TadZ from Eubacterium rectale (ErTadZ), in complex with ATP and Mg2+, was determined to 2.1 Å resolution. ErTadZ contains an atypical ATPase domain with a variant of a deviant Walker-A motif that retains ATP binding capacity while displaying only low intrinsic ATPase activity. The bound ATP plays an important role in dimerization of ErTadZ. The N-terminal atypical receiver domain resembles the canonical receiver domain of response regulators, but has a degenerate, stripped-down “active site”. Homology modeling of the N-terminal atypical receiver domain of CpaE indicates that it has a conserved protein-protein binding surface similar to that of the polar localization module of the social motility protein FrzS, suggesting a similar function. Our structural results also suggest that TadZ localizes to the pole through the atypical receiver domain during an early stage of pilus biogenesis, and functions as a hub for recruiting other pilus components, thus providing insights into the Tad pilus assembly process.
Type IV pili assembly; TadZ; atypical receiver domain; atypical ATPase; localization factor
MtfA of Escherichia coli (formerly YeeI) was previously identified as a regulator of the phosphoenolpyruvate (PEP)-dependent:glucose phosphotransferase system. MtfA homolog proteins are highly conserved, especially among beta- and gammaproteobacteria. We determined the crystal structures of the full-length MtfA apoenzyme from Klebsiella pneumoniae and its complex with zinc (holoenzyme) at 2.2 and 1.95 Å, respectively. MtfA contains a conserved H149E150XXH153+E212+Y205 metallopeptidase motif. The presence of zinc in the active site induces significant conformational changes in the region around Tyr205 compared to the conformation of the apoenzyme. Additionally, the zinc-bound MtfA structure is in a self-inhibitory conformation where a region that was disordered in the unliganded structure is now observed in the active site and a nonproductive state of the enzyme is formed. MtfA is related to the catalytic domain of the anthrax lethal factor and the Mop protein involved in the virulence of Vibrio cholerae, with conservation in both overall structure and in the residues around the active site. These results clearly provide support for MtfA as a prototypical zinc metallopeptidase (gluzincin clan).
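The contiguous H-E-X-X-H core of the metallopeptidase motif described above can be located in a protein sequence with a simple pattern search. The following Python sketch is illustrative only: the function name `find_hexxh` and the toy sequence are hypothetical, and the search covers only the HEXXH segment, not the more distant Glu and Tyr residues that complete the motif.

```python
import re
from typing import List, Tuple

# Lookahead so that overlapping matches are also reported.
HEXXH_PATTERN = re.compile(r"(?=(HE..H))")


def find_hexxh(sequence: str) -> List[Tuple[int, str]]:
    """Return (1-based start position, matched segment) for each HEXXH hit."""
    return [(m.start() + 1, m.group(1))
            for m in HEXXH_PATTERN.finditer(sequence.upper())]


# Hypothetical toy sequence, not taken from MtfA.
toy_sequence = "MKTAHEAAHGG"
hits = find_hexxh(toy_sequence)
```

A hit of this kind is only a first filter: the short HEXXH pattern occurs by chance in many unrelated proteins, so structural context or the additional conserved residues are needed before a metallopeptidase assignment is made.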
Evolutionary innovation in eukaryotes and especially animals is at least partially driven by genome rearrangements and the resulting emergence of proteins with new domain combinations, and thus potentially novel functionality. Given the random nature of such rearrangements, one could expect that proteins with particularly useful multidomain combinations may have been rediscovered multiple times by parallel evolution. However, existing reports suggest a minimal role of this phenomenon in the overall evolution of eukaryotic proteomes. We assembled a collection of 172 complete eukaryotic genomes that is not only the largest, but also the most phylogenetically complete set of genomes analyzed so far. By employing a maximum parsimony approach to compare repertoires of Pfam domains and their combinations, we show that independent evolution of domain combinations is significantly more prevalent than previously thought. Our results indicate that about 25% of all currently observed domain combinations have evolved multiple times. Interestingly, this percentage is even higher for sets of domain combinations in individual species, with, for instance, 70% of the domain combinations found in the human genome having evolved independently at least once in other species. We also show that previous, much lower estimates of this rate are most likely due to the small number and biased phylogenetic distribution of the genomes analyzed. The independent emergence of identical domain combinations is widespread and not limited to domains in specific functional categories. Besides data from large-scale analyses, we also present individual examples of independent domain combination evolution. The surprisingly large contribution of parallel evolution to the development of the domain combination repertoire in extant genomes has profound consequences for our understanding of the evolution of pathways and cellular processes in eukaryotes and for comparative functional genomics.
Most proteins in eukaryotes are composed of two or more domains, evolutionarily independent units with (often) their own individual functions. The specific repertoire of multidomain proteins in a given species defines the topology of pathways and networks that carry out its metabolic and regulatory processes. When proteins with new domain combinations emerge by gene fusion and fission, this directly affects the topology of cellular networks in that organism. To better understand the evolution of such networks, we analyzed a large set of eukaryotic genomes for the evolutionary history of known domain combinations. Our analysis shows that 70% of all domain combinations present in the human genome independently appeared in at least one other eukaryotic genome. Overall, over 25% of all known multidomain architectures emerged independently several times in the history of life. The difference between the global and species-specific pictures can be explained by the existence of a core set of domain combinations that keeps reemerging in different species and is accompanied by a smaller number of unique domain combinations that do not appear anywhere else.
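To make the notion of independent emergence under maximum parsimony concrete, the following Python sketch counts how many times a single domain combination is gained on a rooted binary species tree, given its presence or absence in the extant species. It is a minimal sketch under stated assumptions, not the procedure used in the study: it applies unweighted Fitch parsimony to one binary character, and it resolves ambiguous ancestral states in favor of absence at the root. The function name `count_gains`, the nested-tuple tree encoding, and the species labels are hypothetical.

```python
from typing import Set, Tuple, Union

# A tree is either a leaf name or a pair of subtrees.
Tree = Union[str, Tuple["Tree", "Tree"]]


def count_gains(tree: Tree, present: Set[str]) -> int:
    """Count 0 -> 1 changes of one presence/absence character (Fitch parsimony).

    Assumption: when the ancestral state is ambiguous, the root is taken
    to lack the combination, and each node copies its parent when it can.
    """
    def bottom_up(node):
        # Returns (possible state set, annotated children).
        if isinstance(node, str):
            return ({1} if node in present else {0}, ())
        left, right = bottom_up(node[0]), bottom_up(node[1])
        common = left[0] & right[0]
        return (common if common else left[0] | right[0], (left, right))

    def top_down(annotated, parent_state: int) -> int:
        states, children = annotated
        state = parent_state if parent_state in states else min(states)
        gains = 1 if (parent_state == 0 and state == 1) else 0
        return gains + sum(top_down(child, state) for child in children)

    root_states, root_children = bottom_up(tree)
    root_state = min(root_states)  # prefer absence when ambiguous
    return sum(top_down(child, root_state) for child in root_children)


# Hypothetical four-species tree: ((A, B), (C, D)).
toy_tree = (("A", "B"), ("C", "D"))
independent = count_gains(toy_tree, {"A", "C"})  # present in two distant leaves
single = count_gains(toy_tree, {"A", "B"})       # present in one clade only
```

In this toy tree, a combination found only in the distantly related species A and C is scored as two gains, whereas a combination shared by the sister species A and B is scored as one. An equally parsimonious scenario with an ancestral presence and two losses exists for the first case, which is why the tie-breaking rule has to be stated explicitly.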
The human nuclear factor related to kappa-B-binding protein (NFRKB) is a 1299-residue protein that is a component of the metazoan INO80 complex involved in chromatin remodeling, transcription regulation, DNA replication and DNA repair. Although full-length NFRKB is predicted to be around 65% disordered, comparative sequence analysis identified several potentially structured sections in the N-terminal region of the protein. These regions were targeted for crystallographic studies, and the structure of one of these regions, spanning residues 370–495, was determined using the JCSG high-throughput structure determination pipeline. The structure reveals a novel, mostly helical domain reminiscent of the winged-helix fold typically involved in DNA binding. However, further analysis shows that this domain does not bind DNA, suggesting it may belong to a small group of winged-helix domains involved in protein-protein interactions.
Limited or regulatory proteolysis plays a critical role in many important biological pathways such as blood coagulation, cell proliferation, and apoptosis. A better understanding of the mechanisms that control this process is required for discovering new proteolytic events and for developing inhibitors with potential therapeutic value. Two features that determine the susceptibility of peptide bonds to proteolysis are the sequence in the vicinity of the scissile bond and the structural context in which the bond is displayed. In this study, we assessed the statistical significance and predictive power of individual structural descriptors and combinations thereof for the identification of cleavage sites. The analysis was performed on a dataset of >200 proteolytic events documented in CutDB for a variety of mammalian regulatory proteases and their physiological substrates with known 3D structures. The results confirmed the significance and provided a ranking within three main categories of structural features: exposure > flexibility > local interactions. Among secondary structure elements, the largest frequency of proteolytic cleavage was confirmed for loops and a lower but significant frequency for helices. Limited proteolysis has a lower, albeit appreciable, frequency of occurrence in certain types of β-strands, which is in contrast with some previous reports. Descriptors deduced directly from the amino acid sequence displayed only marginal predictive capabilities. Homology-based structural models showed a predictive performance comparable to protein substrates with experimentally established structures. Overall, this study provided a foundation for accurate automated prediction of segments of protein structure susceptible to proteolytic processing and, potentially, other post-translational modifications.
proteolysis; proteolytic processing; limited proteolysis; regulatory proteolysis; protease; cleavage site; cleavage site prediction
The protein universe can be organized in families that group proteins sharing common ancestry. Such families display variable levels of structural and functional divergence, from homogeneous families, where all members have the same function and very similar structure, to very divergent families, where large variations in function and structure are observed. For practical purposes of structure and function prediction, it would be beneficial to identify sub-groups of proteins with highly similar structures (iso-structural) and/or functions (iso-functional) within divergent protein families. We compared three algorithms in their ability to cluster large protein families and discuss whether any of these methods could reliably identify such iso-structural or iso-functional groups. We show that clustering using profile-sequence and profile-profile comparison methods closely reproduces clusters based on similarities between 3D structures or clusters of proteins with similar biological functions. In contrast, the still commonly used sequence-based methods with fixed thresholds result in vast overestimates of structural and functional diversity in protein families. As a result, these methods also overestimate the number of protein structures that have to be determined to fully characterize the structural space of such families. The fact that one can build reliable models based on apparently distantly related templates is crucial for extracting the maximal amount of information from new sequencing projects.
Comparative modeling; Protein structure prediction; Protein function prediction; Protein Structure Initiative; Structural Genomics
The crystal structures of an unliganded and adenosine 5′-monophosphate (AMP) bound, metal-dependent phosphoesterase (YP_910028.1) from Bifidobacterium adolescentis are reported at 2.4 Å and 1.94 Å, respectively. Functional characterization of this enzyme was guided by computational analysis and then confirmed by experiment. The structure consists of a PHP (Polymerase and Histidinol Phosphatase, Pfam: PF02811) domain with a second domain (residues 105–178) inserted in the middle of the PHP sequence. The insert domain functions in binding AMP, but the precise function and substrate specificity of this domain are unknown. Initial bioinformatics analyses yielded multiple potential functional leads, with most of them suggesting DNA polymerase or DNA replication activity. Phylogenetic analysis indicated a potential DNA polymerase function that was somewhat supported by global structural comparisons identifying the closest structural match to the alpha subunit of DNA polymerase III. However, several other functional predictions, including phosphoesterase, could not be excluded. THEMATICS, a computational method for the prediction of active sites from protein 3D structures, identified potential reactive residues in YP_910028.1. Further analysis of the predicted active site and local comparison with its closest structure matches strongly suggested phosphoesterase activity, which was confirmed experimentally. Primer extension assays on both normal and mismatched DNA show neither extension nor degradation and provide evidence that YP_910028.1 has neither DNA polymerase activity nor DNA proofreading activity. These results suggest that many of the sequence neighbors previously annotated as having DNA polymerase activity may actually be misannotated.
Functional annotation; structural genomics; phosphoesterase; THEMATICS; active site prediction
Metagenomics sequencing projects have dramatically increased our knowledge of the protein universe and provided over one-half of currently known protein sequences; they have also introduced a much broader phylogenetic diversity into the protein databases. The full analysis of metagenomic datasets is only beginning, but it has already led to the discovery of thousands of new protein families, likely representing novel functions specific to given environments. At the same time, a deeper analysis of such novel families, including experimental structure determination of some representatives, suggests that most of them represent distant homologs of already characterized protein families, and thus most of the protein diversity present in the new environments is due to functional divergence of the known protein families rather than the emergence of new ones.
Toll/interleukin-1 receptor (TIR) domain-containing proteins play important roles in defense against pathogens in both animals and plants, connecting the immunity signaling pathways via a chain of specific protein–protein interactions. Among them is SARM, the only TIR domain-containing adaptor that can negatively regulate TLR signaling. By extensive phylogenetic analysis, we show here that SARM is closely related to bacterial proteins with TIR domains, suggesting that this family has a different evolutionary history from other animal TIR-containing adaptors, possibly emerging via a lateral gene transfer from bacteria to animals. We also show evidence of several similar, independent transfer events, none of which, however, survived in vertebrates. An evolutionary relationship between the animal SARM adaptor and bacterial proteins with TIR domains illustrates the possible role that bacterial TIR-containing proteins play in regulating eukaryotic immune responses and how this mechanism was possibly adapted by the eukaryotes themselves.
Toll-like receptor; host–commensal interaction; host–pathogen interaction; lateral gene transfer; commensal microflora; innate immunity
Function diversification in large protein families is a major mechanism driving expansion of cellular networks, providing organisms with new metabolic capabilities and thus adding to their evolutionary success. However, our understanding of the evolutionary mechanisms of functional diversity in such families is very limited, which, among many other reasons, is due to the lack of functionally well-characterized sets of proteins. Here, using the FGGY carbohydrate kinase family as an example, we built a confidently annotated reference set (CARS) of proteins by propagating experimentally verified functional assignments to a limited number of homologous proteins that are supported by their genomic and functional contexts. Then, we analyzed, on both the phylogenetic and the molecular levels, the evolution of different functional specificities in this family. The results show that the different functions (substrate specificities) encoded by FGGY kinases have emerged only once in the evolutionary history following an apparently simple divergent evolutionary model. At the same time, on the molecular level, one isofunctional group (L-ribulokinase, AraB) evolved at least two independent solutions that employed distinct specificity-determining residues for the recognition of the same substrate (L-ribulose). Our analysis provides a detailed model of the evolution of the FGGY kinase family. It also shows that only combined molecular and phylogenetic approaches can help reconstruct a full picture of functional diversifications in such diverse families.
The protein universe is under constant expansion and is reshaping through multiple duplication, gene losses, lateral gene transfers, and speciation events. Large and functionally heterogeneous protein families that evolve through these processes contain conserved motifs and structural scaffolds, yet their individual members often perform diverse functions. For this reason, the exact functional annotation for their individual members is difficult without detailed analysis of the family. In our study, we performed such a detailed analysis of a particularly heterogeneous FGGY kinase family through the integration of several computational approaches. The combination of phylogenetic and molecular approaches allowed us to precisely assign function to hundreds of proteins, thus reconstructing carbohydrate utilization pathways in almost 200 bacterial species. This analysis also showed that different molecular mechanisms could evolve within a group of isofunctional proteins. Moreover, based on our experience with this specific protein family of FGGY kinases, we believe that our approach can be generally adapted for the analyses of other protein families and that the accumulation of evolutionary models for various families would lead to a better understanding of the protein universe.
Archaeal membrane lipids consist of branched, saturated hydrocarbons distinct from those found in bacteria and eukaryotes. Digeranylgeranylglycerophospholipid reductase (DGGR) catalyzes the hydrogenation process that converts unsaturated 2,3-di-O-geranylgeranylglyceryl phosphate to saturated 2,3-di-O-phytanylglyceryl phosphate as a critical step in the biosynthesis of archaeal membrane lipids. The saturation of hydrocarbon chains confers the ability to resist hydrolysis and oxidation and helps archaea withstand extreme conditions. DGGR is a member of the geranylgeranyl reductase (GGR) family that is also widely distributed in bacteria and plants, where the family members are involved in the biosynthesis of photosynthetic pigments. We have determined the crystal structure of DGGR from the thermophilic heterotrophic archaeon Thermoplasma acidophilum at 1.6 Å resolution, in complex with FAD and a bacterial lipid. The DGGR structure can be assigned to the well-studied para-hydroxybenzoate hydroxylase (PHBH) SCOP superfamily of flavoproteins, which includes many aromatic hydroxylases and other enzymes with diverse functions. In the DGGR complex, FAD adopts the IN conformation (closed) previously observed in other PHBH flavoproteins. DGGR contains a large substrate-binding site that extends across the entire ligand-binding domain. Electron density corresponding to a bacterial lipid was found within this cavity. The cavity consists of a large opening that tapers down to two narrow curved tunnels that closely mimic the shape of the preferred substrate. We identified a sequence motif, PxxYxWxFP, that defines a specificity pocket in the structure and precisely aligns the double bond of the geranyl group with respect to the FAD cofactor, thus providing a structural basis for the substrate specificity of GGRs.
DGGR is likely to share a common mechanism with other PHBH enzymes in which FAD switches between two conformations that correspond to the reductive and oxidative half cycles. The structure provides evidence that substrate binding likely involves conformational changes, which are coupled to the two conformational states of the FAD.
Summary: With the continuous growth of the RCSB Protein Data Bank (PDB), providing an up-to-date systematic structure comparison of all protein structures poses an ever growing challenge. Here, we present a comparison tool for calculating both 1D protein sequence and 3D protein structure alignments. This tool supports various applications at the RCSB PDB website. First, a structure alignment web service calculates pairwise alignments. Second, a stand-alone application runs alignments locally and visualizes the results. Third, pre-calculated 3D structure comparisons for the whole PDB are provided and updated on a weekly basis. These three applications allow users to discover novel relationships between proteins available either at the RCSB PDB or provided by the user.
Availability and Implementation: A web user interface is available at http://www.rcsb.org/pdb/workbench/workbench.do. The source code is available under the LGPL license from http://www.biojava.org. A source bundle, prepared for local execution, is available from http://source.rcsb.org
The “Function to Find Domain” (FIIND)-containing proteins CARD8 (Cardinal; Tucan) and NLRP1 (NALP1; NAC) are well-known components of inflammasomes, multiprotein complexes responsible for activation of caspase-1, a regulator of inflammation and innate immunity. Although the FIIND was identified many years ago, its role is unknown. Here, we report that CARD8 and NLRP1 undergo autoproteolytic cleavage at a conserved SF/S motif within the FIIND. Using bioinformatics and computational modeling approaches, we detected striking structural similarity between the FIIND and the ZU5-UPA domain present in the autoproteolytic protein PIDD. This allowed us to generate a three-dimensional model and to gain insights into the molecular mechanism of the cleavage. Site-directed mutagenesis experiments revealed that the second serine of the SF/S motif is required for CARD8 and NLRP1 autoproteolysis. Furthermore, we discovered an important function for conserved glutamic acid and histidine residues, located in proximity to the cleavage site, in regulating the autoprocessing efficiency. Altogether, these results identify a function for the FIIND and show that CARD8 and NLRP1 are ZU5-UPA domain-containing autoproteolytic proteins, thus suggesting a novel mechanism for regulating innate immune responses.
Neurons are known to use large amounts of energy for their normal function and activity. In order to meet this demand, mitochondrial fission, fusion, and movement events (mitochondrial dynamics) control mitochondrial morphology, facilitating biogenesis and proper distribution of mitochondria within neurons. In contrast, dysfunction in mitochondrial dynamics results in reduced cell bioenergetics and thus contributes to neuronal injury and death in many neurodegenerative disorders, including Alzheimer’s disease (AD), Parkinson’s disease, and Huntington’s disease. We recently reported that amyloid-β peptide, thought to be a key mediator of AD pathogenesis, engenders S-nitrosylation and thus hyperactivation of the mitochondrial fission protein Drp1. This activation leads to excessive mitochondrial fragmentation, bioenergetic compromise, and synaptic damage in models of AD. Here, we provide an extended commentary on our findings of nitric oxide-mediated abnormal mitochondrial dynamics.
S-Nitrosylation; Dynamin-related protein 1; Alzheimer’s disease; Mitochondrial fission
Imelysin-like proteins define a superfamily of bacterial proteins that are likely involved in iron uptake. Members of this superfamily were previously thought to be peptidases and were included in the MEROPS family M75. We determined the first crystal structures of two remotely related, imelysin-like proteins. The Psychrobacter arcticus structure was determined at 2.15 Å resolution and contains the canonical imelysin fold, while higher resolution structures from the gut bacterium Bacteroides ovatus, in two crystal forms (at 1.25 Å and 1.44 Å resolution), have a circularly permuted topology. The two proteins are highly similar in structure despite low sequence similarity and circular permutation. The all-helical structure can be divided into two similar four-helix bundle domains. The overall structure and the GxHxxE motif region differ from known HxxE metallopeptidases, suggesting that imelysin-like proteins are not peptidases. A putative functional site is located at the domain interface. We have now organized the known homologous proteins into a superfamily, which can be separated into four families. These families share a similar functional site, but each has family-specific structural and sequence features. These results indicate that imelysin-like proteins have evolved from a common ancestor, and likely have a conserved function.
NlpC/P60 superfamily papain-like enzymes play important roles in all kingdoms of life. Two members of this superfamily, the LRAT-like and YaeF/YiiX-like families, were predicted to contain a catalytic domain that is circularly permuted such that the catalytic cysteine is located near the C-terminus, instead of at the N-terminus. These permuted enzymes are widespread in viruses, pathogenic bacteria, and eukaryotes. We determined the crystal structure of a member of the YaeF/YiiX-like family from Bacillus cereus in complex with lysine. The structure, which adopts a ligand-induced, “closed” conformation, confirms the circular permutation of catalytic residues. A comparative analysis of other related protein structures within the NlpC/P60 superfamily is presented. Permuted NlpC/P60 enzymes contain a similar conserved core and arrangement of catalytic residues, including a Cys/His-containing triad and an additional conserved tyrosine. More surprisingly, permuted enzymes have a hydrophobic S1 binding pocket that is distinct from previously characterized enzymes in the family, indicative of novel substrate specificity. Further analysis of a structural homolog, YiiX (PDB 2if6), identified a fatty acid in the conserved hydrophobic pocket, thus providing additional insights into the possible function of these novel enzymes.
The Fold and Function Assignment System (FFAS) server [Jaroszewski et al. (2005) FFAS03: a server for profile–profile sequence alignments. Nucleic Acids Research, 33, W284–W288] implements the algorithm for protein profile–profile alignment introduced originally in [Rychlewski et al. (2000) Comparison of sequence profiles. Strategies for structural predictions using sequence information. Protein Science: a Publication of the Protein Society, 9, 232–241]. Here, we present updates, changes and novel functionality added to the server since 2005 and discuss its new applications. The sequence database used to calculate sequence profiles was enriched by adding sets of publicly available metagenomic sequences. The profile of a user’s protein can now be compared with ∼20 additional profile databases, including several complete proteomes, human proteins involved in genetic diseases and a database of microbial virulence factors. A newly developed interface uses a system of tabs, allowing the user to navigate multiple results pages, and also includes novel functionality, such as a dotplot graph viewer, modeling tools, an improved 3D alignment viewer and links to the database of structural similarities. The FFAS server was also optimized for speed: running times were reduced by an order of magnitude. The FFAS server, http://ffas.godziklab.org, has no log-in requirement, albeit there is an option to register and store results in individual, password-protected directories. Source code and Linux executables for the FFAS program are available for download from the FFAS server.
Bacterial cell walls contain peptidoglycan, an essential polymer made by enzymes in the Mur pathway. These proteins are specific to bacteria, which makes them targets for drug discovery. MurC, MurD, MurE and MurF catalyze the synthesis of the peptidoglycan precursor UDP-N-acetylmuramoyl-L-alanyl-γ-D-glutamyl-meso-diaminopimelyl-D-alanyl-D-alanine by the sequential addition of amino acids onto UDP-N-acetylmuramic acid (UDP-MurNAc). MurC-F enzymes have been extensively studied by biochemistry and X-ray crystallography. In Gram-negative bacteria, ∼30–60% of the bacterial cell wall is recycled during each generation. Part of this recycling process involves the murein peptide ligase (Mpl), which attaches the breakdown product, the tripeptide L-alanyl-γ-D-glutamyl-meso-diaminopimelate, to UDP-MurNAc. We present the crystal structure at 1.65 Å resolution of a full-length Mpl from the permafrost bacterium Psychrobacter arcticus 273-4 (PaMpl). Although the Mpl structure has similarities to Mur enzymes, it has unique sequence and structure features that are likely related to its role in cell wall recycling, a function that differentiates it from the MurC-F enzymes. We have analyzed the sequence-structure relationships that are unique to Mpl proteins and compared them to MurC-F ligases. We have also characterized the biochemical properties of this enzyme (optimal temperature, pH and magnesium binding profiles and kinetic parameters). Although the structure does not contain any bound substrates, we have identified ∼30 residues that are likely to be important for recognition of the tripeptide and UDP-MurNAc substrates, as well as features that are unique to Psychrobacter Mpl proteins. These results provide the basis for future mutational studies aimed at more extensive functional characterization of the Mpl sequence-structure relationships.
CvfB is a conserved regulatory protein important for the virulence of Staphylococcus aureus. We show here that CvfB binds RNA. The crystal structure of the CvfB ortholog from Streptococcus pneumoniae at 1.4 Å resolution reveals a unique RNA binding protein that is formed from a concatenation of well-known structural modules that bind nucleic acids: three consecutive S1 RNA-binding domains and a winged-helix (WH) domain. The third S1 and the WH domains are required for cooperative RNA binding and form a continuous surface that likely contributes to the RNA interaction. The WH domain is critical to CvfB function and contains a unique structural motif. Thus CvfB represents a novel assembly of modules for binding RNA.
CvfB; Winged-helix domain; S1 domain; RNA binding; virulence
Genome size and complexity, as measured by the number of genes or protein domains, are remarkably similar in most extant eukaryotes and generally exhibit no correlation with their morphological complexity. Underlying trends in the evolution of the functional content and capabilities of different eukaryotic genomes might be hidden by simultaneous gains and losses of genes.
We reconstructed the domain repertoires of putative ancestral species at major divergence points, including the last eukaryotic common ancestor (LECA). We show that, surprisingly, during eukaryotic evolution domain losses in general outnumber domain gains. Only at the base of the animal and the vertebrate sub-trees do domain gains outnumber domain losses. The observed gain/loss balance has a distinct functional bias, most strikingly seen during animal evolution, where most of the gains represent domains involved in regulation and most of the losses represent domains with metabolic functions. This trend is so consistent that clustering of genomes according to their functional profiles results in an organization similar to the tree of life. Furthermore, our results indicate that metabolic functions lost during animal evolution are likely being replaced by the metabolic capabilities of symbiotic organisms such as gut microbes.
While protein domain gains and losses are common throughout eukaryote evolution, losses oftentimes outweigh gains and lead to significant differences in functional profiles. Results presented here provide additional arguments for a complex last eukaryotic common ancestor, but also show a general trend of losses in metabolic capabilities and gain in regulatory complexity during the rise of animals.
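The gain/loss balance described above can be illustrated with a minimal sketch. It assumes that the domain repertoires of an ancestral node and one of its descendants have already been inferred, and it only counts the differences along that single branch. It is not the reconstruction method used in the study; the function name and the domain names are hypothetical placeholders.

```python
def gains_and_losses(ancestor, descendant):
    """Return (gained, lost) domain sets along one branch of a tree.

    Assumes both repertoires are already known; inferring the
    ancestral repertoire itself is outside the scope of this sketch.
    """
    ancestor, descendant = set(ancestor), set(descendant)
    gained = descendant - ancestor  # present only in the descendant
    lost = ancestor - descendant    # present only in the ancestor
    return gained, lost


# Hypothetical toy repertoires (placeholder domain names).
ancestor = {"kinase", "SH2", "aldolase", "nitrogenase"}
descendant = {"kinase", "SH2", "homeobox"}

gained, lost = gains_and_losses(ancestor, descendant)
net = len(gained) - len(lost)  # negative when losses outnumber gains
print(sorted(gained), sorted(lost), net)
```

In this toy case one domain is gained and two are lost, so the net balance for the branch is negative, which is the kind of per-branch tally that the gain/loss comparison relies on.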
Peroxide turnover and signalling are involved in many biological phenomena relevant to human diseases. Yet not all the players and mechanisms involved in peroxide perception are known. Elucidating very remote evolutionary relationships between proteins is an approach that allows the discovery of novel protein functions. Here, we start with three human proteins, SRPX, SRPX2 and CCDC80, involved in tumor suppression and progression, which possess a conserved region of similarity. Structure and function prediction allowed the definition of P-DUDES, a phylogenetically widespread, possibly ancient protein structural domain, common to vertebrates and many bacterial species.
We show, using bioinformatics approaches, that the P-DUDES domain, surprisingly, adopts the thioredoxin-like (Thx-like) fold. A tentative, more detailed prediction of function is made, namely, that of a 2-Cys peroxiredoxin. Incidentally, consistent overexpression of all three human P-DUDES genes in two public glioblastoma microarray gene expression datasets was discovered. This finding is discussed in the context of the tumor suppressor role that has been ascribed to P-DUDES proteins in several studies. The majority of non-redundant P-DUDES proteins are found in marine metagenomes, and among the bacterial species possessing this domain, a trend toward a higher proportion of aquatic species is observed.
The new protein structural domain, now with a broad enzymatic function predicted, may become a drug target once its molecular mechanism of action is understood in detail.
The Open Protein Structure Annotation Network (TOPSAN) is a web-based collaboration platform for exploring and annotating structures determined by structural genomics efforts. Characterization of those structures presents a challenge since the majority of the proteins themselves have not yet been characterized. Responding to this challenge, the TOPSAN platform facilitates collaborative annotation and investigation via a user-friendly web-based interface pre-populated with automatically generated information. Semantic web technologies expand and enrich TOPSAN’s content through links to larger sets of related databases and thus enable data integration from disparate sources and data mining via conventional query languages. TOPSAN can be found at http://www.topsan.org.