Bacteroides thetaiotaomicron, a predominant member of the human gut microbiota, is characterized by its ability to utilize a wide variety of polysaccharides using the extensive saccharolytic machinery that is controlled by an expanded repertoire of transcription factors (TFs). The availability of genomic sequences for multiple Bacteroides species opens an opportunity for their comparative analysis to enable characterization of their metabolic and regulatory networks.
A comparative genomics approach was applied for the reconstruction and functional annotation of the carbohydrate utilization regulatory networks in 11 Bacteroides genomes. Bioinformatics analysis of promoter regions revealed putative DNA-binding motifs and regulons for 31 orthologous TFs in the Bacteroides. Among the analyzed TFs there are 4 SusR-like regulators, 16 AraC-like hybrid two-component systems (HTCSs), and 11 regulators from other families. Novel DNA motifs of HTCSs and SusR-like regulators in the Bacteroides have the common structure of direct repeats with a long spacer between two conserved sites.
The inferred regulatory network in B. thetaiotaomicron contains 308 genes encoding polysaccharide and sugar catabolic enzymes, carbohydrate-binding and transport systems, and TFs. The analyzed TFs control pathways for utilization of host and dietary glycans to monosaccharides and their further interconversions to intermediates of the central metabolism. The reconstructed regulatory network allowed us to suggest and refine specific functional assignments for sugar catabolic enzymes and transporters, providing a substantial improvement to the existing metabolic models for B. thetaiotaomicron. The obtained collection of reconstructed TF regulons is available in the RegPrecise database (http://regprecise.lbl.gov).
Regulatory network; Regulon; Transcription factor; Bacteroides; Carbohydrate utilization
Genome scale network reconstruction has enabled predictive modeling of metabolism for many systems. Traditionally, protein structural information has not been represented in such reconstructions. Expanding a genome-scale model of Escherichia coli metabolism by including experimental and predicted protein structures enabled the analysis of protein thermostability in a network context, allowing prediction of protein activities that limit network function at super-optimal temperature and mechanistic interpretations of mutations found in strains adapted to heat. Predicted growth-limiting factors for thermotolerance were validated through nutrient supplementation experiments and defined metabolic sensitivities to heat stress, providing evidence that metabolic enzyme thermostability is rate limiting at super-optimal temperature. Inclusion of structural information expanded the content and predictive capability of genome-scale metabolic networks enabling structural systems biology of metabolism.
A novel highly conserved protein domain, DUF162 [Pfam: PF02589], can be mapped to two proteins: LutB and LutC. Both proteins are encoded by a highly conserved LutABC operon, which has been implicated in lactate utilization in bacteria. Based on our analysis of its sequence, structure, and recent experimental evidence reported by other groups, we hereby redefine DUF162 as the LUD domain family.
JCSG solved the first crystal structure [PDB:2G40] from the LUD domain family: LutC protein, encoded by ORF DR_1909, of Deinococcus radiodurans. LutC shares features with domains in the functionally diverse ISOCOT superfamily. We have observed that the LUD domain has an increased abundance in the human gut microbiome.
We propose a model for substrate and cofactor binding and regulation in the LUD domain. The significance of LUD-containing proteins in the human gut microbiome and the implications of lactate metabolism for the radiation resistance of Deinococcus radiodurans are discussed.
LUD; DUF162; LutB; LutC; Domain of unknown function; Deinococcus radiodurans
The discovery of broadly neutralizing antibodies (bNAbs) has provided an enormous impetus to HIV vaccine research and to immunology as a whole. The bNAber database at http://bNAber.org provides open, user-friendly access to detailed data on the rapidly growing list of HIV bNAbs, including neutralization profiles, sequences and three-dimensional structures (when available). It also provides an extensive list of visualization and analysis tools, such as heatmaps to analyse neutralization data as well as structure and sequence viewers to correlate bNAb properties with structural and sequence features of individual antibodies. The goal of the bNAber database is to enable researchers in this field to easily compare and analyse available information on bNAbs, thereby supporting efforts to design an effective vaccine for HIV/AIDS. The bNAber database not only provides easy access to data that are currently scattered in the Supplementary Materials sections of individual papers, but also contributes to the development of general standards for the data that must be presented with the discovery of new bNAbs and of a universal mechanism for sharing such data.
Despite numerous attempts over many years to develop an HIV vaccine based on classical strategies, none has convincingly succeeded to date. A number of approaches are being pursued in the field, including building upon possible efficacy indicated by the recent RV144 clinical trial, which combined two HIV vaccines. Here, we argue for an approach based, in part, on understanding the HIV envelope spike and its interaction with broadly neutralizing antibodies (bnAbs) at the molecular level and using this understanding to design immunogens as possible vaccines. BnAbs can protect against virus challenge in animal models and many such antibodies have been isolated recently. We further propose that studies focused on how best to provide T cell help to B cells that produce bnAbs are crucial for optimal immunization strategies. The synthesis of rational immunogen design and immunization strategies, together with iterative improvements, offers great promise for advancing toward an HIV vaccine.
Every genome contains a large number of uncharacterized proteins that may encode entirely novel biological systems. Many of these uncharacterized proteins fall into related sequence families. By applying sequence and structural analysis we hope to provide insight into novel biology.
We analyze a previously uncharacterized Pfam protein family called DUF4424 [Pfam:PF14415]. The recently solved three-dimensional structure of the protein lpg2210 from Legionella pneumophila provides the first structural information pertaining to this family. This protein additionally includes the first representative structure of another Pfam family called the YARHG domain [Pfam:PF13308]. The Pfam family DUF4424 adopts a 19-stranded beta-sandwich fold that shows similarity to the N-terminal domain of leukotriene A-4 hydrolase. The YARHG domain forms an all-helical domain at the C-terminus. Structure analysis allows us to recognize distant similarities between the DUF4424 domain and individual domains of M1 aminopeptidases and tricorn proteases, which form massive proteasome-like capsids in both archaea and bacteria.
Based on our analyses we hypothesize that the DUF4424 domain may have a role in forming large, multi-component enzyme complexes. We suggest that the YARHG domain may play a role in binding a moiety in proximity with peptidoglycan, such as a hydrophobic outer membrane lipid or lipopolysaccharide.
Domain of unknown function; Protein family; Protein structure; DUF4424; YARHG domain; Sequence analysis
Proteins are known to be dynamic in nature, changing from one conformation to another while performing vital cellular tasks. It is important to understand these movements in order to better understand protein function. At the same time, experimental techniques provide us with only single snapshots of the whole ensemble of available conformations. Computational protein morphing provides a visualization of a protein structure transitioning from one conformation to another by producing a series of intermediate conformations.
We present a novel, efficient morphing algorithm, Morph-Pro, based on linear interpolation. We also show that, apart from visualization, morphing can be used to provide plausible intermediate structures. We test this by using the intermediate structures of a c-Jun N-terminal kinase (JNK1) conformational change in a virtual docking experiment. The structures are shown to dock with higher scores to known JNK1-binding ligands than structures solved using X-ray crystallography. This experiment demonstrates the potential applications of the intermediate structures in modeling or virtual screening efforts.
Visualization of protein conformational changes is important for characterization of protein function. Furthermore, the intermediate structures produced by our algorithm are good approximations to true structures. We believe there is great potential for these computationally predicted structures in protein-ligand docking experiments and virtual screening. The Morph-Pro web server can be accessed at http://morph-pro.bioinf.spbau.ru.
Protein morphing; Molecular docking; Virtual screening
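The idea of morphing by linear interpolation can be illustrated with a minimal sketch. It assumes plain Cartesian interpolation between two conformations that have already been superposed and share the same atom ordering; the abstract does not say which coordinates Morph-Pro interpolates, so this is not the Morph-Pro implementation. The function name `morph_linear` and the toy coordinates are hypothetical.

```python
from typing import List, Sequence, Tuple

Coord = Sequence[float]


def morph_linear(start: List[Coord], end: List[Coord],
                 n_intermediates: int) -> List[List[Tuple[float, ...]]]:
    """Return n_intermediates frames strictly between start and end.

    Assumption (not from the abstract): atoms are matched one-to-one and the
    two conformations are already superposed, so each atom can be moved along
    a straight line in Cartesian space.
    """
    if len(start) != len(end):
        raise ValueError("conformations must have the same number of atoms")
    frames = []
    for step in range(1, n_intermediates + 1):
        # t runs over evenly spaced values in the open interval (0, 1)
        t = step / (n_intermediates + 1)
        frame = [
            tuple((1.0 - t) * a + t * b for a, b in zip(p, q))
            for p, q in zip(start, end)
        ]
        frames.append(frame)
    return frames


# Toy two-atom example (made-up coordinates, not JNK1 data)
start = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
end = [(0.0, 0.0, 2.0), (1.0, 2.0, 0.0)]
midpoint = morph_linear(start, end, 1)
```

Straight-line Cartesian interpolation does not preserve bond lengths or avoid steric clashes, so frames produced this way would normally need further refinement before being used for docking.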
The tad (tight adherence) locus encodes a protein translocation system that produces a novel variant of type IV pili. The pilus assembly protein TadZ (called CpaE in Caulobacter crescentus) is ubiquitous in tad loci, but is absent in other type IV pilus biogenesis systems. The crystal structure of TadZ from Eubacterium rectale (ErTadZ), in complex with ATP and Mg2+, was determined to 2.1 Å resolution. ErTadZ contains an atypical ATPase domain with a variant of a deviant Walker-A motif that retains ATP binding capacity while displaying only low intrinsic ATPase activity. The bound ATP plays an important role in dimerization of ErTadZ. The N-terminal atypical receiver domain resembles the canonical receiver domain of response regulators, but has a degenerate, stripped-down “active site”. Homology modeling of the N-terminal atypical receiver domain of CpaE indicates that it has a conserved protein-protein binding surface similar to that of the polar localization module of the social motility protein FrzS, suggesting a similar function. Our structural results also suggest that TadZ localizes to the pole through the atypical receiver domain during the early stage of pilus biogenesis, and functions as a hub for recruiting other pilus components, thus providing insights into the Tad pilus assembly process.
Type IV pili assembly; TadZ; atypical receiver domain; atypical ATPase; localization factor
MtfA of Escherichia coli (formerly YeeI) was previously identified as a regulator of the phosphoenolpyruvate (PEP)-dependent:glucose phosphotransferase system. MtfA homolog proteins are highly conserved, especially among beta- and gammaproteobacteria. We determined the crystal structures of the full-length MtfA apoenzyme from Klebsiella pneumoniae and its complex with zinc (holoenzyme) at 2.2 and 1.95 Å, respectively. MtfA contains a conserved H149E150XXH153+E212+Y205 metallopeptidase motif. The presence of zinc in the active site induces significant conformational changes in the region around Tyr205 compared to the conformation of the apoenzyme. Additionally, the zinc-bound MtfA structure is in a self-inhibitory conformation where a region that was disordered in the unliganded structure is now observed in the active site and a nonproductive state of the enzyme is formed. MtfA is related to the catalytic domain of the anthrax lethal factor and the Mop protein involved in the virulence of Vibrio cholerae, with conservation in both overall structure and in the residues around the active site. These results clearly provide support for MtfA as a prototypical zinc metallopeptidase (gluzincin clan).
Evolutionary innovation in eukaryotes and especially animals is at least partially driven by genome rearrangements and the resulting emergence of proteins with new domain combinations, and thus potentially novel functionality. Given the random nature of such rearrangements, one could expect that proteins with particularly useful multidomain combinations may have been rediscovered multiple times by parallel evolution. However, existing reports suggest a minimal role of this phenomenon in the overall evolution of eukaryotic proteomes. We assembled a collection of 172 complete eukaryotic genomes that is not only the largest, but also the most phylogenetically complete set of genomes analyzed so far. By employing a maximum parsimony approach to compare repertoires of Pfam domains and their combinations, we show that independent evolution of domain combinations is significantly more prevalent than previously thought. Our results indicate that about 25% of all currently observed domain combinations have evolved multiple times. Interestingly, this percentage is even higher for sets of domain combinations in individual species, with, for instance, 70% of the domain combinations found in the human genome having evolved independently at least once in other species. We also show that previous, much lower estimates of this rate are most likely due to the small number and biased phylogenetic distribution of the genomes analyzed. The independent emergence of identical domain combinations is widespread and is not limited to domains in specific functional categories. Besides data from large-scale analyses, we also present individual examples of independent domain combination evolution. The surprisingly large contribution of parallel evolution to the development of the domain combination repertoire in extant genomes has profound consequences for our understanding of the evolution of pathways and cellular processes in eukaryotes and for comparative functional genomics.
Most proteins in eukaryotes are composed of two or more domains, evolutionarily independent units that often have their own individual functions. The specific repertoire of multidomain proteins in a given species defines the topology of pathways and networks that carry out its metabolic and regulatory processes. When proteins with new domain combinations emerge by gene fusion and fission, they directly affect the topology of cellular networks in the organism. To better understand the evolution of such networks we analyzed a large set of eukaryotic genomes for the evolutionary history of known domain combinations. Our analysis shows that 70% of all domain combinations present in the human genome independently appeared in at least one other eukaryotic genome. Overall, over 25% of all known multidomain architectures emerged independently several times in the history of life. The difference between the global and species-specific pictures can be explained by the existence of a core set of domain combinations that keeps reemerging in different species, which is accompanied by a smaller number of unique domain combinations that do not appear anywhere else.
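How a parsimony analysis can count independent origins of a domain combination is sketched below. This is a minimal illustration using Fitch parsimony on presence/absence data over a binary species tree. The abstract states only that a maximum parsimony approach was used, so the Fitch variant, the rule for resolving ambiguous ancestors, the function names, and the toy tree are all assumptions rather than the authors' procedure.

```python
def fitch_sets(tree, present):
    """Bottom-up Fitch pass over a binary tree of nested 2-tuples.

    Leaves are species names; `present` is the set of species that carry
    the domain combination. Returns (candidate_states, children).
    """
    if isinstance(tree, str):
        return ({1} if tree in present else {0}, ())
    left, right = (fitch_sets(child, present) for child in tree)
    common = left[0] & right[0]
    # Fitch rule: intersection if non-empty, otherwise union
    return (common or (left[0] | right[0]), (left, right))


def count_gains(node, parent_state):
    """Top-down pass: count absence -> presence transitions (origins)."""
    states, children = node
    # Assumption: keep the parent's state whenever it is allowed
    state = parent_state if parent_state in states else min(states)
    gains = 1 if (parent_state == 0 and state == 1) else 0
    return gains + sum(count_gains(child, state) for child in children)


def independent_origins(tree, present):
    """Number of inferred origins, assuming an ancestor that lacked
    the combination above the root (an illustrative choice)."""
    return count_gains(fitch_sets(tree, present), 0)


# Toy tree with hypothetical species labels
tree = (("human", "mouse"), ("yeast", ("plant", "alga")))
single = independent_origins(tree, {"human", "mouse"})
parallel = independent_origins(tree, {"human", "plant"})
```

In this toy tree, a combination shared by the two sister species is explained by a single origin, whereas a combination found only in two distantly placed species is explained by two separate gains. Resolving ambiguous ancestors toward the parent state is one of several equally parsimonious choices, and a different choice would trade gains for losses.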
The human nuclear factor related to kappa-B-binding protein (NFRKB) is a 1299-residue protein that is a component of the metazoan INO80 complex involved in chromatin remodeling, transcription regulation, DNA replication and DNA repair. Although full length NFRKB is predicted to be around 65% disordered, comparative sequence analysis identified several potentially structured sections in the N-terminal region of the protein. These regions were targeted for crystallographic studies, and the structure of one of these regions spanning residues 370–495 was determined using the JCSG high-throughput structure determination pipeline. The structure reveals a novel, mostly helical domain reminiscent of the winged-helix fold typically involved in DNA binding. However, further analysis shows that this domain does not bind DNA, suggesting it may belong to a small group of winged-helix domains involved in protein-protein interactions.
Limited or regulatory proteolysis plays a critical role in many important biological pathways like blood coagulation, cell proliferation, and apoptosis. A better understanding of mechanisms that control this process is required for discovering new proteolytic events and for developing inhibitors with potential therapeutic value. Two features that determine the susceptibility of peptide bonds to proteolysis are the sequence in the vicinity of the scissile bond and the structural context in which the bond is displayed. In this study we assessed the statistical significance and predictive power of individual structural descriptors and combinations thereof for the identification of cleavage sites. The analysis was performed on a dataset of >200 proteolytic events documented in CutDB for a variety of mammalian regulatory proteases and their physiological substrates with known 3D structures. The results confirmed the significance and provided a ranking within three main categories of structural features: exposure > flexibility > local interactions. Among secondary structure elements, the largest frequency of proteolytic cleavage was confirmed for loops and a lower but significant frequency for helices. Limited proteolysis has a lower, albeit appreciable, frequency of occurrence in certain types of β-strands, which is in contrast with some previous reports. Descriptors deduced directly from the amino acid sequence displayed only marginal predictive capabilities. Homology-based structural models showed a predictive performance comparable to protein substrates with experimentally established structures. Overall, this study provided a foundation for accurate automated prediction of segments of protein structure susceptible to proteolytic processing and, potentially, other post-translational modifications.
proteolysis; proteolytic processing; limited proteolysis; regulatory proteolysis; protease; cleavage site; cleavage site prediction
The protein universe can be organized in families that group proteins sharing common ancestry. Such families display variable levels of structural and functional divergence, from homogeneous families, where all members have the same function and very similar structure, to very divergent families, where large variations in function and structure are observed. For practical purposes of structure and function prediction, it would be beneficial to identify sub-groups of proteins with highly similar structures (iso-structural) and/or functions (iso-functional) within divergent protein families. We compared three algorithms in their ability to cluster large protein families and discuss whether any of these methods could reliably identify such iso-structural or iso-functional groups. We show that clustering using profile-sequence and profile-profile comparison methods closely reproduces clusters based on similarities between 3D structures or clusters of proteins with similar biological functions. In contrast, the still commonly used sequence-based methods with fixed thresholds result in vast overestimates of structural and functional diversity in protein families. As a result, these methods also overestimate the number of protein structures that have to be determined to fully characterize the structural space of such families. The fact that one can build reliable models based on apparently distantly related templates is crucial for extracting the maximal amount of information from new sequencing projects.
Comparative modeling; Protein structure prediction; Protein function prediction; Protein Structure Initiative; Structural Genomics
The crystal structures of an unliganded and adenosine 5′-monophosphate (AMP) bound, metal-dependent phosphoesterase (YP_910028.1) from Bifidobacterium adolescentis are reported at 2.4 Å and 1.94 Å, respectively. Functional characterization of this enzyme was guided by computational analysis and then confirmed by experiment. The structure consists of a PHP (Polymerase and Histidinol Phosphatase, Pfam: PF02811) domain with a second domain (residues 105–178) inserted in the middle of the PHP sequence. The insert domain functions in binding AMP, but the precise function and substrate specificity of this domain are unknown. Initial bioinformatics analyses yielded multiple potential functional leads, with most of them suggesting DNA polymerase or DNA replication activity. Phylogenetic analysis indicated a potential DNA polymerase function that was somewhat supported by global structural comparisons identifying the closest structural match to the alpha subunit of DNA polymerase III. However, several other functional predictions, including phosphoesterase, could not be excluded. THEMATICS, a computational method for the prediction of active sites from protein 3D structures, identified potential reactive residues in YP_910028.1. Further analysis of the predicted active site and local comparison with its closest structure matches strongly suggested phosphoesterase activity, which was confirmed experimentally. Primer extension assays on both normal and mismatched DNA show neither extension nor degradation and provide evidence that YP_910028.1 has neither DNA polymerase activity nor DNA proofreading activity. These results suggest that many of the sequence neighbors previously annotated as having DNA polymerase activity may actually be misannotated.
Functional annotation; structural genomics; phosphoesterase; THEMATICS; active site prediction
Metagenomics sequencing projects have dramatically increased our knowledge of the protein universe and provided over one-half of currently known protein sequences; they have also introduced a much broader phylogenetic diversity into the protein databases. The full analysis of metagenomic datasets is only beginning, but it has already led to the discovery of thousands of new protein families, likely representing novel functions specific to given environments. At the same time, a deeper analysis of such novel families, including experimental structure determination of some representatives, suggests that most of them represent distant homologs of already characterized protein families, and thus most of the protein diversity present in the new environments is due to functional divergence of the known protein families rather than the emergence of new ones.
Toll/interleukin-1 receptor (TIR) domain-containing proteins play important roles in defense against pathogens in both animals and plants, connecting the immunity signaling pathways via a chain of specific protein–protein interactions. Among them is SARM, the only TIR domain-containing adaptor that can negatively regulate TLR signaling. By extensive phylogenetic analysis, we show here that SARM is closely related to bacterial proteins with TIR domains, suggesting that this family has a different evolutionary history from other animal TIR-containing adaptors, possibly emerging via a lateral gene transfer from bacteria to animals. We also show evidence of several similar, independent transfer events, none of which, however, survived in vertebrates. An evolutionary relationship between the animal SARM adaptor and bacterial proteins with TIR domains illustrates the possible role that bacterial TIR-containing proteins play in regulating eukaryotic immune responses and how this mechanism was possibly adapted by the eukaryotes themselves.
Toll-like receptor; host–commensal interaction; host–pathogen interaction; lateral gene transfer; commensal microflora; innate immunity
Function diversification in large protein families is a major mechanism driving expansion of cellular networks, providing organisms with new metabolic capabilities and thus adding to their evolutionary success. However, our understanding of the evolutionary mechanisms of functional diversity in such families is very limited, which, among many other reasons, is due to the lack of functionally well-characterized sets of proteins. Here, using the FGGY carbohydrate kinase family as an example, we built a confidently annotated reference set (CARS) of proteins by propagating experimentally verified functional assignments to a limited number of homologous proteins that are supported by their genomic and functional contexts. Then, we analyzed, on both the phylogenetic and the molecular levels, the evolution of different functional specificities in this family. The results show that the different functions (substrate specificities) encoded by FGGY kinases have emerged only once in the evolutionary history following an apparently simple divergent evolutionary model. At the same time, on the molecular level, one isofunctional group (L-ribulokinase, AraB) evolved at least two independent solutions that employed distinct specificity-determining residues for the recognition of the same substrate (L-ribulose). Our analysis provides a detailed model of the evolution of the FGGY kinase family. It also shows that only combined molecular and phylogenetic approaches can help reconstruct a full picture of functional diversifications in such diverse families.
The protein universe is under constant expansion and is being reshaped through multiple duplications, gene losses, lateral gene transfers, and speciation events. Large and functionally heterogeneous protein families that evolve through these processes contain conserved motifs and structural scaffolds, yet their individual members often perform diverse functions. For this reason, the exact functional annotation of their individual members is difficult without detailed analysis of the family. In our study, we performed such a detailed analysis of a particularly heterogeneous FGGY kinase family through the integration of several computational approaches. The combination of phylogenetic and molecular approaches allowed us to precisely assign function to hundreds of proteins, thus reconstructing carbohydrate utilization pathways in almost 200 bacterial species. This analysis also showed that different molecular mechanisms could evolve within a group of isofunctional proteins. Moreover, based on our experience with this specific protein family of FGGY kinases, we believe that our approach can be generally adapted for the analyses of other protein families and that the accumulation of evolutionary models for various families would lead to a better understanding of the protein universe.
Archaeal membrane lipids consist of branched, saturated hydrocarbons distinct from those found in bacteria and eukaryotes. Digeranylgeranylglycerophospholipid reductase (DGGR) catalyzes the hydrogenation process that converts unsaturated 2,3-di-O-geranylgeranylglyceryl phosphate to saturated 2,3-di-O-phytanylglyceryl phosphate as a critical step in the biosynthesis of archaeal membrane lipids. The saturation of hydrocarbon chains confers the ability to resist hydrolysis and oxidation and helps archaea withstand extreme conditions. DGGR is a member of the geranylgeranyl reductase (GGR) family that is also widely distributed in bacteria and plants, where the family members are involved in the biosynthesis of photosynthetic pigments. We have determined the crystal structure of DGGR from the thermophilic heterotrophic archaeon Thermoplasma acidophilum at 1.6 Å resolution, in complex with FAD and a bacterial lipid. The DGGR structure can be assigned to the well-studied para-hydroxybenzoate hydroxylase (PHBH) SCOP superfamily of flavoproteins, which includes many aromatic hydroxylases and other enzymes with diverse functions. In the DGGR complex, FAD adopts the IN conformation (closed) previously observed in other PHBH flavoproteins. DGGR contains a large substrate-binding site that extends across the entire ligand-binding domain. Electron density corresponding to a bacterial lipid was found within this cavity. The cavity consists of a large opening that tapers down to two narrow curved tunnels that closely mimic the shape of the preferred substrate. We identified a sequence motif, PxxYxWxFP, that defines a specificity pocket in the structure and precisely aligns the double bond of the geranyl group with respect to the FAD cofactor, thus providing a structural basis for the substrate specificity of GGRs.
DGGR is likely to share a common mechanism with other PHBH enzymes in which FAD switches between two conformations that correspond to the reductive and oxidative half cycles. The structure provides evidence that substrate binding likely involves conformational changes, which are coupled to the two conformational states of the FAD.
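A sequence motif such as PxxYxWxFP can be located in a protein sequence with a regular expression. The sketch below assumes that each `x` stands for any single one-letter residue code; the function name `find_ggr_motif` and the toy sequence are hypothetical and are not taken from the DGGR sequence.

```python
import re

# Assumption: 'x' in PxxYxWxFP means exactly one residue of any type.
# The lookahead lets overlapping occurrences be reported as well.
GGR_MOTIF = re.compile(r"(?=(P[A-Z]{2}Y[A-Z]W[A-Z]FP))")


def find_ggr_motif(sequence: str):
    """Return (1-based start position, matched segment) for each hit."""
    return [(m.start() + 1, m.group(1))
            for m in GGR_MOTIF.finditer(sequence.upper())]


# Made-up toy sequence for illustration only
hits = find_ggr_motif("MKPAAYGWLFPQ")
```

A pattern match of this kind only flags candidate sites. Whether a hit forms the specificity pocket described above would still have to be checked against a structure or a family alignment.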
Summary: With the continuous growth of the RCSB Protein Data Bank (PDB), providing an up-to-date systematic structure comparison of all protein structures poses an ever growing challenge. Here, we present a comparison tool for calculating both 1D protein sequence and 3D protein structure alignments. This tool supports various applications at the RCSB PDB website. First, a structure alignment web service calculates pairwise alignments. Second, a stand-alone application runs alignments locally and visualizes the results. Third, pre-calculated 3D structure comparisons for the whole PDB are provided and updated on a weekly basis. These three applications allow users to discover novel relationships between proteins available either at the RCSB PDB or provided by the user.
Availability and Implementation: A web user interface is available at http://www.rcsb.org/pdb/workbench/workbench.do. The source code is available under the LGPL license from http://www.biojava.org. A source bundle, prepared for local execution, is available from http://source.rcsb.org
The “Function to Find Domain” (FIIND)-containing proteins CARD8 (Cardinal; Tucan) and NLRP1 (NALP1; NAC) are well-known components of inflammasomes, multiprotein complexes responsible for activation of caspase-1, a regulator of inflammation and innate immunity. Although identified many years ago, the role of the FIIND is unknown. Here, we report that CARD8 and NLRP1 undergo autoproteolytic cleavage at a conserved SF/S motif within the FIIND. Using bioinformatics and computational modeling approaches, we detected striking structural similarity between the FIIND and the ZU5-UPA domain present in the autoproteolytic protein PIDD. This allowed us to generate a three-dimensional model and to gain insights into the molecular mechanism of the cleavage. Site-directed mutagenesis experiments revealed that the second serine of the SF/S motif is required for CARD8 and NLRP1 autoproteolysis. Furthermore, we discovered an important function for conserved glutamic acid and histidine residues, located in proximity to the cleavage site, in regulating the autoprocessing efficiency. Altogether, these results identify a function for the FIIND and show that CARD8 and NLRP1 are ZU5-UPA domain-containing autoproteolytic proteins, thus suggesting a novel mechanism for regulating innate immune responses.
Neurons are known to use large amounts of energy for their normal function and activity. In order to meet this demand, mitochondrial fission, fusion, and movement events (mitochondrial dynamics) control mitochondrial morphology, facilitating biogenesis and proper distribution of mitochondria within neurons. In contrast, dysfunction in mitochondrial dynamics results in reduced cell bioenergetics and thus contributes to neuronal injury and death in many neurodegenerative disorders, including Alzheimer’s disease (AD), Parkinson’s disease, and Huntington’s disease. We recently reported that amyloid-β peptide, thought to be a key mediator of AD pathogenesis, engenders S-nitrosylation and thus hyperactivation of the mitochondrial fission protein Drp1. This activation leads to excessive mitochondrial fragmentation, bioenergetic compromise, and synaptic damage in models of AD. Here, we provide an extended commentary on our findings of nitric oxide-mediated abnormal mitochondrial dynamics.
S-Nitrosylation; Dynamin-related protein 1; Alzheimer’s disease; Mitochondrial fission
Imelysin-like proteins define a superfamily of bacterial proteins that are likely involved in iron uptake. Members of this superfamily were previously thought to be peptidases and were included in the MEROPS family M75. We determined the first crystal structures of two remotely related, imelysin-like proteins. The Psychrobacter arcticus structure was determined at 2.15 Å resolution and contains the canonical imelysin fold, while higher resolution structures from the gut bacterium Bacteroides ovatus, in two crystal forms (at 1.25 Å and 1.44 Å resolution), have a circularly permuted topology. The two structures are highly similar to each other despite low sequence similarity and circular permutation. The all-helical structure can be divided into two similar four-helix bundle domains. The overall structure and the GxHxxE motif region differ from known HxxE metallopeptidases, suggesting that imelysin-like proteins are not peptidases. A putative functional site is located at the domain interface. We have now organized the known homologous proteins into a superfamily, which can be separated into four families. These families share a similar functional site, but each has family-specific structural and sequence features. These results indicate that imelysin-like proteins have evolved from a common ancestor, and likely have a conserved function.
NlpC/P60 superfamily papain-like enzymes play important roles in all kingdoms of life. Two families within this superfamily, the LRAT-like and YaeF/YiiX-like families, were predicted to contain a catalytic domain that is circularly permuted such that the catalytic cysteine is located near the C-terminus, instead of at the N-terminus. These permuted enzymes are widespread in viruses, pathogenic bacteria, and eukaryotes. We determined the crystal structure of a member of the YaeF/YiiX-like family from Bacillus cereus in complex with lysine. The structure, which adopts a ligand-induced, “closed” conformation, confirms the circular permutation of catalytic residues. A comparative analysis of other related protein structures within the NlpC/P60 superfamily is presented. Permuted NlpC/P60 enzymes contain a similar conserved core and arrangement of catalytic residues, including a Cys/His-containing triad and an additional conserved tyrosine. More surprisingly, permuted enzymes have a hydrophobic S1 binding pocket that is distinct from previously characterized enzymes in the family, indicative of novel substrate specificity. Further analysis of a structural homolog, YiiX (PDB 2if6), identified a fatty acid in the conserved hydrophobic pocket, thus providing additional insights into the possible function of these novel enzymes.
The Fold and Function Assignment System (FFAS) server [Jaroszewski et al. (2005) FFAS03: a server for profile–profile sequence alignments. Nucleic Acids Research, 33, W284–W288] implements the algorithm for protein profile–profile alignment introduced originally in [Rychlewski et al. (2000) Comparison of sequence profiles. Strategies for structural predictions using sequence information. Protein Science: a Publication of the Protein Society, 9, 232–241]. Here, we present updates, changes and novel functionality added to the server since 2005 and discuss its new applications. The sequence database used to calculate sequence profiles was enriched by adding sets of publicly available metagenomic sequences. The profile of a user’s protein can now be compared with ∼20 additional profile databases, including several complete proteomes, human proteins involved in genetic diseases and a database of microbial virulence factors. A newly developed interface uses a system of tabs, allowing the user to navigate multiple results pages, and also includes novel functionality, such as a dotplot graph viewer, modeling tools, an improved 3D alignment viewer and links to the database of structural similarities. The FFAS server was also optimized for speed: running times were reduced by an order of magnitude. The FFAS server, http://ffas.godziklab.org, has no log-in requirement, although users have the option to register and store results in individual, password-protected directories. Source code and Linux executables for the FFAS program are available for download from the FFAS server.