While certain archaea appear to synthesize and/or metabolize fatty acids, the respective pathways still remain obscure. By analyzing the genomic distribution of the key lipid-related enzymes, we were able to identify the likely components of the archaeal pathway of fatty acid metabolism, namely, a combination of the enzymes of bacterial-type β-oxidation of fatty acids (acyl-CoA-dehydrogenase, enoyl-CoA hydratase, and 3-hydroxyacyl-CoA dehydrogenase) with paralogs of the archaeal acetyl-CoA C-acetyltransferase, an enzyme of the mevalonate biosynthesis pathway. These three β-oxidation enzymes working in the reverse direction could potentially catalyze biosynthesis of fatty acids, with paralogs of acetyl-CoA C-acetyltransferase performing addition of C2 fragments. The presence in archaea of the genes for energy-transducing membrane enzyme complexes, such as cytochrome bc complex, cytochrome c oxidase, and diverse rhodopsins, was found to correlate with the presence of the proposed system of fatty acid biosynthesis. We speculate that because these membrane complexes functionally depend on fatty acid chains, their genes could have been acquired via lateral gene transfer from bacteria only by those archaea that already possessed a system of fatty acid biosynthesis. The proposed pathway of archaeal fatty acid metabolism operates in extreme conditions and therefore might be of interest in the context of biofuel production and other industrial applications.
biofuels; β-oxidation; halobacteria; methanogens; rhodopsin; bioenergetics
Formation of heat-resistant endospores is a specific property of the members of the phylum Firmicutes (low-G+C Gram-positive bacteria). It is found in representatives of four different classes of Firmicutes: Bacilli, Clostridia, Erysipelotrichia, and Negativicutes, which all encode similar sets of core sporulation proteins. Each of these classes also includes non-spore-forming organisms that sometimes belong to the same genus or even species as their spore-forming relatives. This chapter reviews the diversity of the members of phylum Firmicutes, its current taxonomy, and the status of genome sequencing projects for various subgroups within the phylum. It also discusses the evolution of the Firmicutes from their apparently spore-forming common ancestor and the independent loss of sporulation genes in several different lineages (staphylococci, streptococci, listeria, lactobacilli, ruminococci) in the course of their adaptation to the saprophytic lifestyle in nutrient-rich environments. It argues that systematics of Firmicutes is a rapidly developing area of research that benefits from the evolutionary approaches to the ever-increasing amount of genomic and phenotypic data and allows arranging these data into a common framework.
Later the Bacillus filaments begin to prepare for spore formation. In their homogenous contents strongly refracting bodies appear. From each of these bodies develops an oblong or shortly cylindrical, strongly refracting, dark-rimmed spore.
Ferdinand Cohn. 1876. Untersuchungen über Bacterien. IV. Beiträge zur Biologie der Bacillen. Beiträge zur Biologie der Pflanzen, vol. 2, pp. 249–276. (Studies on the biology of the bacilli. In: Milestones in Microbiology: 1546 to 1940. Translated and edited by Thomas D. Brock. Prentice-Hall, Englewood Cliffs, NJ, 1961, pp. 49–56).
In contrast to numerous enzymes involved in c-di-GMP synthesis and degradation in enterobacteria, only a handful of c-di-GMP receptors/effectors have been identified. In search of new c-di-GMP receptors, we screened the Escherichia coli ASKA overexpression gene library using the Differential Radial Capillary Action of Ligand Assay (DRaCALA) with fluorescently and radioisotope-labeled c-di-GMP. We uncovered three new candidate c-di-GMP receptors in E. coli and characterized one of them, BcsE. The bcsE gene is encoded in cellulose synthase operons in representatives of Gammaproteobacteria and Betaproteobacteria. The purified BcsE proteins from E. coli, Salmonella enterica and Klebsiella pneumoniae bind c-di-GMP via the domain of unknown function, DUF2819, which is hereby designated GIL, GGDEF I-site like domain. The RxGD motif of the GIL domain is required for c-di-GMP binding, similar to the c-di-GMP-binding I-site of the diguanylate cyclase GGDEF domain. Thus, GIL is the second protein domain, after PilZ, dedicated to c-di-GMP-binding. We show that in S. enterica, BcsE is not essential for cellulose synthesis but is required for maximal cellulose production, and that c-di-GMP binding is critical for BcsE function. It appears that cellulose production in enterobacteria is controlled by a two-tiered c-di-GMP-dependent system involving BcsE and the PilZ domain containing glycosyltransferase BcsA.
Binding of cytochrome c, released from the damaged mitochondria, to the apoptotic protease activating factor 1 (Apaf-1) is a key event in the apoptotic signaling cascade. The binding triggers a major domain rearrangement in Apaf-1, which leads to oligomerization of Apaf-1/cytochrome c complexes into an apoptosome. Despite the availability of crystal structures of cytochrome c and Apaf-1 and cryo-electron microscopy models of the entire apoptosome, the binding mode of cytochrome c to Apaf-1, as well as the nature of the amino acid residues of Apaf-1 involved remain obscure.
We investigated the interaction between cytochrome c and Apaf-1 by combining several modeling approaches. We have applied protein-protein docking and energy minimization, evaluated the resulting models of the Apaf-1/cytochrome c complex, and carried out a further analysis by means of molecular dynamics simulations. We ended up with a single model structure where all the lysine residues of cytochrome c that are known as functionally relevant were involved in forming salt bridges with acidic residues of Apaf-1. This model has revealed three distinctive bifurcated salt bridges, each involving a single lysine residue of cytochrome c and two neighboring acidic residues of Apaf-1. Salt bridge-forming amino acids of Apaf-1 showed a clear evolutionary pattern within Metazoa, with pairs of acidic residues of Apaf-1, involved in bifurcated salt bridges, reaching their highest numbers in the sequences of vertebrates, in which the cytochrome c-mediated mechanism of apoptosome formation seems to be typical.
The reported model of an Apaf-1/cytochrome c complex provides insights into the nature of protein-protein interactions which are hard to observe in crystallographic or electron microscopy studies. Bifurcated salt bridges can be expected to be stronger than simple salt bridges, and their formation might promote the conformational change of Apaf-1, leading to the formation of an apoptosome. Combination of structural and sequence analyses provides hints on the evolution of the cytochrome c-mediated apoptosis.
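The notion of a bifurcated salt bridge (one lysine paired with two neighboring acidic residues) can be illustrated with a minimal sketch. The 4.0 Å distance cutoff, the function name, and all residue labels and coordinates below are assumptions chosen for illustration; they are not taken from the reported model or from the authors' analysis pipeline.

```python
import math
from collections import defaultdict

# Assumed criterion: a lysine NZ nitrogen within 4.0 Å of an Asp/Glu carboxyl
# oxygen counts as a salt bridge. The source does not state a cutoff.
CUTOFF = 4.0


def bifurcated_salt_bridges(lys_atoms, acid_atoms, cutoff=CUTOFF):
    """Return {lysine_id: sorted acidic residue ids} for lysines that contact
    two or more distinct acidic residues within the cutoff.

    lys_atoms:  list of (residue_id, (x, y, z)) for lysine NZ atoms
    acid_atoms: list of (residue_id, (x, y, z)) for Asp/Glu carboxyl oxygens
    """
    partners = defaultdict(set)
    for lys_id, lys_xyz in lys_atoms:
        for acid_id, acid_xyz in acid_atoms:
            if math.dist(lys_xyz, acid_xyz) <= cutoff:
                partners[lys_id].add(acid_id)
    # Keep only lysines with at least two different acidic partners.
    return {k: sorted(v) for k, v in partners.items() if len(v) >= 2}


# Hypothetical coordinates (in Å), for illustration only.
lys = [("K_A", (0.0, 0.0, 0.0)), ("K_B", (20.0, 0.0, 0.0))]
acids = [
    ("D_1", (3.0, 0.0, 0.0)),   # close to K_A
    ("E_2", (0.0, 3.5, 0.0)),   # also close to K_A
    ("D_3", (23.0, 0.0, 0.0)),  # close to K_B only (a simple salt bridge)
]
result = bifurcated_salt_bridges(lys, acids)
print(result)
```

In this toy input, K_A is reported as bifurcated because it has two acidic partners, whereas K_B forms only a simple salt bridge and is omitted.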
This article was reviewed by Andrei L. Osterman, Narayanaswamy Srinivasan, Igor N. Berezovsky, and Gerrit Vriend (nominated by Martijn Huynen).
Electronic supplementary material
The online version of this article (doi:10.1186/s13062-015-0059-4) contains supplementary material, which is available to authorized users.
Apoptosis; WD40 domains; Hydrogen bond; Salt bridge; Protein-protein interactions; Caspase; Molecular dynamics simulations; Sequence analysis; Evolution
The availability of complete genome sequences of diverse bacteria and archaea makes comparative sequence analysis a powerful tool for analyzing signal transduction systems encoded in these genomes. However, most signal transduction proteins consist of two or more individual protein domains, which significantly complicates their functional annotation and makes automated annotation of these proteins in the course of large-scale genome sequencing projects particularly unreliable. We describe here certain common-sense protocols for sequence analysis of two-component histidine kinases and response regulators, as well as other components of the prokaryotic signal transduction machinery: Ser/Thr/Tyr protein kinases and protein phosphatases, adenylate and diguanylate cyclases and c-di-GMP phosphodiesterases. These protocols rely on publicly available computational tools and databases and can be utilized by anyone with Internet access.
Globin-coupled sensors (GCS) are heme-binding signal transducers in Bacteria and Archaea where an N-terminal globin controls the activity of a variable C-terminal domain. Here we report that BpeGReg, a globin-coupled diguanylate cyclase (GCDC) from the whooping-cough pathogen Bordetella pertussis, synthesizes the second messenger bis-(3’–5’)-cyclic diguanosine monophosphate (c-di-GMP) upon oxygen binding. Expression of BpeGReg in Salmonella typhimurium enhances biofilm formation, while knockout of the BpeGReg gene of B. pertussis results in decreased biofilm formation. These results represent the first identification of a gaseous ligand for any diguanylate cyclase and provide definitive experimental evidence that a globin-coupled sensor regulates c-di-GMP synthesis and biofilm formation. We propose that the synthesis of c-di-GMP by globin sensors is a widespread phenomenon in bacteria.
globin; oxygen sensor; c-di-GMP; diguanylate cyclase; biofilm
Comparative analysis of the complete genome sequences from a variety of poorly studied organisms aims at predicting ecological and behavioral properties of these organisms and helping to characterize their habitats. This task requires finding appropriate descriptors that could be correlated with the core traits of each system and would allow meaningful comparisons. Using the relatively simple bacterial models, first attempts have been made to introduce suitable metrics to describe the complexity of an organism’s signaling machinery, which included introducing the “bacterial IQ” score. Here, we use an updated census of prokaryotic signal transduction systems to improve this parameter and evaluate its consistency within selected bacterial phyla. We also introduce a more elaborate descriptor, a set of profiles of relative abundance of members of each family of signal transduction proteins encoded in each genome. We show that these family profiles are well conserved within each genus and are often consistent within families of bacteria. Thus, they reflect evolutionary relationships between organisms as well as individual adaptations of each organism to its specific ecological niche.
comparative genomics; evolution; protein phosphorylation; receptor; Mycobacterium; Shewanella
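A family profile of the kind described above can be sketched as a vector of relative abundances, one entry per family of signal transduction proteins. The example below is only an illustration: the counts are hypothetical, the family names are placeholders, and cosine similarity is an assumed comparison measure that the source does not mention.

```python
import math


def family_profile(counts):
    """Convert raw counts per protein family into relative abundances."""
    total = sum(counts.values())
    if total == 0:
        return {family: 0.0 for family in counts}
    return {family: n / total for family, n in counts.items()}


def cosine_similarity(p, q):
    """Cosine similarity of two profiles; families missing from one count as 0."""
    families = set(p) | set(q)
    dot = sum(p.get(f, 0.0) * q.get(f, 0.0) for f in families)
    norm_p = math.sqrt(sum(v * v for v in p.values()))
    norm_q = math.sqrt(sum(v * v for v in q.values()))
    if norm_p == 0 or norm_q == 0:
        return 0.0
    return dot / (norm_p * norm_q)


# Hypothetical counts of signaling proteins in three genomes.
genome_a = {"histidine_kinase": 30, "response_regulator": 32,
            "ser_thr_kinase": 2, "diguanylate_cyclase": 16}
genome_b = {"histidine_kinase": 15, "response_regulator": 16,
            "ser_thr_kinase": 1, "diguanylate_cyclase": 8}
genome_c = {"histidine_kinase": 5, "response_regulator": 6,
            "ser_thr_kinase": 20, "diguanylate_cyclase": 1}

profile_a = family_profile(genome_a)
profile_b = family_profile(genome_b)
profile_c = family_profile(genome_c)

sim_ab = cosine_similarity(profile_a, profile_b)
sim_ac = cosine_similarity(profile_a, profile_c)
print(f"a vs b: {sim_ab:.3f}, a vs c: {sim_ac:.3f}")
```

Because the profile is normalized by the total number of signaling proteins, genomes a and b have the same profile despite a twofold difference in absolute counts, while genome c, dominated by a different family, scores lower.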
The 2015 Nucleic Acids Research Database Issue contains 172 papers that include descriptions of 56 new molecular biology databases, and updates on 115 databases whose descriptions have been previously published in NAR or other journals. Following the classification that has been introduced last year in order to simplify navigation of the entire issue, these articles are divided into eight subject categories. This year's highlights include RNAcentral, an international community portal to various databases on noncoding RNA; ValidatorDB, a validation database for protein structures and their ligands; SASBDB, a primary repository for small-angle scattering data of various macromolecular complexes; MoonProt, a database of ‘moonlighting’ proteins, and two new databases of protein–protein and other macromolecular complexes, ComPPI and the Complex Portal. This issue also includes an unusually high number of cancer-related databases and other databases dedicated to genomic basics of disease and potential drugs and drug targets. The size of NAR online Molecular Biology Database Collection, http://www.oxfordjournals.org/nar/database/a/, remained approximately the same, following the addition of 74 new resources and removal of 77 obsolete web sites. The entire Database Issue is freely available online on the Nucleic Acids Research web site (http://nar.oxfordjournals.org/).
Microbial genome sequencing projects produce numerous sequences of deduced proteins, only a small fraction of which have been or will ever be studied experimentally. This leaves sequence analysis as the only feasible way to annotate these proteins and assign to them tentative functions. The Clusters of Orthologous Groups of proteins (COGs) database (http://www.ncbi.nlm.nih.gov/COG/), first created in 1997, has been a popular tool for functional annotation. Its success was largely based on (i) its reliance on complete microbial genomes, which allowed reliable assignment of orthologs and paralogs for most genes; (ii) orthology-based approach, which used the function(s) of the characterized member(s) of the protein family (COG) to assign function(s) to the entire set of carefully identified orthologs and describe the range of potential functions when there were more than one; and (iii) careful manual curation of the annotation of the COGs, aimed at detailed prediction of the biological function(s) for each COG while avoiding annotation errors and overprediction. Here we present an update of the COGs, the first since 2003, and a comprehensive revision of the COG annotations and expansion of the genome coverage to include representative complete genomes from all bacterial and archaeal lineages down to the genus level. This re-analysis of the COGs shows that the original COG assignments had an error rate below 0.5% and allows an assessment of the progress in functional genomics in the past 12 years. During this time, functions of many previously uncharacterized COGs have been elucidated and tentative functional assignments of many COGs have been validated, either by targeted experiments or through the use of high-throughput methods. A particularly important development is the assignment of functions to several widespread, conserved proteins many of which turned out to participate in translation, in particular rRNA maturation and tRNA modification. 
The new version of the COGs is expected to become an important tool for microbial genomics.
This review traces the evolution of the cytochrome bc complexes from their early spread among prokaryotic lineages and up to the mitochondrial cytochrome bc1 complex (complex III) and its role in apoptosis. The results of phylogenomic analysis suggest that the bacterial cytochrome b6f-type complexes with short cytochromes b were the ancient form that preceded in evolution the cytochrome bc1-type complexes with long cytochromes b. The common ancestor of the b6f-type and the bc1-type complexes probably resembled the b6f-type complexes found in Heliobacteriaceae and in some Planctomycetes. Lateral transfers of cytochrome bc operons could account for the several instances of acquisition of different types of bacterial cytochrome bc complexes by archaea. The gradual oxygenation of the atmosphere could be the key evolutionary factor that has driven further divergence and spread of the cytochrome bc complexes. On one hand, oxygen could be used as a very efficient terminal electron acceptor. On the other hand, auto-oxidation of the components of the bc complex results in the generation of reactive oxygen species (ROS), which necessitated diverse adaptations of the b6f-type and bc1-type complexes, as well as other, functionally coupled proteins. A detailed scenario of the gradual involvement of the cardiolipin-containing mitochondrial cytochrome bc1 complex into the intrinsic apoptotic pathway is proposed, where the functioning of the complex as an apoptotic trigger is viewed as a way to accelerate the elimination of the cells with irreparably damaged, ROS-producing mitochondria.
bioenergetics; molecular evolution; ubiquinol:cytochrome c oxidoreductase; ubiquinone; plastoquinone; cytochrome c; cardiolipin; cell death; photosynthesis; apoptosome
The availability of genome sequences from a variety of organisms presents an opportunity to apply this sequence information to solving the key problems of molecular biology. One of the principal roadblocks on this path is the lack of appropriate descriptors and metrics that could succinctly represent the new knowledge stemming from the genomic data. Several new metrics have recently been used in comparative genome analysis, yet challenges remain in finding an appropriate language for the emerging discipline of systems biology.
The class Clostridia in the phylum Firmicutes (formerly low-G+C Gram-positive bacteria) includes diverse bacteria of medical, environmental, and biotechnological importance. The Selenomonas-Megasphaera-Sporomusa branch, which unifies members of the Firmicutes with Gram-negative-type cell envelopes, was recently moved from Clostridia to a separate class Negativicutes. However, draft genome sequences of the spore-forming members of the Negativicutes revealed typically clostridial sets of sporulation genes. To address this and other questions in clostridial phylogeny, we have compared a phylogenetic tree for a concatenated set of 50 widespread ribosomal proteins with the trees for beta subunits of the RNA polymerase (RpoB) and DNA gyrase (GyrB) and with the 16S rRNA-based phylogeny. The results obtained by these methods showed remarkable consistency, suggesting that they reflect the true evolutionary history of these bacteria. These data put the Selenomonas-Megasphaera-Sporomusa group back within the Clostridia. They also support placement of Clostridium difficile and its close relatives within the family Peptostreptococcaceae; we suggest resolving the long-standing naming conundrum by renaming it Peptoclostridium difficile. These data also indicate the existence of a group of cellulolytic clostridia that belong to the family Ruminococcaceae. As a tentative solution to resolve the current taxonomical problems, we propose assigning 78 validly described Clostridium species that clearly fall outside the family Clostridiaceae to six new genera: Peptoclostridium, Lachnoclostridium, Ruminiclostridium, Erysipelatoclostridium, Gottschalkia, and Tyzzerella. This work reaffirms that 16S rRNA and ribosomal protein sequences are better indicators of evolutionary proximity than phenotypic traits, even such key ones as the structure of the cell envelope and Gram-staining pattern.
Sporulation; taxonomy; Gram staining; cellulose; xylan; Clostridium difficile
Sentra, a database of signal transduction proteins encoded in completely sequenced prokaryotic genomes, has been updated to reflect recent advances in understanding signal transduction events on a whole-genome scale. Sentra consists of two principal components, a manually curated list of signal transduction proteins in 202 completely sequenced prokaryotic genomes and an automatically generated listing of predicted signaling proteins in 235 sequenced genomes that are awaiting manual curation. In addition to two-component histidine kinases and response regulators, the database now lists manually curated Ser/Thr/Tyr protein kinases and protein phosphatases, as well as adenylate and diguanylate cyclases and c-di-GMP phosphodiesterases, as defined in several recent reviews. All entries in Sentra are extensively annotated with relevant information from public databases (e.g. UniProt, KEGG, PDB and NCBI). Sentra's infrastructure was redesigned to support interactive cross-genome comparisons of signal transduction capabilities of prokaryotic organisms from a taxonomic and phenotypic perspective and in the framework of signal transduction pathways from KEGG. Sentra leverages the PUMA2 system to support interactive analysis and annotation of signal transduction proteins by the users.
We have recently reconstructed the ‘hatcheries’ of the first cells by combining geochemical analysis with phylogenomic scrutiny of the inorganic ion requirements of universal components of modern cells (Mulkidjanian et al.: Origin of first cells at terrestrial, anoxic geothermal fields. Proc Natl Acad Sci USA 2012, 109:E821–830). These ubiquitous, and by inference primordial, proteins and functional systems show affinity to and functional requirement for K+, Zn2+, Mn2+, and phosphate. Thus, protocells must have evolved in habitats with a high K+/Na+ ratio and relatively high concentrations of Zn, Mn and phosphorus compounds. Geochemical reconstruction shows that the ionic composition conducive to the origin of cells could not have existed in marine settings but is compatible with emissions of vapor-dominated zones of inland geothermal systems. Under anoxic, CO2-dominated atmosphere, the ionic composition of pools of cool, condensed vapor at anoxic geothermal fields would resemble the internal milieu of modern cells. Such pools would be lined with porous silicate minerals mixed with metal sulfides and enriched in K+ ions and phosphorus compounds.
Here we address some questions that have appeared in print after the publication of our anoxic geothermal field scenario. We argue that anoxic geothermal fields, which were identified as likely cradles of life by using a top-down approach and phylogenomics analysis as a tool, could provide geochemical conditions similar to those which were suggested as most conducive for the emergence of life by the chemists who pursue the complementary
Any scenario of the transition from chemistry to biology should include an “energy module” because life can exist only when supported by energy flow(s). We addressed the problem of primordial energetics by combining physico-chemical considerations with phylogenomic analysis. We propose that the first replicators could use abiotically formed, exceptionally photostable activated nucleotides both as building blocks and as the main energy source. Nucleoside triphosphates could replace cyclic nucleotides as the principal energy-rich compounds at the stage of the first cells, presumably because the metal chelates of nucleoside triphosphates penetrated membranes much better than the respective metal complexes of nucleoside monophosphates. The ability to exploit natural energy flows for biogenic production of energy-rich molecules could evolve only gradually, after the emergence of sophisticated enzymes and ion-tight membranes. We argue that, in the course of evolution, sodium-dependent membrane energetics preceded the proton-based energetics which evolved independently in bacteria and archaea.
Over the last five years proteogenomics (using mass spectrometry to identify proteins predicted from genomic sequences) has emerged as a promising approach to the high-throughput identification of protein N-termini, which remains a problem in genome annotation. Comparison of the experimentally determined N-termini with those predicted by sequence analysis tools allows identification of the signal peptides and therefore conclusions on the cytoplasmic or extracytoplasmic (periplasmic or extracellular) localization of the respective proteins. We present here the results of a proteogenomic study of the signal peptides in Escherichia coli K-12 and compare its results with the available experimental data and predictions by such software tools as SignalP and Phobius. A single proteogenomics experiment recovered more than a third of all signal peptides that had been experimentally determined during the past three decades and confirmed at least 31 additional signal peptides, mostly in the known exported proteins, which had been previously predicted but not validated. The filtering of putative signal peptides for the peptide length and the presence of an eight-residue hydrophobic patch and a typical signal peptidase cleavage site proved sufficient to eliminate the false-positive hits. Surprisingly, the results of this proteogenomics study, as well as a re-analysis of the E. coli genome with the latest version of SignalP program, show that the fraction of proteins containing signal peptides is only about 10%, or half of previous estimates.
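The three filtering criteria named above (peptide length, an eight-residue hydrophobic patch, and a typical signal peptidase cleavage site) could be combined roughly as in the following sketch. Only the eight-residue window comes from the text. The length bounds, the hydrophobic alphabet, the simplified "small residue at positions -3 and -1" cleavage rule, and the test peptides are all assumptions made for illustration, not the filter used in the study.

```python
# Assumed residue classes (not specified in the source).
HYDROPHOBIC = set("AVLIMFWC")
SMALL_AT_MINUS_1 = set("AGS")
SMALL_AT_MINUS_3 = set("AVSTG")


def has_hydrophobic_patch(peptide, window=8):
    """True if any run of `window` consecutive residues is fully hydrophobic."""
    return any(
        all(res in HYDROPHOBIC for res in peptide[i:i + window])
        for i in range(len(peptide) - window + 1)
    )


def has_cleavage_site(peptide):
    """Simplified -3/-1 rule: small residues at the last and third-to-last
    positions of the putative signal peptide."""
    if len(peptide) < 3:
        return False
    return peptide[-1] in SMALL_AT_MINUS_1 and peptide[-3] in SMALL_AT_MINUS_3


def looks_like_signal_peptide(peptide, min_len=15, max_len=45):
    """Apply all three filters; min_len and max_len are assumed bounds."""
    return (
        min_len <= len(peptide) <= max_len
        and has_hydrophobic_patch(peptide)
        and has_cleavage_site(peptide)
    )


# Illustrative candidate N-terminal peptides.
candidates = [
    "MKKTAIAIAVALAGFATVAQA",  # hydrophobic core and A-x-A at the C-terminus
    "MSTNQDERKKDDEENQSA",     # no hydrophobic patch
    "MKKTAIAIAVALAGFATVAQK",  # hydrophobic core but no small residue at -1
]
for pep in candidates:
    print(pep, looks_like_signal_peptide(pep))
```

Requiring all three criteria at once is what makes such a filter conservative: a peptide that passes only one or two of the tests is rejected.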
Twenty-five years have passed since the discovery of cyclic dimeric (3′→5′) GMP (cyclic di-GMP or c-di-GMP). From the relative obscurity of an allosteric activator of a bacterial cellulose synthase, c-di-GMP has emerged as one of the most common and important bacterial second messengers. Cyclic di-GMP has been shown to regulate biofilm formation, motility, virulence, the cell cycle, differentiation, and other processes. Most c-di-GMP-dependent signaling pathways control the ability of bacteria to interact with abiotic surfaces or with other bacterial and eukaryotic cells. Cyclic di-GMP plays key roles in lifestyle changes of many bacteria, including transition from the motile to the sessile state, which aids in the establishment of multicellular biofilm communities, and from the virulent state in acute infections to the less virulent but more resilient state characteristic of chronic infectious diseases. From a practical standpoint, modulating c-di-GMP signaling pathways in bacteria could represent a new way of controlling formation and dispersal of biofilms in medical and industrial settings. Cyclic di-GMP participates in interkingdom signaling. It is recognized by mammalian immune systems as a uniquely bacterial molecule and therefore is considered a promising vaccine adjuvant. The purpose of this review is not to overview the whole body of data in the burgeoning field of c-di-GMP-dependent signaling. Instead, we provide a historic perspective on the development of the field, emphasize common trends, and illustrate them with the best available examples. We also identify unresolved questions and highlight new directions in c-di-GMP research that will give us a deeper understanding of this truly universal bacterial second messenger.
The 2014 Nucleic Acids Research Database Issue includes descriptions of 58 new molecular biology databases and recent updates to 123 databases previously featured in NAR or other journals. For convenience, the issue is now divided into eight sections that reflect major subject categories. Among the highlights of this issue are six databases of the transcription factor binding sites in various organisms and updates on such popular databases as CAZy, Database of Genomic Variants (DGV), dbGaP, DrugBank, KEGG, miRBase, Pfam, Reactome, SEED, TCDB and UniProt. There is a strong block of structural databases, which includes, among others, the new RNA Bricks database, updates on PDBe, PDBsum, ArchDB, Gene3D, ModBase, Nucleic Acid Database and the recently revived iPfam database. An update on the NCBI’s MMDB describes VAST+, an improved tool for protein structure comparison. Two articles highlight the development of the Structural Classification of Proteins (SCOP) database: one describes SCOPe, which automates assignment of new structures to the existing SCOP hierarchy; the other one describes the first version of SCOP2, with its more flexible approach to classifying protein structures. This issue also includes a collection of articles on bacterial taxonomy and metagenomics, which includes updates on the List of Prokaryotic Names with Standing in Nomenclature (LPSN), Ribosomal Database Project (RDP), the Silva/LTP project and several new metagenomics resources. The NAR online Molecular Biology Database Collection, http://www.oxfordjournals.org/nar/database/c/, has been expanded to 1552 databases. The entire Database Issue is freely available online on the Nucleic Acids Research website (http://nar.oxfordjournals.org/).
Planctomycetes, Verrucomicrobia and Chlamydia are prokaryotic phyla that are sometimes grouped together as the PVC superphylum of eubacteria. Some PVC species possess interesting attributes, in particular, internal membranes that superficially resemble eukaryotic endomembranes. Some biologists now claim that PVC bacteria are nucleus-bearing prokaryotes and that they are evolutionary intermediates in the transition from prokaryote to eukaryote. PVC prokaryotes do not possess a nucleus and are not intermediates in the prokaryote-to-eukaryote transition. All of the PVC traits that are currently cited as evidence for aspiring eukaryoticity are either analogous (the result of convergent evolution), not homologous, to eukaryotic traits; or else they are the result of lateral gene transfers. Here we summarize the evidence that shows why most of the purported similarities between the PVC bacteria and eukaryotes are analogous and the rest are a consequence of lateral gene acquisition.
Experimental data exists for only a vanishingly small fraction of sequenced microbial genes. This community page discusses the progress made by the COMBREX project to address this important issue using both computational and experimental resources.
The 20th annual Database Issue of Nucleic Acids Research includes 176 articles, half of which describe new online molecular biology databases and the other half provide updates on the databases previously featured in NAR and other journals. This year’s highlights include two databases of DNA repeat elements; several databases of transcriptional factors and transcriptional factor-binding sites; databases on various aspects of protein structure and protein–protein interactions; databases for metagenomic and rRNA sequence analysis; and four databases specifically dedicated to Escherichia coli. The increased emphasis on using the genome data to improve human health is reflected in the development of the databases of genomic structural variation (NCBI’s dbVar and EBI’s DGVa), the NIH Genetic Testing Registry and several other databases centered on the genetic basis of human disease, potential drugs, their targets and the mechanisms of protein–ligand binding. Two new databases present genomic and RNAseq data for monkeys, providing a wealth of data on our closest relatives for comparative genomics purposes. The NAR online Molecular Biology Database Collection, available at http://www.oxfordjournals.org/nar/database/a/, has been updated and currently lists 1512 online databases. The full content of the Database Issue is freely available online on the Nucleic Acids Research website (http://nar.oxfordjournals.org/).
The 19th annual Database Issue of Nucleic Acids Research features descriptions of 92 new online databases covering various areas of molecular biology and 100 papers describing recent updates to the databases previously described in NAR and other journals. The highlights of this issue include, among others, a description of neXtProt, a knowledgebase on human proteins; a detailed explanation of the principles behind the NCBI Taxonomy Database; NCBI and EBI papers on the recently launched BioSample databases that store sample information for a variety of database resources; descriptions of the recent developments in the Gene Ontology and UniProt Gene Ontology Annotation projects; updates on Pfam, SMART and InterPro domain databases; update papers on KEGG and TAIR, two universally acclaimed databases that face an uncertain future; and a separate section with 10 wiki-based databases, introduced in an accompanying editorial. The NAR online Molecular Biology Database Collection, available at http://www.oxfordjournals.org/nar/database/a/, has been updated and now lists 1380 databases. Brief machine-readable descriptions of the databases featured in this issue, according to the BioDBcore standards, will be provided at the http://biosharing.org/biodbcore web site. The full content of the Database Issue is freely available online on the Nucleic Acids Research web site (http://nar.oxfordjournals.org/).
Comparative analysis of the sequences of enzymes encoded in a variety of prokaryotic and eukaryotic genomes reveals convergence and divergence at several levels. Functional convergence can be inferred when structurally distinct and hence non-homologous enzymes show the ability to catalyze the same biochemical reaction. In contrast, as a result of functional diversification, many structurally similar enzyme molecules act on substantially distinct substrates and catalyze diverse biochemical reactions. Here, we present updates on the ATP-grasp, alkaline phosphatase, cupin, HD hydrolase, and N-terminal nucleophile (Ntn) hydrolase enzyme superfamilies and discuss the patterns of sequence and structural conservation and diversity within these superfamilies. Typically, enzymes within a superfamily possess common sequence motifs and key active site residues, as well as (predicted) reaction mechanisms. These observations suggest that the strained conformation (the entatic state) of the active site, which is responsible for the substrate binding and formation of the transition complex, tends to be conserved within enzyme superfamilies. The subsequent fate of the transition complex is not necessarily conserved and depends on the details of the structures of the enzyme and the substrate. This variability of reaction outcomes limits the ability of sequence analysis to predict the exact enzymatic activities of newly sequenced gene products. Nevertheless, sequence-based (super)family assignments and generic functional predictions, even if imprecise, provide valuable leads for experimental studies and remain the best approach to the functional annotation of uncharacterized proteins from new genomes.
Enzyme Catalysis; Enzyme Mechanisms; Enzyme Structure; Evolution; Phosphodiesterases; Convergence; Divergence
Cyclic diguanylate (c-di-GMP) is a ubiquitous second messenger regulating diverse cellular functions including motility, biofilm formation, cell cycle progression and virulence in bacteria. In the cell, degradation of c-di-GMP is catalyzed by highly specific EAL domain phosphodiesterases whose catalytic mechanism is still unclear. Here, we purified 13 EAL domain proteins from various organisms and demonstrated that their catalytic activity is associated with the presence of 10 conserved EAL domain residues. The crystal structure of the TBD1265 EAL domain was determined in a free state (1.8 Å) and in complex with c-di-GMP (2.35 Å) and unveiled the role of the conserved residues in substrate binding and catalysis. The structure revealed the presence of two metal ions directly coordinated by six conserved residues, two oxygens of the c-di-GMP phosphate, and a potential catalytic water molecule. Our results support a two-metal-ion catalytic mechanism of c-di-GMP hydrolysis by EAL domain phosphodiesterases.
EAL domain; cyclic di-GMP; phosphodiesterase; X-ray crystallography; Thiobacillus denitrificans
Binding of calcium ions (Ca2+) to proteins can have profound effects on their structure and function. Common roles of calcium binding include structure stabilization and regulation of activity. It is known that diverse families – EF-hands being one of at least twelve – use a Dx[DN]xDG linear motif to bind calcium in near-identical fashion. Here, four novel structural contexts for the motif are described. Existing experimental data for one of them, a thermophilic archaeal subtilisin, demonstrate for the first time a role for Dx[DN]xDG-bound calcium in protein folding. An integrin-like embedding of the motif in the blade of a β-propeller fold – here named the calcium blade – is discovered in structures of bacterial and fungal proteins. Furthermore, sensitive database searches suggest a common origin for the calcium blade in β-propeller structures of different sizes and a pan-kingdom distribution of these proteins. Factors favouring the multiple convergent evolution of the motif appear to include its general Asp-richness, the regular spacing of the Asp residues and the fact that change of Asp into Gly and vice versa can occur through a single nucleotide change. Among the known structural contexts for the Dx[DN]xDG motif, only the calcium blade and the EF-hand are currently found intracellularly in large numbers, perhaps because the higher extracellular concentration of Ca2+ allows for easier fixing of newly evolved motifs that have acquired useful functions. The analysis presented here will inform ongoing efforts toward prediction of similar calcium-binding motifs from sequence information alone.
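As a simple illustration of how the Dx[DN]xDG motif can be located in a sequence, the sketch below translates it into a regular expression and scans an invented test sequence; it also checks, using the standard genetic code, which Asp and Gly codons differ by a single nucleotide. The function name and the test sequence are assumptions, and a bare pattern match of this kind says nothing about whether a site actually binds calcium.

```python
import re

# Dx[DN]xDG written as a regular expression; "x" (any residue) becomes ".".
# The lookahead makes overlapping occurrences visible as separate hits.
MOTIF = re.compile(r"(?=(D.[DN].DG))")


def find_motifs(sequence):
    """Return (0-based start, matched hexapeptide) for every motif occurrence."""
    return [(m.start(), m.group(1)) for m in MOTIF.finditer(sequence)]


# Invented test sequence containing two motif instances.
seq = "AAFDKDGDGTITTKELGDVNADGS"
hits = find_motifs(seq)
print(hits)

# Asp and Gly codons in the standard genetic code (DNA alphabet).
ASP_CODONS = ("GAT", "GAC")
GLY_CODONS = ("GGT", "GGC", "GGA", "GGG")


def hamming(a, b):
    """Number of positions at which two equal-length codons differ."""
    return sum(x != y for x, y in zip(a, b))


# Codon pairs that interconvert Asp and Gly by a single nucleotide change.
single_step = [(a, g) for a in ASP_CODONS for g in GLY_CODONS
               if hamming(a, g) == 1]
print(single_step)
```

Each Asp codon has one Gly neighbor reachable by an A-to-G change at the second codon position, which is the single-nucleotide route between the two residues mentioned in the text.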