The 20th annual Database Issue of Nucleic Acids Research includes 176 articles, half of which describe new online molecular biology databases, while the other half provide updates on databases previously featured in NAR and other journals. This year’s highlights include two databases of DNA repeat elements; several databases of transcription factors and transcription factor-binding sites; databases on various aspects of protein structure and protein–protein interactions; databases for metagenomic and rRNA sequence analysis; and four databases specifically dedicated to Escherichia coli. The increased emphasis on using genome data to improve human health is reflected in the development of databases of genomic structural variation (NCBI’s dbVar and EBI’s DGVa), the NIH Genetic Testing Registry and several other databases centered on the genetic basis of human disease, potential drugs, their targets and the mechanisms of protein–ligand binding. Two new databases present genomic and RNAseq data for monkeys, providing a wealth of data on our closest relatives for comparative genomics purposes. The NAR online Molecular Biology Database Collection, available at http://www.oxfordjournals.org/nar/database/a/, has been updated and currently lists 1512 online databases. The full content of the Database Issue is freely available online on the Nucleic Acids Research website (http://nar.oxfordjournals.org/).
The availability of genome sequences from a variety of organisms presents an opportunity to apply this sequence information to solving the key problems of molecular biology. One of the principal roadblocks on this path is the lack of appropriate descriptors and metrics that could succinctly represent the new knowledge stemming from the genomic data. Several new metrics have recently been used in comparative genome analysis, yet challenges remain in finding an appropriate language for the emerging discipline of systems biology.
The 19th annual Database Issue of Nucleic Acids Research features descriptions of 92 new online databases covering various areas of molecular biology and 100 papers describing recent updates to the databases previously described in NAR and other journals. The highlights of this issue include, among others, a description of neXtProt, a knowledgebase on human proteins; a detailed explanation of the principles behind the NCBI Taxonomy Database; NCBI and EBI papers on the recently launched BioSample databases that store sample information for a variety of database resources; descriptions of the recent developments in the Gene Ontology and UniProt Gene Ontology Annotation projects; updates on Pfam, SMART and InterPro domain databases; update papers on KEGG and TAIR, two universally acclaimed databases that face an uncertain future; and a separate section with 10 wiki-based databases, introduced in an accompanying editorial. The NAR online Molecular Biology Database Collection, available at http://www.oxfordjournals.org/nar/database/a/, has been updated and now lists 1380 databases. Brief machine-readable descriptions of the databases featured in this issue, according to the BioDBcore standards, will be provided at the http://biosharing.org/biodbcore web site. The full content of the Database Issue is freely available online on the Nucleic Acids Research web site (http://nar.oxfordjournals.org/).
Comparative analysis of the sequences of enzymes encoded in a variety of prokaryotic and eukaryotic genomes reveals convergence and divergence at several levels. Functional convergence can be inferred when structurally distinct and hence non-homologous enzymes show the ability to catalyze the same biochemical reaction. In contrast, as a result of functional diversification, many structurally similar enzyme molecules act on substantially distinct substrates and catalyze diverse biochemical reactions. Here, we present updates on the ATP-grasp, alkaline phosphatase, cupin, HD hydrolase, and N-terminal nucleophile (Ntn) hydrolase enzyme superfamilies and discuss the patterns of sequence and structural conservation and diversity within these superfamilies. Typically, enzymes within a superfamily possess common sequence motifs and key active site residues, as well as (predicted) reaction mechanisms. These observations suggest that the strained conformation (the entatic state) of the active site, which is responsible for the substrate binding and formation of the transition complex, tends to be conserved within enzyme superfamilies. The subsequent fate of the transition complex is not necessarily conserved and depends on the details of the structures of the enzyme and the substrate. This variability of reaction outcomes limits the ability of sequence analysis to predict the exact enzymatic activities of newly sequenced gene products. Nevertheless, sequence-based (super)family assignments and generic functional predictions, even if imprecise, provide valuable leads for experimental studies and remain the best approach to the functional annotation of uncharacterized proteins from new genomes.
Enzyme Catalysis; Enzyme Mechanisms; Enzyme Structure; Evolution; Phosphodiesterases; Convergence; Divergence
Cyclic diguanylate (c-di-GMP) is a ubiquitous second messenger regulating diverse cellular functions including motility, biofilm formation, cell cycle progression and virulence in bacteria. In the cell, degradation of c-di-GMP is catalyzed by highly specific EAL domain phosphodiesterases whose catalytic mechanism is still unclear. Here, we purified 13 EAL domain proteins from various organisms and demonstrated that their catalytic activity is associated with the presence of 10 conserved EAL domain residues. The crystal structure of the TDB1265 EAL domain was determined in a free state (1.8 Å) and in complex with c-di-GMP (2.35 Å) and unveiled the role of the conserved residues in substrate binding and catalysis. The structure revealed the presence of two metal ions directly coordinated by six conserved residues, two oxygens of the c-di-GMP phosphate, and a potential catalytic water molecule. Our results support a two-metal-ion catalytic mechanism of c-di-GMP hydrolysis by EAL domain phosphodiesterases.
EAL domain; cyclic di-GMP; phosphodiesterase; X-ray crystallography; Thiobacillus denitrificans
Response regulators (RRs) within two-component signal transduction systems control a variety of cellular processes. Most RRs contain DNA-binding output domains and serve as transcriptional regulators. Other RR types contain RNA-binding, ligand-binding, protein-binding or transporter output domains and exert regulation at the transcriptional, post-transcriptional or post-translational levels. In a significant fraction of RRs, output domains are enzymes that themselves participate in signal transduction: methylesterases, adenylate or diguanylate cyclases, c-di-GMP-specific phosphodiesterases, histidine kinases, serine/threonine protein kinases and protein phosphatases. In addition, there remain output domains whose functions are still unknown. Patterns of the distribution of various RR families are generally conserved within key microbial lineages and can be used to trace adaptations of various species to their unique ecological niches.
protein domains; transcriptional regulation; protein phosphorylation; signal transduction; genome annotation; protein structure
Comparative analysis of the complete genome sequences from a variety of poorly studied organisms aims at predicting ecological and behavioral properties of these organisms and helping to characterize their habitats. This task requires finding appropriate descriptors that could be correlated with the core traits of each system and would allow meaningful comparisons. Using relatively simple bacterial models, first attempts have been made to introduce suitable metrics to describe the complexity of an organism’s signaling machinery, which included introducing the “bacterial IQ” score. Here, we use an updated census of prokaryotic signal transduction systems to improve this parameter and evaluate its consistency within selected bacterial phyla. We also introduce a more elaborate descriptor, a set of profiles of relative abundance of members of each family of signal transduction proteins encoded in each genome. We show that these family profiles are well conserved within each genus and are often consistent within families of bacteria. Thus, they reflect evolutionary relationships between organisms as well as individual adaptations of each organism to its specific ecological niche.
comparative genomics; evolution; protein phosphorylation; receptor; Mycobacterium; Shewanella
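The family profiles mentioned in the abstract above can be illustrated with a minimal sketch. It assumes that a profile is simply the fraction of a genome's signal transduction proteins falling into each family, and it compares two profiles by cosine similarity. The similarity measure, the family names and all counts below are hypothetical choices made for illustration; they are not taken from the paper and do not reproduce its "bacterial IQ" calculation.

```python
from math import sqrt


def family_profile(counts):
    """Convert raw per-family protein counts into relative abundances.

    The returned fractions sum to 1 unless the genome has no counted proteins.
    """
    total = sum(counts.values())
    if total == 0:
        return {family: 0.0 for family in counts}
    return {family: n / total for family, n in counts.items()}


def cosine_similarity(p, q):
    """Cosine similarity of two profiles; families missing from one count as 0."""
    families = set(p) | set(q)
    dot = sum(p.get(f, 0.0) * q.get(f, 0.0) for f in families)
    norm_p = sqrt(sum(v * v for v in p.values()))
    norm_q = sqrt(sum(v * v for v in q.values()))
    if norm_p == 0 or norm_q == 0:
        return 0.0
    return dot / (norm_p * norm_q)


# Hypothetical counts of signaling proteins per family (illustration only).
genome_a = {"histidine_kinase": 30, "response_regulator": 34,
            "chemoreceptor": 10, "diguanylate_cyclase": 26}
# Same proportions as genome_a in a genome with half as many signaling proteins.
genome_b = {"histidine_kinase": 15, "response_regulator": 17,
            "chemoreceptor": 5, "diguanylate_cyclase": 13}
# A genome with a different mix of families.
genome_c = {"histidine_kinase": 4, "response_regulator": 5,
            "ser_thr_kinase": 11}

profile_a = family_profile(genome_a)
profile_b = family_profile(genome_b)
profile_c = family_profile(genome_c)

sim_ab = cosine_similarity(profile_a, profile_b)
sim_ac = cosine_similarity(profile_a, profile_c)
print(f"a vs b: {sim_ab:.3f}, a vs c: {sim_ac:.3f}")
```

Because the profiles are normalized, two genomes with the same family proportions but different total numbers of signaling proteins get the same profile, which is what lets profiles be compared across genomes of different sizes.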
The rapidly accumulating genome sequence data allow researchers to address fundamental biological questions that were not even asked just a few years ago. A major problem in genomics is the widening gap between the rapid progress in genome sequencing and the comparatively slow progress in the functional characterization of sequenced genomes. Here we discuss two key questions of genome biology: whether we need more genomes, and how deep is our understanding of biology based on genomic analysis. We argue that overly specific annotations of gene functions are often less useful than the more generic, but also more robust, functional assignments based on protein family classification. We also discuss problems in understanding the functions of the remaining “conserved hypothetical” genes.
The present article proposes the adoption of a community-defined, uniform, generic description of the core attributes of biological databases, BioDBCore. The goals of these attributes are to provide a general overview of the database landscape, to encourage consistency and interoperability between resources, and to promote the use of semantic and syntactic standards. BioDBCore will make it easier for users to evaluate the scope and relevance of available resources. This new resource will increase the collective impact of the information present in biological databases.
The current 18th Database Issue of Nucleic Acids Research features descriptions of 96 new and 83 updated online databases covering various areas of molecular biology. It includes two editorials, one that discusses COMBREX, a new exciting project aimed at figuring out the functions of the ‘conserved hypothetical’ proteins, and one concerning BioDBcore, a proposed description of the ‘minimal information about a biological database’. Papers from the members of the International Nucleotide Sequence Database collaboration (INSDC) describe each of the participating databases, DDBJ, ENA and GenBank, principles of data exchange within the collaboration, and the recently established Sequence Read Archive. A testament to the longevity of databases, this issue includes updates on the RNA modification database, Definition of Secondary Structure of Proteins (DSSP) and Homology-derived Secondary Structure of Proteins (HSSP) databases, which have not been featured here in >12 years. There is also a block of papers describing recent progress in protein structure databases, such as Protein DataBank (PDB), PDB in Europe (PDBe), CATH, SUPERFAMILY and others, as well as databases on protein structure modeling, protein–protein interactions and the organization of inter-protein contact sites. Other highlights include updates of the popular gene expression databases, GEO and ArrayExpress, several cancer gene databases and a detailed description of the UK PubMed Central project. The Nucleic Acids Research online Database Collection, available at: http://www.oxfordjournals.org/nar/database/a/, now lists 1330 carefully selected molecular biology databases. The full content of the Database Issue is freely available online at the Nucleic Acids Research web site (http://nar.oxfordjournals.org/).
An analysis of the distribution of the Na+-translocating ATPases/ATP synthases among microbial genomes identified an atypical form of the F1Fo-type ATPase that is present in the archaea Methanosarcina barkeri and M. acetivorans, in a number of phylogenetically diverse marine and halotolerant bacteria and in the pathogens Burkholderia spp. In complete genomes, representatives of this form (referred to here as N-ATPase) are always present as second copies, in addition to the typical proton-translocating ATP synthases. The N-ATPase is encoded by a highly conserved atpDCQRBEFAG operon and its subunits cluster separately from the equivalent subunits of the typical F-type ATPases. N-ATPase c subunits carry a full set of sodium-binding residues, indicating that most of these enzymes are Na+-translocating ATPases that likely confer on their hosts the ability to extrude Na+ ions. Other distinctive properties of the N-ATPase operons include the absence of the delta subunit from the cytoplasmic sector of the enzyme and the presence of two additional membrane subunits, AtpQ (formerly gene 1) and AtpR (formerly gene X). We argue that N-ATPases are an early-diverging branch of membrane ATPases that, similarly to the eukaryotic V-type ATPases, do not synthesize ATP.
Supplementary information: Supplementary data are available at Bioinformatics online.
Evolutionarily unrelated proteins that catalyze the same biochemical reactions are often referred to as analogous - as opposed to homologous - enzymes. The existence of numerous alternative, non-homologous enzyme isoforms presents an interesting evolutionary problem; it also complicates genome-based reconstruction of the metabolic pathways in a variety of organisms. In 1998, a systematic search for analogous enzymes resulted in the identification of 105 Enzyme Commission (EC) numbers that included two or more proteins without detectable sequence similarity to each other, including 34 EC nodes where proteins were known (or predicted) to have distinct structural folds, indicating independent evolutionary origins. In the past 12 years, many putative non-homologous isofunctional enzymes were identified in newly sequenced genomes. In addition, efforts in structural genomics resulted in a vastly improved structural coverage of proteomes, providing for definitive assessment of (non)homologous relationships between proteins.
We report the results of a comprehensive search for non-homologous isofunctional enzymes (NISE) that yielded 185 EC nodes with two or more experimentally characterized - or predicted - structurally unrelated proteins. Of these NISE sets, only 74 were from the original 1998 list. Structural assignments of the NISE show over-representation of proteins with the TIM barrel fold and the nucleotide-binding Rossmann fold. From the functional perspective, the set of NISE is enriched in hydrolases, particularly carbohydrate hydrolases, and in enzymes involved in defense against oxidative stress.
These results indicate that at least some of the non-homologous isofunctional enzymes were recruited relatively recently from enzyme families that are active against related substrates and are sufficiently flexible to accommodate changes in substrate specificity.
This article was reviewed by Andrei Osterman, Keith F. Tipton (nominated by Martijn Huynen) and Igor B. Zhulin. For the full reviews, go to the Reviewers' comments section.
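The selection criterion described in the abstract above (EC nodes with two or more structurally unrelated proteins) can be sketched as a simple grouping step. This is only an illustration under stated assumptions: it treats differing fold labels as a stand-in for "structurally unrelated", whereas the actual study relied on detailed sequence and structure comparison. The function name, protein identifiers, EC labels and fold labels are all hypothetical placeholders, not data from the paper.

```python
from collections import defaultdict


def find_nise_candidates(annotations):
    """Return EC nodes that contain proteins with two or more distinct folds.

    `annotations` is an iterable of (protein_id, ec_number, fold_label) tuples.
    Distinct fold labels serve here as a simplified proxy for the absence of
    a homologous relationship.
    """
    folds_by_ec = defaultdict(set)
    for _protein_id, ec_number, fold_label in annotations:
        folds_by_ec[ec_number].add(fold_label)
    return {
        ec: sorted(folds)
        for ec, folds in folds_by_ec.items()
        if len(folds) >= 2
    }


# Placeholder annotations (illustration only; not real EC numbers or proteins).
example_annotations = [
    ("protein_1", "EC-A", "TIM_barrel"),
    ("protein_2", "EC-A", "Rossmann_fold"),
    ("protein_3", "EC-B", "TIM_barrel"),
    ("protein_4", "EC-B", "TIM_barrel"),
    ("protein_5", "EC-C", "beta_propeller"),
]

candidates = find_nise_candidates(example_annotations)
for ec, folds in sorted(candidates.items()):
    print(ec, folds)
```

In this toy input only the first EC node qualifies, since the second contains two proteins with the same fold and the third contains a single protein.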
The current issue of Nucleic Acids Research includes descriptions of 58 new and 73 updated data resources. The accompanying online Database Collection, available at http://www.oxfordjournals.org/nar/database/a/, now lists 1230 carefully selected databases covering various aspects of molecular and cell biology. While most data resource descriptions remain very brief, the issue includes several longer papers that highlight recent significant developments in such databases as Pfam, MetaCyc, UniProt, ELM and PDBe. The databases described in the Database Issue and Database Collection, however, are far more than a distinct set of resources; they form a network of connected data, concepts and shared technology. The full content of the Database Issue is available online at the Nucleic Acids Research web site (http://nar.oxfordjournals.org/).
Single domain response regulators (SD-RRs) are signaling components of two-component phosphorylation pathways that harbor a phosphoryl receiver domain but lack a dedicated output domain. The E. coli protein CheY, the paradigm member of this family, regulates chemotaxis by relaying information between chemoreceptors and the flagellar switch. New data provide a more complex picture of CheY-mediated motility control in several bacteria and suggest diverging mechanisms in control of cellular motors. Moreover, advances have been made in understanding cellular functions of SD-RRs beyond chemotaxis. We review recent reports indicating that SD-RRs constitute a family of versatile molecular switches that contribute to cellular organization and dynamics as spatial organizers and/or as allosteric regulators of histidine protein kinases.
two-component systems; single-domain response regulators; receiver domain; chemotaxis; Caulobacter
Studies of the past several decades have provided major insights into the structural organization of biological membranes and mechanisms of many membrane molecular machines. However, the origin(s) of the membrane(s) and membrane proteins remain enigmatic. We discuss different concepts of the origin and early evolution of membranes, with a focus on the evolution of the (im)permeability to charged molecules, such as proteins and nucleic acids, and small ions. Reconstruction of the evolution of F-type and A/V-type membrane ATPases (ATP synthases), which are either proton or sodium-dependent, might help understand not only the origin of membrane bioenergetics, but also of membranes themselves. We argue that evolution of biological membranes occurred as a process of co-evolution of lipid bilayers, membrane proteins and membrane bioenergetics.
The accompanying article (A.Y. Mulkidjanian, Biology Direct 4:26) puts forward a detailed hypothesis on the role of zinc sulfide (ZnS) in the origin of life on Earth. The hypothesis suggests that life emerged within compartmentalized, photosynthesizing ZnS formations of hydrothermal origin (the Zn world), assembled in sub-aerial settings on the surface of the primeval Earth.
If life started within photosynthesizing ZnS compartments, it should have been able to evolve under the conditions of elevated levels of Zn2+ ions, byproducts of the ZnS-mediated photosynthesis. Therefore, the Zn world hypothesis leads to a set of testable predictions regarding the specific roles of Zn2+ ions in modern organisms, particularly in RNA and protein structures related to the processing of RNA and the "evolutionarily old" cellular functions. We checked these predictions using publicly available data and obtained evidence suggesting that the development of the primeval life forms up to the stage of the Last Universal Common Ancestor proceeded in zinc-rich settings. Testing of the hypothesis has revealed the possible supportive role of manganese sulfide in the primeval photosynthesis. In addition, we demonstrate the explanatory power of the Zn world concept by elucidating several points that have so far remained without an acceptable rationalization. In particular, this concept implies a new scenario for the separation of Bacteria and Archaea and the origin of Eukarya.
The ability of the Zn world hypothesis to generate non-trivial verifiable predictions and explain previously obscure items gives credence to its key postulate that the development of the first life forms started within zinc-rich formations of hydrothermal origin and was driven by solar UV irradiation. This concept implies that the geochemical conditions conducive to the origin of life may have persisted only as long as the atmospheric CO2 pressure remained above ca. 10 bar. This work envisions the first Earth biotopes as photosynthesizing and habitable areas of porous ZnS and MnS precipitates around primeval hot springs. Further work will be needed to provide details on the life within these communities and to elucidate the primordial (bio)chemical reactions.
This article was reviewed by Arcady Mushegian, Eugene Koonin, and Patrick Forterre. For the full reviews, please go to the Reviewers' reports section.
Transcriptional regulators containing the LytTR-type DNA-binding domain control production of virulence factors in several bacterial pathogens. In this issue of Structure, Ann Stock and colleagues report the crystal structure of this elusive domain in complex with its DNA target.
All living cells routinely expel Na+ ions, maintaining a lower concentration of Na+ in the cytoplasm than in the surrounding milieu. In the vast majority of bacteria, as well as in mitochondria and chloroplasts, export of Na+ occurs at the expense of the proton-motive force. Some bacteria, however, possess primary generators of the transmembrane electrochemical gradient of Na+ (sodium-motive force). These primary Na+ pumps have been traditionally seen as adaptations to high external pH or to high temperature. Subsequent studies revealed, however, the mechanisms for primary sodium pumping in a variety of non-extremophiles, such as marine bacteria and certain bacterial pathogens. Further, many alkaliphiles and hyperthermophiles were shown to rely on H+, not Na+, as the coupling ion. We review here the recent progress in understanding the role of the sodium-motive force, including (i) the conclusion on the evolutionary primacy of the sodium-motive force as an energy intermediate, (ii) the mechanisms, evolutionary advantages and limitations of switching from Na+ to H+ as the coupling ion, and (iii) the possible reasons why certain pathogenic bacteria still rely on the sodium-motive force.
The current issue of Nucleic Acids Research includes descriptions of 179 databases, of which 95 are new. These databases (along with several molecular biology databases described in other journals) have been included in the Nucleic Acids Research online Molecular Biology Database Collection, bringing the total number of databases in the collection to 1170. In this introductory comment, we briefly describe some of these new databases and review the principles guiding the selection of databases for inclusion in the Nucleic Acids Research annual Database Issue and the Nucleic Acids Research online Molecular Biology Database Collection. The complete database list and summaries are available online at the Nucleic Acids Research web site (http://nar.oxfordjournals.org/).
Sporulation in low-G+C Gram-positive bacteria (Firmicutes) is an important survival mechanism that involves up to 150 genes, acting in a highly regulated manner. Many sporulation genes have close homologs in non-sporulating bacteria, including cyanobacteria, proteobacteria and spirochaetes, indicating that their products play a wider biological role. Most of them have been characterized as regulatory proteins or enzymes of peptidoglycan turnover; functions of others remain unknown but they are likely to have a general role in cell division and/or development. We have compiled a list of such widely conserved sporulation and germination proteins with poorly characterized functions, ranked them by the breadth of their phylogenetic distribution, and performed detailed sequence analysis and, where possible, structural modeling aimed at estimating their potential functions. Here we report the results of sequence analysis of Bacillus subtilis spore germination protein GerM, suggesting that it is a widespread cell development protein, whose function might involve binding to peptidoglycan. GerM consists of two tandem copies of a new domain (designated the GERMN domain) that forms phylum-specific fusions with two other newly described domains, GERMN-associated domains 1 and 2 (GMAD1 and GMAD2). Fold recognition reveals a β-propeller fold for GMAD1, while ab initio modeling suggests that GMAD2 adopts a fibronectin type III fold. SpoVS is predicted to adopt the AlbA archaeal chromatin protein fold, which suggests that it is a DNA-binding protein, most likely a novel transcriptional regulator.
Supplementary information: Supplementary data are available at ftp://ftp.ncbi.nih.gov/pub/galperin/Sporulation.html
The F- and V-type ATPases are rotary molecular machines that couple translocation of protons or sodium ions across the membrane to the synthesis or hydrolysis of ATP. Both the F-type (found in most bacteria and eukaryotic mitochondria and chloroplasts) and V-type (found in archaea, some bacteria, and eukaryotic vacuoles) ATPases can translocate either protons or sodium ions. The prevalent proton-dependent ATPases are generally viewed as the primary form of the enzyme whereas the sodium-translocating ATPases of some prokaryotes are usually construed as an exotic adaptation to survival in extreme environments.
We combine structural and phylogenetic analyses to clarify the evolutionary relation between the proton- and sodium-translocating ATPases. A comparison of the structures of the membrane-embedded oligomeric proteolipid rings of sodium-dependent F- and V-ATPases reveals nearly identical sets of amino acids involved in sodium binding. We show that the sodium-dependent ATPases are scattered among proton-dependent ATPases in both the F- and the V-branches of the phylogenetic tree.
Barring convergent emergence of the same set of ligands in several lineages, these findings indicate that the use of a sodium gradient for ATP synthesis is the ancestral modality of membrane bioenergetics. Thus, a primitive, sodium-impermeable but proton-permeable cell membrane that harboured a set of sodium-transporting enzymes appears to have been the evolutionary predecessor of the more structurally demanding proton-tight membranes. The use of protons as the coupling ion appears to be a later innovation that emerged on several independent occasions.
This article was reviewed by J. Peter Gogarten, Martijn A. Huynen, and Igor B. Zhulin. For the full reviews, please go to the Reviewers' comments section.