1.  The 24th annual Nucleic Acids Research database issue: a look back and upcoming changes 
Nucleic Acids Research  2016;45(Database issue):D1-D11.
This year's Database Issue of Nucleic Acids Research contains 152 papers that include descriptions of 54 new databases and update papers on 98 databases, of which 16 have not been previously featured in NAR. As always, these databases cover a broad range of molecular biology subjects, including genome structure, gene expression and its regulation, proteins, protein domains, and protein–protein interactions. Following the recent trend, an increasing number of new and established databases deal with human health, from cancer-causing mutations to drugs and drug targets. In accordance with this trend, the three recently compiled databases selected by NAR reviewers and editors as ‘breakthrough’ contributions (denovo-db, the Monarch Initiative, and Open Targets) cover human de novo gene variants, disease-related phenotypes in model organisms, and a bioinformatics platform for therapeutic target identification and validation, respectively. We expect these databases to attract the attention of numerous researchers working in various areas of genetics and genomics. Looking back at the past 12 years, we present here the ‘golden set’ of databases that have consistently served as authoritative, comprehensive, and convenient data resources widely used by the entire community, and we offer some lessons on what makes a successful database. The Database Issue is freely available online at the web site. An updated version of the NAR Molecular Biology Database Collection is available at
PMCID: PMC5210597  PMID: 28053160
2.  Diversity of Cyclic Di-GMP-Binding Proteins and Mechanisms 
Journal of Bacteriology  2015;198(1):32-46.
Cyclic di-GMP (c-di-GMP) synthetases and hydrolases (GGDEF, EAL, and HD-GYP domains) can be readily identified in bacterial genome sequences by using standard bioinformatic tools. In contrast, identification of c-di-GMP receptors remains a difficult task, and the current list of experimentally characterized c-di-GMP-binding proteins is likely incomplete. Several classes of c-di-GMP-binding proteins have been structurally characterized; for some others, the binding sites have been identified; and for several potential c-di-GMP receptors, the binding sites remain to be determined. We present here a comparative structural analysis of c-di-GMP-protein complexes that aims to discern the common themes in the binding mechanisms that allow c-di-GMP receptors to bind it with (sub)micromolar affinities despite the 1,000-fold excess of GTP. The available structures show that most receptors use their Arg and Asp/Glu residues to bind c-di-GMP monomers, dimers, or tetramers with stacked guanine bases. The only exceptions are the EAL domains, which bind c-di-GMP monomers in an extended conformation. We show that in c-di-GMP-binding signature motifs, Arg residues bind to the O-6 and N-7 atoms at the Hoogsteen edge of the guanine base, while Asp/Glu residues bind the N-1 and N-2 atoms at its Watson-Crick edge. In addition, Arg residues participate in stacking interactions with the guanine bases of c-di-GMP and the aromatic rings of Tyr and Phe residues. This may account for the presence of Arg residues in the active sites of every receptor protein that binds stacked c-di-GMP. We also discuss the implications of these structural data for an improved understanding of c-di-GMP signaling mechanisms.
PMCID: PMC4686193  PMID: 26055114
3.  Nucleotide binding by the widespread high-affinity cyclic di-GMP receptor MshEN domain 
Nature Communications  2016;7:12481.
C-di-GMP is a bacterial second messenger regulating various cellular functions. Many bacteria contain c-di-GMP-metabolizing enzymes but lack known c-di-GMP receptors. Recently, two MshE-type ATPases associated with the bacterial type II secretion system and type IV pilus formation were shown to specifically bind c-di-GMP. Here we report the crystal structure of the MshE N-terminal domain (MshEN1-145) from Vibrio cholerae in complex with c-di-GMP at 1.37 Å resolution. This structure reveals a unique c-di-GMP-binding mode, featuring a tandem array of two highly conserved binding motifs, each comprising a 24-residue sequence RLGxx(L/V/I)(L/V/I)xxG(L/V/I)(L/V/I)xxxxLxxxLxxQ that binds half of the c-di-GMP molecule, primarily through hydrophobic interactions. Mutating these highly conserved residues markedly reduces c-di-GMP binding and biofilm formation by V. cholerae. This c-di-GMP-binding motif is present in diverse bacterial proteins, with binding affinities ranging from 0.5 μM to as low as 14 nM. The MshEN domain contains the longest nucleotide-binding motif reported to date.
Cyclic-di-GMP is a bacterial second messenger that binds to the regulatory domain of ATPases of some bacteria. Here, the authors report the crystal structure of this interaction, identify a cyclic-di-GMP binding mode, and show that this interaction might be important for bacterial biofilm formation.
PMCID: PMC5013675  PMID: 27578558
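The MshEN abstract above spells out its 24-residue consensus in the usual notation, where x is any residue and (L/V/I) is one of Leu, Val or Ile. As a minimal sketch, assuming a plain exact-pattern scan is wanted, that notation can be translated into a regular expression and run over a protein sequence. The helper names and the test sequence are hypothetical, and a strict pattern match is only a first filter: divergent homologs would normally be found with profile-based searches, and a hit says nothing by itself about c-di-GMP binding.

```python
import re

# Consensus quoted in the MshEN abstract; x = any residue, (L/V/I) = Leu, Val or Ile.
MSHEN_CONSENSUS = "RLGxx(L/V/I)(L/V/I)xxG(L/V/I)(L/V/I)xxxxLxxxLxxQ"


def consensus_to_regex(consensus: str) -> str:
    """Translate a consensus string into a regular-expression pattern (hypothetical helper)."""
    parts = []
    i = 0
    while i < len(consensus):
        ch = consensus[i]
        if ch == "(":
            # "(L/V/I)" becomes the character class "[LVI]"
            end = consensus.index(")", i)
            parts.append("[" + consensus[i + 1:end].replace("/", "") + "]")
            i = end + 1
        elif ch == "x":
            parts.append(".")  # any residue
            i += 1
        else:
            parts.append(ch)  # fixed residue
            i += 1
    return "".join(parts)


# Wrapping the pattern in a lookahead lets overlapping matches be reported too.
MSHEN_RE = re.compile("(?=(" + consensus_to_regex(MSHEN_CONSENSUS) + "))")


def find_mshen_motifs(sequence: str) -> list[tuple[int, str]]:
    """Return (0-based start, matched 24-mer) for every exact consensus hit."""
    return [(m.start(), m.group(1)) for m in MSHEN_RE.finditer(sequence.upper())]


print(consensus_to_regex(MSHEN_CONSENSUS))
print(find_mshen_motifs("MKRLGAALLAAGLLAAAALAAALAAQK"))  # made-up test sequence
```

Because MshEN carries two such motifs in tandem, a real scan would look for two hits in close succession rather than a single match.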
4.  Systematic Nomenclature for GGDEF and EAL Domain-Containing Cyclic Di-GMP Turnover Proteins of Escherichia coli 
Journal of Bacteriology  2015;198(1):7-11.
In recent years, Escherichia coli has served as one of a few model bacterial species for studying cyclic di-GMP (c-di-GMP) signaling. The widely used E. coli K-12 laboratory strains possess 29 genes encoding proteins with GGDEF and/or EAL domains, which include 12 diguanylate cyclases (DGC), 13 c-di-GMP-specific phosphodiesterases (PDE), and 4 “degenerate” enzymatically inactive proteins. In addition, six new GGDEF and EAL (GGDEF/EAL) domain-encoding genes, which encode two DGCs and four PDEs, have recently been found in genomic analyses of commensal and pathogenic E. coli strains. As a group of researchers who have been studying the molecular mechanisms and the genomic basis of c-di-GMP signaling in E. coli, we now propose a general and systematic dgc and pde nomenclature for the enzymatically active GGDEF/EAL domain-encoding genes of this model species. This nomenclature is intuitive and easy to memorize, and it can also be applied to additional genes and proteins that might be discovered in various strains of E. coli in future studies.
PMCID: PMC4686207  PMID: 26148715
5.  Bacterial cellulose biosynthesis: diversity of operons, subunits, products and functions 
Trends in microbiology  2015;23(9):545-557.
Recent studies of bacterial cellulose biosynthesis, including structural characterization of a functional cellulose synthase complex, provided the first mechanistic insight into this fascinating process. In most studied bacteria, just two subunits, BcsA and BcsB, are necessary and sufficient for the formation of the polysaccharide chain in vitro. Other subunits – which differ among various taxa – affect the enzymatic activity and product yield in vivo by modulating expression of the biosynthesis apparatus, export of the nascent β-D-glucan polymer to the cell surface, and the organization of cellulose fibers into a higher-order structure. These auxiliary subunits play key roles in determining the quantity and structure of the resulting biofilm, which is particularly important for interactions of bacteria with higher organisms that lead to rhizosphere colonization and modulate virulence of cellulose-producing bacterial pathogens inside and outside of host cells. Here we review the organization of four principal types of cellulose synthase operons found in various bacterial genomes, identify additional bcs genes that encode likely components of the cellulose biosynthesis and secretion machinery, and propose a unified nomenclature for these genes and subunits. We also discuss the role of cellulose as a key component of biofilms formed by a variety of free-living and pathogenic bacteria and, for the latter, in the choice between acute infection and persistence in the host.
PMCID: PMC4676712  PMID: 26077867
bacterial genomes; bacterial host interaction; biofilm structure; environmental bacteria; polysaccharide export; nanocellulose
6.  Phylogenomic reconstruction of archaeal fatty acid metabolism 
Environmental microbiology  2014;16(4):907-918.
While certain archaea appear to synthesize and/or metabolize fatty acids, the respective pathways still remain obscure. By analyzing the genomic distribution of the key lipid-related enzymes, we were able to identify the likely components of the archaeal pathway of fatty acid metabolism, namely, a combination of the enzymes of bacterial-type β-oxidation of fatty acids (acyl-CoA dehydrogenase, enoyl-CoA hydratase, and 3-hydroxyacyl-CoA dehydrogenase) with paralogs of the archaeal acetyl-CoA C-acetyltransferase, an enzyme of the mevalonate biosynthesis pathway. These three β-oxidation enzymes working in the reverse direction could potentially catalyze biosynthesis of fatty acids, with paralogs of acetyl-CoA C-acetyltransferase performing addition of C2 fragments. The presence in archaea of the genes for energy-transducing membrane enzyme complexes, such as cytochrome bc complex, cytochrome c oxidase, and diverse rhodopsins, was found to correlate with the presence of the proposed system of fatty acid biosynthesis. We speculate that because these membrane complexes functionally depend on fatty acid chains, their genes could have been acquired via lateral gene transfer from bacteria only by those archaea that already possessed a system of fatty acid biosynthesis. The proposed pathway of archaeal fatty acid metabolism operates in extreme conditions and therefore might be of interest in the context of biofuel production and other industrial applications.
PMCID: PMC4019937  PMID: 24818264
biofuels; β-oxidation; halobacteria; methanogens; rhodopsin; bioenergetics
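The archaeal fatty acid abstract above reports that the presence of genes for membrane energy-transducing complexes correlates with the presence of the proposed fatty acid biosynthesis system across genomes. The abstract does not say which statistic was used, so the sketch below assumes one common choice for two presence/absence profiles, the phi coefficient computed from a 2×2 contingency table. The function name and the toy profiles are hypothetical and are not data from the paper.

```python
from math import sqrt


def phi_coefficient(profile_a: list[bool], profile_b: list[bool]) -> float:
    """Phi correlation between two presence/absence profiles (assumed statistic).

    profile_a[i] and profile_b[i] say whether genome i encodes trait A / trait B.
    Returns a value in [-1, 1]; 1 means the traits always co-occur.
    """
    if len(profile_a) != len(profile_b):
        raise ValueError("profiles must cover the same genomes")
    pairs = list(zip(profile_a, profile_b))
    n11 = sum(a and b for a, b in pairs)          # both present
    n10 = sum(a and not b for a, b in pairs)      # only A present
    n01 = sum(b and not a for a, b in pairs)      # only B present
    n00 = sum(not a and not b for a, b in pairs)  # both absent
    denom = sqrt((n11 + n10) * (n01 + n00) * (n11 + n01) * (n10 + n00))
    if denom == 0:
        return 0.0  # a constant profile leaves phi undefined; report 0 here
    return (n11 * n00 - n10 * n01) / denom


# Toy profiles over six hypothetical genomes: fatty acid system vs. a rhodopsin gene.
fatty_acid_system = [True, True, True, False, False, False]
rhodopsin_gene = [True, True, False, False, False, False]
print(phi_coefficient(fatty_acid_system, rhodopsin_gene))
```

A correlation of this kind only shows co-occurrence; the lateral-transfer scenario in the abstract is an interpretation layered on top of it.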
7.  The 2016 database issue of Nucleic Acids Research and an updated molecular biology database collection 
Nucleic Acids Research  2015;44(Database issue):D1-D6.
The 2016 Database Issue of Nucleic Acids Research starts with overviews of the resources provided by three major bioinformatics centers, the U.S. National Center for Biotechnology Information (NCBI), the European Bioinformatics Institute (EMBL-EBI) and the Swiss Institute of Bioinformatics (SIB). Also included are descriptions of 62 new databases and updates on 95 databases that have been previously featured in NAR, plus 17 previously described elsewhere. A number of papers in this issue deal with resources on nucleic acids, including various kinds of non-coding RNAs and their interactions, molecular dynamics simulations of nucleic acid structure, and two databases of super-enhancers. The protein database section features important updates on the EBI's Pfam, PDBe and PRIDE databases, as well as a variety of resources on pathways, metabolomics and metabolic modeling. This issue also includes updates on popular metagenomics resources, such as MG-RAST, EBI Metagenomics, and probeBASE, as well as a newly compiled Human Pan-Microbe Communities database. A significant fraction of the new and updated databases are dedicated to the genetic basis of disease, primarily cancer, and various aspects of drug research, including resources for patented drugs, their side effects, withdrawn drugs, and potential drug targets. A further six papers present updated databases of various antimicrobial and anticancer peptides. The entire Database Issue is freely available online on the Nucleic Acids Research website. The NAR online Molecular Biology Database Collection has been updated with the addition of 88 new resources and removal of 23 obsolete websites, which brought the current listing to 1685 databases.
PMCID: PMC4702933  PMID: 26740669
8.  Genome Diversity of Spore-Forming Firmicutes 
Microbiology spectrum  2013;1(2):TBS-0015-2012.
Formation of heat-resistant endospores is a specific property of the members of the phylum Firmicutes (low-G+C Gram-positive bacteria). It is found in representatives of four different classes of Firmicutes: Bacilli, Clostridia, Erysipelotrichia, and Negativicutes, which all encode similar sets of core sporulation proteins. Each of these classes also includes non-spore-forming organisms that sometimes belong to the same genus or even species as their spore-forming relatives. This chapter reviews the diversity of the members of the phylum Firmicutes, its current taxonomy, and the status of genome sequencing projects for various subgroups within the phylum. It also discusses the evolution of the Firmicutes from their apparently spore-forming common ancestor and the independent loss of sporulation genes in several different lineages (staphylococci, streptococci, listeria, lactobacilli, ruminococci) in the course of their adaptation to the saprophytic lifestyle in nutrient-rich environments. It argues that the systematics of Firmicutes is a rapidly developing area of research that benefits from evolutionary approaches to the ever-increasing amount of genomic and phenotypic data and allows these data to be arranged into a common framework.
“Later the Bacillus filaments begin to prepare for spore formation. In their homogenous contents strongly refracting bodies appear. From each of these bodies develops an oblong or shortly cylindrical, strongly refracting, dark-rimmed spore.”
Ferdinand Cohn. 1876. Untersuchungen über Bacterien. IV. Beiträge zur Biologie der Bacillen. Beiträge zur Biologie der Pflanzen, vol. 2, pp. 249–276. (Studies on the biology of the bacilli. In: Milestones in Microbiology: 1546 to 1940. Translated and edited by Thomas D. Brock. Prentice-Hall, Englewood Cliffs, NJ, 1961, pp. 49–56).
PMCID: PMC4306282  PMID: 25632373
9.  Systematic Identification of Cyclic-di-GMP Binding Proteins in Vibrio cholerae Reveals a Novel Class of Cyclic-di-GMP-Binding ATPases Associated with Type II Secretion Systems 
PLoS Pathogens  2015;11(10):e1005232.
Cyclic-di-GMP (c-di-GMP) is a ubiquitous bacterial signaling molecule that regulates a variety of complex processes through a diverse set of c-di-GMP receptor proteins. We have utilized a systematic approach to identify c-di-GMP receptors from the pathogen Vibrio cholerae using the Differential Radial Capillary Action of Ligand Assay (DRaCALA). The DRaCALA screen identified a majority of known c-di-GMP binding proteins in V. cholerae and revealed a novel c-di-GMP binding protein, MshE (VC0405), an ATPase associated with the mannose-sensitive hemagglutinin (MSHA) type IV pilus. The known c-di-GMP binding proteins identified by DRaCALA include diguanylate cyclases, phosphodiesterases, PilZ domain proteins and the transcription factors VpsT and VpsR, indicating that a DRaCALA-based screen of open reading frame libraries is a feasible approach to uncover novel receptors of small-molecule ligands. Since MshE lacks the canonical c-di-GMP-binding motifs, truncation analysis was used to localize the c-di-GMP binding activity to the N-terminal T2SSE_N domain. Alignment of MshE homologs revealed candidate conserved residues responsible for c-di-GMP binding. Site-directed mutagenesis of these candidate residues revealed that the Arg9 residue is required for c-di-GMP binding. We then evaluated whether c-di-GMP binding to MshE regulates MSHA-dependent processes. The R9A allele, in contrast to wild-type MshE, was unable to complement the ΔmshE mutant for production of extracellular MshA on the cell surface, reduction of flagellar swimming motility, attachment to surfaces and formation of biofilms. Testing homologs of MshE for binding to c-di-GMP identified the type II secretion ATPase of Pseudomonas aeruginosa (PA14_29490) as a c-di-GMP receptor, indicating that type II secretion and type IV pili are both regulated by c-di-GMP.
Author Summary
Cyclic-di-GMP (c-di-GMP) is a ubiquitous bacterial signaling molecule that regulates important bacterial functions, including virulence, antibiotic resistance, biofilm formation and cell division. The list of known c-di-GMP receptors is clearly incomplete. Here we utilized a systematic and unbiased biochemical approach to identify c-di-GMP receptors from the 3,812 genes of the Vibrio cholerae genome. Results from this analysis identified most known c-di-GMP receptors as well as MshE, a protein not known to interact with c-di-GMP. The c-di-GMP binding site was identified at the N-terminus of MshE and requires a conserved arginine residue in the 9th position. MshE is the ATPase that powers the secretion of the MshA pili onto the surface of the bacteria. We show that c-di-GMP binding to MshE is required for MshA export and the function of the pili in attachment and biofilm formation. ATPases responsible for related processes such as type IV pili and type II secretion were also tested for c-di-GMP binding, which identified the P. aeruginosa ATPase PA14_29490 as another c-di-GMP binding protein. These findings reveal a new class of c-di-GMP receptor and raise the possibility that c-di-GMP regulates membrane complexes through direct interaction with related type II secretion and type IV pili ATPases.
PMCID: PMC4624772  PMID: 26506097
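The V. cholerae screen above relies on DRaCALA. In that assay (described here from general knowledge of the method, not from the abstract), a protein and labeled ligand mixture is spotted on nitrocellulose: protein stays at the spot center and retains bound ligand, while free ligand is carried outward by capillary action. The sketch below assumes the commonly used quantification, in which the outer-ring signal density estimates free ligand and is subtracted from the inner spot; treat the exact formula as an assumption to be checked against the original method. The function name and numbers are hypothetical.

```python
def dracala_fraction_bound(i_total: float, i_inner: float,
                           a_total: float, a_inner: float) -> float:
    """Fraction of labeled ligand retained by protein in one DRaCALA spot (assumed formula).

    i_total / i_inner: signal integrated over the whole spot / the inner spot.
    a_total / a_inner: area of the whole spot / the inner spot (same units).
    Free ligand is assumed to be spread evenly over the whole spot, so its share
    of the inner signal is estimated from the signal density of the outer ring.
    """
    if not (0 < a_inner < a_total):
        raise ValueError("inner area must be positive and smaller than the total area")
    if i_total <= 0:
        raise ValueError("total signal must be positive")
    background_density = (i_total - i_inner) / (a_total - a_inner)
    return (i_inner - a_inner * background_density) / i_total


# Hypothetical spot: 40% of the signal sits in an inner spot covering 10% of the area.
print(dracala_fraction_bound(i_total=1000, i_inner=400, a_total=100, a_inner=10))
```

With no binding, the inner spot holds only its area share of the signal and the function returns zero, which is why comparing a candidate protein against a no-protein control makes sense in a screen like the one described.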
10.  Eukaryotic G protein-coupled receptors as descendants of prokaryotic sodium-translocating rhodopsins 
Biology Direct  2015;10:63.
Microbial rhodopsins and G-protein coupled receptors (GPCRs, which include animal rhodopsins) are two distinct (super)families of heptahelical (7TM) membrane proteins that share obvious structural similarities but no significant sequence similarity. Comparison of the recently solved high-resolution structures of the sodium-translocating bacterial rhodopsin and various Na+-binding GPCRs revealed striking similarity of their sodium-binding sites. This similarity allowed us to construct a structure-guided sequence alignment for the two (super)families, which highlighted their evolutionary relatedness. Our analysis supports a common underlying molecular mechanism for both families that involves a highly conserved aromatic residue playing a pivotal role in rotation of the 6th transmembrane helix.
This article was reviewed by Oded Beja, G. P. S. Raghava and L. Aravind.
Electronic supplementary material
The online version of this article (doi:10.1186/s13062-015-0091-4) contains supplementary material, which is available to authorized users.
PMCID: PMC4608122  PMID: 26472483
Vision; Bacteriorhodopsin; Halorhodopsin; Sensory rhodopsin; Proteorhodopsin; Opioid receptor; GPCR; Evolution; Signal transduction; Chemoreceptor
11.  GIL, a new c-di-GMP binding protein domain involved in regulation of cellulose synthesis in enterobacteria 
Molecular microbiology  2014;93(3):439-452.
In contrast to numerous enzymes involved in c-di-GMP synthesis and degradation in enterobacteria, only a handful of c-di-GMP receptors/effectors have been identified. In search of new c-di-GMP receptors, we screened the Escherichia coli ASKA overexpression gene library using the Differential Radial Capillary Action of Ligand Assay (DRaCALA) with fluorescently and radioisotope-labeled c-di-GMP. We uncovered three new candidate c-di-GMP receptors in E. coli and characterized one of them, BcsE. The bcsE gene is encoded in cellulose synthase operons in representatives of Gammaproteobacteria and Betaproteobacteria. The purified BcsE proteins from E. coli, Salmonella enterica and Klebsiella pneumoniae bind c-di-GMP via the domain of unknown function DUF2819, which is hereby designated GIL (GGDEF I-site-like domain). The RxGD motif of the GIL domain is required for c-di-GMP binding, similar to the c-di-GMP-binding I-site of the diguanylate cyclase GGDEF domain. Thus, GIL is the second protein domain, after PilZ, dedicated to c-di-GMP binding. We show that in S. enterica, BcsE is not essential for cellulose synthesis but is required for maximal cellulose production, and that c-di-GMP binding is critical for BcsE function. It appears that cellulose production in enterobacteria is controlled by a two-tiered c-di-GMP-dependent system involving BcsE and the PilZ domain-containing glycosyltransferase BcsA.
PMCID: PMC4116459  PMID: 24942809
12.  Modeling of interaction between cytochrome c and the WD domains of Apaf-1: bifurcated salt bridges underlying apoptosome assembly 
Biology Direct  2015;10:29.
Binding of cytochrome c, released from the damaged mitochondria, to the apoptotic protease activating factor 1 (Apaf-1) is a key event in the apoptotic signaling cascade. The binding triggers a major domain rearrangement in Apaf-1, which leads to oligomerization of Apaf-1/cytochrome c complexes into an apoptosome. Despite the availability of crystal structures of cytochrome c and Apaf-1 and cryo-electron microscopy models of the entire apoptosome, the binding mode of cytochrome c to Apaf-1, as well as the nature of the amino acid residues of Apaf-1 involved remain obscure.
We investigated the interaction between cytochrome c and Apaf-1 by combining several modeling approaches. We applied protein-protein docking and energy minimization, evaluated the resulting models of the Apaf-1/cytochrome c complex, and carried out a further analysis by means of molecular dynamics simulations. We ended up with a single model structure where all the lysine residues of cytochrome c that are known as functionally relevant were involved in forming salt bridges with acidic residues of Apaf-1. This model has revealed three distinctive bifurcated salt bridges, each involving a single lysine residue of cytochrome c and two neighboring acidic residues of Apaf-1. Salt bridge-forming amino acids of Apaf-1 showed a clear evolutionary pattern within Metazoa, with pairs of acidic residues of Apaf-1, involved in bifurcated salt bridges, reaching their highest numbers in the sequences of vertebrates, in which the cytochrome c-mediated mechanism of apoptosome formation seems to be typical.
The reported model of an Apaf-1/cytochrome c complex provides insights into the nature of protein-protein interactions which are hard to observe in crystallographic or electron microscopy studies. Bifurcated salt bridges can be expected to be stronger than simple salt bridges, and their formation might promote the conformational change of Apaf-1, leading to the formation of an apoptosome. Combination of structural and sequence analyses provides hints on the evolution of the cytochrome c-mediated apoptosis.
This article was reviewed by Andrei L. Osterman, Narayanaswamy Srinivasan, Igor N. Berezovsky, and Gerrit Vriend (nominated by Martijn Huynen).
Electronic supplementary material
The online version of this article (doi:10.1186/s13062-015-0059-4) contains supplementary material, which is available to authorized users.
PMCID: PMC4445527  PMID: 26014357
Apoptosis; WD40 domains; Hydrogen bond; Salt bridge; Protein-protein interactions; Caspase; Molecular dynamics simulations; Sequence analysis; Evolution
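The Apaf-1/cytochrome c model above is described in terms of bifurcated salt bridges, in which one lysine of cytochrome c pairs with two neighboring acidic residues of Apaf-1. A minimal sketch of how such contacts could be flagged in a model structure follows. It assumes a simple geometric criterion (Lys NZ within 4.0 Å of a carboxylate oxygen of Asp or Glu, a widely used salt-bridge cutoff) and counts only partners on a different chain; the paper's own criteria are not given in the abstract. The `Atom` record, chain labels and coordinates are hypothetical.

```python
from dataclasses import dataclass
from math import dist

# Carboxylate oxygens of the acidic side chains (PDB atom names).
ACIDIC_OXYGENS = {("ASP", "OD1"), ("ASP", "OD2"), ("GLU", "OE1"), ("GLU", "OE2")}


@dataclass(frozen=True)
class Atom:
    chain: str          # hypothetical chain label, e.g. "C" for cytochrome c
    residue: str        # hypothetical residue label, e.g. "K1"
    residue_name: str   # three-letter code, e.g. "LYS"
    atom_name: str      # PDB atom name, e.g. "NZ"
    xyz: tuple[float, float, float]


def bifurcated_salt_bridges(atoms: list[Atom], cutoff: float = 4.0) -> dict:
    """Map each Lys NZ contacting >= 2 distinct acidic residues on another chain to them."""
    lys_nz = [a for a in atoms if a.residue_name == "LYS" and a.atom_name == "NZ"]
    acid_o = [a for a in atoms if (a.residue_name, a.atom_name) in ACIDIC_OXYGENS]
    bridges = {}
    for nz in lys_nz:
        # Distinct acidic residues (not atoms) on other chains within the cutoff.
        partners = sorted({
            (o.chain, o.residue) for o in acid_o
            if o.chain != nz.chain and dist(nz.xyz, o.xyz) <= cutoff
        })
        if len(partners) >= 2:
            bridges[(nz.chain, nz.residue)] = partners
    return bridges
```

A distance cutoff alone ignores side-chain geometry and dynamics; the abstract's conclusions rest on docking plus molecular dynamics, so a check like this would only be a first pass over snapshots.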
13.  Identification of sensory and signal-transducing domains in two-component signaling systems 
Methods in enzymology  2007;422:47-74.
The availability of complete genome sequences of diverse bacteria and archaea makes comparative sequence analysis a powerful tool for analyzing signal transduction systems encoded in these genomes. However, most signal transduction proteins consist of two or more individual protein domains, which significantly complicates their functional annotation and makes automated annotation of these proteins in the course of large-scale genome sequencing projects particularly unreliable. We describe here certain common-sense protocols for sequence analysis of two-component histidine kinases and response regulators, as well as other components of the prokaryotic signal transduction machinery: Ser/Thr/Tyr protein kinases and protein phosphatases, adenylate and diguanylate cyclases and c-di-GMP phosphodiesterases. These protocols rely on publicly available computational tools and databases and can be utilized by anyone with Internet access.
PMCID: PMC4445681  PMID: 17628134
14.  Globins Synthesize the Second Messenger c-di-GMP in Bacteria 
Journal of molecular biology  2009;388(2):262-270.
Globin-coupled sensors (GCS) are heme-binding signal transducers in Bacteria and Archaea where an N-terminal globin controls the activity of a variable C-terminal domain. Here we report that BpeGReg, a globin-coupled diguanylate cyclase (GCDC) from the whooping-cough pathogen Bordetella pertussis, synthesizes the second messenger bis-(3’–5’)-cyclic diguanosine monophosphate (c-di-GMP) upon oxygen binding. Expression of BpeGReg in Salmonella typhimurium enhances biofilm formation, while knockout of the BpeGReg gene of B. pertussis results in decreased biofilm formation. These results represent the first identification of a gaseous ligand for any diguanylate cyclase and provide definitive experimental evidence that a globin-coupled sensor regulates c-di-GMP synthesis and biofilm formation. We propose that the synthesis of c-di-GMP by globin sensors is a widespread phenomenon in bacteria.
PMCID: PMC4301737  PMID: 19285985
globin; oxygen sensor; c-di-GMP; diguanylate cyclase; biofilm
15.  The 2015 Nucleic Acids Research Database Issue and Molecular Biology Database Collection 
Nucleic Acids Research  2014;43(Database issue):D1-D5.
The 2015 Nucleic Acids Research Database Issue contains 172 papers that include descriptions of 56 new molecular biology databases, and updates on 115 databases whose descriptions have been previously published in NAR or other journals. Following the classification that was introduced last year to simplify navigation of the entire issue, these articles are divided into eight subject categories. This year's highlights include RNAcentral, an international community portal to various databases on noncoding RNA; ValidatorDB, a validation database for protein structures and their ligands; SASBDB, a primary repository for small-angle scattering data of various macromolecular complexes; MoonProt, a database of ‘moonlighting’ proteins, and two new databases of protein–protein and other macromolecular complexes, ComPPI and the Complex Portal. This issue also includes an unusually high number of cancer-related databases and other databases dedicated to the genomic basis of disease and potential drugs and drug targets. The size of the NAR online Molecular Biology Database Collection remained approximately the same, following the addition of 74 new resources and removal of 77 obsolete web sites. The entire Database Issue is freely available online on the Nucleic Acids Research web site.
PMCID: PMC4383995  PMID: 25593347
16.  Expanded microbial genome coverage and improved protein family annotation in the COG database 
Nucleic Acids Research  2014;43(Database issue):D261-D269.
Microbial genome sequencing projects produce numerous sequences of deduced proteins, only a small fraction of which have been or will ever be studied experimentally. This leaves sequence analysis as the only feasible way to annotate these proteins and assign to them tentative functions. The Clusters of Orthologous Groups of proteins (COGs) database, first created in 1997, has been a popular tool for functional annotation. Its success was largely based on (i) its reliance on complete microbial genomes, which allowed reliable assignment of orthologs and paralogs for most genes; (ii) an orthology-based approach, which used the function(s) of the characterized member(s) of the protein family (COG) to assign function(s) to the entire set of carefully identified orthologs and describe the range of potential functions when there was more than one; and (iii) careful manual curation of the annotation of the COGs, aimed at detailed prediction of the biological function(s) for each COG while avoiding annotation errors and overprediction. Here we present an update of the COGs, the first since 2003, and a comprehensive revision of the COG annotations and expansion of the genome coverage to include representative complete genomes from all bacterial and archaeal lineages down to the genus level. This re-analysis of the COGs shows that the original COG assignments had an error rate below 0.5% and allows an assessment of the progress in functional genomics in the past 12 years. During this time, functions of many previously uncharacterized COGs have been elucidated and tentative functional assignments of many COGs have been validated, either by targeted experiments or through the use of high-throughput methods. A particularly important development is the assignment of functions to several widespread, conserved proteins, many of which turned out to participate in translation, in particular rRNA maturation and tRNA modification.
The new version of the COGs is expected to become an important tool for microbial genomics.
PMCID: PMC4383993  PMID: 25428365
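Point (ii) of the COG abstract above describes orthology-based annotation: the function(s) of characterized members of a COG are assigned to all of its members, and when there is more than one, the range of possible functions is reported. The sketch below shows only that transfer step, assuming orthology assignments already exist; it leaves out the manual curation that the abstract stresses as essential. All protein IDs, COG IDs and function labels are hypothetical.

```python
from collections import defaultdict


def transfer_cog_annotations(
    cog_of: dict[str, str],
    known_function: dict[str, str],
) -> dict[str, list[str]]:
    """Give every protein the function(s) of the characterized members of its COG.

    cog_of:         protein id -> COG id (orthology assignment, assumed already done)
    known_function: protein id -> experimentally characterized function
    A COG with several characterized functions yields the whole range, sorted;
    proteins in COGs without any characterized member get an empty list.
    """
    functions_by_cog: dict[str, set[str]] = defaultdict(set)
    for protein, function in known_function.items():
        if protein in cog_of:
            functions_by_cog[cog_of[protein]].add(function)
    return {
        protein: sorted(functions_by_cog.get(cog, set()))
        for protein, cog in cog_of.items()
    }


# Hypothetical example: p2 inherits "kinase" from its characterized ortholog p1.
cog_of = {"p1": "COG_A", "p2": "COG_A", "p3": "COG_B", "p4": "COG_C", "p5": "COG_B"}
known = {"p1": "kinase", "p3": "transporter", "p5": "permease"}
print(transfer_cog_annotations(cog_of, known))
```

An empty list is returned instead of a guessed function, which mirrors the abstract's concern with avoiding overprediction for uncharacterized families.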
17.  Evolution of cytochrome bc complexes: from membrane-anchored dehydrogenases of ancient bacteria to triggers of apoptosis in vertebrates 
Biochimica et biophysica acta  2013;1827(0):10.1016/j.bbabio.2013.07.006.
This review traces the evolution of the cytochrome bc complexes from their early spread among prokaryotic lineages and up to the mitochondrial cytochrome bc1 complex (complex III) and its role in apoptosis. The results of phylogenomic analysis suggest that the bacterial cytochrome b6f-type complexes with short cytochromes b were the ancient form that preceded in evolution the cytochrome bc1-type complexes with long cytochromes b. The common ancestor of the b6f-type and the bc1-type complexes probably resembled the b6f-type complexes found in Heliobacteriaceae and in some Planctomycetes. Lateral transfers of cytochrome bc operons could account for the several instances of acquisition of different types of bacterial cytochrome bc complexes by archaea. The gradual oxygenation of the atmosphere could be the key evolutionary factor that has driven further divergence and spread of the cytochrome bc complexes. On the one hand, oxygen could be used as a very efficient terminal electron acceptor. On the other hand, auto-oxidation of the components of the bc complex results in the generation of reactive oxygen species (ROS), which necessitated diverse adaptations of the b6f-type and bc1-type complexes, as well as other, functionally coupled proteins. A detailed scenario of the gradual involvement of the cardiolipin-containing mitochondrial cytochrome bc1 complex in the intrinsic apoptotic pathway is proposed, where the functioning of the complex as an apoptotic trigger is viewed as a way to accelerate the elimination of the cells with irreparably damaged, ROS-producing mitochondria.
PMCID: PMC3839093  PMID: 23871937
bioenergetics; molecular evolution; ubiquinol:cytochrome c oxidoreductase; ubiquinone; plastoquinone; cytochrome c; cardiolipin; cell death; photosynthesis; apoptosome
18.  Interplay of heritage and habitat in the distribution of bacterial signal transduction systems 
Molecular bioSystems  2010;6(4):721-728.
Comparative analysis of the complete genome sequences from a variety of poorly studied organisms aims at predicting ecological and behavioral properties of these organisms and helping to characterize their habitats. This task requires finding appropriate descriptors that could be correlated with the core traits of each system and would allow meaningful comparisons. Using the relatively simple bacterial models, first attempts have been made to introduce suitable metrics to describe the complexity of an organism’s signaling machinery, which included introducing the “bacterial IQ” score. Here, we use an updated census of prokaryotic signal transduction systems to improve this parameter and evaluate its consistency within selected bacterial phyla. We also introduce a more elaborate descriptor, a set of profiles of relative abundance of members of each family of signal transduction proteins encoded in each genome. We show that these family profiles are well conserved within each genus and are often consistent within families of bacteria. Thus, they reflect evolutionary relationships between organisms as well as individual adaptations of each organism to its specific ecological niche.
PMCID: PMC3071642  PMID: 20237650
comparative genomics; evolution; protein phosphorylation; receptor; Mycobacterium; Shewanella
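The family-profile descriptor described in this abstract can be illustrated with a minimal sketch. It assumes per-genome counts of signal transduction proteins are already available for each family; the family labels, the function names `relative_abundance_profile` and `profile_distance`, and the use of an L1 distance to compare profiles are all hypothetical choices for illustration, not the authors' method.

```python
from typing import Dict

# Illustrative family labels (hypothetical selection, not the paper's full family list).
FAMILIES = ["HK", "RR", "MCP", "GGDEF", "EAL", "HD-GYP", "AC", "STYK"]


def relative_abundance_profile(counts: Dict[str, int]) -> Dict[str, float]:
    """Return the fraction of a genome's signal transduction proteins in each family."""
    total = sum(counts.get(family, 0) for family in FAMILIES)
    if total == 0:
        raise ValueError("no signal transduction proteins in the listed families")
    return {family: counts.get(family, 0) / total for family in FAMILIES}


def profile_distance(p: Dict[str, float], q: Dict[str, float]) -> float:
    """L1 distance between two profiles (0 = identical, 2 = no shared families).

    The choice of metric is an assumption made for this sketch.
    """
    return sum(abs(p[family] - q[family]) for family in FAMILIES)
```

Because each profile is normalized by the genome's total, two genomes whose family counts differ only by a constant factor (for example, a large and a small genome with the same proportions) have identical profiles. This is what lets the descriptor be compared across genomes of different sizes.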
19.  A genomic update on clostridial phylogeny: Gram-negative spore-formers and other misplaced clostridia 
Environmental Microbiology  2013;15(10):2631-2641.
The class Clostridia in the phylum Firmicutes (formerly low-G+C Gram-positive bacteria) includes diverse bacteria of medical, environmental, and biotechnological importance. The Selenomonas-Megasphaera-Sporomusa branch, which unifies members of the Firmicutes with Gram-negative-type cell envelopes, was recently moved from Clostridia to a separate class Negativicutes. However, draft genome sequences of the spore-forming members of the Negativicutes revealed typically clostridial sets of sporulation genes. To address this and other questions in clostridial phylogeny, we have compared a phylogenetic tree for a concatenated set of 50 widespread ribosomal proteins with the trees for beta subunits of the RNA polymerase (RpoB) and DNA gyrase (GyrB) and with the 16S rRNA-based phylogeny. The results obtained by these methods showed remarkable consistency, suggesting that they reflect the true evolutionary history of these bacteria. These data put the Selenomonas-Megasphaera-Sporomusa group back within the Clostridia. They also support placement of Clostridium difficile and its close relatives within the family Peptostreptococcaceae; we suggest resolving the long-standing naming conundrum by renaming it Peptoclostridium difficile. These data also indicate the existence of a group of cellulolytic clostridia that belong to the family Ruminococcaceae. As a tentative solution to resolve the current taxonomical problems, we propose assigning 78 validly described Clostridium species that clearly fall outside the family Clostridiaceae to six new genera: Peptoclostridium, Lachnoclostridium, Ruminiclostridium, Erysipelatoclostridium, Gottschalkia, and Tyzzerella. This work reaffirms that 16S rRNA and ribosomal protein sequences are better indicators of evolutionary proximity than phenotypic traits, even such key ones as the structure of the cell envelope and Gram-staining pattern.
PMCID: PMC4056668  PMID: 23834245
Sporulation; taxonomy; Gram staining; cellulose; xylan; Clostridium difficile
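The concatenated ribosomal-protein tree mentioned above depends on first joining per-protein alignments into a single supermatrix. The sketch below covers only that concatenation step; tree inference is not shown. It assumes each alignment is a mapping from taxon name to an aligned sequence. Padding missing taxa with gaps is a common convention assumed here, and `concatenate_alignments` is a hypothetical name.

```python
from typing import Dict, List


def concatenate_alignments(alignments: List[Dict[str, str]]) -> Dict[str, str]:
    """Join per-protein alignments (taxon -> aligned sequence) into one supermatrix.

    Taxa missing from a given alignment are padded with gap characters of that
    alignment's width, so every concatenated row has the same length.
    """
    taxa = sorted({taxon for aln in alignments for taxon in aln})
    rows: Dict[str, List[str]] = {taxon: [] for taxon in taxa}
    for aln in alignments:
        if not aln:
            continue  # skip empty alignments
        widths = {len(seq) for seq in aln.values()}
        if len(widths) != 1:
            raise ValueError("sequences within one alignment must have equal length")
        width = widths.pop()
        for taxon in taxa:
            rows[taxon].append(aln.get(taxon, "-" * width))
    return {taxon: "".join(parts) for taxon, parts in rows.items()}
```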
20.  Open Questions on the Origin of Life at Anoxic Geothermal Fields 
We have recently reconstructed the ‘hatcheries’ of the first cells by combining geochemical analysis with phylogenomic scrutiny of the inorganic ion requirements of universal components of modern cells (Mulkidjanian et al.: Origin of first cells at terrestrial, anoxic geothermal fields. Proc Natl Acad Sci USA 2012, 109:E821–830). These ubiquitous, and by inference primordial, proteins and functional systems show affinity to and functional requirement for K+, Zn2+, Mn2+, and phosphate. Thus, protocells must have evolved in habitats with a high K+/Na+ ratio and relatively high concentrations of Zn, Mn and phosphorus compounds. Geochemical reconstruction shows that the ionic composition conducive to the origin of cells could not have existed in marine settings but is compatible with emissions of vapor-dominated zones of inland geothermal systems. Under an anoxic, CO2-dominated atmosphere, the ionic composition of pools of cool, condensed vapor at anoxic geothermal fields would resemble the internal milieu of modern cells. Such pools would be lined with porous silicate minerals mixed with metal sulfides and enriched in K+ ions and phosphorus compounds.
Here we address some questions that have appeared in print since the publication of our anoxic geothermal field scenario. We argue that anoxic geothermal fields, which were identified as likely cradles of life by a top-down approach that used phylogenomic analysis as a tool, could provide geochemical conditions similar to those suggested as most conducive to the emergence of life by the chemists who pursue the complementary bottom-up strategy.
PMCID: PMC3997052  PMID: 23132762
21.  The Role of Energy in the Emergence of Biology from Chemistry 
Any scenario of the transition from chemistry to biology should include an “energy module” because life can exist only when supported by energy flow(s). We addressed the problem of primordial energetics by combining physico-chemical considerations with phylogenomic analysis. We propose that the first replicators could use abiotically formed, exceptionally photostable activated nucleotides both as building blocks and as the main energy source. Nucleoside triphosphates could replace cyclic nucleotides as the principal energy-rich compounds at the stage of the first cells, presumably because the metal chelates of nucleoside triphosphates penetrated membranes much better than the respective metal complexes of nucleoside monophosphates. The ability to exploit natural energy flows for biogenic production of energy-rich molecules could evolve only gradually, after the emergence of sophisticated enzymes and ion-tight membranes. We argue that, in the course of evolution, sodium-dependent membrane energetics preceded proton-based energetics, which evolved independently in bacteria and archaea.
PMCID: PMC3974900  PMID: 23100130
22.  How many signal peptides are there in bacteria? 
Environmental Microbiology  2013;15(4):983-990.
Over the last five years, proteogenomics (using mass spectrometry to identify proteins predicted from genomic sequences) has emerged as a promising approach to the high-throughput identification of protein N-termini, which remains a problem in genome annotation. Comparing the experimentally determined N-termini with those predicted by sequence analysis tools allows identification of signal peptides and therefore conclusions about the cytoplasmic or extracytoplasmic (periplasmic or extracellular) localization of the respective proteins. We present here the results of a proteogenomic study of the signal peptides in Escherichia coli K-12 and compare them with the available experimental data and with predictions by such software tools as SignalP and Phobius. A single proteogenomics experiment recovered more than a third of all signal peptides that had been experimentally determined during the past three decades and confirmed at least 31 additional signal peptides, mostly in known exported proteins, which had been previously predicted but not validated. Filtering putative signal peptides by peptide length, the presence of an eight-residue hydrophobic patch, and a typical signal peptidase cleavage site proved sufficient to eliminate the false-positive hits. Surprisingly, the results of this proteogenomics study, as well as a re-analysis of the E. coli genome with the latest version of the SignalP program, show that the fraction of proteins containing signal peptides is only about 10%, or half of previous estimates.
PMCID: PMC3621014  PMID: 23556536
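The three-part filter named in this abstract (peptide length, an eight-residue hydrophobic patch, a typical signal peptidase cleavage site) can be sketched as below. The abstract does not give the exact values, so the length bounds, the hydrophobic residue alphabet, and the residue sets used for a simplified "-3/-1 rule" at the cleavage site are assumptions. The function names are hypothetical, and this sketch is not the authors' pipeline.

```python
# Assumed thresholds; the abstract names the criteria but not their values.
MIN_LEN, MAX_LEN = 15, 40
PATCH_LEN = 8  # eight-residue hydrophobic patch, as stated in the abstract
HYDROPHOBIC = set("ACFILMVW")  # assumed hydrophobic alphabet
# Simplified -3/-1 rule (assumption): small residues at -1, small or aliphatic at -3.
ALLOWED_MINUS1 = set("AGSCT")
ALLOWED_MINUS3 = set("AGSCTVIL")


def has_hydrophobic_patch(peptide: str, patch_len: int = PATCH_LEN) -> bool:
    """True if the peptide contains patch_len consecutive hydrophobic residues."""
    run = 0
    for residue in peptide:
        run = run + 1 if residue in HYDROPHOBIC else 0
        if run >= patch_len:
            return True
    return False


def passes_signal_peptide_filter(protein: str, mature_start: int) -> bool:
    """Check a putative signal peptide against the three criteria.

    mature_start is the 0-based index of the experimentally observed N-terminus
    of the mature protein, so protein[:mature_start] is the putative signal peptide.
    """
    peptide = protein[:mature_start].upper()
    if not MIN_LEN <= len(peptide) <= MAX_LEN:
        return False
    if not has_hydrophobic_patch(peptide):
        return False
    return peptide[-1] in ALLOWED_MINUS1 and peptide[-3] in ALLOWED_MINUS3
```

The length check runs first, so the `peptide[-3]` lookup is always within bounds when it is reached.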
23.  Cyclic di-GMP: the First 25 Years of a Universal Bacterial Second Messenger 
Twenty-five years have passed since the discovery of cyclic dimeric (3′→5′) GMP (cyclic di-GMP or c-di-GMP). From the relative obscurity of an allosteric activator of a bacterial cellulose synthase, c-di-GMP has emerged as one of the most common and important bacterial second messengers. Cyclic di-GMP has been shown to regulate biofilm formation, motility, virulence, the cell cycle, differentiation, and other processes. Most c-di-GMP-dependent signaling pathways control the ability of bacteria to interact with abiotic surfaces or with other bacterial and eukaryotic cells. Cyclic di-GMP plays key roles in lifestyle changes of many bacteria, including the transition from the motile to the sessile state, which aids in the establishment of multicellular biofilm communities, and from the virulent state in acute infections to the less virulent but more resilient state characteristic of chronic infectious diseases. From a practical standpoint, modulating c-di-GMP signaling pathways in bacteria could represent a new way of controlling the formation and dispersal of biofilms in medical and industrial settings. Cyclic di-GMP also participates in interkingdom signaling. It is recognized by mammalian immune systems as a uniquely bacterial molecule and is therefore considered a promising vaccine adjuvant. The purpose of this review is not to survey the whole body of data in the burgeoning field of c-di-GMP-dependent signaling. Instead, we provide a historical perspective on the development of the field, emphasize common trends, and illustrate them with the best available examples. We also identify unresolved questions and highlight new directions in c-di-GMP research that will give us a deeper understanding of this truly universal bacterial second messenger.
PMCID: PMC3591986  PMID: 23471616
24.  The 2014 Nucleic Acids Research Database Issue and an updated NAR online Molecular Biology Database Collection 
Nucleic Acids Research  2013;42(Database issue):D1-D6.
The 2014 Nucleic Acids Research Database Issue includes descriptions of 58 new molecular biology databases and recent updates to 123 databases previously featured in NAR or other journals. For convenience, the issue is now divided into eight sections that reflect major subject categories. Among the highlights of this issue are six databases of transcription factor binding sites in various organisms and updates on such popular databases as CAZy, Database of Genomic Variants (DGV), dbGaP, DrugBank, KEGG, miRBase, Pfam, Reactome, SEED, TCDB and UniProt. There is a strong block of structural databases, which includes, among others, the new RNA Bricks database and updates on PDBe, PDBsum, ArchDB, Gene3D, ModBase, Nucleic Acid Database and the recently revived iPfam database. An update on the NCBI’s MMDB describes VAST+, an improved tool for protein structure comparison. Two articles highlight the development of the Structural Classification of Proteins (SCOP) database: one describes SCOPe, which automates assignment of new structures to the existing SCOP hierarchy; the other describes the first version of SCOP2, with its more flexible approach to classifying protein structures. This issue also includes a collection of articles on bacterial taxonomy and metagenomics, with updates on the List of Prokaryotic Names with Standing in Nomenclature (LPSN), Ribosomal Database Project (RDP), the Silva/LTP project and several new metagenomics resources. The NAR online Molecular Biology Database Collection has been expanded to 1552 databases. The entire Database Issue is freely available online on the Nucleic Acids Research website.
PMCID: PMC3965027  PMID: 24316579
25.  New metrics for comparative genomics 
Current Opinion in Biotechnology  2006;17(5):440-447.
The availability of genome sequences from a variety of organisms presents an opportunity to apply this sequence information to solving the key problems of molecular biology. One of the principal roadblocks on this path is the lack of appropriate descriptors and metrics that could succinctly represent the new knowledge stemming from the genomic data. Several new metrics have recently been used in comparative genome analysis, yet challenges remain in finding an appropriate language for the emerging discipline of systems biology.
PMCID: PMC1764326  PMID: 16978854