Toll-like receptor 5 (TLR5) binding to bacterial flagellin activates NF-κB signaling and triggers an innate immune response to the invading pathogen. To elucidate the structural basis and mechanistic implications of TLR5-flagellin recognition, we determined the crystal structure of zebrafish TLR5, as a VLR-hybrid protein, in complex with the D1/D2 fragment of Salmonella flagellin, FliC, at 2.47 Å resolution. TLR5 interacts primarily with the three helices of the FliC D1 domain using its lateral side. Two TLR5-FliC 1:1 heterodimers assemble into a 2:2 tail-to-tail signaling complex that is stabilized by quaternary contacts of the FliC D1 domain with the convex surface of the opposing TLR5. The proposed signaling mechanism is supported by structure-guided mutagenesis and deletion analysis on CBLB502, a therapeutic protein derived from FliC.
Large and functionally heterogeneous families of transcription factors have complex evolutionary histories. What shapes specificities toward effectors and DNA sites in paralogous regulators is a fundamental question in biology. Bacteria from the deep-branching lineage Thermotogae possess multiple paralogs of the repressor, open reading frame, kinase (ROK) family regulators that are characterized by carbohydrate-sensing domains shared with sugar kinases. We applied an integrated genomic approach to study functions and specificities of regulators from this family. A comparative analysis of 11 Thermotogae genomes revealed novel mechanisms of transcriptional regulation of the sugar utilization networks, DNA-binding motifs and specific functions. Reconstructed regulons for seven groups of ROK regulators were validated by DNA-binding assays using purified recombinant proteins from the model bacterium Thermotoga maritima. All tested regulators demonstrated specific binding to their predicted cognate DNA sites, and this binding was inhibited by specific effectors, mono- or disaccharides from their respective sugar catabolic pathways. By comparing ligand-binding domains of regulators with structurally characterized kinases from the ROK family, we elucidated signature amino acid residues determining sugar-ligand regulator specificity. Observed correlations between signature residues and the sugar-ligand specificities provide the framework for a structure-function classification of the entire ROK family.
Proline metabolism is linked to hyperprolinemia, schizophrenia, cutis laxa, and cancer. In the latter case, tumor cells tend to rely on proline biosynthesis rather than salvage. Proline is synthesized from either glutamate or ornithine; both are converted to pyrroline-5-carboxylate (P5C), and then to proline via pyrroline-5-carboxylate reductases (PYCRs). Here, the role of three isozymic versions of PYCR was addressed in human melanoma cells by tracking the fate of 13C-labeled precursors. Based on these studies we conclude that PYCR1 and PYCR2, which are localized in the mitochondria, are primarily involved in conversion of glutamate to proline. PYCRL, localized in the cytosol, is exclusively linked to the conversion of ornithine to proline. This analysis provides the first clarification of the roles of PYCRs in proline biosynthesis.
Redox-sensing repressor Rex was previously implicated in the control of anaerobic respiration in response to the cellular NADH/NAD+ levels in Gram-positive bacteria. We utilized the comparative genomics approach to infer candidate Rex-binding DNA motifs and assess the Rex regulon content in 119 genomes from 11 taxonomic groups. Both DNA-binding and NAD-sensing domains are broadly conserved in Rex orthologs identified in the phyla Firmicutes, Thermotogales, Actinobacteria, Chloroflexi, Deinococcus-Thermus, and Proteobacteria. The identified DNA-binding motifs showed significant conservation in these species, with the only exception detected in Clostridia, where the Rex motif deviates in two positions from the generalized consensus, TTGTGAANNNNTTCACAA. Comparative analysis of candidate Rex sites revealed remarkable variations in functional repertoires of candidate Rex-regulated genes in various microorganisms. Most of the reconstructed regulatory interactions are lineage specific, suggesting frequent events of gain and loss of regulator binding sites in the evolution of Rex regulons. We identified more than 50 novel Rex-regulated operons encoding functions that are essential for resumption of the NADH:NAD+ balance. The novel functional role of Rex in the control of the central carbon metabolism and hydrogen production genes was validated by in vitro DNA binding assays using the TM0169 protein in the hydrogen-producing bacterium Thermotoga maritima.
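To make the consensus concrete, the following is a minimal sketch of how candidate sites could be located by matching the generalized consensus TTGTGAANNNNTTCACAA with a small number of allowed mismatches. It is an illustration only, not the method used in the study: the function names, the mismatch threshold, and the example sequence are assumptions introduced here, and motif searches of this kind are more commonly done with position weight matrices.

```python
# Generalized Rex consensus quoted in the abstract; N matches any base.
REX_CONSENSUS = "TTGTGAANNNNTTCACAA"


def count_mismatches(site, consensus=REX_CONSENSUS):
    """Count positions where `site` differs from the consensus, skipping N positions."""
    return sum(
        1
        for base, expected in zip(site.upper(), consensus)
        if expected != "N" and base != expected
    )


def scan(sequence, consensus=REX_CONSENSUS, max_mismatches=2):
    """Yield (start, site, mismatches) for each window within the mismatch limit.

    max_mismatches=2 is an illustrative choice, not a published cutoff.
    """
    width = len(consensus)
    sequence = sequence.upper()
    for start in range(len(sequence) - width + 1):
        site = sequence[start:start + width]
        mismatches = count_mismatches(site, consensus)
        if mismatches <= max_mismatches:
            yield start, site, mismatches


if __name__ == "__main__":
    # Hypothetical upstream region, invented for illustration only.
    upstream = "ACGT" + "TTGTGAAATATTTCACAA" + "GGCC"
    for hit in scan(upstream):
        print(hit)
```

A mismatch-tolerant scan like this over-predicts on whole genomes, which is why comparative approaches also require that candidate sites be conserved upstream of orthologous genes.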
Limited or regulatory proteolysis plays a critical role in many important biological pathways like blood coagulation, cell proliferation, and apoptosis. A better understanding of mechanisms that control this process is required for discovering new proteolytic events and for developing inhibitors with potential therapeutic value. Two features that determine the susceptibility of peptide bonds to proteolysis are the sequence in the vicinity of the scissile bond and the structural context in which the bond is displayed. In this study we assessed the statistical significance and predictive power of individual structural descriptors and combinations thereof for the identification of cleavage sites. The analysis was performed on a dataset of >200 proteolytic events documented in CutDB for a variety of mammalian regulatory proteases and their physiological substrates with known 3D structures. The results confirmed the significance and provided a ranking within three main categories of structural features: exposure > flexibility > local interactions. Among secondary structure elements, the largest frequency of proteolytic cleavage was confirmed for loops and a lower but significant frequency for helices. Limited proteolysis has a lower albeit appreciable frequency of occurrence in certain types of β-strands, which is in contrast with some previous reports. Descriptors deduced directly from the amino acid sequence displayed only marginal predictive capabilities. Homology-based structural models showed a predictive performance comparable to protein substrates with experimentally established structures. Overall, this study provided a foundation for accurate automated prediction of segments of protein structure susceptible to proteolytic processing and, potentially, other post-translational modifications.
proteolysis; proteolytic processing; limited proteolysis; regulatory proteolysis; protease; cleavage site; cleavage site prediction
NAD is a ubiquitous and essential metabolic redox cofactor which also functions as a substrate in certain regulatory pathways. The last step of NAD synthesis is the ATP-dependent amidation of deamido-NAD by NAD synthetase (NADS). Members of the NADS family are present in nearly all species across the three kingdoms of Life. In eukaryotic NADS, the core synthetase domain is fused with a nitrilase-like glutaminase domain supplying ammonia for the reaction. This two-domain NADS arrangement enabling the utilization of glutamine as nitrogen donor is also present in various bacterial lineages. However, many other bacterial members of NADS family do not contain a glutaminase domain, and they can utilize only ammonia (but not glutamine) in vitro. A single-domain NADS is also characteristic for nearly all Archaea, and its dependence on ammonia was demonstrated here for the representative enzyme from Methanocaldococcus jannaschii. However, a question about the actual in vivo nitrogen donor for single-domain members of the NADS family remained open: Is it glutamine hydrolyzed by a committed (but yet unknown) glutaminase subunit, as in most ATP-dependent amidotransferases, or free ammonia as in glutamine synthetase? Here we addressed this dilemma by combining evolutionary analysis of the NADS family with experimental characterization of two representative bacterial systems: a two-subunit NADS from Thermus thermophilus and a single-domain NADS from Salmonella typhimurium providing evidence that ammonia (and not glutamine) is the physiological substrate of a typical single-domain NADS. The latter represents the most likely ancestral form of NADS. The ability to utilize glutamine appears to have evolved via recruitment of a glutaminase subunit followed by domain fusion in an early branch of Bacteria. Further evolution of the NADS family included lineage-specific loss of one of the two alternative forms and horizontal gene transfer events. Lastly, we identified NADS structural elements associated with glutamine-utilizing capabilities.
Transcriptional regulatory networks are fine-tuned systems that help microorganisms respond to changes in the environment and cell physiological state. We applied the comparative genomics approach implemented in the RegPredict Web server combined with SEED subsystem analysis and available information on known regulatory interactions for regulatory network reconstruction for the human pathogen Staphylococcus aureus and six related species from the family Staphylococcaceae. The resulting reference set of 46 transcription factor regulons contains more than 1,900 binding sites and 2,800 target genes involved in the central metabolism of carbohydrates, amino acids, and fatty acids; respiration; the stress response; metal homeostasis; drug and metal resistance; and virulence. The inferred regulatory network in S. aureus includes ∼320 regulatory interactions between 46 transcription factors and ∼550 candidate target genes comprising 20% of its genome. We predicted ∼170 novel interactions and 24 novel regulons for the control of the central metabolic pathways in S. aureus. The reconstructed regulons are largely variable in the Staphylococcaceae: only 20% of S. aureus regulatory interactions are conserved across all studied genomes. We used a large-scale gene expression data set for S. aureus to assess relationships between the inferred regulons and gene expression patterns. The predicted reference set of regulons is captured within the Staphylococcus collection in the RegPrecise database (http://regprecise.lbl.gov).
Bacterial nicotinate mononucleotide adenylyltransferase encoded by the essential gene nadD plays a central role in the synthesis of the redox cofactor NAD+. The NadD enzyme is conserved in the majority of bacterial species and has been recognized as a novel target for developing new and potentially broad-spectrum antibacterial therapeutics. Here we report the crystal structures of Bacillus anthracis NadD in complex with three NadD inhibitors, including two analogues synthesized in the present study. These structures revealed a common binding site shared by different classes of NadD inhibitors and explored the chemical environment surrounding this site. The structural data obtained here also showed that the subtle changes in ligand structure can lead to significant changes in the binding mode, information that will be useful for future structure-based optimization and design of high affinity inhibitors.
The constitutive activation of the anoxic redox control transcriptional regulator (ArcA) in Escherichia coli during aerobic growth, with the consequent production of a strain that exhibits anaerobic physiology even in the presence of air, is reported in this work. Removal of three terminal cytochrome oxidase genes (cydAB, cyoABCD, and cbdAB) and a quinol monooxygenase gene (ygiN) from the E. coli K-12 MG1655 genome resulted in the activation of ArcA aerobically. These mutations resulted in reduction of the oxygen uptake rate by nearly 98% and production of d-lactate as a sole by-product under oxic and anoxic conditions. The knockout strain exhibited nearly identical physiological behaviors under both conditions, suggesting that the mutations resulted in significant metabolic and regulatory perturbations. In order to fully understand the physiology of this mutant and to identify underlying metabolic and regulatory reasons that prevent the transition from an aerobic to an anaerobic phenotype, we utilized whole-genome transcriptome analysis, 13C tracing experiments, and physiological characterization. Our analysis showed that the deletions resulted in the activation of anaerobic respiration under oxic conditions and a consequential shift in the content of the quinone pool from ubiquinones to menaquinones. An increase in menaquinone concentration resulted in the activation of ArcA. The activation of the ArcB/ArcA regulatory system led to a major shift in the metabolic flux distribution through the central metabolism of the mutant strain. Flux analysis indicated that the mutant strain had undetectable fluxes around the tricarboxylic acid (TCA) cycle and elevated flux through glycolysis and anaplerotic input to oxaloacetate. Flux and transcriptomics data were highly correlated and showed similar patterns.
The emergence of multidrug-resistant pathogens necessitates the search for new antibiotics acting on previously unexplored targets. Nicotinate mononucleotide adenylyltransferase of the NadD family, an essential enzyme of NAD biosynthesis in most bacteria, was selected as a target for structure-based inhibitor development. Using iterative in silico and in vitro screens we identified small molecule compounds that efficiently inhibited target enzymes from Escherichia coli (ecNadD) and Bacillus anthracis (baNadD) but had no effect on functionally equivalent human enzymes. On-target antibacterial activity was demonstrated for some of the selected inhibitors. A 3D structure of baNadD was solved in complex with one of these inhibitors (3_02) providing mechanistic insights and guidelines for further improvement. Most importantly, the results of this study help validate NadD as a target for the development of antibacterial agents with potential broad-spectrum activity.
The specific and tightly controlled transport of numerous nutrients and metabolites across cellular membranes is crucial to all forms of life. However, many of the transporter proteins involved have yet to be identified, including the vitamin transporters in various human pathogens, whose growth depends strictly on vitamin uptake. Comparative analysis of the ever-growing collection of microbial genomes coupled with experimental validation enables the discovery of such transporters. Here, we used this approach to discover an abundant class of vitamin transporters in prokaryotes with an unprecedented architecture. These transporters have energy-coupling modules comprised of a conserved transmembrane protein and two nucleotide binding proteins similar to those of ATP binding cassette (ABC) transporters, but unlike ABC transporters, they use small integral membrane proteins to capture specific substrates. We identified 21 families of these substrate capture proteins, each with a different specificity predicted by genome context analyses. Roughly half of the substrate capture proteins (335 cases) have a dedicated energizing module, but in 459 cases distributed among almost 100 gram-positive bacteria, including numerous human pathogens, different and unrelated substrate capture proteins share the same energy-coupling module. The shared use of energy-coupling modules was experimentally confirmed for folate, thiamine, and riboflavin transporters. We propose the name energy-coupling factor transporters for the new class of membrane transporters.
The Proteolysis MAP (PMAP, http://www.proteolysis.org) is a user-friendly website intended to aid the scientific community in reasoning about proteolytic networks and pathways. PMAP is comprised of five databases, linked together in one environment. The foundation databases, ProteaseDB and SubstrateDB, are driven by an automated annotation pipeline that generates dynamic ‘Molecule Pages’, rich in molecular information. PMAP also contains two community annotated databases focused on function; CutDB has information on more than 5000 proteolytic events, and ProfileDB is dedicated to information of the substrate recognition specificity of proteases. Together, the content within these four databases will ultimately feed PathwayDB, which will be comprised of known pathways whose function can be dynamically modeled in a rule-based manner, and hypothetical pathways suggested by semi-automated culling of the literature. A Protease Toolkit is also available for the analysis of proteases and proteolysis. Here, we describe how the databases of PMAP can be used to foster understanding of proteolytic pathways, and equally as significant, to reason about proteolysis.
Members of a novel glycerate-2-kinase (GK-II) family were tentatively identified in a broad range of species, including eukaryotes and archaea and many bacteria that lack a canonical enzyme of the GarK (GK-I) family. The recently reported three-dimensional structure of GK-II from Thermotoga maritima (TM1585; PDB code 2b8n) revealed a new fold distinct from other known kinase families. Here, we verified the enzymatic activity of TM1585, assessed its kinetic characteristics, and used directed mutagenesis to confirm the essential role of the two active-site residues Lys-47 and Arg-325. The main objective of this study was to apply comparative genomics for the reconstruction of metabolic pathways associated with GK-II in all bacteria and, in particular, in T. maritima. Comparative analyses of ∼400 bacterial genomes revealed a remarkable variety of pathways that lead to GK-II-driven utilization of glycerate via a glycolysis/gluconeogenesis route. In the case of T. maritima, a three-step serine degradation pathway was inferred based on the tentative identification of two additional enzymes, serine-pyruvate aminotransferase and hydroxypyruvate reductase (TM1400 and TM1401, respectively), that convert serine to glycerate via hydroxypyruvate. Both enzymatic activities were experimentally verified, and the entire pathway was validated by its in vitro reconstitution.
A novel family of transcription factors responsible for regulation of various aspects of NAD synthesis in a broad range of bacteria was identified by a comparative genomics approach. Regulators of this family (here termed NrtR for Nudix-related transcriptional regulators), currently annotated as ADP-ribose pyrophosphatases from the Nudix family, are composed of an N-terminal Nudix-like effector domain and a C-terminal DNA-binding HTH-like domain. NrtR regulons were reconstructed in diverse bacterial genomes by identification and comparative analysis of NrtR-binding sites upstream of genes involved in NAD biosynthetic pathways. The candidate NrtR-binding DNA motifs showed significant variability between microbial lineages, although the common consensus sequence could be traced for most of them. Bioinformatics predictions were experimentally validated by gel mobility shift assays for two NrtR family representatives. ADP-ribose, the product of glycohydrolytic cleavage of NAD, was found to suppress the in vitro binding of NrtR proteins to their DNA target sites. In addition to a major role in the direct regulation of NAD homeostasis, some members of the NrtR family appear to have been recruited for the regulation of other metabolic pathways, including pentose utilization and biogenesis of phosphoribosyl pyrophosphate. This work and the accompanying study of the NiaR regulon demonstrate significant variability of regulatory strategies for control of the NAD metabolic pathway in bacteria.
A comparative genomic approach was used to reconstruct transcriptional regulation of NAD biosynthesis in bacteria containing orthologs of Bacillus subtilis gene yrxA, a previously identified niacin-responsive repressor of NAD de novo synthesis. Members of the YrxA family (re-named here NiaR) are broadly conserved in the Bacillus/Clostridium group and in the deeply branching Fusobacteria and Thermotogales lineages. We analyzed upstream regions of genes associated with NAD biosynthesis to identify candidate NiaR-binding DNA motifs and assess the NiaR regulon content in these species. Representatives of the two distinct types of candidate NiaR-binding sites, characteristic of the Firmicutes and Thermotogales, were verified by an electrophoretic mobility shift assay. In addition to transcriptional control of the nadABC genes, the NiaR regulon in some species extends to niacin salvage (the pncAB genes) and includes uncharacterized membrane proteins possibly involved in niacin transport. The involvement in niacin uptake proposed for one of these proteins (re-named NiaP), encoded by the B. subtilis gene yceI, was experimentally verified. In addition to bacteria, members of the NiaP family are conserved in multicellular eukaryotes, including human, pointing to possible NiaP involvement in niacin utilization in these organisms. Overall, the analysis of the NiaR and NrtR regulons (described in the accompanying paper) revealed mechanisms of transcriptional regulation of NAD metabolism in nearly a hundred diverse bacteria.
Beyond the well-known role of proteolytic machinery in protein degradation and turnover, many specialized proteases play a key role in various regulatory processes. Thousands of highly specific proteolytic events are associated with normal and pathological conditions, including bacterial and viral infections. However, the information about individual proteolytic events is dispersed over multiple publications and is not easily available for large-scale analysis. CutDB is one of the first systematic efforts to build an easily accessible collection of documented proteolytic events for natural proteins in vivo or in vitro. A CutDB entry is defined by a unique combination of these three attributes: protease, protein substrate and cleavage site. Currently, CutDB integrates 3070 proteolytic events for 470 different proteases captured from public archives (such as MEROPS and HPRD) and publications. CutDB supports various types of data searches and displays, including clickable network diagrams. Most importantly, CutDB is a community annotation resource based on a Wikipedia approach, providing a convenient user interface to input new data online. A recent contribution of 568 proteolytic events by several experts in the field of matrix metallopeptidases suggests that this approach will significantly accelerate the development of CutDB content. CutDB is publicly available at .
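The entry definition above (a unique combination of protease, protein substrate, and cleavage site) maps naturally onto a hashable record. The sketch below is a hypothetical illustration of that idea and is not CutDB's actual schema: the class name, field names, the integer representation of the cleavage site, and the placeholder identifiers are all assumptions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ProteolyticEvent:
    """One entry, identified by its (protease, substrate, cleavage site) triple."""
    protease: str
    substrate: str
    cleavage_site: int  # assumed here: residue position of the scissile bond


def add_event(collection, event):
    """Add `event` to the set if its triple is new; return True if it was added."""
    if event in collection:
        return False
    collection.add(event)
    return True


if __name__ == "__main__":
    events = set()
    # Placeholder identifiers, not real database records.
    print(add_event(events, ProteolyticEvent("protease_A", "substrate_X", 120)))
    print(add_event(events, ProteolyticEvent("protease_A", "substrate_X", 120)))
    print(len(events))
```

Because the dataclass is frozen, equality and hashing are derived from all three fields, so a repeated submission of the same event is rejected while a different cleavage site in the same substrate counts as a new entry.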
Biosynthesis of NAD(P) cofactors is of special importance for cyanobacteria due to their role in photosynthesis and respiration. Despite significant progress in understanding NAD(P) biosynthetic machinery in some model organisms, relatively little is known about its implementation in cyanobacteria. We addressed this problem by a combination of comparative genome analysis with verification experiments in the model system of Synechocystis sp. strain PCC 6803. A detailed reconstruction of the NAD(P) metabolic subsystem using the SEED genomic platform (http://theseed.uchicago.edu/FIG/index.cgi) helped us accurately annotate respective genes in the entire set of 13 cyanobacterial species with completely sequenced genomes available at the time. Comparative analysis of operational variants implemented in this divergent group allowed us to elucidate both conserved (de novo and universal pathways) and variable (recycling and salvage pathways) aspects of this subsystem. Focused genetic and biochemical experiments confirmed several conjectures about the key aspects of this subsystem. (i) The product of the slr1691 gene, a homolog of Escherichia coli gene nadE containing an additional nitrilase-like N-terminal domain, is a NAD synthetase capable of utilizing glutamine as an amide donor in vitro. (ii) The product of the sll1916 gene, a homolog of E. coli gene nadD, is a nicotinic acid mononucleotide-preferring adenylyltransferase. This gene is essential for survival and cannot be compensated for by an alternative nicotinamide mononucleotide (NMN)-preferring adenylyltransferase (slr0787 gene). (iii) The product of the slr0788 gene is a nicotinamide-preferring phosphoribosyltransferase involved in the first step of the two-step nondeamidating utilization of nicotinamide (NMN shunt). 
(iv) The physiological role of this pathway encoded by a conserved gene cluster, slr0787-slr0788, is likely in the recycling of endogenously generated nicotinamide, as supported by the inability of this organism to utilize exogenously provided niacin. Positional clustering and the cooccurrence profile of the respective genes across a diverse collection of cellular organisms provide evidence of horizontal transfer events in the evolutionary history of this pathway.
NAD is an indispensable redox cofactor in all organisms. Most of the genes required for NAD biosynthesis in various species are known. Ribosylnicotinamide kinase (RNK) was among the few unknown (missing) genes involved with NAD salvage and recycling pathways. Using a comparative genome analysis involving reconstruction of NAD metabolism from genomic data, we predicted and experimentally verified that bacterial RNK is encoded within the 3′ region of the nadR gene. Based on these results and previous data, the full-size multifunctional NadR protein (as in Escherichia coli) is composed of (i) an N-terminal DNA-binding domain involved in the transcriptional regulation of NAD biosynthesis, (ii) a central nicotinamide mononucleotide adenylyltransferase (NMNAT) domain, and (iii) a C-terminal RNK domain. The RNK and NMNAT enzymatic activities of recombinant NadR proteins from Salmonella enterica serovar Typhimurium and Haemophilus influenzae were quantitatively characterized. We propose a model for the complete salvage pathway from exogenous N-ribosylnicotinamide to NAD which involves the concerted action of the PnuC transporter and RNK, followed by the NMNAT activity of the NadR protein. Both the pnuC and nadR genes were proven to be essential for the growth and survival of H. influenzae, thus implicating them as potential narrow-spectrum drug targets.
Novel drug targets are required in order to design new defenses against antibiotic-resistant pathogens. Comparative genomics provides new opportunities for finding optimal targets among previously unexplored cellular functions, based on an understanding of related biological processes in bacterial pathogens and their hosts. We describe an integrated approach to identification and prioritization of broad-spectrum drug targets. Our strategy is based on genetic footprinting in Escherichia coli followed by metabolic context analysis of essential gene orthologs in various species. Genes required for viability of E. coli in rich medium were identified on a whole-genome scale using the genetic footprinting technique. Potential target pathways were deduced from these data and compared with a panel of representative bacterial pathogens by using metabolic reconstructions from genomic data. Conserved and indispensable functions revealed by this analysis potentially represent broad-spectrum antibacterial targets. Further target prioritization involves comparison of the corresponding pathways and individual functions between pathogens and the human host. The most promising targets are validated by direct knockouts in model pathogens. The efficacy of this approach is illustrated using examples from metabolism of adenylate cofactors NAD(P), coenzyme A, and flavin adenine dinucleotide. Several drug targets within these pathways, including three distantly related adenylyltransferases (orthologs of the E. coli genes nadD, coaD, and ribF), are discussed in detail.
Carbon-13 (13C) analysis is a commonly used method for estimating reaction rates in biochemical networks. The choice of carbon labeling pattern is an important consideration when designing these experiments. We present a novel Monte Carlo algorithm for finding the optimal substrate input label for a particular experimental objective (flux or flux ratio). Unlike previous work, this method does not require assumption of the flux distribution beforehand.
Using a large E. coli isotopomer model, different commercially available substrate labeling patterns were tested computationally for their ability to determine reaction fluxes. The choice of optimal labeled substrate was found to be dependent upon the desired experimental objective. Many commercially available labels are predicted to be outperformed by complex labeling patterns. Based on Monte Carlo Sampling, the dimensionality of experimental data was found to be considerably less than anticipated, suggesting that effectiveness of 13C experiments for determining reaction fluxes across a large-scale metabolic network is less than previously believed.
While 13C analysis is a useful tool in systems biology, high redundancy in measurements limits the information that can be obtained from each experiment. It is however possible to compute potential limitations before an experiment is run and predict whether, and to what degree, the rate of each reaction can be resolved.
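The point about redundancy can be illustrated with a toy linear model in which each measurement is a linear combination of fluxes. The sketch below is not the isotopomer model or the Monte Carlo algorithm described above, since real 13C labeling models are nonlinear; it only shows, under that simplifying assumption, how the rank of a hypothetical sensitivity matrix reveals that many measurements can carry few independent pieces of information, and how to check whether an individual flux is resolvable. All names and numbers are invented for illustration.

```python
from fractions import Fraction


def rank(rows):
    """Rank of a matrix by Gaussian elimination over exact rationals."""
    m = [[Fraction(x) for x in row] for row in rows]
    n_cols = len(m[0]) if m else 0
    r = 0  # number of pivots found so far
    for col in range(n_cols):
        pivot = next((i for i in range(r, len(m)) if m[i][col] != 0), None)
        if pivot is None:
            continue
        m[r], m[pivot] = m[pivot], m[r]
        for i in range(len(m)):
            if i != r and m[i][col] != 0:
                factor = m[i][col] / m[r][col]
                m[i] = [a - factor * b for a, b in zip(m[i], m[r])]
        r += 1
        if r == len(m):
            break
    return r


def resolvable(sensitivity, j):
    """Flux j is resolvable if its unit vector lies in the row space of the matrix."""
    n = len(sensitivity[0])
    unit = [1 if k == j else 0 for k in range(n)]
    return rank(sensitivity + [unit]) == rank(sensitivity)


if __name__ == "__main__":
    # Hypothetical sensitivities: 4 measurements (rows) by 3 fluxes (columns).
    # Rows 1-2 repeat information on flux 0; rows 3-4 only see the sum of fluxes 1 and 2.
    sensitivity = [[1, 0, 0], [2, 0, 0], [0, 1, 1], [0, 2, 2]]
    print(rank(sensitivity))
    print([resolvable(sensitivity, j) for j in range(3)])
```

In this toy case four measurements have an effective dimensionality of two, so only the first flux can be determined on its own, which mirrors the kind of limitation that can be computed before an experiment is run.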
Genome-scale prediction of gene regulation and reconstruction of transcriptional regulatory networks in bacteria is one of the critical tasks of modern genomics. The Shewanella genus is comprised of metabolically versatile gamma-proteobacteria, whose lifestyles and natural environments are substantially different from Escherichia coli and other model bacterial species. The comparative genomics approaches and computational identification of regulatory sites are useful for the in silico reconstruction of transcriptional regulatory networks in bacteria.
To explore conservation and variations in the Shewanella transcriptional networks we analyzed the repertoire of transcription factors and performed genomics-based reconstruction and comparative analysis of regulons in 16 Shewanella genomes. The inferred regulatory network includes 82 transcription factors and their DNA binding sites, 8 riboswitches and 6 translational attenuators. Forty-five regulons were newly inferred from the genome context analysis, whereas others were propagated from previously characterized regulons in the Enterobacteria and Pseudomonas spp. Multiple variations in regulatory strategies between the Shewanella spp. and E. coli include regulon contraction and expansion (as in the case of PdhR, HexR, FadR), numerous cases of recruiting non-orthologous regulators to control equivalent pathways (e.g. PsrA for fatty acid degradation) and, conversely, orthologous regulators to control distinct pathways (e.g. TyrR, ArgR, Crp).
We tentatively defined the first reference collection of ~100 transcriptional regulons in 16 Shewanella genomes. The resulting regulatory network contains ~600 regulated genes per genome that are mostly involved in metabolism of carbohydrates, amino acids, fatty acids, vitamins, metals, and stress responses. Several reconstructed regulons including NagR for N-acetylglucosamine catabolism were experimentally validated in S. oneidensis MR-1. Analysis of correlations in gene expression patterns helps to interpret the reconstructed regulatory network. The inferred regulatory interactions will provide additional regulatory constraints for an integrated model of metabolism and regulation in S. oneidensis MR-1.
Carbohydrates are a primary source of carbon and energy for many bacteria. Accurate projection of known carbohydrate catabolic pathways across diverse bacteria with complete genomes constitutes a substantial challenge due to frequent variations in components of these pathways. To address a practically and fundamentally important challenge of reconstruction of carbohydrate utilization machinery in any microorganism directly from its genomic sequence, we combined a subsystems-based comparative genomic approach with experimental validation of selected bioinformatic predictions by a combination of biochemical, genetic and physiological experiments.
We applied this integrated approach to systematically map carbohydrate utilization pathways in 19 genomes from the Shewanella genus. The obtained genomic encyclopedia of sugar utilization includes ~170 protein families (mostly metabolic enzymes, transporters and transcriptional regulators) spanning 17 distinct pathways with a mosaic distribution across Shewanella species providing insights into their ecophysiology and adaptive evolution. Phenotypic assays revealed a remarkable consistency between predicted and observed phenotype, an ability to utilize an individual sugar as a sole source of carbon and energy, over the entire matrix of tested strains and sugars.
Comparison of the reconstructed catabolic pathways with E. coli identified multiple differences that are manifested at various levels, from the presence or absence of certain sugar catabolic pathways, nonorthologous gene replacements and alternative biochemical routes to a different organization of transcription regulatory networks.
The reconstructed sugar catabolome in Shewanella spp. includes 62 novel isofunctional families of enzymes, transporters, and regulators. In addition to improving our knowledge of genomics and functional organization of carbohydrate utilization in Shewanella, this study led to a substantial expansion of our current version of the Genomic Encyclopedia of Carbohydrate Utilization. A systematic and iterative application of this approach to multiple taxonomic groups of bacteria will further enhance it, creating a knowledge base adequate for the efficient analysis of any newly sequenced genome as well as of the emerging metagenomic data.
The number of available prokaryotic genome sequences is growing steadily, and faster than our ability to accurately annotate them.
We describe a fully automated service for annotating bacterial and archaeal genomes. The service identifies protein-encoding, rRNA and tRNA genes, assigns functions to the genes, predicts which subsystems are represented in the genome, uses this information to reconstruct the metabolic network and makes the output easily downloadable for the user. In addition, the annotated genome can be browsed in an environment that supports comparative analysis with the annotated genomes maintained in the SEED environment.
The service normally makes the annotated genome available within 12–24 hours of submission, but ultimately the quality of such a service will be judged in terms of accuracy, consistency, and completeness of the produced annotations. We summarize our attempts to address these issues and discuss plans for incrementally enhancing the service.
By providing accurate, rapid annotation freely to the community we have created an important community resource. The service has now been utilized by over 120 external users annotating over 350 distinct genomes.