Despite the steadily decreasing costs of genome sequencing, prioritizing organisms for sequencing remains important in large-scale projects. Phylogeny-based selection is of interest to identify those organisms whose genomes can be expected to differ most from those that have already been sequenced. Here, we describe a method that infers a phylogenetic scoring independent of which set of organisms has previously been targeted, which is computationally simple and easy to apply in practice. The scoring itself, as well as pre- and post-processing of the data, is illustrated using two real-world examples in which the method has already been applied for selecting targets for genome sequencing. These projects are the JGI CSP Genomic Encyclopedia of Bacteria and Archaea phase I, targeting 1,000 type strains, and, on a smaller scale, the phylogenomics of the Roseobacter clade. Potential artifacts of the method are discussed and compared to a selection approach based on the taxonomic classification.
phylogenetic diversity; genomics; taxon selection; 16S rRNA; tree of life; Genomic Encyclopedia; Roseobacter clade
Labrenzia alexandrii Biebl et al. 2007 is a marine member of the family Rhodobacteraceae in the order Rhodobacterales, which has thus far only partially been characterized at the genome level. The bacterium is of interest because it lives in close association with the toxic dinoflagellate Alexandrium lusitanicum. Ultrastructural analysis reveals R-bodies within the bacterial cells, which are primarily known from obligate endosymbionts that trigger “killing traits” in ciliates (Paramecium spp.). Genomic traits of L. alexandrii DFL-11T are in accordance with these findings, as they include the reb genes putatively involved in R-body synthesis. Analysis of the two extrachromosomal elements suggests a role in heavy-metal resistance and exopolysaccharide formation, respectively. The 5,461,856 bp long genome with its 5,071 protein-coding and 73 RNA genes consists of one chromosome and two plasmids, and has been sequenced in the context of the Marine Microbial Initiative.
aerobe; motile; symbiosis; dinoflagellates; photoheterotroph; high-quality draft; Alexandrium lusitanicum; Alphaproteobacteria
For the last 25 years, species delimitation in prokaryotes (Archaea and Bacteria) was to a large extent based on DNA-DNA hybridization (DDH), a tedious lab procedure designed in the early 1970s that served its purpose astonishingly well in the absence of deciphered genome sequences. With the rapid progress in genome sequencing, the time has come to directly use the now available and easy-to-generate genome sequences for the delimitation of species. GBDP (Genome Blast Distance Phylogeny) infers genome-to-genome distances between pairs of entirely or partially sequenced genomes, providing a digital, highly reliable estimator for the relatedness of genomes. Its application as an in-silico replacement for DDH was recently introduced. The main challenge in the implementation of such an application is to produce digital DDH values that mimic the wet-lab DDH values as closely as possible to ensure consistency in the prokaryotic species concept.
Correlation and regression analyses were used to determine the best-performing methods and the most influential parameters. GBDP was further enriched with a set of new features such as confidence intervals for intergenomic distances obtained via resampling or via the statistical models for DDH prediction and an additional family of distance functions. As in previous analyses, GBDP obtained the highest agreement with wet-lab DDH among all tested methods, but improved models led to a further increase in the accuracy of DDH prediction. Confidence intervals yielded stable results when inferred from the statistical models, whereas those obtained via resampling showed marked differences between the underlying distance functions.
Despite the high accuracy of GBDP-based DDH prediction, inferences from limited empirical data are always associated with a certain degree of uncertainty. It is thus crucial to enrich in-silico DDH replacements with confidence-interval estimation, enabling the user to statistically evaluate the outcomes. Such methodological advancements, easily accessible through the web service at http://ggdc.dsmz.de, are crucial steps towards a consistent and truly genome sequence-based classification of microorganisms.
Archaea; Bacteria; BLAST; DDH; GGD; GGDC; GBDP; Genomics; MUMmer; Phylogeny; Species concept; Taxonomy
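The resampling-based confidence intervals mentioned in the abstract above can be illustrated with a percentile bootstrap over high-scoring segment pairs. This is a minimal sketch under stated assumptions, not the GGDC implementation: the `(identities, length)` tuples, the simple distance formula, the function names, and the example numbers are all hypothetical and chosen only for illustration.

```python
import random


def hsp_distance(hsps):
    """Return 1 - (identical positions / aligned positions) over all HSPs.

    Each HSP is a hypothetical (identities, alignment_length) tuple.
    """
    total_identities = sum(identities for identities, _ in hsps)
    total_length = sum(length for _, length in hsps)
    return 1.0 - total_identities / total_length


def bootstrap_ci(hsps, replicates=1000, alpha=0.05, seed=0):
    """Percentile bootstrap interval from resampling HSPs with replacement."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    estimates = sorted(
        hsp_distance([rng.choice(hsps) for _ in hsps])
        for _ in range(replicates)
    )
    lower = estimates[int((alpha / 2) * replicates)]
    upper = estimates[int((1 - alpha / 2) * replicates) - 1]
    return lower, upper


# Made-up example data: (identities, alignment length) per HSP.
example_hsps = [(950, 1000), (480, 500), (1900, 2000), (700, 800), (290, 300)]
point = hsp_distance(example_hsps)
low, high = bootstrap_ci(example_hsps)
print(f"distance = {point:.4f}, interval = [{low:.4f}, {high:.4f}]")
```

The width of such an interval depends on how the distance is computed from the resampled HSPs, which is one plausible reading of why the abstract reports marked differences between distance functions for the resampling approach.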
The pragmatic species concept for Bacteria and Archaea is ultimately based on DNA-DNA hybridization (DDH). While enabling the taxonomist, in principle, to obtain an estimate of the overall similarity between the genomes of two strains, this technique is tedious and error-prone and cannot be used to incrementally build up a comparative database. Recent technological progress in the area of genome sequencing calls for bioinformatics methods to replace the wet-lab DDH by in-silico genome-to-genome comparison. Here we investigate state-of-the-art methods for inferring whole-genome distances in their ability to mimic DDH. Algorithms to efficiently determine high-scoring segment pairs or maximally unique matches perform well as a basis of inferring intergenomic distances. The examined distance functions, which are able to cope with heavily reduced genomes and repetitive sequence regions, outperform previously described ones regarding the correlation with and error ratios in emulating DDH. Simulation of incompletely sequenced genomes indicates that some distance formulas are very robust against missing fractions of genomic information. Digitally derived genome-to-genome distances show a better correlation with 16S rRNA gene sequence distances than DDH values. The future perspectives of genome-informed taxonomy are discussed, and the investigated methods are made available as a web service for genome-based species delineation.
Archaea; Bacteria; BLAST; GBDP; genomics; MUMmer; phylogeny; species concept; taxonomy
DNA-DNA hybridization (DDH) is a widely applied wet-lab technique to obtain an estimate of the overall similarity between the genomes of two organisms. Basing the species concept for prokaryotes ultimately on DDH was chosen by microbiologists as a pragmatic approach for deciding on the recognition of novel species, and it also allowed a relatively high degree of standardization compared to other areas of taxonomy. However, DDH is tedious and error-prone and, first and foremost, cannot be used to incrementally establish a comparative database. Recent studies have shown that in-silico methods for the comparison of genome sequences can be used to replace DDH. Considering the ongoing rapid technological progress of sequencing methods, genome-based prokaryote taxonomy is coming into reach. However, calculating distances between genomes depends on multiple choices of software and program settings. We here provide an overview of the modifications that can be applied to distance methods based on high-scoring segment pairs (HSPs) or maximally unique matches (MUMs) and that need to be documented. General recommendations on determining HSPs using BLAST or other algorithms are also provided. As a reference implementation, we introduce the GGDC web server (http://ggdc.gbdp.org).
BLAST; GBDP; GGDC web server; genomics; MUMmer; phylogeny; species delineation; microbial taxonomy
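To make the idea of HSP-based distance functions concrete, the following sketch shows three illustrative ways of turning pooled HSP totals into a distance. These are assumptions written in the spirit of the abstract above, not the formulas used by the GGDC web server; the `HspSummary` class, its fields, the function names, and the example numbers are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class HspSummary:
    """Hypothetical HSP totals for one genome pair, both search directions pooled."""
    hsp_length: int       # total length covered by HSPs, summed over both genomes
    identities: int       # identical base pairs within those HSPs, both directions
    genome_length_x: int
    genome_length_y: int


def coverage_distance(s):
    """1 - fraction of the two genomes that is covered by HSPs."""
    return 1.0 - s.hsp_length / (s.genome_length_x + s.genome_length_y)


def identity_distance(s):
    """1 - identities per HSP length; does not use the genome lengths."""
    return 1.0 - s.identities / s.hsp_length


def identity_over_genome_distance(s):
    """1 - identities relative to the summed genome lengths."""
    return 1.0 - s.identities / (s.genome_length_x + s.genome_length_y)


# Made-up numbers for illustration only.
pair = HspSummary(
    hsp_length=6_300_000,
    identities=5_670_000,
    genome_length_x=4_000_000,
    genome_length_y=5_000_000,
)
for distance in (coverage_distance, identity_distance, identity_over_genome_distance):
    print(distance.__name__, round(distance(pair), 4))
```

A formula normalized by HSP length rather than by genome length does not need the genome lengths at all, which may be one intuition for why some distance formulas tolerate incompletely sequenced genomes better than others. Whichever variant is used, the choice and the underlying search settings are the kind of detail the abstract says should be documented.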
The genus Nocardiopsis, a widespread group in phylum Actinobacteria, has received much attention owing to its ecological versatility, pathogenicity, and ability to produce a rich array of bioactive metabolites. Its high environmental adaptability might be attributable to its genome dynamics, which can be estimated through comparative genomic analysis targeting microorganisms with close phylogenetic relationships but different phenotypes. To shed light on speciation, gene content evolution, and environmental adaptation in these unique actinobacteria, we sequenced draft genomes for 16 representative species of the genus and compared them with that of the type species N. dassonvillei subsp. dassonvillei DSM 43111T. The core genome of 1,993 orthologous and paralogous gene clusters was identified, and the pan-genomic reservoir was found not only to accommodate more than 22,000 genes, but also to be open. The top ten paralogous genes in terms of copy number could be referred to three functional categories: transcription regulators, transporters, and synthases related to bioactive metabolites. Based on phylogenomic reconstruction, we inferred past evolutionary events, such as gene gains and losses, and identified a list of clade-specific genes implicated in environmental adaptation. These results provided insights into the genetic causes of environmental adaptability in this cosmopolitan actinobacterial group and the contributions made by its inherent features, including genome dynamics and the constituents of core and accessory proteins.
Hoeflea phototrophica Biebl et al. 2006 is a member of the family Phyllobacteriaceae in the order Rhizobiales, which is thus far only partially characterized at the genome level. This marine bacterium contains the photosynthesis reaction-center genes pufL and pufM and is of interest because it lives in close association with toxic dinoflagellates such as Prorocentrum lima. The 4,467,792 bp genome (permanent draft sequence) with its 4,296 protein-coding and 69 RNA genes is a part of the Marine Microbial Initiative.
aerobic; rod-shaped; motile; photoheterotroph; Phenotype MicroArray; bacteriochlorophyll a; symbiosis; dinoflagellates; Prorocentrum lima; Phyllobacteriaceae
Fungus-cultivating termites make use of an obligate mutualism with fungi from the genus Termitomyces, which are acquired through either vertical transmission via reproductive alates or horizontally transmitted during the formation of new mounds. Termitomyces taxonomy, and thus estimating diversity and host specificity of these fungi, is challenging because fruiting bodies are rarely found. Molecular techniques can be applied but need not necessarily yield the same outcome as morphological identification.
Culture-dependent and culture-independent methods were used to comprehensively assess host specificity and gut fungal diversity. Termites were identified using mitochondrial cytochrome oxidase II (COII) genes. Twenty-three Termitomyces cultures were isolated from fungal combs. Internal transcribed spacer (ITS) clone libraries were constructed from termite guts. Presence of Termitomyces was confirmed using specific and universal primers. Termitomyces species boundaries were estimated by cross-comparison of macromorphological and sequence features, and ITS clustering parameters accordingly optimized. The overall trends in coverage of Termitomyces diversity and host associations were estimated using GenBank data.
Results and Conclusion
Results indicate a monoculture of Termitomyces in the guts as well as the isolation sources (fungal combs). However, cases of more than one Termitomyces strain per mound were observed, since mounds can contain different termite colonies. The newly found cultures, as well as the clustering analysis of GenBank data, indicate that there are on average between one and two host genera per Termitomyces species. Saturation does not appear to have been reached for the total number of known Termitomyces species, for the number of Termitomyces species per host taxon, or for the number of known hosts per Termitomyces species. Considering the rarity of Termitomyces fruiting bodies, it is suggested to base the future taxonomy of the group mainly on well-characterized and publicly accessible cultures.
The emergence of next generation sequencing (NGS) has provided the means for rapid and high throughput sequencing and data generation at low cost, while concomitantly creating a new set of challenges. The number of available assembled microbial genomes continues to grow rapidly and their quality reflects the quality of the sequencing technology used, but also of the analysis software employed for assembly and annotation.
In this work, we have explored the quality of the microbial draft genomes across various sequencing technologies. We have compared the draft and finished assemblies of 133 microbial genomes sequenced at the Department of Energy-Joint Genome Institute and finished at the Los Alamos National Laboratory using a variety of combinations of sequencing technologies, reflecting the transition of the institute from Sanger-based sequencing platforms to NGS platforms. The quality of the public assemblies and of the associated gene annotations was evaluated using various metrics. Results obtained with the different sequencing technologies, as well as their effects on downstream processes, were analyzed. Our results demonstrate that the Illumina HiSeq 2000 sequencing system, the primary sequencing technology currently used for de novo genome sequencing and assembly at JGI, has various advantages in terms of total sequence throughput and cost, but it also introduces challenges for the downstream analyses. In all cases, assembly results, although of high quality on average, need to be viewed critically, and potential sources of error in them should be considered prior to analysis.
These data follow the evolution of microbial sequencing and downstream processing at the JGI from draft genome sequences with large gaps corresponding to missing genes of significant biological role to assemblies with multiple small gaps (Illumina) and finally to assemblies that generate almost complete genomes (Illumina+PacBio).
Cellulose degrading enzymes have important functions in the biotechnology industry, including the production of biofuels from lignocellulosic biomass. Anaerobes including Clostridium species organize cellulases and other glycosyl hydrolases into large complexes known as cellulosomes. In contrast, aerobic actinobacteria utilize systems comprised of independently acting enzymes, often with carbohydrate binding domains. Numerous actinobacterial genomes have become available through the Genomic Encyclopedia of Bacteria and Archaea (GEBA) project. We identified putative cellulose-degrading enzymes belonging to families GH5, GH6, GH8, GH9, GH12, GH48, and GH51 in the genomes of eleven members of the actinobacteria. The eleven organisms were tested in several assays for cellulose degradation, and eight of the organisms showed evidence of cellulase activity. The three with the highest cellulase activity were Actinosynnema mirum, Cellulomonas flavigena, and Xylanimonas cellulosilytica. Cellobiose is known to induce cellulolytic enzymes in the model organism Thermobifida fusca, but only Nocardiopsis dassonvillei showed higher cellulolytic activity in the presence of cellobiose. In T. fusca, cellulases and a putative cellobiose ABC transporter are regulated by the transcriptional regulator CelR. Nine organisms appear to use the CelR site or a closely related binding site to regulate an ABC transporter. In some, CelR also regulates cellulases, while cellulases are controlled by different regulatory sites in three organisms. Mining of genome data for cellulose degradative enzymes followed by experimental verification successfully identified several actinobacteria species which were not previously known to degrade cellulose as cellulolytic organisms.
Saccharomonospora marina Liu et al. 2010 is a member of the genus Saccharomonospora, in the family Pseudonocardiaceae that is poorly characterized at the genome level thus far. Members of the genus Saccharomonospora are of interest because they originate from diverse habitats, such as leaf litter, manure, compost, surface of peat, moist, over-heated grain, and ocean sediment, where they might play a role in the primary degradation of plant material by attacking hemicellulose. Organisms belonging to the genus are usually Gram-positive staining, non-acid fast, and classify among the actinomycetes. Here we describe the features of this organism, together with the complete genome sequence (permanent draft status), and annotation. The 5,965,593 bp long chromosome with its 5,727 protein-coding and 57 RNA genes was sequenced as part of the DOE funded Community Sequencing Program (CSP) 2010 at the Joint Genome Institute (JGI).
aerobic; chemoheterotrophic; Gram-positive; vegetative and aerial mycelia; spore-forming; non-motile; marine bacterium; Pseudonocardiaceae; CSP 2010
Saccharomonospora azurea Runmao et al. 1987 is a member of the genus Saccharomonospora, which is in the family Pseudonocardiaceae and thus far poorly characterized genomically. Members of the genus Saccharomonospora are of interest because they originate from diverse habitats, such as leaf litter, manure, compost, the surface of peat, and moist and over-heated grain, and may play a role in the primary degradation of plant material by attacking hemicellulose. Next to S. viridis, S. azurea is only the second member in the genus Saccharomonospora for which a completely sequenced type strain genome will be published. Here we describe the features of this organism, together with the complete genome sequence with project status ‘Improved high quality draft’, and the annotation. The 4,763,832 bp long chromosome with its 4,472 protein-coding and 58 RNA genes was sequenced as part of the DOE funded Community Sequencing Program (CSP) 2010 at the Joint Genome Institute (JGI).
aerobic; chemoheterotrophic; Gram-positive; vegetative and aerial mycelia; spore-forming; non-motile; soil bacterium; Pseudonocardiaceae; CSP 2010
The Phenotype MicroArray (OmniLog® PM) system is able to simultaneously capture a large number of phenotypes by recording an organism's respiration over time on distinct substrates. This technique targets the object of natural selection itself, the phenotype, whereas previously addressed ‘-omics’ techniques merely study components that finally contribute to it. The recording of respiration over time, however, adds a longitudinal dimension to the data. To optimally exploit this information, it must be extracted from the shapes of the recorded curves and displayed in analogy to conventional growth curves.
The free software environment R was explored for both visualizing and fitting of PM respiration curves. Approaches using either a model fit (and commonly applied growth models) or a smoothing spline were evaluated. Their reliability in inferring curve parameters and confidence intervals was compared to the native OmniLog® PM analysis software. We consider the post-processing of the estimated parameters, the optimal classification of curve shapes and the detection of significant differences between them, as well as practically relevant questions such as detecting the impact of cultivation times and the minimum required number of experimental repeats.
We provide a comprehensive framework for data visualization and parameter estimation according to user choices. A flexible graphical representation strategy for displaying the results is proposed, including 95% confidence intervals for the estimated parameters. The spline approach is less prone to irregular curve shapes than fitting any of the considered models or using the native PM software for calculating both point estimates and confidence intervals. These can serve as a starting point for the automated post-processing of PM data, providing much more information than the strict dichotomization into positive and negative reactions. Our results form the basis for a freely available R package for the analysis of PM data.
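As a rough illustration of what extracting parameters from a respiration curve can mean, the sketch below derives three descriptive values from a discretized curve. It deliberately uses plain finite differences and the trapezoidal rule rather than a model fit or a smoothing spline, and it is written in Python rather than R. The choice of parameters (maximum height, steepest slope, area under the curve), the function name, and the example measurements are assumptions for illustration, not the procedure of the described framework.

```python
def curve_parameters(times, values):
    """Return maximum height, steepest slope, and trapezoidal area under the curve."""
    if len(times) != len(values) or len(times) < 2:
        raise ValueError("need at least two paired measurements")
    intervals = range(len(times) - 1)
    # Slope of each segment between consecutive measurements.
    slopes = [
        (values[i + 1] - values[i]) / (times[i + 1] - times[i])
        for i in intervals
    ]
    # Trapezoidal rule: interval width times mean of the two endpoint values.
    area = sum(
        (times[i + 1] - times[i]) * (values[i] + values[i + 1]) / 2.0
        for i in intervals
    )
    return {"max_height": max(values), "max_slope": max(slopes), "auc": area}


# Made-up measurements: time in hours, signal in arbitrary units.
hours = [0, 12, 24, 36, 48]
signal = [0, 10, 120, 240, 250]
params = curve_parameters(hours, signal)
print(params)
```

Raw finite differences are sensitive to noise in individual measurements, which is one practical motivation for smoothing the curve before parameters are read off it.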
Halopiger xanaduensis is the type species of the genus Halopiger and belongs to the euryarchaeal family Halobacteriaceae. H. xanaduensis strain SH-6, which is designated as the type strain, was isolated from the sediment of a salt lake in Inner Mongolia, Lake Shangmatala. Like other members of the family Halobacteriaceae, it is an extreme halophile requiring at least 2.5 M salt for growth. We report here the sequencing and annotation of the 4,355,268 bp genome, which includes one chromosome and three plasmids. This genome is part of a Joint Genome Institute (JGI) Community Sequencing Program (CSP) project to sequence diverse haloarchaeal genomes.
Archaea; Euryarchaeota; Halobacteriaceae; extreme halophile
Bacillus tusciae Bonjour & Aragno 1994 is a hydrogen-oxidizing, thermoacidophilic spore former that lives as a facultative chemolithoautotroph in solfataras. Although 16S rRNA gene sequencing was well established at the time of the initial description of the organism, 16S sequence data were not available and the strain was placed into the genus Bacillus based on limited chemotaxonomic information. Despite the now obvious misplacement of strain T2 as a member of the genus Bacillus in 16S rRNA-based phylogenetic trees, the misclassification remained uncorrected for many years, which was likely due to the extremely difficult, analysis-hampering cultivation conditions and poor growth rate of the strain. Here we provide a taxonomic re-evaluation of strain T2T (= DSM 2912 = NBRC 15312) and propose its reclassification as the type strain of a new species, Kyrpidia tusciae, and the type species of the new genus Kyrpidia, which is a sister-group of Alicyclobacillus. The family Alicyclobacillaceae da Costa and Rainey, 2010 is emended. The 3,384,766 bp genome with its 3,323 protein-coding and 78 RNA genes is part of the Genomic Encyclopedia of Bacteria and Archaea project.
hydrogen-oxidizing; aerobe; facultative chemolithoautotroph; thermoacidophile; free-living; solfatara; spore-forming; Bacillaceae; GEBA
Staphylothermus hellenicus belongs to the order Desulfurococcales within the archaeal phylum Crenarchaeota. Strain P8T is the type strain of the species and was isolated from a shallow hydrothermal vent system at Palaeochori Bay, Milos, Greece. It is a hyperthermophilic, anaerobic heterotroph. Here we describe the features of this organism together with the complete genome sequence and annotation. The 1,580,347 bp genome with its 1,668 protein-coding and 48 RNA genes was sequenced as part of a DOE Joint Genome Institute (JGI) Laboratory Sequencing Program (LSP) project.
Archaea; Crenarchaeota; Desulfurococcaceae; hyperthermophile; hydrothermal vent; anaerobe
The associations between pathogens and their hosts are complex and can result from any combination of evolutionary events such as codivergence, switching, and duplication of the pathogen. Mycoviruses are RNA viruses which infect fungi and for which natural vectors are so far unknown. Thus, lateral transfer might be improbable and codivergence their dominant mode of evolution. Accordingly, mycoviruses are a suitable target for statistical tests of virus-host codivergence, but inference of mycovirus phylogenies might be difficult because of low sequence similarity even within families.
We analyzed here the evolutionary dynamics of all mycovirus families by comparing virus and host phylogenies. Additionally, we assessed the sensitivity of the co-phylogenetic tests to the settings for inferring virus trees from their genome sequences and approximate, taxonomy-based host trees.
While sequence alignment filtering modes affected branch support, the overall results of the co-phylogenetic tests were significantly influenced only by the number of viruses sampled per family. The trees of the two largest families, Partitiviridae and Totiviridae, were significantly more similar to those of their hosts than expected by chance, and most individual host-virus links had a significant positive impact on the global fit, indicating that codivergence is the dominant mode of virus diversification. However, in this regard mycoviruses did not differ from closely related viruses sampled from non-fungus hosts. The remaining virus families were either dominated by other evolutionary modes or lacked an apparent overall pattern. As this negative result might be caused by insufficient taxon sampling, the most parsimonious hypothesis still is that host-parasite evolution is basically the same in all mycovirus families. This is the first study of mycovirus-host codivergence, and the results shed light not only on how mycovirus biology affects their co-phylogenetic relationships, but also on their presumable host range itself.
The extremely halophilic archaea are present worldwide in saline environments and have important biotechnological applications. Ten complete genomes of haloarchaea are now available, providing an opportunity for comparative analysis.
We report here the comparative analysis of five newly sequenced haloarchaeal genomes with five previously published ones. Whole genome trees based on protein sequences provide strong support for deep relationships between the ten organisms. Using a soft clustering approach, we identified 887 protein clusters present in all halophiles. Of these core clusters, 112 are not found in any other archaea and therefore constitute the haloarchaeal signature. Four of the halophiles were isolated from water, and four were isolated from soil or sediment. Although there are few habitat-specific clusters, the soil/sediment halophiles tend to have greater capacity for polysaccharide degradation, siderophore synthesis, and cell wall modification. Halorhabdus utahensis and Haloterrigena turkmenica encode over forty glycosyl hydrolases each, and may be capable of breaking down naturally occurring complex carbohydrates. H. utahensis is specialized for growth on carbohydrates and has few amino acid degradation pathways. It uses the non-oxidative pentose phosphate pathway instead of the oxidative pathway, giving it more flexibility in the metabolism of pentoses.
These new genomes expand our understanding of haloarchaeal catabolic pathways, providing a basis for further experimental analysis, especially with regard to carbohydrate metabolism. Halophilic glycosyl hydrolases for use in biofuel production are more likely to be found in halophiles isolated from soil or sediment.
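The notions of core clusters and a group-specific signature described above reduce to set operations once each genome is represented by the clusters it contains. The sketch below assumes a simple presence/absence representation; it does not implement the soft clustering step itself, and the genome names, cluster identifiers, and function names are hypothetical.

```python
def core_clusters(presence):
    """Clusters found in every genome of the group.

    `presence` maps a genome name to the set of cluster IDs it contains.
    """
    genome_sets = list(presence.values())
    if not genome_sets:
        return set()
    return set.intersection(*genome_sets)


def signature_clusters(presence, outgroup_clusters):
    """Core clusters that are absent from all outgroup genomes."""
    return core_clusters(presence) - outgroup_clusters


# Made-up example: three genomes of the group and a pooled outgroup.
group = {
    "genome_A": {"c1", "c2", "c3", "c4"},
    "genome_B": {"c1", "c2", "c3", "c5"},
    "genome_C": {"c1", "c2", "c3", "c6"},
}
outgroup = {"c1", "c6", "c7"}

core = core_clusters(group)
signature = signature_clusters(group, outgroup)
print(sorted(core), sorted(signature))
```

In the study's terms, the core would correspond to the clusters present in all halophiles and the signature to the subset not found in any other archaea.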
False truffles are ecologically important as mycorrhizal partners of trees and evolutionarily highly interesting as the result of a shift from epigeous mushroom-like to underground fruiting bodies. Since its first description by Vittadini in 1831, inappropriate species concepts in the highly diverse false truffle genus Hymenogaster have led to continued confusion, caused by a large variety of prevailing taxonomical opinions.
In this study, we reconsidered the species delimitations in Hymenogaster based on a comprehensive collection of Central European taxa comprising more than 140 fruiting bodies from 20 years of field work. The ITS rDNA sequence dataset was subjected to phylogenetic analysis as well as clustering optimization using OPTSIL software.
Among distinct species concepts from the literature used to create reference partitions for clustering optimization, the broadest concept resulted in the highest agreement with the ITS data. Our results indicate a highly variable morphology of H. citrinus and H. griseus, most likely linked to environmental influences on the phenology (maturity, habitat, soil type and growing season). In particular, taxa described in the 19th century frequently appear as conspecific. Conversely, H. niveus appears as a species complex comprising seven cryptic species with almost identical macro- and micromorphology. H. intermedius and H. huthii are described as novel species, each with a distinct morphology intermediate between two species complexes. A revised taxonomy for one of the most taxonomically difficult genera of Basidiomycetes is proposed, including an updated identification key. The (semi-)automated selection among species concepts used here is of importance for the revision of taxonomically problematic organism groups in general.
This report details the outcome of the first meeting of the Earth Microbiome Project to discuss sample selection and acquisition. The meeting, held at the Argonne National Laboratory on Wednesday, October 6th, 2010, focused on discussion of how to prioritize environmental samples for sequencing and metagenomic analysis as part of the global effort of the EMP to systematically determine the functional and phylogenetic diversity of microbial communities across the world.
Within the archaea, the thermoacidophilic crenarchaeote Sulfolobus solfataricus has become an important model organism for physiology and biochemistry, comparative and functional genomics, and, more recently, systems biology approaches. Within the Sulfolobus Systems Biology (“SulfoSYS”)-project, the effect of changing growth temperatures on a metabolic network is investigated at the systems level by integrating genomic, transcriptomic, proteomic, metabolomic and enzymatic information for production of a silicon cell-model. The network under investigation is the central carbohydrate metabolism. The generation of high-quality quantitative data, which is critical for the investigation of biological systems and the successful integration of the different datasets, derived for example from high-throughput approaches (e.g., transcriptome or proteome analyses), requires the application of, and compliance with, uniform standard protocols, e.g., for growth and handling of the organism as well as the “–omics” approaches. Here, we report on the establishment and implementation of standard operating procedures for the different wet-lab and in silico techniques that are applied within the SulfoSYS-project and that we believe can be useful for future projects on Sulfolobus or (hyper)thermophiles in general. Besides established techniques, they include new methodologies such as strain surveillance, the improved identification of membrane proteins and the application of crenarchaeal metabolomics.
Electronic supplementary material
The online version of this article (doi:10.1007/s00792-009-0280-0) contains supplementary material, which is available to authorized users.
Crenarchaeon; Standard operating procedures; Genomics; Transcriptomics; Proteomics; Metabolomics; Biochemistry; Systems biology
Sulfate-reducing bacteria (SRB) belonging to the metabolically versatile Desulfobacteriaceae are abundant in marine sediments and contribute to the global carbon cycle by complete oxidation of organic compounds. Desulfobacterium autotrophicum HRM2 is the first member of this ecophysiologically important group for which a genome sequence is now available. With 5.6 megabasepairs (Mbp), the genome of Db. autotrophicum HRM2 is about 2 Mbp larger than the sequenced genomes of other SRB. A high number of genome plasticity elements (> 100 transposon-related genes), several regions of GC discontinuity and a high number of repetitive elements (132 paralogous genes Mbp−1) point to a different genome evolution compared with Desulfovibrio spp. The metabolic versatility of Db. autotrophicum HRM2 is reflected in the presence of genes for the degradation of a variety of organic compounds including long-chain fatty acids and for the Wood–Ljungdahl pathway, which enables the organism to completely oxidize acetyl-CoA to CO2 but also to grow chemolithoautotrophically. The presence of more than 250 proteins of the sensory/regulatory protein families should enable Db. autotrophicum HRM2 to efficiently adapt to changing environmental conditions. Genes encoding periplasmic or cytoplasmic hydrogenases and formate dehydrogenases have been detected as well as genes for the transmembrane TpII-c3, Hme and Rnf complexes. Genes for subunits A, B, C and D as well as for the proposed novel subunits L and F of the heterodisulfide reductases are present. This enzyme is involved in energy conservation in methanoarchaea and it is speculated that it exhibits a similar function in the process of dissimilatory sulfate reduction in Db. autotrophicum HRM2.
Hyperthermus butylicus, a hyperthermophilic neutrophile and anaerobe, is a member of the archaeal kingdom Crenarchaeota. Its genome consists of a single circular chromosome of 1,667,163 bp with a 53.7% G+C content. A total of 1672 genes were annotated, of which 1602 are protein-coding, and up to a third are specific to H. butylicus. In contrast to some other crenarchaeal genomes, a high level of GUG and UUG start codons are predicted. Two cdc6 genes are present, but neither could be linked unambiguously to an origin of replication. Many of the predicted metabolic gene products are associated with the fermentation of peptide mixtures including several peptidases with diverse specificities, and there are many encoded transporters. Most of the sulfur-reducing enzymes, hydrogenases and electron-transfer proteins were identified which are associated with energy production by reducing sulfur to H2S. Two large clusters of regularly interspaced repeats (CRISPRs) are present, one of which is associated with a crenarchaeal-type cas gene superoperon; none of the spacer sequences yielded good sequence matches with known archaeal chromosomal elements. The genome carries no detectable transposable or integrated elements, no inteins, and introns are exclusive to tRNA genes. This suggests that the genome structure is quite stable, possibly reflecting a constant, and relatively uncompetitive, natural environment.
anaerobe; genome analysis; hyperthermophile; solfataric habitat
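The gene counts quoted for H. butylicus can be turned into two summary figures that make comparison with other genomes easier: gene density per Mbp and the protein-coding share of all annotated genes. The following is a minimal sketch using only the numbers given in the abstract above; the function and variable names are illustrative assumptions and do not come from the original study.

```python
# Figures quoted in the H. butylicus abstract.
GENOME_BP = 1_667_163      # single circular chromosome, in base pairs
TOTAL_GENES = 1672         # all annotated genes
PROTEIN_CODING = 1602      # protein-coding subset


def genes_per_mbp(n_genes: int, genome_bp: int) -> float:
    """Return the number of genes per megabase pair (hypothetical helper)."""
    if genome_bp <= 0:
        raise ValueError("genome_bp must be positive")
    return n_genes / (genome_bp / 1_000_000)


def share(part: int, whole: int) -> float:
    """Return part/whole as a fraction between 0 and 1 (hypothetical helper)."""
    if whole <= 0 or not 0 <= part <= whole:
        raise ValueError("expected 0 <= part <= whole and whole > 0")
    return part / whole


density = genes_per_mbp(TOTAL_GENES, GENOME_BP)
coding_share = share(PROTEIN_CODING, TOTAL_GENES)

print(f"gene density: {density:.0f} genes per Mbp")
print(f"protein-coding share: {coding_share:.1%}")
```

On these figures the genome carries roughly one gene per kilobase pair, and about 96% of the annotated genes are protein-coding.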
Sulfolobus acidocaldarius is an aerobic thermoacidophilic crenarchaeon which grows optimally at 80°C and pH 2 in terrestrial solfataric springs. Here, we describe the genome sequence of strain DSM639, which has been used for many seminal studies on archaeal and crenarchaeal biology. The circular genome carries 2,225,959 bp (37% G+C) with 2,292 predicted protein-encoding genes. Many of the smaller genes were identified for the first time on the basis of comparison of three Sulfolobus genome sequences. Of the protein-coding genes, 305 are exclusive to S. acidocaldarius and 866 are specific to the Sulfolobus genus. Moreover, 82 genes for untranslated RNAs were identified and annotated. Owing to the probable absence of active autonomous and nonautonomous mobile elements, the genome stability and organization of S. acidocaldarius differ radically from those of Sulfolobus solfataricus and Sulfolobus tokodaii. The S. acidocaldarius genome contains an integrated, and probably encaptured, pARN-type conjugative plasmid which may facilitate intercellular chromosomal gene exchange in S. acidocaldarius. Moreover, it contains genes for a characteristic restriction modification system, a UV damage excision repair system, thermopsin, and an aromatic ring dioxygenase, all of which are absent from genomes of other Sulfolobus species. However, it lacks genes for some of their sugar transporters, consistent with it growing on a more limited range of carbon sources. These results, together with the many newly identified protein-coding genes for Sulfolobus, are incorporated into a public Sulfolobus database which can be accessed at http://dac.molbio.ku.dk/dbs/Sulfolobus.
The hyperthermophilic, facultatively heterotrophic crenarchaeum Thermoproteus tenax was analyzed using a low-coverage shotgun-sequencing approach. A total of 1.81 Mbp (representing 98.5% of the total genome), with an average gap size of 100 bp and 5.3-fold coverage, are reported, giving insights into the genome of T. tenax. Genome analysis and biochemical studies enabled us to reconstruct its central carbohydrate metabolism. T. tenax uses a variant of the reversible Embden-Meyerhof-Parnas (EMP) pathway and two different variants of the Entner-Doudoroff (ED) pathway (a nonphosphorylative variant and a semiphosphorylative variant) for carbohydrate catabolism. For the EMP pathway some new, unexpected enzymes were identified. The semiphosphorylative ED pathway, hitherto supposed to be active only in halophiles, is found in T. tenax. No evidence for a functional pentose phosphate pathway, which is essential for the generation of pentoses and NADPH for anabolic purposes in bacteria and eucarya, is found in T. tenax. Most genes involved in the reversible citric acid cycle were identified, suggesting the presence of a functional oxidative cycle under heterotrophic growth conditions and a reductive cycle for CO2 fixation under autotrophic growth conditions. Almost all genes necessary for glycogen and trehalose metabolism were identified in the T. tenax genome.
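The coverage figures reported for T. tenax imply an approximate total genome size and an approximate number of remaining gaps. The sketch below works through that arithmetic. It assumes that the 1.81 Mbp figure is the sequenced portion, that 98.5% is its share of the complete genome, and that the missing sequence is spread over gaps of the stated 100 bp average size; the names are illustrative, and the results are rough estimates that do not appear in the abstract.

```python
# Figures quoted in the T. tenax abstract.
SEQUENCED_MBP = 1.81       # sequence obtained, in Mbp
FRACTION_COVERED = 0.985   # stated share of the total genome
MEAN_GAP_BP = 100          # stated average gap size, in bp


def implied_genome_mbp(sequenced_mbp: float, fraction_covered: float) -> float:
    """Estimate total genome size from the sequenced amount and its share."""
    if not 0 < fraction_covered <= 1:
        raise ValueError("fraction_covered must be in (0, 1]")
    return sequenced_mbp / fraction_covered


total_mbp = implied_genome_mbp(SEQUENCED_MBP, FRACTION_COVERED)
missing_bp = (total_mbp - SEQUENCED_MBP) * 1_000_000
approx_gaps = missing_bp / MEAN_GAP_BP  # assumes every gap has the mean size

print(f"implied genome size: {total_mbp:.2f} Mbp")
print(f"unsequenced portion: about {missing_bp / 1000:.0f} kbp")
print(f"approximate number of gaps: {approx_gaps:.0f}")
```

Under these assumptions the complete genome would be about 1.84 Mbp, with somewhat under 30 kbp still unsequenced.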