Despite the steadily decreasing costs of genome sequencing, prioritizing organisms for sequencing remains important in large-scale projects. Phylogeny-based selection is of interest to identify those organisms whose genomes can be expected to differ most from those that have already been sequenced. Here, we describe a method that infers a phylogenetic scoring independent of which set of organisms has previously been targeted, which is computationally simple and easy to apply in practice. The scoring itself, as well as pre- and post-processing of the data, is illustrated using two real-world examples in which the method has already been applied for selecting targets for genome sequencing. These projects are the JGI CSP Genomic Encyclopedia of Bacteria and Archaea phase I, targeting 1,000 type strains, and, on a smaller scale, the phylogenomics of the Roseobacter clade. Potential artifacts of the method are discussed and compared to a selection approach based on the taxonomic classification.
phylogenetic diversity; genomics; taxon selection; 16S rRNA; tree of life; Genomic Encyclopedia; Roseobacter clade
This manuscript calls for an international effort to generate a comprehensive catalog from genome sequences of all the archaeal and bacterial type strains.
Microbes hold the key to life. They hold the secrets to our past (as the descendants of the earliest forms of life) and the prospects for our future (as we mine their genes for solutions to some of the planet's most pressing problems, from global warming to antibiotic resistance). However, the piecemeal approach that has defined efforts to study microbial genetic diversity for over 20 years and in over 30,000 genome projects risks squandering that promise. These efforts have covered less than 20% of the diversity of the cultured archaeal and bacterial species, which represent just 15% of the overall known prokaryotic diversity. Here we call for the funding of a systematic effort to produce a comprehensive genomic catalog of all cultured Bacteria and Archaea by sequencing, where available, the type strain of each species with a validly published name (currently ~11,000). This effort will provide an unprecedented level of coverage of our planet's genetic diversity, allow for the large-scale discovery of novel genes and functions, and lead to an improved understanding of microbial evolution and function in the environment.
A novel Gram-reaction-positive, aerobic actinobacterium, tolerant to mitomycin C, heavy metals, metalloids, hydrogen peroxide, desiccation, and ionizing- and UV-radiation, designated G18T, was isolated from dolomitic marble collected from outcrops in Samara (Namibia). The growth range was 15–35°C, at pH 5.5–9.5 and in the presence of 1% NaCl, forming greenish-black coloured colonies on GYM Streptomyces agar. Chemotaxonomic and molecular characteristics of the isolate matched those described for other representatives of the genus Geodermatophilus. The peptidoglycan contained meso-diaminopimelic acid as the diagnostic diamino acid. The main phospholipids were phosphatidylethanolamine, phosphatidylcholine, phosphatidylinositol, and a small amount of diphosphatidylglycerol. MK-9(H4) was the dominant menaquinone and galactose was detected as the diagnostic sugar. The major cellular fatty acids were the branched-chain saturated acids iso-C16:0 and iso-C15:0 and the unsaturated acids C17:1ω8c and C16:1ω7c. The 16S rRNA gene showed 97.4–99.1% sequence identity with the other representatives of the genus Geodermatophilus. Based on phenotypic results and 16S rRNA gene sequence analysis, strain G18T is proposed to represent a novel species, Geodermatophilus poikilotrophi. The type strain is G18T (= DSM 44209T = CCUG 63018T). The INSDC accession number is HF970583. The novel R software package lethal was used to compute the lethal doses with confidence intervals resulting from tolerance experiments.
Thermotoga thermarum Windberger et al. 1989 is a member of the genomically well characterized genus Thermotoga in the phylum ‘Thermotogae’. T. thermarum is of interest for its origin from a continental solfataric spring vs. predominantly marine oil reservoirs of other members of the genus. The genome of strain LA3T also provides fresh data for the phylogenomic positioning of the (hyper-)thermophilic bacteria. T. thermarum strain LA3T is the fourth sequenced genome of a type strain from the genus Thermotoga, and the sixth in the family Thermotogaceae to be formally described in a publication. Phylogenetic analyses do not reveal significant discrepancies between the current classification of the group, 16S rRNA gene data and whole-genome sequences. Nevertheless, T. thermarum significantly differs from other Thermotoga species regarding its iron-sulfur cluster synthesis, as it contains only a minimal set of the necessary proteins. Here we describe the features of this organism, together with the complete genome sequence and annotation. The 2,039,943 bp long chromosome with its 2,015 protein-coding and 51 RNA genes is a part of the Genomic Encyclopedia of Bacteria and Archaea project.
anaerobic; motile; thermophilic; chemoorganotrophic; solfataric spring; outer sheath-like structure; Thermotogaceae; GEBA
Methanoplanus limicola Wildgruber et al. 1984 is a mesophilic methanogen that was isolated from a swamp composed of drilling waste near Naples, Italy, shortly after the Archaea were recognized as a separate domain of life. Methanoplanus is the type genus in the family Methanoplanaceae, a taxon that fell into disuse after modern 16S rRNA gene sequence-based taxonomy was established. Methanoplanus is now placed within the Methanomicrobiaceae, a family that is so far poorly characterized at the genome level. The only other type strain of the genus with a sequenced genome, Methanoplanus petrolearius SEBR 4847T, turned out to be misclassified and required reclassification to Methanolacinia. Both Methanoplanus and Methanolacinia needed taxonomic emendations due to a significant deviation of the G+C content of their genomes from previously published (pre-genome-sequence era) values. Until now, genome sequences were published for only four of the 33 species with validly published names in the Methanomicrobiaceae. Here we describe the features of M. limicola, together with the improved-high-quality draft genome sequence and annotation of the type strain, M3T. The 3,200,946 bp long chromosome (permanent draft sequence) with its 3,064 protein-coding and 65 RNA genes is a part of the Genomic Encyclopedia of Bacteria and Archaea project.
anaerobic; motile; mesophilic; methanogen; swamp; improved-high-quality draft; Methanomicrobiaceae; GEBA
Frateuria aurantia (ex Kondô and Ameyama 1958) Swings et al. 1980 is a member of the bispecific genus Frateuria in the family Xanthomonadaceae, which is already heavily targeted for non-type strain genome sequencing. Strain Kondô 67T was initially (1958) identified as a member of ‘Acetobacter aurantius’, a name that was not considered for the approved list. Kondô 67T was therefore later designated as the type strain of the newly proposed acetogenic species Frateuria aurantia. The strain is of interest because of its triterpenoids (hopane family). F. aurantia Kondô 67T is the first member of the genus Frateuria whose genome sequence has been deciphered, and here we describe the features of this organism, together with the complete genome sequence and annotation. The 3,603,458-bp long chromosome with its 3,200 protein-coding and 88 RNA genes is a part of the Genomic Encyclopedia of Bacteria and Archaea project.
strictly aerobic; motile; rod-shaped; acetogenic; mesophilic; ‘Acetobacter aurantius’; Xanthomonadaceae; GEBA
Thermanaerovibrio velox Zavarzina et al. 2000 is a member of the Synergistaceae, a family in the phylum Synergistetes that is already well-characterized at the genome level. Members of this phylum were described as Gram-negative staining anaerobic bacteria with a rod/vibrioid cell shape and possessing an atypical outer cell envelope. They inhabit a large variety of anaerobic environments including soil, oil wells, wastewater treatment plants and animal gastrointestinal tracts. They are also found to be linked to sites of human diseases such as cysts, abscesses, and areas of periodontal disease. The moderately thermophilic and organotrophic T. velox shares most of its morphologic and physiologic features with the closely related species, T. acidaminovorans. In addition to Su883T, the type strain of T. acidaminovorans, strain Z-9701T is the second type strain in the genus Thermanaerovibrio to have its genome sequence published. Here we describe the features of this organism, together with the non-contiguous genome sequence and annotation. The 1,880,838 bp long chromosome (non-contiguous finished sequence) with its 1,751 protein-coding and 59 RNA genes is a part of the Genomic Encyclopedia of Bacteria and Archaea project.
obligate anaerobic; motile; curved rods; organotrophic; S0-reduction; cyanobacterial mat; Synergistaceae; Synergistetes; GEBA
Saccharomonospora cyanea Runmao et al. 1988 is a member of the genus Saccharomonospora in the family Pseudonocardiaceae that is moderately well characterized at the genome level thus far. Members of the genus Saccharomonospora are of interest because they originate from diverse habitats, such as soil, leaf litter, manure, compost, surface of peat, moist, over-heated grain, and ocean sediment, where they probably play a role in the primary degradation of plant material by attacking hemicellulose. Species of the genus Saccharomonospora are usually Gram-positive, non-acid fast, and are classified among the actinomycetes. S. cyanea is characterized by a dark blue (= cyan blue) aerial mycelium. After S. viridis, S. azurea, and S. marina, S. cyanea is only the fourth member in the genus for which a completely sequenced (non-contiguous finished draft status) type strain genome will be published. Here we describe the features of this organism, together with the draft genome sequence, and annotation. The 5,408,301 bp long chromosome with its 5,139 protein-coding and 57 RNA genes was sequenced as part of the DOE funded Community Sequencing Program (CSP) 2010 at the Joint Genome Institute (JGI).
draft genome; aerobic; chemoheterotrophic; Gram-positive; vegetative and aerial mycelia; spore-forming; non-motile; soil bacterium; Pseudonocardiaceae; CSP 2010
Rubellimicrobium thermophilum Denner et al. 2006 is the type species of the genus Rubellimicrobium, a representative of the Roseobacter clade within the Rhodobacteraceae. Members of this clade were shown to be abundant especially in coastal and polar waters, but were also found in microbial mats and sediments. They are metabolically versatile and form a physiologically heterogeneous group within the Alphaproteobacteria. Strain C-Ivk-R2A-2T was isolated from colored deposits in a pulp dryer; however, its natural habitat is so far unknown. Here we describe the features of this organism, together with the draft genome sequence and annotation and novel aspects of its phenotype. The 3,161,245 bp long genome contains 3,243 protein-coding and 45 RNA genes.
rod-shaped; reddish-pigmented; thermophile; chemoheterotrophic; prophage-like structures; Rhodobacteraceae; Roseobacter clade; Alphaproteobacteria
Leptonema illini Hovind-Hougen 1979 is the type species of the genus Leptonema, family Leptospiraceae, phylum Spirochaetes. Organisms of this family have a Gram-negative-like cell envelope consisting of a cytoplasmic membrane and an outer membrane. The peptidoglycan layer is associated with the cytoplasmic rather than the outer membrane. The two flagella of members of Leptospiraceae extend from the cytoplasmic membrane at the ends of the bacteria into the periplasmic space and are necessary for their motility. Here we describe the features of the L. illini type strain, together with the complete genome sequence, and annotation. This is the first genome sequence (finished at the level of Improved High Quality Draft) to be reported from a member of the genus Leptonema and a representative of the third genus of the family Leptospiraceae for which complete or draft genome sequences are now available. The three scaffolds of the 4,522,760 bp draft genome sequence reported here, and its 4,230 protein-coding and 47 RNA genes are part of the Genomic Encyclopedia of Bacteria and Archaea project.
Gram-negative; flexible; motile; cytoplasmatic tubules; non-sporulating; axial flagella; aerobic; chemoorganotrophic; Leptospiraceae; GEBA
Turneriella parva Levett et al. 2005 is the only species of the genus Turneriella which was established as a result of the reclassification of Leptospira parva Hovind-Hougen et al. 1982. Together with Leptonema and Leptospira, Turneriella constitutes the family Leptospiraceae, within the order Spirochaetales. Here we describe the features of this free-living aerobic spirochete together with the complete genome sequence and annotation. This is the first complete genome sequence of a member of the genus Turneriella and the 13th member of the family Leptospiraceae for which a complete or draft genome sequence is now available. The 4,409,302 bp long genome with its 4,169 protein-coding and 45 RNA genes is part of the Genomic Encyclopedia of Bacteria and Archaea project.
Gram-negative; motile; axial filaments; helical; flexible; non-sporulating; aerobic; mesophile; Leptospiraceae; GEBA
Spirochaeta africana Zhilina et al. 1996 is an anaerobic, aerotolerant, spiral-shaped bacterium that is motile via periplasmic flagella. The type strain of the species, Z-7692T, was isolated in 1993 or earlier from a bacterial bloom in the brine under the trona layer in a shallow lagoon of the alkaline equatorial Lake Magadi in Kenya. Here we describe the features of this organism, together with the complete genome sequence, and annotation. Considering the pending reclassification of S. caldaria to the genus Treponema, S. africana is only the second 'true' member of the genus Spirochaeta with a genome-sequenced type strain to be published. The 3,285,855 bp long genome of strain Z-7692T with its 2,817 protein-coding and 57 RNA genes is a part of the Genomic Encyclopedia of Bacteria and Archaea project.
anaerobic; aerotolerant; mesophilic; halophilic; spiral-shaped; motile; periplasmic flagella; Gram-negative; chemoorganotrophic; Spirochaetaceae; GEBA
Coriobacterium glomerans Haas and König 1988 is the only species of the genus Coriobacterium, family Coriobacteriaceae, order Coriobacteriales, phylum Actinobacteria. The bacterium thrives as an endosymbiont of pyrrhocorid bugs, i.e. the red fire bug Pyrrhocoris apterus L. The rationale for sequencing the genome of strain PW2T is its endosymbiotic life style which is rare among members of Actinobacteria. Here we describe the features of this symbiont, together with the complete genome sequence and its annotation. This is the first complete genome sequence of a member of the genus Coriobacterium and the sixth member of the order Coriobacteriales for which complete genome sequences are now available. The 2,115,681 bp long single replicon genome with its 1,804 protein-coding and 54 RNA genes is part of the Genomic Encyclopedia of Bacteria and Archaea project.
Gram-positive; non-motile; non-sporulating; obligatory anaerobic; chemoorganotroph; mesophile; endosymbiont; insect intestinal tract; Coriobacteriaceae; Actinobacteria; GEBA
At present, Joostella marina Quan et al. 2008 is the sole species with a validly published name in the genus Joostella, family Flavobacteriaceae, phylum Bacteroidetes. It is a yellow-pigmented, aerobic, marine organism about which little has been reported other than the chemotaxonomic features required for initial taxonomic description. The genome of J. marina strain En5T complements a list of 16 Flavobacteriaceae strains for which complete genomes and draft genomes are currently available. Here we describe the features of this bacterium, together with the complete genome sequence, and annotation. This is the first member of the genus Joostella for which a complete genome sequence becomes available. The 4,508,243 bp long single replicon genome with its 3,944 protein-coding and 60 RNA genes is part of the Genomic Encyclopedia of Bacteria and Archaea project.
Gram-negative; non-motile; aerobic; mesophile; Flavobacteriaceae; Bacteroidetes; GEBA
Anaerobaculum mobile Menes and Muxí 2002 is one of three described species of the genus Anaerobaculum, family Synergistaceae, phylum Synergistetes. This anaerobic and motile bacterium ferments a range of carbohydrates and mono- and dicarboxylic acids with acetate, hydrogen and CO2 as end products. A. mobile NGAT is the first member of the genus Anaerobaculum and the sixth member of the phylum Synergistetes with a completely sequenced genome. Here we describe the features of this bacterium, together with the complete genome sequence, and annotation. The 2,160,700 bp long single replicon genome with its 2,053 protein-coding and 56 RNA genes is part of the Genomic Encyclopedia of Bacteria and Archaea project.
Gram-negative; rod-shaped; motile; flagellum; non-spore forming; anaerobic; chemoorganotrophic; crotonate-reducer; Synergistetes; Synergistaceae; GEBA
Alistipes finegoldii Rautio et al. 2003 is one of five species of Alistipes with a validly published name: family Rikenellaceae, order Bacteroidales, class Bacteroidia, phylum Bacteroidetes. This rod-shaped and strictly anaerobic organism has been isolated mostly from human tissues. Here we describe the features of the type strain of this species, together with the complete genome sequence, and annotation. A. finegoldii is the first member of the genus Alistipes for which the complete genome sequence of its type strain is now available. The 3,734,239 bp long single replicon genome with its 3,302 protein-coding and 68 RNA genes is part of the Genomic Encyclopedia of Bacteria and Archaea project.
Gram-negative; rod-shaped; non-sporulating; non-motile; mesophile; strictly anaerobic; chemoorganotrophic; Rikenellaceae; GEBA
Spirochaeta caldaria Pohlschroeder et al. 1995 is an obligately anaerobic, spiral-shaped bacterium that is motile via periplasmic flagella. The type strain, H1T, was isolated in 1990 from cyanobacterial mat samples collected at a freshwater hot spring in Oregon, USA, and is of interest because it enhances the degradation of cellulose when grown in co-culture with Clostridium thermocellum. Here we provide a taxonomic re-evaluation for S. caldaria based on phylogenetic analyses of 16S rRNA sequences and whole genomes, and propose the reclassification of S. caldaria and two other Spirochaeta species as members of the emended genus Treponema. Whereas genera such as Borrelia and Sphaerochaeta possess well-distinguished genomic features related to their divergent lifestyles, the physiological and functional genomic characteristics of Spirochaeta and Treponema appear to be intermixed and are of little taxonomic value. The 3,239,340 bp long genome of strain H1T with its 2,869 protein-coding and 59 RNA genes is a part of the Genomic Encyclopedia of Bacteria and Archaea project.
obligately anaerobic; thermophilic; spiral-shaped; motile; periplasmic flagella; Gram-negative; chemoorganotrophic; Spirochaetaceae; Spirochaeta; Treponema; GEBA
Labrenzia alexandrii Biebl et al. 2007 is a marine member of the family Rhodobacteraceae in the order Rhodobacterales, which has thus far only partially been characterized at the genome level. The bacterium is of interest because it lives in close association with the toxic dinoflagellate Alexandrium lusitanicum. Ultrastructural analysis reveals R-bodies within the bacterial cells, which are primarily known from obligate endosymbionts that trigger “killing traits” in ciliates (Paramecium spp.). Genomic traits of L. alexandrii DFL-11T are in accordance with these findings, as they include the reb genes putatively involved in R-body synthesis. Analysis of the two extrachromosomal elements suggests a role in heavy-metal resistance and exopolysaccharide formation, respectively. The 5,461,856 bp long genome with its 5,071 protein-coding and 73 RNA genes consists of one chromosome and two plasmids, and has been sequenced in the context of the Marine Microbial Initiative.
aerobe; motile; symbiosis; dinoflagellates; photoheterotroph; high-quality draft; Alexandrium lusitanicum; Alphaproteobacteria
For the last 25 years, species delimitation in prokaryotes (Archaea and Bacteria) was to a large extent based on DNA-DNA hybridization (DDH), a tedious lab procedure designed in the early 1970s that served its purpose astonishingly well in the absence of deciphered genome sequences. With the rapid progress in genome sequencing, the time has come to directly use the now available and easy-to-generate genome sequences for the delimitation of species. GBDP (Genome Blast Distance Phylogeny) infers genome-to-genome distances between pairs of entirely or partially sequenced genomes, a digital, highly reliable estimator for the relatedness of genomes. Its application as an in-silico replacement for DDH was recently introduced. The main challenge in the implementation of such an application is to produce digital DDH values that mimic the wet-lab DDH values as closely as possible to ensure consistency in the prokaryotic species concept.
Correlation and regression analyses were used to determine the best-performing methods and the most influential parameters. GBDP was further enriched with a set of new features such as confidence intervals for intergenomic distances obtained via resampling or via the statistical models for DDH prediction and an additional family of distance functions. As in previous analyses, GBDP obtained the highest agreement with wet-lab DDH among all tested methods, but improved models led to a further increase in the accuracy of DDH prediction. Confidence intervals yielded stable results when inferred from the statistical models, whereas those obtained via resampling showed marked differences between the underlying distance functions.
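The idea of attaching a resampling-based confidence interval to an intergenomic distance can be illustrated with a generic percentile bootstrap. The following is a minimal sketch, not the GBDP or GGDC implementation: it assumes a simple ratio-type distance (one minus matched length over total length, summed across alignment segments), and the function name `bootstrap_distance_ci`, its parameters, and the example numbers are all hypothetical.

```python
import random


def bootstrap_distance_ci(matched, total, n_boot=1000, alpha=0.05, seed=0):
    """Percentile bootstrap interval for d = 1 - sum(matched) / sum(total).

    `matched` and `total` are per-segment lengths (hypothetical inputs);
    segments are resampled with replacement. This is a generic bootstrap,
    assumed here for illustration only.
    """
    if len(matched) != len(total) or not matched:
        raise ValueError("matched and total must be non-empty and equally long")
    if any(t <= 0 for t in total):
        raise ValueError("segment lengths in total must be positive")

    rng = random.Random(seed)  # fixed seed keeps the sketch reproducible
    n = len(matched)
    estimates = []
    for _ in range(n_boot):
        # Draw n segment indices with replacement and recompute the distance.
        idx = [rng.randrange(n) for _ in range(n)]
        num = sum(matched[i] for i in idx)
        den = sum(total[i] for i in idx)
        estimates.append(1.0 - num / den)

    estimates.sort()
    lower = estimates[int((alpha / 2) * n_boot)]
    upper = estimates[min(n_boot - 1, int((1 - alpha / 2) * n_boot))]
    point = 1.0 - sum(matched) / sum(total)
    return point, lower, upper


if __name__ == "__main__":
    # Made-up segment data, purely for demonstration.
    point, lower, upper = bootstrap_distance_ci([90, 80, 95, 70, 85], [100] * 5)
    print(f"distance = {point:.3f}, 95% interval = [{lower:.3f}, {upper:.3f}]")
```

The width of such an interval depends on how the distance aggregates the resampled segments, which is in line with the observation above that resampling-based intervals differed markedly between the underlying distance functions.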
Despite the high accuracy of GBDP-based DDH prediction, inferences from limited empirical data are always associated with a certain degree of uncertainty. It is thus crucial to enrich in-silico DDH replacements with confidence-interval estimation, enabling the user to statistically evaluate the outcomes. Such methodological advancements, easily accessible through the web service at http://ggdc.dsmz.de, are crucial steps towards a consistent and truly genome sequence-based classification of microorganisms.
Archaea; Bacteria; BLAST; DDH; GGD; GGDC; GBDP; Genomics; MUMmer; Phylogeny; Species concept; Taxonomy
Desulfotomaculum ruminis Campbell and Postgate 1965 is a member of the large genus Desulfotomaculum which contains 30 species and is contained in the family Peptococcaceae. This species is of interest because it represents one of the few sulfate-reducing bacteria that have been isolated from the rumen. Here we describe the features of D. ruminis together with the complete genome sequence and annotation. The 3,969,014 bp long chromosome with a total of 3,901 protein-coding and 85 RNA genes is the second completed genome sequence of a type strain of the genus Desulfotomaculum to be published, and was sequenced as part of the DOE Joint Genome Institute Community Sequencing Program 2009.
anaerobic; motile; sporulating; mesophilic; sulfate-reducer; hydrogen sulfide; incomplete oxidizer; mixotrophic; CSP 2009; Peptococcaceae; Clostridiales
Niabella soli Weon et al. 2008 is a member of the Chitinophagaceae, a family within the class Sphingobacteriia that is poorly characterized at the genome level, thus far. N. soli strain JS13-8T is of interest for its ability to produce a variety of glycosyl hydrolases. The genome of N. soli strain JS13-8T is only the second genome sequence of a type strain from the family Chitinophagaceae to be published, and the first one from the genus Niabella. Here we describe the features of this organism, together with the complete genome sequence and annotation. The 4,697,343 bp long chromosome with its 3,931 protein-coding and 49 RNA genes is a part of the Genomic Encyclopedia of Bacteria and Archaea project.
aerobic; non-motile; Gram-negative; mesophilic; chemoorganotrophic; glycosyl hydrolases; soil; Chitinophagaceae; GEBA
Gillisia limnaea Van Trappen et al. 2004 is the type species of the genus Gillisia, which is a member of the well characterized family Flavobacteriaceae. The genome of G. limnaea R-8282T is the first sequenced genome (permanent draft) from a type strain of the genus Gillisia. Here we describe the features of this organism, together with the permanent-draft genome sequence and annotation. The 3,966,857 bp long chromosome (two scaffolds) with its 3,569 protein-coding and 51 RNA genes is a part of the Genomic Encyclopedia of Bacteria and Archaea project.
aerobic; motile; rod-shaped; moderately halotolerant; psychrophilic; chemoheterotrophic; proteorhodopsin; microbial mat; yellow-pigmented; Flavobacteriaceae; GEBA
Owenweeksia hongkongensis Lau et al. 2005 is the sole member of the monospecific genus Owenweeksia in the family Cryomorphaceae, a poorly characterized family at the genome level thus far. This family comprises seven genera within the class Flavobacteria. Family members are known to be psychrotolerant, rod-shaped and orange pigmented (β-carotene), typical for Flavobacteria. For growth, seawater and complex organic nutrients are necessary. The genome of O. hongkongensis UST20020801T is only the second genome of a member of the family Cryomorphaceae whose sequence has been deciphered. Here we describe the features of this organism, together with the complete genome sequence and annotation. The 4,000,057 bp long chromosome with its 3,518 protein-coding and 45 RNA genes is a part of the Genomic Encyclopedia of Bacteria and Archaea project.
aerobic; motile; rod-shaped; mesophilic; non-fermentative; Gram-negative; orange-pigmented; sea water; Bacteroidetes; Flavobacteria; Cryomorphaceae; GEBA
Starkeya novella (Starkey 1934) Kelly et al. 2000 is a member of the family Xanthobacteraceae in the order ‘Rhizobiales’, which is thus far poorly characterized at the genome level. Cultures from this species are most interesting due to their facultatively chemolithoautotrophic lifestyle, which allows them to both consume carbon dioxide and to produce it. This feature makes S. novella an interesting model organism for studying the genomic basis of regulatory networks required for the switch between consumption and production of carbon dioxide, a key component of the global carbon cycle. In addition, S. novella is of interest for its ability to grow on various inorganic sulfur compounds and several C1-compounds such as methanol. Besides Azorhizobium caulinodans, S. novella is only the second species in the family Xanthobacteraceae with a completely sequenced genome of a type strain. The current taxonomic classification of this group is in significant conflict with the 16S rRNA data. The genomic data indicate that the physiological capabilities of the organism might have been underestimated. The 4,765,023 bp long chromosome with its 4,511 protein-coding and 52 RNA genes was sequenced as part of the DOE Joint Genome Institute Community Sequencing Program (CSP) 2008.
strictly aerobic; facultatively chemoautotrophic; methylotrophic and heterotrophic; Gram-negative; rod-shaped; non-motile; soil bacterium; Xanthobacteraceae; CSP 2008
Sulfobacillus acidophilus Norris et al. 1996 is a member of the genus Sulfobacillus which comprises five species of the order Clostridiales. Sulfobacillus species are of interest for comparison to other sulfur and iron oxidizers and also have biomining applications. This is the first completed genome sequence of a type strain of the genus Sulfobacillus, and the second published genome of a member of the species S. acidophilus. The genome, which consists of one chromosome and one plasmid with a total size of 3,557,831 bp, harbors 3,626 protein-coding and 69 RNA genes, and is a part of the Genomic Encyclopedia of Bacteria and Archaea project.
aerobic; motile; Gram-positive; acidophilic; moderately thermophilic; sulfide- and iron-oxidizing; biomining; autotrophic; mixotrophic; soil; incertae sedis; Clostridiales; GEBA