For decades there has been increasing interest in understanding the relationships between microbial communities and ecosystem functions. Current DNA sequencing technologies allow for the exploration of microbial communities in two principal ways: targeted rRNA gene surveys and shotgun metagenomics. For large study designs, it is often still prohibitively expensive to sequence metagenomes at both the breadth and depth necessary to statistically capture the true functional diversity of a community. Although rRNA gene surveys provide no direct evidence of function, they do provide a reasonable estimation of microbial diversity, while being a very cost-effective way to screen samples of interest for later shotgun metagenomic analyses. However, there is a great deal of 16S rRNA gene survey data currently available from diverse environments, and thus a need for tools to infer functional composition of environmental samples based on 16S rRNA gene survey data.
We present a computational method called pangenome-based functional profiles (PanFP), which infers functional profiles of microbial communities from 16S rRNA gene survey data for Bacteria and Archaea. PanFP is based on pangenome reconstruction of a 16S rRNA gene operational taxonomic unit (OTU) from known genes and genomes pooled from the OTU’s taxonomic lineage. From this lineage, we derive an OTU functional profile by weighting a pangenome’s functional profile with the OTU’s abundance observed in a given sample. We validated our method by comparing PanFP to the functional profiles obtained from the direct shotgun metagenomic measurement of 65 diverse communities via Spearman correlation coefficients. These correlations improved with increasing sequencing depth, within the range of 0.8–0.9 for the most deeply sequenced Human Microbiome Project mock community samples. PanFP is very similar in performance to another recently released tool, PICRUSt, for almost all of the survey data analysed here. However, our method is unique in that any OTU building method can be used, as opposed to being limited to closed-reference OTU picking strategies against specific reference sequence databases.
We developed an automated computational method that derives an inferred functional profile based on the 16S rRNA gene surveys of microbial communities. The inferred functional profile provides a cost-effective way to study complex ecosystems through predicted comparative functional metagenomes and metadata analysis. All PanFP source code and additional documentation are freely available online at GitHub (https://github.com/srjun/PanFP).
Electronic supplementary material
The online version of this article (doi:10.1186/s13104-015-1462-8) contains supplementary material, which is available to authorized users.
Microbial communities; Metagenome; 16S rRNA survey; Pangenome
Polycyclic aromatic hydrocarbons (PAH) are ubiquitous environmental pollutants and microbial biodegradation is an important means of remediation of PAH-contaminated soil. Delftia acidovorans Cs1-4 (formerly Delftia sp. Cs1-4) was isolated by using phenanthrene as the sole carbon source from PAH-contaminated soil in Wisconsin. Its full genome sequence was determined to gain insights into the mechanisms underlying biodegradation of PAH. Three genomic libraries were constructed and sequenced: an Illumina GAii shotgun library (916,416,493 reads), a 454 Titanium standard library (770,171 reads) and one paired-end 454 library (average insert size of 8 kb, 508,092 reads). The initial assembly contained 40 contigs in two scaffolds. The 454 Titanium standard data and the 454 paired-end data were assembled together and the consensus sequences were computationally shredded into 2 kb overlapping shreds. Illumina sequencing data were assembled, and the consensus sequence was computationally shredded into 1.5 kb overlapping shreds. Gaps between contigs were closed by editing in Consed, by PCR and by Bubble PCR primer walks. A total of 182 additional reactions were needed to close gaps and to raise the quality of the finished sequence. The final assembly is based on 253.3 Mb of 454 draft data (averaging 38.4 X coverage) and 590.2 Mb of Illumina draft data (averaging 89.4 X coverage). The genome of strain Cs1-4 consists of a single circular chromosome of 6,685,842 bp (66.7 %G+C) containing 6,028 predicted genes; 5,931 of these genes were protein-encoding and 4,425 gene products were assigned to a putative function. Genes encoding phenanthrene degradation were localized to a 232 kb genomic island (termed the phn island), which contained near its 3’ end a bacteriophage P4-like integrase, an enzyme often associated with chromosomal integration of mobile genetic elements.
Other biodegradation pathways reconstructed from the genome sequence included: benzoate (by the acetyl-CoA pathway), styrene, nicotinic acid (by the maleamate pathway) and the pesticides Dicamba and Fenitrothion. Determination of the complete genome sequence of D. acidovorans Cs1-4 has provided new insights into the microbial mechanisms of PAH biodegradation that may shape the process in the environment.
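Average fold coverage of the kind quoted above is conventionally the total number of sequenced bases divided by the genome length. The sketch below applies that ratio to the draft data volumes and the finished chromosome length given in the abstract. The function name is hypothetical, and the abstract does not state which length estimate was used for its own figures, so the results are close to, but not identical with, the reported averages.

```python
def mean_coverage(sequenced_bases, genome_length):
    """Average fold coverage = total sequenced bases / genome length."""
    if genome_length <= 0:
        raise ValueError("genome_length must be positive")
    return sequenced_bases / genome_length


# Values taken from the abstract; using the finished chromosome length
# as the denominator is an assumption made here.
GENOME_LENGTH = 6_685_842          # bp
cov_454 = mean_coverage(253.3e6, GENOME_LENGTH)
cov_illumina = mean_coverage(590.2e6, GENOME_LENGTH)

print(f"454: {cov_454:.1f}x, Illumina: {cov_illumina:.1f}x")
```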
Delftia acidovorans Cs1-4; Genome; phn island; Phenanthrene; polycyclic aromatic hydrocarbons; Nanopods
Desulfovibrio carbinoliphilus subsp. oakridgensis FW-101-2B is an anaerobic, organic acid/alcohol-oxidizing, sulfate-reducing δ-proteobacterium. FW-101-2B was isolated from contaminated groundwater at the Field Research Center at Oak Ridge National Laboratory after in situ stimulation for heavy metal-reducing conditions. The genome will help elucidate the metabolic potential of sulfate-reducing bacteria during uranium reduction.
The benefits of using transgenic switchgrass with decreased levels of caffeic acid 3-O-methyltransferase (COMT) as biomass feedstock have been clearly demonstrated. However, its effect on the soil microbial community has not been assessed. Here we report metagenomic and metatranscriptomic analyses of root-associated soil from COMT switchgrass compared with nontransgenic counterparts.
Bacteria belonging to the phylum Gemmatimonadetes are found in a wide variety of environments and are particularly abundant in soils. Here, we present the complete genome sequence and methylation pattern of the newly described Gemmatirosa kalamazoonensis type strain.
More than 80% of the microbial genomes in GenBank are of ‘draft’ quality (12,553 draft vs. 2,679 finished, as of October, 2013). We have examined all the microbial DNA sequences available for complete, draft, and Sequence Read Archive genomes in GenBank as well as three other major public databases, and assigned quality scores for more than 30,000 prokaryotic genome sequences.
Scores were assigned using four categories: the completeness of the assembly, the presence of full-length rRNA genes, tRNA composition and the presence of a set of 102 conserved genes in prokaryotes. Most (~88%) of the genomes had quality scores of 0.8 or better and can be safely used for standard comparative genomics analysis. We compared genomes across factors that may influence the score. We found that although sequencing depth coverage of over 100x did not ensure a better score, sequencing read length was a better indicator of sequencing quality. With few exceptions, most of the 30,000 genomes have nearly all the 102 essential genes.
The score can be used to set thresholds for screening data when analyzing “all published genomes” and reference data is either not available or not applicable. The scores highlighted organisms for which commonly used tools do not perform well. This information can be used to improve tools and to serve a broad group of users as more diverse organisms are sequenced. Unexpectedly, the comparison of predicted tRNAs across 15,000 high quality genomes showed that anticodons beginning with an ‘A’ (codons ending with a ‘U’) are almost non-existent, with the exception of one arginine codon (CGU); this has been noted previously in the literature for a few genomes, but not with the depth found here.
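The correspondence between an anticodon beginning with 'A' and a codon ending with 'U' follows from antiparallel base pairing, which a short Python sketch makes concrete. It assumes strict Watson-Crick pairing and ignores wobble pairing and modified bases. The function names and the example anticodon list are illustrative only.

```python
# Strict Watson-Crick RNA complements (wobble and modified bases ignored).
COMPLEMENT = {"A": "U", "U": "A", "G": "C", "C": "G"}


def anticodon_to_codon(anticodon):
    """Return the codon (5'->3') paired by an anticodon written 5'->3'."""
    return "".join(COMPLEMENT[base] for base in reversed(anticodon.upper()))


def anticodons_starting_with_a(anticodons):
    """Select anticodons whose first (5') base is A."""
    return [a for a in anticodons if a.upper().startswith("A")]


# Hypothetical tRNA anticodon list for one genome.
example = ["ACG", "CAU", "GCC", "UUU"]
a_first = anticodons_starting_with_a(example)
codons = [anticodon_to_codon(a) for a in a_first]
```

Here the anticodon ACG maps to the arginine codon CGU, the exception named in the text.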
DNA; Sequencing; Database; Quality; Evaluation; Status
The thermophilic anaerobe Clostridium thermocellum is a candidate consolidated bioprocessing (CBP) biocatalyst for cellulosic ethanol production. The aim of this study was to investigate C. thermocellum genes required to ferment biomass substrates and to conduct a robust comparison of DNA microarray and RNA sequencing (RNA-seq) analytical platforms.
C. thermocellum ATCC 27405 fermentations were conducted with a 5 g/L solid substrate loading of either pretreated switchgrass or Populus. Quantitative saccharification and inductively coupled plasma emission spectroscopy (ICP-ES) for elemental analysis revealed composition differences between biomass substrates, which may have influenced growth and transcriptomic profiles. High quality RNA was prepared for C. thermocellum grown on solid substrates and transcriptome profiles were obtained for two time points during active growth (12 hours and 37 hours postinoculation). A comparison of two transcriptomic analytical techniques, microarray and RNA-seq, was performed and the data analyzed for statistical significance. Large expression differences for cellulosomal genes were not observed. We updated gene predictions for the strain, and a small novel gene, Cthe_3383, with a putative AgrD peptide quorum sensing function, was among the most highly expressed genes. RNA-seq data also supported some small regulatory RNA predictions over others. The DNA microarray gave a greater number (2,351) of significant genes relative to RNA-seq (280 genes when normalized by the kernel density mean of M component (KDMM) method) in an analysis of variance (ANOVA) testing method with a 5% false discovery rate (FDR). When a 2-fold difference in expression threshold was applied, 73 genes were significantly differentially expressed in common between the two techniques. Sulfate and phosphate uptake/utilization genes, along with genes for a putative efflux pump system, were some of the most differentially regulated transcripts when profiles for C. thermocellum grown on either pretreated switchgrass or Populus were compared.
Our results suggest that a high degree of agreement in differential gene expression measurements between transcriptomic platforms is possible, but choosing an appropriate normalization regime is essential.
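The filtering described above, a 5% FDR combined with a 2-fold expression threshold, can be sketched as follows. The abstract does not name the FDR procedure, so the Benjamini-Hochberg adjustment used here is an assumption, and the gene names, fold changes and p-values are invented toy data.

```python
import math


def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


def significant_genes(results, fdr=0.05, min_fold=2.0):
    """results: list of (gene, log2_fold_change, p_value) tuples."""
    q_values = benjamini_hochberg([p for _, _, p in results])
    cutoff = math.log2(min_fold)
    return [
        gene
        for (gene, log2_fc, _), q in zip(results, q_values)
        if q <= fdr and abs(log2_fc) >= cutoff
    ]


# Toy results (hypothetical genes and statistics).
results = [
    ("gene_a", 1.5, 0.001),
    ("gene_b", 0.4, 0.002),
    ("gene_c", -2.0, 0.03),
    ("gene_d", 1.2, 0.5),
]
hits = significant_genes(results)
```

Running the same filter on each platform's results and intersecting the two gene lists would give the kind of shared set the abstract reports.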
Genome; Reannotation; Biomass; Elemental composition; RNA-seq; Microarray; Phosphate; Normalization; Transcriptomics
Granulicella mallensis MP5ACTX8T is a novel species of the genus Granulicella in subdivision 1 of Acidobacteria. G. mallensis is of ecological interest being a member of the dominant soil bacterial community active at low temperatures and nutrient-limiting conditions in Arctic alpine tundra. G. mallensis is a cold-adapted acidophile and a versatile heterotroph that hydrolyzes a suite of sugars and complex polysaccharides. Genome analysis revealed metabolic versatility with genes involved in metabolism and transport of carbohydrates. These include gene modules encoding the carbohydrate-active enzyme (CAZyme) family involved in breakdown, utilization and biosynthesis of diverse structural and storage polysaccharides including plant-based carbon polymers. The genome of Granulicella mallensis MP5ACTX8T consists of a single replicon of 6,237,577 base pairs (bp) with 4,907 protein-coding genes and 53 RNA genes.
cold adapted; acidophile; tundra soil; Acidobacteria
The genomes of the Betaproteobacteria Alicycliphilus denitrificans strains BC and K601T have been sequenced to get insight into the physiology of the two strains. Strain BC degrades benzene with chlorate as electron acceptor. The cyclohexanol-degrading denitrifying strain K601T is not able to use chlorate as electron acceptor, while strain BC cannot degrade cyclohexanol. The 16S rRNA sequences of strains BC and K601T are identical and the fatty acid methyl ester patterns of the strains are similar. Basic Local Alignment Search Tool (BLAST) analysis of predicted open reading frames of both strains showed most hits with Acidovorax sp. JS42, a bacterium that degrades nitro-aromatics. The genomes include strain-specific plasmids (pAlide201 in strain K601T and pAlide01 and pAlide02 in strain BC). Key genes of chlorate reduction in strain BC were located on a 120 kb megaplasmid (pAlide01), which was absent in strain K601T. Genes involved in cyclohexanol degradation were only found in strain K601T. Benzene and toluene are degraded via oxygenase-mediated pathways in both strains. Genes involved in the meta-cleavage pathway of catechol are present in the genomes of both strains. Strain BC also contains all genes of the ortho-cleavage pathway. The large number of mono- and dioxygenase genes in the genomes suggests that the two strains have a broader substrate range than known thus far.
Extremely thermophilic bacteria of the genus Caldicellulosiruptor utilize carbohydrate components of plant cell walls, including cellulose and hemicellulose, facilitated by a diverse set of glycoside hydrolases (GHs). From a biofuel perspective, this capability is crucial for deconstruction of plant biomass into fermentable sugars. While all species from the genus grow on xylan and acid-pretreated switchgrass, growth on crystalline cellulose is variable. The basis for this variability was examined using microbiological, genomic, and proteomic analyses of eight globally diverse Caldicellulosiruptor species. The open Caldicellulosiruptor pangenome (4,009 open reading frames [ORFs]) encodes 106 GHs, representing 43 GH families, but only 26 GHs from 17 families are included in the core (noncellulosic) genome (1,543 ORFs). Differentiating the strongly cellulolytic Caldicellulosiruptor species from the others is a specific genomic locus that encodes multidomain cellulases from GH families 9 and 48, which are associated with cellulose-binding modules. This locus also encodes a novel adhesin associated with type IV pili, which was identified in the exoproteome bound to crystalline cellulose. Taking into account the core genomes, pangenomes, and individual genomes, the ancestral Caldicellulosiruptor was likely cellulolytic and evolved, in some cases, into species that lost the ability to degrade crystalline cellulose while maintaining the capacity to hydrolyze amorphous cellulose and hemicellulose.
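The pangenome and core genome referred to above are, at their simplest, the union and the intersection of the gene-family sets of the individual genomes. The sketch below shows only that set logic. It assumes gene families have already been clustered into shared identifiers, and the species labels and family contents are invented, not taken from the study.

```python
def pan_and_core(genomes):
    """genomes: dict mapping genome name -> set of gene-family identifiers.

    Returns (pangenome, core): the union and the intersection of all sets.
    """
    if not genomes:
        return set(), set()
    families = [set(f) for f in genomes.values()]
    pangenome = set().union(*families)
    core = set.intersection(*families)
    return pangenome, core


# Toy input: GH family labels stand in for clustered gene families.
genomes = {
    "species_A": {"GH5", "GH9", "GH10", "GH48"},
    "species_B": {"GH5", "GH10"},
    "species_C": {"GH5", "GH10", "GH43"},
}
pangenome, core = pan_and_core(genomes)
accessory = pangenome - core  # families missing from at least one genome
```

In this toy input the GH9 and GH48 families fall in the accessory set, mirroring how a locus present only in strongly cellulolytic species would be excluded from the core.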
Toxic cyanobacterial blooms have persisted in freshwater systems around the world for centuries and appear to be globally increasing in frequency and severity. Toxins produced by bloom-associated cyanobacteria can have drastic impacts on the ecosystem and surrounding communities, and bloom biomass can disrupt aquatic food webs and act as a driver for hypoxia. Little is currently known regarding the genomic content of the Microcystis strains that form blooms or the companion heterotrophic community associated with bloom events. To address these issues, we examined the bloom-associated microbial communities in single samples from Lake Erie (North America), Lake Tai (Taihu, China), and Grand Lake St. Marys (OH, USA) using comparative metagenomics. Together the Cyanobacteria and Proteobacteria comprised >90% of each bloom bacterial community sample, although the dominant phylum varied between systems. Relative to the existing Microcystis aeruginosa NIES 843 genome, sequences from Lake Erie and Taihu revealed a number of metagenomic islands that were absent in the environmental samples. Moreover, despite variation in the phylogenetic assignments of bloom-associated organisms, the functional potential of bloom members remained relatively constant between systems. This pattern was particularly noticeable in the genomic contribution of nitrogen assimilation genes. In Taihu, the genetic elements associated with the assimilation and metabolism of nitrogen were predominantly associated with Proteobacteria, while these functions in the North American lakes were primarily contributed by the Cyanobacteria. Our observations build on an emerging body of metagenomic surveys describing the functional potential of microbial communities as more highly conserved than their phylogenetic makeup within natural systems.
Paenibacillus sp. Y412MC10 was one of a number of organisms isolated from Obsidian Hot Spring, Yellowstone National Park, Montana, USA under permit from the National Park Service. The isolate was initially classified as Geobacillus sp. Y412MC10 based on its isolation conditions and similarity to other organisms isolated from hot springs at Yellowstone National Park. Comparison of 16S rRNA sequences within the Bacillales indicated that Geobacillus sp. Y412MC10 clustered with Paenibacillus species, and the organism was most closely related to Paenibacillus lautus. Lucigen Corp. prepared genomic DNA and the genome was sequenced, assembled, and annotated by the DOE Joint Genome Institute. The genome sequence was deposited at the NCBI in October 2009 (NC_013406). The genome of Paenibacillus sp. Y412MC10 consists of one circular chromosome of 7,121,665 bp with an average G+C content of 51.2%. Comparison to other Paenibacillus species shows the organism lacks nitrogen fixation, antibiotic production and social interaction genes reported in other paenibacilli. The Y412MC10 genome shows a high level of synteny and homology to the draft sequence of Paenibacillus sp. HGF5, an organism from the Human Microbiome Project (HMP) Reference Genomes. This, combined with genomic CAZyme analysis, suggests an intestinal, rather than environmental, origin for Y412MC10.
Geobacillus sp. Y412MC10; Paenibacillus sp. Y412MC10; Obsidian Hot Spring
Ruminococcus albus 7 is a highly cellulolytic ruminal bacterium that is a member of the phylum Firmicutes. Here, we describe the complete genome of this microbe. This genome will be useful for rumen microbiology and cellulosome biology and in biofuel production, as one of its major fermentation products is ethanol.
Alicycliphilus denitrificans strain BC and A. denitrificans strain K601T degrade cyclic hydrocarbons. These strains have been isolated from a mixture of wastewater treatment plant material and benzene-polluted soil and from a wastewater treatment plant, respectively, suggesting their role in bioremediation of soil and water. Although the strains are phylogenetically closely related, there are some clear physiological differences. The hydrocarbon cyclohexanol, for example, can be degraded by strain K601T but not by strain BC. Furthermore, both strains can use nitrate and oxygen as electron acceptors, but only strain BC can use chlorate as an electron acceptor. To better understand the nitrate and chlorate reduction mechanisms coupled to the oxidation of cyclic compounds, the genomes of A. denitrificans strains BC and K601T were sequenced. Here, we report the complete genome sequences of A. denitrificans strains BC and K601T.
Desulfovibrio alaskensis G20 (formerly Desulfovibrio desulfuricans G20) is a Gram-negative mesophilic sulfate-reducing bacterium (SRB), known to corrode ferrous metals and to reduce toxic radionuclides and metals such as uranium and chromium to sparingly soluble and less toxic forms. We present the 3.7-Mb genome sequence to provide insights into its physiology.
Halanaerobium hydrogenoformans is an alkaliphilic bacterium capable of biohydrogen production at pH 11 and 7% (wt/vol) salt. We present the 2.6-Mb genome sequence to provide insights into its physiology and potential for bioenergy applications.
Here we present the genome of strain Exiguobacterium sp. AT1b, a thermophilic member of the genus Exiguobacterium whose representatives were isolated from various environments along a thermal and physicochemical gradient. This genome was sequenced to be a comparative resource for the study of thermal adaptation with a psychroactive representative of the genus, Exiguobacterium sibiricum strain 255-15, that was previously sequenced by the U.S. Department of Energy's (DOE's) Joint Genome Institute (JGI) (http://genome.ornl.gov/microbial/exig/).
Cellulosilyticum lentocellum DSM 5427 is an anaerobic, endospore-forming member of the Firmicutes. We describe the complete genome sequence of this cellulose-degrading bacterium, which was originally isolated from estuarine sediment of a river that received both domestic and paper mill waste. Comparative genomics of cellulolytic clostridia will provide insight into factors that influence degradation rates.
Desulfovibrio desulfuricans strain ND132 is an anaerobic sulfate-reducing bacterium (SRB) capable of producing methylmercury (MeHg), a potent human neurotoxin. The mechanism of methylation by this and other organisms is unknown. We present the 3.8-Mb genome sequence to provide further insight into microbial mercury methylation.
The genus Caldicellulosiruptor contains the most thermophilic, plant biomass-degrading bacteria isolated to date. Previously, genome sequences from three cellulolytic members of this genus were reported (C. saccharolyticus, C. bescii, and C. obsidiansis). To further explore the physiological and biochemical basis for polysaccharide degradation within this genus, five additional genomes were sequenced: C. hydrothermalis, C. kristjanssonii, C. kronotskyensis, C. lactoaceticus, and C. owensensis. Taken together, the seven completed and one draft-phase Caldicellulosiruptor genomes suggest that, while central metabolism is highly conserved, significant differences in glycoside hydrolase inventories and numbers of carbohydrate transporters exist, a finding which likely relates to variability observed in plant biomass degradation capacity.
Nocardioides sp. strain JS614 grows on ethene and vinyl chloride (VC) as sole carbon and energy sources and is of interest for bioremediation and biocatalysis. Sequencing of the complete genome of JS614 provides insight into the genetic basis of alkene oxidation, supports ongoing research into the physiology and biochemistry of growth on ethene and VC, and provides biomarkers to facilitate detection of VC/ethene oxidizers in the environment. This is the first genome sequence from the genus Nocardioides and the first genome of a VC/ethene-oxidizing bacterium.
Chloroflexus aurantiacus is a thermophilic filamentous anoxygenic phototrophic (FAP) bacterium that can grow phototrophically under anaerobic conditions or chemotrophically under aerobic and dark conditions. According to 16S rRNA analysis, Chloroflexi species are the earliest branching bacteria capable of photosynthesis, and Cfl. aurantiacus has long been regarded as a key organism to resolve the obscurity of the origin and early evolution of photosynthesis. Cfl. aurantiacus contains a chimeric photosystem that comprises some characters of green sulfur bacteria and purple photosynthetic bacteria, and also has some unique electron transport proteins compared to other photosynthetic bacteria.
The complete genomic sequence of Cfl. aurantiacus has been determined, analyzed and compared to the genomes of other photosynthetic bacteria.
Abundant genomic evidence suggests that there have been numerous gene adaptations/replacements in Cfl. aurantiacus to facilitate life under both anaerobic and aerobic conditions, including duplicate genes and gene clusters for the alternative complex III (ACIII), auracyanin and NADH:quinone oxidoreductase; and several aerobic/anaerobic enzyme pairs in central carbon metabolism and in tetrapyrrole and nucleic acid biosynthesis. Overall, genomic information is consistent with a high tolerance for oxygen that has been reported in the growth of Cfl. aurantiacus. Genes for the chimeric photosystem, photosynthetic electron transport chain, the 3-hydroxypropionate autotrophic carbon fixation cycle, CO2-anaplerotic pathways, glyoxylate cycle, and sulfur reduction pathway are present. The central carbon metabolism and sulfur assimilation pathways in Cfl. aurantiacus are discussed. Some features of the Cfl. aurantiacus genome are compared with those of the Roseiflexus castenholzii genome. Roseiflexus castenholzii is a recently characterized FAP bacterium and is phylogenetically closely related to Cfl. aurantiacus. According to previous reports and the genomic information, perspectives of Cfl. aurantiacus in the evolution of photosynthesis are also discussed.
The genomic analyses presented in this report, along with previous physiological, ecological and biochemical studies, indicate that the anoxygenic phototroph Cfl. aurantiacus has many interesting and certain unique features in its metabolic pathways. The complete genome may also shed light on possible evolutionary connections of photosynthesis.
Modern methods to develop microbe-based biomass conversion processes require a system-level understanding of the microbes involved. Clostridium species have long been recognized as ideal candidates for processes involving biomass conversion and production of various biofuels and other industrial products. To expand the knowledge base for clostridial species relevant to current biofuel production efforts, we have sequenced the genomes of 20 species spanning multiple genera. The majority of species sequenced fall within the class III cellulosome-encoding Clostridium and the class V saccharolytic Thermoanaerobacteraceae. Species were chosen based on representation in the experimental literature as model organisms, ability to degrade cellulosic biomass either by free enzymes or by cellulosomes, ability to rapidly ferment hexose and pentose sugars to ethanol, and ability to ferment synthesis gas to ethanol. The sequenced strains significantly increase the number of noncommensal/nonpathogenic clostridial species and provide a key foundation for future studies of biomass conversion, cellulosome composition, and clostridial systems biology.
Caldicellulosiruptor obsidiansis OB47T (ATCC BAA-2073, JCM 16842) is an extremely thermophilic, anaerobic bacterium capable of hydrolyzing plant-derived polymers through the expression of multidomain/multifunctional hydrolases. The complete genome sequence reveals a diverse set of carbohydrate-active enzymes and provides further insight into lignocellulosic biomass hydrolysis at high temperatures.