Members of the genus Caldicellulosiruptor within the order Clostridiales can solubilize cellulose at extremely thermophilic growth temperatures (65 to 80°C). Caldicellulosiruptor obsidiansis was isolated from Obsidian Pool, Yellowstone National Park, in enrichment cultures containing dilute acid-pretreated switchgrass as the primary carbon and energy source for cultivation (5). High-temperature saccharification can promote higher hydrolysis rates while reducing cooling costs following biomass pretreatment and suppressing contamination in reactors (9). Given the organism's rapid growth on cellulosic substrates and ability to use a wide range of plant-derived sugars, a complete genome sequence was determined using a sequencing-by-synthesis approach.
The genome of C. obsidiansis was sequenced by the U.S. Department of Energy (DOE) Joint Genome Institute (JGI) using a combination of Illumina (1) and 454 technologies (8). General aspects of library construction and sequencing performed at the JGI can be found at http://www.jgi.doe.gov/. Illumina sequencing data were assembled with VELVET (10), and the consensus sequences were shredded into overlapping 1.5-kbp fake reads and assembled together with the 454 data. The initial Newbler assembly contained 64 contigs in two scaffolds. The initial 454 assembly was converted into a Phrap assembly by making fake reads from the consensus and collecting the read pairs in the 454 paired-end library. The Phred/Phrap/Consed software package was used for sequence assembly and quality assessment (2) in the subsequent finishing process. Illumina data were used to correct potential base errors and increase consensus quality using the Polisher software developed at the JGI (Alla Lapidus, unpublished data). After the shotgun stage, reads were assembled with parallel Phrap (High Performance Software, LLC). Possible misassemblies were corrected with gapResolution (Cliff Han, unpublished data), Dupfinisher (6), or sequencing of cloned bridging PCR fragments with subcloning. Gaps between contigs were closed by editing in Consed, by PCR, and by Bubble PCR primer walks. A total of 773 additional reactions and seven shatter libraries were necessary to close gaps and to raise the quality of the finished sequence. The genome was annotated at Oak Ridge National Laboratory using an automated annotation pipeline driven by the gene prediction algorithm Prodigal (7). Annotation quality was verified by the JGI.
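The shredding step described above, in which a consensus sequence is cut into overlapping 1.5-kbp fake reads before co-assembly, can be illustrated with a minimal sketch. This is not the JGI implementation: the function name `shred_consensus` is hypothetical, and the 500-bp overlap is an assumed value, since the text gives only the read length.

```python
def shred_consensus(consensus, read_len=1500, overlap=500):
    """Cut a consensus sequence into overlapping, fixed-length pseudo-reads.

    read_len=1500 follows the 1.5-kbp length stated in the text.
    overlap=500 is an ASSUMPTION made for illustration only.
    """
    if not 0 <= overlap < read_len:
        raise ValueError("overlap must be non-negative and smaller than read_len")
    if len(consensus) <= read_len:
        # A contig no longer than one read is returned whole (or not at all if empty).
        return [consensus] if consensus else []

    step = read_len - overlap  # distance between consecutive read starts
    reads = []
    start = 0
    while start + read_len < len(consensus):
        reads.append(consensus[start:start + read_len])
        start += step
    # Anchor the final read at the contig end so the tail is covered at full length.
    reads.append(consensus[-read_len:])
    return reads


if __name__ == "__main__":
    contig = "ACGT" * 1000  # toy 4,000-bp consensus
    fake_reads = shred_consensus(contig)
    print(len(fake_reads), "fake reads of length", len(fake_reads[0]))
```

Anchoring the last read at the contig end, rather than emitting a short remainder, keeps every fake read the same length; whether the original pipeline handled contig ends this way is not stated in the text.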
Although many well-characterized bacteria and fungi can use cellulose, C. obsidiansis was selected and isolated specifically for its ability to deconstruct potential bioenergy feedstocks (e.g., pretreated switchgrass or Populus sp.). Through high-throughput sequencing of novel strains relevant to different aspects of renewable energy production, genome-enabled technologies can be used to discover important cellular properties (such as the secretion of hydrolytic enzymes). Making the genome sequence of C. obsidiansis OB47T available will allow comprehensive comparisons with other members of the genus and enable further investigation into the mechanisms employed by microorganisms to solubilize lignocellulosic materials at elevated temperatures.