Many thermophiles and hyperthermophiles have been isolated from hot springs and other thermal environments (1). The complete genome sequences of 19 thermophilic or hyperthermophilic prokaryotic species have been determined. This genomic information should facilitate the study of thermophily in prokaryotic cells and of the thermostability of their proteins. Indeed, the genomic sequence features that discriminate thermophiles from mesophiles can be identified simply by principal component analysis (PCA) of amino acid composition (2–4) or of relative synonymous codon usage (3–5).
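To make the PCA idea concrete, the following is a minimal sketch, not the procedure used in the cited studies. It pools the amino acid composition of each proteome into a 20-dimensional vector and extracts the first principal component by power iteration on the covariance matrix. The function names and the toy sequences are hypothetical and carry no biological meaning; a real analysis would use complete predicted proteomes.

```python
import math

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

def composition(proteins):
    """Return the 20 amino acid fractions pooled over a list of protein strings."""
    counts = {aa: 0 for aa in AMINO_ACIDS}
    for seq in proteins:
        for aa in seq.upper():
            if aa in counts:  # ignore ambiguous residues such as X
                counts[aa] += 1
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no standard residues found")
    return [counts[aa] / total for aa in AMINO_ACIDS]

def first_principal_component(rows, iterations=200):
    """Return (pc1, scores) via power iteration on the sample covariance matrix."""
    n, d = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    centered = [[r[j] - means[j] for j in range(d)] for r in rows]
    cov = [[sum(c[i] * c[j] for c in centered) / (n - 1) for j in range(d)]
           for i in range(d)]
    # Compositions sum to 1, so a constant start vector lies in the null
    # space of the covariance matrix; start from a non-constant vector.
    vec = [float(j + 1) for j in range(d)]
    norm = math.sqrt(sum(x * x for x in vec))
    vec = [x / norm for x in vec]
    for _ in range(iterations):
        new = [sum(cov[i][j] * vec[j] for j in range(d)) for i in range(d)]
        norm = math.sqrt(sum(x * x for x in new))
        if norm == 0.0:
            break
        vec = [x / norm for x in new]
    scores = [sum(c[j] * vec[j] for j in range(d)) for c in centered]
    return vec, scores

# Hypothetical toy "proteomes" (invented sequences, for illustration only).
proteomes = {
    "toy_A": ["MKEELIKKVEE", "MEKIVRELKE"],
    "toy_B": ["MKEEVIRKLEE", "MRKELIEEVK"],
    "toy_C": ["MQTSANQLSAT", "MSNQTAQSLN"],
    "toy_D": ["MQSTANQASLT", "MTNQSAQTLS"],
}
names = list(proteomes)
rows = [composition(proteomes[name]) for name in names]
pc1, scores = first_principal_component(rows)
for name, score in zip(names, scores):
    print(name, round(score, 3))
```

In this toy input, the two groups of similar sequences fall on opposite sides of the first component. The sign of the component itself is arbitrary, so only the separation between groups is meaningful.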
Comparative genomics is another useful approach for extracting candidate genes associated with thermophily. A previous study showed that a phylogenetic pattern search against the clusters of orthologous groups (COGs) database (6) retrieved only one hyperthermophile-specific gene: reverse gyrase (7). Reverse gyrase, which is similar to some type I DNA topoisomerases from mesophiles, is thought to help DNA function at high temperatures by increasing the topological links between the two DNA strands. Indeed, the reverse gyrase gene has been identified in the genomes of hyperthermophiles, with the exception of the recently determined Thermus thermophilus genome (8). Despite this remarkable result, many other crucial genes responsible for thermophily probably remain hidden in the genomes. Identifying such genes by comparing a variety of genomes is generally difficult, however, because phylogenetically related thermophiles share many genes that are not directly associated with thermophily, and phylogenetically distant thermophiles may have different mechanisms of thermoadaptation. One effective approach to revealing thermophily-related genes from genomic information is to compare the genomes of closely related organisms that include both thermophiles and mesophiles. This approach is also effective for understanding thermoadaptation from an evolutionary viewpoint, but it requires genomic sequences from an appropriate set of organisms, which have not yet been obtained.
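A phylogenetic pattern search of the kind mentioned above can be sketched as a filter over a presence/absence table: keep the orthologous groups found in every genome of a target set and in no genome outside it. The sketch below illustrates the idea only and does not use the COGs database interface; the function name, group labels and genome labels are all hypothetical.

```python
def pattern_search(presence, targets):
    """Return groups present in every target genome and absent from all others.

    presence: dict mapping a group label to the set of genomes containing it.
    targets:  iterable of genome labels defining the pattern of interest.
    """
    targets = set(targets)
    hits = []
    for group, genomes in presence.items():
        genomes = set(genomes)
        in_all_targets = targets <= genomes       # present in every target
        in_no_other = not (genomes - targets)     # absent everywhere else
        if in_all_targets and in_no_other:
            hits.append(group)
    return sorted(hits)

# Hypothetical presence/absence table (labels invented for illustration).
presence = {
    "toy_group_a": {"hyper_1", "hyper_2"},
    "toy_group_b": {"hyper_1", "hyper_2", "meso_1"},
    "toy_group_c": {"hyper_1"},
    "toy_group_d": {"meso_1", "meso_2"},
}
hits = pattern_search(presence, {"hyper_1", "hyper_2"})
print(hits)
```

A strict pattern like this is very selective, which is consistent with the search described above returning a single gene. It also shows the limitation noted in the text: a gene missing from even one target genome, or shared with one related mesophile, is not retrieved.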
Aerobic, endospore-forming, Gram-positive Bacillus-related species have been isolated from various terrestrial soils and deep-sea sediments (9–11). Bacillus-related species are known to grow in a wide range of environments: at pH 2–12, at temperatures between 5 and 78°C, at salinities from 0 to 30% NaCl, and at pressures from 0.1 MPa (atmospheric pressure) to at least 30 MPa (corresponding to the pressure at a depth of 3000 m) (12,13). The complete genome sequences of five mesophilic bacilli with different phenotypic properties, Bacillus subtilis (14), Bacillus halodurans (15), Oceanobacillus iheyensis (16), Bacillus cereus (17) and Bacillus anthracis (18), have already been determined, whereas the complete genome sequence of a thermophilic Bacillus-related species has not yet been established. These species are positioned as representatives of major diverged clusters in the 16S rRNA tree (Supplementary Figure 1).
Geobacillus kaustophilus HTA426, which was isolated from deep-sea sediment of the Mariana Trench (19,20), is a thermophilic Bacillus-related species whose upper temperature limit for growth is 74°C (optimum 60°C). At least 12 other thermophilic Geobacillus species are known, which have been reclassified from the genus Bacillus (21).
Here, we report the complete nucleotide sequence of the genome of G. kaustophilus. We provide the first comparative analysis of this thermophilic genome with those of five phylogenetically related mesophilic bacilli, B. subtilis, B. halodurans, O. iheyensis, B. anthracis and B. cereus, in order to highlight the thermophilic features of the genome. Special emphasis is placed on the mechanisms by which the bacilli adapt to high-temperature environments.