We present herein the first complete genome sequence of a thermophilic Bacillus-related species, Geobacillus kaustophilus HTA426, which is composed of a 3.54 Mb chromosome and a 47.9 kb plasmid, along with a comparative analysis with five other mesophilic bacillar genomes. Upon orthologous grouping of the six bacillar sequenced genomes, it was found that 1257 common orthologous groups composed of 1308 genes (37%) are shared by all the bacilli, whereas 839 genes (24%) in the G.kaustophilus genome were found to be unique to that species. We were able to find the first prokaryotic sperm protamine P1 homolog, polyamine synthase, polyamine ABC transporter and RNA methylase in the 839 unique genes; these may contribute to thermophily by stabilizing the nucleic acids. Contrasting results were obtained from the principal component analysis (PCA) of the amino acid composition and synonymous codon usage for highlighting the thermophilic signature of the G.kaustophilus genome. Only in the PCA of the amino acid composition were the Bacillus-related species located near, but were distinguishable from, the borderline distinguishing thermophiles from mesophiles on the second principal axis. Further analysis revealed some asymmetric amino acid substitutions between the thermophiles and the mesophiles, which are possibly associated with the thermoadaptation of the organism.
Many thermophiles and hyperthermophiles have been isolated from hot springs and other thermal environments (1). The complete genome sequences of 19 thermophilic or hyperthermophilic prokaryotic species have been determined. The genomic information should facilitate the study of thermophily in the prokaryotic cells and thermostability of the proteins. In fact, the features of the genomic sequence, which discriminate between thermophiles and mesophiles, can be simply identified by using principal component analysis (PCA) of the amino acid composition (2–4) or relative synonymous codon usage (3–5).
Comparative genomics is another useful approach for extracting candidate genes associated with thermophily. A previous study has shown that a phylogenetic pattern search against the clusters of orthologous groups (COGs) database (6) retrieved only one hyperthermophile-specific gene: reverse gyrase (7). Reverse gyrase, which is similar to some type I DNA topoisomerases from mesophiles, is thought to help DNA to function at high temperatures by increasing topological links between the two DNA strands. Indeed, the reverse gyrase gene has been identified in the genomes of hyperthermophiles, except the recently determined Thermus thermophilus genome (8). Despite this remarkable result, many other crucial genes responsible for thermophily are probably still hidden in the genome. However, identifying such genes through comparison of a variety of genomes is generally not easy, because phylogenetically related thermophiles share many genes that are not directly associated with thermophily, and phylogenetically distant thermophiles may have different mechanisms for thermoadaptation. One of the effective approaches in revealing thermophily-related genes based on genomic information is to compare genomes between closely related organisms, including both thermophiles and mesophiles. This approach is also effective for understanding thermoadaptation from the viewpoint of evolution, although the genomic sequences from an appropriate set of organisms are needed, which have not yet been obtained.
Aerobic endospore-forming Gram-positive Bacillus-related species have been isolated from various terrestrial soils and deep-sea sediments (9–11). It is known that Bacillus-related species can grow in a wide range of environments, at pH 2–12, in temperatures between 5 and 78°C, in salinity from 0 to 30% NaCl, and in pressures from 0.1 MPa (atmospheric pressure) to at least 30 MPa (corresponding to the pressure at a depth of 3000 m) (12,13). The complete genome sequences of five mesophilic bacilli with different phenotypic properties, Bacillus subtilis (14), Bacillus halodurans (15), Oceanobacillus iheyensis (16), Bacillus cereus (17) and Bacillus anthracis (18), have already been determined, although the complete genome sequence of a thermophilic Bacillus-related species has not yet been established. These species are positioned as representatives of major diverged clusters in the 16S rRNA tree (Supplementary Figure 1).
Geobacillus kaustophilus HTA426, which was isolated from the deep-sea sediment of the Mariana Trench (19,20), is a thermophilic Bacillus-related species whose upper temperature limit for growth is 74°C (optimally 60°C). It is known that there are at least 12 other thermophilic Geobacillus species, which have been reclassified from the genus Bacillus (21).
Here, we report the complete nucleotide sequence of the genome of G.kaustophilus. We provide the first comparative analysis of the thermophilic genome with those of five other phylogenetically related mesophilic bacilli, B.subtilis, B.halodurans, O.iheyensis, B.anthracis and B.cereus, in order to highlight the thermophilic features of the genome. Special emphasis is placed on the mechanisms of adaptation of the bacilli to high-temperature environments.
The genome of G.kaustophilus HTA426 was primarily sequenced using the whole-genome random-sequencing method used in our previous studies (15,16). The predicted protein-coding regions were initially defined by searching for open reading frames longer than 100 codons, in a manner similar to previous investigations. Searches of protein databases for amino acid similarities and annotation were performed using the same method as described in the previous study (15,16). The functional assignment for annotated CDSs identified in the G.kaustophilus genome followed the protocol used for B.subtilis (14).
The coding sequences of 149 prokaryotic genomes were obtained from the NCBI ftp site (ftp://ftp.ncbi.nih.gov/genbank/genomes/Bacteria). The amino acid composition of the translated sequence and the relative synonymous codon usage (RSCU) of genes in these genomes and the G.kaustophilus genome were subjected to PCA after the elimination of genes smaller than 150 codons. For calculating the amino acid composition, proteins containing at least two transmembrane segments, as predicted by the PSORT program (22), were also eliminated. RSCU for each gene is a 59 dimensional vector, whose elements were defined as
$$\mathrm{RSCU}_i = \frac{X_i}{\dfrac{1}{|C_{a(i)}|}\sum_{j \in C_{a(i)}} X_j}$$
where Xi is the number of occurrences of the i-th codon, which codes for the amino acid a(i), Ca is a set of codons that code for the amino acid a and |C| is the cardinality of the set C (23); we considered 59 codons from which stop codons and Met and Trp codons that have no synonymous codons were excluded. PCA and other statistical analyses were performed using the R statistical package (http://www.r-project.org/).
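The RSCU definition above can be illustrated with a minimal Python sketch. It is not the authors' code (their analysis was done in R); the function names are hypothetical, the standard genetic code is assumed, and returning `None` for an amino acid that is absent from a gene is an assumption, since the text does not say how that case was handled.

```python
from collections import Counter

BASES = "TCAG"
# Standard genetic code in TCAG order; '*' marks stop codons.
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def synonymous_families():
    """Group the 59 codons that have synonyms by amino acid (no stop, Met, Trp)."""
    families = {}
    for codon, aa in CODON_TABLE.items():
        if aa not in "*MW":
            families.setdefault(aa, []).append(codon)
    return families


def rscu(cds):
    """Return a dict codon -> RSCU for one coding sequence (59 entries)."""
    cds = cds.upper()
    usable = len(cds) - len(cds) % 3  # ignore a trailing partial codon
    counts = Counter(cds[i:i + 3] for i in range(0, usable, 3))
    values = {}
    for family in synonymous_families().values():
        total = sum(counts[c] for c in family)
        for c in family:
            # X_i divided by the mean count over the synonymous family.
            # Assumption: None when the amino acid does not occur in the gene.
            values[c] = counts[c] * len(family) / total if total else None
    return values
```

For example, `rscu("GCTGCTGCCAAA")` gives 8/3 for GCT, 4/3 for GCC, 2.0 for AAA and 0.0 for AAG; a value of 1 would mean that a codon is used exactly as often as expected under uniform synonymous usage.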
To examine the effect of overrepresentation of some closely related organisms, we also prepared two additional sets of organisms: one set includes only one strain from each species and the other set includes only one species from each genus except for the genus Bacillus. Since the results of the PCAs with these sets of organisms showed very similar patterns to that of the complete set of organisms, here we show only the result of the complete set of organisms.
The counts of amino acid substitutions between G.kaustophilus and other Bacillus-related species were tabulated using multiple alignments of amino acid sequences in 1056 common orthologous groups (see below) showing one-to-one correspondences across all genomes; the CLUSTALW program (24) was used for creating the alignments. In this statistical analysis, a gap appearing in an alignment column is also treated as a single character. Let nAB be the number of sites at which amino acid A in a mesophilic bacillar genome changed to amino acid B in the G.kaustophilus genome, and assume that nAB > nBA. The statistical significance of the observed asymmetry between amino acids A and B was evaluated by the probability P(X ≥ nAB), assuming that X follows the binomial distribution Bi(nAB + nBA, 0.5). Here, we considered the asymmetry significant when P ≤ 10^−5. This implies that the type I error rate of multiple testing with 210 substitution patterns between 21 characters (including the gap character) is at most 210 × 10^−5 = 0.21%.
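The asymmetry test described above can be sketched in a few lines of Python. This is an illustrative reimplementation under the stated binomial model, not the authors' R code; the function names are hypothetical.

```python
from math import comb


def asymmetry_p_value(n_ab, n_ba):
    """P(X >= max(n_ab, n_ba)) for X ~ Binomial(n_ab + n_ba, 0.5)."""
    n = n_ab + n_ba
    k = max(n_ab, n_ba)
    # Exact upper tail: sum of binomial coefficients divided by 2^n.
    return sum(comb(n, x) for x in range(k, n + 1)) / 2 ** n


def is_asymmetric(n_ab, n_ba, threshold=1e-5):
    """Apply the significance cut-off used in the text (P <= 10^-5)."""
    return asymmetry_p_value(n_ab, n_ba) <= threshold


# 21 characters (20 amino acids plus the gap) give 21 * 20 / 2 unordered pairs.
N_PATTERNS = 21 * 20 // 2
# Upper bound on the overall type I error rate: 210 * 10^-5 = 0.21%.
FAMILY_WISE_BOUND = N_PATTERNS * 1e-5
```

With this cut-off, a completely one-sided pattern needs at least 17 observed sites to be called significant, because 2^−16 is still larger than 10^−5.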
Orthologous groups in the Bacillus-related species were established by applying the clustering program of the MBGD database (25) to the all-against-all similarity results. The common orthologs conserved in all Bacillus-related genomes were then selected for constructing multiple alignments. Since these orthologs generally display complicated mutual relationships, such as many-to-many correspondences and fusion or fission of domains, we selected only those orthologs with one-to-one correspondences without domain splitting, in order to simplify the analysis.
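The one-to-one selection step can be pictured with the following minimal sketch. The data layout (a dict mapping a group identifier to a list of `(genome, gene)` pairs) and the genome labels are hypothetical and do not reflect the actual MBGD output format; detecting domain fusion or fission would require domain-level information that this simple representation does not carry.

```python
from collections import Counter

# Hypothetical short labels for the six genomes compared in this study.
GENOMES = ("gka", "bsu", "bha", "oih", "bce", "ban")


def one_to_one_groups(groups, genomes=GENOMES):
    """Keep only groups with exactly one member gene in every genome."""
    kept = {}
    for group_id, members in groups.items():
        per_genome = Counter(genome for genome, _gene in members)
        has_all_once = all(per_genome[g] == 1 for g in genomes)
        no_extras = len(members) == len(genomes)
        if has_all_once and no_extras:
            kept[group_id] = members
    return kept
```

A group that contains two paralogs from one genome, or that lacks a member in any genome, is dropped by this filter.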
The genome of G.kaustophilus is composed of a single circular chromosome (3554776 bp) and a plasmid (47890 bp), with a mean G+C content of 52.1 and 44.2%, respectively (Table 1 and Figure 1). We identified 3498 protein-coding sequences (CDSs), with a mean size of 862 nt, and the coding sequences were found to cover 86% of the chromosome. Predicted protein sequences were compared with sequences in a non-redundant protein database, and biological roles were assigned to 1914 CDSs (54.7%). In this database search, 1096 CDSs (31.3%) were identified as conserved proteins of unknown function in comparison with proteins from other organisms (Figure 2D). We found that 75.2% of the genes started with ATG, 13.5% with GTG and 11.3% with TTG. These values are similar to those of other Bacillus-related species, except that the ratio of ATG (83.1%) in the initiation codon of B.anthracis is slightly higher than that of the others. On the other hand, 42 CDSs were identified in the plasmid designated as pTHA426 (Figure 1), and the mean size of the CDSs (906 nt) was much larger than that identified in the plasmids of B.cereus (645 nt) and B.anthracis (639–645 nt). There is a difference in the pattern of the initiation codon between the chromosome and the plasmid of G.kaustophilus. The ratio of ATG in the chromosome is more than 11 percentage points higher and the ratio of GTG is 8 percentage points lower than the ATG and GTG ratios of the plasmid, respectively (Table 1). Of the 42 plasmid CDSs, 24 (57.1%) were assigned biological roles. The G.kaustophilus genome was found to contain nine copies of the rRNA operon and 87 tRNA species organized into 11 clusters involving 81 genes plus 6 single genes.
A new database specifically established for the G.kaustophilus sequences, ExtremoBase, will be accessible at http://www.jamstec.go.jp/jamstec-e/bio/exbase.html. The data will also be available through the MBGD server (http://mbgd.genome.ad.jp/) with additional functions for ortholog grouping and comparative genome analysis.
The G.kaustophilus genome possesses 91 genes encoding putative transposases (Tpases) categorized into 31 groups, which seem to be carried by various insertion sequences (ISs). Although the number of groups was slightly greater than in B.halodurans (27 groups), the number of those genes was less than the 112 found in the B.halodurans genome. However, the total number of Tpase genes in the G.kaustophilus genome is much greater than that of four other sequenced bacilli, B.subtilis (none), O.iheyensis (21 genes), B.cereus (28 genes) and B.anthracis (12 genes) (Figure 3). Forty-three of the Tpase genes in the G.kaustophilus genome are similar to the sequences present in the genomes of thermophilic bacteria such as Geobacillus stearothermophilus (26), Thermoanaerobacter tengcongensis (27) and Thermotoga maritima (28). The remaining 48 genes are similar to those of mesophilic bacteria such as B.halodurans, O.iheyensis (29) and B.cereus, and also to those of mesophilic archaea such as Methanosarcina mazei (30) and Methanosarcina acetivorans (31). Of the 48 genes, 9 genes showed significant homology to Tpases carried by IS642, IS653 and IS654 identified in the B.halodurans genome, which are categorized into the IS630 family, the IS650/IS653 family and the IS256 family, respectively (32). The first family has been reported only in B.halodurans and G.stearothermophilus, and the latter two families were known before this study only in B.halodurans among the bacilli (29). Recently, we showed that the B.halodurans genome contains a new transposon, which is very similar to the one identified in the thermophilic G.stearothermophilus genome (33). Thus, it is clear that mobile elements, such as ISs and the transposon, are shared not only between thermophiles but also between thermophiles and mesophiles.
On the other hand, no evidence of horizontal gene transfer between hyperthermophilic archaea and G.kaustophilus was obtained through the analysis of Tpases in this study.
The genome also contains at least 21 putative phage-associated genes, which are similar to those of Streptococcus pyogenes, Lactococcus lactis, B.subtilis and Clostridium perfringens. These genes are distributed within a region of ~40 kb located 535–575 kb from the oriC of the G.kaustophilus genome. The sequence homology and the organization of the phage-related genes in the G.kaustophilus genome are comparatively similar to those of prophage 315.1, identified in S.pyogenes M3, which is classified into the Siphoviridae family (34).
Out of 3498 genes predicted in the G.kaustophilus chromosome, 839 (24%) categorized into 757 groups were unique genes, which possess no orthologous relationships to the other five sequenced bacillar genomes, and 488 (14%) genes categorized into 419 groups were orphans, showing no significant similarity to any other gene products (Figure 2B and D). The 1257 common orthologous groups composed of 1308 genes are shared among all six sequenced bacillar genomes (Figure 2C and Supplementary Table S1). Recently, it was shown that 271 genes are indispensable for the growth of B.subtilis in nutritious conditions (35). Out of 1308 common genes, 233 show correspondences to the B.subtilis essential genes. Through a series of orthologous analyses of the six bacilli used in this study, four essential genes, ymaA, ydiO, tagB and tagF, associated with purine/pyrimidine biosynthesis, DNA methylation and teichoic acid biosynthesis, were found to be unique to the B.subtilis genome. Teichoic acids are composed of cell-wall teichoic acid and lipoteichoic acid (36). It is known that six genes (tagA, tagB, tagD, tagE, tagF and tagO) are associated with teichoic acid biosynthesis in B.subtilis, with all genes except tagE being essential for growth. The five other Bacillus-related species, however, lack some of these genes; remarkably, G.kaustophilus lacks all genes except tagE, suggesting that these bacilli may have a different pathway for teichoic acid biosynthesis. The G.kaustophilus genome lacks eight more B.subtilis essential genes (16 genes in total), mrpB, mrpC, mrpF, glyQ, glyS, menA, ppaC and ydiP, associated with Na+/H+ antiporter for pH homeostasis, glycyl-tRNA synthetase, inorganic pyrophosphatase and DNA methyltransferase.
As shown in Figure 2A, it was found that 189 orthologous groups composed of 221 genes are commonly shared among the five mesophilic bacilli, but not by G.kaustophilus. In these mesophile-specific orthologs, there are 20 genes encoding ABC transporter (10 ATP-binding proteins, 6 permeases and 4 substrate-binding proteins) classified into functional category 1.2 (transport/binding proteins and lipoproteins), 20 genes encoding transcriptional regulator (category 3.5.2, regulation), and 11 genes encoding proteins for immunity to bacteriotoxin-related proteins, toxic anion resistance-related proteins and catalase (category 4.2, detoxification) (Figure 2 and Supplementary Table S2). This is one of the significant differences between thermophilic G.kaustophilus and mesophilic bacillar genomes. The set of 839 unique genes of G.kaustophilus contains 22 individual ABC transporter genes (10 ATP-binding proteins, 7 permeases and 5 substrate-binding proteins), whereas there are only six and three genes categorized into 3.5.2 and 4.2, respectively (Figure 2B and Supplementary Table S3).
G.kaustophilus shares 1773–2014 orthologous genes with the other five bacilli, corresponding to 53.4–62.0% of all genes in the HTA426 genome. If the relative physical distribution of orthologous genes in the genomes between G.kaustophilus and other Bacillus-related species is the same, a diagonal line should appear from the lower left to the upper right in Figure 4. However, many orthologous genes deviate from the line, although the physical distributions of the orthologous genes between G.kaustophilus and B.subtilis, and between G.kaustophilus and O.iheyensis, are largely collinear (Figure 4B and D). The difference in the physical distributions of orthologous genes in the genomes presumably arose through various minor inversions and horizontal gene transfer. On the other hand, 1056 common orthologous genes possess one-to-one correspondences among the six bacilli, and these common orthologs represent 23.7–36% of each genome. Most of the common genes were found to be distributed in the collinear regions, but the direction of collinearity of the orthologs between G.kaustophilus and B.cereus changes at ~28–33° from the ter region in both directions (Figure 4C). The physical distribution of orthologous genes between G.kaustophilus and B.halodurans is very similar to the case of B.cereus (Figure 4A), and a similar result has been previously documented in a comparison between B.subtilis and B.halodurans. It has been reported that the B.halodurans genome has an inversion between the regions around 112–153° and 212–240°, due to the action of IS elements (14,32).
The features of the genomic sequence that distinguish thermophiles from mesophiles can be easily identified through PCA (or correspondence analysis, a similar technique) of the amino acid composition and the relative synonymous codon usage, as mentioned previously. In both analyses, whereas the first principal component (PC1) correlated with the G+C content of the chromosome, the second PC (PC2) clearly correlated with the optimal growth temperature, so that thermophiles and mesophiles can be distinguished from each other along the second axis. We attempted PCA in order to confirm whether the G.kaustophilus genome has a signature similar to that of other thermophiles (Figure 5; see also Supplementary Figure 2). In both analyses, all thermophiles whose genomes have already been reported were located above the borderline distinguishing thermophiles from mesophiles, except Thermosynechococcus elongatus (37), whose upper growth temperature limit (60°C) is rather low in comparison to that of other thermophiles. G.kaustophilus was located above the borderline in the PCA of amino acid composition, but below the borderline in the PCA of synonymous codon usage. Thus, the G.kaustophilus genome is the first complete thermophilic genome that clearly shows different tendencies between the PCAs of synonymous codon usage and amino acid composition.
In the case of the PCA of amino acid composition, we were able to find the borderline distinguishing thermophiles from mesophiles at 0.0164 on the second principal axis. G.kaustophilus is positioned on this borderline, along with the thermophilic Thermoplasma acidophilum (38) and Thermoplasma volcanicum, which can grow at temperatures of up to 62–67°C (39). On the other hand, the PC2 positions of the five mesophilic Bacillus-related species are below the borderline (Figure 5A). Therefore, the genomes of the six Bacillus-related species can serve as good material for studying the mechanisms of thermostabilization of proteins and thermophily of microbes, since a limited number of amino acid changes yielding differences in PC2 positions, as shown in Figure 5, seem to reflect differences in thermophily or thermostability among these six bacilli.
In contrast to the results of the PCA of the amino acid composition, the synonymous codon usage in the G.kaustophilus genome does not show any distinguishable thermophilic pattern (Figure 5B). The thermophilic pattern in the synonymous codon usage is probably due to natural selection related to thermophily acting on the nucleotide sequence; it may be related to mRNA thermostability or the stability of codon–anticodon interactions (4,5). This pattern also seems to reflect the dinucleotide composition of genomic sequences, which may be related to the flexibility of the DNA molecule; purine–purine (RR) or pyrimidine–pyrimidine (YY) dinucleotides are predominant in thermophiles (40) (Supplementary Figure 2B). The G.kaustophilus genome did not appear to have been subjected to such selective pressure [e.g. the RR + YY value of the G.kaustophilus genome (50.7%) was less than that of the B.subtilis genome (53.2%)]. Some specific factors involved in the stabilization of DNA or RNA, of which some candidates will be listed below, might compensate for the lack of this genomic signature.
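The RR + YY value quoted above is the fraction of adjacent nucleotide pairs in which both bases are purines or both are pyrimidines. A minimal Python sketch of such a calculation follows; the function name is hypothetical, and skipping pairs that contain ambiguous symbols is an assumption, since the text does not describe how these were treated.

```python
PURINES = frozenset("AG")
NUCLEOTIDES = frozenset("ACGT")


def rr_yy_fraction(seq):
    """Fraction of overlapping dinucleotides that are RR or YY."""
    seq = seq.upper()
    same_class = 0
    total = 0
    for x, y in zip(seq, seq[1:]):
        # Assumption: pairs containing anything other than A/C/G/T are skipped.
        if x not in NUCLEOTIDES or y not in NUCLEOTIDES:
            continue
        total += 1
        # RR or YY means both bases fall in the same purine/pyrimidine class.
        if (x in PURINES) == (y in PURINES):
            same_class += 1
    return same_class / total if total else 0.0
```

Because an RR dinucleotide on one strand is read as YY on the complementary strand, the combined RR + YY fraction does not depend on which strand is scanned.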
Generally, it is known that the G+C content in rRNA and tRNA, rather than that of the entire genome, linearly correlates with growth temperature in thermophilic archaea, although discriminating moderate thermophiles from mesophiles through the G+C content in the RNA molecules alone is generally not easy (41). Indeed, the mean G+C content in the tRNA of the G.kaustophilus genome was 59.1%, a value slightly lower than in B.halodurans and B.cereus (Table 1). On the other hand, the mean G+C content in the rRNA operons in the G.kaustophilus genome was found to be 58.5% (Table 1). This value is 4–6 percentage points higher than that of mesophilic Bacillus-related species (52.7–54.4%) with maximum temperatures for growth ranging from 42 to 58°C. However, this difference in the rRNAs was less than the difference in the genomic G+C contents between G.kaustophilus (52.1%) and the mesophilic bacilli (35.3–43.7%), probably due to the stronger constraints imposed on the rRNA operons. We examined the relationship between the G+C content in the 16S rRNA and that in the entire genome and confirmed a clear correlation between these values among various mesophiles (Figure 6). On the other hand, the rRNAs of the hyperthermophiles showed apparently higher G+C contents than those of the mesophiles regardless of their genomic G+C contents. The G+C content in the 16S rRNA of the G.kaustophilus genome was moderately, but still significantly, higher than those of the mesophiles, even when the difference in the genomic G+C content was taken into consideration (Figure 6). Therefore, we conclude that the higher G+C content in rRNA is one of the thermophilic signatures in the G.kaustophilus genome.
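As a small illustration of the comparison above, the following Python sketch computes a G+C percentage from a sequence and reproduces the percentage-point differences from the values quoted in the text. The function names are hypothetical, and ignoring ambiguous symbols is an assumption.

```python
def gc_content(seq):
    """G+C content in percent; symbols other than A, C, G, T/U are ignored."""
    seq = seq.upper()
    gc = seq.count("G") + seq.count("C")
    at = seq.count("A") + seq.count("T") + seq.count("U")
    total = gc + at
    return 100.0 * gc / total if total else 0.0


def point_differences(value, others):
    """Percentage-point differences between one value and several others."""
    return tuple(round(value - other, 1) for other in others)


# Values quoted in the text (percent G+C).
RRNA_GKA, RRNA_MESOPHILES = 58.5, (52.7, 54.4)
GENOME_GKA, GENOME_MESOPHILES = 52.1, (35.3, 43.7)

rrna_gap = point_differences(RRNA_GKA, RRNA_MESOPHILES)        # rRNA operons
genome_gap = point_differences(GENOME_GKA, GENOME_MESOPHILES)  # whole genomes
```

The rRNA gap spans roughly 4–6 percentage points, whereas the genomic gap spans roughly 8–17 percentage points, which is the contrast drawn in the paragraph above.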
We tried to identify amino acid substitutions showing significant asymmetry between G.kaustophilus and other Bacillus-related species, using multiple alignments of 1056 common orthologous groups that have one-to-one correspondences across all genomes. The resulting asymmetric substitutions were plotted on the plane generated by the first two principal components, as in Figure 5, according to the difference in the PC scores yielded by each substitution (Figure 7).
There were 39 asymmetric substitutions commonly identified between G.kaustophilus and all other Bacillus-related species (red circles in Figure 7). Remarkably, all of these substitutions increase the PC1 scores of the G.kaustophilus proteins, corresponding to an increase in the chromosomal G+C content. On the other hand, 24 out of 39 substitutions increase the PC2 scores of the G.kaustophilus proteins, presumably corresponding to an increase in the thermostability of the proteins. These substitutions generally increase the content of Arg, Ala, Gly, Val and Pro in the G.kaustophilus proteins, while decreasing Gln, Thr, Asn and Ser (Figure 7; see also Supplementary Figure 2A). However, the overall increase in the PC2 score by these common substitutions is less remarkable than that in the PC1 score.
In contrast, among species-specific asymmetric substitutions (those with larger and smaller differences are shown in Figure 7, represented by green and black circles, respectively), we were able to find a substantial number of substitutions that increase the PC2 scores of the G.kaustophilus proteins, especially when we compared them with those of the B.subtilis or O.iheyensis proteins (Figure 7). Indeed, 8 out of 10 and 15 out of 18 large asymmetric substitutions (green circles) found in B.subtilis and O.iheyensis proteins, respectively, were found to increase the PC2 scores of G.kaustophilus proteins. In particular, there are four asymmetric substitutions (QE, SE, DE and NK) found in the comparison with the O.iheyensis genome and located in the upper-left quadrant of the principal component plane, which cannot be explained by the difference in GC/AT mutation pressure; instead, they are likely to be explained by the difference in selection pressure related to the thermostability of proteins.
The order of the number of asymmetric substitutions was not equivalent to the order of the similarity of each species with the G.kaustophilus proteins. Indeed, the former order was B.halodurans (50 substitutions) < B.subtilis (82 substitutions) < B.cereus (84 substitutions) < O.iheyensis (92 substitutions), whereas the average identities between G.kaustophilus and other Bacillus-related species in the 1056 orthologous protein sequence alignments were B.subtilis (64.0%) > B.cereus (63.0%) > B.halodurans (61.3%) > O.iheyensis (57.0%). Therefore, the difference in Figure 7 cannot be explained by the difference in evolutionary distance only.
The observation that all common asymmetric substitutions between G.kaustophilus and other Bacillus-related species increase the PC1 scores indicates that the observed substitution bias is mainly due to the GC/AT directional mutation pressure (40), which itself cannot be considered as a direct cause of the thermostabilization of proteins in general. Indeed, as indicated in Figure 5 as well as in the previous studies (2,3,42), increase in G+C content seems to be a completely independent process from thermoadaptation. One of the plausible explanations is that accelerated amino acid changes caused by the GC/AT mutation pressure, combined with the subsequent natural selection, facilitated the adaptation of G.kaustophilus to an environment with a higher temperature through the increase in the thermostability of all its proteins. Recently, Nishio et al. (42) reported a similar observation, according to which the increase in G+C content in the Corynebacterium efficiens genome beyond that in the Corynebacterium glutamicum genome could account for the major patterns of asymmetric amino acid substitutions, possibly associated with the increase in the thermostability of C.efficiens (43). Thus, such a scenario might apply more generally for mesophilic bacteria in their adaptation to environments with higher temperatures.
However, by this hypothesis alone, it is difficult to explain the substantial differences in the substitution patterns among mesophilic bacilli. An alternative hypothesis is that the common ancestors of these bacilli have some thermophilic features, and that the current mesophilic bacilli have lost these features during the course of evolution. In this case, the difference in substitution patterns among the organisms can be explained as a difference in the extent to which the thermostability of proteins has decreased in each organism. In this connection, the position of the Bacillus-related species on the PC2 axis in the PCA of the amino acid composition is remarkable; it is located near the borderline distinguishing thermophiles from mesophiles (Figure 5).
Although it is not clear what the upper temperature limit for bacterial life is, or what specific factors will set this limit, it is generally assumed that the limit will be dictated by molecular instability. Actually, DNA duplex stability is apparently achieved at high temperatures by elevated salt concentrations, polyamines, cationic proteins and supercoiling, rather than the manipulation of G+C ratios (43). RNA stability is enhanced by covalent modification, and the secondary structure is also probably critical. We found some genes in a set of unique G.kaustophilus genes, which seem to be involved in the stabilization of DNA and RNA, such as protamine, spermidine/spermine synthase, tRNA methyltransferase (MTase) and rRNA MTase (Supplementary Table S3).
DNA topology is affected by the interaction with cationic proteins, numerous examples of which have been identified in hyperthermophiles. The small basic proteins bind DNA in vitro with substantial increases in Tm, and have been variously shown to bend DNA or form nucleosome structures (43). Protamine is a protein that binds DNA in sperm, replacing histones and allowing chromosomes to become more highly condensed than is possible with histones. Surprisingly, a gene (GK 1739) was found to show significant similarity (51%) to sperm protamine P1 from Phascolarctos cinereus (koala). Since there has been no report of a prokaryotic protamine-like gene thus far, this is the first such discovery in prokaryotes. Archaeal histones belonging to the HMf family are homologs of eukaryal nucleosome core histones, and have been shown to bind to and compact archaeal DNA both in vitro and in vivo (44). Histone-like proteins from Sulfolobus (45) have no eukaryal homologs, but, like the HMf proteins, they also compact DNA and increase the Tm of DNA in vitro. Therefore, the protamine P1-like protein identified in G.kaustophilus presumably behaves similarly to archaeal histone-like proteins in supporting life at high temperatures.
Polycationic polyamines, which increase the Tm of DNA and protect ribosomes from thermal inactivation in vitro, have been observed in hyperthermophiles. The polyamines participate in many cellular processes through their binding to DNA, RNA and phospholipids, not only in thermophiles but also in mesophiles. There is an interesting report according to which most spermines and spermidines exist as a polyamine–RNA complex in mesophilic Escherichia coli cells (46). A comprehensive analysis of the polyamines in hundreds of bacterial and archaeal species, from mesophiles to hyperthermophiles, has been carried out previously (47–49). The results showed that some kinds of polyamine, such as norspermine and norspermidine, occurred mainly in hyperthermophilic archaea (43). A hyperthermophilic bacterium, T.thermophilus, produced unusual polyamines (homocaldohexamine) in addition to norspermine and norspermidine, and inactivation of the basic genes related to polyamine synthesis, such as speA, speB and speE, resulted in a loss of the hyperthermophilic phenotype (growth defect at 78°C) (50). Thus, it is thought that these polyamines may play unique roles in supporting hyperthermophilic life. On the other hand, thermophilic Geobacillus species do not produce polyamines specific to hyperthermophiles, but produce spermine as a major polyamine; spermine is not produced by mesophilic Bacillus-related species, such as B.halodurans, B.subtilis and B.cereus, which we used for comparative analysis in this study (49). The genomes of five species, i.e. all except for O.iheyensis, contain a common gene for spermidine synthase, and other unique genes for it were identified in the B.anthracis, B.cereus and O.iheyensis genomes in addition to the common gene. 
Since it became clear that G.kaustophilus possesses unique genes for spermine/spermidine synthase and polyamine ABC transporter (permease) among the six sequenced bacilli, these genes seem to be strong candidates responsible for thermophily in these bacilli.
All types of cellular RNA contain modified nucleosides, but the largest number and greatest variety are found among tRNAs, and >80 different modifications have been identified to date in the tRNAs of various organisms (51). Modifications consist of simple chemical alterations of the nucleoside (e.g. methylation of the base or ribose, base isomerization, reduction, thiolation or deamination) or more complex hypermodifications. The structural stabilization of ribonucleic acids in hyperthermophiles is particularly important in tRNAs, where there is a requirement for the maintenance of a complex three-dimensional structure. Actually, it has recently been reported that the inactivation of the gene for tRNA MTase in T.thermophilus resulted in a thermosensitive phenotype (growth defect at 80°C), which suggests a role for the N1-methylation of tRNA adenosine-58 in the adaptation of life to extreme temperatures (52). The six bacillar genomes were found to share five orthologous tRNA MTases and four orthologous rRNA MTases, and the G.kaustophilus genome was found to contain three more unique tRNA or tRNA/rRNA MTase genes lacking orthologous relationships to the five other bacilli. Thus, these genes also seem to be strong candidates responsible for thermophily, although their specificity in the modification of tRNA is still not clear.
In this paper, we have attempted to highlight the genes involved in the thermophilic phenotype and the genomic features related to thermophily by PCA of the amino acid composition and asymmetric amino acid substitution based on comparative analysis of thermophilic G.kaustophilus with five other mesophilic Bacillus-related species. G.kaustophilus presumably shares some similar basic mechanisms for thermophily with other thermophiles or hyperthermophiles, while also having mechanisms unique to thermophilic Bacillus-related species. Although we were able to find strong candidates responsible for thermophily in a unique-gene set of G.kaustophilus (839 genes), half of those genes are orphans, and 24.6% of them are conserved in other organisms but are as yet of unknown function, as shown in Figure 2 and Supplementary Table S3. Therefore, other candidates responsible for thermophily may be among those genes whose function is not yet known. It will be necessary to do further comparative analysis with other thermophilic Bacillus-related species as a second step in revealing hidden capacity for thermophily.
Supplementary Material is available at NAR Online.
DDBJ/EMBL/GenBank accession nos. AP006508–AP006519