To further our understanding of the functions of individual genes and cellular systems in D. radiodurans as well as their relationship with other organisms, we undertook a detailed computational analysis of the D. radiodurans genome. In addition to the standard genome annotation procedure of The Institute for Genomic Research (218), we used several approaches for deeper protein characterization. In particular, we systematically applied sensitive profile-based methods that included PSI-BLAST, which constructs a position-dependent weight matrix from multiple alignments generated from the BLAST hits that pass a chosen expectation value (e-value) cutoff and allows iterative database searches using the information derived from such a matrix (5), IMPALA (175), which searches the matrix against profile databases, and SMART (179), which uses a Hidden Markov Model algorithm (59) to search a sequence against a multiple-alignment database. In addition to the database of profiles included in the SMART system, two other profile collections were used: (i) 5,640 profiles derived from the structurally characterized domains contained in the SCOP database (100), and (ii) 150 profiles for widespread domains primarily involved in different forms of signaling that were employed in previous genome comparisons (40).
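The idea of a position-dependent weight matrix can be illustrated with a minimal sketch. It is not the PSI-BLAST implementation: it assumes a uniform background frequency and a flat pseudocount, whereas PSI-BLAST uses sequence weighting and substitution-matrix-derived pseudocounts. The function names build_pssm and score_sequence are hypothetical.

```python
import math
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def build_pssm(alignment, pseudocount=1.0):
    """Return one dict of log-odds scores (residue -> score) per alignment column.

    Assumptions (not from the paper): uniform background frequencies and a
    flat pseudocount added to every residue in every column.
    """
    if not alignment or len({len(seq) for seq in alignment}) != 1:
        raise ValueError("alignment must contain equal-length sequences")
    background = 1.0 / len(AMINO_ACIDS)
    pssm = []
    for column in zip(*alignment):
        # Gap characters and non-standard residues are ignored when counting.
        counts = Counter(c for c in column if c in AMINO_ACIDS)
        total = sum(counts.values()) + pseudocount * len(AMINO_ACIDS)
        scores = {}
        for aa in AMINO_ACIDS:
            freq = (counts[aa] + pseudocount) / total
            scores[aa] = math.log2(freq / background)
        pssm.append(scores)
    return pssm


def score_sequence(pssm, seq):
    """Sum column scores for an ungapped sequence as long as the matrix."""
    if len(seq) != len(pssm):
        raise ValueError("sequence length must match the number of columns")
    return sum(column[aa] for column, aa in zip(pssm, seq))


if __name__ == "__main__":
    toy_alignment = ["ACD", "ACE", "ACD"]  # toy data, not real proteins
    matrix = build_pssm(toy_alignment)
    print(score_sequence(matrix, "ACD"), score_sequence(matrix, "WWW"))
```

In an iterative search, sequences scoring above the cutoff against such a matrix would be added to the alignment and the matrix rebuilt for the next round.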
Paralogous families of proteins encoded in the D. radiodurans genome were initially identified by comparing the complete set of D. radiodurans proteins to itself (after filtering for low-complexity regions with the SEG program [220]) using the PSI-BLAST program run for three iterations and clustering proteins by single linkage (clustering threshold e-value, 0.001) using the GROUPER program (214). One sequence from each cluster was used to generate a position-specific matrix by running an iterative PSI-BLAST search first against a D. radiodurans protein and then against the nonredundant protein database. These profiles were used to search for additional family members in the D. radiodurans proteome. Families that were recognized by the same profile were joined into superfamilies.
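Single-linkage clustering with an e-value threshold can be sketched as follows. This is an illustration of the general technique using a union-find structure, not the GROUPER program; the protein identifiers and hit list are invented, and treating a hit exactly at the threshold as significant is an assumption.

```python
def single_linkage_clusters(proteins, hits, threshold=1e-3):
    """Group proteins so that any pair linked by a significant hit shares a cluster.

    proteins: iterable of identifiers.
    hits: iterable of (query, subject, evalue) tuples; both identifiers are
          assumed to occur in `proteins`.
    threshold: hits with evalue <= threshold count as links (assumed convention).
    """
    proteins = list(proteins)
    parent = {p: p for p in proteins}

    def find(x):
        # Walk to the cluster representative, shortening the path on the way.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for query, subject, evalue in hits:
        if query != subject and evalue <= threshold:
            parent[find(query)] = find(subject)

    clusters = {}
    for p in proteins:
        clusters.setdefault(find(p), set()).add(p)
    # Largest clusters first; members sorted for reproducible output.
    return sorted((sorted(c) for c in clusters.values()),
                  key=lambda c: (-len(c), c))


if __name__ == "__main__":
    toy_proteins = ["P1", "P2", "P3", "P4", "P5"]  # hypothetical identifiers
    toy_hits = [("P1", "P2", 1e-10), ("P2", "P3", 1e-4),
                ("P3", "P4", 0.5), ("P4", "P5", 1e-3)]
    print(single_linkage_clusters(toy_proteins, toy_hits))
```

Because a single significant link is enough to merge two groups, P1 and P3 end up together even without a direct hit between them, which is the property that makes single linkage suitable for collecting diverged paralogs and also prone to chaining through multidomain proteins.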
The phylogenetic affinities of D. radiodurans were explored using the COGNITOR program. This program assigns query proteins to conserved protein families that consist of apparent orthologs, termed clusters of orthologous groups (COGs) (201). The functional assignments embedded in the COG database were also used to reconstruct metabolic pathways and other functional systems in D. radiodurans together with the KEGG (105) and WIT (157) databases.
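The logic of pathway reconstruction from ortholog assignments can be sketched as a set comparison between the steps a pathway requires and the ortholog groups detected in the genome. The pathway names and step identifiers below are placeholders, not real COG numbers, and pathway_report is a hypothetical helper.

```python
def pathway_report(pathways, assigned_groups):
    """Report, for each pathway, which required ortholog groups were not detected.

    pathways: dict mapping pathway name -> iterable of required group identifiers.
    assigned_groups: iterable of group identifiers detected in the genome.
    """
    assigned = set(assigned_groups)
    report = {}
    for name, required in pathways.items():
        missing = sorted(set(required) - assigned)
        report[name] = {"complete": not missing, "missing": missing}
    return report


if __name__ == "__main__":
    toy_pathways = {  # placeholder identifiers for illustration only
        "pathway_X": ["step1", "step2"],
        "pathway_Y": ["step3"],
    }
    print(pathway_report(toy_pathways, {"step1", "step3"}))
```

A step reported as missing is only a starting point for interpretation: the activity may be supplied by a nonorthologous or paralogous enzyme, so an incomplete report does not by itself show that the pathway is nonfunctional.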
Analysis of the phyletic distribution of homologs of Deinococcus proteins detected in database searches was performed using the TAX_COLLECTOR program of the SEALS package (214). This was followed by phylogenetic tree construction for specific cases. Multiple alignments for phylogenetic reconstruction were generated using the ClustalW program (93) and, when necessary, further adjusted on the basis of PSI-BLAST search outputs. Phylogenetic trees were constructed using the neighbor-joining method with bootstrap replications as implemented in the NEIGHBOR program of the PHYLIP package (67).
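Bootstrap replication resamples alignment columns with replacement and recomputes distances for each pseudo-alignment; a tree is then built from each distance matrix and the frequency of each grouping across replicates gives its support. The sketch below covers only the resampling and distance steps. It is not the PHYLIP code, and the simple proportion of differing sites (p-distance) is used here as a stand-in for whatever distance model was actually applied.

```python
import random


def bootstrap_alignment(alignment, rng):
    """Resample alignment columns with replacement.

    alignment: dict mapping sequence name -> aligned sequence (equal lengths).
    rng: a random.Random instance, so that replicates are reproducible.
    """
    lengths = {len(seq) for seq in alignment.values()}
    if len(lengths) != 1:
        raise ValueError("all aligned sequences must have the same length")
    length = lengths.pop()
    columns = [rng.randrange(length) for _ in range(length)]
    return {name: "".join(seq[i] for i in columns)
            for name, seq in alignment.items()}


def p_distance(a, b):
    """Fraction of differing sites, skipping columns with a gap in either sequence."""
    pairs = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
    if not pairs:
        return 0.0
    return sum(x != y for x, y in pairs) / len(pairs)


def bootstrap_distance_matrices(alignment, replicates=100, seed=0):
    """Return one pairwise distance matrix (dict keyed by name pairs) per replicate."""
    rng = random.Random(seed)
    names = sorted(alignment)
    matrices = []
    for _ in range(replicates):
        sample = bootstrap_alignment(alignment, rng)
        matrices.append({(a, b): p_distance(sample[a], sample[b])
                         for a in names for b in names})
    return matrices


if __name__ == "__main__":
    toy = {"a": "ACDEFG", "b": "ACDEFH", "c": "WCDE-G"}  # toy alignment
    for matrix in bootstrap_distance_matrices(toy, replicates=3, seed=1):
        print(matrix[("a", "b")], matrix[("a", "c")])
```

Each matrix would then be passed to a neighbor-joining routine, which is omitted here.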
Intergenic repeats were identified using the BLASTN program (6). As a result of this analysis, 2,007 D. radiodurans proteins were assigned to 1,272 COGs, which placed them into specific phylogenetic and functional contexts. In conjunction with profile analysis, this allowed us to define the domain architectures of multidomain proteins, to identify protein families that are unusually expanded in D. radiodurans, and to assign function and/or structure to a number of proteins previously described as hypothetical.
Below, we present an overview of the principal functional systems of D. radiodurans as determined by these analyses and describe unusual aspects of the genome that may be relevant to understanding the extreme resistance of this organism to radiation, desiccation, and other stress factors.
Analysis of the genome of D. radiodurans shows that it has a typical set of proteins for housekeeping and regulatory functions. As demonstrated by the COG analysis, the metabolic capabilities of D. radiodurans are similar to those of E. coli ) but less diverse (Table ); D. radiodurans is an obligate heterotroph (212). Table lists and compares the standard metabolic pathways of Deinococcus to the corresponding pathways in E. coli, Synechocystis, Bacillus subtilis, and Mycobacterium tuberculosis.
Basic metabolic pathways in D. radiodurans
Energy production and conversion.
Probably the most interesting feature of the systems for energy production in D. radiodurans is that, unlike most other free-living bacteria, it uses the vacuolar type of proton ATP synthase instead of the F1F0 type. Vacuolar (V)-type H+-ATPase is typical of eukaryotes and archaea; all archaea have a conserved operon that consists of eight genes encoding the ATPase subunits. This operon is partially conserved (with some of the subunits missing) in a minority of characterized bacteria, where it replaces the F1F0 ATPase, e.g., in Deinococcus, Thermus, spirochetes, chlamydiae, and Enterococcus. The scattered distribution of the V-ATPase operon among bacteria, in contrast to its conservation in archaea, suggests that this operon has been disseminated in the bacterial world by horizontal transfer. The genes for the standard five complexes of electron transport and oxidative phosphorylation are present in D. radiodurans, with a few exceptions, but some genes of the cytochrome bd quinol oxidase complex are missing. Given that this complex is active predominantly under low-oxygen conditions in other bacteria, its apparent loss in Deinococcus is consistent with D. radiodurans being strictly aerobic. Interestingly, D. radiodurans encodes a multisubunit Na+/H+ antiporter (DR0880 to DR0886) that is characteristic of thermophiles and a few other bacteria (B. subtilis and Rickettsia prowazekii), but is absent in E. coli, Synechocystis, and Mycobacterium. It has been shown that this system is necessary for cells to grow under alkaline conditions (95).
The D. radiodurans genome appears to encode functional pathways for glycolysis, gluconeogenesis, the pentose phosphate shunt, and the tricarboxylic acid (TCA) cycle. A few genes are missing, but these may not be essential since they are also absent in some bacteria in which these pathways are functional (Table ). The D. radiodurans Entner-Doudoroff pathway may be disrupted since a key enzyme, 2-keto-3-deoxy-6-phosphogluconate aldolase (an ortholog of E. coli Eda), is missing. However, this enzyme is also absent in archaea, where the Entner-Doudoroff pathway appears to be functional, and therefore the enzyme could be displaced by a nonorthologous aldolase in Deinococcus. The glyoxylate bypass that has only been described for E. coli and M. tuberculosis is present and complete in Deinococcus. It remains unclear, however, why some intermediates of the TCA cycle cannot support the growth of D. radiodurans ). As expected of a heterotroph, Deinococcus encodes several enzymes for complex carbohydrate metabolism; for some of these, e.g., glycogen-debranching enzymes (DR0405 and DR0191), phylogenetic analysis suggests that horizontal transfer from eukaryotes has occurred (data not shown). Other enzymes for sugar conversion, as well as most of the known sugar transport systems, are encoded in D. radiodurans, and this is consistent with the observation that a variety of different sugars can be used by this bacterium as carbon and energy sources (212).
Amino acid and nucleotide metabolism.
D. radiodurans is unable to use ammonia as a nitrogen source despite the presence of apparently functional genes for glutamate ammonia ligase and carbamoyl-phosphate synthase, which are key enzymes for ammonia utilization. While there is currently no explanation for this, it has been shown that D. radiodurans can use amino acids effectively as a nitrogen source and that sulfur-containing amino acids appear to be the most readily utilized form of nitrogen. Notably, D. radiodurans lacks the standard pathways for cysteine and methionine biosynthesis yet is able to produce these amino acids using unidentified biosynthetic pathways when provided with other amino acids (212). The absence of all key enzymes for lysine biosynthesis is another puzzling feature of Deinococcus metabolism since it does not require lysine for growth (212). All of the other standard amino acid pathways appear to be functional. Although a few genes seem to be missing from these pathways, they are also absent in some of the other free-living bacteria, where they probably have been displaced by paralogous or nonhomologous enzymes. Some of the genes for enzymes of arginine metabolism are likely to have been acquired by the common ancestor of the Thermus-Deinococcus group from archaea (see Tables and ).
Deinococcus-Thermus shared features
Most of the known genes for nucleotide metabolism are present in D. radiodurans. The most conspicuous gap is the absence of purine nucleoside phosphorylase, a key enzyme of purine salvage, which has been found in all free-living organisms investigated. Another noteworthy absence is that of two related enzymes of pyrimidine salvage, cytidine deaminase and dUTPase (important in preventing DNA damage), which are present in most bacteria. As may be the case for absent amino acid biosynthetic genes, there might also be unidentified enzymes that compensate for these pyrimidine salvage activities.
Metabolism of lipids and cell wall components.
D. radiodurans lacks only one gene from the standard bacterial set of genes coding for enzymes of lipid metabolism, namely, phosphatidylglycerophosphate synthase, which is involved in the biosynthesis of acidic phospholipids. With the exception of the archaeon Methanococcus jannaschii, phosphatidylglycerophosphate synthase has been detected in all organisms with completely sequenced genomes. Its absence in Deinococcus, therefore, is unexpected. Deinococcus encodes multiple copies of several fatty acid biosynthesis genes, of which some could have been transferred horizontally into Deinococcus from distant taxa (Table ). Consistent with the unusual structure of the peptidoglycan layer in Deinococcus (see above), we identified all essential genes for ornithine metabolism but did not detect several key enzymes for diaminopimelic acid biosynthesis.
Metabolism of coenzymes.
Our experimental data show that Deinococcus is capable of de novo biosynthesis of all principal coenzyme components except for nicotinic acid (212). Consistent with this result, we find that genes for several key enzymes of NAD biosynthesis are missing in the genome, which is unusual since this pathway is present in most free-living organisms. Several other conventional pathways for coenzyme biosynthesis are also not complete (Table ), but, given the ability of Deinococcus to grow in the absence of these coenzymes, it probably encodes functional analogs of the missing enzymes.
The translation apparatus is arguably the most highly conserved and uniform of cellular systems, and D. radiodurans is no exception. It contains a typical bacterial complement of translation machinery components. This general uniformity notwithstanding, there are several unique features in the translation apparatus of Deinococcus that have been revealed both experimentally and by genome analysis. In particular, Deinococcus has a unique repertoire of genes and reactions for the formation of glutaminyl-tRNA and asparaginyl-tRNA. Generally, there are two pathways for the activation of glutamine and asparagine: (i) direct charging of tRNAGln and tRNAAsn by glutaminyl- and asparaginyl-tRNA synthetase (Gln-RS and Asn-RS), respectively, and (ii) transamidation of Glu-tRNAGln and Asp-tRNAAsn by the respective amidotransferases (AdT), Glu-AdT and Asp-AdT (101). Usually, the two pathways and the corresponding genes are not present in the same organism. The transamidation pathway for glutamine is predominant in bacteria and archaea, whereas glutaminyl-tRNA synthetase is typical of eukaryotes and gamma proteobacteria (101). In the case of asparagine, archaea primarily use the transamidation pathway, eukaryotes use the direct pathway, and bacteria have a patchy distribution of both systems. Glu-AdT has been studied in detail; it consists of three subunits encoded by the gatABC ). The nature of Asp-AdT is less clear; it has been suggested that it shares A and C subunits with Glu-AdT whereas the B subunit (the likely determinant of tRNA binding) is unique. D. radiodurans encodes Asn-RS, Gln-RS, and the GatABC proteins (45). A recent genome survey has shown that the two systems also coexist in several members of the proteobacteria (85), but Deinococcus is the only nonproteobacterial species with this combination of asparagine and glutamine activation systems. Furthermore, in addition to the intact GatB, Deinococcus encodes a C-terminal domain of this protein that is fused to Gln-RS (Fig. ). The GatABC complex of D. radiodurans is capable of catalyzing the formation of both Gln-tRNAGln and Asn-tRNAAsn, but in vivo apparently only Asn-tRNAAsn is formed, since the discriminating Glu-RS of Deinococcus does not produce the mischarged Glu-tRNAGln ). In contrast, Deinococcus encodes two copies of Asp-RS, a typical bacterial discriminating copy and a nondiscriminating copy that probably was acquired from the archaea by horizontal gene transfer (45) (see below). The nondiscriminating Asp-RS produces Asp-tRNAAsn, which serves as the substrate for the GatABC enzyme. It has been suggested that the main role of the Asn-tRNAAsn formation in Deinococcus is the synthesis of asparagine, rather than its incorporation into proteins, since Deinococcus does not encode orthologs of known asparagine synthetases (45). Given that GatB is thought to be the tRNA-binding component of Glu-AdT and Asp-AdT, the C-terminal GatB-related domain in Deinococcus Gln-RS could enhance the specificity of this enzyme for tRNAGln. This domain is missing in other Gln-RSs, but the respective organisms do not encode GatB, which in Deinococcus could compete with Gln-RS for binding tRNAGln.
Examples of unique domain architectures of Deinococcus proteins.
The repertoire of aminoacyl-tRNA synthetases (aminoacyl-RSs) in Deinococcus also shows several other peculiarities. In addition to the corresponding functional enzymes, Deinococcus encodes truncated and apparently inactive forms of Glu-RS and Ala-RS, as well as apparently active paralogs of Trp-RS and His-RS. Possible horizontal transfer of these additional enzymes as well as other aminoacyl-RSs from archaea and thermophilic bacteria could be readily examined once more of these organisms are sequenced.
Replication, Repair, and Recombination
D. radiodurans contains all the typical bacterial genes that comprise the basal DNA replication machinery (Table ). The number of paralogs and the domain organization of the DNA polymerase III α-subunit are variable in the major bacterial divisions in terms of the presence of an active or inactivated PHP domain, which is predicted to possess phosphatase activity, and the proofreading 3′-5′ exonuclease domain. D. radiodurans encodes a single α-subunit that is most similar to proteobacterial polymerases and does not contain the 3′-5′ exonuclease, which is encoded by a separate gene orthologous to E. coli dnaQ. Unlike the proteobacterial orthologs, however, the Deinococcus polymerase contains an apparently active PHP domain. This appears to represent the ancestral bacterial state of the replicative DNA polymerase, which is also seen in bacteria like Synechocystis. In addition to typical proteins involved in replication, Deinococcus encodes DNA polymerase X, which is similar to the eukaryotic DNA polymerase beta (references 27 and references therein), and is relatively uncommon in prokaryotes. Deinococcus polymerase X contains an N-terminal nucleotidyltransferase domain and a C-terminal PHP hydrolase domain, the same domain architecture that is seen in homologs from B. subtilis and Methanobacterium thermoautotrophicum; this conservation of domain organization suggests horizontal transfer of the polymerase X gene (13). Notably, along with a few other bacteria, such as Synechocystis and Aquifex, Deinococcus encodes three small nucleotidyltransferases (DR1806, DR0679, and DR0248), which are expanded in archaea (13). These “minimal” nucleotidyltransferases are typically accompanied by a small protein that is fused to the nucleotidyltransferase in the DR0248 protein; the function of this protein has not been characterized directly but is likely to be coupled to that of the nucleotidyltransferases.
Genes coding for replication, repair, and recombination functions in D. radiodurans
The repertoire of DNA-associated proteins in Deinococcus is similar to that in other bacteria, but some unique features were noticed. Like other bacteria, Deinococcus encodes an ortholog of the chromosomal DNA-binding protein HU, which is believed to play a central role in DNA packaging and also to act as a cofactor in recombination (reference 184 and references therein). Interestingly, the sequenced genome of the Deinococcus R1 strain contains three adjacent open reading frames (ORFs) encoding fragments of the single-stranded DNA-binding protein (SSB) but lacks a complete gene for SSB; all other bacterial genomes sequenced so far encode an intact SSB. Because of the 10-fold coverage during the TIGR sequencing project (218), two sequencing errors in this short gene would seem unlikely. Two explanations arise: (i) Deinococcus could encode an as yet unrecognized SSB analog (or an extremely diverged homolog), making the SSB gene expendable; or (ii) a tripartite SSB gene could be expressed by a translational readthrough mechanism or even a unique RNA-editing mechanism.
Bacterial DNA repair includes several partially redundant pathways and generally shows considerable flexibility (20). We investigated the predicted repair system components of D. radiodurans in detail, to detect any possible correlation with its exceptional radioresistant and desiccation-resistant phenotype. Generally, it appears that Deinococcus possesses a typical bacterial system for DNA repair and that, commensurate with the genome size, its repair pathways even appear to be less complex and diverse than those of bacteria with larger genomes, such as E. coli and B. subtilis. At the same time, there are several interesting and unusual aspects of the predicted layout of the repair systems in Deinococcus that may be linked to its phenotype (Table ).
The nucleotide excision repair system that consists of the UvrABC excinuclease and the UvrD and Mfd (transcription-repair coupling factor) helicases is fully represented in D. radiodurans. Also present are the main components of the base excision repair system including several nucleotide glycosylases and endonucleases, namely, MutM (formamidopyrimidine and 8-oxoguanine DNA glycosylase); MutY (8-oxoguanine DNA glycosylase and apurinic DNA endonuclease-lyase); two paralogous uracil DNA glycosylases (Ung homologs); an additional, recently identified enzyme that has the same activity but is unrelated to Ung (DR1751) (174); endonucleases III (Nth) and V (YjaF); and exonuclease III (XthA). Deinococcus lacks two key enzymes involved in the repair of UV-damaged DNA in other organisms, namely, endonuclease IV (AP-endonuclease) and photolyase. Instead, it encodes a typical bacterial UV endonuclease III (thymine glycol-DNA glycosylase) and, more unexpectedly, a TIM-barrel fold nuclease characteristic of eukaryotes and most closely related to the UV endonuclease of Neurospora ). Eukaryotic-type topoisomerase IB is a truly unexpected protein to be identified in the Deinococcus genome and also could play a role in UV resistance (see “Horizontal gene transfer” below).
The repertoire of recombinational repair genes in Deinococcus includes orthologs of most of the E. coli genes involved in this process (Table ), but the RecBCD recombinase is missing. While this complex is not universal in bacteria, it is a major component of recombination systems in most free-living species. In Deinococcus, where recombination is thought to be an important contributor to damage-resistance, the absence of this ATP-dependent exonuclease is unexpected. Deinococcus does encode an apparent ortholog of one of the helicase-related subunits of this complex, RecD, but not the other subunits. The RecD protein in Deinococcus is unusual in that it contains an N-terminal region of about 200 amino acid residues that consist of three tandem predicted HhH DNA-binding domains; this unusual domain organization of the RecD protein is shared with B. subtilis and Chlamydia. Such dissociation of RecD from the RecB and RecC subunits is not unique to Deinococcus; “solo” RecD-related proteins are also present in M. jannaschii and in yeast. The function(s) of RecD, once outside the recombinase complex, is unknown.
Another component of the recombinational repair system in Deinococcus that has an unusual domain architecture is the RecQ helicase. It contains three tandem copies of the C-terminal helicase-RNase D (HRD) domain, instead of the single copy present in all other bacteria except Neisseria, which similarly possesses three copies (141) (also see below). RecQ sequences from Deinococcus and Neisseria are more similar to each other than to any other homologs, which, together with the distinctive triplication of the HRD domain, indicates that the recQ gene has been exchanged between bacteria from these two distant lineages. In addition, Deinococcus encodes a protein (DR2444) that contains an HRD domain and a domain homologous to cystathionine gamma-lyase; this is the first example of an HRD domain that is not associated with either a helicase or a nuclease (although it is possible that the domain organization of this protein is an artifact caused by a frameshift). This propagation of the HRD domain in Deinococcus could contribute to the repair phenotype given the interactions of RecQ with RecA in recombination (88).
The methylation-dependent mismatch repair system of D. radiodurans includes the MutS and MutL ATPases and exonuclease VII (XseA). Orthologs of the site-specific methylases Dcm and Dam, which are associated with mismatch repair, are not readily detectable. It appears likely, however, that other distantly related DNA methylases predicted in D. radiodurans could perform similar functions.
Like other bacteria with large genomes, D. radiodurans encodes the LexA repressor-autoprotease (DRA0344), which in E. coli and B. subtilis controls the expression of the SOS regulon. In addition, unlike any of the other bacterial genomes studied, D. radiodurans encodes a second, diverged copy of LexA (DRA0074), which retains the same arrangement of the helix-turn-helix (HTH) DNA-binding domain and the autoprotease domain. Attempts to identify LexA-binding sites and the composition of the putative SOS regulon in D. radiodurans have been unsuccessful (M. S. Gelfand, personal communication). This suggests that D. radiodurans does not possess a functional SOS response system, which is in agreement with the results of previous experimental studies (142). Furthermore, Deinococcus does not encode proteins of the DinP/UmuC family, nonprocessive DNA polymerases that play a critical role in translesion DNA synthesis and associated error-prone repair such as SOS repair in E. coli.
In addition to orthologs of well-characterized repair proteins discussed in this section, Deinococcus encodes several unusual proteins and expanded protein families that are less confidently associated with repair but might contribute to the unusual effectiveness of the repair and recombination systems in this bacterium; these proteins are discussed below in the section on the unique features of the Deinococcus proteome.
Stress Response and Signal Transduction Systems
D. radiodurans encodes a broad spectrum of proteins that have been associated with various forms of stress response in other bacteria as well as several proteins that appear to be unique and could contribute to more specific forms of the stress response (Table ). Orthologs of almost all known genes involved in different stress responses in other bacteria (109) are present in Deinococcus. The few stress response proteins that are missing are either specific to the adaptation of a particular organism to its environment or, when of more general significance, likely to be replaced by nonorthologous proteins with similar functions. For example, instead of using the OtsA and OtsB proteins for the synthesis of the osmoprotection disaccharide trehalose, Deinococcus probably uses an alternative pathway via trehalose synthase (DR0933), which has been recently characterized in Thermus ). Trehalose plays a major role in the desiccation resistance of E. coli ) and is also likely to be important in Deinococcus. Deinococcus has two additional genes for trehalose metabolism: maltooligosyl trehalose synthase (DR0463), which provides yet another route of trehalose formation, and trehalohydrolase (DR0464). These genes apparently form a mobile operon and probably have been acquired by Deinococcus through horizontal transfer, since their closest homologs are found in Rhizobium, where they appear to have the same operon organization (130).
Stress response-related genes in D. radiodurans
Among the proteins associated with oxidative stress response, Deinococcus encodes three catalases (DR1998, DRA0259, and DRA0146), two of which are highly similar to one another and to catalases from other bacteria whereas the third is only distantly related to other catalases. The gene for this unusual predicted catalase (DRA0146) is closely linked to and probably forms an operon with a gene for a peroxidase (DRA0145). DRA0146 is most similar to its ortholog from Rhizobium, and these two proteins are, in turn, more closely related to eukaryotic catalases from plants than to bacterial catalases. This suggests that Deinococcus acquired the gene for this catalase from a nitrogen-fixing bacterium, which, in turn, had hijacked it from a plant. In contrast, DRA0145 is distinctly closer to certain peroxidases from fungi, such as Galactomyces geotrichum, than to bacterial forms from Neisseria, E. coli, and actinomycetes. Thus, the entire operon probably has been acquired horizontally. A broad spectrum of other genes that may be involved in the stress response include DRA0149 (agmatinase), DR1353 (an acid-inducible apolipoprotein amino-acetyltransferase), and DR2299, DR1605, and DR2245 (genes of the two-component response and cyclic diguanylate signaling system), which again are very similar to homologs from the family Rhizobiaceae, suggesting significant horizontal gene transfer between these distant bacteria.
In addition to the well-characterized components of stress response systems, Deinococcus encodes several proteins and entire protein families whose specific roles are unknown but are likely to be important for the multiple stress resistance phenotypes of the bacterium. An example of a poorly studied but potentially important system is the “addiction module” response (2), which is encoded by two genes, mazE and mazF (DR0416 and DR0417, respectively). MazF is a stable protein that is toxic to bacteria, whereas MazE protects cells from the toxic effect of MazF and is degraded by the ClpP serine protease. Expression of these two genes is regulated by ppGpp, which is produced by the RelA enzyme (or the bifunctional enzyme SpoT) in response to amino acid starvation. On the basis of these studies, Aizenman et al. (2) have proposed a model of programmed bacterial cell death dependent on the MazEF proteins. Currently, Deinococcus is the only bacterium other than E. coli, the model system in which the role of these proteins was elucidated, that has both genes and retains their operon organization. Another example of poorly characterized genes that are likely to be involved in stress response is a pair of proteins (DR2056 and DR1940) that are homologous to the E. coli heat shock protein HslJ (42). One of these proteins, DR1940, contains three copies of the HslJ domain, a feature that has not yet been seen in this protein family. All the HslJ domains contain two conserved cysteines that could function as a redox pair, with the protein itself being a disulfide bond chaperone. The only prominent chaperone that is missing without an obvious replacement is HSP90, but this gene is also absent in archaea and bacterial thermophiles and therefore appears to be nonessential.
The signal transduction system of D. radiodurans has chimeric features of prokaryotic and eukaryotic systems. This form of chimerism in the signaling system is becoming increasingly evident in several bacterial lineages such as actinomycetes, myxobacteria, and spore-forming firmicutes that undergo cellular differentiation. The typically bacterial components of the signaling system include the two-component systems with the histidine kinase and receiver domains (159) and the cyclic diguanylate signaling system with the GGDEF, EAL, and HD-GYP domains, which appear to function as cyclases and phosphodiesterases (75). In addition, these signaling domains are typically combined with small molecule and protein-binding domains, such as PAS and GAF (17), and the conformation-signaling HAMP domain (16). The two-component phosphorelay system is well developed in Deinococcus, which encodes 23 histidine kinase domains and 29 receiver domains that form several combinations with the GAF and PAS domains. This system is expected to play a major role in sensing redox, light, and other environmental stimuli. Consistent with this, DRA0050, which is orthologous to the cyanobacterial and plant phytochromes, has been shown to be a photoreceptor involved in the regulation of pigment biosynthesis (55), which is likely to affect resistance to DNA-damaging agents (35). Genes encoding two proteins that consist of a sensory transduction histidine kinase and a receiver domain (DRB0028 and DRB0029) appear to be coregulated with a σB operon (DRB0024 to DRB0027). This operon encodes the antisigma factor-regulatory system and is known to be involved in stress response in other bacteria (92). As a whole, this array of six genes appears to comprise a stress response module unique to Deinococcus.
Deinococcus encodes 16 GGDEF domain-containing proteins, which suggests a major role for this uniquely bacterial module that is predicted to function as a cyclase in diguanylate signaling. The two predicted distinct phosphodiesterases of this system, the HD-GYP and EAL domains (six and four copies, respectively, in Deinococcus), complement each other in terms of their copy numbers, as has been observed for other bacterial genomes. These domains tend to combine with the stimulus-sensing PAS and GAF domains. One such interesting architecture is the combination of the GAF domain and the HD-GYP domain in two Deinococcus proteins (Fig. ). The representation of this signaling system in Deinococcus is comparable to that in other bacteria with moderate-sized to large genomes.
Distinct domain architectures of selected proteins implicated in signal transduction in Deinococcus.
While Deinococcus lacks flagella and is unlikely to be capable of chemotactic motility, it possesses certain remnants of the chemotactic signaling system that are likely to signal through alternative pathways. In particular, there are three methyl-accepting chemotactic receptor proteins (DRA0352, DRA0353, and DRA0354), each containing two HAMP domains, but there is no methyltransferase of the chemotactic signaling pathway. These three proteins are encoded by genes located in the vicinity of genes for a CheA-like histidine kinase and a CheY-like receiver domain, which suggests that the methyl-accepting receptor forms a single functional unit with this two-component system protein. Given the apparent absence of chemotaxis, the methyl-accepting receptors could form a scaffold for binding of the CheA kinase, which might signal the availability of amino acids in the environment.
The tetratricopeptide repeats (TPR) seem to play a special role in Deinococcus
signaling. In three distinct proteins, these repeats are combined with typically bacterial signaling modules (Fig. ). The TPR modules are likely to mediate protein-protein interactions within molecular complexes involving these proteins, as documented in eukaryotic systems (113
). WD40 proteins, which often serve as interaction partners to TPR in eukaryotes (210
), are also expanded in Deinococcus
and could cooperate with the TPR-containing proteins. Of particular interest is another group of at least four β-propeller proteins that appear to be closer to the YWTD class of propellers than to WD40s (DR0960, DR1725, DR2062, and DR2484). In actinomycetes, these propeller domains are fused to protein kinases and are likely to perform specific protein-protein interaction functions in signaling (163).
The prominence of the “eukaryotic” component of the signal transduction systems in Deinococcus
is underscored by the fact that it encodes 11 Pkn2-type kinases and 1 kinase of the RIO1 family (DR2209), which is typical of archaea and eukaryotes (121
) and was detected in bacteria for the first time. This number is greater than in most other prokaryotes (121
), suggesting that protein-serine/threonine phosphorylation-dependent regulatory pathways play a major role in Deinococcus
. Consistent with this, Deinococcus
also encodes PP2C phosphatases and an FHA domain that typically function in conjunction with the serine/threonine kinases.
Several protein families that have been implicated in stress response and signal transduction in other organisms have undergone specific expansion in Deinococcus; these are discussed in some detail below.
Distinctive Features of Predicted Operon Organization and Transcription Regulation
Generally, the genome organization of D. radiodurans
is similar to that of other bacteria (218
). Many functionally related genes are organized into clusters that are likely to comprise operons, including such common ones as ribosomal protein genes, ATP synthase, NADH dehydrogenase, and various ATP-binding cassette (ABC)-type transport systems. Beyond these generic operons, however, several unusual gene clusters were detected, and some of these are likely to be related to the unique features of Deinococcus.
Some unusual predicted operons in D. radiodurans
The first group of such unique gene arrays includes paralogous genes that encode protein families overrepresented in Deinococcus, such as amino-acetyltransferases, Nudix hydrolases, and genes of the TerE and DinB/YfiT families (see below). Some of these clusters appear to have evolved by tandem duplication within the Deinococcus lineage, e.g., an acetyltransferase cluster (DR2254 and DR2255) and a Nudix cluster (DR0783 and DR0784). Other clusters of paralogs clearly resulted from a single horizontal transfer event, e.g., the group of tellurium resistance genes (DR2220 to DR2226) that are related to the corresponding gene cluster on the broad-host-range plasmid R478. Finally, some clusters that consist of related genes with apparent phylogenetic affinities to different bacterial lineages (e.g., an acetyltransferase cluster [DR0675 to DR0677]) seem to have originated within the Deinococcus lineage through gene translocation. The second group of unusual predicted operons includes rare gene clusters that probably were acquired by horizontal transfer. Some of these operons could contribute to damage resistance, e.g., DNA repair-related functions (deoxypurine kinase operon [DR0298 and DR0299], eukaryotic-type uracil-DNA-glycosylase and topoisomerase IB [DR0689 and DR0690]), DNA transformation-related functions (competence genes [DR1854 and DR1855], restriction-modification system [DRB0143 and DRB0144]), stress response (DR0389 and DR0390; DR1160 and DR1161), and pigment biosynthesis (DR0861 and DR0862).
Two operons (DR0853 to DR0854 and DR2180 to DR2181) each consist of a gene for a small GTPase of the Ras/Rab family and a gene coding for a small protein of an uncharacterized family that is widespread in bacteria and archaea (L. Aravind and E. V. Koonin, unpublished data). The orthologous GTPase in Myxococcus
is important for gliding motility (90
), suggesting a role for these proteins in signaling. Expansion of the uncharacterized protein family encoded by the genes adjacent to the GTPase is seen in Streptomyces
and appears to result from relatively recent duplications (DR0616, DR0995, and DR1612), with three of these genes forming a cluster in the chromosome (DR0993 to DR0995). Juxtaposition of these genes with genes for Ras/Rab-GTPases is frequently observed in other genomes, including Myxococcus
and archaeal and bacterial thermophiles, suggesting that they form a mobile operon, with the encoded proteins being functionally coupled.
Another predicted operon (DR0332 to DR0335) that could have been horizontally transferred from cyanobacteria encodes components of a protein kinase-dependent regulatory pathway. These include two active Pkn2-type serine/threonine protein kinases with Zn ribbons, a PP2C-type phosphatase with an N-terminally disrupted Pkn2 kinase domain, and a protein that contains a phosphoserine-binding FHA domain combined with a Zn ribbon domain orthologous to proteins from cyanobacteria (FraH) and actinomycetes (121
). The phosphorylation system encoded by this operon may play a role in cellular differentiation, with the Zn-ribbon-FHA protein functioning as the downstream effector that regulates transcription.
The general picture of transcription regulation in Deinococcus
emerging from genome analysis is similar to that seen in other bacteria. Among Deinococcus
gene products, we detected 104 HTH domain-containing proteins that are predicted to function as transcriptional regulators. This number is close to those detected in other free-living bacteria with similar genome sizes (14
); the repertoire of HTH-containing proteins identified in Deinococcus
covers most of the diversity of prokaryotic transcriptional regulators. Deinococcus
encodes seven members of the MerR/SoxR family of regulators (a greater number than in other characterized bacteria except B. subtilis
), which could participate in the regulation of various stress response pathways (24
). Another family of predicted HTH regulators of unknown specificity that is expanded in Deinococcus
consists of eight paralogs (e.g., DR1954); such an expansion is unprecedented in other bacteria and suggests a unique role in the regulation of a distinct set of genes.
Expansion of Specific Protein Families
Expansion of specific protein families has been observed for several complete genomes (43
). Sometimes there is a clear relationship between the expansion of a particular protein family and the adaptation of the respective organism to its environment. Examples of such adaptive expansions include ferredoxins in autotrophic archaea (126
), several families of enzymes involved in lipid degradation in M. tuberculosis
), and c-type cytochromes in the metal-reducing bacteria Shewanella
In the D. radiodurans
genome, we detected several expansions, some of which appear to be related to stress response and damage control (Fig. ). In particular, several different families of hydrolases are overrepresented compared to other sequenced genomes. These include MutT-like pyrophosphatases (Nudix), calcineurin-like phosphoesterases, lipase/epoxidase-like (α/β) hydrolases, subtilisin-like proteases, and sugar deacetylases. In addition to such specifically expanded families, several other families of hydrolases are present in Deinococcus
in elevated numbers although they are also common in other bacteria and are not shown here. Some of these hydrolases are likely to be involved in the decomposition of damage products (“cell cleaning”) under stress conditions. Independent expansions of certain families, such as α/β hydrolases in Deinococcus
and subtilisin-like proteases in Deinococcus
, are noteworthy and probably correlate with the adaptation of these organisms to the facultative or obligatory heterotrophic life-style (43).
Specific protein family expansion in Deinococcus.
Expansion of the Nudix hydrolase protein superfamily is one of the most prominent features of the Deinococcus
genome. The MutT protein, the prototype for this superfamily, has been identified as the central component of an antimutagenic system responsible for preventing incorporation of 8-oxo-dGTP into DNA (136
). Subsequently, it has been shown that different MutT-like enzymes use a variety of substrates, and the Nudix pyrophosphohydrolases have been tentatively defined as a superfamily of “house-cleaning” enzymes that destroy potentially deleterious compounds (28
). A detailed analysis of Nudix proteins in Deinococcus
revealed five distinct multidomain proteins, in which the MutT domain is combined with other domains (Fig. ). Orthologous proteins for three of them also exist in other bacteria. In particular, the family typified by E. coli
YjaD contains a Zn ribbon module, which is probably involved in nucleic acid binding. Another Deinococcus
protein contains an apparently inactivated (with the catalytic motif REXXEE missing) MutT domain combined with a TagD-like nucleotidyltransferase domain and is likely to perform a regulatory function. A second TagD-like nucleotidyltransferase from Deinococcus
(DRA0273) is very similar, but the MutT domain has apparently eroded beyond recognition. Orthologs of a third Nudix protein, which contains an uncharacterized C-terminal domain, are present in Streptomyces, Mycobacterium
, and Synechocystis
. Again, in most of them, the Nudix pyrophosphohydrolase appears to be inactivated, suggesting a regulatory function.
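The presence or absence of the catalytic motif mentioned above can be screened for with a simple pattern search. The following is a minimal sketch, assuming that the motif is read as R-E-x-x-E-E with x standing for any residue; the function name and the two toy fragments are hypothetical illustrations and are not sequences from the Deinococcus genome. A missing match is only a first hint of inactivation, since it says nothing about the rest of the domain.

```python
import re

# Core catalytic motif named in the text: R-E-x-x-E-E (x = any residue).
# Reading "X" as "any single residue" is an assumption of this sketch.
NUDIX_CORE = re.compile(r"RE..EE")


def has_catalytic_motif(seq: str) -> bool:
    """Return True if the sequence contains at least one REXXEE match."""
    return NUDIX_CORE.search(seq.upper()) is not None


# Hypothetical toy fragments, NOT real Deinococcus sequences.
toy_domains = {
    "toy_active": "GGKVEPGESLEEAARRELREETGL",
    "toy_inactive": "GGKVEPGESLEEAARRELQAATGL",
}

for name, seq in toy_domains.items():
    status = "motif present" if has_catalytic_motif(seq) else "motif missing"
    print(f"{name}: {status}")
```

In practice, such a check would be applied to the aligned MutT domain region rather than to the whole protein, so that chance matches elsewhere in the sequence are not counted.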
Distinct domain architectures of proteins containing the MutT-like domain. aa, amino acids; SAM, S-adenosylmethionine.
Two closely related Deinococcus proteins contain a duplication of the MutT domain that has not yet been detected in any other organism. Three more Nudix proteins are specifically related to the proteins containing this duplication, and the genes for two of these are adjacent on the chromosome (DR0783 and DR0784). These seven related MutT domains appear to form a Deinococcus-specific family of Nudix hydrolases. Another Nudix protein consists of three domains, namely, S-adenosylmethionine (SAM)-dependent methylase, MutT, and cytosine deaminase (Fig. ). This domain combination is unique to Deinococcus and suggests that the protein is involved in an as yet uncharacterized repair pathway.
Altogether, Deinococcus encodes 23 Nudix superfamily proteins that contain 25 individual MutT domains. Some of these proteins are likely to be repair enzymes with known activities, including the MutT ortholog (DR0261), while others will have novel functions, as suggested by the domain combinations discussed above. Other functions are likely to include utilization of damage products formed under various stress conditions. It is unlikely that a distant ancestor of the Deinococcus lineage encoded all these MutT-containing proteins. Rather, it appears that the heterogeneous collection of these proteins encoded by D. radiodurans was assembled via the mixed routes of serial duplication, particularly in the distinct deinococcal family of seven Nudix domains, and horizontal gene transfer.
Amino group acetyltransferases comprise another family that appears to have undergone independent expansion in Deinococcus
and in Bacillus
. Acetyltransferases of this type participate in various metabolic pathways, including lipid biosynthesis, and in regulatory systems. Except for B. subtilis
, other bacteria encode fewer than half as many of these enzymes as D. radiodurans
. Like the acetylases in other bacteria, these enzymes are likely to participate in detoxification of antibiotics and possibly of toxic products that arise upon DNA damage, as well as in regulatory protein acetylation. A Deinococcus
-specific family of acetyltransferases, which consists of at least 11 proteins, is most similar to acetyltransferases involved in peptide antibiotic resistance, such as streptothricin acetyltransferase of Streptomyces
). These acetyltransferases might aid the survival of Deinococcus
in the presence of peptide antibiotics secreted by other bacteria, with which it has to compete for nitrogen and carbon sources as a part of its heterotrophic life-style.
Enzymes of the α/β hydrolase superfamily are mainly neutral lipases or acetyl esterases, but some of them have unusual substrate specificity, e.g., heroin esterase from Rhodococcus
) and antibiotic bialaphos acetyl esterase from Streptomyces
); other proteins of this superfamily possess unexpected activities, e.g., metal ion-free oxidoreductase from Streptomyces
). The expanded families of α/β hydrolases in Deinococcus
could be exploited for xenobiotic metabolism and/or the biogenesis of the complex cell envelopes (see above).
In several cases, expansion of specific subfamilies within common protein families appears to be important. Deinococcus
encodes three paralogous proteins (DR0202, DR0494, and DR2273) related to the FlaR protein from gram-positive bacteria. One of these proteins has been shown to affect DNA topology and is osmoregulated when expressed in E. coli
). It also influences the expression of supercoiling-sensitive promoters and is considered to be a chromatin-associated protein (173
). Topological changes of DNA could play a role in DNA repair of Deinococcus
, and the FlaR homologs might be involved in these processes. The FlaR subfamily belongs to the P-loop-containing kinase superfamily that includes nucleotide, gluconate, and shikimate kinases (224
). Deinococcus encodes three paralogous proteins (DR0609, DR2467, and DR2139) that belong to another uncharacterized subfamily of these kinases which is also represented in several other bacteria.
Another interesting case is the LigT protein family, which is found in several bacteria, archaea, and eukaryotes and includes RNA ligases and predicted 2′,5′-cyclic nucleotide phosphodiesterases. In addition to the LigT ortholog (DR2339), Deinococcus encodes two predicted phosphodiesterases of this family (DR1000 and DR1814) that may participate in RNA metabolism or signaling.
Expansion of several other protein families is consistent with the unusual stress resistance capabilities of D. radiodurans
. For example, Deinococcus
encodes seven small nuclease domains related to the McrA endonuclease of E. coli
). The McrA-like nuclease domain is part of three multidomain protein architectures that seem to be unique to Deinococcus
(see below). This previously unreported propagation of McrA-like nucleases could make a contribution to the repair potential of Deinococcus
. In evolutionary terms, the McrA domain, like the MutT domain, apparently has been expanded in Deinococcus
through a recent duplication (DR1312 and DR2483 are 50% identical), as well as through acquisition of genes by horizontal gene transfer.
Expansion of proteins of the TerDEXZ/CABP family in Deinococcus
is interesting because some of these proteins could confer resistance to a variety of DNA-damaging agents, including heavy-metal cations, methyl methanesulfonate, mitomycin C and UV (21
), and other forms of stress (11
). Two members of this family, CABP1 and CABP2, are expressed during starvation in Dictyostelium
and form a heterodimer that binds cyclic AMP (cAMP) (78
), suggesting that other members of the family also bind various small-molecule ligands.
Deinococcus encodes the largest number of pathogenesis-related 1 (PR1) family proteins (five members) among bacteria. These secreted proteins are widespread in eukaryotes but sporadic in bacteria (195
); unlike the eukaryotic members of this family, the bacterial PR1-related proteins lack the disulfide bond-forming cysteines (68
). Since they are predicted to be secreted, the bacterial PR1 family proteins might play a role in inhibiting extracellular enzymes or in interacting with other cells, as suggested by the known activities of their eukaryotic homologs (106).
The second largest protein expansion in Deinococcus
is the family of uncharacterized small proteins whose prototype is B. subtilis
DinB, a DNA damage-inducible gene product (39
). Among bacteria, Deinococcus
encodes the greatest number of these proteins, although comparable independent expansions are seen in B. subtilis
and the actinomycetes (Fig. ). Examination of the multiple alignment of this family (Fig. ) reveals three conserved histidines that could form a catalytic triad of a novel metal-dependent enzyme, perhaps a hydrolase. The prediction of enzymatic activity of these proteins raises the possibility that they could be nucleases directly involved in DNA degradation, which begins in Deinococcus
immediately after DNA damage (23
). This protein family may be particularly amenable to experimental studies, given its expansion in B. subtilis
, a model for many DNA repair studies.
FIG. 5 Multiple alignment of the conserved core of the DinB/YfiT protein family. The alignment was generated by parsing the PSI-BLAST HSPs and realigning them with the ALITRE program (181). The numbers between aligned blocks indicate the lengths of variable ...
Several families of Deinococcus
proteins are highly diverged and, in the initial analysis, appeared to have no homologs in other species. Database searches with individual sequences of these proteins failed to show statistically significant similarity to any proteins other than their paralogs from Deinococcus
. Only profiles that included information on all of the paralogs (see the description of methods above) allowed the identification of homologs from other organisms. An example of such a family is a distinct group of six HTH-containing DNA-binding proteins predicted to function as transcriptional regulators. This family is of particular interest because at least one of its members (DR0171, or IrrI [211
]) appears to be associated with radiation resistance (22).
About 720 proteins encoded in the D. radiodurans genome have no detectable homologs in the current databases. Most of these are predicted membrane or nonglobular proteins that tend to evolve rapidly, and this impedes the detection of sequence similarity. Nevertheless, we identified 26 families with at least two members each that appear to be Deinococcus specific (Table ). Some of these families have conserved sequence and structural features that, in spite of the absence of significant overall similarity to any other proteins, are reminiscent of well-characterized domains. For example, the DR2457-like and DR2241-like families contain pairs of conserved cysteines that resemble Zn ribbons present in different enzymes and nucleic acid-binding proteins. Several other uncharacterized globular proteins in Deinococcus (e.g., DR1088 and DR1486) also contain such cysteine pairs, which suggests metal binding or perhaps nucleic acid binding. An intriguing possibility is that, similarly to better-characterized families, these unique protein families have emerged as a result of adaptation and may be involved in novel mechanisms of DNA repair or stress response specific to Deinococcus.
Unique protein families in D. radiodurans
Proteins with Unusual Domain Architectures
Combining several domains into one protein may give rise to novel protein functions, enhance the cooperation between existing functionally linked protein activities, facilitate regulation, and/or result in modification of substrate specificity (74
). Thus, it seems reasonable to assume that, like expansion of paralogous families, unique domain architectures are lineage-specific adaptations to a particular life-style. The D. radiodurans
genome encodes over 20 multidomain proteins with unusual domain combinations that have not been detected in other species (Fig. ). The two phenomena appear to be linked since there are several examples where the unusual domain architectures are present in members of expanded protein families. The unique combination of the Nudix hydrolase domain with a methyltransferase and a cytosine deaminase has already been described. The McrA-like nuclease domain is a part of three unusual domain arrangements, at least two of which are suggestive of repair functions (Fig. ). A particularly good example of such a functionally interpretable association is the DRA0131 protein, where the endonuclease domain is combined with a RAD25-like helicase, which in eukaryotes is involved in nucleotide excision repair of UV-damaged DNA (84
). In DR1533, an McrA-like endonuclease is linked to a SAD domain, which so far has been detected only in eukaryotic chromatin-associated proteins (71
). A third protein, DRA0057, also contains a domain of the TerDEXZ/CABP family (see above) and is likely to be related to stress response.
One of the Deinococcus
α/β hydrolases is fused to a flavin-containing monooxygenase domain, also a unique domain configuration (Fig. ). The well-established role of flavin-containing monooxygenases in xenobiotic transformation and oxygen reactivity (177
) strengthens the hypothesis that the two domains function together in the metabolism of some environmental compound or secondary metabolite. There are other such fusions that point to potential novel metabolic functions. For example, the DRA0304 protein contains a metallo-β-lactamase-like domain fused to a C-terminal rhodanese-like domain. Proteins that consist of a single rhodanese-like domain are involved in different forms of stress response. For example, E. coli
PspE (phage shock protein E) is induced in response to heat, ethanol, osmotic shock, and phage infection; din1 and sen1 proteins from plants are dark inducible and senescence associated, and the 67B2 protein of Drosophila
is also heat shock inducible (96
). Proteins with the same domain composition but with the order of the domains reversed are encoded in the gas vesicle plasmid of the archaeon Halobacterium halobium
) (GenBank ID number [GI], 2822321 and 2822327), which suggests that the hydrolase domain and rhodanese-like domain cooperate in their chaperone or metabolic functions. Another unique domain fusion (DR1207) with a possible role in the metabolism of some amino group-containing compounds includes a cytosine deaminase domain and a PP-loop ATPase similar to the cell cycle protein MesJ.
DRB0098 contains a phosphatase domain and a polynucleotide kinase domain and is another example of the independent origin of multidomain proteins with analogous domain architectures in distant taxa. Proteins combining these (predicted) enzymatic activities have been found only in Deinococcus
, bacteriophage T4, and some eukaryotes, including humans, Caenorhabditis elegans
, and Schizosaccharomyces pombe
. The phosphatase domain of the phage T4 and eukaryotic proteins belongs to the haloacid dehalogenase superfamily (12
), whereas the one from Deinococcus
belongs to the HD hydrolase superfamily (15
). By analogy to the eukaryotic proteins that function in DNA repair following ionizing radiation and oxidative damage (102
), the deinococcal enzyme may be implicated in a similar process.
Most of the HTH-containing proteins predicted to function as transcriptional regulators in Deinococcus
share the domain architecture with their bacterial and archaeal homologs. However, one of these proteins, DR2199, has an unusual combination of domains (Fig. ). In addition to a C-terminal HTH domain, this protein contains (i) a distinct N-terminal domain homologous to the eukaryotic developmental regulator schlafen
) and to several uncharacterized bacterial and archaeal proteins and (ii) another, uncharacterized domain shared with several bacterial and archaeal proteins. The unusual domain architecture of DR2199 is conserved in two proteins from the archaeon Pyrococcus abyssi
(GI, 5459605 and 5458925).
Horizontal Gene Transfer
Numerous recent observations support the notion that horizontal gene transfer has played a major role in the evolution of bacteria and archaea (18
). Deinococcus is no exception to this trend since it apparently acquired a significant number of genes by horizontal transfer from various sources. The most notable of these genes are listed in Table . Several genes found in Deinococcus
previously have been detected only in eukaryotes and/or archaea. One of these encodes topoisomerase IB, an enzyme that is highly characteristic of eukaryotes and is present in D. radiodurans
in addition to the typical bacterial topoisomerases IA and II. The recent demonstration of a structural and mechanistic relationship between topoisomerase IB and site-specific recombinases (38
) makes a role in recombination plausible for the D. radiodurans
enzyme. A knockout mutant with this gene deleted is substantially more sensitive to UV (254 nm) but not to ionizing radiation than is the wild type (Daly et al., unpublished). Notably, in the Deinococcus
genome, the gene for topoisomerase IB is adjacent to a gene that encodes uracil-DNA glycosylase with a clear eukaryotic phylogenetic affinity. It appears likely that the two genes were simultaneously transferred from a eukaryotic source, possibly a large DNA virus because both enzymes are encoded by poxviruses (182
), although a virus in which these genes were adjacent has not yet been detected.
Examples of horizontally transferred genes in D. radiodurans
Another typical eukaryotic protein encoded by Deinococcus
is a highly conserved ortholog of the eukaryotic RNA-binding protein Ro. This protein has a distinct RNA-binding domain that is shared with the RNA-binding subunits of eukaryotic telomerases such as TP-1 and p80. In eukaryotes, Ro binds specific small RNA molecules (Y RNAs) of ribonucleoprotein particles that are found both in the cytoplasm and in the nucleus (79
) and has been proposed to play a role in the “quality control” of large-scale 5S rRNA biosynthesis (79
). In Deinococcus
, the Ro ortholog is involved in the regulation of UV repair, in which the eukaryote-type topoisomerase IB is believed to participate. It binds to several small RNAs analogous to the Y-RNAs that are encoded by genes upstream of the Ro gene (37
). Interestingly, an independent transfer of another eukaryotic member of this family, related to the telomerase RNA-binding subunit, into the genome of Streptomyces
suggests a more widespread acquisition of Ro-like RNA-binding proteins by bacteria (L. Aravind, unpublished data).
The gene for a predicted protein kinase of the RIO1 family, which previously has been detected in archaea and eukaryotes but not in bacteria (121
), also appears to have been transferred into the genome of Deinococcus.
Deinococcus proteins whose plant homologs are induced by desiccation are of particular interest; this is the first report of bacterial homologs of plant desiccation resistance-associated proteins. The DR1372 protein belongs to the Lea-14 (late embryogenesis abundant) family of group 4 of LEA proteins, one of the best-studied plant desiccation response-associated protein families (73
). Using iterative database searching, we detected additional homologs of LEA14-like proteins in many archaeal species (Fig. ). In plants, these proteins are cytosolic. However, DR1372 and some of the archaeal homologs contain a signal peptide, and this suggests that in Deinococcus
and in archaea, the subcellular localization of these proteins could be different.
FIG. 6 Multiple alignment of the selected members of the LEA14 family of desiccation-related proteins. The numbers and coloring in this alignment are the same as in Fig. . The two-letter code for species is as follows: AF, Archaeoglobus fulgidus ...
The Lea-76 family belongs to group 3 of LEA proteins, which also are well-characterized and widespread desiccation-induced proteins in plants (46
). The main sequence feature in these proteins is a tandem repeat of a distinct 11-mer motif, in which the amino acids at positions 1, 2, 5, and 9 are nonpolar and the rest are charged or amide residues (e.g., AAQKTKDYASD in the Lea-76 protein from soybean; GI, 421875) (58
). Besides plants, at least two proteins of this family are present in the nematode C. elegans
(GI, 2353333 and 3924824). This motif is conserved in two Deinococcus
proteins, DR0105 and DR1172, which show significant similarity to Lea-76 proteins. More generally, several other families of late embryogenesis-abundant and/or water stress resistance-related proteins are rich in repeats and/or have biased amino acid composition (62
), complicating the identification of homologs. Therefore, it is possible that some as yet uncharacterized Deinococcus
proteins containing compositionally biased sequences are also relevant to desiccation resistance. The DRB0118 protein is a homolog of a desiccation-related protein from Craterostigma plantagineum
(GI, 118622), an extremely desiccation-resistant plant from the Asteridae class. In this plant, several water stress response proteins have been identified (161
), with the protein homologous to DRB0118 being the only one that has no homologs in other plants. A positive correlation between resistance to desiccation and radioresistance has recently been established by examining a series of D. radiodurans
radiosensitive mutants for desiccation resistance (135
). It is possible, therefore, that the homologs of plant desiccation resistance-associated genes have been acquired by Deinococcus
via horizontal gene transfer; the products of these genes may be generally important to the resistance phenotype (135).
Other apparent horizontal transfers to Deinococcus
from eukaryotes are not easily interpretable. For example, DR1790 is a highly conserved member of a protein family that includes the yellow protein of Drosophila
and royal jelly protein from the honeybee and so far has not been detected outside the insects (4
). This seems to point to a rather precise source of this horizontally transferred gene, but the biochemical function of its product is not known. Based on its role in cuticular pigmentation in Drosophila
), it may be speculated that it could be an enzyme required for the metabolism of certain pigments.
Mobile Genetic Elements
The genome of D. radiodurans contains a number of predicted mobile elements of different classes. These are of particular interest because of the role some of them could play in recombinational repair.
Two inteins, protein splicing elements that are typically inserted in genes involved in DNA metabolism and other nucleotide-utilizing enzymes (162
), were identified in D. radiodurans
. One of these is inserted in the ribonucleotide reductase and is similar to the inteins inserted in orthologous enzymes from B. subtilis
, pyrococci, and Chilo iridescent virus. This intein contains an inserted cro-like HTH domain (14
) followed by a homing endonuclease of the LAGLIDADG family (47
). The second intein is inserted between the P-loop motif and the Mg2+-binding (Walker B) motif of a SWI2/SNF2 family ATPase, which is involved in chromatin remodeling; this is the first documented instance of an intein interrupting a protein of this family. The most unusual feature of this intein is that it is encoded by two distinct adjacent ORFs (DR1258 and DR1259), each of which also encodes a portion of the ATPase split by the intein. Recently, it has been proposed and then shown experimentally that the split intein in the Synechocystis
DNA polymerase III α-subunit assembles from the two separately translated ORFs and splices out to form a fully functional protein (66
). A similar protein trans-splicing mechanism is likely to generate an active SWI2/SNF2 ATPase in Deinococcus.
Insertion sequences (ISs) in the D. radiodurans
genome were identified during the genome annotation by the presence of ORFs homologous to transposases of several different IS families (34
). Several of these ORFs exist in multiple copies. For most of these elements (IS4_DR, TCL9, TCL121, TCL23, IS3_DR, and AXL_DR), the precise length could be determined. All of these elements have the typical features of ISs identified in other species (72
). In particular, they contain one or two ORFs that encode a transcriptional regulator and a transposase, as well as inverted terminal repeats and/or internal repeats (data not shown). Three elements (TCL9, TCL121, and TCL23) of the Tc1-mariner family are closely related to each other and are likely to be the product of a recent duplication, probably specific to the Deinococcus
lineage (data not shown).
Overall, we detected 52 IS elements in the D. radiodurans genome (Table ). The three most abundant ISs are IS4_DR (13 copies), IS2621_DR (11 copies), and IS200_DR (8 copies). IS elements are unevenly distributed among the chromosomes and plasmids. The number of copies per 10,000 nucleotides in the plasmid and the megaplasmid is more than 10 times greater than the number found in chromosomes I and II. Only one IS element is present in chromosome II, whereas the plasmid contains nine. There are five single-copy IS elements in the D. radiodurans genome, three of them on the plasmid. They may be transpositionally inactive or, alternatively, they may have been acquired only recently by the R1 strain.
Distribution of insertion sequences in the D. radiodurans genome
Notably, IS elements are significantly more abundant in D. radiodurans
than in any of the other sequenced bacterial genomes (Table ). D. radiodurans
contains 16.3 IS elements per 1,000 genes, whereas E. coli
, ranking second, has only 8.4. If the number of IS elements is a reflection of transposition activity, this would be expected to cause genome instability and result in high levels of genome rearrangement in Deinococcus
. There is, however, little direct evidence for any active transposition in D. radiodurans
. In the entire genome, there is only one example of gene disruption by an IS element, where IS2621
is inserted into the gene for alkaline serine exoprotease A (aqualysin I). Similarly, only one IS-induced mutation has been detected in D. radiodurans
]). Nevertheless, the abundance of IS elements in the Deinococcus
genome is remarkable, and their involvement in genome instability is the subject of ongoing investigations.
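The normalization behind the per-1,000-genes comparison can be sketched as below. This is only an illustration: the gene count of 3,187 is an assumption that is not stated in this section (it was chosen because it reproduces the quoted value of 16.3 from the 52 IS elements reported above), and the function name is hypothetical.

```python
def elements_per_thousand_genes(n_elements: int, n_genes: int) -> float:
    """Normalize a repeat-element count by genome gene content."""
    if n_genes <= 0:
        raise ValueError("n_genes must be positive")
    return 1000.0 * n_elements / n_genes


# 52 IS elements is taken from the text; the gene count is an ASSUMPTION
# used only to illustrate the normalization.
N_IS_ELEMENTS = 52
ASSUMED_GENE_COUNT = 3187

density = elements_per_thousand_genes(N_IS_ELEMENTS, ASSUMED_GENE_COUNT)
print(f"{density:.1f} IS elements per 1,000 genes")
```

Normalizing by gene number rather than by genome length keeps the comparison between genomes of different sizes and coding densities on the same footing.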
Number of repeats in bacterial genomes
Small noncoding repeats.
We identified several families of small noncoding repeats (SNRs) in the D. radiodurans intergenic regions (Table ). A comparison to other bacterial genomes showed that, like IS elements, SNRs are more abundant in D. radiodurans than in E. coli (Table ). However, the location bias observed for IS elements appears to be reversed for SNRs. There are no SNRs in the plasmid, which contains five IS elements. In contrast, chromosome II, which contains only one IS element, has 18 SNRs.
Distribution of SNRs in the D. radiodurans genome
D. radiodurans SNRs have a complex mosaic configuration, as exemplified by SNR2, which consists of five conserved modules (Fig. A). Module I (also shared with the small repetitive element [SRE] family) and module V contain two parts of the inverted repeat present in SNR2. The different configurations of the SNR2 family are shown (Fig. B). These data suggest that deletions and insertions are likely to have played an important role in the evolution of SNRs. For example, module III is likely to be missing when both modules II and IV are present.
FIG. 7 (A) Structure of the full-length repeated member of the SNR2 family. Inverted repeats are marked by arrows. Roman numerals and different colors mark the five conserved modules. (B) Number of SNR2 members with the indicated modular configuration. Each (more ...)
The distribution of SNRs along D. radiodurans
chromosome I was tested against the null hypothesis of random occurrence of an SNR in the intergenic regions. The analyses for individual families, as well as for all SNRs together, showed that, with a single exception, there is no significant deviation from the random-placement model. The exception is the SNR5 family, whose members show a tendency (P
< 0.05) to occur closer to each other than predicted by the random model. There is no significant correlation between the direction of a repeat and the direction of the adjacent gene, nor an apparent relationship between a particular SNR family and the functions of the adjacent genes. Thus, SNRs are not likely to play a direct role in the regulation of transcription or translation. It should be noted in this context that while some D. radiodurans
SNRs have characteristics similar to the E. coli
families of small repeats (bacterial interspersed mosaic elements [BIMEs] [76
]), SNRs do not share sequence or structural features with E. coli rho
-independent transcription terminators (Ter repeats [29
]). The energy of potential RNA secondary structures predicted for D. radiodurans
SNRs does not differ from the values obtained for coding regions or other sequence fragments unrelated to SNRs.
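A random-placement test of the kind described above can be sketched as a Monte Carlo simulation. The text does not specify the statistic or the procedure that was actually used, so everything here is an assumption made for illustration: the statistic (mean nearest-neighbor distance), the uniform sampling of intergenic positions, and all function names are hypothetical.

```python
import random
from bisect import bisect_right
from itertools import accumulate


def mean_nearest_neighbor(positions):
    """Mean distance from each repeat to its closest neighboring repeat."""
    pts = sorted(positions)
    if len(pts) < 2:
        raise ValueError("need at least two positions")
    gaps = [b - a for a, b in zip(pts, pts[1:])]
    nearest = [gaps[0]]
    nearest += [min(left, right) for left, right in zip(gaps, gaps[1:])]
    nearest.append(gaps[-1])
    return sum(nearest) / len(nearest)


def sample_intergenic_positions(regions, k, rng):
    """Draw k distinct positions uniformly over all intergenic bases.

    regions: list of (start, end) half-open intergenic intervals.
    """
    cumulative = list(accumulate(end - start for start, end in regions))
    total = cumulative[-1]
    positions = []
    for offset in rng.sample(range(total), k):
        i = bisect_right(cumulative, offset)      # region containing offset
        before = cumulative[i - 1] if i else 0    # bases in earlier regions
        positions.append(regions[i][0] + offset - before)
    return positions


def clustering_p_value(regions, observed, n_sim=2000, seed=0):
    """One-sided P value: are repeats closer together than random placement?"""
    rng = random.Random(seed)
    observed_stat = mean_nearest_neighbor(observed)
    hits = 0
    for _ in range(n_sim):
        simulated = sample_intergenic_positions(regions, len(observed), rng)
        if mean_nearest_neighbor(simulated) <= observed_stat:
            hits += 1
    return (hits + 1) / (n_sim + 1)  # add-one correction avoids P = 0


if __name__ == "__main__":
    # Toy coordinates, not D. radiodurans data.
    toy_regions = [(0, 1000), (2000, 3000), (5000, 9000)]
    toy_cluster = [5000, 5010, 5020, 5030, 5040]
    print(clustering_p_value(toy_regions, toy_cluster))
```

Restricting the simulated positions to intergenic intervals matters: sampling over the whole chromosome would make any set of intergenic repeats look clustered simply because coding regions are excluded.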
A sequence for the SRE from the D. radiodurans
strain SARK was published previously (120
) and has provided an opportunity to compare two evolutionarily distinct but closely related SNRs. A multiple alignment of SRE sequences from both strains (not shown) showed that most of the strain-specific substitutions were located in the central regions of two pairs of inverted repeats. In strain SARK, these SRE inverted repeats may form a pair of hairpin-like structures (120
). However, the substitutions seen in R1 may disrupt these hairpins. The predicted free energy for the consensus hairpin I in SARK is −11.2 kcal/mol, but that in R1 is only −6.1 kcal/mol, as estimated by the Mfold program (134
); for hairpin II, it is −17.0 and −11.0 kcal/mol, respectively. The nonrandom clustering of the strain-specific nucleotide substitutions in strain R1 is tantalizing. One could speculate that there is a strain-specific selection pressure for either strengthening (in SARK) or disrupting (in R1) these hairpins within this particular repeat family, and this would suggest a specific function for these repeats. The second possibility is that the multiple substitutions in the hairpin represent regions of the SRE that are hot spots for spontaneous mutagenesis.
The first SRE detected in D. radiodurans
was within a cloned mitomycin C-inducible gene of strain SARK (120
). Interestingly, a comparison with the corresponding region of the R1 strain shows that the repeat is missing in R1, demonstrating the mobility of SRE (127
). The propensity of D. radiodurans
to amplify DNA sequences that are flanked by direct repeats (31
) is relevant to the large number of repeats, both ISs and SNRs, in its chromosomes and plasmids (4 to 10 copies per cell). The abundance of such repeated sequences flanking genes and operons throughout the genome could provide the potential for expansion and regulation of genomic regions in response to environmental challenges.
Two prophages unrelated to one another are present in the Deinococcus genome. One of these is located on chromosome I (between positions 518499 and 547679), and the other is located on chromosome II (between positions 80554 and 113236). Some of the proteins encoded in these prophages are distantly related to several phage proteins from other bacteria, but most ORFs have no detectable homologs. These prophages contain some genes definitely acquired from a bacterial genome, e.g., a serine/threonine protein kinase (DR0534) and a MotB/OMPA family protein (DR0536), and therefore are possible vectors for horizontal gene transfer.
Evolutionary Relationships to Other Bacteria and Phylogeny
A specific relationship between Thermus and Deinococcus
has been established by both traditional microbiological (32
) and molecular phylogenetic (156
) approaches. These species currently comprise a bacterial group without a clear relationship to other major branches of bacteria. Previous attempts to clarify these relationships (80
) have led to the proposition that the Thermus-Deinococcus
group is an intermediate between gram-positive and gram-negative bacteria. Furthermore, on the basis of phylogenetic trees developed for several protein families (HSP70, HSP40, FtsZ, RecA, and some translation elongation factors) and rRNA, an affinity of this group with cyanobacteria has been proposed (reference 80
and references therein).
Sequence analysis on the complete genome scale has revealed a major role of horizontal gene transfer in the evolution of bacteria and archaea. It appears that for most bacterial genomes, at least 10 to 15% of genes have been involved in horizontal transfer (18
; K. S. Makarova, L. Aravind, and E. V. Koonin, unpublished data). As discussed above, this level of horizontal gene transfer is consistent with our findings in Deinococcus
. The taxonomic distribution of the best BLAST hits for all proteins in the Deinococcus
genome is shown in Fig. . More than half of the genes did not show specific affinity to any major bacterial branch, archaea, or eukaryotes. Some of these genes were unique to Deinococcus
, but the majority appear to be more or less equidistant from their homologs from other major taxa. Among the remaining genes, the greatest fraction was most similar to homologs from gram-positive bacteria (Fig. ), but even in this case it is difficult to distinguish a genuine phylogenetic signal from preferential horizontal gene transfer. Therefore, this form of analysis does not yield a specific phylogenetic placement for the Thermus-Deinococcus group.
FIG. 8 Taxonomic affinities of Deinococcus proteins. We defined a hit to a particular lineage as the best one if it had a BLAST E-value for a protein from this lineage 100 times lower than to any protein from another lineage. Hits to Thermus-Deinococcus group (more ...)
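The best-hit criterion stated in the caption of Fig. 8 (an E-value for one lineage 100 times lower than for any other lineage) can be sketched as follows. The lineage labels, the function name, and the treatment of the exact 100-fold boundary are assumptions. The handling of hits within the Thermus-Deinococcus group is cut off in the caption and is not modeled here.

```python
def assign_lineage(best_evalue_by_lineage, ratio=100.0):
    """Return the lineage with a specific affinity, or None.

    best_evalue_by_lineage maps a lineage label to the lowest BLAST
    E-value of any hit from that lineage. A lineage is returned only if
    its E-value is at least `ratio` times lower than the best E-value of
    every other lineage (boundary included, which is an assumption).
    """
    if not best_evalue_by_lineage:
        return None
    ranked = sorted(best_evalue_by_lineage.items(), key=lambda item: item[1])
    top_lineage, top_evalue = ranked[0]
    if len(ranked) == 1:
        return top_lineage
    second_evalue = ranked[1][1]
    if top_evalue == second_evalue:
        return None  # tie (for example, two E-values reported as 0.0)
    if top_evalue * ratio <= second_evalue:
        return top_lineage
    return None


if __name__ == "__main__":
    # Hypothetical E-values for a single query protein.
    hits = {"gram-positive": 1e-60, "proteobacteria": 1e-40, "archaea": 1e-5}
    print(assign_lineage(hits))
```

Proteins for which the function returns None correspond to the category described above as showing no specific affinity to any major lineage.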
For phylogenetic reconstruction, we used a nearly complete set of ribosomal proteins (50 sequences) and three RNA polymerase subunits that are shared among all bacterial species. A slightly smaller protein set was used to additionally include Thermus aquaticus (Fig. ). All of these proteins are subunits of large, coevolving, macromolecular complexes, and therefore the respective genes are less prone to horizontal transfer. Furthermore, the large amount of sequence information included in this analysis helped to minimize the effects of possible horizontal transfer events or fluctuations in the evolutionary rate that could affect the tree topology for individual protein families. Tree A and tree C (Fig. ) have essentially the same topology, indicating that the Thermus-Deinococcus group is a deeply rooted bacterial branch with a marginal, but not necessarily reliable, affinity to the cluster of gram-positive bacteria, cyanobacteria, and bacterial thermophiles. Tree B and tree D clearly confirm the strong relationship between Thermus and Deinococcus. These analyses did not detect any evidence for the previously suggested specific relationship between the Thermus-Deinococcus group and cyanobacteria.
FIG. 9 Phylogenetic trees. (A) All ribosomal proteins shared by selected organisms with a completely sequenced genome; (B) all ribosomal proteins shared by Thermus and selected organisms; (C) RNA polymerase subunit A; (D) fragment of RNA polymerase subunit A (more ...)
Derived shared characteristics between Thermus and Deinococcus
were used for a preliminary assessment of the possible genome organization and physiological features of their common ancestor. A comparison of all available protein sequences from Thermus
to those encoded in the Deinococcus
genome showed several features that are unique to this clade (Table ). The conservation of a distinct S-layer-like protein in the Thermus-Deinococcus
group suggests that the last common ancestor already possessed the unique membrane structure observed in both organisms (146
). Another shared protein unique to these organisms contains a predicted signal peptide that could be involved in the formation of the characteristic infrastructure of their outer membranes. Further, the conservation of two proteins that are distantly related to nitrogen metabolism regulators suggests that a derived state of this system evolved before the divergence of Thermus and Deinococcus
(Table ) (33
). Several proteins that are highly conserved in Thermus and Deinococcus
show a clear affinity to archaea and/or eukaryotes, and this may have arisen by ancient horizontal gene transfer events.
We also estimated gene flow and gene loss rates in these moderately related bacteria from the same clade. Deinococcus
has twice as many genes as Thermus
), yet we found a significant number of genes that are present in Thermus
but not in Deinococcus
(Table ). This probably reflects the distinct metabolic repertoires of these bacteria, as well as the presence in Thermus
of genes associated with thermophilicity.
The phylogenetic affinity between Thermus and Deinococcus raises the issue of whether their common ancestor was a thermophile. We compared the fractions of genes shared by archaeal and bacterial thermophiles for all bacteria with completely sequenced large genomes. Perhaps not unexpectedly, the greatest fraction was seen in B. subtilis, because many species of the Bacillus-Clostridium group are thermophiles and Thermotoga may be a highly derived member of this group; Deinococcus had the second greatest fraction (Fig. ). The number of common genes between these thermophiles and Deinococcus is consistent with the hypothesis that the ancestor of the Thermus-Deinococcus group also was at least a moderate thermophile, with the descendant clades evolving in different directions and acquiring different sets of genes via horizontal transfer. The complete genome sequence of Thermus and, ideally, other members of this clade would be required for a definitive evaluation of this hypothesis.
Comparison of shared “thermophilic” genes in different species.