The bacterium Deinococcus radiodurans shows remarkable resistance to a range of damage caused by ionizing radiation, desiccation, UV radiation, oxidizing agents, and electrophilic mutagens. D. radiodurans is best known for its extreme resistance to ionizing radiation; not only can it grow continuously in the presence of chronic radiation (6 kilorads/h), but also it can survive acute exposures to gamma radiation exceeding 1,500 kilorads without dying or undergoing induced mutation. These characteristics were the impetus for sequencing the genome of D. radiodurans and the ongoing development of its use for bioremediation of radioactive wastes. Although it is known that these multiple resistance phenotypes stem from efficient DNA repair processes, the mechanisms underlying these extraordinary repair capabilities remain poorly understood. In this work we present an extensive comparative sequence analysis of the Deinococcus genome. Deinococcus is the first representative with a completely sequenced genome from a distinct bacterial lineage of extremophiles, the Thermus-Deinococcus group. Phylogenetic tree analysis, combined with the identification of several synapomorphies between Thermus and Deinococcus, supports the hypothesis that it is an ancient group with no clear affinities to any of the other known bacterial lineages. Distinctive features of the Deinococcus genome as well as features shared with other free-living bacteria were revealed by comparison of its proteome to the collection of clusters of orthologous groups of proteins. Analysis of paralogs in Deinococcus has revealed several unique protein families. In addition, specific expansions of several other families including phosphatases, proteases, acyltransferases, and Nudix family pyrophosphohydrolases were detected. Genes that potentially affect DNA repair and recombination and stress responses were investigated in detail. 
Some proteins appear to have been horizontally transferred from eukaryotes and are not present in other bacteria. For example, three proteins homologous to plant desiccation resistance proteins were identified, and these are particularly interesting because of the correlation between desiccation and radiation resistance. Compared to other bacteria, the D. radiodurans genome is enriched in repetitive sequences, namely, IS-like transposons and small intergenic repeats. In combination, these observations suggest that several different biological mechanisms contribute to the multiple DNA repair-dependent phenotypes of this organism.
The evolution of organisms that are able to grow continuously at 6 kilorads (60 Gy)/h (119) or survive acute irradiation doses of 1,500 kilorads (50–52) is remarkable, given the apparent absence of highly radioactive habitats on Earth over geologic times. Notwithstanding a few natural fission reactors like those that gave rise to the Oklo uranium deposits (Gabon) 2 billion years ago (151), the radiation levels in the Earth's surface environments, including its waters containing dissolved radionuclides, have provided only about 0.05 to 20 rads/year over the last 4 billion years (193). DNA damage is readily inflicted on organisms by a variety of other common physicochemical agents (e.g., UV light or oxidizing agents) or nonstatic environments (e.g., cycles of desiccation and hydration or cycles of high and low temperatures) and it seems more likely that radiation resistance evolved in response to chronic exposure to nonradioactive forms of DNA damage.
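The dose figures above mix units (kilorads and grays) and time scales (per hour versus per year). As a rough illustration, the following sketch puts them on a common basis. The helper and variable names are hypothetical; the only inputs are the values quoted in the text and the standard conversion 1 Gy = 100 rads.

```python
# Illustrative unit bookkeeping for the dose figures quoted in the text.
# Function and variable names are hypothetical; 1 Gy = 100 rads is the
# standard conversion between the two absorbed-dose units.

HOURS_PER_YEAR = 24 * 365  # non-leap year; adequate for an order-of-magnitude comparison


def rads_to_grays(rads: float) -> float:
    """Convert an absorbed dose from rads to grays."""
    return rads / 100.0


chronic_rads_per_hour = 6_000             # 6 kilorads/h chronic exposure
acute_rads = 1_500_000                    # 1,500 kilorads acute exposure
background_rads_per_year = (0.05, 20.0)   # range quoted for Earth's surface environments

chronic_grays_per_hour = rads_to_grays(chronic_rads_per_hour)
acute_grays = rads_to_grays(acute_rads)

# Annualize the chronic dose rate and compare it with the upper background estimate.
chronic_rads_per_year = chronic_rads_per_hour * HOURS_PER_YEAR
fold_over_background = chronic_rads_per_year / background_rads_per_year[1]

print(f"chronic: {chronic_grays_per_hour:.0f} Gy/h, acute: {acute_grays:.0f} Gy")
print(f"chronic rate exceeds the upper background estimate ~{fold_over_background:.1e}-fold")
```

On these figures, the chronic dose rate tolerated in the laboratory is more than a million times the highest natural background estimate, which is the gap the evolutionary argument above has to explain.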
Bacteria belonging to the family Deinococcaceae are some of the most radiation-resistant organisms discovered, and they are vegetative, easily cultured, and nonpathogenic (23, 137, 138). Despite their ubiquitous distribution and apparent ancient derivation, only seven species of Deinococcaceae have been described (69, 138, 145). Deinococcus radiodurans strain R1 was the first of the deinobacteria to be discovered and was isolated in Oregon in 1956 (7) from canned meat that had spoiled following exposure to X rays. Culture yielded a red-pigmented, nonsporulating, gram-positive coccus that was extremely resistant to ionizing radiation, UV light, hydrogen peroxide, and numerous other agents that damage DNA (119, 137, 142, 215), as well as being highly resistant to desiccation (135). It is an aerobic, large (1- to 2-μm) tetrad-forming soil bacterium that is best known for its supreme resistance to ionizing radiation. It not only can survive acute exposures to gamma radiation that exceed 1,500 kilorads without dying or undergoing induced mutation (53), but it also displays luxuriant growth in the presence of high-level chronic irradiation (6 kilorads/h) (119, 212) without any effect on its growth rate or ability to express cloned foreign genes (31). For comparison, Escherichia coli will not grow and is killed in the presence of 6 kilorads/h (119), and an acute dose of only 100 to 200 kilorads is needed to sterilize a culture. Similarly, vegetative cells of Bacillus spp. cannot grow at 6 kilorads/h and Bacillus spores show a 5-order-of-magnitude decrease in viability following acute exposure to 200 to 1,000 kilorads (207).
Shortly after the isolation of D. radiodurans R1 in 1956, a second strain of D. radiodurans (SARK) was discovered as an air contaminant in a hospital in Ontario (R. G. E. Murray and C. F. Robinow, Seventh International Congress for Microbiology, 1958). Since then, six closely related radioresistant species have been identified: Deinococcus radiopugnans from haddock tissue (54), Deinococcus radiophilus from Bombay duck (122), Deinococcus proteolyticus from the feces of Lama glama (108), the rod-shaped Deinococcus grandis from elephant feces (158), and the two thermophilic species Deinococcus geothermalis and Deinococcus murrayi from hot springs in Portugal and Italy, respectively (69). These species together form a distinct eubacterial phylogenetic lineage, believed to be most closely related to the Thermus genus. Based on 16S rDNA sequence analysis, it has been proposed that Deinococcus and Thermus form a eubacterial phylum (168). To date, the natural distribution of the deinococci has not been explored systematically. Isolations have occurred worldwide but are diverse and patchy in distribution. In addition to those noted above, sites of isolation include damp soil near a lake in England (133), weathered granite from the Antarctic Dry Valleys (44), irradiated medical instruments, and air purification systems (10, 41, 114, 145). As suggested above, it is possible that their extreme proficiency at DNA repair is related to the selective advantage in environments where they are prone to damage during long periods of desiccation (135). More recently, it has been proposed that adaptation could also occur in permafrost or other semifrozen conditions where cryptobiotic microbes with extremely long generation times could be selected with metabolic processes able to repair the unavoidable accumulation of background radiation-induced DNA damage (171).
Of the deinococcal species, D. radiodurans (138) and D. geothermalis (48) are the only ones for which a system of genetic transformation and manipulation has been developed. Now adding to this genetic technology is the recent complete sequencing and annotation of the D. radiodurans genome (218). The D. radiodurans strain R1 genome consists of two chromosomes (DR_Main [2.65 Mbp] and DR412 [412 kbp]), one megaplasmid (DR177 [177 kbp]), and one plasmid (46 kbp) (218), carrying 3,195 predicted genes. This combination of factors has positioned D. radiodurans as a promising candidate for the study of mechanisms of DNA damage and repair, as well as its exploitation for practical purposes such as cleanup and stabilization of radioactive waste sites. For example, D. radiodurans is being engineered to express metal-detoxifying and organic compound-degrading functions in environments heavily contaminated by radiation; 7 × 10⁷ m³ of ground and 3 × 10⁹ liters of groundwater were contaminated by radioactive waste generated in the United States during the Cold War (31, 48, 119).
The cell envelope of D. radiodurans is unusual in terms of its structure and composition (3). Although the cell envelope of D. radiodurans is reminiscent of the cell walls of gram-negative organisms (32, 61, 208, 221), Deinococcus often stains gram positive; this may result from the inability of its thick peptidoglycan layer to decolorize. Its cell envelope consists of the plasma and outer membranes, which are separated by a 14- to 20-nm peptidoglycan layer and an uncharacterized “compartmentalized layer.” At least six layers have been identified by electron microscopy, with the innermost layer being the plasma membrane. The next layer is a peptidoglycan-containing cell wall and appears to be perforated (the holey layer), but it has no known physiological significance. The third layer appears to be divided into numerous fine compartments (the compartmentalized layer). The fourth layer is the outer membrane, and the fifth layer is a distinct electrolucent zone. The sixth layer consists of regularly packed hexagonal protein subunits (the S-layer, or hexagonally packed intermediate layer), typical of other bacterial S-layers (26, 115, 206). A few strains of Deinococcus also exhibit a dense carbohydrate coat (25, 26, 118, 187, 205, 208, 221). Only the cytoplasmic membrane and the peptidoglycan layer are involved in septum formation during cell division. The other layers are regarded as a sheath, since they surround groups of cells and form on the surface of daughter cells as they separate (187, 208, 221).
The chemical structure of the peptidoglycan layer of D. radiodurans SARK has been investigated using mass spectrometry (165), and the structure obtained is consistent with the A3β classification given to D. radiodurans (32, 176, 186). Thermus thermophilus HB8 (166) also has an A3β murein chemotype, and its peptidoglycan is built from the same monomeric subunit, underscoring the phylogenetic relationship between these genera.
The plasma and outer membranes appear to have the same lipid composition (206), yet there is no evidence for conventional lipopolysaccharides. The fatty acid composition of D. radiodurans is distinctive (69); attempts to identify hydroxy fatty acids, lipid A, and heptoses have been unsuccessful (145). A mixture of 15-, 16-, 17-, and 18-carbon saturated and monounsaturated acids are present, while polyunsaturated, cyclopropyl, and branched-chain fatty acids are not detectable. D. radiodurans has the distinguishing characteristic of lacking conventional phospholipids found in other bacteria (204). Of the D. radiodurans membrane lipid, 43% is composed of phosphoglycolipids containing a series of alkylamines as structural components, hitherto unknown as lipid constituents (8, 9). These lipids appear to be derived from the same precursor, a novel phosphatidylglycerolalkylamine, and form when the precursor is glycosylated with galactose or glucosamine. Although glucosamine-containing lipids have been found in other species, notably members of the genus Thermus (160), these phosphoglycolipids are, at present, considered unique to D. radiodurans.
The most extensively studied of the deinococci is D. radiodurans. Unlike other deinobacterial species, it is amenable to genetic manipulation due to its natural transformability by both high-molecular-weight chromosomal DNA and plasmid DNA (131, 143, 189). The natural transformability of D. radiodurans has facilitated the development of a variety of techniques for genetic manipulation of this organism (31, 49–52, 81–83, 119, 120, 131, 189–191), making it a highly tractable target for molecular investigation. Transformability, however, is not integral to DNA damage resistance, since the other deinobacterial species are no less radioresistant than D. radiodurans (142) but are not transformable by any form of DNA (D. geothermalis, however, is an exception, since it has recently been transformed with plasmid DNA). In the exponential growth phase, D. radiodurans does not die in response to ionizing irradiation up to 0.5 megarad and shows 10% survival at 0.8 megarad (142), while exponentially growing E. coli, for comparison, shows a very small shoulder of complete resistance and 10% survival at 15 kilorads, a roughly 50-fold difference in resistance (188). In the stationary growth phase, D. radiodurans does not die until exposed to 1.5 megarads, over 100-fold greater resistance than stationary-phase E. coli (53, 137). In exponential phase, D. radiodurans is 33-fold more resistant to UV than is E. coli (197). Compared to other organisms, D. radiodurans DNA sustains the expected amount of damage in vivo at high irradiation doses, on the order of 150 to 200 double-stranded DNA breaks (DSBs) per haploid chromosome at 1.5 megarads under aerobic irradiation conditions, all of which are mended within hours following irradiation (53, 107, 123); likewise, its DNA is no less susceptible than that of E. coli to UV in vivo (183). 
Furthermore, survivors of extreme ionizing radiation, UV, or bulky chemical-adduct exposures do not show any mutagenesis greater than that occurring after a single round of normal replication (197, 198). On the other hand, D. radiodurans is mutable by N-methyl-N′-nitro-N-nitrosoguanidine and other agents that can cause mispairing of bases during replication (197, 198). Of the many forms of damage imposed on DNA by ionizing radiation, DSBs are considered the most lethal due to the inherent difficulty in their repair, since no single-strand template for accurate repair remains in the double helix (117). Other organisms, such as E. coli, can repair at most a few DSBs per chromosome without dying (112).
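The resistance comparisons above rest on simple ratios, which the following sketch works through. It uses only the doses and DSB counts quoted in the text. The variable and function names are hypothetical, and the linear scaling of DSB number with dose is an assumption made here for illustration rather than a statement from the cited studies.

```python
# Back-of-the-envelope arithmetic for the resistance figures quoted in the text.
# Names are hypothetical; linear DSB-versus-dose scaling is an assumption.

# Doses giving 10% survival in exponential phase, in kilorads.
d10_d_radiodurans_krad = 800   # 0.8 megarad
d10_e_coli_krad = 15

fold_resistance = d10_d_radiodurans_krad / d10_e_coli_krad

# DSBs per haploid chromosome reported at the reference dose.
REFERENCE_DOSE_KRAD = 1_500        # 1.5 megarads
DSBS_AT_REFERENCE = (150, 200)     # lower and upper estimates


def expected_dsbs(dose_krad: float) -> tuple:
    """Scale the quoted DSB range linearly to another dose (assumed linearity)."""
    return tuple(dose_krad * n / REFERENCE_DOSE_KRAD for n in DSBS_AT_REFERENCE)


low, high = expected_dsbs(100)
print(f"fold difference in 10%-survival dose: {fold_resistance:.1f}")
print(f"expected DSBs per chromosome at 100 kilorads: {low:.0f} to {high:.0f}")
```

The ratio of 10%-survival doses comes out slightly above 53, which the text rounds to a 50-fold difference.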
D. radiodurans contains 8 to 10 haploid genome copies during exponential growth and 4 genome copies during stationary phase (87, 89). In comparison, E. coli contains four or five haploid chromosomes during vigorous exponential growth, and this multiplicity in E. coli has been shown to be necessary for repair of DSBs (112). However, multiplicity in itself is insufficient for radioresistance. Micrococcus luteus and Micrococcus sodonensis also contain multiple genome equivalents but are radiosensitive (142). Azotobacter vinelandii, which contains up to 80 chromosomes per cell (164, 172), is quite sensitive to UV damage (125), to which D. radiodurans is highly resistant. Using various growth media, Harsojo et al. (89) were able to vary the genomic complement of D. radiodurans between 5 and 10 during the exponential phase and demonstrated that there was no correlation between chromosome number and radioresistance. The authors concluded that if chromosome multiplicity is important in repair, five or fewer chromosomes are sufficient. On high-level irradiation (1.75 megarads), D. radiodurans can reconstitute its genome from 1,000 to 2,000 DSB fragments compared to the maximum capability of E. coli of restoring its genome from 10 to 15 DSB fragments. Since most recombination models postulate that all DSB fragments search all others for homology during repair, this would call for an astronomical number of combinations to ensure genome restoration in D. radiodurans. Therefore, it may be that D. radiodurans can use redundant information in ways that other organisms do not. An alternative repair model has been postulated for D. radiodurans in which its chromosomes are always aggregated and aligned, thus dramatically simplifying the search for repair templates (51, 139) following DNA damage.
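To give a sense of scale for the homology-search argument, the sketch below counts fragment comparisons for the fragment numbers quoted above. The text does not specify how the combinations are counted, so two simple measures are shown as assumptions: the number of unordered fragment pairs, and the number of possible linear orderings of the fragments (reported as a base-10 logarithm because the value itself is enormous). Function names are hypothetical.

```python
# Illustrative counts for an all-against-all homology search among DSB fragments.
# Both counting schemes are assumptions chosen for illustration.
import math


def pairwise_comparisons(n_fragments: int) -> int:
    """Number of unordered fragment pairs, n * (n - 1) / 2."""
    return math.comb(n_fragments, 2)


def log10_orderings(n_fragments: int) -> float:
    """log10 of n!, the number of linear orderings of n fragments."""
    return math.lgamma(n_fragments + 1) / math.log(10)


cases = [
    ("E. coli, upper limit", 15),
    ("D. radiodurans, lower estimate", 1_000),
    ("D. radiodurans, upper estimate", 2_000),
]

for label, n in cases:
    print(f"{label}: {n} fragments, "
          f"{pairwise_comparisons(n):,} pairs, "
          f"~10^{log10_orderings(n):.0f} orderings")
```

Even the pairwise count grows by more than four orders of magnitude between 15 and 2,000 fragments, and the number of candidate orderings grows far faster, which is why a prealigned arrangement of chromosomes would simplify the search so dramatically.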
Repair of DNA damage in D. radiodurans follows an ordered series of events (137, 142). Physical repair of lesions requires conditions compatible with growth (212). For colony formation assays, this is simply achieved by plating on nutrient agar. For liquid cultures, this requires fresh nutrient medium and adjustment of cellular density to a level suitable for exponential growth. This has been demonstrated in liquid cultures for excision of pyrimidine dimers (30), repair of DSBs (56, 53), and recombinational repair of plasmids and chromosomes (51). While growth-promoting conditions are essential for removal of lesions from cellular DNA, the cells themselves do not immediately divide. Indeed, there is a dramatic inhibition of growth for extended durations following acute exposure to nonlethal (or partially lethal) DNA damage. This growth lag is associated with limited degradation of chromosomal DNA intrinsic to the DNA repair processes. Degradation proceeds at a rate independent of dose (the initial extent of damage), but its duration is positively correlated with dose (137; also see reference 142 and citations therein). Thus, the greater the dose, the longer the growth lag, which may exceed the duration of DNA degradation. Following a nonlethal exposure of stationary-phase D. radiodurans to 1.5 megarads under anoxic conditions, dilute liquid cultures of D. radiodurans show no growth for about 10 h and then resume rapid exponential growth (53). The dose-dependent delay of the onset of cellular replication suggests the existence of a checkpoint that monitors the extent of repair and accordingly controls the initiation of replicative DNA synthesis. During the period of stasis, it can be expected that the cell undergoes several phases of repair. The first can be termed cellular cleansing, and it involves several modalities, including the export of damaged DNA components. 
Initially, the products formed are DNA fragments about 2,000 bp long and consist of a mixture of damaged and undamaged nucleotides and nucleosides (22, 213). These products are found in the cytoplasm and also in the surrounding growth medium, suggesting that D. radiodurans exports the DNA degradation products once they are formed (reference 22 and citations therein). The removal of damaged nucleotides outside the cell might protect the organism from elevated levels of mutagenesis by preventing the reincorporation of damaged bases during DNA synthesis (22). Remaining intracellular mutagenic precursors could be sanitized via pyrophosphohydrolases of the Nudix superfamily (for “nucleoside diphosphate linked to some other moiety x”), the founding member of which is the repair enzyme MutT (28). MutT has an 8-oxo-dGTPase activity, which produces 8-oxo-dGMP plus inorganic pyrophosphate. Since 8-oxo-dGTP is highly mutagenic, the enzyme “sanitizes” the nucleoside triphosphate pool. D. radiodurans is markedly rich in Nudix proteins, some of which may act to sanitize other mutagenic DNA precursors (218). Finally, activated oxygen species with long half-lives may be eliminated by superoxide dismutases and catalases such as SodA and KatA (129). During this initial phase of cellular cleansing, amino acids, nucleotides, nucleosides, sugars, and phosphate may be imported into the cell while precursors for DNA synthesis are made by way of ribonucleoside diphosphate reductase (104). Subsequent phases of repair are genomic restoration and coordination of repair activities.
D. radiodurans has repair pathways that include excision repair, mismatch repair, and recombinational repair. Generally, no marked error-prone SOS response is observed in D. radiodurans (142). However, there have been a few reports consistent with SOS response, where preexposure to low doses of ionizing radiation, UV, or hydrogen peroxide causes a low level of subsequent increased resistance to DNA damage (twofold or less) (199, 215). Since the SOS response is not always mutagenic, the absence of DNA damage-induced mutagenesis observed in D. radiodurans cannot be taken as evidence against the existence of the SOS response in this bacterium. Photoreactivation is not present (142), and it has been reported that the adaptive response to alkylation damage is also absent (170). It is known that following DNA damage, there are changes in the cellular abundance of proteins, with enhanced synthesis of four to nine proteins, as judged by sodium dodecyl sulfate-polyacrylamide protein gels (86, 200). Included in this group of proteins are probably RecA (36), elongation factor Tu (200), and KatA (129). While there are many predicted DNA repair genes and pathways in the D. radiodurans genome (218), only a few of its DNA repair enzymatic activities and/or genes have been evaluated for their biochemical activities. The UvrA protein and its gene have been detected (1, 149), and it has been identified as a component of nucleotide excision repair. UV endonuclease-beta has been purified and found to be a 36-kDa manganese-requiring protein, which is thus far only known to recognize UV-induced pyrimidine cyclobutane dimers, incising them as an endonuclease rather than as a glycosylase (63–65). Other repair-related activities detected in extracts of D. radiodurans include uracil DNA glycosylase (132), a thymine glycol glycosylase, and a deoxyribophosphodiesterase (144). DNA polymerase I activity is present and is necessary for resistance to both UV and ionizing radiation (81). 
Both UvrA and DNA polymerase I deficiencies can be fully complemented by the expression of E. coli UvrA and DNA polymerase I proteins in D. radiodurans mutants, respectively (1, 81). However, this is not the case for D. radiodurans recA, which appears to play a more important role in the extreme radiation resistance phenotype.
The D. radiodurans RecA protein has been detected and its gene has been sequenced; it shows greater than 50% identity to the E. coli RecA protein (81). Mutants with mutations in this gene are highly sensitive to UV and ionizing radiation. Unlike UvrA and DNA polymerase I proteins, expression of E. coli RecA in D. radiodurans does not complement the RecA deficiency and appears to have no effect on D. radiodurans (82, 36). Expression of D. radiodurans RecA in E. coli has been reported to be lethal (36); however, recently it has been successfully expressed in E. coli with less toxicity (M. M. Cox and K. W. Minton, unpublished data), and it has been reported to complement E. coli RecA deficiency (150).
D. radiodurans RecA has recently been purified and characterized (M. M. Cox, unpublished data). In vitro, it has been shown to catalyze the spectrum of activities classically attributed to RecA proteins: (i) it forms striated filaments on single-stranded DNA and double-stranded DNA; (ii) it promotes an efficient DNA strand exchange reaction; and (iii) it has a DNA-dependent nucleoside triphosphatase activity. However, D. radiodurans RecA is distinct from other well-characterized RecAs (e.g., from the gram-negative E. coli) in its nucleoside triphosphatase and DNA strand exchange activities. Unlike E. coli RecA, D. radiodurans RecA does not hydrolyze ATP at pH 7.5, although it exhibits some ATPase activity at lower pHs. In contrast, it is very effective at hydrolyzing dATP over a broad pH range.
The existence of a very efficient recA-independent single-stranded DNA annealing repair pathway has been reported for D. radiodurans (50). This pathway is active during and immediately after DNA damage and before the onset of recA-dependent repair. It can repair about one-third of the 150 to 200 DSBs per chromosome following exposure to 1.75 megarads (50). It has also been reported that unlike other organisms, D. radiodurans RecA is not present in the undamaged deinococcal cell but is synthesized only following DNA damage and following repair. D. radiodurans RecA is apparently expressed in D. radiodurans only following extreme DNA damage (36), and it is noteworthy that the recA-defective D. radiodurans strain rec30 is more radiation resistant than E. coli (138). It is possible that the greater resistance of rec30 arises from the presence of multiple copies of its genome in combination with the single-stranded DNA-annealing repair pathway, which is fully functional in this mutant (50). Together, this evidence supports the idea that D. radiodurans RecA is not necessary for the repair of nonextreme DNA damage (~10 DSB/chromosome, ~100 kilorads) and that Dr RecA may be activated only when DNA is highly damaged (>100 kilorads) (M. J. Daly, unpublished data).
To further our understanding of the functions of individual genes and cellular systems in D. radiodurans as well as their relationship with other organisms, we undertook a detailed computational analysis of the D. radiodurans genome. In addition to the standard genome annotation procedure of The Institute for Genomic Research (218), we used several approaches for deeper protein characterization. In particular, we systematically applied three sensitive profile-based methods. PSI-BLAST constructs a position-dependent weight matrix from multiple alignments generated from the BLAST hits above a certain expectation value (e-value) and allows iterative database searches using the information derived from such a matrix (5, 6). IMPALA (175) searches the matrix against profile databases. SMART (178, 179) uses a Hidden Markov Model algorithm (59) to search a sequence against a multiple-alignment database. In addition to the database of profiles included in the SMART system, two other profile collections were used: (i) 5,640 profiles derived from the structurally characterized domains contained in the SCOP database (100, 219), and (ii) 150 profiles for widespread domains primarily involved in different forms of signaling that were employed in previous genome comparisons (40, 163, 175).
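The idea of a position-dependent weight matrix can be illustrated with a deliberately simplified sketch. This is not the PSI-BLAST algorithm, which additionally weights sequences and derives pseudocounts from substitution data; here the pseudocount, the uniform background, the toy alignment, and all function names are assumptions made for illustration only.

```python
# Simplified position-specific scoring matrix (PSSM) built from a toy alignment.
# Assumptions: uniform background frequencies, a flat pseudocount, no sequence
# weighting. This only illustrates the concept of a position-dependent matrix.
import math
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
BACKGROUND = 1.0 / len(AMINO_ACIDS)


def build_pssm(alignment, pseudocount=1.0):
    """Return one {residue: log2-odds score} dict per alignment column."""
    lengths = {len(seq) for seq in alignment}
    if len(lengths) != 1:
        raise ValueError("all aligned sequences must have the same length")
    n_cols = lengths.pop()
    pssm = []
    for col in range(n_cols):
        # Count residues in this column, ignoring gaps and unknown characters.
        counts = Counter(seq[col] for seq in alignment if seq[col] in AMINO_ACIDS)
        total = sum(counts.values()) + pseudocount * len(AMINO_ACIDS)
        pssm.append({
            aa: math.log2(((counts[aa] + pseudocount) / total) / BACKGROUND)
            for aa in AMINO_ACIDS
        })
    return pssm


def score(pssm, sequence):
    """Sum per-column scores for an ungapped sequence as long as the matrix."""
    if len(sequence) != len(pssm):
        raise ValueError("sequence length must match the number of columns")
    return sum(column[aa] for column, aa in zip(pssm, sequence))


# Hypothetical five-column alignment of four sequences.
toy_alignment = ["GRTKL", "GRSKL", "GKTKL", "GRTRL"]
pssm = build_pssm(toy_alignment)

print(f"consensus-like sequence: {score(pssm, 'GRTKL'):.2f}")
print(f"unrelated sequence:      {score(pssm, 'WWWWW'):.2f}")
```

Residues that dominate a column receive positive scores and residues never observed there receive negative ones, so a sequence resembling the alignment scores higher than an unrelated one. Iterating the search with such a matrix is what lets profile methods detect more distant homologs than a single-sequence query.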
Paralogous families of proteins encoded in the D. radiodurans genome were initially identified by comparing the complete set of D. radiodurans proteins to itself (after filtering for low-complexity regions with the SEG program) using the PSI-BLAST program run for three iterations and clustering proteins by single linkage (clustering threshold e-value, 0.001) using the GROUPER program (214). One sequence from each cluster was used to generate a position-specific matrix by running an iterative PSI-BLAST search first against a D. radiodurans protein and then against the nonredundant protein database. These profiles were used to search for additional family members in the D. radiodurans proteome. Families that were recognized by the same profile were joined into superfamilies.
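Single-linkage clustering places two proteins in the same family whenever they are connected by a chain of hits that each pass the threshold, even if the two proteins do not hit each other directly. The sketch below illustrates this with a union-find structure. It is not the GROUPER program; the protein identifiers and e-values are hypothetical, and treating a hit as significant when its e-value is at or below 0.001 is an assumption about how the threshold is applied.

```python
# Single-linkage clustering of proteins from pairwise hits, using union-find.
# Illustrative only: identifiers and e-values below are made up.
from collections import defaultdict

E_VALUE_THRESHOLD = 0.001  # clustering threshold quoted in the text


def single_linkage_clusters(proteins, hits, threshold=E_VALUE_THRESHOLD):
    """Group proteins linked, directly or transitively, by hits passing the threshold."""
    parent = {p: p for p in proteins}

    def find(x):
        # Walk to the root, shortening the path along the way.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for query, subject, e_value in hits:
        if query != subject and e_value <= threshold:
            root_q, root_s = find(query), find(subject)
            if root_q != root_s:
                parent[root_s] = root_q

    clusters = defaultdict(set)
    for p in proteins:
        clusters[find(p)].add(p)
    # Largest clusters first, then alphabetical, for a stable output order.
    return sorted((sorted(c) for c in clusters.values()), key=lambda c: (-len(c), c))


# Hypothetical proteins and (query, subject, e-value) hits.
proteins = ["P1", "P2", "P3", "P4", "P5", "P6"]
hits = [
    ("P1", "P2", 1e-30),
    ("P2", "P3", 1e-5),   # links P3 to P1 through P2
    ("P4", "P5", 1e-10),
    ("P3", "P4", 0.5),    # too weak to merge the two families
    ("P6", "P1", 0.01),   # too weak; P6 stays a singleton
]

for cluster in single_linkage_clusters(proteins, hits):
    print(cluster)
```

Because membership is transitive, single linkage can chain loosely related proteins into one family, which is one reason a stringent e-value threshold and a subsequent profile-based check are useful.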
The phylogenetic affinities of D. radiodurans were explored using the COGNITOR program. This program assigns query proteins to conserved protein families that consist of apparent orthologs, termed clusters of orthologous groups (COGs) (201, 202). The functional assignments embedded in the COG database were also used to reconstruct metabolic pathways and other functional systems in D. radiodurans together with the KEGG (105) and WIT (157) databases.
Analysis of the phyletic distribution of homologs of Deinococcus proteins detected in database searches was performed using the TAX_COLLECTOR program of the SEALS package (214). This was followed by phylogenetic tree construction for specific cases. Multiple alignments for phylogenetic reconstruction were generated using the ClustalW program (93) and, when necessary, further adjusted on the basis of PSI-BLAST search outputs. Phylogenetic trees were constructed using the neighbor-joining method with bootstrap replications as implemented in the NEIGHBOR program of the PHYLIP package (67).
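Bootstrap replication assesses the support for a tree by rebuilding it from many pseudo-alignments, each made by sampling the columns of the original alignment with replacement. The sketch below shows only that resampling step. It does not reproduce PHYLIP or build trees, and the taxon names, sequences, and function names are hypothetical.

```python
# Column resampling for bootstrap replicates of a multiple alignment.
# Illustrative only: tree building on each replicate is not shown.
import random


def bootstrap_alignment(alignment, rng):
    """Return one pseudo-alignment made by sampling columns with replacement."""
    n_cols = len(next(iter(alignment.values())))
    if any(len(seq) != n_cols for seq in alignment.values()):
        raise ValueError("all aligned sequences must have the same length")
    # The same sampled column indices are applied to every sequence,
    # so each resampled column remains a genuine alignment column.
    sampled = [rng.randrange(n_cols) for _ in range(n_cols)]
    return {name: "".join(seq[i] for i in sampled) for name, seq in alignment.items()}


def bootstrap_replicates(alignment, n_replicates, seed=0):
    """Generate a reproducible list of bootstrap pseudo-alignments."""
    rng = random.Random(seed)
    return [bootstrap_alignment(alignment, rng) for _ in range(n_replicates)]


# Hypothetical aligned sequences (gaps shown as '-').
toy_alignment = {
    "taxon_A": "MKTAYIAK",
    "taxon_B": "MKSAYLAK",
    "taxon_C": "MRTA-IAR",
}

replicates = bootstrap_replicates(toy_alignment, n_replicates=100)
print(replicates[0])
```

A tree would then be inferred from each replicate, and the fraction of replicate trees containing a given grouping is reported as the bootstrap support for that grouping.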
Intergenic repeats were identified using the BLASTN program (6). As a result of this analysis, 2,007 D. radiodurans proteins were assigned to 1,272 COGs, which placed them into specific phylogenetic and functional contexts. In conjunction with profile analysis, this allowed us to define the domain architectures of multidomain proteins, to identify protein families that are unusually expanded in D. radiodurans, and to assign function and/or structure to a number of proteins previously described as hypothetical.
Below, we present an overview of the principal functional systems of D. radiodurans as determined by these analyses and describe unusual aspects of the genome that may be relevant to understanding the extreme resistance of this organism to radiation, desiccation, and other stress factors.
Analysis of the genome of D. radiodurans shows that it has a typical set of proteins for housekeeping and regulatory functions. As demonstrated by the COG analysis, the metabolic capabilities of D. radiodurans are similar to those of E. coli (152) but less diverse (Table 1); D. radiodurans is an obligatory heterotroph (212). Table 1 lists and compares the standard metabolic pathways of Deinococcus to the corresponding pathways in E. coli, Synechocystis, Bacillus subtilis, and Mycobacterium tuberculosis.
Probably the most interesting feature of the systems for energy production in D. radiodurans is that, unlike most other free-living bacteria, it uses the vacuolar type of proton ATP synthase instead of the F1F0 type. Vacuolar (V)-type H+-ATPase is typical of eukaryotes and archaea; all archaea have a conserved operon that consists of eight genes encoding the ATPase subunits. This operon is partially conserved (with some of the subunits missing) in a minority of characterized bacteria, where it replaces the F1F0 ATPase, e.g., in Deinococcus, Thermus, spirochetes, chlamydiae, and Enterococcus. The scattered distribution of the V-ATPase operon among bacteria, in contrast to its conservation in archaea, suggests that this operon has been disseminated in the bacterial world by horizontal transfer. The genes for the standard five complexes of electron transport and oxidative phosphorylation are present in D. radiodurans, with a few exceptions, but some genes of the cytochrome bd quinol oxidase complex are missing. Given that this complex is active predominantly under low-oxygen conditions in other bacteria, its apparent loss in Deinococcus is consistent with D. radiodurans being strictly aerobic. Interestingly, D. radiodurans encodes a multisubunit Na+/H+ antiporter (DR0880 to DR0886) that is characteristic of thermophiles and a few other bacteria (B. subtilis and Rickettsia prowazekii), but is absent in E. coli, Synechocystis, and Mycobacterium. It has been shown that this system is necessary for cells to grow under alkaline conditions (95).
The D. radiodurans genome appears to encode functional pathways for glycolysis, gluconeogenesis, the pentose phosphate shunt, and the tricarboxylic acid (TCA) cycle. A few genes are missing, but these may not be essential since they are also absent in some bacteria that are functional in these pathways (Table 1). The D. radiodurans Entner-Doudoroff pathway may be disrupted since a key enzyme, 2-keto-3-deoxy-6-phosphogluconate aldolase (an ortholog of E. coli Eda), is missing. However, this enzyme is also absent in archaea, where the Entner-Doudoroff pathway appears to be functional, and therefore the enzyme could be displaced by a nonorthologous aldolase in Deinococcus. The glyoxylate bypass that has only been described for E. coli and M. tuberculosis is present and complete in Deinococcus. It remains unclear, however, why some intermediates of the TCA cycle cannot support the growth of D. radiodurans (212). As expected of a heterotroph, Deinococcus encodes several enzymes for complex carbohydrate metabolism; for some of these, e.g., glycogen-debranching enzymes (DR0405 and DR0191), phylogenetic analysis suggests that horizontal transfer from eukaryotes has occurred (data not shown). Other enzymes for sugar conversion, as well as most of the known sugar transport systems, are encoded in D. radiodurans, and this is consistent with the observation that a variety of different sugars can be used by this bacterium as carbon and energy sources (212).
D. radiodurans is unable to use ammonia as a nitrogen source despite the presence of apparently functional genes for glutamate ammonia ligase and carbamoyl-phosphate synthase, which are key enzymes for ammonia utilization. While there is currently no explanation for this, it has been shown that D. radiodurans can use amino acids effectively as a nitrogen source and that sulfur-containing amino acids appear to be the most readily utilized form of nitrogen. Notably, D. radiodurans lacks the standard pathways for cysteine and methionine biosynthesis yet is able to produce these amino acids using unidentified biosynthetic pathways when provided with other amino acids (212). The absence of all key enzymes for lysine biosynthesis is another puzzling feature of Deinococcus metabolism since it does not require lysine for growth (212). All of the other standard amino acid pathways appear to be functional. Although a few genes seem to be missing from these pathways, they are also absent in some of the other free-living bacteria, where they probably have been displaced by paralogous or nonhomologous enzymes. Some of the genes for enzymes of arginine metabolism are likely to have been acquired by the common ancestor of the Thermus-Deinococcus group from archaea (see Tables 10 and 11).
Most of the known genes for nucleotide metabolism are present in D. radiodurans. The most conspicuous gap is the absence of purine nucleoside phosphorylase, a key enzyme of purine salvage, which has been found in all free-living organisms investigated. Another noteworthy absence is that of two related enzymes of pyrimidine salvage, cytidine deaminase and dUTPase (important in preventing DNA damage), which are present in most bacteria. As may be the case for absent amino acid biosynthetic genes, there might also be unidentified enzymes that compensate for these pyrimidine salvage activities.
D. radiodurans lacks only one gene from the standard bacterial set of genes coding for enzymes of lipid metabolism, namely, phosphatidylglycerophosphate synthase, which is involved in the biosynthesis of acidic phospholipids. With the exception of the archaeon Methanococcus jannaschii, phosphatidylglycerophosphate synthase has been detected in all organisms with completely sequenced genomes. Its absence in Deinococcus, therefore, is unexpected. Deinococcus encodes multiple copies of several fatty acid biosynthesis genes, of which some could have been transferred horizontally into Deinococcus from distant taxa (Table 1). Consistent with the unusual structure of the peptidoglycan layer in Deinococcus (see above), we identified all essential genes for ornithine metabolism but did not detect several key enzymes for diaminopimelic acid biosynthesis.
Our experimental data show that Deinococcus is capable of de novo biosynthesis of all principal coenzyme components except for nicotinic acid (212). Consistent with this result, we find that genes for several key enzymes of NAD biosynthesis are missing in the genome, which is unusual since this pathway is present in most free-living organisms. Several other conventional pathways for coenzyme biosynthesis are also not complete (Table 1), but, given the ability of Deinococcus to grow in the absence of these coenzymes, it probably encodes functional analogs of these.
The translation apparatus is arguably the most highly conserved and uniform of cellular systems, and D. radiodurans is no exception. It contains a typical bacterial complement of translation machinery components. This general uniformity notwithstanding, there are several unique features in the translation apparatus of Deinococcus that have been revealed both experimentally and by genome analysis. In particular, Deinococcus has a unique repertoire of genes and reactions for the formation of glutaminyl-tRNA and asparaginyl-tRNA. Generally, there are two pathways for the activation of glutamine and asparagine: (i) direct charging of tRNAGln and tRNAAsn by glutaminyl- and asparaginyl-tRNA synthetase (Gln-RS and Asn-RS), respectively, and (ii) transamidation of Glu-tRNAGln and Asp-tRNAAsn by the respective amidotransferases (AdT), Glu-AdT and Asp-AdT (101). Usually, the two pathways and the corresponding genes are not present in the same organism. The transamidation pathway for glutamine is predominant in bacteria and archaea, whereas glutaminyl-tRNA synthetase is typical of eukaryotes and gamma proteobacteria (101). In the case of asparagine, archaea primarily use the transamidation pathway, eukaryotes use the direct pathway, and bacteria have a patchy distribution of both systems. Glu-AdT has been studied in detail; it consists of three subunits encoded by the gatABC genes (45). The nature of Asp-AdT is less clear; it has been suggested that it shares A and C subunits with Glu-AdT whereas the B subunit (the likely determinant of tRNA binding) is unique. D. radiodurans encodes Asn-RS, Gln-RS, and the GatABC proteins (45). A recent genome survey has shown that the two systems also coexist in several members of the proteobacteria (85), but Deinococcus is the only nonproteobacterial species with this combination of asparagine and glutamine activation systems. 
Furthermore, in addition to the intact GatB, Deinococcus encodes a C-terminal domain of this protein that is fused to Gln-RS (Fig. 1). The GatABC complex of D. radiodurans is capable of catalyzing the formation of both Gln-tRNAGln and Asn-tRNAAsn, but in vivo apparently only Asn-tRNAAsn is formed, since the discriminating Glu-RS of Deinococcus does not produce the mischarged Glu-tRNAGln (45). In contrast, Deinococcus encodes two copies of Asp-RS, a typical bacterial discriminating copy and a nondiscriminating copy that probably was acquired from the archaea by horizontal gene transfer (45) (see below). The nondiscriminating Asp-RS produces Asp-tRNAAsn, which serves as the substrate for the GatABC enzyme. It has been suggested that the main role of the Asn-tRNAAsn formation in Deinococcus is the synthesis of asparagine, rather than its incorporation into proteins, since Deinococcus does not encode orthologs of known asparagine synthetases (45). Given that GatB is thought to be the tRNA-binding component of Glu-AdT and Asp-AdT, the C-terminal GatB-related domain in Deinococcus Gln-RS could enhance the specificity of this enzyme for tRNAGln. This domain is missing in other Gln-RSs, but the respective organisms do not encode GatB, which in Deinococcus could compete with Gln-RS for binding tRNAGln.
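The reasoning about which routes to Gln-tRNAGln and Asn-tRNAAsn are available can be summarized in a short Python sketch. It is an illustrative model only and not part of the original analysis: the `Repertoire` class, its field names, and the `charging_routes` function are hypothetical, and the rule that transamidation requires both GatABC and a nondiscriminating synthetase is a simplification of the description above.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class Repertoire:
    """Presence/absence of the enzymes discussed in the text (hypothetical model)."""
    gln_rs: bool                    # glutaminyl-tRNA synthetase
    asn_rs: bool                    # asparaginyl-tRNA synthetase
    gat_abc: bool                   # GatABC amidotransferase
    nondiscriminating_glu_rs: bool  # Glu-RS that can mischarge tRNAGln
    nondiscriminating_asp_rs: bool  # Asp-RS that can mischarge tRNAAsn


def charging_routes(r: Repertoire) -> Dict[str, List[str]]:
    """Return the routes that are possible in principle for each product."""
    routes: Dict[str, List[str]] = {"Gln-tRNAGln": [], "Asn-tRNAAsn": []}
    if r.gln_rs:
        routes["Gln-tRNAGln"].append("direct")
    # Transamidation needs a mischarged substrate as well as the amidotransferase.
    if r.gat_abc and r.nondiscriminating_glu_rs:
        routes["Gln-tRNAGln"].append("transamidation")
    if r.asn_rs:
        routes["Asn-tRNAAsn"].append("direct")
    if r.gat_abc and r.nondiscriminating_asp_rs:
        routes["Asn-tRNAAsn"].append("transamidation")
    return routes


# Repertoire as described in the text: Gln-RS, Asn-RS, and GatABC are present,
# Glu-RS is discriminating, and one of the two Asp-RS copies is nondiscriminating.
deinococcus = Repertoire(
    gln_rs=True,
    asn_rs=True,
    gat_abc=True,
    nondiscriminating_glu_rs=False,
    nondiscriminating_asp_rs=True,
)
print(charging_routes(deinococcus))
```

Under these assumptions the model yields only the direct route for glutamine and both routes for asparagine, which matches the in vivo picture described above.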
The repertoire of aminoacyl-tRNA synthetases (aminoacyl-RSs) in Deinococcus also shows several other peculiarities. In addition to the corresponding functional enzymes, Deinococcus encodes truncated and apparently inactive forms of Glu-RS and Ala-RS, as well as apparently active paralogs of Trp-RS and His-RS. Possible horizontal transfer of these additional enzymes as well as other aminoacyl-RSs from archaea and thermophilic bacteria could be readily examined once more of these organisms are sequenced.
D. radiodurans contains all the typical bacterial genes that comprise the basal DNA replication machinery (Table 2). The number of paralogs and the domain organization of the DNA polymerase III α-subunit are variable in the major bacterial divisions in terms of the presence of an active or inactivated PHP domain, which is predicted to possess phosphatase activity, and the proofreading 3′-5′ exonuclease domain. D. radiodurans encodes a single α-subunit that is most similar to proteobacterial polymerases and does not contain the 3′-5′ exonuclease, which is encoded by a separate gene orthologous to E. coli dnaQ. Unlike the proteobacterial orthologs, however, the Deinococcus polymerase contains an apparently active PHP domain. This appears to represent the ancestral bacterial state of the replicative DNA polymerase, which is also seen in bacteria like Synechocystis and Aquifex. In addition to typical proteins involved in replication, Deinococcus encodes DNA polymerase X, which is similar to the eukaryotic DNA polymerase beta (references 27 and 217 and references therein), and is relatively uncommon in prokaryotes. Deinococcus polymerase X contains an N-terminal nucleotidyltransferase domain and a C-terminal PHP hydrolase domain, the same domain architecture that is seen in homologs from B. subtilis and Methanobacterium thermoautotrophicum; this conservation of domain organization suggests horizontal transfer of the polymerase X gene (13). Notably, along with a few other bacteria, such as Synechocystis and Aquifex, Deinococcus encodes three small nucleotidyltransferases (DR1806, DR0679, and DR0248), which are expanded in archaea (13). These “minimal” nucleotidyltransferases are typically accompanied by a small protein that is fused to the nucleotidyltransferase in the DR0248 protein; the function of this protein, however, has not been characterized directly but is likely to be coupled to that of the nucleotidyltransferases.
The repertoire of DNA-associated proteins in Deinococcus is similar to that in other bacteria, but some unique features were noticed. Like other bacteria, Deinococcus encodes an ortholog of the chromosomal DNA-binding protein HU, which is believed to play a central role in DNA packaging and also as a cofactor in recombination (reference 184 and references therein). Interestingly, the sequenced genome of the Deinococcus R1 strain contains three adjacent open reading frames (ORFs) encoding fragments of the single-stranded DNA-binding protein (SSB) but lacks a complete gene for SSB; so far, all sequenced bacterial genomes encoded an intact SSB. Because of the 10-fold coverage during the TIGR sequencing project (218), two sequencing errors in this short gene would seem unlikely. Two explanations arise: (i) Deinococcus could encode an as yet unrecognized SSB analog (or an extremely diverged homolog), making the SSB gene expendable; or (ii) a tripartite SSB gene could be expressed by a translational readthrough mechanism or even a unique RNA-editing mechanism.
Bacterial DNA repair includes several partially redundant pathways and generally shows considerable flexibility (20, 60, 70). We investigated the predicted repair system components of D. radiodurans in detail, to detect any possible correlation with its exceptional radioresistant and desiccation-resistant phenotype. Generally, it appears that Deinococcus possesses a typical bacterial system for DNA repair and that, commensurate with the genome size, its repair pathways even appear to be less complex and diverse than those of bacteria with larger genomes, such as E. coli and B. subtilis. At the same time, there are several interesting and unusual aspects of the predicted layout of the repair systems in Deinococcus that may be linked to its phenotype (Table 2).
The nucleotide excision repair system that consists of the UvrABC excinuclease and the UvrD and Mfd (transcription-repair coupling factor) helicases is fully represented in D. radiodurans. Also present are the main components of the base excision repair system including several nucleotide glycosylases and endonucleases, namely, MutM (formamidopyrimidine and 8-oxoguanine DNA glycosylase); MutY (8-oxoguanine DNA glycosylase and apurinic DNA endonuclease-lyase); two paralogous uracil DNA glycosylases (Ung homologs); an additional, recently identified enzyme that has the same activity but is unrelated to Ung (DR1751) (174); endonucleases III (Nth) and V (YjaF); and exonuclease III (XthA). Deinococcus lacks two key enzymes involved in the repair of UV-damaged DNA in other organisms, namely, endonuclease IV (AP-endonuclease) and photo-lyase. Instead, it encodes a typical bacterial UV endonuclease III (thymine glycol-DNA glycosylase) and, more unexpectedly, a TIM-barrel fold nuclease characteristic of eukaryotes and most closely related to the UV endonuclease of Neurospora (20, 223). Eukaryotic-type topoisomerase IB is a truly unexpected protein to be identified in the Deinococcus genome and also could play a role in UV resistance (see “Horizontal gene transfer” below).
The repertoire of recombinational repair genes in Deinococcus includes orthologs of most of the E. coli genes involved in this process (Table 2), but the RecBCD recombinase is missing. While this complex is not universal in bacteria, it is a major component of recombination systems in most free-living species. In Deinococcus, where recombination is thought to be an important contributor to damage-resistance, the absence of this ATP-dependent exonuclease is unexpected. Deinococcus does encode an apparent ortholog of one of the helicase-related subunits of this complex, RecD, but not the other subunits. The RecD protein in Deinococcus is unusual in that it contains an N-terminal region of about 200 amino acid residues that consists of three tandem predicted HhH DNA-binding domains; this unusual domain organization of the RecD protein is shared with B. subtilis and Chlamydia. Such dissociation of RecD from the RecB and RecC subunits is not unique to Deinococcus; “solo” RecD-related proteins are also present in M. jannaschii and in yeast. The function(s) of RecD, once outside the recombinase complex, is unknown.
Another component of the recombinational repair system in Deinococcus that has an unusual domain architecture is the RecQ helicase. It contains three tandem copies of the C-terminal helicase-RNase D (HRD) domain, instead of the single copy present in all other bacteria except Neisseria, which similarly possesses three copies (141) (also see below). RecQ sequences from Neisseria and Deinococcus are more similar to each other than to any other homologs, which, together with the distinctive triplication of the HRD domain, indicates that the recQ gene has been exchanged between bacteria from these two distant lineages. In addition, Deinococcus encodes a protein (DR2444) that contains an HRD domain and a domain homologous to cystathionine gamma-lyase; this is the first example of an HRD domain that is not associated with either a helicase or a nuclease (although it is possible that the domain organization of this protein is an artifact caused by a frameshift). This propagation of the HRD domain in Deinococcus could contribute to the repair phenotype given the interactions of RecQ with RecA in recombination (88).
The methylation-dependent mismatch repair system of D. radiodurans includes the MutS and MutL ATPases and exonuclease VII (XseA). Orthologs of the site-specific methylases Dcm and Dam, which are associated with mismatch repair, are not readily detectable. It appears likely, however, that other distantly related DNA methylases predicted in D. radiodurans could perform similar functions.
Like other bacteria with large genomes, D. radiodurans encodes the LexA repressor-autoprotease (DRA0344), which in E. coli and B. subtilis controls the expression of the SOS regulon. In addition, unlike any of the other bacterial genomes studied, D. radiodurans encodes a second, diverged copy of LexA (DRA0074), which retains the same arrangement of the helix-turn-helix (HTH) DNA-binding domain and the autoprotease domain. Attempts to identify LexA-binding sites and the composition of the putative SOS regulon in D. radiodurans have been unsuccessful (M. S. Gelfand, personal communication). This suggests that D. radiodurans does not possess a functional SOS response system, which is in agreement with the results of previous experimental studies (142). Furthermore, Deinococcus does not encode proteins of the DinP/UmuC family, nonprocessive DNA polymerases that play a critical role in translesion DNA synthesis and associated error-prone repair such as SOS repair in E. coli (117).
In addition to orthologs of well-characterized repair proteins discussed in this section, Deinococcus encodes several unusual proteins and expanded protein families that are less confidently associated with repair but might contribute to the unusual effectiveness of the repair and recombination systems in this bacterium; these proteins are discussed below in the section on the unique features of the Deinococcus proteome.
D. radiodurans encodes a broad spectrum of proteins that have been associated with various forms of stress response in other bacteria as well as several proteins that appear to be unique and could contribute to more specific forms of the stress response (Table 3). Orthologs of almost all known genes involved in different stress responses in other bacteria (109) are present in Deinococcus. The few stress response proteins that are missing are either specific to the adaptation of a particular organism to its environment or, when of more general significance, likely to be replaced by nonorthologous proteins with similar functions. For example, instead of using the OtsA and OtsB proteins for the synthesis of the osmoprotection disaccharide trehalose, Deinococcus probably uses an alternative pathway via trehalose synthase (DR0933), which has been recently characterized in Thermus (209). Trehalose plays a major role in the desiccation resistance of E. coli (216) and is also likely to be important in Deinococcus. Deinococcus has two additional genes for trehalose metabolism: maltooligosyl trehalose synthase (DR0463), which provides yet another route of trehalose formation, and trehalohydrolase (DR0464). These genes apparently form a mobile operon and probably have been acquired by Deinococcus through horizontal transfer, since their closest homologs are found in Rhizobium, where they appear to have the same operon organization (130).
Among the proteins associated with oxidative stress response, Deinococcus encodes three catalases (DR1998, DRA0259, and DRA0146), two of which are highly similar to one another and to catalases from other bacteria whereas the third is only distantly related to other catalases. The gene for this unusual predicted catalase (DRA0146) is closely linked to and probably forms an operon with a gene for a peroxidase (DRA0145). DRA0146 is most similar to its ortholog from Rhizobium, and these two proteins are, in turn, more closely related to eukaryotic catalases from plants than to bacterial catalases. This suggests that Deinococcus acquired the gene for this catalase from a nitrogen-fixing bacterium, which, in turn, had hijacked it from a plant. In contrast, DRA0145 is distinctly closer to certain peroxidases from fungi, such as Galactomyces geotrichum, than to bacterial forms from Neisseria, E. coli, and actinomycetes. Thus, the entire operon probably has been acquired horizontally. A broad spectrum of other genes that may be involved in the stress response include DRA0149 (agmatinase), DR1353 (an acid-inducible apolipoprotein amino-acetyltransferase), and DR2299, DR1605, and DR2245 (genes of the two-component response and cyclic diguanylate signaling system), which again are very similar to homologs from the family Rhizobiaceae, suggesting significant horizontal gene transfer between these distant bacteria.
In addition to the well-characterized components of stress response systems, Deinococcus encodes several proteins and entire protein families whose specific roles are unknown but are likely to be important for the multiple stress resistance phenotypes of the bacterium. An example of a poorly studied but potentially important system is the “addiction module” response (2), which is encoded by two genes, mazE and mazF (DR0416 and DR0417, respectively). MazF is a stable protein that is toxic to bacteria, whereas MazE protects cells from the toxic effect of MazF and is degraded by the ClpP serine protease. Expression of these two genes is regulated by ppGpp, which is produced by the RelA enzyme (or the bifunctional enzyme SpoT) in response to amino acid starvation. On the basis of these studies, Aizenman et al. (2) have proposed a model of programmed bacterial cell death dependent on the MazEF proteins. Currently, Deinococcus is the only bacterium other than E. coli, the model system in which the role of these proteins was elucidated, that has both genes and retains their operon organization. Another example of poorly characterized genes that are likely to be involved in stress response are two proteins (DR2056 and DR1940) that are homologous to the E. coli heat shock protein HslJ (42). One of these proteins, DR1940, contains three copies of the HslJ domain, a feature that has not yet been seen in this protein family. All the HslJ domains contain two conserved cysteines that could function as a redox pair, with the protein itself being a disulfide bond chaperone. The only prominent chaperone that is missing without an obvious replacement is HSP90, but this gene is also absent in archaea and bacterial thermophiles and therefore appears to be nonessential.
The signal transduction system of D. radiodurans has chimeric features of prokaryotic and eukaryotic systems. This form of chimerism in the signaling system is becoming increasingly evident in several bacterial lineages such as actinomycetes, myxobacteria, and spore-forming firmicutes that undergo cellular differentiation. The typically bacterial components of the signaling system include the two-component systems with the histidine kinase and receiver domains (159) and the cyclic diguanylate signaling system with the GGDEF, EAL, and HD-GYP domains, which appear to function as cyclases and phosphodiesterases (75). In addition, these signaling domains are typically combined with small molecule and protein-binding domains, such as PAS and GAF (17, 203), and the conformation-signaling HAMP domain (16). The two-component phosphorelay system is well developed in Deinococcus, which encodes 23 histidine kinase domains and 29 receiver domains that form several combinations with the GAF and PAS domains. This system is expected to play a major role in sensing redox, light, and other environmental stimuli. Consistent with this, DRA0050, which is orthologous to the cyanobacterial and plant phytochromes, has been shown to be a photoreceptor involved in the regulation of pigment biosynthesis (55), which is likely to affect resistance to DNA-damaging agents (35). Genes encoding two proteins that consist of a sensory transduction histidine kinase and a receiver domain (DRB0028 and DRB0029) appear to be coregulated with a σB operon (DRB0024 to DRB0027). This operon encodes the antisigma factor-regulatory system and is known to be involved in stress response in other bacteria (92, 109). As a whole, this array of six genes appears to comprise a stress response module unique for Deinococcus.
Deinococcus encodes 16 GGDEF domain-containing proteins, which suggests a major role for this uniquely bacterial module that is predicted to function as a cyclase in diguanylate signaling. The two predicted distinct phosphodiesterases of this system, the HD-GYP and EAL domains (six and four copies, respectively, in Deinococcus), complement each other in terms of their copy numbers, as has been observed for other bacterial genomes. These domains tend to combine with the stimulus-sensing PAS and GAF domains. One such interesting architecture is the combination of the GAF domain and the HD-GYP domain in two Deinococcus proteins (Fig. 2). The representation of this signaling system in Deinococcus is comparable to that in other bacteria with moderate-sized to large genomes.
While Deinococcus lacks flagella and is unlikely to be capable of chemotactic motility, it possesses certain remnants of the chemotactic signaling system that are likely to signal through alternative pathways. In particular, there are three methyl-accepting chemotactic receptor proteins (DRA0352, DRA0353, and DRA0354), each containing two HAMP domains, but there is no methyltransferase of the chemotactic signaling pathway. These three proteins are encoded by genes located in the vicinity of genes for a CheA-like histidine kinase and a CheY-like receiver domain, which suggests that the methyl-accepting receptor forms a single functional unit with this two-component system protein. Given the apparent absence of chemotaxis, the methyl-accepting receptors could form a scaffold for binding of the CheA kinase, which might signal the availability of amino acids in the environment.
The tetratricopeptide repeats (TPR) seem to play a special role in Deinococcus signaling. In three distinct proteins, these repeats are combined with typically bacterial signaling modules (Fig. 2). The TPR modules are likely to mediate protein-protein interactions within molecular complexes involving these proteins, as documented in eukaryotic systems (113). WD40 proteins, which often serve as interaction partners to TPR in eukaryotes (210), are also expanded in Deinococcus and could cooperate with the TPR-containing proteins. Of particular interest is another group of at least four β-propeller proteins that appear to be closer to the YWTD class of propellers than to WD40s (DR0960, DR1725, DR2062, and DR2484). In actinomycetes, these propeller domains are fused to protein kinases and are likely to perform specific protein-protein interaction functions in signaling (163).
The prominence of the “eukaryotic” component of the signal transduction systems in Deinococcus is underscored by the fact that it encodes 11 Pkn2-type kinases and 1 kinase of the RIO1 family (DR2209), which is typical of archaea and eukaryotes (121) and was detected in bacteria for the first time. This number is greater than in most other prokaryotes (121), suggesting that protein-serine/threonine phosphorylation-dependent regulatory pathways play a major role in Deinococcus. Consistent with this, Deinococcus also encodes PP2C phosphatases and an FHA domain, which typically function in conjunction with the serine/threonine kinases.
Several protein families that have been implicated in stress response and signal transduction in other organisms have undergone specific expansion in Deinococcus; these are discussed in some detail below.
Generally, the genome organization of D. radiodurans is similar to that of other bacteria (218). Many functionally related genes are organized into clusters that are likely to comprise operons, including such common ones as ribosomal protein genes, ATP synthase, NADH dehydrogenase, and various ATP-binding cassette (ABC)-type transport systems. Beyond these generic operons, however, several unusual gene clusters were detected, and some of these are likely to be related to the unique features of Deinococcus (Table 4).
The first group of such unique gene arrays includes paralogous genes that encode protein families overrepresented in Deinococcus, such as amino-acetyltransferases, Nudix hydrolases, and genes of the TerE and DinB/YfiT families (see below). Some of these clusters appear to have evolved by tandem duplication within the Deinococcus lineage, e.g., an acetyltransferase cluster (DR2254 and DR2255) and a Nudix cluster (DR0783 and DR0784). Other clusters of paralogs clearly resulted from a single horizontal transfer event, e.g., the group of tellurium resistance genes (DR2220 to DR2226) that are related to the corresponding gene cluster on the broad-host-range plasmid R478. Finally, some clusters that consist of related genes with apparent phylogenetic affinities to different bacterial lineages (e.g., an acetyltransferase cluster [DR0675 to DR0677]) seem to have originated within the Deinococcus lineage through gene translocation. The second group of unusual predicted operons includes rare gene clusters that probably were acquired by horizontal transfer. Some of these operons could contribute to damage resistance, e.g., DNA repair-related functions (deoxypurine kinase operon [DR0298 and DR0299], eukaryotic-type uracil-DNA-glycosylase and topoisomerase IB [DR0689 and DR0690]), DNA transformation-related functions (competence genes [DR1854 and DR1855], restriction-modification system [DRB0143 and DRB0144]), stress response (DR0389 and DR0390; DR1160 and DR1161), and pigment biosynthesis (DR0861 and DR0862).
Two operons (DR0853 to DR0854 and DR2180 to DR2181) each consist of a gene for a small GTPase of the Ras/Rab family and a gene coding for a small protein of an uncharacterized family that is widespread in bacteria and archaea (L. Aravind and E. V. Koonin, unpublished data). The orthologous GTPase in Myxococcus is important for gliding motility (90), suggesting a role for these proteins in signaling. Expansion of the uncharacterized protein family encoded by the genes adjacent to the GTPase is seen in Streptomyces and Deinococcus and appears to result from relatively recent duplications (DR0616, DR0995, and DR1612), with three of these genes forming a cluster in the chromosome (DR0993 to DR0995). Juxtaposition of these genes with genes for Ras/Rab-GTPases is frequently observed in other genomes, including Myxococcus and archaeal and bacterial thermophiles, suggesting that they form a mobile operon, with the encoded proteins being functionally coupled.
Another predicted operon (DR0332 to DR0335) that could have been horizontally transferred from cyanobacteria encodes components of a protein kinase-dependent regulatory pathway. These include two active Pkn2-type serine/threonine protein kinases with Zn ribbons, a PP2C-type phosphatase with an N-terminally disrupted Pkn2 kinase domain, and a protein that contains a phosphoserine-binding FHA domain combined with a Zn ribbon domain orthologous to proteins from cyanobacteria (FraH) and actinomycetes (121). The phosphorylation system encoded by this operon may play a role in cellular differentiation, with the Zn-ribbon-FHA protein functioning as the downstream effector that regulates transcription.
The general picture of transcription regulation in Deinococcus emerging from genome analysis is similar to that seen in other bacteria. Among Deinococcus gene products, we detected 104 HTH domain-containing proteins that are predicted to function as transcriptional regulators. This number is close to those detected in other free-living bacteria with similar genome sizes (14); the repertoire of HTH-containing proteins identified in Deinococcus covers most of the diversity of prokaryotic transcriptional regulators. Deinococcus encodes seven members of the MerR/SoxR family of regulators (a greater number than in other characterized bacteria except B. subtilis), which could participate in the regulation of various stress response pathways (24, 155). Another family of predicted HTH regulators of unknown specificity that is expanded in Deinococcus consists of eight paralogs (e.g., DR1954); such an expansion is unprecedented in other bacteria and suggests a unique role in the regulation of a distinct set of genes.
Expansion of specific protein families has been observed for several complete genomes (43, 126, 194). Sometimes there is a clear relationship between the expansion of a particular protein family and the adaptation of the respective organism to its environment. Examples of such adaptive expansions include ferredoxins in autotrophic archaea (126), several families of enzymes involved in lipid degradation in M. tuberculosis (43), and c-type cytochromes in the metal-reducing bacteria Shewanella (148).
In the D. radiodurans genome, we detected several expansions, some of which appear to be related to stress response and damage control (Fig. (Fig.3).3). In particular, several different families of hydrolases are overrepresented compared to other sequenced genomes. These include MutT-like pyrophosphatases (Nudix), calcineurin-like phosphoesterases, lipase/epoxidase-like (α/β) hydrolases, subtilisin-like proteases, and sugar deacetylases. In addition to such specifically expanded families, several other families of hydrolases are present in Deinococcus in elevated numbers although they are also common in other bacteria and are not shown here. Some of these hydrolases are likely to be involved in the decomposition of damage products (“cell cleaning”) under stress conditions. Independent expansions of certain families, such as α/β hydrolases in Deinococcus and Mycobacterium and subtilisin-like proteases in Deinococcus and Bacillus, are noteworthy and probably correlate with the adaptation of these organisms to the facultative or obligatory heterotrophic life-style (43, 116).
Expansion of the Nudix hydrolase protein superfamily is one of the most prominent features of the Deinococcus genome. The MutT protein, the prototype for this superfamily, has been identified as the central component of an antimutagenic system responsible for preventing incorporation of 8-oxo-dGTP into DNA (136). Subsequently, it has been shown that different MutT-like enzymes use a variety of substrates, and the Nudix pyrophosphohydrolases have been tentatively defined as a superfamily of “house-cleaning” enzymes that destroy potentially deleterious compounds (28). A detailed analysis of Nudix proteins in Deinococcus revealed five distinct multidomain proteins, in which the MutT domain is combined with other domains (Fig. (Fig.4).4). Orthologous proteins for three of them also exist in other bacteria. In particular, the family typified by E. coli YjaD contains a Zn ribbon module, which is probably involved in nucleic acid binding. Another Deinococcus protein contains an apparently inactivated (with the catalytic motif REXXEE missing) MutT domain combined with a TagD-like nucleotidyltransferase domain and is likely to perform a regulatory function. A second TagD-like nucleotidyltransferase from Deinococcus (DRA0273) is very similar, but the MutT domain has apparently eroded beyond recognition. Orthologs of a third Nudix protein, which contains an uncharacterized C-terminal domain, are present in Streptomyces, Mycobacterium, and Synechocystis. Again, in most of them, the Nudix pyrophosphohydrolase appears to be inactivated, suggesting a regulatory function.
Two closely related Deinococcus proteins contain a duplication of the MutT domain that has not yet been detected in any other organism. Three more Nudix proteins are specifically related to the proteins containing this duplication, and the genes for two of these are adjacent on the chromosome (DR0783 and DR0784). These seven related MutT domains appear to form a Deinococcus-specific family of Nudix hydrolases. Another Nudix protein consists of three domains, namely, S-adenosylmethionine (SAM)-dependent methylase, MutT, and cytosine deaminase (Fig. 4). This domain combination is unique to Deinococcus and suggests that the protein is involved in an as yet uncharacterized repair pathway.
Altogether, Deinococcus encodes 23 Nudix superfamily proteins that contain 25 individual MutT domains. Some of these proteins are likely to be repair enzymes with known activities, including the MutT ortholog (DR0261), while others will have novel functions, as suggested by the domain combinations discussed above. Other functions are likely to include utilization of damage products formed under various stress conditions. It is unlikely that a distant ancestor of the Deinococcus lineage encoded all these MutT-containing proteins. Rather, it appears that the heterogeneous collection of these proteins encoded by D. radiodurans was assembled via the mixed routes of serial duplication, particularly in the distinct deinococcal family of seven Nudix domains, and horizontal gene transfer.
Amino group acetyltransferases comprise another family that appears to have undergone independent expansion in Deinococcus and in Bacillus. Acetyltransferases of this type participate in various metabolic pathways, including lipid biosynthesis, and in regulatory systems. Except for B. subtilis, other bacteria have fewer than half as many of these enzymes as D. radiodurans. Like the acetylases in other bacteria, these enzymes are likely to participate in detoxification of antibiotics and possibly of toxic products that arise upon DNA damage, as well as in regulatory protein acetylation. A Deinococcus-specific family of acetyltransferases, which consists of at least 11 proteins, is most similar to acetyltransferases involved in peptide antibiotic resistance, such as streptothricin acetyltransferase of Streptomyces (98). These acetyltransferases might aid the survival of Deinococcus in the presence of peptide antibiotics secreted by other bacteria, with which it has to compete for nitrogen and carbon sources as a part of its heterotrophic life-style.
Enzymes of the α/β hydrolase superfamily are mainly neutral lipases or acetyl esterases, but some of them have unusual substrate specificity, e.g., heroin esterase from Rhodococcus (169) and antibiotic bialaphos acetyl esterase from Streptomyces (167); other proteins of this superfamily possess unexpected activities, e.g., metal ion-free oxidoreductase from Streptomyces (91). The expanded families of α/β hydrolases in Deinococcus could be exploited for xenobiotic metabolism and/or the biogenesis of the complex cell envelopes (see above).
In several cases, expansion of specific subfamilies within common protein families appears to be important. Deinococcus encodes three paralogous proteins (DR0202, DR0494, and DR2273) related to the FlaR protein from gram-positive bacteria. One of these proteins has been shown to affect DNA topology and is osmoregulated when expressed in E. coli (173). It also influences the expression of supercoiling-sensitive promoters and is considered to be a chromatin-associated protein (173). Topological changes of DNA could play a role in DNA repair of Deinococcus, and the FlaR homologs might be involved in these processes. The FlaR subfamily belongs to the P-loop-containing kinase superfamily that includes nucleotide, gluconate, and shikimate kinases (224). Deinococcus encodes three paralogous proteins (DR0609, DR2467, and DR2139) that belong to another uncharacterized subfamily of these kinases which is also represented in several other bacteria.
Another interesting case is the LigT protein family, which is found in several bacteria, archaea, and eukaryotes and includes RNA ligases and predicted 2′,5′-cyclic nucleotide phosphodiesterases. In addition to the LigT ortholog (DR2339), Deinococcus encodes two predicted phosphodiesterases of this family (DR1000 and DR1814) that may participate in RNA metabolism or signaling.
Expansion of several other protein families is consistent with the unusual stress resistance capabilities of D. radiodurans. For example, Deinococcus encodes seven small nuclease domains related to the McrA endonuclease of E. coli (94). The McrA-like nuclease domain is part of three multidomain protein architectures that seem to be unique to Deinococcus (see below). This previously unreported propagation of McrA-like nucleases could make a contribution to the repair potential of Deinococcus. In evolutionary terms, the McrA domain, like the MutT domain, apparently has been expanded in Deinococcus through a recent duplication (DR1312 and DR2483 are 50% identical), as well as through acquisition of genes by horizontal gene transfer.
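A figure such as the 50% identity quoted for DR1312 and DR2483 depends on how identity is counted. The sketch below shows one common convention, assumed here for illustration only: identical residues divided by the number of alignment columns in which neither sequence has a gap. The text does not state which convention was used, and the function name and toy sequences are hypothetical.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent identity over columns where neither aligned sequence has a gap.

    Assumption: both inputs come from the same pairwise alignment, so they
    have equal length and use '-' for gaps.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    compared = 0
    matches = 0
    for a, b in zip(aligned_a.upper(), aligned_b.upper()):
        if a == "-" or b == "-":
            continue  # skip gapped columns
        compared += 1
        if a == b:
            matches += 1
    if compared == 0:
        return 0.0
    return 100.0 * matches / compared


if __name__ == "__main__":
    # Hypothetical toy alignment: 3 identical residues in 6 ungapped columns.
    print(percent_identity("MKTA-YL", "MRTGWYV"))
```

Other conventions (for example, dividing by the length of the shorter sequence) give different values for the same alignment, so the denominator should be reported along with the percentage.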
Expansion of proteins of the TerDEXZ/CABP family in Deinococcus is interesting because some of these proteins could confer resistance to a variety of DNA-damaging agents, including heavy-metal cations, methyl methanesulfonate, mitomycin C and UV (21, 103), and other forms of stress (11). Two members of this family, CABP1 and CABP2, are expressed during starvation in Dictyostelium and form a heterodimer that binds cyclic AMP (cAMP) (78), suggesting that other members of the family also bind various small-molecule ligands.
Deinococcus encodes the largest number of the pathogenesis-related 1 (PR1) family proteins (five members) among bacteria. These secreted proteins are widespread in eukaryotes but sporadic in bacteria (195); unlike the eukaryotic members of this family, the bacterial PR1-related proteins lack the disulfide bond-forming cysteines (68). Since they are predicted to be secreted, the bacterial PR1 family proteins might play a role in inhibiting extracellular enzymes or in interacting with other cells, as suggested by the known activities of their eukaryotic homologs (106).
The second largest protein expansion in Deinococcus is the family of uncharacterized small proteins whose prototype is B. subtilis DinB, a DNA damage-inducible gene product (39). Among bacteria, Deinococcus encodes the greatest number of these proteins, although comparable independent expansions are seen in B. subtilis and the actinomycetes (Fig. 3). Examination of the multiple alignment of this family (Fig. 5) reveals three conserved histidines that could form a catalytic triad of a novel metal-dependent enzyme, perhaps a hydrolase. The prediction of enzymatic activity of these proteins raises the possibility that they could be nucleases directly involved in DNA degradation, which begins in Deinococcus immediately after DNA damage (23, 211). This protein family may be particularly amenable to experimental studies, given its expansion in B. subtilis, a model for many DNA repair studies.
Several families of Deinococcus proteins are highly diverged and, in the initial analysis, appeared to have no homologs in other species. Database searches with individual sequences of these proteins failed to show statistically significant similarity to any proteins other than their paralogs from Deinococcus. Only profiles that included information on all of the paralogs (see the description of methods above) allowed the identification of homologs from other organisms. An example of such a family is a distinct group of six HTH-containing DNA-binding proteins predicted to function as transcriptional regulators. This family is of particular interest because at least one of its members (DR0171, or IrrI) appears to be associated with radiation resistance (22).
About 720 proteins encoded in the D. radiodurans genome have no detectable homologs in the current databases. Most of these are predicted membrane or nonglobular proteins that tend to evolve rapidly, and this impedes the detection of sequence similarity. Nevertheless, we identified 26 families with at least two members each that appear to be Deinococcus specific (Table 5). Some of these families have conserved sequence and structural features that, in spite of the absence of significant overall similarity to any other proteins, are reminiscent of well-characterized domains. For example, the DR2457-like and DR2241-like families contain pairs of conserved cysteines that resemble Zn ribbons present in different enzymes and nucleic acid-binding proteins. Several other uncharacterized globular proteins in Deinococcus (e.g., DR1088 and DR1486) also contain such cysteine pairs, which suggests metal binding or perhaps nucleic acid binding. An intriguing possibility is that, similarly to better-characterized families, these unique protein families have emerged as a result of adaptation and may be involved in novel mechanisms of DNA repair or stress response specific to Deinococcus.
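Paired cysteines of the kind described here can be flagged with a simple pattern scan. The following Python sketch is only an illustration: it assumes that each pair has the form C-x-x-C and that the two pairs are separated by 5 to 40 residues. Neither spacing is specified in the text, and a real analysis would rely on conservation across an alignment of family members. The names and the toy sequence are hypothetical.

```python
import re

# Assumption: two C-x-x-C pairs separated by 5 to 40 residues.
# These spacings are illustrative choices, not values from the analysis.
ZN_RIBBON_LIKE = re.compile(r"C..C.{5,40}?C..C")


def find_cysteine_pairs(protein_seq: str):
    """Return (start_index, matched_segment) for each Zn-ribbon-like hit."""
    seq = protein_seq.upper()
    return [(m.start(), m.group()) for m in ZN_RIBBON_LIKE.finditer(seq)]


if __name__ == "__main__":
    # Hypothetical toy sequence with one CxxC...CxxC arrangement.
    toy = "MAKCPVCGSEKTVRLDDGSYRCEACGHEF"
    for start, segment in find_cysteine_pairs(toy):
        print(start, segment)
```

A hit from such a scan suggests only that metal binding is possible; short cysteine patterns occur by chance, so conservation among paralogs is the stronger evidence.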
Combining several domains into one protein may give rise to novel protein functions, enhance the cooperation between existing functionally linked protein activities, facilitate regulation, and/or result in modification of substrate specificity (74, 128, 192). Thus, it seems reasonable to assume that, like expansion of paralogous families, unique domain architectures are lineage-specific adaptations to a particular life-style. The D. radiodurans genome encodes over 20 multidomain proteins with unusual domain combinations that have not been detected in other species (Fig. 1). The two phenomena appear to be linked since there are several examples where the unusual domain architectures are present in members of expanded protein families. The unique combination of the Nudix hydrolase domain with a methyltransferase and a cytosine deaminase has already been described. The McrA-like nuclease domain is a part of three unusual domain arrangements, at least two of which are suggestive of repair functions (Fig. 1). A particularly good example of such a functionally interpretable association is the DRA0131 protein, where the endonuclease domain is combined with a RAD25-like helicase, which in eukaryotes is involved in nucleotide excision repair of UV-damaged DNA (84, 196). In DR1533, an McrA-like endonuclease is linked to a SAD domain, which so far has been detected only in eukaryotic chromatin-associated proteins (71). A third protein, DRA0057, also contains a domain of the TerDEXZ/CABP family (see above) and is likely to be related to stress response.
One of the Deinococcus α/β hydrolases is fused to a flavin-containing monooxygenase domain, also a unique domain configuration (Fig. 1). The well-established role of flavin-containing monooxygenases in xenobiotic transformation and oxygen reactivity (177) strengthens the hypothesis that the two domains function together in the metabolism of some environmental compound or secondary metabolite. There are other such fusions that point to potential novel metabolic functions. For example, the DRA0304 protein contains a metallo-β-lactamase-like domain fused to a C-terminal rhodanese-like domain. Proteins that consist of a single rhodanese-like domain are involved in different forms of stress response. For example, E. coli PspE (phage shock protein E) is induced in response to heat, ethanol, osmotic shock, and phage infection; din1 and sen1 proteins from plants are dark inducible and senescence associated, and the 67B2 protein of Drosophila is also heat shock inducible (96, 109). Proteins with the same domain composition but with the order of the domains reversed are encoded in the gas vesicle plasmid of the archaeon Halobacterium halobium (154) (GenBank ID number [GI], 2822321 and 2822327), which suggests that the hydrolase domain and rhodanese-like domain cooperate in their chaperone or metabolic functions. Another unique domain fusion (DR1207) with a possible role in the metabolism of some amino group-containing compounds includes a cytosine deaminase domain and a PP-loop ATPase similar to the cell cycle protein MesJ.
DRB0098 contains a phosphatase domain and a polynucleotide kinase domain and is another example of the independent origin of multidomain proteins with analogous domain architectures in distant taxa. Proteins combining these (predicted) enzymatic activities have been found only in Deinococcus, bacteriophage T4, and some eukaryotes, including humans, Caenorhabditis elegans, and Schizosaccharomyces pombe. The phosphatase domain of the phage T4 and eukaryotic proteins belongs to the haloacid dehalogenase superfamily (12, 110), whereas the one from Deinococcus belongs to the HD hydrolase superfamily (15). By analogy to the eukaryotic proteins that function in DNA repair following ionizing radiation and oxidative damage (102), the deinococcal enzyme may be implicated in a similar process.
Most of the HTH-containing proteins predicted to function as transcriptional regulators in Deinococcus share the domain architecture with their bacterial and archaeal homologs. However, one of these proteins, DR2199, has an unusual combination of domains (Fig. 1). In addition to a C-terminal HTH domain, this protein contains (i) a distinct N-terminal domain homologous to the eukaryotic developmental regulator schlafen (180) and to several uncharacterized bacterial and archaeal proteins and (ii) another, uncharacterized domain shared with several bacterial and archaeal proteins. The unusual domain architecture of DR2199 is conserved in two proteins from the archaeon Pyrococcus abyssi (GI, 5459605 and 5458925).
Numerous recent observations support the notion that horizontal gene transfer has played a major role in the evolution of bacteria and archaea (18, 57, 153). Deinococcus is no exception to this trend since it apparently acquired a significant number of genes by horizontal transfer from various sources. The most notable of these genes are listed in Table 6. Several genes found in Deinococcus previously have been detected only in eukaryotes and/or archaea. One of these encodes topoisomerase IB, an enzyme that is highly characteristic of eukaryotes and is present in D. radiodurans in addition to the typical bacterial topoisomerases IA and II. The recent demonstration of a structural and mechanistic relationship between topoisomerase IB and site-specific recombinases (38) makes a role in recombination plausible for the D. radiodurans enzyme. A knockout mutant with this gene deleted is substantially more sensitive to UV (254 nm) but not to ionizing radiation than is the wild type (Daly et al., unpublished). Notably, in the Deinococcus genome, the gene for topoisomerase IB is adjacent to a gene that encodes uracil-DNA glycosylase with a clear eukaryotic phylogenetic affinity. It appears likely that the two genes were simultaneously transferred from a eukaryotic source, possibly a large DNA virus because both enzymes are encoded by poxviruses (182), although a virus in which these genes were adjacent has not yet been detected.
Another typical eukaryotic protein encoded by Deinococcus is a highly conserved ortholog of the eukaryotic RNA-binding protein Ro. This protein has a distinct RNA-binding domain that is shared with the RNA-binding subunits of eukaryotic telomerases such as TP-1 and p80. In eukaryotes, Ro binds specific small RNA molecules (Y RNAs) of ribonucleoprotein particles that are found both in the cytoplasm and in the nucleus (79) and has been proposed to play a role in the “quality control” of large-scale 5S rRNA biosynthesis (79). In Deinococcus, the Ro ortholog is involved in the regulation of UV repair, in which the eukaryote-type topoisomerase IB is believed to participate. It binds to several small RNAs analogous to the Y RNAs that are encoded by genes upstream of the Ro gene (37). Interestingly, an independent transfer of another eukaryotic member of this family, related to the telomerase RNA-binding subunit, into the genome of Streptomyces suggests a more widespread acquisition of Ro-like RNA-binding proteins by bacteria (L. Aravind, unpublished data).
The gene for a predicted protein kinase of the RIO1 family, which previously has been detected in archaea and eukaryotes but not in bacteria (121), also appears to have been transferred into the genome of Deinococcus.
Four Deinococcus proteins whose plant homologs are induced by desiccation are of particular interest; this is the first report of bacterial homologs of plant desiccation resistance-associated proteins. The DR1372 protein belongs to the Lea-14 (late embryogenesis abundant) family of group 4 of LEA proteins, one of the best-studied plant desiccation response-associated protein families (73, 124, 225, 226). Using iterative database searching, we detected additional homologs of LEA14-like proteins in many archaeal species (Fig. 6). In plants, these proteins are cytosolic. However, DR1372 and some of the archaeal homologs contain a signal peptide, and this suggests that in Deinococcus and in archaea, the subcellular localization of these proteins could be different.
The Lea-76 family belongs to group 3 of LEA proteins, which also are well-characterized and widespread desiccation-induced proteins in plants (46, 58, 99, 140). The main sequence feature in these proteins is a tandem repeat of a distinct 11-mer motif, in which the amino acids at positions 1, 2, 5, and 9 are nonpolar and the rest are charged or amide residues (e.g., AAQKTKDYASD in the Lea-76 protein from soybean; GI, 421875) (58). Besides plants, at least two proteins of this family are present in the nematode C. elegans (GI, 2353333 and 3924824). This motif is conserved in two Deinococcus proteins, DR0105 and DR1172, which show significant similarity to Lea-76 proteins. More generally, several other families of late embryogenesis-abundant and/or water stress resistance-related proteins are rich in repeats and/or have biased amino acid composition (62, 97, 185), complicating the identification of homologs. Therefore, it is possible that some as yet uncharacterized Deinococcus proteins containing compositionally biased sequences are also relevant to desiccation resistance. The DRB0118 protein is a homolog of a desiccation-related protein from Craterostigma plantagineum (GI, 118622), an extremely desiccation-resistant plant from the Asteridae class. In this plant, several water stress response proteins have been identified (161), with the protein homologous to DRB0118 being the only one that has no homologs in other plants. A positive correlation between resistance to desiccation and radioresistance has recently been established by examining a series of D. radiodurans radiosensitive mutants for desiccation resistance (135). It is possible, therefore, that the homologs of plant desiccation resistance-associated genes have been acquired by Deinococcus via horizontal gene transfer; the products of these genes may be generally important to the resistance phenotype (135).
Other apparent horizontal transfers to Deinococcus from eukaryotes are not easily interpretable. For example, DR1790 is a highly conserved member of a protein family that includes the yellow protein of Drosophila and royal jelly protein from the honeybee and so far has not been detected outside the insects (4). This seems to point to a rather precise source of this horizontally transferred gene, but the biochemical function of its product is not known. Based on its role in cuticular pigmentation in Drosophila (111), it may be speculated that it could be an enzyme required for the metabolism of certain pigments.
The genome of D. radiodurans contains a number of predicted mobile elements of different classes. These are of particular interest because of the role some of them could play in recombinational repair.
Two inteins, protein splicing elements that are typically inserted in genes involved in DNA metabolism and other nucleotide-utilizing enzymes (162), were identified in D. radiodurans. One of these is inserted in the ribonucleotide reductase and is similar to the inteins inserted in orthologous enzymes from B. subtilis, pyrococci, and Chilo iridescent virus. This intein contains an inserted cro-like HTH domain (14) followed by a homing endonuclease of the LAGLIDADG family (47). The second intein is inserted between the P-loop motif and the Mg2+-binding (Walker B) motif of a SWI2/SNF2 family ATPase, which is involved in chromatin remodeling; this is the first documented instance of an intein interrupting a protein of this family. The most unusual feature of this intein is that it is encoded by two distinct adjacent ORFs (DR1258 and DR1259), each of which also encodes a portion of the ATPase split by the intein. Recently, it has been proposed and then shown experimentally that the split intein in the Synechocystis DNA polymerase III α-subunit assembles from the two separately translated ORFs and splices out to form a fully functional protein (66, 77, 222). A similar protein trans-splicing mechanism is likely to generate an active SWI2/SNF2 ATPase in Deinococcus.
Insertional sequences (ISs) in the D. radiodurans genome were identified during the genome annotation by the presence of ORFs homologous to transposases of several different IS families (34). Several of these ORFs exist in multiple copies. For most of these elements (IS4_DR, TCL9, TCL121, TCL23, IS3_DR, and AXL_DR), the precise length could be determined. All of these elements have the typical features of ISs identified in other species (72). In particular, they contain one or two ORFs that encode a transcriptional regulator and a transposase, as well as inverted terminal repeats and/or internal repeats (data not shown). Three elements (TCL9, TCL121, and TCL23) of the Tc1-mariner family are closely related to each other and are likely to be the product of a recent duplication, probably specific to the Deinococcus lineage (data not shown).
Overall, we detected 52 IS elements in the D. radiodurans genome (Table 7). The three most abundant ISs are IS4_DR (13 copies), IS2621_DR (11 copies), and IS200_DR (8 copies). IS elements are unevenly distributed on the chromosomes and plasmids. The number of copies per 10,000 nucleotides in the plasmid and the megaplasmid is more than 10 times greater than the number found in chromosomes I and II. Only one IS element is present on chromosome II, whereas the plasmid contains nine. There are five single-copy IS elements in the D. radiodurans genome, three of them on the plasmid. They may be transpositionally inactive, or, alternatively, they could have been only recently acquired by the R1 strain.
Notably, IS elements are significantly more abundant in D. radiodurans than in any of the other sequenced bacterial genomes (Table 8). D. radiodurans contains 16.3 IS elements per 1,000 genes, whereas E. coli, ranking second, has only 8.4. If the number of IS elements is a reflection of transposition activity, this would be expected to cause genome instability and result in high levels of genome rearrangement in Deinococcus. There is, however, little direct evidence for any active transposition in D. radiodurans. In the entire genome, there is only one example of gene disruption by an IS element, where IS2621 is inserted into the gene for alkaline serine exoprotease A (aqualysin I). Similarly, only one IS-induced mutation has been detected in D. radiodurans (uvrA). Nevertheless, the abundance of IS elements in the Deinococcus genome is remarkable, and their involvement in genome instability is the subject of ongoing investigations.
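The density comparison can be reproduced with simple arithmetic. The Python sketch below assumes that the value of 16.3 per 1,000 genes was derived from the 52 IS elements reported above; under that assumption it back-calculates the implied gene count and the fold difference relative to E. coli. The gene count itself is not stated in this section, so the derived number is an inference rather than a quoted value, and the function names are hypothetical.

```python
def per_thousand_genes(element_count: int, gene_count: float) -> float:
    """Elements per 1,000 genes."""
    return 1000.0 * element_count / gene_count


def implied_gene_count(element_count: int, density_per_1000: float) -> float:
    """Gene count implied by an element count and a per-1,000-gene density."""
    return 1000.0 * element_count / density_per_1000


if __name__ == "__main__":
    is_elements = 52          # from the text
    deinococcus_density = 16.3  # from the text, per 1,000 genes
    ecoli_density = 8.4         # from the text, per 1,000 genes

    # Assumption: the density was computed from the 52 elements above.
    genes = implied_gene_count(is_elements, deinococcus_density)
    print(round(genes))                                   # implied gene count
    print(round(deinococcus_density / ecoli_density, 2))  # fold difference
```

Under this reading, D. radiodurans carries roughly twice as many IS elements per gene as E. coli.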
We identified several families of small noncoding repeats (SNRs) in the D. radiodurans intergenic regions (Table 9). A comparison to other bacterial genomes showed that, like IS elements, SNRs are more abundant in D. radiodurans than in E. coli (Table 8). However, the location bias observed for IS elements appears to be reversed for SNRs. There are no SNRs in the plasmid, which contains five IS elements. In contrast, chromosome II, which contains only one IS element, has 18 SNRs.
D. radiodurans SNRs have a complex mosaic configuration, as exemplified by SNR2, which consists of five conserved modules (Fig. 7A). Module I (also shared with the small repetitive element [SRE] family) and module V contain two parts of the inverted repeat present in SNR2. The different configurations of the SNR2 family are shown (Fig. 7B). These data suggest that deletions and insertions are likely to have played an important role in the evolution of SNRs. For example, module III is likely to be missing when both modules II and IV are present.
The distribution of SNRs along D. radiodurans chromosome I was tested against the null hypothesis of random occurrence of an SNR in the intergenic regions. The analyses for individual families as well as for all SNRs together showed that, with a single exception, there is no significant deviation from the random-placement model. The exception is the SNR5 family, whose members show a tendency (P < 0.05) to occur closer to each other than predicted by the random model. There is no significant correlation between the direction of a repeat and the direction of the adjacent gene, nor an apparent relationship between a particular SNR family and the functions of the adjacent genes. Thus, SNRs are not likely to play a direct role in the regulation of transcription or translation. It should be noted in this context that while some D. radiodurans SNRs have characteristics similar to the E. coli families of small repeats (bacterial interspersed mosaic elements [BIMEs]), SNRs do not share sequence or structural features with E. coli rho-independent transcription terminators (Ter repeats). The energy of potential RNA secondary structures predicted for D. radiodurans SNRs does not differ from the values obtained for coding regions or other sequence fragments unrelated to SNRs.
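One way to test repeat positions against a random-placement model is a Monte Carlo comparison, sketched below in Python. This is not the procedure used in the analysis, which is not described in this section. The sketch assumes that intergenic regions can be represented as a list of candidate integer positions, uses the mean nearest-neighbor distance as the clustering statistic, and estimates a one-sided P value from simulated placements. All names, the candidate-site grid, and the toy positions are hypothetical.

```python
import random


def mean_nearest_neighbor(positions):
    """Mean distance from each position to its closest neighbor."""
    pts = sorted(positions)
    if len(pts) < 2:
        raise ValueError("need at least two positions")
    gaps = [b - a for a, b in zip(pts, pts[1:])]
    nearest = []
    for i in range(len(pts)):
        candidates = []
        if i > 0:
            candidates.append(gaps[i - 1])  # distance to left neighbor
        if i < len(gaps):
            candidates.append(gaps[i])      # distance to right neighbor
        nearest.append(min(candidates))
    return sum(nearest) / len(nearest)


def clustering_p_value(observed, candidate_sites, n_sim=999, seed=0):
    """One-sided Monte Carlo P value for repeats lying closer than random.

    Assumption: under the null model every candidate site is equally likely.
    """
    rng = random.Random(seed)
    obs = mean_nearest_neighbor(observed)
    as_extreme = 0
    for _ in range(n_sim):
        sim = rng.sample(candidate_sites, len(observed))
        if mean_nearest_neighbor(sim) <= obs:
            as_extreme += 1
    return (as_extreme + 1) / (n_sim + 1)


if __name__ == "__main__":
    # Hypothetical grid of intergenic sites on a 2.6-Mb replicon.
    sites = range(0, 2_600_000, 500)
    clustered = [1_000_000 + 500 * i for i in range(8)]
    print(clustering_p_value(clustered, sites))
```

A small P value indicates that the repeats lie closer together than most random placements, which is the kind of deviation reported for SNR5.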
A sequence for the SRE from the D. radiodurans strain SARK was published previously (120) and has provided an opportunity to compare two evolutionarily distinct but closely related SNRs. A multiple alignment of SRE sequences from both strains (not shown) showed that most of the strain-specific substitutions were located in the central regions of two pairs of inverted repeats. In strain SARK, these SRE inverted repeats may form a pair of hairpin-like structures (120). However, the substitutions seen in R1 may disrupt these hairpins. The predicted free energy for the consensus hairpin I in SARK is −11.2 kcal/mol, but that in R1 is only −6.1 kcal/mol, as estimated by the Mfold program (134); for hairpin II, it is −17.0 and −11.0 kcal/mol, respectively. The nonrandom clustering of the strain-specific nucleotide substitutions in strain R1 is tantalizing. One could speculate that there is a strain-specific selection pressure for either strengthening (in SARK) or disrupting (in R1) these hairpins within this particular repeat family, and this would suggest a specific function for these repeats. The second possibility is that the multiple substitutions in the hairpin represent regions of the SRE that are hot spots for spontaneous mutagenesis.
The first SRE detected in D. radiodurans was within a cloned mitomycin C-inducible gene of strain SARK (120). Interestingly, a comparison with the corresponding region of the R1 strain shows that the repeat is missing in R1, demonstrating the mobility of SRE (127). The propensity of D. radiodurans to amplify DNA sequences that are flanked by direct repeats (31, 51, 190) is relevant to the large number of repeats, both ISs and SNRs, in its chromosomes and plasmids (4 to 10 copies per cell). The abundance of such repeated sequences flanking genes and operons throughout the genome could provide the potential for expansion and regulation of genomic regions in response to environmental challenges.
Two prophages unrelated to one another are present in the Deinococcus genome. One of these is located on chromosome I (between positions 518499 and 547679), and the other is located on chromosome II (between positions 80554 and 113236). Some of the proteins encoded in these prophages are distantly related to several phage proteins from other bacteria, but most ORFs have no detectable homologs. These prophages contain some genes definitely acquired from a bacterial genome, e.g., a serine/threonine protein kinase (DR0534) and a MotB/OMPA family protein (DR0536), and therefore are possible vectors for horizontal gene transfer.
A specific relationship between Thermus and Deinococcus has been established by both traditional microbiological (32, 146) and molecular phylogenetic (156) approaches. These species currently comprise a bacterial group without a clear relationship to other major branches of bacteria. Previous attempts to clarify these relationships (80) have led to the proposition that the Thermus-Deinococcus group is an intermediate between gram-positive and gram-negative bacteria. Furthermore, on the basis of phylogenetic trees developed for several protein families (HSP70, HSP40, FtsZ, RecA, and some translation elongation factors) and rRNA, an affinity of this group with cyanobacteria has been proposed (reference 80 and references therein).
Sequence analysis on the complete genome scale has revealed a major role of horizontal gene transfer in the evolution of bacteria and archaea. It appears that for most bacterial genomes, at least 10 to 15% of genes have been involved in horizontal transfer (18, 153; K. S. Makarova, L. Aravind, and E. V. Koonin, unpublished data). As discussed above, this level of horizontal gene transfer is consistent with our findings in Deinococcus. The taxonomic distribution of the best BLAST hits for all proteins in the Deinococcus genome is shown in Fig. 8. More than half of the genes did not show specific affinity to any major bacterial branch, archaea, or eukaryotes. Some of these genes were unique to Deinococcus, but the majority appear to be more or less equidistant from their homologs from other major taxa. Among the remaining genes, the greatest fraction was most similar to homologs from gram-positive bacteria (Fig. 8), but even in this case it is difficult to distinguish a genuine phylogenetic signal from preferential horizontal gene transfer. Therefore, this form of analysis does not yield a specific phylogenetic placement for the Thermus-Deinococcus group.
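A tally of best hits by taxon, with a separate category for proteins that are roughly equidistant from several taxa, could be organized along the following lines. This Python sketch is illustrative only: the input structure, the use of bit scores, and the 10% margin that defines "no specific affinity" are assumptions, since the criteria actually applied are not given in this section.

```python
from collections import Counter


def tally_best_hits(hits, margin=0.1):
    """Count proteins by the taxon of their best hit.

    hits: dict mapping protein ID -> list of (taxon, score) pairs.
    margin: hypothetical threshold; if the runner-up taxon scores within
        this fraction of the top taxon, the protein is counted as having
        no specific affinity.
    """
    tally = Counter()
    for protein, scored in hits.items():
        if not scored:
            tally["no homologs"] += 1
            continue
        best_per_taxon = {}
        for taxon, score in scored:
            best_per_taxon[taxon] = max(score, best_per_taxon.get(taxon, 0.0))
        ranked = sorted(best_per_taxon.items(), key=lambda kv: kv[1], reverse=True)
        top_taxon, top_score = ranked[0]
        if len(ranked) > 1 and ranked[1][1] >= top_score * (1 - margin):
            tally["no specific affinity"] += 1
        else:
            tally[top_taxon] += 1
    return tally


if __name__ == "__main__":
    # Hypothetical toy data, not results from the genome analysis.
    toy = {
        "p1": [("gram-positive", 200.0), ("archaea", 90.0)],
        "p2": [("cyanobacteria", 150.0), ("gram-positive", 145.0)],
        "p3": [],
        "p4": [("eukaryotes", 80.0)],
    }
    print(tally_best_hits(toy))
```

As the text notes, such counts cannot by themselves separate phylogenetic signal from preferential horizontal transfer, so the tally is descriptive rather than a placement method.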
For phylogenetic reconstruction, we used a nearly complete set of ribosomal proteins (50 sequences) and three RNA polymerase subunits that are shared among all bacterial species. A slightly smaller protein set was used to additionally include Thermus aquaticus (Fig. 9). All of these proteins are subunits of large, coevolving, macromolecular complexes, and therefore the respective genes are less prone to horizontal transfer. Furthermore, the large amount of sequence information included in this analysis helped to minimize the effects of possible horizontal transfer events or fluctuations in the evolutionary rate that could affect the tree topology for individual protein families. Tree A and tree C (Fig. 9) have essentially the same topology, indicating that the Thermus-Deinococcus group is a deeply rooted bacterial branch with a marginal, but not necessarily reliable, affinity to the cluster of gram-positive bacteria, cyanobacteria, and bacterial thermophiles. Tree B and tree D clearly confirm the strong relationship between Thermus and Deinococcus. These analyses did not detect any evidence for the previously suggested specific relationship between the Thermus-Deinococcus group and cyanobacteria.
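Pooling many protein families into one data set is commonly done by concatenating their alignments species by species. The Python sketch below illustrates that step under stated assumptions: each family is given as a dictionary of pre-aligned sequences keyed by a species label, and families lacking a species are either dropped (mirroring the use of a smaller protein set when a species with incomplete data is added) or padded with gaps. Whether the analysis concatenated alignments in this way is not stated here; the function name, labels, and toy alignments are hypothetical.

```python
def concatenate_alignments(alignments, species, drop_incomplete=True):
    """Join per-family alignments into one sequence per species.

    alignments: list of dicts, each mapping species label -> aligned sequence.
    species: ordered list of species labels to include.
    drop_incomplete: if True, skip families missing any listed species;
        if False, pad missing species with gap characters.
    """
    parts = {sp: [] for sp in species}
    for aln in alignments:
        widths = {len(seq) for seq in aln.values()}
        if len(widths) != 1:
            raise ValueError("sequences in one alignment must share a length")
        width = widths.pop()
        missing = [sp for sp in species if sp not in aln]
        if missing and drop_incomplete:
            continue  # use the smaller, complete protein set
        for sp in species:
            parts[sp].append(aln.get(sp, "-" * width))
    return {sp: "".join(chunks) for sp, chunks in parts.items()}


if __name__ == "__main__":
    # Hypothetical toy alignments for three species labels.
    families = [
        {"Dra": "MK-L", "Taq": "MKAL", "Eco": "MRAL"},
        {"Dra": "GG", "Eco": "GA"},  # no Thermus sequence available
    ]
    print(concatenate_alignments(families, ["Dra", "Taq", "Eco"]))
    print(concatenate_alignments(families, ["Dra", "Taq", "Eco"], drop_incomplete=False))
```

Dropping incomplete families keeps every column informative for all species, at the cost of a shorter alignment; gap padding retains more data but adds missing characters for some taxa.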
Derived shared characteristics between Thermus and Deinococcus were used for a preliminary assessment of the possible genome organization and physiological features of their common ancestor. A comparison of all available protein sequences from Thermus to those encoded in the Deinococcus genome showed several features that are unique to this clade (Table 10). The conservation of a distinct S-layer-like protein in the Thermus-Deinococcus group suggests that the last common ancestor already possessed the unique membrane structure observed in both organisms (146). Another shared protein unique to these organisms contains a predicted signal peptide that could be involved in the formation of the characteristic infrastructure of their outer membranes. Further, the conservation of two proteins that are distantly related to nitrogen metabolism regulators suggests that a derived state of this system evolved before the divergence of Thermus and Deinococcus (Table 10) (33). Several proteins that are highly conserved in Thermus and Deinococcus show a clear affinity to archaea and/or eukaryotes, and this may have arisen by ancient horizontal gene transfer events.
We also estimated gene flow and gene loss rates in these moderately related bacteria from the same clade. Deinococcus has twice as many genes as Thermus (http://www.nlm.nih.gov:80/PMGifs/Genomes/bact.html), yet we found a significant number of genes that are present in Thermus but not in Deinococcus (Table 11). This probably reflects the distinct metabolic repertoires of these bacteria, as well as the presence in Thermus of genes associated with thermophilicity.
The phylogenetic affinity between Thermus and Deinococcus raises the issue of whether their common ancestor was a thermophile. We compared the fractions of genes shared by archaeal and bacterial thermophiles for all bacteria with completely sequenced large genomes. Perhaps not unexpectedly, the greatest fraction was seen in B. subtilis, because many species of the Bacillus-Clostridium group are thermophiles and Thermotoga may be a highly derived member of this group; Deinococcus had the second greatest fraction (Fig. 10). The number of common genes between these thermophiles and Deinococcus is consistent with the hypothesis that the ancestor of the Thermus-Deinococcus group also was at least a moderate thermophile, with the descendant clades evolving in different directions and acquiring different sets of genes via horizontal transfer. The complete genome sequence of Thermus and, ideally, other members of this clade would be required for a definitive evaluation of this hypothesis.
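The comparison described above can be illustrated with simple set arithmetic. The Python fragment below is a hedged sketch, not the procedure used in this study: it assumes that each genome is represented as a set of orthologous-family identifiers, that a family counts as "shared by archaeal and bacterial thermophiles" when it occurs in at least one genome of each kind, and that the fraction is taken over the families of the query genome. The function name and the family identifiers are hypothetical toy values.

```python
def thermophile_shared_fraction(genome, archaeal_thermophiles, bacterial_thermophiles):
    """Fraction of a genome's gene families that are also found in both
    archaeal and bacterial thermophiles.

    genome:                 set of family identifiers for the query genome
    archaeal_thermophiles:  list of sets, one per archaeal thermophile
    bacterial_thermophiles: list of sets, one per bacterial thermophile
    """
    if not genome:
        raise ValueError("query genome has no gene families")
    in_archaea = set().union(*archaeal_thermophiles)
    in_bacteria = set().union(*bacterial_thermophiles)
    shared = in_archaea & in_bacteria  # families seen in both groups
    return len(genome & shared) / len(genome)


# Toy, made-up family identifiers; they do not correspond to real data.
query_genome = {"F1", "F2", "F3", "F4"}
archaeal = [{"F1", "F2", "F9"}, {"F2", "F5"}]
bacterial = [{"F1", "F5", "F7"}, {"F2"}]

fraction = thermophile_shared_fraction(query_genome, archaeal, bacterial)
```

Computing this fraction for each large bacterial genome and ranking the results would give a comparison of the kind summarized in the paragraph above; a stricter definition of "shared" (for example, presence in every thermophile) would change the numbers but not the logic.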
The analysis of the D. radiodurans genome resulted in the identification and preliminary characterization of a number of unusual features. For example, the expanded Nudix hydrolase superfamily and the homologs of plant desiccation resistance-associated proteins are likely to contribute to both the extreme radiation and the desiccation resistance of Deinococcus. A variety of other proteins, particularly those that belong to expanded families, are likely to be involved in the unusual phenotype of this bacterium. Furthermore, the unexpectedly numerous nucleotide repeats may also play a role in stress response. The genome analysis yielded many functional predictions that can be tested experimentally and that could prove particularly significant if considered in an evolutionary context. For example, knockouts of the typically eukaryotic genes for TopoIB and Ro protein that were identified in D. radiodurans were generated, and preliminary data were obtained on the DNA repair capabilities and resistance phenotypes of the mutants (37; unpublished observations). In addition to detecting a variety of single horizontal gene transfer events, there is evidence for transfer of entire gene systems. For example, we identified several Deinococcus genes encoding pilus-associated functions: pilus biogenesis regulation operon, several pilins and prepilins, prepilin peptidase, PilT ATPase, and the fimbrial assembly protein PilM. Remarkably, there is no experimental evidence that D. radiodurans is capable of producing any pili, but it seems likely that the products of these genes contribute to the formation of other surface structures, especially those that could be involved in secretory systems similar to the type III secretion pathway. As illustrated repeatedly, the genome promises to open up many new areas for experimental work, and these are likely to further expand as genomes of other species from the same clade are sequenced and analyzed.
The sobering conclusion from this study is that the fundamental questions underlying the extreme resistance phenotype of D. radiodurans remain unanswered. It seems most likely that this phenotype is very complex and is determined collectively by some of the features revealed by this genome analysis, as well as by many more subtle structural peculiarities of proteins and DNA that are not readily inferred from the sequences, at least not with the current limited collection of genomes available for comparative analysis. This is parallel to the results of comparative analysis of the genomes of archaeal and bacterial thermophiles, which provided many tantalizing clues in terms of genes that are shared by these organisms, to the exclusion of mesophiles, and their possible functions but so far have failed to establish an unequivocal molecular basis for thermophilicity (18, 126, 153). We expect that a comprehensive understanding of the mechanisms of damage repair in Deinococcus will arise from a combination of further comparative genomic analysis and prediction-driven experiments.
The annotation of D. radiodurans protein-coding genes is available at ftp://ncbi.nlm.nih.gov/pub/koonin/Deinococcus/.
This research was funded largely by grant DE-FG02-98ER62583 from the Microbial Genome Program, Office of Biological and Environmental Research, Department of Energy (DOE), and grant 5R01-GM39933-09 from the National Institutes of Health. Some of this work was also supported by grants FG02-98ER62492 and FG02-97ER20293 from the DOE.
We are grateful to John Battista (Louisiana State University) and Owen White (The Institute for Genomic Research) for numerous helpful discussions and critical reading of the manuscript.
After the manuscript was submitted for publication, we became aware of several recent findings that provide new insights into Deinococcus gene functions. In particular, an alternative, α-aminoadipate pathway of lysine biosynthesis (in contrast to the diaminopimelate pathway, which is typical of most other bacteria) was discovered in Thermus thermophilus (N. Kobashi, M. Nishiyama, and M. Tanokura, J. Bacteriol. 181:1713–1718, 1999). As described above, Deinococcus can grow on minimal media without lysine, and it now appears most likely that it also produces lysine via the α-aminoadipate pathway. The following Deinococcus genes are orthologs of the Thermus genes encoding enzymes of this pathway: DR1238 (homocitrate synthase), DR1610 or DR1778 (large subunit of 3-isopropylmalate dehydratase), DR1784 and DR1614 (small subunit of 3-isopropylmalate dehydratase), DR1674 (isocitrate dehydrogenase), and DR2194 (glutaminyl transferase); the pathway also could include additional, still unidentified enzymes. However, in Deinococcus these genes do not form a cluster as in T. thermophilus and Pyrococcus horikoshii (N. Nishida, M. Nishiyama, N. Kobashi, T. Kosuge, T. Hoshino, and H. Yamane, Genome Res. 9:1175–1183, 1999). All 21 Nudix hydrolase genes from Deinococcus were cloned, and some novel enzymatic activities (UDP-glucose pyrophosphatase and CoA pyrophosphatase) were identified (W. Xu, J. Shen, C. A. Dunn, S. Desai, and M. Bessman, Mol. Microbiol. 39:286–290, 2001). The mglB-like genes that are expanded in Deinococcus belong to a protein superfamily that also includes dynein light chains of the Roadblock/LC7 class; together with Ras/Rho GTPases, they form a regulatory module which might be involved in the control of some molecular motors of the cell (E. V. Koonin and L. Aravind, Curr. Biol. 10:R774–R776, 2000).