Restriction–modification (RM) systems are composed of genes that encode a restriction enzyme and a modification methylase. RM systems sometimes behave as discrete units of life, like viruses and transposons. RM complexes attack invading DNA that has not been properly modified and thus may serve as a tool of defense for bacterial cells. However, any threat to their maintenance, such as a challenge by a competing genetic element (an incompatible plasmid or an allelic homologous stretch of DNA, for example) can lead to cell death through restriction breakage in the genome. This post-segregational or post-disturbance cell killing may provide the RM complexes (and any DNA linked with them) with a competitive advantage. There is evidence that they have undergone extensive horizontal transfer between genomes, as inferred from their sequence homology, codon usage bias and GC content difference. They are often linked with mobile genetic elements such as plasmids, viruses, transposons and integrons. The comparison of closely related bacterial genomes also suggests that, at times, RM genes themselves behave as mobile elements and cause genome rearrangements. Indeed some bacterial genomes that survived post-disturbance attack by an RM gene complex in the laboratory have experienced genome rearrangements. The avoidance of some restriction sites by bacterial genomes may result from selection by past restriction attacks. Both bacteriophages and bacteria also appear to use homologous recombination to cope with the selfish behavior of RM systems. RM systems compete with each other in several ways. One is competition for recognition sequences in post-segregational killing. Another is super-infection exclusion, that is, the killing of the cell carrying an RM system when it is infected with another RM system of the same regulatory specificity but of a different sequence specificity. 
The capacity of RM systems to act as selfish, mobile genetic elements may underlie the structure and function of RM enzymes.
Restriction enzymes and modification enzymes may not be merely sequence-specific DNA nucleases and methylases. Rather, they may represent one of the simplest forms of life, similar to viruses, transposons and homing endonucleases. As will be shown in this review, they can increase their relative frequency within a cell population by three strategies. First, they defend themselves (and the host bacterium) from invaders by attacking ‘non-self’ DNAs. Secondly, they kill cells that have eliminated them (Fig. 1A) due, for example, to the acquisition of another genetic element (Fig. 1B). Thirdly, they move between genomes. While such restriction–modification (RM) systems can help protect the cell from foreign DNAs, their behavior appears to reflect a primarily ‘selfish’ purpose, namely, to promote their survival. The purpose of this article is to review the evidence that supports this ‘selfish gene’ hypothesis.
After a brief historical introduction, I will discuss recent evidence, particularly based on bacterial genome analysis, that suggests that some RM gene complexes behave as mobile genetic elements that can shape bacterial genomes. In the third section, I will introduce the parasitic life cycle of RM gene complexes with emphasis on their competition with each other. In the fourth section, the mechanisms used by bacteria and bacteriophages to protect themselves from the parasitic behavior of the RM complexes are described. It will be evident that the host–parasite-type interactions between bacteria and RM systems make a significant contribution to genome evolution.
The biochemistry, structural biology and biological and molecular evolution of RM systems have been reviewed from various points of view (1–7). The reference list here has been limited to recent sources that are immediately relevant to the concept of RM gene complexes as a form of life. Other references, representing many printed and online information sources, are easily accessible in the databases, MEDLINE/GenBank (NCBI), REBASE (New England Biolabs) and TIGR Microbial Database, among others, using keywords scattered throughout this article.
A restriction (R) endonuclease recognizes a specific DNA sequence and introduces a double-strand break (Fig. 2A), while a cognate modification (M) enzyme can methylate the same sequence and protect it from cleavage. The two together form an RM system. As each restriction gene is usually tightly associated with a cognate modification gene(s), these two (or more) genes can be termed an RM gene complex. There are several types of RM systems (2,7). A typical type II RM gene complex, such as EcoRI, contains one gene, R, for sequence recognition and restriction, and one gene, M, for sequence recognition and modification (Fig. 2A) (7). There are three contrasting, but not mutually exclusive, hypotheses that explain the maintenance of RM systems in evolution.
Restriction enzymes will cleave incoming DNA if it has not been modified by a cognate or other appropriate methylase (Fig. 2B). It is thus widely believed that RM systems have been maintained by bacteria as tools for defense against infection by viral, plasmid and other foreign DNA. This may be called the cellular defense hypothesis.
Despite their function in cellular defense, there is increasing evidence that some RM systems operate on a primarily ‘selfish’ level. That is, the measures taken to maintain themselves in a cell population can lead to adverse consequences for their host cell. Some RM complexes resist being lost from a cell. For example, a plasmid carrying an RM gene complex cannot be readily displaced by an ‘incompatible’ plasmid carrying a different RM gene complex (8,9). Similarly, an RM gene complex on the bacterial chromosome cannot be easily replaced by a homologous stretch of DNA (a transducing fragment) (10). This resistance is manifested as cell death that occurs after an RM gene complex is lost (8,11–13).
Plausible steps involved in this process are illustrated in Figure 2C. If an RM gene complex is lost through an interaction with a competitor gene, for example, the cell’s descendants will contain fewer and fewer molecules of the modification enzyme. Eventually, the enzyme’s capacity to protect recognition sites on newly replicated chromosomes from the remaining pool of restriction enzymes will become inadequate and chromosomal DNA will be cleaved by the restriction enzyme, leading to the cell’s death (11,12). Naturally, the restriction enzyme molecules will also become increasingly limited in number. However, there is an asymmetry between the roles of the methylase and the restriction enzyme: the asymmetry between life and death. For the restriction enzyme to kill the cell, a single break on the chromosome may well be sufficient (unless repaired). For the methylase to keep the host alive, all (or almost all) of the hundreds of recognition sites along the chromosome need to be methylated, and this will require a substantial pool of methylases to be present.
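The asymmetry described above can be made concrete with a toy calculation. The following Python sketch is an illustration only and is not taken from the cited experiments. It assumes that every recognition site is left unmethylated independently with the same small probability, and that this probability doubles at each cell division as the methylase pool is halved. The site count, the starting probability and the function names are all hypothetical.

```python
def prob_all_sites_protected(p_unmethylated: float, n_sites: int) -> float:
    """Chance that none of n_sites is left unmethylated (sites assumed independent)."""
    return (1.0 - p_unmethylated) ** n_sites


def protection_by_generation(n_sites: int, p_unmethylated_start: float,
                             generations: int) -> list[float]:
    """Probability of full protection at generation 0, 1, ..., generations.

    Hypothetical assumption: after loss of the RM gene complex, the chance
    that a given site escapes methylation doubles with every division,
    because the remaining methylase molecules are diluted twofold.
    """
    probs = []
    p = p_unmethylated_start
    for _ in range(generations + 1):
        probs.append(prob_all_sites_protected(p, n_sites))
        p = min(1.0, 2.0 * p)  # a probability cannot exceed 1
    return probs


if __name__ == "__main__":
    # 500 sites and a starting escape probability of 1e-4 are made-up values.
    for gen, prob in enumerate(protection_by_generation(500, 1e-4, 5)):
        print(f"generation {gen}: P(all sites protected) = {prob:.3f}")
```

Under these assumptions the chance of complete protection falls steadily with each division, whereas a single unprotected site that is cut may be enough to kill the cell. This is the imbalance the paragraph above describes.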
Evidence that supports this model was obtained in experiments in which an RM gene complex (PaeR7I, EcoRI and EcoRV) on a temperature-sensitive (ts) plasmid was lost from Escherichia coli cells after a temperature shift (8,11–14). This halted the increase in viable cell counts and resulted in loss of cell viability. Many cells formed long filaments, some of which were multinucleated and others anucleated (referring to those regions stained by DNA-staining dyes as nuclei). Accumulation of long linear chromosomal forms followed by extensive DNA degradation was observed by pulsed-field gel electrophoresis and conventional gel electrophoresis. The cells induced the SOS response.
This situation is reminiscent of ‘post-segregational killing’, which is recognized as a plasmid stabilization mechanism (Table 1) (15,16). The mechanism consists of a pair of genes, a killer and an anti-killer, that interact either at the protein level or at the RNA level. Loss of the gene pair from a cell brings about the lethal action of the killer element. Indeed, insertion of an RM gene complex into a plasmid increases the stability with which the plasmid is maintained in a culture of bacterial cells (8,11–14,17,18). In the case of the EcoRII RM gene complex on the ts plasmid described above, the temperature shift led to a decrease in the viable cell count instead of simply preventing its increase (14; Y.Naito, N.Handa, N.T.Kobayashi and I.Kobayashi, unpublished). The simplest explanation for this is that the RM system killed the cells when its copy number had decreased even before the complete loss of the gene complex from the cell. ‘Post-disturbance cell killing’ might be a more appropriate description of this behavior than post-segregational cell killing. The resistance of chromosomal RM gene complexes to replacement as described above (10) implies that the RM gene complex, rather than the plasmid, is the unit of post-segregational killing or selection.
Given that RM systems will kill their host cell when they are eliminated from the cell, they could be called selfish genes in the sense that the word is used in genetics, evolutionary biology and in behavioral ecology (Table 1, I; Table 2) (19–21). The competition between an RM gene complex and a competing genetic element (Fig. 1B), which results in the host cell being killed, serves to maintain the presence of the RM gene complex in the neighboring clonal cells (Fig. 1B). While on a superficial level the cell death resembles altruistic suicide with respect to the other cells in the host cell clone, the cell death is actually programmed by a genetic element (RM) whose interests are not necessarily the same as those of the genome where it resides. Indeed, it will kill the cell in a fight with a competing genetic element. It is hypothesized that this competitive behavior has, at least in part, assured the maintenance of some RM systems (8,13,20,22).
Other similarly selfish genes include bacteriocin operons and meiotic drive genes (Table 1, I). This mode of cell death (apparently altruistic cell death programmed by a resident genetic element upon invasion of its competitor genetic element) is also seen in phage exclusion (Table 1, II). We will see below that RM systems behave similarly.
Several type II systems have shown this type of self-stabilization and/or host killing, including PaeR7I (8), EcoRI (13), Bsp6I (17), EcoRV (18), PvuII (Y.Nakayama and I.Kobayashi, unpublished), EcoRII (Y.Naito, N.Handa, I.Kobayashi, unpublished; 14) and SsoII (14). The effect varies between type II systems as it is stronger for some (EcoRII, for example) and weaker for others (SsoII, for example) under the experimental conditions examined. Such selfish behavior has not yet been reported for other RM system types. The EcoKI system does not resist its loss and is readily replaced by alleles conferring different specificities (23). Similarly, the EcoR124I genes can be lost without killing the host cell (24). However, it would not be surprising if further work revealed a greater variation in the virulence of RM systems, similar to what we find in viruses and transposons.
The third hypothesis about why RM systems are maintained in evolution is the variation hypothesis. This hypothesis suggests that RM systems participate in generating bacterial diversity by promoting homologous recombination (e.g. 25). This hypothesis needs another ad hoc hypothesis to explain why the diversity is advantageous. On the other hand, the selfish gene hypothesis can explain why homologous recombination is advantageous, as we see below.
RM gene complexes can act selfishly if they are independent from the rest of the genome (their host). This independence is facilitated by their movement between genomes. There is increasing evidence that supports the notion of RM independence and mobility, notably from genome analysis (Table 3).
Some of the sequenced bacterial genomes, such as those of Haemophilus influenzae, Methanococcus jannaschii, Helicobacter pylori, Neisseria meningitidis, Neisseria gonorrhoeae and Xylella fastidiosa, are impressively rich in RM genes (or their homologs) (see REBASE). Some of these genomes share the capacity for natural transformation, whereby chromosomal genes would be frequently replaced by incoming homologous DNA stretches. The RM gene complexes may resist this by killing their host or by cleaving the invading DNAs, thus assuring their continued presence in the cell population. However, other sequenced genomes, such as those of Rickettsia prowazekii, Treponema pallidum, Chlamydia and Buchnera, lack or almost lack open reading frames with homology to known RM genes. With regard to R.prowazekii, Chlamydia and Buchnera, one plausible explanation for their lack of RM genes is that their genomes are isolated from incoming genes because they occupy an intracellular niche within the cells that they infect.
Considerable sequence diversity in the DNA sequences coding for RM enzyme homologs within a given species and within a given genome has been observed. Examples are found in E.coli, N.gonorrhoeae and H.pylori (REBASE) (26–28). Thus, a comparison of the M homologs from two completely sequenced strains of H.pylori, 26695 and J99, showed that, while many pairs were very similar to each other, there were also other M homologs that were specific to each individual strain (28,29). Another comparison between two strains revealed similar polymorphisms: among the gene clones identified as being present in a third strain of H.pylori but absent from the strain 26695, several were RM homologs (30). This diversity was verified by activity measurements (31–33). Polymorphisms of RM homologs have also been found between the genomes of two species of the genus Pyrococcus (34) and of two N.meningitidis strains (35).
Comparisons of RM sequence alignments (often in the form of a phylogenetic tree) with sequence alignments of other genes, such as ribosomal RNA genes, in the genomes suggest that RM genes have undergone extensive lateral gene transfer (29,36,37). The GC content and/or codon usage of RM genes are often different from those of other genes in the genome (28,29,38,39). This is consistent with the notion that these RM genes joined the genome relatively recently, although it is difficult to estimate the time of their arrival.
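As a minimal sketch of the kind of comparison described above, the following Python snippet computes the GC content of one gene and its deviation from the mean GC content of a set of reference genes from the same genome. It is an illustration under simplifying assumptions and not the method used in the cited studies (28,29,38,39). The function names and the toy sequences are hypothetical, and a real analysis would also need codon usage statistics and a significance test.

```python
def gc_content(seq: str) -> float:
    """Fraction of G and C among the unambiguous bases (A, C, G, T) of seq."""
    seq = seq.upper()
    gc = seq.count("G") + seq.count("C")
    acgt = gc + seq.count("A") + seq.count("T")
    if acgt == 0:
        raise ValueError("sequence contains no A/C/G/T bases")
    return gc / acgt


def gc_deviation(gene_seq: str, reference_genes: list[str]) -> float:
    """GC content of gene_seq minus the mean GC content of reference_genes."""
    if not reference_genes:
        raise ValueError("at least one reference gene is required")
    mean_gc = sum(gc_content(g) for g in reference_genes) / len(reference_genes)
    return gc_content(gene_seq) - mean_gc


if __name__ == "__main__":
    # Toy sequences, invented for illustration only.
    candidate_rm_gene = "ATGAAATTTAATGATATTTAA"
    other_genes = ["ATGGCCGGCCTGGCGTAA", "ATGCGCGCGGACCTGTGA"]
    print(f"GC deviation: {gc_deviation(candidate_rm_gene, other_genes):+.3f}")
```

A strongly negative or positive deviation is only suggestive of recent acquisition. As noted above, it does not by itself indicate when the gene arrived.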
As implied in the above discussion, any type of genetic element that is relatively independent from the remainder of the genome would benefit from carrying an RM gene complex, as the selfish behavior of this complex would help maintain the element in the cell population. In turn, the RM gene complex would enjoy increased mobility. Supporting this notion is that a variety of mobile genetic units are found to be linked with RM gene complexes (Table 3). Some plasmids are known to carry RM gene complexes. For example, the sequencing of the M.jannaschii genome identified a type I RM system on a large plasmid. Prophages are also known to bear RM genes. For example, the HindIII RM gene complex was found on a cryptic prophage identified in the H.influenzae Rd genome sequence (40). Other examples are the Sau42I RM, found on a Staphylococcus aureus bacteriophage (accession no. X94423) and linked with a phage integrase (41), and EcoO109I, present on a P4-like prophage (42). There are also RM gene homologs on a prophage in the sequenced Bacillus subtilis genome (43).
The sequencing of clones containing RM genes often reveals their linkage to homologs of mobility-related genes and elements. For example, the Rle39BI RM is flanked by IS repeats and could move as a composite transposon (44). In addition, the XbaI RM (GenBank accession no. AF051092) and M.Vch01I0 (REBASE, GenBank accession no. X64097) are flanked by repeat elements of integrons, and their sequences suggest that they could move as an integron cassette. Many more RM genes are linked with a site-specific recombinase homolog or a transposase homolog (see REBASE, GenBank and MEDLINE) (41).
RM elements are also found to be inserted into operons. When the HaeIII RM gene region in Haemophilus aegyptius was compared to that in H.influenzae, it appeared that the HaeIII RM had been inserted into an operon in an ancestor of H.aegyptius, replacing a short intergenic sequence (45). Similar insertion into an operon was found in Neisseria (46) and in Streptococcus (47). As an operon can be regarded as a unit of mobility on an evolutionary time scale (48), the association of RMs with operons may be essentially similar to the association of RMs with other mobile elements. The operon would enjoy stability, especially in competition with an operon of a similar function, while the RM unit would enjoy mobility. Another benefit of being inserted into an operon is that the host operon would allow the RM unit to be transcribed.
How often RM units are associated with a mobile genetic element is not clear as yet because cloned RM units are rarely sequenced outside of the coding regions. Thus, it may be that other RM gene units are also associated with mobile genetic elements. However, some RM gene complexes appear to behave as a mobile unit without being linked to another mobile unit. This can be inferred from the comparison between closely related genomes of naturally transformable bacteria. In naturally transformable bacteria, RM gene complexes are as mobile as any other DNA in the genome. They do not have to be linked with a mobile element.
That RM elements have moved between genomes and are associated with genome rearrangements is evidenced by comparison of two closely related genomes. A simple-substitution-type polymorphism (Fig. 3A and E, middle) was detected when two strains of H.pylori (49) and two species of Pyrococcus (34) were compared. A unique structure, ‘an insertion with a long target duplication’, was also observed in H.pylori (Fig. 3B and E, left) (49,50). This structure is formally similar to the classical transposon insertion except that the target duplications here are much longer. Analysis of the neighboring sequences suggested that this polymorphism reflects an ancient insertion event. This is also consistent with the possibility that later recombination between the duplicated regions caused the RM gene complex to be deleted (49). Several RM homologs have also been found to be inserted next to an inversion polymorphism between two closely associated genomes both in H.pylori and Pyrococcus (Fig. 3C and E, right) (28,34,49). Figure 3E (right) illustrates a case involving type I RMS homologs.
Comparison of the genome sequences of two H.pylori strains (28,29,49) and of two Pyrococcus species (34) also brought to light cases where a homologous segment containing RM complexes is present at different chromosomal loci in the two genomes (Fig. 3D and E). This strongly suggests that the DNA segments containing the RM complexes can move between different positions in a genome, that is, by a transposition or a translocation event. (The transposition/translocation could have occurred between these two loci or could have involved a third locus.) These examples (Fig. 3D and E) suggest that the unit of transposition for a particular RM gene complex can vary between events in that an RM gene complex may or may not move together with neighboring DNA.
Although these polymorphisms involving RM gene complexes can be formally described as insertions, substitutions, inversions, translocations, transpositions etc., the molecular mechanisms generating them are not yet understood. In some cases, certain mechanisms can be ruled out through homology searches (34,49).
One hypothesis of how these polymorphisms arise suggests they are due to attacks by restriction enzymes on the host chromosome after disturbance of the RM gene complex (34,49). An example of how an RM unit could be transposed within a genome is shown in Figure 4. Here it is assumed that the balance between restriction enzyme and modification enzyme is somehow disturbed, for example, by the insertion of some genetic element in the neighborhood. The modification enzyme now fails to methylate its chromosomal recognition sites, and the remaining restriction enzyme molecules will introduce breaks into the chromosome. Degradation from the cut ends follows. The host, in an attempt to repair the chromosome, tries to rejoin the broken ends. These steps would generate a variety of rearranged genomes. A DNA fragment may be excised and inserted elsewhere in the genome. If the ends are properly closed with the insertion of a DNA fragment carrying the RM gene complex in question, the RM gene complex may be able to express the methylase again, and thus can protect chromosomal sites from its new locus. This would prevent further attack by the restriction enzyme. (Note that the RM unit on a DNA fragment, even if expressed, cannot be stably inherited by the progeny. Their descendants would be attacked by restriction enzyme after the loss of the DNA fragment.) The net result is the transposition of the RM gene complex to a new locus in this surviving clone.
In this case (Fig. 4), the transposition results in substitution polymorphism of an RM segment (Fig. 3A). The case of ‘insertion with a long target duplication’ (Fig. 3B) (mentioned above) can be explained if unequal end joining between two restriction breaks on two sister chromosomes takes place together with the insertion of a DNA carrying the RM gene complex (49). The substitution adjacent to a large inversion (Fig. 3C) (28,34) can be explained by restriction breaks at two sites, possibly behind each of the two bi-directional replication forks. The fragment between the two breaks will thus be rejoined in the opposite orientation together with the DNA fragment carrying the RM gene complex (34).
In this ‘selfish transposition model’, the RM gene complex appears to take advantage of the host’s attempt to repair in order to transpose itself to a new locus. Supporting the notion that disturbance of the resident RM complex triggers the development of these polymorphisms are experiments demonstrating that an attempt to replace an RM gene complex by allelic DNA led to genome rearrangements in the laboratory (see below) (10).
In S.aureus, genes homologous to those of type IC RM systems have been found to be inserted into each of two tandem clusters of paralog family members (exotoxin cluster and serine protease cluster) (51). The exotoxin cluster is polymorphic in that the number of repeats varies among strains. The role, if any, of these RM genes in the maintenance or formation of this cluster remains to be examined.
Two types of phase variation of RM systems that involve changes in the primary structure of the RM genes have been documented.
One is site-specific recombination within the gene for the specificity subunit (S) of type I systems (52,53). This (i) changes the sequence specificity of the RM system by altering the primary structure of the specificity subunit and (ii) also switches gene expression on or off.
The second type of phase variation of RM systems has been predicted to occur in H.pylori, N.gonorrhoeae, H.influenzae and Pasteurella haemolytica (54,55) and is due to changes in the length of simple base pair repeats in either the R or the M gene. Changes in tract length have been demonstrated experimentally (28). How such changes affect RM activities remains to be elucidated, but a change in sequence specificity due to differences in repeat numbers has been reported for type I systems (Eco124 and R124/3) (56).
Allelic RM systems may show phase variation through intercellular DNA transfer (57), and the RM systems showing strain polymorphisms involving substitution and insertion (discussed above) are candidates for this.
Turning R and/or M genes on or off by phase variation could lead to cell death through restriction attacks on the genome. This has been hypothesized to be an altruistic cell death process that releases cellular DNA for its uptake by other cells in natural transformation (54).
Phase variation would be promoted when the genes making up one RM system are not linked together on the genome and are instead at separate locations (discussed further below). This could help facilitate their recombination and, possibly, also their phase variation.
Genomic analysis of a variety of bacterial systems suggests that there are many RM homologs that appear to be inoperative because of insertions, deletions or point mutations (32,49 and REBASE). This is reminiscent of the defective transposons observed in some prokaryotic genomes and many eukaryotic genomes. In H.pylori genomes, there are indications of RM units that have been inserted into another RM unit. An example is HpyAII (= HP1366, HP1367, HP1368) (type IIS) that appears to have been inserted into a type III gene cluster (HP1369/HP1370, HP1371) (Fig. 3E, left). The resident RM unit appears to have been further destroyed by mutation, possibly after the insertion, because their homologs in another strain, J99, seem to be intact (jhp1284, jhp1285) (Fig. 3E, left). This is again reminiscent of transposon insertion into another transposon that results in the structures found in the genome of many eukaryotes.
In some cases, the R gene appears to be non-functional while the neighboring M gene seems to be functional (58). The M gene may serve to protect the genome against RM systems that recognize the same sequence (as described below for orphan methylases).
In most of the RM systems identified so far, the constituent genes are tightly linked with each other. This cluster of genes confers the RM unit with mobility and the ability, upon its elimination from a cell, to kill the cell. The simultaneous loss of R and M genes assured by their tight linkage is particularly critical for their maintenance by post-segregational killing. In several cases, however, R, M or S homologs have been found isolated from each other in the genome. Such isolated genes may be due to (i) the decay of an RM cluster, (ii) their unique function as an orphan gene or (iii) their participation in an RM system composed of unlinked genes. Examples of the first possibility (i) are found in sequenced genomes, for example, H.pylori (32,49). Variability in operon structure among diverse genomes has been noticed earlier (59). This may be explained by the decay in the structure of the operon, which represents a unit of mobility (48). That isolated genes may play a unique function as orphan genes (ii) will be discussed later.
The third possibility (iii) is exemplified by an S gene on a plasmid that works together with a chromosomal RM gene cluster in a type I system (60). Similarly, the Mycoplasma pneumoniae genome carries a single type I gene cluster (SMR) together with several orphan S genes at different loci (61) although their activities have not been examined. Candidates for another example of unlinked RM genes are found in the two S.aureus genomes discussed above. In both of these genomes, an S and M gene pair is linked to a toxin gene cluster, while another S and M pair is linked with another, unlinked toxin gene cluster. An R homolog of the same family (type IC) is located elsewhere (unlinked) in the genome. The two M genes are highly homologous while the two S genes diverge in their putative target recognition domains (51). It is not known whether the two different SM pairs, together with the R gene, form two type I systems that differ in sequence specificity. The separation of genes for an RM system into different loci may facilitate their recombination and phase variation (mentioned above). It is possible that such unlinked RM systems may be more common than is currently believed, because the conventional methods of identifying RM genes (as opposed to the genome decoding strategy) may miss RM systems composed of unlinked genes.
Several of the phenomena observed with RM gene complexes and RM enzymes may be better understood if we assume that they are parasitic or symbiotic elements. One example of such parasitic behavior is the post-segregational killing discussed above, which appears to be a programmed process that serves to overcome competing genetic elements (Fig. 1). Other examples of how RM complexes can act in a parasitic or symbiotic manner are described below.
When some RM gene complexes establish themselves in a cell, they delay expression of the restriction enzyme relative to the expression of the methylase to avoid the lethal restriction of the host genome (Fig. 5A). In this way, RM complexes behave as ‘smart’ parasites because they assure the survival of the host cell. There are several different ways in which this delay is achieved.
Some RM systems (PvuII, BamHI, EcoRV, etc.) employ a regulatory gene, called C, to control the expression of R and/or the repression of M (Fig. 5C) (18,62–64). The C gene product binds the promoter region of the R gene (65), and either promotes its transcription (64) or stabilizes its mRNA (66). The C gene products for the PvuII and BamHI RM systems recognize the same sequences (65,67), while the C gene product for the EcoRV system does not share this specificity (18,67).
In some systems, the methylase itself either positively regulates R expression or negatively regulates M expression. M.EcoRII autoregulates its own expression (68). In the case of SsoII, the methylase binds the intergenic region of a divergent R and M gene pair, and thus regulates both M and R expression (69). Its N-terminus region is important in this regulation. DNA methylation at the CfrBI site in the promoter region of R.CfrBI is involved in transcriptional control of the CfrBI RM system (70).
The salIR and salIM genes constitute an operon that is mainly transcribed from sal-pR1, a promoter located upstream of salIR. Another promoter, sal-pM, is within the salIR coding region and allows the M gene to be expressed in the absence of sal-pR1. The sal-pM promoter might be involved in the establishment of modification prior to restriction endonuclease activity (71). The pvuIIW gene is a possible modulator of PvuII endonuclease subunit association, which delays restriction upon establishment (72).
The presence of an RM gene complex in a cell would prevent another RM gene complex from establishing itself in the cell because the resident RM would attack the incoming RM if it was not properly methylated, just as it would attack any other DNA. However, should the incoming RM complex be properly methylated or lack the recognition sequence of the resident RM at all, the resident RM may still be able to abort the establishment of the incoming RM in the cell by eliciting host killing (Fig. (Fig.5B)5B) (18,64). It takes place specifically between two RM systems that share the specificity in the regulatory system for establishment (see the previous section). It was demonstrated between two RM systems that are regulated by C gene products that bind the same sequence. The PvuII and BamHI RM systems have such cross-complementing C genes. The presence of the BamHI RM gene complex in the recipient cell reduced the transformation efficiency of a plasmid that carried the PvuII RM gene complex. A similar decrease in transformation efficiency was also observed when the recipient cell contained only the C regulatory gene of BamHI, and this decrease was dependent on the restriction gene of the incoming PvuII RM system (18). Thus, the regulatory C gene product of the resident RM element in the cell appears to force the expression of the incoming R gene, which leads to it cleaving its as-yet-unmodified recognition sites on the chromosome and killing the host cell (Fig. (Fig.5B).5B). This mechanism should work only if the two RM systems involved have non-identical recognition sequences.
This form of competition between RM systems is reminiscent of the super-infection exclusion between viruses (bacteriophages) (and between a plasmid and a virus) whereby cell death is triggered by infection by another virus (Table 1, II). Thus, this form of competition has been designated as ‘super-infection exclusion’. It is another example of how resident selfish genetic elements can act in altruistic cell death strategies to protect a clonal cell population against invading genetic elements (Fig. 1B). By analogy to apoptosis in animal cells infected with viruses, I have called this phenomenon ‘apoptotic mutual exclusion’. The exclusion resembles immunity between prophages and incompatibility between plasmids in that the identity of specificity in the regulatory network prevents propagation of a second element. The super-infection RM exclusion would be prominent during the establishment of an RM system and thus resembles surface exclusion between plasmids in this respect.
This type of exclusion constitutes another dimension to RM system classification. Thus, when the regulatory specificities of the two RM systems involved are identical, they constitute an incompatibility group. PvuII RM and BamHI RM consequently define one incompatibility group while the EcoRV RM, which is controlled by a C gene that recognizes different sequences, defines another (18). SsoII RM and EcoRII RM do not show such exclusion, probably because their restriction enzyme expression is regulated with different specificities (14). A bacterial cell can carry multiple RM systems of different super-infection exclusion specificities just as it may carry multiple plasmids of different incompatibility specificities.
Sequence recognition by RM systems is individually highly specific but collectively quite diverse. This has been explained by the existence of frequency-dependent selection in cellular defense (Fig. 2B). If a recognition sequence is rare among RM systems, an RM system with that specificity would be more likely to recognize invading DNAs as non-self.
However, competition between two RM systems for a recognition sequence has also been demonstrated to occur in the absence of any invading DNA (Fig. 6) (13). Post-segregational host killing by an RM gene complex did not occur when a second RM gene complex within the cell shared the same sequence specificity (Fig. 6C). In other words, two RM systems of the same specificity were unable to enjoy stabilization simultaneously. This type of ‘incompatibility’ or mutual exclusion implies competition for specific sequences by RM systems.
In addition, a less specific recognition site can provide an advantage. For example, when two RM systems, one recognizing CCNGG (N = A, T, G or C) and the other recognizing CCWGG (W = A or T), were present in the same cell, the former won the intracellular competition (14). The former was able to prevent host killing by the latter, while the latter was unable to prevent killing by the former. This one-sided incompatibility implies that there is a selective pressure for a less specific recognition sequence.
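The asymmetry between the two recognition sequences can be illustrated with a short sketch. It assumes, as one reading of the one-sided result, that a system covering a broader set of sequences covers every site of the narrower system but not the reverse. The helper names `expand` and `covers` are hypothetical and introduced only for illustration.

```python
from itertools import product

# IUPAC ambiguity codes used in the text (N = any base, W = A or T).
IUPAC = {"A": "A", "C": "C", "G": "G", "T": "T", "W": "AT", "N": "ACGT"}

def expand(site):
    """Return the set of concrete sequences matched by a degenerate site."""
    return {"".join(p) for p in product(*(IUPAC[b] for b in site))}

def covers(broad, narrow):
    """True if every concrete sequence of `narrow` is also a sequence of `broad`."""
    return expand(narrow) <= expand(broad)

print(covers("CCNGG", "CCWGG"))                    # True
print(covers("CCWGG", "CCNGG"))                    # False
print(sorted(expand("CCNGG") - expand("CCWGG")))   # ['CCCGG', 'CCGGG']
```

Under this assumption, CCCGG and CCGGG are the sites that only the less specific system recognizes, which is consistent with the CCWGG system being unable to prevent killing by the CCNGG system.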
This direct competition between RM systems in the absence of any invading DNAs thus could cause RM systems to evolve, with respect to their sequence recognition, faster than they would only through selection by invaders.
Bacteria may have evolved several means to defend their genome from attack by restriction enzymes. That restriction enzymes participate crucially in various types of DNA recombination has been known for a long time. Indeed, RM systems are sometimes regarded as a mechanism for generating variety in a genome (the variation hypothesis) as mentioned above (75). A contrasting view is that the various recombination systems are measures taken by the host to protect itself from the selfish behavior of RM systems (20). This hypothesis is described below together with other possible adaptation strategies. Bacterial strategies against type I RM systems have been analyzed (3,76). Our emphasis will be on type II RM systems.
Interaction with RM gene complexes may explain the apparently contradictory behavior of a major homologous recombination pathway of E.coli (77). From a double-stranded break, the RecBCD enzyme begins exonucleolytic DNA degradation. When the enzyme encounters a specific sequence called Chi, the degradation is attenuated and followed by RecA-mediated recombination of this DNA with a homologous DNA. The recombinogenic double-strand breaks may be provided by a type II restriction enzyme in vivo (78). In a mutant defective in RecBCD exonuclease/recombinase, the symptoms of cell death following the loss of an RM gene complex are more severe (12). These include growth inhibition, loss of cell viability, cell filamentation, and loss of nuclei (the region stained by DAPI). Huge linear forms of the chromosome accumulate. Similar syndromes were observed with a recBCD mutant defective in Chi recognition, and growth inhibition is also severe in recA, ruvAB, ruvC, recG and recN mutants. The cells induce the SOS response in a RecBC-dependent manner. These observations strongly suggest that the bacterial cell death after the loss of an RM gene complex is caused by chromosome breakage, and that the bacterial RecBCD/RecA machinery helps the cells to survive by repairing the broken chromosomes. RecA/BC-mediated cell survival is also observed when a mutant restriction enzyme is activated or when a restriction enzyme is overproduced (79).
A plausible explanation for the dual nature of the RecBCD system may be the interactions that occur between three genetic elements (the RM system, invading alien DNA and the host RecBCD system) within one bacterial genome. Thus, the RM system will attack non-self DNA, marked by an unmethylated recognition site, whether it is on invading DNA or on the chromosome. After restriction breakage, the host RecBCD system destroys the invading non-self DNA, defined by the host as the absence of the Chi sequence. On the other hand, it allows the restoration of its own Chi-marked chromosomal DNA through recombination.
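The decision logic described above can be written as a toy model. This is a deliberately simplified sketch, not a description of the enzymology: it treats restriction as all-or-nothing and ignores the orientation dependence and incomplete efficiency of Chi recognition. The function names are hypothetical. The Chi sequence of E.coli (5′-GCTGGTGG-3′) and the EcoRI site (GAATTC) are used as examples.

```python
CHI = "GCTGGTGG"  # E.coli Chi sequence, written 5'->3'

def revcomp(seq):
    """Reverse complement of an upper-case DNA string."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]

def is_cleaved(dna, site, methylated):
    """Restriction step: DNA with an unmethylated recognition site is cut."""
    return site in dna and not methylated

def fate(dna, site, methylated):
    """Toy outcome for a DNA molecule in a cell carrying an RM system.

    Simplifying assumption: after a restriction break, DNA that carries
    Chi on either strand is rescued by recombination, and DNA without
    Chi is degraded.
    """
    if not is_cleaved(dna, site, methylated):
        return "intact"
    has_chi = CHI in dna or CHI in revcomp(dna)
    return "repaired by recombination" if has_chi else "degraded"

invader = "AAGAATTCAA"            # unmethylated, no Chi
chromosome = "AAGAATTCAA" + CHI   # carries Chi

print(fate(invader, "GAATTC", methylated=False))     # degraded
print(fate(chromosome, "GAATTC", methylated=False))  # repaired by recombination
print(fate(chromosome, "GAATTC", methylated=True))   # intact
```

The three cases correspond to invading DNA without Chi, chromosomal DNA that has lost its methylation, and properly modified chromosomal DNA.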
If an incoming DNA carries Chi, exonucleolytic degradation will be attenuated. Furthermore, parts of the remaining fragment will be incorporated into the chromosome if they survive the mismatch repair process during homologous pairing. This would result in the mosaic-type genome polymorphism observed within a group of bacteria sharing the identification marker such as Chi (80). Pairs consisting of a RecBCD homolog and a Chi equivalent have been identified in other bacteria such as H.influenzae and Lactococcus lactis (81). The RecBCD enzyme changes its sequence preference through mutation (82,83). The bacterial group defined by the RecBCD analog/Chi analog pair may be regarded as a type of species in the sense that it defines a unit of reproductive isolation. The above role of Chi equivalents is distinct from, but nonetheless analogous to, the role of the uptake signal sequence in naturally transformable bacteria (84).
Genome comparison revealed that RM genes are sometimes associated with large genome polymorphisms as discussed above (in the section ‘linkage of RM genes with large genome polymorphisms’). We then introduced the hypothesis that RM gene complexes can cause genome rearrangements when their presence is threatened. Experiments have provided support for this hypothesis. When the PaeR7I gene complex on the E.coli chromosome was challenged through general transduction by a homologous stretch of PaeR7I-modified DNA, the replacement efficiency was lower than expected. Many of the resulting recombinant clones retained the recipient RM gene complex as well as the homologous donor DNA. Analysis of these clones suggested that multiple rounds of unequal homologous recombination between copies of a repetitive element (IS) generated large-scale duplication and inversion in the chromosome and that only one of the duplicated copies of the PaeR7I gene complex was replaced by the donor allele. The interaction between the selfish attack by RM systems and the defensive homologous recombination system of bacteria is likely responsible for these genome rearrangements (10).
Of the many anti-restriction strategies used by bacteriophages and plasmids (5), the homologous recombination machinery carried by bacteriophages appears well adapted to counteracting attacks by RM systems (85). An example is bacteriophage lambda, which encodes the homologous recombination proteins Redalpha and Redbeta. Rac prophage, a lambdoid bacteriophage present in some E.coli strains, also encodes similar recombinases, RecE and RecT. Redalpha and RecE both show 5′–3′ exonuclease activity at a double-stranded DNA end and expose a 3′ single-stranded end while Redbeta and RecT proteins stimulate annealing of complementary single strands. DNA strand invasion activity has been demonstrated for RecT (86). The cooperation of Redalpha and Redbeta, or of RecE and RecT, can promote homologous recombination near a DNA double-strand break made by restriction enzymes. Two parental DNAs can give rise either to only one progeny DNA (non-conservative recombination) or to two progeny DNAs (conservative recombination). When it is conservative, the repair may or may not be accompanied by crossing-over of the flanking sequences (87,88). Lambda Redalpha exonuclease has structural similarity to restriction enzymes, its enemies, and may be related to them in evolutionary terms (89).
Recombination stimulated by a restriction break may take place between co-infecting sister genomes in a clone. Recombination may also take place with a partner, possibly present as a prophage, from another phage clone possessing a divergent genome. Such out-crossing could confer several advantages in addition to the primary advantage of immediate restoration of the cleaved chromosome. If the template DNA were to lack the recognition site, recombination might result in a DNA region devoid of this particular restriction site. The crossing-over (or half crossing-over) of flanking sequences triggered by the break would confer a third kind of advantage: alleles at different locations, either sensitive or resistant to a particular restriction enzyme, would be recombined to generate rare combinatory genotypes. Some of these would be more resistant to attack by the present RM systems than the current combination, and they would increase in number. As a phage population encounters bacterial populations possessing various combinations of RM systems of diverse sequence specificities, the process of breakage, repair, gene conversion and crossing-over will continue.
Host bacterial cells also attempt to repair restriction breakage by rejoining the broken ends. Precise joining at a restriction break by ligase would restore intact DNA (90). Non-homologous or illegitimate joining of DNA ends resulting from further degradation is regarded as an unsatisfactory form of DNA repair, because the degraded genes are lost forever, even though the chromosome and the genome may be repaired. Non-homologous end joining at unequal loci in sister chromosomes might result in tandem duplication, but this would be preferable to joining the ends of a single chromosome, as it would not lead to the loss of genes and would increase the probability of genome survival. Insertion of RM gene complexes with a long target duplication (see above) (Fig. 3B) may be a strategy on the side of RM systems that takes advantage of this defense reaction by the host.
An interaction between non-homologous end joining, homologous recombination, and restriction has been identified in a special type of non-homologous recombination that is dependent upon homologous interaction and type I restriction (91). As expected from this in vivo work, a type I restriction enzyme was shown to cut DNA at a Holliday structure, an intermediate in homologous recombination (92). This could represent an attempt by a type I RM complex to abort homologous recombination involving unmodified non-self DNA.
Some DNA methylase genes are not linked to a restriction enzyme gene and are thus called orphans. One such gene is the dcm gene, present in several bacteria including E.coli, whose product methylates DNA to generate 5′ CmCWGG. A linked Vsr mismatch repair function prevents the C-to-T mutagenesis that is enhanced by this methylation but promotes other types of mutagenesis (93). Why the dcm–vsr gene pair is present has to date been an enigma. However, a clue to the function of dcm was provided by a study that blocked the EcoRII RM gene complex, which recognizes the same sequence as dcm, from replicating. This blockade led to chromosome degradation and severe loss of cell viability (14). However, when dcm was placed on the chromosome, the cell killing was attenuated, while when dcm was over-expressed, cell killing was completely abolished (Y.Naito, N.Handa, A.Chinen and I.Kobayashi, unpublished). Dcm thus can defend the genome against the parasitism of EcoRII.
Some RM gene complexes [HpaII (94), XorII (95), NaeI (96)] carry a Vsr homolog. (It is not known how often a C5 RM system is associated with a Vsr homolog/analog.) It is possible that the Vsr homologs may have evolved to prevent the loss of RM recognition sites, which are the target sites of their parasitism, from the host genome. Vsr has a structure similar to that of restriction enzymes, and is likely to be related to them phylogenetically (93,97).
Some short palindromic sequences are rare in certain bacterial genomes (98,99). This has been described as ‘restriction avoidance’ as it may be an adaptation of these genomes to prevent attack by RM systems. A detailed analysis of completely sequenced genomes (100) showed that the sequences that would be attacked by the resident type II RM complex were amongst the rarest palindromes in the host cell’s genome. For example, EcoRI sites are quite rare in the E.coli chromosome (100). Palindromes recognized by restriction enzymes from other species were also avoided, albeit to a lesser extent.
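A minimal sketch of how such avoidance might be quantified is given below. It compares the observed count of a site with the count expected from base composition alone. This zero-order background model is an assumption chosen for brevity and is not necessarily the statistic used in the cited analyses (98–100). The function name `observed_expected` and the toy sequence are hypothetical.

```python
from collections import Counter

def observed_expected(genome, site):
    """Return (observed, expected, ratio) for `site` on the given strand.

    Expected counts assume independent bases drawn with the genome's
    own base frequencies (zero-order model, an assumption of this sketch).
    """
    genome = genome.upper()
    n, k = len(genome), len(site)
    # Count every (possibly overlapping) occurrence of the site.
    observed = sum(1 for i in range(n - k + 1) if genome[i:i + k] == site)
    freq = Counter(genome)
    p = 1.0
    for base in site:
        p *= freq[base] / n
    expected = p * (n - k + 1)
    ratio = observed / expected if expected else float("nan")
    return observed, expected, ratio

# Toy sequence with no EcoRI site (GAATTC); a real analysis would read
# a complete genome sequence instead.
toy = "ACGT" * 25
obs, exp, ratio = observed_expected(toy, "GAATTC")
print(obs, ratio)  # 0 0.0
```

A ratio well below 1 for a palindrome would be consistent with avoidance under this simple model. Because a palindromic site reads the same on both strands, scanning one strand is sufficient for sites such as GAATTC.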
When Rocha et al. (101) analyzed the oligonucleotide abundance in the complete genome of B.subtilis, they found that prophage-like elements embedded in the genome were more likely to contain several potential restriction sites than the rest of the genome. This suggests that, unlike the recently embedded prophages, the genomes themselves have evolved over past exposure to a variety of restriction enzymes to eliminate their recognition sites (101,102).
The strength of such selection would depend on the population substructure and the modes and frequency of horizontal gene transfer (103). For example, the E.coli population consists of few long-lasting clones and might be easily influenced by the attack by RM systems brought in by plasmids and bacteriophages. One can also imagine that the selection would be stronger for attack by resident RMs than for attack by invading RMs.
I have reviewed here the increasing evidence that supports the hypothesis that some RM systems behave as a simple form of life. They increase their own frequency in the cell population by destroying non-self DNAs, whether these are invading DNA or their (ex-)host chromosome. The non-self DNA is recognized as such by the absence of methylation on a recognition sequence. The observations reviewed here also support the hypothesis that RM gene complexes are mobile genetic elements, in the broad sense of the term, that sometimes move between genomes, and occasionally cause evolutionary changes in these genomes.
Our knowledge of RM systems as a minimal form of life is still fragmentary. However, their life cycle and their interaction with the host (the genome) may be studied in much the same way as other genetic elements or genomic parasites have been studied. These include homing endonucleases (104), transposons, viruses, bacteria and parasites. Post-segregational killing, or addiction (105), is just one example of the principles that can be studied in the simple biological system consisting of RM and the bacterial genome. It might turn out to be a general principle that underlies the symbiosis of a variety of genetic elements in a genome.
Homologous recombination and several other cellular mechanisms appear to be well adapted to the selfish behavior of RM complexes. Several forms of genome polymorphism that were detected when closely related genome sequences were compared may have resulted from parasite–host-type interactions between RM elements and the rest of the genome. This interaction could be a major force for genome evolution in at least some bacteria. Laboratory studies could be designed to test the hypotheses emerging from the comparison of genome sequences. The immediate subjects of study could include genome changes involving RM systems and the host machinery of recombination, repair, replication and mutagenesis.
The behavior of RM gene complexes as genomic parasites or symbionts may have also affected the evolution of the structure and function of the RM enzymes themselves. An unexpected diversity of RM enzymes has been recognized at the levels of amino acid sequence (2,7) and three-dimensional structure (106–108). Structural analysis has revealed surprising diversity in the members of the restriction enzyme family (109). This family includes an exonuclease for homologous recombination (89), Vsr mismatch repair endonuclease (93,97) and MutH mismatch repair endonuclease (110), some of which have been discussed above in the context of their interaction with restriction enzymes. Other new findings include a resolvase of the Holliday intermediate in homologous recombination (111,112) and a transposase (113). More proteins of these families will undoubtedly be found in bacterial genomes, our human genomes and other genomes during the coming years of structural genomics or proteomics, with potentially exciting implications for biology, biotechnology and medicine.
I am grateful to my collaborators in my laboratory for data, figures, database survey and discussion. This work was supported by the MEXT (Ministry of Education, Culture, Sports, Science and Technology) of the Japanese government (Genome, Recombination), NEDO and the Uehara Memorial Foundation.