Like many eukaryotes, bacteria make widespread use of postreplicative DNA methylation for the epigenetic control of DNA-protein interactions. Unlike eukaryotes, however, bacteria use DNA adenine methylation (rather than DNA cytosine methylation) as an epigenetic signal. DNA adenine methylation plays roles in the virulence of diverse pathogens of humans and livestock animals, including pathogenic Escherichia coli, Salmonella, Vibrio, Yersinia, Haemophilus, and Brucella. In Alphaproteobacteria, methylation of adenine at GANTC sites by the CcrM methylase regulates the cell cycle and couples gene transcription to DNA replication. In Gammaproteobacteria, adenine methylation at GATC sites by the Dam methylase provides signals for DNA replication, chromosome segregation, mismatch repair, packaging of bacteriophage genomes, transposase activity, and regulation of gene expression. Transcriptional repression by Dam methylation appears to be more common than transcriptional activation. Certain promoters are active only during the hemimethylation interval that follows DNA replication; repression is restored when the newly synthesized DNA strand is methylated. In the E. coli genome, however, methylation of specific GATC sites can be blocked by cognate DNA binding proteins. Blockage of GATC methylation beyond cell division permits transmission of DNA methylation patterns to daughter cells and can give rise to distinct epigenetic states, each propagated by a positive feedback loop. Switching between alternative DNA methylation patterns can split clonal bacterial populations into epigenetic lineages in a manner reminiscent of eukaryotic cell differentiation. Inheritance of self-propagating DNA methylation patterns governs phase variation in the E. coli pap operon, the agn43 gene, and other loci encoding virulence-related cell surface functions.
The word “epigenetics” is based on the Greek prefix “epi-,” denoting “on” or “in addition,” and “genetic,” meaning “pertaining to or produced from genes.” In the past, the term “epigenetics” has been used to describe the differentiation of genetically identical cells into distinct cell types to form tissues and organs during development of a multicellular organism. In current practice the word is used by biologists to describe heritable changes in gene expression that occur without changes in the DNA sequence. In the strict sense, epigenetic systems involve two or more heritable states, each maintained by a positive feedback loop. In a broader sense, however, any additional information superimposed on the DNA sequence (e.g., methylation of DNA) can be considered “epigenetic.” Here we review the current state of research in the field of bacterial epigenetics, with an emphasis on systems controlled by DNA methylation, which are the best known at the molecular level. We refer the reader to reviews covering other aspects of DNA methylation and related topics (16, 32, 51, 96, 143, 160, 172, 178, 202, 214, 264, 265, 285).
Epigenetic phenomena include prions, in which protein structure is heritably transmitted (223, 231, 235, 259); genomic imprinting, characterized by monoallelic repression of maternally or paternally inherited genes (52, 84, 128, 195, 213); histone modification, such as methylation of lysines by histone methyltransferases (MTases) that maintain active and silent chromatin states (132, 273); and DNA methylation patterns formed as a result of inhibition of methylation of specific DNA bases by protein binding (29, 41, 118, 262, 263). Each of these phenomena involves self-perpetuating states, be they protein or DNA related (116, 155, 230-232), and the particular state that the molecule is in affects gene expression.
Epigenetic regulation can enable unicellular organisms to respond rapidly to environmental stresses or signals. For example, the yeast prion PSI+ is generated by a conformational change of the Sup35p translation termination factor, which is then inherited by daughter cells. The PSI+ form of Sup35p allows readthrough of nonsense codons that can provide a survival advantage under adverse conditions such as growth in paraquat or caffeine (259). The PSI+ prion is a metastable element that is generated and lost spontaneously at low rates, and thus within a population of yeast, some yeast cells will carry the prion and others will not. This situation provides potential flexibility in the response of the yeast population to environmental changes, orchestrated through the ability of the PSI+ prion to act upon native Sup35p protein and convert it to prion protein (223).
Methylation of specific DNA sequences by DNA methyltransferases provides another mechanism by which epigenetic inheritance can be orchestrated. For example, in certain eukaryotes, including mammals, methylation of cytosine residues at 5′-CG-3′ (CpG) sequences facilitates binding of methyl-CpG binding proteins (134, 156, 187). In turn, methyl-CpG binding proteins affect the transcription state of a local DNA region through further interaction with chromatin-remodeling proteins (145). Methylation of CpG can affect gene expression, and the methylated state is usually correlated with transcriptional repression. The methylation pattern of a DNA region is defined as the collective presence or absence of methyl groups on specific target sites. DNA methylation patterns can vary between cells, tissues, and individuals. DNA methylation patterns are established via de novo methylation during the first stages of embryonic development (28, 81, 213). Such patterns are propagated by DNA methyltransferases known as maintenance methylases (Dnmt1), which are active on hemimethylated DNA substrates generated by DNA replication. Thus, if a DNA region contains methylated CpG sequences, they will be propagated in the methylated state. Nonmethylated CpG sequences, however, are not substrates for the maintenance DNA methylases. Thus, if a DNA region contains nonmethylated CpGs, they will tend to remain nonmethylated. A major area of research in eukaryotic epigenetic regulation is directed at understanding the mechanisms by which DNA methylation patterns are erased following cleavage of the fertilized egg and then established via de novo methylation (74, 81, 141, 180).
DNA methylation plays important roles in the biology of bacteria: phenomena such as timing of DNA replication, partitioning nascent chromosomes to daughter cells, repair of DNA, and timing of transposition and conjugal transfer of plasmids are sensitive to the methylation states of specific DNA regions (16, 160, 172, 178, 202, 285). All of these events use as a signal the hemimethylated state of newly replicated DNA, generated by semiconservative replication of a fully methylated DNA molecule. In the case of DNA replication, the protein SeqA binds preferentially to hemimethylated DNA target sites (GATC sequence) clustered in the origin of replication (oriC) and sequesters the origin from replication initiation. In addition, SeqA also transiently blocks synthesis of the DnaA protein, which is necessary for replication initiation, by binding to hemimethylated GATC sites in the dnaA promoter (36, 49, 100, 140, 146, 163, 179, 249). In DNA repair, the methyl-directed mismatch repair protein MutH recognizes hemimethylated DNA sites and cuts the nonmethylated daughter DNA strand, ensuring that the methylated parental strand will be used as the template for repair-associated DNA synthesis (8, 12, 25, 178, 227, 237). In transposition of Tn10, hemimethylated DNA plays two roles: enhancing binding of RNA polymerase to the transposase promoter and enhancing binding of transposase to its DNA target sites (144, 181, 219). DNA methylation appears to play similar roles in regulating Tn5 transposition (73, 161, 175, 217, 253, 292). None of these phenomena are heritable since the hemimethylated state of DNA is not heritable, occurring transiently in newly replicated DNA.
Phenomena involving inheritance of DNA methylation patterns are also known in bacteria, and the best-known examples involve phase variation. In phase variation, gene expression alternates between active (ON phase) and inactive (OFF phase) states. For example, uropathogenic Escherichia coli (UPEC) cells undergo pilus phase variation, which can be observed using immunoelectron microscopy with antipilus antibodies marked with colloidal gold (Fig. 1). Phase variation can occur through a variety of genetic mechanisms involving changes in nucleotide sequence (e.g., site-specific recombination and mutation) which result in heritably altered gene expression (1, 4, 26, 32, 33, 42, 53, 69, 75, 79, 86, 98, 113, 119, 122, 133, 164, 191, 229, 240, 244, 256, 265, 298). Bacteria also use epigenetic mechanisms to control phase variation. In all cases examined, these systems use DNA methylation patterns to pass information regarding the phenotypic expression state of the mother cell on to the daughter cells. A DNA methylation pattern is formed by binding of a regulatory protein(s) to a site that overlaps a methylation target, blocking methylation. This pattern can control gene expression if methylation, in turn, affects binding of the regulatory protein(s) to its DNA target site, which could occur by steric hindrance or alteration of DNA structure due to methylation (206, 207). Notably, most adhesin genes in E. coli are regulated by epigenetic mechanisms involving DNA methylation patterns (32, 115, 116, 262).
Little is known concerning how widespread epigenetic control is in the bacterial world and the roles that epigenetic regulatory systems play in bacterial biology, including pathogenesis. Our main goal in writing this review is to introduce the reader to epigenetic regulatory control, focusing on the main features and unique aspects of the epigenetic control systems that have been studied. The list of examples discussed below can be grouped into several classes: (i) strict-sense epigenetic inheritance involving heritable transmission of DNA methylation states to daughter cells, as in the pap operon of uropathogenic E. coli; (ii) DNA methylation signals that generate distinct epigenetic states in DNA molecules coexisting in the same cell, as in IS10 transposition and in traJ regulation; and (iii) systems that are “epigenetic” in a broader sense, since DNA methylation provides a signal for temporal or spatial control of DNA-protein interactions but does not give rise to distinct lineages of cells or DNA molecules. Examples of the last class include the control of bacterial mismatch repair by DNA methylation and the coupling of promoters to distinct DNA methylation states during the cell cycle. We hope that this will be useful not only in understanding experiments carried out to date but also as a primer for future work in bacterial epigenetics.
Most epigenetic systems known in bacteria use DNA methylation as a signal that regulates a specific DNA-protein interaction. These systems are usually composed of a DNA methylase and a DNA binding protein(s) that bind to DNA sequences overlapping the target methylation site, blocking methylation of that site. Methylation of the target site, in turn, inhibits protein binding, resulting in two alternative methylation states of the target site, methylated and nonmethylated. The epigenetic regulatory methylases known in bacteria are designated “orphan” methylases since they lack a cognate restriction enzyme. We begin by discussing restriction-modification (R-M) systems, since they are likely the progenitors of the orphan methylases regulating epigenetic processes. Indeed, DNA methylation plays a regulatory role in some R-M systems, as described below.
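The mutual exclusion described above (protein binding blocks methylation, and methylation blocks protein binding) can be illustrated with a deliberately minimal sketch. It assumes a single target site, a binding protein that is either available or not, and deterministic outcomes; the function names are hypothetical, and real systems such as pap involve additional regulators and stochastic switching that this toy model ignores.

```python
def next_generation(site_methylated, protein_present=True):
    """Return the methylation state a daughter cell inherits for one target site.

    Toy assumptions (not taken from any specific system):
    - a methylated site excludes the binding protein, so after replication the
      hemimethylated site is simply remethylated by the methylase;
    - a nonmethylated site is occupied by the protein whenever the protein is
      available, which blocks the methylase and keeps the site nonmethylated.
    """
    if site_methylated:
        return True
    # Nonmethylated site: it stays nonmethylated only while the protein blocks it.
    return not protein_present


def lineage(initial_state, generations, protein_present=True):
    """Follow one site through successive generations of a single lineage."""
    states = [initial_state]
    for _ in range(generations):
        states.append(next_generation(states[-1], protein_present))
    return states


# Both states are self-perpetuating while the binding protein is available.
methylated_lineage = lineage(True, 3)
nonmethylated_lineage = lineage(False, 3)
# Without the protein, the nonmethylated state is lost in the next generation.
lost_lineage = lineage(False, 2, protein_present=False)
```

In this sketch each of the two states reproduces itself from one generation to the next, which is the sense in which the text calls a methylation state heritable.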
DNA methylation was originally discovered in the context of restriction-modification systems, in which a restriction endonuclease cleaves a specific target DNA sequence unless that sequence has been methylated by a cognate DNA methyltransferase (5, 27, 39, 153, 220, 260). Three main groups of R-M systems (types I, II, and III) have been described, based on whether the restriction and modification activities are within a single polypeptide (types I and III) or separate polypeptides (type II) and on whether the restriction enzymes cut at a site close to (types II and III) or far from (type I) the methylation target sequence (185, 221, 236, 238, 284). It has been postulated that R-M systems evolved as a form of cellular defense, targeting incoming viral and other foreign DNA sequences for degradation. Note that foreign DNAs would not be methylated at the appropriate target sites unless that sequence was derived from a bacterium with a cognate methylase of the same specificity (6, 77). In these systems, the restriction enzyme and cognate methylase are both expressed at levels that allow complete methylation of the genome, sufficient to block double-strand DNA cleavage by the restriction enzyme, a potentially fatal event. Incoming foreign DNA is efficiently destroyed, since the restriction enzyme has the upper hand over the methylase: for the DNA to survive, every restriction site it carries would have to be methylated before even a single site is cleaved by the cognate restriction enzyme, an unlikely event.
Work by Kobayashi and colleagues has suggested that R-M systems have attributes of selfish genes (148-150). Nakayama and Kobayashi showed that a plasmid containing the type II R-M EcoRV system could not be displaced from cells by an incompatible plasmid due to the death of cells that lost the EcoRV-containing plasmid, a form of postsegregational killing (186). In cells lacking the R-M gene complex, the levels of methylase and cognate restriction enzyme drop to a point where insufficient methylase is present to protect all chromosomal target sites; the restriction enzyme then cleaves one or more sites, killing the cell. This scenario is similar to that for addiction modules such as hok-sok, in which the sok gene expresses an antisense RNA that inhibits translation of the hok toxin gene. When cells lose a plasmid containing hok-sok, they die: since hok mRNA is stable but sok RNA is unstable (half-life [t1/2], <30 s), translation of hok ensues, which leads to cell death (91, 92). Other addiction modules are made of two proteins, a toxin and an antitoxin (82, 90, 106).
Further analysis of the EcoRV system has shown that a regulatory gene designated “C,” sandwiched between the R and M genes, codes for a product that activates R gene expression (186). The C gene appears to be required for expression of the R gene, since postsegregational killing does not occur in C gene mutants. One function of the C gene is in establishment of an R-M system in a new host. In this case the M gene is immediately activated, allowing modification of host DNA sites. At the same time, C gene expression is also activated, building up the C protein level to a point that allows activation of R gene expression. This temporal delay in expression of the restriction enzyme is critical in allowing time for all chromosomal sites to be methylated and protected from digestion. In addition, C also functions as a suicide immunity gene, forcing expression of the R gene of an incoming closely related R-M complex with different restriction specificity, resulting in host cell death. This would be expected to prevent spread of a competing R-M complex of the same C gene immunity group (any R-M complex in which the resident C protein activates expression of an incoming R gene) within a bacterial population (250).
A second regulatory strategy used by R-M systems utilizes methylation of the cognate restriction site to control R-M transcription via a direct effect on RNA polymerase binding. For example, in the CfrBI system of Citrobacter freundii, methylation of a cytosine within the 5′-CCATGG-3′ DNA restriction site decreases expression of the CfrBI methylase (CfrBIM) and concomitantly increases expression of the CfrBI restriction enzyme (CfrBIR) (18, 294). This appears to occur as a result of the location of the cfrBI site within the −35 RNA polymerase σ70 binding site of the cfrBIM gene. Since the cfrBIM promoter is stronger than that of cfrBIR, any bacterial cell receiving the CfrBI system will be methylated before restriction can occur. As the intracellular methylase level increases, the cfrBI site is methylated, decreasing expression of cfrBIM and enabling expression of cfrBIR. The latter may protect the cell from incoming foreign DNA lacking methylated sequences.
A third R-M regulatory mechanism utilizes the methylase itself as a feedback regulator. In a number of cases binding of the methylase to DNA occurs via an N-terminal extension containing a helix-turn-helix motif (142, 196, 197). For example, in the SsoII R-M system of Shigella sonnei, the SsoII methyltransferase (SsoIIM) represses its own synthesis and stimulates expression of the cognate restriction endonuclease (SsoIIR). Similar N-terminal extensions are present on a number of 5-methylcytosine methyltransferases, including those in the EcoRII, dcm, MspI, and LlaJI systems (142). The last system, present in Lactococcus lactis, encodes two methylases, M1.LlaJI and M2.LlaJI, recognizing the complementary and asymmetric sequences 5′-GACGC-3′ and 5′-GCGTC-3′, respectively, with methylation of the internal cytosine in each case. Two LlaJI restriction sites are present 8 bp apart within the regulatory region of the llaJI operon, with one site overlapping the −35 RNA polymerase σ70 recognition site of the operon. Notably, methylation of both 5′-GCGTC-3′ sites by M2.LlaJI enhances binding of M1.LlaJI, repressing transcription of the llaJI operon. The ability of the M1.LlaJI methylase to distinguish methylated and nonmethylated target sites provides a feedback mechanism by which expression of the llaJI operon is controlled by DNA methylation.
The analysis of regulation of the EcoRV, CfrBI, and LlaJI R-M systems described above has provided insight into the evolution of epigenetic control systems that are predominantly controlled by “orphan” methyltransferases, including DNA cytosine methylase (Dcm) (202) in E. coli. It has been postulated that orphan methylases such as Dcm may have arisen by selection as vaccines against invasion of a restriction-modification complex (250). In the case of Dcm, which methylates the duplex sequence 5′-CCWGG-3′ (top strand shown; W = A or T) at the second cytosine, this methylation protects against cleavage by EcoRII. It was shown that postsegregational killing by the EcoRII R-M complex was diminished by the presence of dcm (250), which partially protected host chromosomal DNA from restriction attack. This function of Dcm as a possible molecular vaccine may be analogous to the function of cytosine methylation in certain eukaryotes, including mammals, where methylation has been postulated to inactivate transposons (293), although this hypothesis has been challenged (30). Dcm is not known to be involved in gene regulatory control. However, the other orphan methylase in E. coli, DNA adenine methylase (Dam), with homologues in other Gammaproteobacteria, does play an essential role in regulating epigenetic circuits. In addition, Alphaproteobacteria have a cell cycle-regulated methylase (CcrM) which plays a major role in the control of chromosome replication and regulates expression of certain genes. In the next section we describe the biochemical properties of these DNA methylases and additional components of epigenetic switches before discussing specific epigenetic systems in detail.
Dam of E. coli is classified in the α group of DNA MTases based on the organization of 10 domains (167). The E. coli dam gene (accession no. J01600) is 834 bp and codes for a 32-kDa monomeric protein (114). Dam homologues are present in Salmonella spp., Haemophilus influenzae, and additional gram-negative bacteria (16, 204, 254). Dam binds to DNA nonspecifically as a monomer, moving by linear diffusion and specifically methylating 5′-GATC-3′ sequences. At GATC sites the adenine base is flipped out 180° into the active site of the enzyme, where it is stabilized by hydrophobic stacking with a tyrosine in the DPPY motif, which is conserved among adenine methyltransferases (123, 157). The methyl group donor, S-adenosyl-l-methionine (AdoMet), is required for stable binding of the flipped adenine in the active-site pocket of the enzyme and binds to Dam after the methylase binds DNA, transferring a methyl group to the exocyclic N6 nitrogen of adenine (261). AdoMet binds to two sites in the Dam protein: one is the catalytic center, and the other seems to be involved in an allosteric change that may increase specific binding of Dam to DNA (22). Dam appears to methylate only one of the adenosines of duplex GATC DNA sequence at a time (261). Notably, Dam shows high processivity for most DNAs; that is, after one methylation event, it slides on the same DNA molecule and carries out additional methylation events (turnovers). This high processivity effectively increases the rate of Dam methylation and may reflect the fact that there are few (<100) Dam molecules present in a single E. coli cell, yet there are about 19,000 GATC sites to methylate. Dam levels vary according to growth rate as a result of increased transcription from one of five dam gene promoters, designated P2 (158).
Based on the estimated numbers of Dam and GATC target sites per cell, each Dam molecule modifies between 20 and 100 GATC sites per minute (kcat) (261). This number is about 100-fold higher than the turnover number observed in vitro using an oligonucleotide substrate with one GATC site, indicating that there is likely some difference(s) in vivo that enables Dam to be more efficient at methylation (261). One possibility, suggested by Urig et al. (261), is that Dam is associated with the DNA polymerase III machine, scanning DNA for GATC sites as DNA replication proceeds and thus methylating DNA much more efficiently than it would in a random walk.
The processive nature of Dam contrasts sharply with DNA methylases associated with R-M systems, such as the EcoRV methylase (MEcoRV), which methylates its GATATC recognition sites distributively (95). In this case and for other R-M systems, incoming DNA needs to be restricted (cut) by the restriction enzyme before every site is methylated. The restriction enzyme has the advantage, since if just one restriction site in an incoming phage genome is left unmodified, the enzyme can cleave the DNA and block its replication. Note that restriction could be hampered if R-M DNA methylases were highly processive like Dam: processivity would increase the chances that all restriction sites in an incoming phage, for example, would be modified before restriction could occur.
Other gram-negative bacteria besides E. coli, including Salmonella spp., Serratia marcescens, Yersinia spp., Vibrio cholerae, Haemophilus influenzae, and Neisseria meningitidis, code for orphan MTases that have significant sequence identity to EcoDam and that target adenosine of the GATC DNA sequence (162). Although Dam is not essential for growth of E. coli and Salmonella on laboratory media (14, 172, 254), the Dam homologues in Yersinia pseudotuberculosis, Yersinia enterocolitica, and Vibrio cholerae are essential gene products (135). However, a strain of Y. pseudotuberculosis in which dam mutations are viable has been described (252). It is not known what essential function(s) Dam plays in the pathogens in which it is essential, but it is provocative that both Yersinia and Vibrio contain two chromosomes, in contrast to the single chromosomes in E. coli and Salmonella spp., where Dam is not essential. A speculation is that Dam may be essential to coordinate DNA replication in bacteria with two or more chromosomes (78).
Dam homologues without a restriction enzyme counterpart are also present in bacteriophages, including Sulfolobus neozealandicus droplet-shaped virus (7), halophilic phage Ch1 (15), H. influenzae phage HP1 (204), phage P1 (61), phage T1 (9), and phage T4 (226). The last MTase, T4Dam, has been well characterized biochemically, primarily by Hattman and colleagues (123, 228). T4Dam, like EcoDam, is highly processive (169) and complements a dam mutant E. coli mutator phenotype (226). T4Dam and EcoDam may have a common evolutionary origin, sharing up to 64% sequence identity in four different regions (11 to 33 amino acids long) (105). After methylation with resulting formation of S-adenosyl-l-homocysteine, AdoMet binds to T4Dam without dissociating from the DNA duplex (299). Like EcoDam, T4Dam appears to flip out the adenosine of GATC sequence, facilitating its methylation (168).
The cell cycle-regulated DNA MTase family (CcrM) constitutes a second important group of orphan methyltransferases, classified in the β group of MTases and originally identified in Caulobacter crescentus (167, 242, 300). CcrM binds to and methylates adenosine in the sequence 5′-GANTC-3′, where “N” is any nucleotide (167, 300). Like EcoDam, CcrM is a functional monomer and acts processively (20), although evidence suggests that it is a dimer at physiologic concentration (234). However, unlike EcoDam, CcrM has a distinct preference for hemimethylated DNA as a substrate, based on the observation that the turnover rate for hemimethylated DNA containing a GANTC target site(s) was significantly higher than that for DNA containing nonmethylated sites (20). The GANTC sequence is also the target of HinfM methylase, which shares 49% identity with CcrM and whose cognate restriction enzyme HinfI from H. influenzae cuts at nonmethylated GANTC sites (300).
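The two recognition sequences discussed so far, 5′-GATC-3′ for Dam and 5′-GANTC-3′ for CcrM, can be located in a sequence with a short pattern search. The following is a minimal sketch; the function name and the example sequence are invented for illustration and are not taken from any genome.

```python
import re

# Recognition motifs named in the text; "N" stands for any nucleotide.
DAM_MOTIF = "GATC"
CCRM_MOTIF = "GANTC"


def find_sites(sequence, motif):
    """Return 0-based start positions of every match of motif in sequence."""
    pattern = "".join("[ACGT]" if base == "N" else base for base in motif)
    # A lookahead is used so that overlapping matches would also be reported.
    return [m.start() for m in re.finditer(f"(?={pattern})", sequence.upper())]


# Hypothetical sequence, made up for illustration only.
example = "TTGATCAAGAATCGGGACTCTTGATC"
dam_sites = find_sites(example, DAM_MOTIF)    # positions of GATC
ccrm_sites = find_sites(example, CCRM_MOTIF)  # positions of GANTC
```

Only one strand is scanned here. Because both motifs read the same on the complementary strand, each match corresponds to a duplex site with one target adenine on each strand.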
In Caulobacter, CcrM is an essential cell component and plays a crucial role in cell cycle regulation (20, 139, 170, 214-216, 242, 243, 300). CcrM homologues, which are likewise essential, have been found in Agrobacterium tumefaciens, the causative agent of crown gall disease in plants (137); in Rhizobium meliloti, the nitrogen-fixing symbiont of alfalfa and other legumes (286); and in the animal pathogen Brucella abortus (222). In B. abortus, aberrant CcrM expression impairs the pathogen's ability to proliferate in murine macrophages, raising the possibility that CcrM methylation might control the synthesis of virulence factors (222).
Following passage of the DNA replication fork in E. coli, GATC sites methylated on the top and bottom strands in a mother cell (denoted as fully methylated) are converted into two hemimethylated DNA duplexes: one methylated on the top strand and nonmethylated on the bottom strand and one methylated on the bottom strand and nonmethylated on the top strand due to semiconservative replication (Fig. 2A). Most GATC sites are rapidly remethylated by Dam and exist in the hemimethylated state for only a fraction of the cell cycle (Fig. 2A). Exceptions are the DNA replication origin oriC, the dnaA promoter, and possibly additional GATC sites in the chromosome which bind SeqA (60). SeqA preferentially binds to clusters of two or more hemimethylated GATC sites spaced one to two helical turns apart (Fig. 2B). In the case of oriC, which contains a cluster of 13 GATC sites, sequestration delays remethylation and prevents binding of the DnaA protein, which controls the initiation of DNA replication. At other sites, binding of SeqA tetramers to hemimethylated GATC sites may organize nucleoid domains (100). Notably, the transcription profile of an E. coli SeqA− mutant was found to be similar to that of a Dam overproducer strain. Based on this observation, a model was developed in which Dam and SeqA compete for binding to hemimethylated DNA generated at the replication fork (159).
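The bookkeeping behind this paragraph can be sketched in a few lines: semiconservative replication turns one fully methylated site into two hemimethylated duplexes, and Dam restores full methylation unless access to the site is blocked. The class and function names below are hypothetical, and the `blocked` flag is a simplification standing in for sequestration by SeqA or occupancy by another protein.

```python
from typing import NamedTuple


class GatcSite(NamedTuple):
    """Methylation status of the adenine on each strand of one GATC site."""
    top_methylated: bool
    bottom_methylated: bool

    @property
    def state(self) -> str:
        count = self.top_methylated + self.bottom_methylated
        return ("nonmethylated", "hemimethylated", "fully methylated")[count]


def replicate(site):
    """Semiconservative replication of one site.

    Each daughter duplex keeps one parental strand with its methylation
    status; the newly synthesized strand starts out nonmethylated.
    """
    keeps_parental_top = GatcSite(site.top_methylated, False)
    keeps_parental_bottom = GatcSite(False, site.bottom_methylated)
    return keeps_parental_top, keeps_parental_bottom


def remethylate(site, blocked=False):
    """Dam methylates both strands unless access to the site is blocked."""
    return site if blocked else GatcSite(True, True)


mother = GatcSite(True, True)
daughter_a, daughter_b = replicate(mother)
```

Here `daughter_a` is methylated on the top strand only and `daughter_b` on the bottom strand only, matching the two hemimethylated duplexes described in the text.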
The half-life of hemimethylated GATC sites not bound by SeqA has been estimated to be between 0.5 and 4 min, based on analysis of synchronized E. coli cells and monitoring the methylation status with restriction enzymes DpnI, which cuts fully methylated GATC sites; MboI, which cuts fully nonmethylated sites; and Sau3AI, which cuts GATC sites regardless of methylation state (50). In contrast, analysis of the origin of replication in the colicinogenic plasmid ColE1 indicated that remethylation of hemimethylated GATC sites occurs within a few seconds of passage of the replication fork (241). Notably, remethylation appeared to occur asynchronously, with methylation at GATC sites on the leading replication arm occurring more rapidly than GATC methylation on the lagging arm (about 2 seconds versus 4 seconds), suggesting that remethylation on the lagging arm occurs after ligation of Okazaki fragments. The reason for the discrepancy in estimation of the half-life of GATC sites is unclear but could reflect differences in chromosomal versus plasmid replication. For chromosomal replication the DNA polymerase III replication machinery is stationary, bound to the cytoplasmic membrane with DNA moving through it (154, 179). It is possible that Dam is present in a complex bound near the origin, methylating nascent DNA sequences as they arise.
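The logic of the three-enzyme assay can be written as a small lookup. This sketch uses only the cutting rules stated above and assumes, as the word "fully" implies, that a hemimethylated site is cut by neither DpnI nor MboI; the function names are hypothetical.

```python
# Cutting rules as stated in the text. Treating a hemimethylated site as
# resistant to both DpnI and MboI is an assumption of this sketch.
CUTS = {
    "DpnI": {"fully methylated"},
    "MboI": {"nonmethylated"},
    "Sau3AI": {"fully methylated", "hemimethylated", "nonmethylated"},
}


def digestion_pattern(state):
    """Return which enzymes cut a GATC site in the given methylation state."""
    return {enzyme: state in states for enzyme, states in CUTS.items()}


def infer_state(cut_by_dpni, cut_by_mboi):
    """Infer the methylation state of a site from DpnI and MboI digestion."""
    if cut_by_dpni and cut_by_mboi:
        raise ValueError("a single site cannot be cut by both DpnI and MboI")
    if cut_by_dpni:
        return "fully methylated"
    if cut_by_mboi:
        return "nonmethylated"
    return "hemimethylated"


hemi_pattern = digestion_pattern("hemimethylated")
```

Under these rules, resistance to both DpnI and MboI together with sensitivity to Sau3AI is the signature of a hemimethylated site, which is how the transient state can be followed in synchronized cells.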
The presence of hemimethylated GATC sites provides a signal that DNA replication has just occurred and plays a role in diverse cellular processes. For example, in methyl-directed mismatch repair the MutH protein binds to hemimethylated GATC sites and cleaves the nonmethylated DNA strand, ensuring that mutations in the daughter DNA strand are repaired using the parental strand as a template. In the absence of Dam, MutH can cleave the daughter strand, the parental strand, or both DNA strands. If the cell survives double-strand DNA breakage, 50% of the time the mutant daughter strand is used as a template to “repair” the parental strand, resulting in fixation of a mutation into the DNA (172, 285). Hemimethylated GATC sites are also used to control rates of transposition of insertion sequences IS3, IS10, IS50, and IS903 as well as transposons Tn5, Tn10, and Tn903 (73, 217, 219, 292). Elegant studies from Kleckner's laboratory showed that hemimethylated GATC sites control IS10 transposition in two different ways (181, 219). First, a GATC site present at bp −67 to −70 (here designated GATC-68) within the −10 module of the transposase promoter pIN controls transcription of the transposase gene. Full methylation of GATC-68 inhibits RNA polymerase binding, reducing the level of tnp IS10 transcription. A second GATC site at bp 1320 to 1323 (GATC-1321) near the inner terminus of IS10 controls binding of transposase. Full methylation of GATC-1321 blocks transposition by inhibiting transposase binding. These two effects of DNA methylation on transposase expression and binding effectively limit IS10 transposition to a brief period immediately following DNA replication when GATC-68 and GATC-1321 are hemimethylated. Remarkably, the two hemimethylated IS10 DNAs have different transposition activities: IS10 methylated on the template strand is about 330 times more active than IS10 methylated on the nontemplate strand and 1,000 times more active than fully methylated IS10 (219). The majority of this difference is due to increased binding of transposase at the inner IS10 terminus; in addition, activation of the transposase promoter is more efficient in the IS10 hemimethylated species whose template strand is methylated. Since transposition of Tn10 does not involve the inner terminus, stimulation of Tn10 transposition following DNA replication is less efficient than for IS10 (219).
Like that of Tn10, transposition of IS50 and of Tn5 is stimulated by DNA replication (175). GATC sites are present within the inside end (IE) of IS50, similar to the case for IS10, and within the −10 region of the transposase regulatory region (73, 253, 292). In both IS50 and Tn5, Dam methylation represses tnp promoter activity and transposase binding to the IS50 IE (73, 253, 292). Increased transposition of IS50 and Tn5 in a Dam− host requires integration host factor (IHF), probably to compensate for a DNA conformational defect associated with the lack of Dam (165). In turn, binding of Fis (factor for inversion stimulation) to the IE inhibits IS50 transposition (276). Methylation of three GATC sites within the Fis recognition sequence inhibits Fis binding. Thus, immediately following DNA replication, Fis binds to the IE, inhibiting IS50 transposition, and counteracts the positive effects of the hemimethylated state on IS50 transposition. In contrast, Tn5 transposition is not inhibited by Fis, since it does not use IE (276).
DNA hemimethylation may regulate transcription of additional genes that contain GATC sites within their promoter regions. The list includes glnS, sulA, trpS, trpR, and tyrR of E. coli and cre of bacteriophage P1 (16, 172, 205, 246). Expression of these genes was increased in the absence of Dam, suggesting that GATC methylation may decrease binding of RNA polymerase. The possible physiologic significance of methylation of these sites is not known, but it could tie gene expression to the replication state of the cell, increasing transcription immediately after passage of the replication fork. In the case of the trpR gene, which encodes the repressor of the trp operon, an attractive speculation has been proposed by M. G. Marinus: because trpR is located between the origin of replication and the trp operon, a transient boost in trpR transcription might provide the increased concentration of repressor necessary to maintain repression when chromosome replication doubles trp operon dosage (171).
About 16 years ago, Blyn et al. discovered that one of two GATC sites within the regulatory region of the chromosomally encoded pyelonephritis-associated pilus (pap) operon of uropathogenic Escherichia coli (UPEC) was heritably nonmethylated, depending upon the pilus expression state of the cells (34). When DNA was isolated from cells expressing pyelonephritis-associated pili (Pap pili) (ON-phase cells), it was found that a GATC site proximal to the pap pilin promoter was methylated, whereas the promoter-distal GATC site was nonmethylated. This DNA methylation pattern characteristic of ON-phase cells differed from that of OFF-phase cells, which contained the converse pattern where the GATC site proximal to the pap pilin promoter was nonmethylated and the promoter-distal GATC site was methylated. The term “nonmethylated” is defined here as a state in which the GATC target of DNA adenine methylase is not methylated on either the top or bottom DNA strand, constituting a DNA methylation pattern analogous to those observed in mammalian cells (34). Since the term “unmethylated” might imply that an active demethylation has occurred, we prefer use of “nonmethylated” to describe DNA lacking a methyl group on both the top and bottom DNA strands. The phenomenon of demethylation, which occurs in eukaryotes to reset the DNA methylation pattern after zygote formation (88, 147), has not been reported to occur in prokaryotes. DNA methylation patterns are formed in bacteria by binding of a protein(s) at a DNA site(s) overlapping or near a GATC site(s), preventing methylation of that site(s) throughout the cell cycle (Fig. 2C). A direct role for DNA methylation patterns in the heritable control of gene expression in bacteria was first shown in the Pap system (41).
Further analysis of DNA methylation patterns in E. coli showed that multiple GATC sequences (ca. 36 sites) in the genome of E. coli K-12, which lacks pap DNA sequences, were stably nonmethylated (218, 272). These sites were identified by digestion of chromosomal DNA with MboI, which cuts at nonmethylated GATC sites. Since nonmethylated GATC sites are rare, the DNA fragments generated by MboI digestion are too large to be resolved by conventional agarose gel electrophoresis. Pulsed-field gel electrophoresis was used to resolve these fragments; however, the DNA sequences flanking the nonmethylated GATC sites were not determined. Ringquist and Smith (218) also showed for the first time that a number of Dcm target sites [CC(A/T)GG; the second cytosine is methylated at the C-5 position] were stably nonmethylated.
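The logic of the MboI mapping approach can be illustrated with a minimal sketch. It assumes an idealized digest of a linear sequence in which every GATC site is either fully methylated (not cut) or nonmethylated (cut immediately 5′ of the G); the function name `mboi_fragments` and the `methylated_sites` input are hypothetical and are not taken from the cited studies.

```python
def mboi_fragments(seq, methylated_sites=frozenset()):
    """Return fragment lengths from an idealized MboI digest of a linear sequence.

    seq: DNA string, top strand written 5'->3'.
    methylated_sites: 0-based start indices of GATC sites assumed to be
        Dam-methylated (hypothetical input); these sites are left uncut.
    """
    seq = seq.upper()
    cuts = []
    start = seq.find("GATC")
    while start != -1:
        if start not in methylated_sites:
            cuts.append(start)  # cut placed immediately 5' of the G
        start = seq.find("GATC", start + 1)
    bounds = [0] + cuts + [len(seq)]
    # Drop zero-length pieces (a site at the very start of the sequence).
    return [b - a for a, b in zip(bounds, bounds[1:]) if b > a]
```

With most sites methylated, only the rare protected sites are cut, so the fragments are few and large, which is consistent with the need for pulsed-field gel electrophoresis described above.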
Wang and Church analyzed Dam DNA methylation patterns to assess the binding of proteins to chromosomal DNA sites. Chromosomal DNA was digested with MboI and ClaI and cloned into pBluescript, which enabled the nonmethylated GATC sites to be sequenced (272). Since binding of proteins such as catabolite gene activator protein (CAP) is dependent upon environmental conditions via the secondary regulator cyclic AMP (cAMP), DNA methylation patterns within the regulatory regions of genes bound by cAMP-CAP and other regulatory factors were found to be environmentally controlled (218, 251). For example, a GATC sequence within the regulatory region of the car operon, controlling carbamoyl phosphate synthetase and involved in arginine and pyrimidine anabolism, was found to be protected from Dam methylation (272). This nonmethylated GATC site and others are listed in Table 1, with the chromosomal position (bp 29444 for the GATC near the carA gene) in E. coli MG1655 (a K-12 isolate) also shown. No protection of the car GATC site was detected in the absence of pyrimidines, consistent with the hypothesis that a pyrimidine repressor(s) binds to the car promoter region near or overlapping the GATC site, protecting it from methylation. Indeed, CarP and IHF were shown to bind in the regulatory region of carAB and protect GATC-207 (Table 1) from methylation (54).
Another nonmethylated GATC site identified was in the gut (also known as srl) operon, controlling uptake of the sugar alcohol glucitol (bp 2823768). A binding site for CAP was identified near the nonmethylated GATC site located at −44.5 (GATC-44.5) relative to the transcription start site (263), suggesting the possibility that binding of CAP to the gut promoter blocks methylation of the GATC-44.5 site (note that in Table 1 this GATC site is 86 bp upstream of the AUG start site for gutA and is thus labeled “−86”). Analysis of DNA methylation in E. coli containing a deletion of the crp gene, coding for CAP, showed that methylation protection of GATC-44.5 was reduced from 95% in crp+ cells to 50% in Δcrp cells. These data supported the hypothesis that CAP contributes to methylation protection of GATC-44.5 in vivo. However, further analysis of the gut operon showed that although cAMP-CAP binds to sites overlapping GATC-44.5, CAP does not protect this site from Dam methylation (263). Instead, the GutR repressor, which also binds at GATC-44.5, blocks methylation of this site both in vitro and in vivo. GutR-dependent protection of GATC-44.5 from methylation in vivo was not observed in the presence of glucitol, an activator of gut transcription, indicating that under these conditions GutR was no longer bound at GATC-44.5, allowing methylation of this site by Dam. However, methylation of GATC-44.5 did not affect binding of GutR to the gut regulatory region. These results led to the conclusion that although methylation protection indicates the presence of a DNA binding site in vivo, the absence of methylation protection of a GATC site does not prove the absence of binding of a protein at that site (263).
Wang and Church also identified nonmethylated GATC sites within the mtl (mannitol, bp 3769597), cdd (deoxycytidine deaminase, bp 2229798), flh (flagellar synthesis, bp 1976481), psp (stress response, bp 1366007), and fep (iron transport, bp 621523) operons (272). Using a similar approach in which nonmethylated GATC sites in the E. coli chromosome were cloned by digestion with MboI and AvaI, Hale et al. identified four nonmethylated GATC sites in the regulatory regions of the ppiA (bp 3490085), yhiP (bp 3638351), rspA (bp 1653241), and b1776 (bp 1859455) genes (99). Protection of the ppiA GATC site was dependent upon growth phase and carbon source. Protection of a GATC site near yhiP required leucine-responsive regulatory protein (Lrp) and was leucine responsive, similar to the case for some operons controlled by this global regulator (44, 68, 188, 189). The other GATC sites were protected under all the environmental conditions examined (99). A more comprehensive approach to identification of nonmethylated GATC sites was undertaken by Tavazoie and Church (251); this approach allowed 12 additional sites to be identified, all of which were located within 5′ noncoding regions of genes and open reading frames (Table 1).
Recent work by Blomfield's group on fim regulation controlling type 1 pili has identified two stably nonmethylated GATC sites at bp 4537512 and 4538525 in the E. coli chromosome near yjhA, separated from the fim locus by 1.4 kilobase pairs (80). These GATC sites are located near cis-active element regions 1 and 2, both of which play positive roles in transcription of the fimB recombinase gene, controlling type 1 pilus phase variation together with FimE (239). Binding of two regulatory proteins, the NanR sialic acid-responsive regulator and NagC, the N-acetylglucosamine-responsive regulatory protein, is required to activate fimB expression. Binding of NanR to region 1 blocks methylation of one adjacent GATC site, and binding of NagC to region 2 blocks methylation of the second GATC site. Only a fraction of the two GATC sites are nonmethylated after growth in glycerol minimal medium (239). Methylation protection of these GATC sites is not observed after addition of sialic acid (also known as N-acetyl-neuraminic acid). This likely occurs via inhibition of NanR binding by sialic acid and inhibition of NagC binding by N-acetylglucosamine-6-phosphate generated by sialic acid catabolism. Thus, binding of NanR and NagC controls methylation of two GATC sites adjacent to yjhA, likely by steric hindrance of Dam. However, mutation of the GATC site adjacent to region 1 did not affect fimB expression (239), indicating that methylation of this GATC site does not, in turn, modulate NanR binding. Moreover, in a dam mutant, expression of fimB is decreased, the opposite of what would be expected if GATC methylation inhibits NagC and NanR binding. These results indicate that the reported regulation of fim expression by Dam (199) does not occur via methylation of the GATC sites located near regions 1 and 2 adjacent to fim.
In summary, a small fraction of the approximately 20,000 GATC sites in the E. coli chromosome are totally or partially nonmethylated in any given growth state and environmental condition. The protection of GATC site methylation by Dam is dependent upon competition between Dam and specific DNA binding proteins. Dam appears to methylate most GATC sites in a highly processive manner, as discussed above. Recently, however, analysis of methylation of the regulatory GATC sites in the pap operon indicates that they are not methylated processively (32). That is, Dam binds to pap DNA, methylates one GATC site, and then dissociates before methylating the second site. This effectively reduces the ability of Dam to compete with proteins that bind to DNA sequences containing one or more GATC sites. Bergerat et al. first proposed that DNA sequences surrounding GATC sites may dictate the avidity of Dam for its target sites (23). Mutation of the AT-rich flanking sequences of the pap GATC sites to CG sequences increased processivity, which appeared to be due to changes in the kinetics of methyl transfer and not in binding affinity (203). Analysis of known nonmethylated GATC sites tentatively suggests a trend toward having AT-rich flanking sequences, though this is not always the case (Table 1).
Since DNA methylation patterns are formed as a result of binding of proteins primarily at gene regulatory regions, they are altered by growth conditions that affect regulatory protein level(s) and/or DNA binding properties. As discussed above, identification of nonmethylated GATC sites has been used as a sort of natural in vivo footprint system to track binding of regulatory proteins under different environmental conditions (251, 272). In addition, it is clear that a subset of nonmethylated GATC sites (for example within the pap, sfa, daa, agn43, and other operons [see below]) play important roles in epigenetic regulation. In these systems, not only is a DNA methylation pattern established by protection of specific GATC sites by a regulatory protein(s), but methylation of the GATC site(s), in turn, modulates regulatory protein binding (263). This results in two heritable states: either the regulatory protein is bound to a specific DNA sequence containing a GATC site(s), protecting it from methylation, or the regulatory protein is not bound due to a reduction of binding affinity for target sequence(s) caused by GATC methylation. Clearly, only a subset of all nonmethylated GATC sites have these particular properties and are involved in epigenetic control systems. For example, as shown in Table 1, DNA methylation patterns have been shown to directly control expression of agn43 (111, 271) but do not control the gut (srl) operon (263) and do not appear to directly regulate fim (239). Further study will be necessary to determine if any of the other genes containing nonmethylated GATC sites in their regulatory regions are under methylation pattern control (Table 1).
In the sections below we describe the current state of knowledge regarding how DNA methylation controls bacterial gene expression. Our focus for each methylation-controlled operon is on aspects of regulation affected by methylation and not on complete descriptions of regulatory networks.
Pyelonephritis-associated pili play an important role in attachment of UPEC to uroepithelial cells lining the upper urinary tract, facilitating colonization of the kidneys. Pap pilus expression switches on and off within individual cells in the bacterial population, a process known as phase variation. The biological role of Pap pilus phase variation is not known, but possibilities include (i) escape from immune detection; (ii) facilitation of a bind-release-bind series of events in which successive generations of bacteria ascend the urinary tract; and (iii) controlling growth of UPEC by modulating the effects of contact-dependent growth inhibition, a newly described bacterial phenomenon (3).
DNA adenine methylase controls Pap phase variation by methylation of two GATC sites, one proximal to the pap pilin promoter (GATCprox), located 53 bp from the papBA transcription start site, and the other located 102 bp upstream of GATCprox, designated GATCdist (Fig. 3A). Note that these two GATC sites are located within Lrp DNA binding site 2 and site 5, respectively. Methylation at these two pap GATC sites controls the binding of the global regulator Lrp (44, 189) and the coregulatory protein PapI (118, 138) to pap DNA sites 1, 2, and 3 proximal to the papBA pilin promoter and to sites 4, 5, and 6 distal to papBA. Lrp appears to bind cooperatively to sites 1, 2, and 3 or to sites 4, 5, and 6 (193). Binding to all six sites can be achieved in vitro by addition of sufficient Lrp but rarely occurs in vivo based on analysis of the methylation states of GATCprox and GATCdist (41). In ON-phase cells GATCdist is nonmethylated and GATCprox is methylated (41) (Fig. 3D). Protection of GATCdist from Dam methylation requires both Lrp and PapI based on the observation that GATCdist is fully methylated in either an lrp or a papI mutant (40, 41). In contrast, OFF-phase cells display the converse DNA methylation pattern in which GATCprox is nonmethylated and GATCdist is methylated (Fig. 3A). Protection of GATCprox requires Lrp but not PapI (41, 263). Based on these in vivo DNA methylation patterns together with in vitro studies of Lrp binding, it was concluded that in ON-phase cells PapI-Lrp binds to sites 4, 5, and 6, protecting GATCdist from Dam, and in OFF-phase cells Lrp binds to sites 1, 2, and 3, protecting GATCprox from Dam (41). These DNA methylation patterns result from competition between Dam and Lrp for binding at sites 1, 2, and 3 and at sites 4, 5, and 6, containing GATCprox and GATCdist, respectively, as discussed in detail below.
In Fig. 3A (lower section), pap regulatory DNA with the OFF-phase DNA methylation pattern is depicted: GATCdist is fully methylated, and GATCprox is fully nonmethylated as a result of binding of Lrp at pap sites 1, 2, and 3 overlapping GATCprox. Transcription from papBA is blocked by binding of Lrp at sites 1, 2, and 3 overlapping the promoter, likely as a result of steric hindrance of RNA polymerase binding (278). The OFF-phase state is stabilized by two main factors: mutual exclusion and DNA methylation. Binding of Lrp at sites 1, 2, and 3 reduces the affinity of Lrp for pap sites 4, 5, and 6 (overlapping GATCdist) by 10-fold via a phenomenon that has been denoted “mutual exclusion” (116). Mutual exclusion requires a supercoiled pap substrate by an unknown mechanism. One possibility is that Lrp could induce bending at sites 1, 2, and 3, propagating an alteration in twist to sites 4, 5, and 6. Methylation of GATCdist reduces the affinity of Lrp for sites 4, 5, and 6 by about 20-fold based on in vitro DNA binding measurements (118). In addition, there is an intrinsic twofold-higher affinity of Lrp for sites 1, 2, and 3 versus 4, 5, and 6. These factors contribute to stabilization of the OFF-phase Pap expression state (116).
The transition from the OFF to ON phase requires that GATCprox be methylated by Dam; either a dam mutant E. coli strain or a GCTCprox A-to-C transversion mutant that cannot be methylated by Dam but does not significantly alter the affinity of Lrp for sites 1, 2, and 3 is locked in the OFF phase (41). In contrast, methylation of GATCdist has an inhibitory effect on the OFF-to-ON switch: overexpression of Dam by just fourfold prevents the OFF-to-ON switch. Moreover, E. coli containing a GCTCdist mutation that blocks Dam methylation is locked in the ON phase, even under conditions of Dam overexpression (41). These data support the hypothesis that OFF-to-ON switching requires DNA replication to generate a hemimethylated GATCdist intermediate, which is bound by PapI-Lrp with a higher affinity than DNA with a fully methylated GATCdist (118). A low level of the coregulatory protein PapI, required for Pap pili expression (138, 193, 194), increases the affinity of Lrp for pap DNA hemimethylated at GATCdist but does not enhance binding of Lrp to pap DNA fully methylated at GATCdist (118). Notably, the hemimethylation state of pap matters: PapI increases Lrp's affinity for DNA methylated on the top strand at GATCdist about fourfold more than for DNA methylated on the bottom strand (118). These results raise the intriguing possibility that Pap phase switching may be biased: daughter cells receiving a DNA methylated on the top strand may have a higher probability of switching to the ON phase than cells receiving DNA methylated on the bottom strand.
PapI is a small (ca. 9-kDa) coregulatory protein expressed from the papI promoter divergent to the papBA pilin promoter (Fig. 3A, top). PapI increases the affinity of Lrp for pap site 5, and to a lesser extent site 2, but has no effect on binding of Lrp to any of the other four Lrp binding sites (118) (Fig. 3C). pap Lrp binding sites 5 and 2 share the sequence “ACGATC,” which differs from the other four pap Lrp binding sites and the ilvIH Lrp binding site 2 (65, 129, 138), which do not display PapI-dependent Lrp binding (118). All pap Lrp binding sites share the sequence “GNNNTTT” with the Lrp binding consensus determined by systematic evolution of ligands by exponential enrichment (64).
PapI does not appear to bind specifically to pap DNA by itself, based on gel shift analysis (138) and DNA cross-linking (118). DNA methylation interference indicated that methylation of bases in the sequence 5′-GNCGAT-3′ overlapping GATCdist in the top strand and 3′-TGCTAG-5′ in the bottom strand significantly reduced PapI-dependent Lrp binding compared with binding of Lrp alone. Methylation of the bottom-strand cytosine complementary to the guanine of “GATC” (meC9) blocked formation of the ternary PapI-Lrp-pap site 5 complex without affecting Lrp binding (118). These results support the hypothesis that enhancement of Lrp binding to site 5 occurs via formation of a PapI-dependent ternary complex with Lrp and pap DNA. Cross-linking with a photoactivatable 9-Å azidophenacyl cross-linker three bases from the presumptive PapI binding sequence “ACGATC” showed that PapI and Lrp were both cross-linked to pap DNA in the ternary complex with nonmethylated DNA, while only Lrp was cross-linked with DNA methylated at C9 (118). These results indicate that PapI is located near the pap ACGATC sequence in the PapI-Lrp-pap site 5 ternary complex and may directly contact this sequence.
The observation that PapI (100 nM) increases Lrp's affinity for pap site 2 (which contains the ACGATC PapI-specific sequence identical to site 5) (118) presents an apparent paradox, since this should block pap transcription due to its close proximity to the papBA pilin promoter (278). Further analysis showed that at low PapI levels significant enhancement of Lrp binding occurred at sites 4, 5, and 6 (CGATCdist) but not at sites 1, 2, and 3 (CGATCprox) (118). At 5 nM PapI, the affinity of Lrp was fourfold higher for pap sites 4, 5, and 6 (Kd = 0.25 nM) than for sites 1, 2, and 3 (Kd = 1.0 nM). Conversely, in the absence of PapI, the affinity of Lrp for sites 1, 2, and 3 (Kd = 1.2 nM) was about twofold higher than that for sites 4, 5, and 6 (Kd = 2.5 nM). Thus, binding of Lrp at sites 4, 5, and 6 should be favored at low PapI levels, resulting in activation of papBA transcription. This, in turn, would increase the PapI level via a PapB-mediated positive feedback loop whereby PapB binds upstream of the papI promoter and helps activate PapI expression (11, 85, 288) (Fig. 3B). High PapI levels could potentially shut off pap transcription by increasing the binding of PapI-Lrp complexes at promoter-proximal sites 1, 2, and 3. However, this is prevented by methylation of GATCprox by Dam, which specifically blocks PapI-dependent Lrp binding without affecting binding of Lrp alone (118).
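The shift in site preference implied by these dissociation constants can be made concrete with a minimal sketch. It uses the Kd values quoted above and assumes a simple one-site binding isotherm, occupancy = [Lrp]/([Lrp] + Kd). This is a deliberate simplification, since Lrp binds cooperatively to each set of three sites, and the names `occupancy` and `distal_preference` are hypothetical.

```python
# Apparent Kd values (nM) for Lrp, as quoted in the text.
KD_NM = {
    ("no PapI", "sites 1-3"): 1.2,
    ("no PapI", "sites 4-6"): 2.5,
    ("5 nM PapI", "sites 1-3"): 1.0,
    ("5 nM PapI", "sites 4-6"): 0.25,
}


def occupancy(lrp_nM, kd_nM):
    """Fractional occupancy under an assumed noncooperative one-site isotherm."""
    return lrp_nM / (lrp_nM + kd_nM)


def distal_preference(condition):
    """Kd(proximal) / Kd(distal); values above 1 mean the distal sites are favored."""
    return KD_NM[(condition, "sites 1-3")] / KD_NM[(condition, "sites 4-6")]
```

Under these assumptions, `distal_preference("5 nM PapI")` is 4.0 and `distal_preference("no PapI")` is 0.48, reproducing the fourfold distal preference with PapI and the roughly twofold proximal preference without it.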
To determine if the essential role of methylation of GATCprox in the OFF- to ON-phase transition is to specifically block PapI-dependent Lrp binding to sites 1, 2, and 3, the wild-type CGATCprox sequence was mutated to TGATCprox to specifically inhibit PapI-dependent Lrp binding. It was reasoned that under conditions in which PapI-dependent binding of Lrp to sites 1, 2, and 3 was blocked, switching from OFF to ON phase should occur in the absence of Dam. Analysis of the TGATCprox mutant showed that PapI-dependent Lrp binding to sites 1, 2, and 3 was inhibited but binding of Lrp was unaffected both in vitro and in vivo. Switch frequency analysis of E. coli containing the TGATCprox mutation showed that the OFF-to-ON rate (5.6 × 10⁻⁴/cell/generation) was about sevenfold higher than that of wild-type cells (8.2 × 10⁻⁵/cell/generation). Notably, in a dam null mutant background cells were locked in the ON-phase state, showing that methylation is not required for pap transcription under conditions in which PapI-dependent binding of Lrp to pap site 2 containing GATCprox is blocked. These results support the conclusion that methylation at GATCprox is required for the OFF- to ON-phase transition by specifically inhibiting PapI-dependent Lrp binding to sites 1, 2, and 3 (Fig. 3C, top).
Binding of Lrp at sites 4, 5, and 6, together with binding of cAMP-CAP at −215.5 (relative to the papBA transcription start site) (277), enhances papBA transcription via contact between CAP activating region 1 and the αC-terminal domain of RNA polymerase (277). In this way, Pap pilus expression is environmentally controlled by carbon source via the cAMP level. The role of Lrp may be structural, bending pap DNA between the CAP binding site at −215.5 and the papBA promoter to facilitate contact between cAMP-CAP and the αC-terminal domain. This results in transcription initiation from papBA and expression of PapB, which has been reported to bind with highest affinity to a site between the papI promoter and the CAP binding site (85), stimulating papI transcription, which constitutes a positive feedback loop (Fig. 3D). The high PapI level ensures binding of PapI-Lrp to sites 4, 5, and 6, and methylation of GATCprox prevents binding of PapI-Lrp to sites 1, 2, and 3, which would shut off papBA transcription and turn the switch OFF (278). The fact that both PapI and PapB are required for switching from the OFF to ON phase raises a chicken-and-egg problem that has not been adequately addressed: which regulatory factor initiates the switch? We speculate that regulation is at the level of PapB expression and that a low level of papBA mRNA is made following DNA replication and Lrp/H-NS dissociation from sites 1, 2, and 3 (266). If this papBA mRNA is rapidly translated, it would induce papI transcription, initiating the OFF-to-ON switch cascade. There is indirect evidence to support the idea that there may be translational control involved in Pap pilus expression, since a rimJ mutation affects pap gene regulation (280-282). RimJ acetylates ribosomal protein S5 in the 30S subunit. Thus, it is possible that ultimately the initiation of the Pap OFF-to-ON switch may be dependent upon the translation of a basal level of papBA mRNA present immediately following DNA replication.
The global regulatory protein H-NS is not required for Pap phase variation (266), but it does modulate Pap gene expression and Pap switch rates. H-NS represses papBA transcription in response to low temperature (94), high osmolarity (283), and rich medium (283). This may occur by specific binding of H-NS to the pap regulatory region, as evidenced by blocking of methylation of both pap regulatory GATC sites in vitro and in vivo (279). Binding of H-NS near the papBA promoter could inhibit binding of RNA polymerase, repressing transcription. Notably, at 37°C H-NS appears to positively affect Pap phase variation, since the OFF-to-ON switch rate is reduced in an hns mutant (266, 283). This positive effect of H-NS on the OFF- to ON-phase transition could occur via competition with Lrp at sites 1, 2, and 3, which would help to move PapI-Lrp to sites 4, 5, and 6, analogous to the role of methylation of GATCprox (Fig. 3C).
Another environmental input into Pap phase variation is mediated by the CpxAR response regulatory system (117, 127). Under certain conditions that stress the cell envelope, including high pH, CpxA located in the inner membrane autophosphorylates and then transfers a phosphate group to CpxR to yield CpxR-phosphate (CpxR-P) (176, 211). CpxR-P binds to sites overlapping all six pap Lrp binding sites, competes with Lrp for binding to these sites, and shuts off papBA transcription and Pap pilus expression (115, 117). Notably, CpxR-P binding to pap sites 1 to 6 is not inhibited by DNA methylation, in contrast to Lrp, even though CpxR-P, like Lrp, binds at sites overlapping the pap GATCprox and GATCdist sites. The biological role of CpxAR regulation of Pap pilus expression is not fully clear. One possibility is that under conditions of envelope stress it makes sense to curtail pilus expression to prevent further damage to the membrane. Another provocative possibility is that under conditions of stress UPEC cells stop making Pap pili, making them susceptible to contact-dependent growth inhibition (3). The physiologic significance of this is unknown, but it might contribute to survival under harsh conditions by slowing bacterial metabolism and growth (3).
The Pap ON- to OFF-phase transition occurs at about a 100-fold-higher rate than the OFF- to ON-phase transition (35, 266). Notably, factors including H-NS, carbon source, and osmolarity do not affect the ON- to OFF-phase transition rate (35, 266, 283); therefore it appears that the ON- to OFF-phase transition is relatively constant under different environmental conditions. The ON- to OFF-phase transition has not been thoroughly examined, but based on knowledge of the OFF-to-ON switch mechanism (116-118) (see above), the following model is postulated. Starting with a cell in the ON-phase state (Fig. 4A), DNA replication is postulated to dissociate PapI-Lrp from sites 4, 5, and 6, enabling Dam to compete with Lrp for binding at GATCdist (Fig. 4C). Methylation of GATCdist is essential for the OFF-phase state (41). DNA replication also generates two hemimethylated GATCprox sites, one methylated on the top strand and one on the bottom strand (Fig. 4B). Whether a cell remains in the ON phase or transitions to the OFF state may be dictated by competition of Lrp for binding to pap promoter-proximal sites 1, 2, and 3 versus distal sites 4, 5, and 6 (Fig. 4B). Lrp has about a twofold-higher affinity for the proximal sites than for distal sites, and methylation of GATCprox does not affect Lrp binding to these proximal sites (118). In contrast, methylation of GATCdist inhibits binding of Lrp and PapI-Lrp to the distal sites (118, 194). These two factors should favor binding of Lrp to the proximal sites over the distal sites, which may account in part for the high ON-to-OFF rate observed. Following one additional round of DNA replication, the OFF-phase state is attained (Fig. 4D).
Clearly, the Pap epigenetic switch mechanism is complex, involving distinct DNA methylation and protein-DNA binding states. Therefore, it would be highly useful to have a mathematical model that could predict switch rates under a variety of conditions and identify the key regulatory step(s) determining switch outcome. Liao and coworkers have developed a model for Pap phase variation that takes into account many of the protein-protein and protein-DNA interactions of Lrp, PapI, and Dam described above (131, 297). To rigorously test a model, one would need to alter cellular levels of PapI, Lrp, and Dam and experimentally determine switch rates. In addition, a useful model should be able to predict switch outcomes when the affinities of PapI, Lrp, and Dam for pap DNA have been altered, for example. Although these types of analyses have not yet been carried out, preliminary data suggest that the Markov chain model for Pap may be useful in understanding Pap switch dynamics. However, the model underestimated the frequency of ON-state cells in the population, for example (297). Reliable numbers for biochemical parameters of the Pap switch, such as association and dissociation binding constants for PapI-Lrp, Lrp, and Dam at sites 1, 2, and 3 and at sites 4, 5, and 6, have not yet been obtained. This makes it difficult to determine if the Pap model does not accurately reflect experimental data due to incorrect biochemical parameters used in the model or because assumptions in the model are incorrect or incomplete. Recently, another Pap switch model was developed by Munsky and Khammash (183, 184). Further work as outlined above will be necessary to test these models and determine if they are useful in furthering our understanding of the Pap switch and other epigenetic switch systems (see below).
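To show how per-generation switch rates translate into population composition, the following is a minimal two-state Markov sketch. It is not the model of Liao and coworkers or that of Munsky and Khammash; it assumes constant switch rates per cell per generation and equal growth rates for ON- and OFF-phase cells. The example values take the wild-type OFF-to-ON rate quoted in the text (8.2 × 10⁻⁵/cell/generation) and an assumed ON-to-OFF rate 100-fold higher; all function names are hypothetical.

```python
def step(p_on, k_off_to_on, k_on_to_off):
    """Advance the ON-phase fraction of the population by one generation."""
    return p_on * (1.0 - k_on_to_off) + (1.0 - p_on) * k_off_to_on


def simulate(p_on0, k_off_to_on, k_on_to_off, generations):
    """Iterate the two-state chain for a given number of generations."""
    p_on = p_on0
    for _ in range(generations):
        p_on = step(p_on, k_off_to_on, k_on_to_off)
    return p_on


def steady_state_on(k_off_to_on, k_on_to_off):
    """Stationary ON fraction of the two-state chain."""
    return k_off_to_on / (k_off_to_on + k_on_to_off)


K_OFF_TO_ON = 8.2e-5              # wild-type rate quoted in the text
K_ON_TO_OFF = 100 * K_OFF_TO_ON   # assumption: exactly 100-fold higher
```

With a 100-fold rate ratio the stationary ON fraction is 1/101, or about 1%, regardless of the starting state; comparing such a prediction with measured ON-state frequencies is one simple way to test whether a model's assumptions hold.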
Analysis of pilus operons containing regulatory regions with homology to pap indicates that there are two groups: those that are positively regulated by PapI homologues, similar to the pap system, and those negatively regulated by PapI homologues.
The regulatory regions of many pilus operons in E. coli, including Pap-related fimbriae (Prf), foo (F165₁ pili), clp (CS31A pili), sfa (S pili), daa (F1845), fae (K88), and afa (afimbrial adhesin), share two GATC sites analogous to GATCprox and GATCdist and spaced 102 base pairs apart as in pap (151) (Fig. 5). Moreover, these GATC sites are present within additional conserved sequences, “CGATCdistTTTT” and “CGATCproxTT,” with the entire sequence called a “GATC box” (note the inverse orientations of the GATC boxes in the pilus regulatory sequences shown in Fig. 5). Since the GATC box sequence contains binding sites for Lrp and Dam, as well as a portion of the PapI response element “ACGATC,” this provides the means by which these various pilus operons are controlled by DNA methylation patterns.
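A search for pap-like GATC site pairs in a candidate regulatory sequence can be sketched as follows. The sketch assumes that the 102-bp spacing is measured between the first bases of the two GATC sites on the same strand, which is an interpretation rather than a definition given in the text, and it ignores the flanking GATC box sequences. The function name `gatc_pairs` is hypothetical.

```python
def gatc_pairs(seq, spacing=102):
    """Return (upstream, downstream) 0-based start indices of GATC pairs.

    A pair is reported when two GATC sites start exactly `spacing` bp apart
    (assumed convention: distance between the first bases of each site).
    """
    seq = seq.upper()
    starts = []
    i = seq.find("GATC")
    while i != -1:
        starts.append(i)
        i = seq.find("GATC", i + 1)
    start_set = set(starts)
    return [(s, s + spacing) for s in starts if s + spacing in start_set]
```

A hit from such a scan only flags a candidate; whether the sites are differentially methylated and bound by Lrp has to be established experimentally, as described for pap.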
The sfa, daa, prf (pap-related fimbria), and afa-3 operons appear to be regulated by DNA methylation patterns, analogous to regulation of pap. Each of these pilus operons codes for a PapI and a PapB homologue, and cross-complementation between the PapB and PapI homologues between prf and sfa (182) and between pap and sfa and daa (267) was shown. The DaaF and SfaC proteins function similarly to PapI, positively regulating expression of daa and sfa, respectively, by facilitating binding of Lrp to promoter-distal binding sites overlapping GATCdist (267). Methylation of the pap-related GATC sites, in turn, controls binding of Lrp.
Two methylation-controlled pilus operons in E. coli, clp (CS31A) and fae (K88), and one pilus operon in Salmonella enterica serovar Typhimurium, pef, share common regulatory features with pap but have distinct differences as well. The regulatory regions of clp, fae, and pef contain conserved GATC box sites and spacing identical to that in pap (Fig. 5). Also similar to pap, binding of Lrp to regulatory DNA is controlled by DNA methylation and a PapI homologue. However, all three methylation-controlled operons are carried on plasmids, and in each case PapI homologues negatively control phase variation and transcription.
K88 pili, expressed by enterotoxigenic E. coli infecting pigs, are not under phase variation control, in contrast to the case for all other Pap family members (124). The fae regulatory region shares GATC box sequences with pap, spaced 102 bp apart, as well as a PapI homologue, FaeA, and a PapB homologue, FaeB (124). A third regulatory GATC site (GATC-III) is present 28 bp downstream (toward the faeB promoter) of GATCprox, and two IS1 sequences are present between faeB and faeA (Fig. 5). In contrast to the case for pap, FaeA and Lrp act to negatively control fae transcription. Data from Huisman et al. indicated that in the absence of FaeA, Lrp binds at sites overlapping GATCprox, protecting it from methylation by Dam (124, 125). However, in contrast to the case for pap, this Lrp binding has little effect on pilin transcription. In the presence of FaeA, the PapI homologue, additional binding of Lrp near GATC-III occurs, blocking methylation of both GATCprox and GATC-III and reducing fae transcription. This GATC-III site shares the “CGATCTTTTA” sequence of the pap and fae GATCdist sites, though in opposite orientation, possibly accounting for FaeA-mediated binding of Lrp to this region. However, FaeA-mediated binding of Lrp to GATCdist was not observed. In fact, mutation of the GATCdist site to a GTTC sequence was lethal due to overproduction of K88 pili, indicating that methylation of GATCdist normally blocks binding of FaeA-Lrp. Whether FaeA-Lrp binds to GATCdist under normal physiologic conditions is not clear, but it is possible that binding to a hemimethylated GATCdist site might occur immediately following DNA replication, stimulating K88 expression under certain conditions. Another difference between regulation of fae and pap is in control of faeA and of papI transcription. In the case of pap, papI is regulated by PapB via a positive feedback mechanism (116), whereas in fae, an IS1 insertion apparently disrupts this positive feedback. Instead, FaeA may bind to its own promoter, acting as a positive autoregulator (125).
Regulation of the clp operon, coding for CS31A pili, which are expressed by enterotoxigenic E. coli, shares common regulatory features with pap but, like for fae and pef, has distinct differences as well. In E. coli isolate CS31A harboring clp, CS31A pili are under phase variation control, yet the plasmid-carried clp operon does not have a papI homologue associated with it (62, 173). It seems likely that a pap operon identified on the chromosome of E. coli CS31A supplies PapI in trans, but this has not been confirmed. Analysis of clp regulation in E. coli K-12 (no papI homologue present) showed that Lrp and the PapB homologue ClpB repressed clp transcription. However, even in the presence of Lrp and ClpB, a moderate level of clp pilin transcription was observed. In addition, in lrp+ clpB+ cells lacking Dam, transcription was almost maximally derepressed. Introduction of the PapI homologue AfaF resulted in phase variation of CS31A expression: instead of a normally distributed transcription of CS31A among the cell population, individual cells either transcribed (ON phase) or did not transcribe (OFF phase) the clp operon, with the methylation pattern of the former cells being GATCdist nonmethylated and GATCprox methylated and with the converse pattern for the latter cells. These results can be explained if Lrp and ClpB bind near the clp pilin promoter, moderately repressing transcription but still allowing some pilus expression to occur in the absence of the PapI homologue AfaF. The repressive effect of Dam on clp transcription could occur via methylation of GATCdist to block binding of Lrp to promoter-distal sites. Addition of AfaF should increase the affinity of Lrp for both GATCdist and GATCprox, similar to the case for pap. 
However, it may be that the affinity of AfaF-Lrp is marginally higher for GATCprox than GATCdist, the reverse of the case for pap, which could explain why only a small fraction of cells are in the ON phase in the presence of constitutively expressed AfaF. This could also explain why the transcription of AfaF+ OFF-phase cells appears to be lower than that of cells lacking AfaF (which do not show phase variation), since AfaF would increase Lrp's affinity for clp pilin promoter-proximal sites and more efficiently block transcription than Lrp alone.
The clp operon and the closely related foo operon, coding for F165(1) pili (24, 63, 101), have the distinction of being the only members of the Pap regulatory family controlled by the aliphatic amino acids leucine and alanine. Alanine and, to a lesser extent, leucine reduce the expression of CS31A pili (62, 173). This appears to occur as a result of diminished PapI homologue-dependent binding of Lrp to GATCdist and increased binding of Lrp to GATCprox, locking cells in the OFF-phase transcription state. Lrp has a binding site for aliphatic amino acids, occupancy of which appears to modulate the multimeric state of Lrp between dimeric, octameric, and hexadecameric states (57, 59). If Lrp binding sites are phased such that they occur on the same DNA face, then an octameric Lrp could engage up to four sites, contributing to binding cooperativity. It is unclear why transcription of certain operons, including clp, is modulated by alanine and leucine whereas that of other operons, such as pap, is not. However, recent results with the ilvIH operon, which is repressed by leucine, indicate that leucine inhibits long-range interactions between Lrp proteins bound to different sites in the ilvIH regulatory region (58).
The pef operon in Salmonella enterica serovar Typhimurium codes for plasmid-encoded fimbriae (Pef fimbriae) that appear to play a role in intestinal colonization (17, 126). Pef fimbriae are encoded on the pSLT virulence plasmid (87). Pef pili are expressed in vivo in bovine ligated ileal loops (126) but in the laboratory are expressed only in acidic (pH 5.1) rich broth in standing culture (190). Under these conditions, Pef pili are expressed under phase variation control. The pefI gene is located about 6 kb away from the pef regulatory region, and PefI acts negatively on Pef phase variation, blocking Pef pilus expression when expressed on a multicopy plasmid (190). This appeared to occur via increased affinity of Salmonella Lrp, which is almost identical to E. coli Lrp (one amino acid difference), for DNA sites overlapping GATCprox (previously denoted GATC II). Binding of Lrp at GATCdist appeared to correlate well with the ON-phase state, similar to the case for pap. Thus, a common theme for pef, clp, and fae is that in each case PapI homologues act negatively by increasing the binding of Lrp to pilin promoter-proximal sites, protecting GATCprox from methylation, and inhibiting transcription. The reason why PapI-Lrp binds with the highest affinity to sites around GATCdist in pap and closely related operons (see above) and to sites around GATCprox in pef, clp, and fae is not known. Analysis by Hernday et al. showed that the affinity of PapI-Lrp for Lrp binding site 5 containing GATCdist was significantly higher than its affinity for site 2 containing GATCprox (118). Analysis of site 2 and 5 regions in pap versus pef, clp, and fae does not provide any simple explanation for the mechanism by which PapI-Lrp affinity is reversed in these operons (Fig. 5). However, this regulatory distinction may explain why the papI homologues in pef, clp, and fae have been disconnected from the positive feedback loop operating in other pap-related operons. If they were connected, one would expect that the consequence would be to turn off pilus expression entirely. Since pef and clp expression is under phase variation control, this shows that a positive feedback loop is not essential for phase variation. In fact, Pap phase variation occurs in papI-minus mutants containing PapI expressed constitutively on a plasmid, showing that the feedback loop is not an essential feature of phase variation, although it likely contributes to signal-to-noise parameters. Although it is not clear why pef, clp, and fae display this regulatory difference from pap, it provides an additional means by which Pef, CS31A, and K88 pilus expression can be controlled by environmental and host factors via regulation of pefI, the resident pap operon(s), and faeA, respectively.
Besides the pap regulatory family of operons described above, the only other characterized phase variation system regulated by DNA methylation patterns is a gene originally designated by B. Diderichsen as flu for “fluffing,” based on the propensity of bacteria to “aggregate, fluff, and sediment” (71). Henderson et al. and Owen et al. (111, 200) later identified and characterized an autotransporter protein denoted antigen 43 (Ag43), which was shown to be identical to the flu product, and the gene was renamed agn43. The regulatory region of agn43 has a consensus binding site for the OxyR repressor (296), which is present in a number of genes regulated by oxidative stress, including mom in phage Mu (see below). In addition, three closely spaced GATC sites (GATC-I, GATC-II, and GATC-III) are present in the regulatory region within the OxyR binding sites (Fig. 6A). Transcription of agn43 begins at the “G” of the promoter-distal GATC-I site (269, 271) (Fig. 6A). Binding of OxyR to the agn43 regulatory region represses agn43 transcription in vivo based on the phase-locked ON phenotype of oxyR mutants (112). Based on these observations, it was proposed that the Ag43 phase switch is controlled by competition between OxyR and Dam for binding and methylation within the agn43 regulatory region. Methylation of any two of the three agn43 regulatory GATC sites was sufficient to inhibit binding of OxyR in vitro and allow phase variation to occur in vivo (269), although all three sites appear to be required for attaining normal phase variation rates (271). Binding of OxyR protected all GATC sites from methylation (60) and repressed agn43 transcription in vitro (271). This regulatory arrangement is similar in basic form to that of the pap regulatory family: in both systems binding of a global regulator to upstream regulatory sequences blocks methylation of GATC sites within the region and directly affects transcription.
Methylation of these sites, in turn, inhibits regulatory protein binding. For Pap, the switch between the OFF and ON phases is facilitated by the coregulator PapI, which controls binding of Lrp between two GATC site regions by altering its affinity for pap DNA. For OxyR, which binds to one DNA region of about 60 bp encompassing all three GATC sites, it is not clear whether environmental inputs control phase switching as they do for pap. One possibility that has been considered is that the oxidative state of OxyR might be important in Ag43 regulation. This hypothesis is attractive since it would tie the oxidative stress response to biofilm formation, which is aided under certain conditions by Ag43 (66).
OxyR exists in two redox states within cells, formed by disulfide bonding between cysteines 199 and 208. Disulfide bond reduction occurs enzymatically, primarily by glutaredoxin 1 (295). Data from Schembri and Klemm showed that expression of type 1 pili (fim) and P pili (pap) blocked Ag43 expression (225), which was proposed to occur via disulfide bridge formation in these pili, possibly driving OxyR toward the reduced state, repressing Ag43 expression. If this is correct, then transcription of other genes in the OxyR regulon, such as katG, should be affected, but this was not tested. Further analysis of the possible role of the redox state of OxyR in Ag43 regulation was done using OxyR(A233V) and OxyR(H198R) mutants, which are locked in the oxidative form and constitutively activate genes in the OxyR regulon (152). Neither mutant was found to repress Ag43 expression (112, 224), and it was concluded that only the reduced form of OxyR represses agn43 expression. However, Wallecha et al. showed that the affinity of OxyR(A233V) for nonmethylated agn43 regulatory DNA was at least fivefold lower than that of oxidized, wild-type OxyR and that the affinity of OxyR(H198R) was also lower than that of wild-type OxyR (270). Thus, the assumption that these mutants accurately reflect the role of oxidized wild-type OxyR does not appear to be valid. In vitro analysis showed that oxidized wild-type OxyR binds to agn43 DNA and represses agn43 transcription (270). Therefore, it appears that the redox state of OxyR does not control phase variation of Ag43 (270).
The mechanism(s) by which agn43 expression switches between the OFF and ON states is not known, though it likely requires DNA replication to generate a hemimethylated DNA intermediate (60). OxyR affinity for fully methylated agn43 regulatory DNA is too low to be measured by electrophoretic mobility shift, but the Kd of binding to nonmethylated agn43 is about 2 nM. Binding of OxyR to hemimethylated agn43 methylated on the top or bottom strand is similar, with at least a sixfold reduction in affinity compared to nonmethylated DNA (60). The intermediate affinity of OxyR for hemimethylated agn43 provides a switch transition mechanism: immediately following DNA replication OxyR presumably dissociates from agn43 DNA in OFF-phase cells, giving a window of opportunity for Dam to compete with OxyR due to the decreased affinity of OxyR for hemimethylated DNA (Fig. 6B). Full methylation of the agn43 GATC sites could occur in one step, preventing OxyR binding and repression, forming the ON-phase state. Similarly, hemimethylated DNA could facilitate the ON- to OFF-phase transition by providing an opportunity for OxyR to bind to hemimethylated agn43 GATC sites, blocking their methylation by Dam (Fig. 6C). After an additional round of replication, the OFF-phase DNA methylation pattern would be formed in half of the transitioning cells (Fig. 6D).
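The consequence of the affinity difference can be made concrete with a simple binding isotherm. The sketch below computes the equilibrium fraction of agn43 regulatory DNA occupied by OxyR using the Kd of about 2 nM for nonmethylated DNA and a Kd at least sixfold higher for hemimethylated DNA, as stated above. It assumes noncooperative single-site binding with OxyR in excess over DNA, and the free OxyR concentration of 10 nM is a hypothetical value chosen only for illustration; competition with Dam is not modeled.

```python
# Sketch: equilibrium occupancy of agn43 regulatory DNA by OxyR.
# Assumptions: simple 1:1 binding isotherm, protein in excess over DNA,
# and a hypothetical free OxyR concentration (not a measured value).

KD_NONMETHYLATED_NM = 2.0        # approximate Kd quoted in the text
MIN_FOLD_REDUCTION_HEMI = 6.0    # "at least sixfold" lower affinity in the text
KD_HEMIMETHYLATED_MIN_NM = KD_NONMETHYLATED_NM * MIN_FOLD_REDUCTION_HEMI


def fraction_bound(free_protein_nm: float, kd_nm: float) -> float:
    """Fraction of DNA sites occupied at equilibrium: [P] / ([P] + Kd)."""
    return free_protein_nm / (free_protein_nm + kd_nm)


if __name__ == "__main__":
    FREE_OXYR_NM = 10.0  # hypothetical concentration for illustration
    print("nonmethylated:", fraction_bound(FREE_OXYR_NM, KD_NONMETHYLATED_NM))
    # Because the hemimethylated Kd is a lower bound, this occupancy is an
    # upper bound on OxyR binding to hemimethylated DNA.
    print("hemimethylated (upper bound):",
          fraction_bound(FREE_OXYR_NM, KD_HEMIMETHYLATED_MIN_NM))
```

Under these assumptions, occupancy drops from roughly five-sixths on nonmethylated DNA to less than half on hemimethylated DNA, which is the kind of window in which Dam could compete for the GATC sites.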
It is not clear if environmental or cellular factors directly regulate Ag43 switching, but it is possible that SeqA may play a role. SeqA binds to agn43 regulatory DNA containing hemimethylated GATC sites but does not bind to fully methylated or nonmethylated DNAs (60). The OFF- to ON-phase rate was reduced in a seqA mutant, but much of this effect could be accounted for by a reduction in the Dam/DNA ratio caused by increased asynchronous initiation of DNA replication that occurs in the absence of SeqA, which normally sequesters oriC and plays a critical role in timing of DNA replication (36). Under these conditions the balance is tipped toward repression, since OxyR more effectively competes with Dam.
In enteric bacteria, very-short-patch (VSP) repair recognizes G-T mismatches and corrects them to G-C (25). VSP repair activity is partially redundant with Dam-directed mismatch repair, and the mechanisms that coordinate the use of either system are not fully understood (25). MutL and MutS are required for VSP repair, while MutH is not involved. Dam methylation is dispensable for VSP repair: mismatched duplexes containing GATC sites are repaired with similar efficiencies in methylated and nonmethylated DNA substrates. However, Dam− mutants of E. coli are defective in both Dam-directed mismatch repair and VSP repair (19), and their VSP repair defect appears to be caused by lack of Dam methylase. Synthesis of Vsr, the endonuclease that initiates VSP repair, is reduced in Dam− mutants, suggesting that Dam methylation regulates Vsr synthesis (19). The vsr gene is cotranscribed with dcm, the gene for Dcm methylase; however, synthesis of Dcm remains unaffected in a Dam− background (19). The absence of GATC sites in the dcm promoter (67) provides further evidence that Dam-mediated control of the Vsr level is not transcriptional. Because DNA modification cannot be expected to act directly at the posttranscriptional level, we are left with two alternative explanations: (i) the Dam methylase might have additional, hitherto unknown functions unrelated to DNA modification, or (ii) more likely, Dam methylation may regulate Vsr synthesis in an indirect fashion, by controlling transcription of one or more cell functions involved in posttranscriptional control. The case of vsr is unlikely to be unique, since evidence for posttranscriptional regulation by Dam methylation has been also found in the std fimbrial operon of Salmonella enterica (130). These examples raise the possibility that Dam methylation might regulate cell functions involved in RNA stability, mRNA translation, or protein turnover. However, the underlying molecular mechanisms remain to be identified.
In the genomes of certain virulent phages of enteric bacteria, GATC sites are relatively scarce. Total E. coli DNA contains GATC sites at a frequency of one GATC site per 232 bp, which approaches the predicted random frequency of one GATC site per 256 bp (110). In contrast, bacteriophage T7 contains 6 GATC sites, while the predicted number is 141 (174). In the genomes of temperate phages such as E. coli lambda and Salmonella P22, the frequency of GATC sites is also lower than expected from their nucleotide composition, but the differences are not as pronounced as in the case of T7 (110, 174). Other phage genomes contain GATC sites at frequencies similar to that found in the host genome (31). It has been proposed that scarcity of GATC sites in the genomes of virulent phages may protect against DNA digestion by the host MutH endonuclease (70). Note that Dam-directed mismatch repair requires partial degradation of the daughter strand and resynthesis by host DNA polymerase III and DNA ligase, a laborious process that may not be feasible during the late stages of phage growth. On the other hand, T-even, P1, and other phages carry their own dam genes, which may ensure methylation of GATC sites during the lytic cycle (31). Aside from conferring protection from accidental MutHLS cleavage of concatemeric DNA, T4 Dam may also protect T4 phage DNA from restriction by competing P1 phage (177).
Packaging of phage P1 DNA into capsids proceeds by a processive headful mechanism that uses concatemeric phage DNA molecules produced by rolling-circle replication during the late stages of phage infection (291). Packaging is initiated at the pac site, a 162-bp DNA sequence that contains seven GATC sites, a density 10-fold above random. The methylation state of these GATC sites affects packaging of P1 DNA into capsids, because the P1 packaging enzyme can cut pac only if most of its GATC sites are methylated in both DNA strands (245). The importance of Dam methylation in the regulation of P1 packaging is illustrated by the observation that growth of a P1 Dam− mutant on a Dam− E. coli strain causes a 20-fold reduction in phage progeny compared to infections carried out in the presence of either phage or host Dam methylase (245). Furthermore, the few phage produced in the absence of Dam methylation carry genomes which lack pac sequences at their ends (245).
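The GATC densities quoted for the host genome and for pac can be checked with a short calculation. The sketch below assumes equal base frequencies, so that a given 4-bp motif is expected once per 4^4 = 256 positions, and compares that expectation with the seven sites reported in the 162-bp pac sequence. The function names are hypothetical, and a composition-corrected prediction (such as the figure of 141 sites cited for T7) would need the actual base frequencies, which are not used here.

```python
import re

# Sketch: observed versus random-expectation GATC density.
# Assumption: all four bases equally frequent (probability 1/4 each).


def count_sites(seq: str, motif: str = "GATC") -> int:
    """Count motif occurrences; GATC cannot overlap itself, so a
    non-overlapping scan finds every site."""
    return len(re.findall(motif, seq.upper()))


def expected_random_sites(length_bp: int, motif_len: int = 4) -> float:
    """Expected number of occurrences of one specific motif in random DNA."""
    return (length_bp - motif_len + 1) / 4 ** motif_len


def enrichment(observed_sites: int, length_bp: int) -> float:
    """Observed site count divided by the random expectation."""
    return observed_sites / expected_random_sites(length_bp)


if __name__ == "__main__":
    # pac: seven GATC sites in 162 bp, as stated in the text.
    print("pac enrichment over random:", round(enrichment(7, 162), 1))
    # E. coli: one site per 232 bp versus one per 256 bp expected.
    print("E. coli density relative to random:", round(256 / 232, 2))
```

With these inputs the pac density comes out at roughly 11 times the random expectation, consistent with the approximately 10-fold figure given above.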
Cutting phage genomes in a precise manner may optimize DNA packaging and facilitate circularization of phage DNA upon entry into the next recipient cell. However, the use of Dam methylation to label phage DNA ends is an enigmatic evolutionary acquisition. Because the DNA substrate for packaging is concatemeric DNA, methylation of all pac sites in a concatemer would permit multiple packaging initiations, disrupting the serial process of head filling. A model proposed by Yarmolinsky and Sternberg in the late 1980s envisages that the P1 packaging enzyme (protein 9), which is the product of an early phage gene, might bind hemimethylated pac sites produced by theta replication and protect them from the host Dam methylase. P1 circular molecules with hemimethylated and nonmethylated pac sites would thus be produced (291). In the second stage of replication (rolling circle), P1 Dam methylase, the product of a late gene, would be allowed to methylate one and only one pac site per concatemer; the other pac sites would be protected (but not cut) by protein 9. This mechanism would permit headful packaging and avoid cutting of pac sites inside a concatemer (291). Note that every concatemer contains several P1 genomes, and cutting every pac site would prevent headful packaging and thus waste phage DNA.
Cre is a site-specific recombinase involved in cyclization of P1 DNA upon injection into the host cytoplasm. The cre gene is driven by three promoters, and one of them (pCre1) contains two GATCs in its −35 module (246). Transcription from pCre1 is repressed by Dam methylation (246). The significance of this Dam dependence is unknown. Cre is expressed in cells lysogenized by P1 and may play a role in the partition of newly replicated prophages (291). Based on these observations, one may speculate that hemimethylation might cause transient derepression of the pCre1 promoter. The resulting boost in Cre synthesis might ensure proper partition of the daughter prophages.
The mom gene of bacteriophage Mu encodes a DNA modification enzyme that converts adenine to N6-carboxy-methyl-adenine (102, 248, 257). Mom-mediated modification of Mu DNA is postreplicative and protects Mu DNA from cleavage by a number of restriction endonucleases (103). Mom is not essential for phage growth but increases the host range of Mu within E. coli: if Mu infects a bacterial cell harboring restriction-modification systems different from those found in its last host, Mom-modified Mu DNA will be protected against nucleolytic attack (103). The mom gene is part of the mom operon, which includes a second gene, com, involved in translational regulation of the com-mom transcript (103). In turn, transcription of the mom operon requires a phage product, protein C, which binds the mom upstream activation sequence (UAS) −33 to −52 relative to the transcription start site (38). In the absence of protein C, RNA polymerase starts transcription at the opposite DNA strand, generating a transcript directed away from the mom gene (247). The DNA region upstream from the C binding site contains three GATC sites, spaced between −54 and −85 (103). This region serves as a binding site for a host-encoded protein, the redox-sensitive regulator OxyR, which acts as a repressor of mom transcription (103). However, OxyR can bind the mom UAS only if the GATCs therein are nonmethylated or hemimethylated (37, 104). The biological role of Dam methylation in the regulation of mom transcription is not fully understood. However, Mom− mutants have a subtle phenotype that may provide hints about the role of Dam in mom control: Mu DNA produced after infection is less modified by Mom than Mu DNA produced after prophage induction (258). A tentative explanation is that the mom promoter is fully methylated in a lysogen, thereby preventing OxyR-mediated repression (103). This may permit a level of synthesis of Mom product sufficient to modify phage DNA molecules produced upon induction. 
In an exogenous infection, however, the lag between phage DNA replication and Dam methylation will increase the chances that OxyR binds to a hemimethylated mom promoter, repressing transcription (103). Hence, phage DNA with a relatively low level of Mom modification will be introduced into capsids.
A decade ago, a screen for genes regulated by Dam methylation identified the transfer (tra) operon of the Salmonella virulence plasmid (pSLT) as a Dam-repressed locus (254). Derepression of tra in a Dam− background results in increased frequencies of conjugal transfer, a phenomenon also observed in other plasmids of the F-like family such as F and R100 (47, 255). In pSLT, Dam methylation does not act directly on the tra operon but acts on the regulatory genes traJ and finP (45, 255). Transcription of traJ, which encodes a transcriptional activator of tra, is repressed by Dam methylation (46). In turn, transcription of finP, which encodes a small RNA that antagonizes TraJ expression, is activated by Dam methylation (46, 255). This dual effect of Dam methylation accounts for the increase in tra operon expression observed in Dam− donors (48).
Repression of traJ transcription by Dam methylation is a typical case of regulation of gene expression at the hemimethylated DNA state, reminiscent of Dam-mediated coupling of IS10 transposition to passage of the DNA replication fork (see above) (219). The traJ UAS contains two binding sites for Lrp, which is an activator of traJ transcription (45, 48). Both Lrp binding sites are necessary for transcriptional activation, and one of them (LRP-2) contains a GATC site whose methylation state affects Lrp binding. When the GATC is hemimethylated or nonmethylated, Lrp binds to LRP-2 with high affinity. If the GATC is methylated, however, the affinity of Lrp for LRP-2 is lowered. The binding pattern of Lrp at the traJ UAS is also different depending on the methylation state of LRP-2: DNase I footprinting reveals that Lrp protects the traJ UAS from −132 to −42 when the LRP-2 GATC site is nonmethylated and from −132 to −52 when the GATC site is methylated. Increased distance between the downstream end of the region bound by Lrp and the −35 module of the traJ promoter may explain the failure of Lrp to activate traJ transcription when the GATC within LRP-2 is methylated (48). Footprint analysis also shows that methylation of the LRP-2 GATC alters the distribution of DNase I-hypersensitive sites in the traJ UAS, providing further evidence that Lrp binding follows different patterns depending on the methylation state of the LRP-2 GATC (46). Lrp can also bind a hemimethylated traJ UAS (see below), suggesting that Dam methylation may serve as a sensor of plasmid replication: traJ transcription will be repressed in a nonreplicating plasmid, but repression will be lifted during the transient hemimethylation lapse that follows passage of the replication fork (46).
The affinity of Lrp for hemimethylated traJ UAS is influenced by the location of the methyl group within LRP-2. High-affinity Lrp binding occurs if the methylated GATC lies in the noncoding (template) strand of traJ. In contrast, Lrp binds to a hemimethylated DNA substrate containing a methyl group in the traJ coding strand with lower affinity. If these observations faithfully reproduce the scenario of a replicating plasmid, passage of the replication fork will permit Lrp binding to one daughter DNA molecule but not to the other, and traJ activation will occur in only one of the newly replicated plasmids. Electrophoretic migration of free, unbound traJ DNA is also different depending on the strand that contains N6-methyl-adenine (N6meA): a DNA fragment containing N6meA in the noncoding strand migrates like nonmethylated DNA, while a DNA fragment containing N6meA in the coding strand migrates like methylated DNA (46). A single methyl group is able to induce structural changes in a DNA fragment (72). Hence, subtle structural differences between the two hemimethylated traJ substrates may explain why Lrp is able to discriminate between “isomeric” DNA molecules.
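The asymmetry between the two daughter plasmids can be traced with a small bookkeeping sketch. It represents the LRP-2 GATC site by the methylation state of each strand, replicates it semiconservatively so that each daughter receives one parental strand and one new, not yet methylated strand, and applies the binding rule described above: Lrp binds with high affinity unless the traJ coding strand carries the methyl group. The class and function names are hypothetical, and the rule is a simplification of the reported in vitro affinities.

```python
from typing import NamedTuple

# Sketch: strand-specific hemimethylation of the LRP-2 GATC after replication.
# Assumption: Lrp affinity is reduced to a yes/no rule taken from the text.


class GatcSite(NamedTuple):
    coding_methylated: bool
    noncoding_methylated: bool


def replicate(site: GatcSite) -> tuple[GatcSite, GatcSite]:
    """Semiconservative replication before Dam acts on the new strands:
    each daughter keeps one parental strand; the new strand is unmethylated."""
    keeps_coding = GatcSite(site.coding_methylated, False)
    keeps_noncoding = GatcSite(False, site.noncoding_methylated)
    return keeps_coding, keeps_noncoding


def lrp_binds_with_high_affinity(site: GatcSite) -> bool:
    """High affinity for nonmethylated DNA and for DNA methylated only in the
    noncoding (template) strand; lower affinity if the coding strand is methylated."""
    return not site.coding_methylated


if __name__ == "__main__":
    fully_methylated = GatcSite(True, True)
    for daughter in replicate(fully_methylated):
        print(daughter, "->", lrp_binds_with_high_affinity(daughter))
```

Starting from a fully methylated site, exactly one of the two daughters satisfies the binding rule, which is the basis for the prediction that traJ activation is restricted to one of the newly replicated plasmids.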
If the above model is correct, Lrp-mediated activation of traJ transcription will be restricted to one hemimethylated daughter plasmid molecule (46). This epigenetic switch may be viewed as a mechanism to limit TraJ synthesis and hence to restrain activation of conjugal transfer. Higher TraJ levels might be superfluous, if not an energetic waste. Furthermore, because the pSLT strand transferred during conjugation is the noncoding strand, the active epigenetic state of traJ may be transmissible to the recipient cell: use of the incoming DNA strand as template will reproduce the methylation pattern that permits traJ activation, and the recipient cell will instantly become a donor if sufficient Lrp is available (Fig. 7). This infectious transmission of an epigenetic state may facilitate spread of the plasmid: as long as recipient cells are available, new donors will be formed by a positive feedback loop (46).
Transcription of the pSLT finP gene occurs at reduced rates in Dam− mutants (46, 255). A combination of genetic evidence and gel retardation analysis has indicated that repression of finP transcription in a Dam− background is exerted by the nucleoid protein H-NS (46). However, the different expression levels of the finP gene in Dam+ and Dam− strains cannot be explained by a local effect of Dam methylation upon H-NS binding, because the Dam dependence of finP expression is still observed in a mutant finP promoter lacking the GATC site that overlaps the −10 module (46). The involvement of upstream DNA sequences is likewise ruled out by deletion analysis (46). Hence, H-NS-mediated repression of finP may reflect a condition or state that occurs in Dam− mutants but not in the wild type. Tentative explanations may be that a higher H-NS concentration exists in Salmonella Dam− mutants, as reported for E. coli (199), or that lack of N6meA favors a change in the pattern of H-NS association with the cell nucleoid. Because N6 methylation at individual GATC sites is known to influence local DNA structure (72), it seems conceivable that the methylation state of thousands of GATCs might influence nucleoid organization and potentially affect H-NS binding. Support for this hypothesis was obtained by microarray analysis of gene expression in E. coli overexpressing Dam (159).
In Salmonella, Haemophilus, and certain strains of Yersinia pseudotuberculosis, lack of Dam methylation causes attenuation of virulence in model animals (89, 93, 107, 201, 252, 274). In other pathogens, virulence attenuation is observed if Dam methylase is overproduced (56, 136). Albeit widespread, the involvement of Dam methylation in bacterial virulence is not universal; for instance, Dam− mutants of Shigella flexneri are not attenuated (121).
The involvement of Dam methylation in bacterial virulence may provide an example of a housekeeping function that has permitted adaptation to challenges associated with a pathogen lifestyle. One such challenge is the maintenance of genome integrity when the pathogen encounters DNA-damaging agents synthesized by the host (198, 268). In bacterial species that use Dam methylation as a strand discrimination signal for DNA mismatch repair, lack of Dam methylation leaves the cell at the mercy of the MutHLS system: if DNA lesions are produced, double-strand DNA breaks introduced by MutH can kill the cell (121).
Lack of mismatch repair is not the only virulence-related phenotype of Dam− mutants. Dam methylation regulates invasion of epithelial cells in Salmonella enterica (89) and Haemophilus influenzae (274), secretion of Yersinia outer membrane proteins (10, 136), and synthesis of Std fimbriae in Salmonella (13). It is intriguing to speculate that Dam methylation could provide a type of short-term memory for bacterial pathogens via formation of DNA methylation patterns that control expression of virulence genes. A potential advantage of such an epigenetic memory system is that information regarding environments that mother cells have encountered could be passed on to daughter cells, which might be useful in orchestrating appropriate temporal control of gene expression contributing to pathogenesis. Despite these examples and possibilities, the roles of Dam methylation in bacterial virulence are not fully understood, and their study might uncover hitherto unknown roles of N6meA in the bacterial cell.
Dam− mutants of Salmonella enterica serovar Typhimurium are severely attenuated in the mouse model: the 50% lethal dose of a Dam− mutant is 10,000-fold higher than that of the wild type when administered by the oral route and 1,000-fold higher when administered intraperitoneally (89, 107). Attenuation by dam mutations is likewise observed in S. enterica serovar Enteritidis (93). Microscopic examination of murine ileal loops infected with Dam− salmonellae reveals a reduced ability of Dam− cells to interact with the intestinal epithelium. Furthermore, infection of epithelial cell lines indicates that Dam− strains have an invasion defect. This defect may be caused by reduced expression of genes in pathogenicity island 1 (SPI-1), including the main regulatory gene, hilA (13). The mechanisms by which Dam methylation activates gene expression in SPI-1 are not yet known. In silico examination of SPI-1 regulatory regions does not reveal the existence of any GATC clusters (13). However, this does not exclude the possibility that Dam methylation may activate SPI-1 expression at the transcriptional level, since the methylation state of a single GATC site can govern specific DNA-protein interactions (46, 118, 219). An additional defect of Salmonella Dam− mutants that may contribute to inefficient invasion of the intestinal epithelium is reduced motility, which may be caused by uncoordinated expression of flagellar genes (13).
Another relevant defect of S. enterica Dam− mutants is envelope instability, with release of outer membrane vesicles (210) and leakage of proteins (89). Vesicle release has been tentatively associated with impaired binding of Tol and PAL proteins to peptidoglycan (210). Protein release may also be a side effect of envelope fragility. In addition, a fimbrial operon that is tightly repressed in the wild type, stdABC, undergoes derepression in Dam− mutants (13). In a Dam− background, std mRNA increases over 100-fold (13), and the StdA protein becomes one of the most abundant proteins detected by two-dimensional gel electrophoresis in cell extracts (2). The presence of three GATC sites clustered in a 24-bp interval upstream from the stdABC promoter is reminiscent of genes in which Dam methylation regulates binding of a trans-acting regulator, for example, OxyR binding to agn43 (97, 111, 269) (see above), and raises the possibility that Dam methylation may control stdABC transcription (13). However, discrepancies between the std transcription rates and the levels of Std fimbrial proteins provide evidence for posttranscriptional control by Dam methylation (130), as previously described for the E. coli vsr gene (19). Production of Std fimbriae is tightly repressed in LB medium and becomes derepressed in ileal loops (126). Hence, the stdABC operon may provide an interesting example of the use of Dam methylation as a signal that is responsive to environmental cues. On the other hand, massive fimbrial expression on the cell surface, together with the envelope defects discussed above, may contribute to the avirulence of Dam− mutants by activating the host immune system. In fact, Dam− mutants of S. enterica have been shown to elicit animal immune responses with high efficiency (76, 108).
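The kind of in silico search described above, looking for GATC sites clustered within a short interval upstream of a promoter, can be sketched in a few lines of Python. This is a minimal illustration and not the procedure used in the cited studies: the function names and the example sequence are hypothetical, and the default thresholds (three sites within 24 bp) are simply borrowed from the stdABC description.

```python
import re

def find_gatc_sites(seq):
    """Return 0-based start positions of every GATC site in seq.

    GATC is its own reverse complement, so scanning one strand is enough.
    """
    return [m.start() for m in re.finditer("GATC", seq.upper())]

def find_gatc_clusters(seq, min_sites=3, window=24):
    """Return (start, end) intervals holding min_sites GATC sites within window bp.

    start is the first base of the first site; end is one past the last base
    of the last site. Overlapping intervals are reported separately.
    """
    sites = find_gatc_sites(seq)
    clusters = []
    for i in range(len(sites) - min_sites + 1):
        start = sites[i]
        end = sites[i + min_sites - 1] + 4  # each GATC site is 4 bp long
        if end - start <= window:
            clusters.append((start, end))
    return clusters

# Hypothetical sequence for illustration (not the real std upstream region)
example = "TT" + "GATC" + "AAAAAA" + "GATC" + "AAAAAA" + "GATC" + "T" * 40 + "GATC"
clusters = find_gatc_clusters(example)
```

A scan of this kind only flags candidate regions. Because the methylation state of a single GATC site can govern a DNA-protein interaction, the absence of clusters does not rule out Dam-dependent regulation.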
The observation that Dam methylation often regulates cell surface functions (fimbriae, flagella, envelope structures, and secreted proteins) is intriguing and may suggest that certain gene families are more prone than others to fall under Dam control.
An additional defect of Salmonella Dam− mutants is sensitivity to bile (108, 210). Bile salts are detergents and DNA-damaging agents, and both activities appear to contribute to Dam− mutant killing during infection. Because of their envelope defects, Dam− mutants are more sensitive to the detergent activity of bile. In addition, in the absence of Dam methylation, exposure to bile salts triggers killing of Dam− cells by their own MutHLS system; every attempt to repair bile-induced DNA damage in the absence of DNA strand discrimination can result in a double-strand DNA break performed by the MutH endonuclease (208). A summary of the pleiotropic effects of a dam mutation on S. enterica serovar Typhimurium gene expression and physiology is shown in Fig. 8.
Unlike in Salmonella, Shigella, and Haemophilus, analysis of Dam's role(s) in other pathogens has been hindered by the fact that Dam methylation is an essential function. An approach to overcome this hurdle was devised in M. Mahan's laboratory. Based on the previous finding that both lack of and overproduction of Dam methylase attenuated virulence in Salmonella enterica (107), the effects of Dam overproduction in Yersinia pseudotuberculosis and Vibrio cholerae, two species in which dam mutations are lethal, were tested (135). In both Yersinia and Vibrio, overproduction of Dam was tolerated and caused virulence attenuation (135). An independent study showed that overproduction of Dam methylase in Yersinia enterocolitica enhances invasion of epithelial cells yet results in decreased virulence (83).
Dam-overproducing strains of Yersinia pseudotuberculosis show increased secretion of Yersinia outer proteins (Yops), a group of virulence proteins that are injected into the host cytoplasm via a type III secretion apparatus (136). Yop secretion is tightly regulated by environmental signals such as temperature and calcium concentration (136). Upon Dam overproduction, synthesis of the YopE cytotoxin is insensitive to both temperature and calcium concentration, and YopE secretion becomes temperature independent (136). Synthesis of LcrV, a low-calcium-responsive virulence factor involved in Yop synthesis and translocation, is also altered in Dam-overproducing strains and may contribute to explaining the altered expression pattern of Yop proteins associated with Dam overproduction (10).
The success in attenuating virulence by overproduction of Dam methylase is intriguing and may indicate the existence of virulence genes regulated by stable undermethylation of critical GATC sites, in a fashion reminiscent of the pap operon or the agn43 gene. An alternative explanation is that Dam methylase overproduction might interfere with cellular processes which require SeqA binding to hemimethylated GATC sites, potentially disrupting organization of the nucleoid (159, 160). The latter view may be supported by the observation that SeqA− mutants of Salmonella enterica display virulence defects in the mouse model (209).
Caulobacter is a dimorphic bacterium with two different cell types: the stalked cell and the swarmer cell (170). These cell types are formed by asymmetric cell division, and they differ in morphology and behavior. The swarmer cell is unable to divide and differentiates into a stalked cell which undergoes chromosome replication and cell division. Initiation of chromosome replication, which occurs only in the stalked cell, requires that the GANTC sites within the Caulobacter chromosomal origin (Cori) are methylated (170). Chromosome replication produces hemimethylated DNA, and the daughter chromosomes of the stalked cell remain hemimethylated until CcrM is produced during the late stage of chromosome replication (215). When CcrM is synthesized, methylation of the newly replicated chromosomes occurs. After cell division, the inheritance of a methylated Cori will permit the initiation of a new replication round in the daughter stalked cell (170, 215).
The fact that two independent bacterial lineages (Gammaproteobacteria and Alphaproteobacteria) use DNA adenine methylation as a signal for the initiation of chromosome replication is an interesting case of evolutionary convergence, which is strengthened by the evidence that the DNA methylases involved (Dam or CcrM) are also of independent origin.
Shortly after cell division, CcrM is degraded by a Lon-like protease in both daughter cells (170, 214). In the nondividing swarmer cell, initiation of chromosome replication is blocked by CtrA, a global regulator that binds the methylated Cori. In the stalked cell, CtrA is degraded and remains undetectable until chromosome replication has initiated (170). Because the ccrM gene is not transcribed until chromosome replication approaches the terminus, the origin (and most of the chromosome) will remain hemimethylated until the late stages of replication, when a burst in CcrM synthesis occurs (170). Transcription of the ccrM gene is activated by CtrA, which accumulates in the stalked cell as chromosome replication progresses. However, CtrA-mediated activation of ccrM transcription is inhibited by methylation of two GANTC sites located in the leader of the ccrM coding sequence (243). This inhibition may contribute to delaying ccrM transcription until the replication fork reaches ccrM and may serve to prevent earlier activation by CtrA (215). If high levels of CcrM are present throughout the cell cycle, Caulobacter DNA is methylated all the time, the cell cycle is disrupted, and filaments made of polyploid cells are formed (287).
Synthesis of the cell cycle regulator CtrA is regulated by GANTC methylation in a fashion reminiscent of Dam-repressed genes such as tnp (IS10) and traJ (46, 219). One of the two ctrA promoters (P1) contains a GANTC site near its −35 module (216). Transcription starting at P1 is repressed when the GANTC is methylated. Passage of the replication fork renders the promoter hemimethylated and activates transcription (216). This mechanism may serve to boost ctrA gene transcription in response to replication progress. In turn, CtrA accumulation will activate transcription of the ccrM gene as soon as the replication fork renders the ccrM promoter hemimethylated. Note that the ability of the CtrA transcription factor to recognize hemimethylated ccrM DNA is a crucial factor to permit an orderly sequence of events during chromosome replication. The importance of hemimethylation in the Caulobacter cell cycle is supported by genetic experiments carried out in Shapiro's laboratory: if the ctrA gene is moved to an ectopic position near the replication terminus, ctrA transcription from the methylation-sensitive P1 promoter remains repressed for a longer lapse of the cell cycle, and CtrA accumulates more slowly (216). These elegant experiments provide further evidence that the hemimethylation wave associated with chromosome replication serves as a molecular clock for the Caulobacter cell cycle.
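The clock-like behavior of the hemimethylation wave can be illustrated with a toy timing model. The sketch below is our own simplification and is not taken from the cited work: it assumes that the replication fork travels at constant speed from origin to terminus, that a single CcrM burst remethylates all sites at once, and that the burst occurs at 0.95 of the replication period. The function name, the locus positions, and all numerical values are hypothetical.

```python
def hemimethylation_window(position, remethylation_time=0.95):
    """Return (start, end) of the hemimethylated interval for one locus.

    position: locus location as a fraction of the origin-to-terminus
              distance (0.0 = origin, 1.0 = terminus).
    remethylation_time: time of the CcrM burst, in units of one replication
              period (illustrative value, not a measured one).
    Assumes a constant fork speed, so the fork reaches the locus at
    time == position, and that the CcrM burst remethylates every site at once.
    """
    if not 0.0 <= position <= 1.0:
        raise ValueError("position must lie between 0 and 1")
    start = position                          # fork passage: site becomes hemimethylated
    end = max(position, remethylation_time)   # CcrM burst: site is remethylated
    return start, end

# Hypothetical loci: one close to the origin, one moved close to the terminus
origin_proximal = hemimethylation_window(0.2)
terminus_proximal = hemimethylation_window(0.9)
```

Under these assumptions, a methylation-repressed promoter such as P1 stays repressed until the fork arrives, so moving it toward the terminus both delays derepression and shortens the hemimethylated interval, in line with the slower CtrA accumulation described above.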
DNA methyltransferases are widespread in bacteria, and most of them are part of restriction-modification systems. In addition, certain bacterial genomes contain solitary DNA methylases that are not involved in protecting DNA from a cognate restriction enzyme. Two of these enzymes, the Dam methylase of enteric bacteria and the CcrM methylase of Caulobacter crescentus, are paradigms of an evolutionary process in which DNA adenine methylation acts as a signaling mechanism that regulates DNA-protein interactions. In both Gamma- and Alphaproteobacteria, DNA adenine methylation regulates chromosome replication and couples transcription of certain genes to passage of the DNA replication fork. In some cases, regulatory protein binding inhibits DNA methylation, generating DNA methylation patterns that are hallmarks of alternative epigenetic states. DNA methylation patterns are modulated by environmental conditions via alterations in regulatory protein binding. Specific DNA methylation states can be propagated by positive feedback loops, and in certain cases they are clonally inherited by daughter cells. Protein binding prevents maintenance methylation, thereby generating sites that are stably hemimethylated or nonmethylated. Methylation-blocking factors include transcriptional regulators such as CAP, Lrp, OxyR, and other DNA binding proteins. Inheritance of DNA methylation patterns is a phenomenon reminiscent of eukaryotic imprinting of genes and may convey adaptive value: bacterial populations may use inherited DNA methylation patterns as a short-term memory of the metabolic conditions in which the former generation thrived and divided. DNA methylation also plays an essential role in diverse bacterial pathogens, raising the possibility of designing new antibacterial drugs that might inhibit DNA adenine methylation. A drug of this kind could be expected to inhibit the virulence of wild-type bacteria by transforming them into phenocopies of Dam− mutants.
We thank Bruce Braaten, Aaron Hernday, Stephanie Aoki, Brooke Trinh, and Marjan van der Woude for reading parts of the manuscript and/or helpful advice and Edward Robinson for work on Fig. 1.
Work in our laboratories is supported by grants BIO2004-3455-CO2-02 and GEN2003-20234-CO6-03 from the Spanish Ministry of Education and Science and the European Regional fund (to J.C.) and by National Institutes of Health grant AI23348 (to D.L.).