In response to an assault by foreign organisms, peripheral B cells can change their antibody affinity and isotype by somatically mutating their genomic DNA. The ability of a cell to modify its DNA is exceptional in light of the potential consequences of genetic alterations to cause human disease and cancer. Thus, as expected, this mechanism of antibody diversity is tightly regulated and coordinated through one protein, activation induced deaminase (AID). AID produces diversity by converting cytosine to uracil within the immunoglobulin loci. The deoxyuracil residue is mutagenic when paired with deoxyguanosine, since it mimics thymidine during DNA replication. Additionally, B cells can manipulate the DNA repair pathways so that deoxyuracils are not faithfully repaired. Therefore, an intricate balance exists which is regulated at multiple stages to promote mutation of immunoglobulin genes, while retaining integrity of the rest of the genome. Here we discuss and summarize the current understanding of how AID functions to cause somatic hypermutation.
Diversity in antibodies is produced during two stages in B cell development. In pre-B cells, rearrangement of variable (V), diversity (D), and joining (J) gene segments occurs to produce the primary repertoire of immunoglobulin (Ig) receptors. In mature B cells, Ig receptors undergo affinity maturation (AM) and class switch recombination (CSR) to produce the secondary, or memory, repertoire of antibodies. The latter event occurs after antigen binds to the receptor, which initiates a dynamic cascade of cell signaling events to cause cellular activation (Gauld et al., 2002; Kurosaki, 2002; Niiro and Clark, 2002). The result of this activation is the differentiation of B cells into plasma or memory cells, which now express a large repertoire of antibodies to clear a plethora of different foreign antigens.
Diversity in the secondary repertoire is created by modifying rearranged V(D)J sequences and switching heavy chain constant genes (CH). Alteration of the V gene sequence is achieved by either direct mutagenesis or DNA strand breaks during gene conversion (GC), where strand breaks are repaired using different pseudo-V gene segments in a templated recombination mechanism. In either case, cells containing mutations which increase antibody affinity will be selected to divide and further mutate, while mutations which decrease affinity will be lost through apoptosis. Alteration of the CH gene occurs by DNA strand breaks in the switch (S) regions flanking the different CH gene exons. Breaks in two different S regions are then repaired by non-homologous end joining to remove the intervening introns and exons. This recombination event allows for production of a defined VDJ exon with different CH gene isotypes to regulate antibody function.
A single enzyme is responsible for initiating diversity in V(D)J and CH genes: activation induced deaminase (AID), which is a cytosine deaminase that enzymatically converts cytosine to uracil. Uracil is mutagenic when paired with guanosine in DNA, since dU mimics dT during replication, and the U:G mismatch triggers error-prone DNA repair in B cells. Thus, AID introduces somatic hypermutation (SHM) by converting dC to dU. In this review, the initiating events caused by AID will be referred to as SHM, regardless of whether dU is found in the V or S regions. If dU occurs in V(D)J genes, SHM can produce AM or GC. If dU occurs in S regions, SHM can produce CSR. Furthermore, the proteins that process dU, such as UNG, MSH2, MSH6, and DNA polymerases, have the same activity whether dU is located in the V(D)J or S regions. Therefore, SHM, caused by AID-generated dU, underpins the three mechanisms of AM, GC, and CSR.
One key aspect of AID biology is the balance between mutagenic diversity and genomic integrity. When AID functions at non-Ig loci, both mutation and translocations can promote carcinogenesis (Ramiro et al., 2007). Thus it is imperative to the organism that AID activity be tightly controlled to inhibit possible oncogenic transformation, while still allowing for the production of a wide diversity of antibodies. In this review, we will highlight the intricate aspects of AID biology and regulation.
The mechanisms of AM, GC, and CSR were significantly advanced by the ground-breaking discovery of AID (Muramatsu et al., 1999) and its subsequent genetic analysis in humans, mice, and chickens (Arakawa et al., 2002; Rada et al., 2002b; Revy et al., 2000). Broader analysis of AID indicates that an intricate network of regulatory mechanisms controls its expression at the levels of gene transcription, mRNA stability, protein localization, protein phosphorylation, and cell signaling.
The Aicda locus, which encodes AID, contains four regions that control transcription (Yadav et al., 2006). Starting at the 5' end of the locus, the first region is located about 8 kb upstream of exon 1 in the mouse, and contains potential motifs for NF-κB, STAT6, C/EBP, and Smad3/4 proteins (Tran et al., 2009; Yadav et al., 2006). This region may respond to stimulation by the mitogen lipopolysaccharide (LPS) and the T-cell mimic anti-CD40 antibody to upregulate expression of AID. The second region is located about 1 kb upstream of exon 1, and has sites for NF-κB, STAT6, Sp transcription factors, HoxC4, and Pax5 (Dedeoglu et al., 2004; Gonda et al., 2003; Park et al., 2009; Yadav et al., 2006). The third region is found in the intron between exon 1 and exon 2, and contains sites for NF-κB, E proteins, Pax5, and several other factors (Gonda et al., 2003; Sayegh et al., 2003; Tran et al., 2009; Yadav et al., 2006). The fourth region is located about 6 kb downstream of exon 5 in the mouse (Tran et al., 2009; Yadav et al., 2006), and appears to function as an enhancer (Crouch et al., 2007). Many of the sites in the first three regions bind to transcription factors that are upregulated after B cell stimulation, and so they likely play a role in inducing AID in vivo.
Conversely, the proteins Id1, Id2, and Id3 reduce CSR (Goldfarb et al., 1996; Quong et al., 1999), potentially by inhibiting AID transcription (Gonda et al., 2003; Sayegh et al., 2003). The Id proteins function by binding to stimulatory factors such as E47 and Pax5, which prevents their binding to DNA. Other factors inhibit transcription by binding to sites in the third region (Tran et al., 2009), and may play a role in restricting AID expression to B cells, and not to other cell types.
Additional proteins bind to sites in the second region that appear to function independently of antigen stimulation. For example, Sp1 and Sp2 proteins bind to sites in vitro (Yadav et al., 2006), but their role in vivo is not known. Recently, the sex hormones estrogen and progesterone have been found to regulate AID expression (Pauklin and Petersen-Mahrt, 2009; Pauklin et al., 2009). Both estrogen and progesterone response elements have been found within the second region, and they could have a potential role in upregulating AID in hormone-based cancers and autoimmunity (Maul and Gearhart, 2009; Petersen-Mahrt et al., 2009). However, as with the Sp-binding sites, further research is required to understand the role these factors play during normal B cell development and activation. Finally, B cells from old mice and humans have less AID and reduced CSR compared to B cells from young mice and humans. This may be due in part to degradation of E47 mRNA, which encodes molecules that stimulate AID transcription (Frasca et al., 2008). In B cells from old mice, tristetraprolin binds to E47 mRNA and degrades it, whereas in B cells from young mice, tristetraprolin is phosphorylated and cannot bind to the mRNA (Frasca et al., 2007). Thus, by defining the factors that limit antibody diversity with age, it may be possible to increase the efficacy of vaccines in the elderly.
Once the Aicda gene is transcribed, the level of transcripts can be controlled through microRNA molecules. Specifically, miR-155 binds to the 3’ untranslated region of AID mRNA, and destabilizes the message to reduce SHM and CSR. However, in vivo analysis of miR-155 function in SHM is complicated by the global defects seen in different lymphoid cells, which alter germinal center cell number and function (Kohlhaas et al., 2009; Thai et al., 2007; Vigorito et al., 2007). To overcome the global effect, specific mutants of the 3’ untranslated region of AID, which prevent binding of miR-155, were utilized to examine SHM and CSR in vivo. As predicted, AID expression was increased in splenic and Peyer's patch mutant B cells (Dorsett et al., 2008; Teng et al., 2008), and there was a dramatic increase in chromosomal translocations between Myc and Igh genes (Dorsett et al., 2008). However, the increased AID protein only modestly elevated the level of SHM in the V or S regions and decreased AM, suggesting that excess AID was not specifically targeted to the Ig locus. This is consistent with earlier studies showing that overexpression of AID does not always produce increased SHM or CSR, perhaps due to inactivation of the protein by an unknown mechanism (Muto et al., 2006). Another molecule, miR-181b, regulates AID expression in a similar manner, by binding to the 3' untranslated region and lowering the levels of AID and CSR (de Yebenes et al., 2008). Nonetheless, since both of these miR molecules affect multiple genes as well as Aicda, their biological role in AID expression remains unclear.
Extensive analyses have identified post-translational mechanisms that coordinate AID sub-cellular localization. Surprisingly, AID protein is far more abundant in the cytoplasm than in the nucleus, as first seen in the Ramos cell line using artificial AID-GFP constructs (Rada et al., 2002a) and in primary B cells looking at endogenous AID (Schrader et al., 2005). Three mechanisms appear to be involved in actively moving AID in and out of the nucleus, and their respective amino acid residues are illustrated in Fig. 1. First, high levels of AID may be retained in the cytoplasm through an anchor sequence in the C-terminal region of AID (Patenaude et al., 2009). This would be advantageous, as the protein is quickly degraded when it is in the nucleus through polyubiquitination (Aoufouchi et al., 2008). Second, AID is actively imported into the nucleus through the use of an N-terminal nuclear localization signal (NLS); however, the exact amino acids which form the NLS are currently unclear. Di Noia and colleagues speculated that a non-classical NLS exists in AID (Fig. 1 - NLSa) (Patenaude et al., 2009), while Honjo and colleagues have identified a classical bipartite NLS (Fig. 1 - NLSb) (Ito et al., 2004; Shinkura et al., 2004). In addition, it has also been proposed that AID may passively diffuse into the nucleus (Brar et al., 2004; McBride et al., 2004). However, the identification of an interaction of AID with importin-α suggests that AID does contain an NLS for active nuclear import (Patenaude et al., 2009). Third, a conserved nuclear export signal (NES) in the C-terminal residues 189–198 transports most of the protein out of the nucleus (Brar et al., 2004; Ito et al., 2004; McBride et al., 2004). Treatment of B cells with leptomycin B, a potent inhibitor of the CRM1 export receptor, increased the abundance of AID in the nucleus.
Further dissection of the C-terminal domain showed that export can be abolished by the single point mutation F198A, resulting in increased SHM and CSR activity (McBride et al., 2004). Therefore, these three mechanisms exquisitely regulate the amount of AID in the nucleus, ensuring that there will be only low levels of the mutagenic protein after cell activation. Perturbing any one of these pathways can affect the fine balance between antibody diversity and chromosomal mutagenesis.
While sub-cellular localization and degradation coordinate the access of AID to genomic DNA, phosphorylation regulates the activity of the protein. AID phosphorylation was first identified by examining catalytic differences between protein purified from B cells (AIDBcell) or from 293 kidney cells (AID293). Alt and colleagues reported that AID293 was less active than AIDBcell when tested for deamination activity using an in vitro transcription-based assay (Chaudhuri et al., 2004). Subsequent analysis by mass spectroscopy identified four phosphorylation sites in AID: T27, S38, T140, and Y184 (Basu et al., 2005; McBride et al., 2006; McBride et al., 2008; Pasqualucci et al., 2006). Residues T27 and S38 are phosphorylated in a coordinated fashion by protein kinase A (PKA), and they regulate protein-protein interaction between AID and replication protein A (RPA) (Basu et al., 2005; Pasqualucci et al., 2006). Mutation of these residues in B cells stimulated ex vivo or in DT40 cells inhibits SHM, GC, and CSR (Basu et al., 2005; Chatterji et al., 2007; McBride et al., 2006; Pasqualucci et al., 2006; Vuong et al., 2009). Indeed, S38 appears to be the key phosphorylation site in vivo, as mice with a mutation of this residue had reduced SHM and CSR (Cheng et al., 2009; McBride et al., 2008). Recently Nussenzweig and colleagues reported phosphorylation of AID residue T140 in mouse B cells after activation with LPS and IL-4 (McBride et al., 2008). Unlike S38, T140 is not a substrate for PKA phosphorylation but rather for protein kinase C (PKC). Furthermore, mutation of T140 to alanine in mice showed that SHM was affected more significantly than CSR, suggesting differential phosphorylation of S38 and T140 can produce different biological outcomes. In contrast to residues T27, S38, and T140, phosphorylation of Y184 may not play a significant role in AID function, since replacement of the amino acid in B cells did not have an effect on CSR (Basu et al., 2005).
As implied in its name, AID is induced after activation by exogenous stimuli, such as bacterial or viral molecules, CD40 ligand, and antigen or anti-Ig. To study the signaling pathways for these activators, murine splenic B cells can be conveniently stimulated ex vivo, and the levels of AID and CSR can be measured. Abundant AID expression and CSR occur after stimulation with LPS, which binds to toll-like receptor 4, or with anti-CD40 antibody, which binds to the CD40 receptor. In contrast, AID expression is delayed and CSR is ablated when cells are stimulated with anti-IgM, which binds to the Ig receptor (Heltemes-Harris et al., 2008; Jabara et al., 2008). Furthermore, AID expression and CSR are actually suppressed when anti-IgM is added to cells stimulated with LPS or anti-CD40 (Heltemes-Harris et al., 2008; Jabara et al., 2008; Rush et al., 2002). This inhibition has been linked to upregulation of the phosphatidyl inositol 3 kinase (PI3K) pathway (Doi et al., 2008; Heltemes-Harris et al., 2008; Omori et al., 2006). Indeed, balancing the levels of PI3K activation may determine whether CSR is induced or suppressed. B cells treated with LPS and IL-4 activate PI3K signaling, which phosphorylates AKT at a level low enough to allow CSR. In contrast, the addition of anti-IgM along with LPS and IL-4 enhanced AKT phosphorylation to a greater extent, which inhibited CSR (Heltemes-Harris et al., 2008). A recent report shows that anti-IgM also inhibits AID expression through the calcium-signaling pathway (Hauser et al., 2008). Taken together, these reports suggest that IgM crosslinking ex vivo mimics the end of a germinal center response, where B cells with high affinity receptors stop SHM and are converted into memory and plasma cells.
However, the complete story has yet to emerge, as another stimulator, 8-mercaptoguanosine, which binds to toll-like receptor 7 located in endosomes, has the ability to work synergistically with IgM crosslinking to promote AID expression and CSR (Tsukamoto et al., 2009). Perhaps after B cell receptor crosslinking, stimulation from different cellular microenvironments regulates the outcome of B cell differentiation.
The immunoglobulin loci are mutated in well defined regions encoding rearranged V genes on the heavy and light chain loci, and S regions on the heavy chain locus. Sequence analysis has shown that mutation occurs in a 2 kb region around V(D)J genes (Lebecque and Gearhart, 1990), and in a 4–7 kb region around S regions (Xue et al., 2006). Thus it can be assumed that AID functions on 10⁻⁵ to 10⁻⁶ of the genome at a given time, suggesting that exquisite levels of regulation target AID to such a small percentage of the genome. Examination of the mutational pattern in both the V(D)J and the S regions shows that mutation occurs in close proximity to either the V gene promoter or S intron promoters, respectively. This has led to the hypothesis that AID is recruited to the Ig region in association with the transcriptional machinery. In fact, mice with transgenes that have been altered to express different amounts of transcripts have mutation frequencies that correlate with the level of transcription (Bachl et al., 2001; Fukita et al., 1998; Sharpe et al., 1991). Transcription of the heavy and light chain loci is coordinated by three well characterized promoters/enhancers shown in Fig. 2: V gene promoter, intronic enhancer (iE), and downstream enhancers (3’E and hypersensitive (HS) sites). While all three elements and the V/S regions are involved in transcription and SHM, the specific role for each has been harder to elucidate.
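The estimate that AID acts on only a very small fraction of the genome can be checked with simple arithmetic. The sketch below is illustrative only: the window sizes (2 kb per V(D)J region, 4–7 kb per S region) are taken from the text, whereas the genome size of 3 × 10⁹ bp and the number of regions counted in each scenario are assumptions made here for the calculation, not values stated in the cited studies.

```python
# Back-of-envelope estimate of the fraction of the genome targeted by SHM.
# Window sizes come from the text; GENOME_BP and the scenarios are ASSUMPTIONS.

GENOME_BP = 3.0e9             # assumed haploid mammalian genome size (illustrative)
V_REGION_BP = 2_000           # ~2 kb mutated window around a rearranged V(D)J gene
S_REGION_BP = (4_000, 7_000)  # ~4-7 kb mutated window around a switch region


def targeted_fraction(n_v_regions: int, n_s_regions: int, s_region_bp: int) -> float:
    """Fraction of the genome covered by the given number of V and S windows."""
    targeted_bp = n_v_regions * V_REGION_BP + n_s_regions * s_region_bp
    return targeted_bp / GENOME_BP


# Hypothetical low scenario: a single V window and no S region.
low = targeted_fraction(n_v_regions=1, n_s_regions=0, s_region_bp=S_REGION_BP[0])

# Hypothetical high scenario: heavy and light chain V windows plus two 7 kb S regions.
high = targeted_fraction(n_v_regions=2, n_s_regions=2, s_region_bp=S_REGION_BP[1])

print(f"low estimate:  {low:.1e}")
print(f"high estimate: {high:.1e}")
```

Under these assumptions both scenarios fall between 10⁻⁷ and 10⁻⁵ of the genome, the same order of magnitude as the range quoted above. Counting additional S regions or both alleles would move the figure toward the upper end.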
The V gene promoter can be replaced with other non-Ig promoters in transgenes and still promote SHM, suggesting that the promoter may only serve to maintain adequate levels of transcription (Betz et al., 1994; Fukita et al., 1998; Shen et al., 2001; Tumas-Brundage and Manser, 1997). However, this may not be the whole story since at least one promoter, human elongation factor 1-α, can induce high levels of transcription without supporting SHM (Yang et al., 2006). Furthermore, substitution of the V gene promoter has not been studied in the endogenous locus with knock-in mice, so its requirement is not fully resolved.
The seeming lack of specificity for promoters would suggest that the V(D)J sequence contains the information required for targeting. However, when the VJ sequence is removed and replaced by different sequences in Igκ transgenes, the new sequence is still subject to SHM (Peters and Storb, 1996; Yelamos et al., 1995). Additionally, when a 750 bp insert is introduced between the leader sequence and the VDJ sequence, the mutation window shifts ~750 bp towards the promoter (Tumas-Brundage and Manser, 1997). Interestingly, when the promoter is separated from the leader sequence by insertion of 2 kb of λ DNA, SHM is lost, indicating that targeting may be linked to the proximity of the promoter to the leader sequence and/or the leader splice site (Winter et al., 1997). However, the caveat still remains that replacement of the V(D)J sequence in the endogenous context has not been studied, and so its requirement is not certain.
Switch regions are also a target for SHM, and sustain as high a frequency of mutation as the V region. Knockout mice with a partial deletion of Sμ tandem repeats had decreased CSR (Luby et al., 2001; Schrader et al., 2007), and when the entire 4.6 kb region containing Sμ tandem repeat sequences was deleted, SHM and CSR were ablated (Khamlichi et al., 2004). Thus the repetitive sequences in Sμ are a magnet for AID activity, and are required for SHM and CSR in this region.
Intronic enhancers are located in the intron sequence between either the VDJ sequence and the S region for Cμ, or the VJ sequence and C gene in the Igκ locus. A plethora of papers in the 1990s addressed the role of iE for SHM in murine transgenes encoding rearranged V and C genes from the Igh and Igκ loci (Odegard and Schatz, 2006). Interpretation of the varying results was complicated by the random location of transgenes in the genome, which could affect transcription levels. As technology advanced, it became possible to directly delete iE from the endogenous locus in knockout mice. These studies consistently showed that deletion of iE had no effect on SHM in the Igh or Igκ loci (Inlay et al., 2006; Perlot et al., 2005).
3' enhancers are located downstream of the C genes on the heavy and light chain loci, and are important for transcription of rearranged V genes. Interestingly, it now appears that all three loci have multiple 3' enhancers which play a role in SHM. As in the iE studies, conflicting results were obtained from transgenic mice, whereas more reliable data was found in germline knockout mice. The IgH 3’E is characterized by the presence of four DNaseI HS sites downstream of the Cα gene (Dariavach et al., 1991; Lieberson et al., 1991; Madisen and Groudine, 1994; Matthias and Baltimore, 1993; Pettersson et al., 1990). Partial deletion of the HS sites did not affect SHM (Le Morvan et al., 2003), whereas deletion of the entire four sites in a 230 kb bacterial artificial chromosome mouse model showed reduced transcription and SHM (Dunnick et al., 2009). The Igκ locus has a 3'E (Meyer and Neuberger, 1989) and a downstream enhancer, Ed (Liu et al., 2002). Deletion of 3'E reduced transcription but did not affect SHM (Inlay et al., 2006; van der Stoep et al., 1998), whereas deletion of Ed reduced both transcription and SHM (Xiang and Garrard, 2008). Likewise, the chicken Igλ locus has a defined 3'E (Bulfone-Paus et al., 1995) and another downstream enhancer, 3'RR (Kothapalli et al., 2008). In the DT40 cell line, deletion of the 3'E had no effect on SHM (Yang et al., 2006), but deletion of 3'RR ablated transcription and SHM (Kothapalli et al., 2008). Thus, the presence of two enhancers for both Igκ and Igλ explains why the low levels of SHM, previously seen with single deletions, may be due to residual activity of the additional elements. Furthermore, in DT40 cells, the 3'RR appears to contain a site that recruits AID to the Igλ locus (Blagodatski et al., 2009; Kothapalli et al., 2008).
To summarize, transcription is necessary for SHM. The V gene promoter may not be required, the V(D)J sequence may not be required, the S region sequence is required, the iE is not required, and both the 3' E and downstream enhancers are required. Finally, the downstream enhancer may contain a motif that recruits AID to the Ig loci, although more work is needed to establish this.
In addition to AID being recruited to the Ig loci, it has become increasingly clear that AID is erroneously targeted to non-Ig genes throughout the genome. In the absence of the DNA repair proteins uracil DNA glycosylase (UNG) and mismatch repair protein MSH2, Schatz and colleagues found a high mutation frequency for several non-Ig genes which was only 10-fold lower than in V genes (Liu et al., 2008). The handful of different genes that might be targets for AID suggests that the promoters/enhancers of Ig genes are not the only elements which attract AID. Interestingly, the recent finding that Igλ, Igh, and Myc are spatially contained in close proximity in the nucleus suggests that nuclear organization affects both Ig and non-Ig targeting (Wang et al., 2009a).
Other transcriptional events such as chromatin acetylation may play a role in coordinating AID activity. Examination of the abundance of histone acetylation upon stimulation in vivo or ex vivo, suggests that the V and Sμ regions are maintained in a hyperacetylated (open) state independent of cellular activation, and the C region had a lower level of acetylation (Li et al., 2004a; Odegard et al., 2005; Wang et al., 2006; Wang et al., 2009b). Furthermore, downstream S regions are maintained at a low level of acetylation until stimulated with specific cytokines to promote transcription (Li et al., 2004a; Nambu et al., 2003; Wang et al., 2006; Wang et al., 2009b; Woo et al., 2003). While these events seem to be independent of AID, the difference in the acetylation state might coordinate a functional window for AID activity in the V and S regions, and block activity in the C region.
Once recruited to the Ig loci, AID-generated mutations show a distinctive bell-shaped pattern, suggesting increased activity in particular regions (Fig. 2). This pattern of mutation is advantageous to antibody diversity in that the peak of mutation is either over the V exon or the repetitive core in the S regions. Thus, in the V region, SHM promotes AM and GC, and in the S region, SHM initiates double strand breaks for CSR. This focusing of mutation in a defined region, or sub-targeting, is determined by the nucleotide sequence of the loci which coordinates transcription and AID activity.
Targeting to the V region is similar for rearranged V genes on the Igκ, Igλ, and Igh loci. Mutations start just downstream of the V promoter, proceed for about 2 kb, and then trail off (Lebecque and Gearhart, 1990). The pattern is the same regardless of which J gene segment is being used. For example, if a V gene rearranges to JH1, mutations cease 1550 bp before the intronic enhancer, iEμ, whereas if a V gene rearranges to JH4, mutations end just before iEμ. This indicates that the V gene promoter determines the start of mutation, and the intronic enhancer does not stop mutation. Rather, AID appears to be recruited to the promoter, proceeds for two kb and then may dissociate from the ongoing transcription complex. Indeed, one of the most perplexing questions is, why do mutations start and why do they stop? The frequency of mutations, about 10⁻² to 10⁻³ mutations per bp, is similar in both coding and flanking sequences around the V gene, and is higher in the complementarity-determining regions due to selection during AM for amino acids giving high affinity interactions with the cognate antigen.
Targeting to the intronic S region promotes a high level of deamination events in close proximity to allow for double strand break formation. Mutations start downstream of the intronic exon promoter, accumulate for about 4–7 kb depending on the length of the S region, and then fall off (Xue et al., 2006). The pattern, although longer, is thus similar to that in the V region, in that AID may assemble at the intronic promoter, proceed through the S region, and then dissociate. The S regions in mammals are unique in that they contain clusters of G nucleotides on the nontranscribed strand, and repetitive hotspot motifs for AID deamination, WGCW (W = A or T). The G clusters have been shown to form R-loops, or RNA-DNA hybrids, in vivo (Yu et al., 2003) and in vitro (Roy and Lieber, 2009). The corresponding C clusters on the transcribed strand can stably hybridize to G-rich RNA. One effect of R-loop structure is that the nontranscribed strand becomes single-stranded in regions larger than transcription bubbles to maximize deamination by AID. In contrast to mice, switching in frogs is not as efficient, since the Sμ regions do not have G clusters or R-loops (Zarrin et al., 2004).
A second effect of R-loop structure is to slow down RNA polymerase II molecules as they move through the S region. It has been shown in vitro that R-loops block transcription (Canugovi et al., 2009; Tornaletti et al., 2008), because the polymerases may have difficulty unwinding the stable RNA-DNA hybrid. A recent study (Rajagopal et al., 2009) measured the density of polymerases in vivo by nuclear run-on, and found a 5–10 fold increase in polymerases located in close proximity to the Sμ repetitive core. These polymerases appeared to be piling up because they encountered a road-block ahead caused by the RNA-DNA hybrids. Once the polymerases slowly make it through the repetitive core, they speed up again as the R-loop density is lower. In addition to Sμ, R-loops are present in other S regions (Huang et al., 2006; Yu et al., 2003), and RNA polymerase II has been shown to accumulate in Sγ3 (Wang et al., 2009b), which suggests a conserved mechanism for deamination in S regions. If AID is associated with transcription, this model of paused polymerases may allow AID more opportunity to deaminate DNA, producing more mutations and strand breaks.
Is AID differentially targeted to V or S regions during certain cellular responses? This may seem to occur during stimulation of B cells ex vivo with mitogens, where mutations are found only in S regions and not in V regions (Reina-San-Martin et al., 2003). Conversely, in B cell lines such as Ramos and DT40, mutations accumulate in V regions and apparently not in S regions, as the cells do not undergo switching. In addition, IgM memory cells from humans (Klein et al., 1998; Rosner et al., 2001) and mice (Dogan et al., 2009) have mutations in the V region, but have not switched isotypes. This suggests that specific co-factors may guide AID to the V or S regions; however, other interpretations are possible. Mutations may occur first in S regions in cells stimulated ex vivo because of the formation of R-loops and RNA polymerase II pausing, which magnifies AID activity. Likewise, B cell lines or memory IgM cells may have mutations in S regions but not switch because some proteins involved in NHEJ are not functioning. It would be interesting to compare the time course of mutations in V and S regions following immunization in vivo, to see if they occur simultaneously or differentially.
As mentioned above, SHM occurs at a greater frequency in a defined sequence motif, WRC (W = A/T, R = A/G) (Rogozin and Kolchanov, 1992). Prior to the discovery of AID, sequence analysis of V genes indicated that the complementarity-determining regions are heavily biased in using the serine codons AGC or AGT, while framework regions utilized the TCN serine codons (Wagner et al., 1995). Additionally, SHM occurred in AGY (Y = C/T) codons at a higher rate than in TCN, indicating a bias to focus mutation within the complementarity-determining regions for AM. Further analyses using in vivo mouse models deficient for different DNA repair enzymes, have defined WGCW in V and S regions as the mutational hotspot for SHM (Delbos et al., 2007; Ehrenstein and Neuberger, 1999; Martomo et al., 2004; Rada et al., 1998). Characterization of AID activity in vitro indicates that deamination events occur with high frequency within the WRC context (Bransteitter et al., 2004; Larijani et al., 2005; Yu et al., 2004). As with AM in V genes, CSR has evolved to utilize these hotspots to focus AID activity to the WGCW motif within the repetitive core repeat (Davis et al., 1980; Dunnick et al., 1980; Kataoka et al., 1981; Sakano et al., 1980). Additionally, the frequency and palindromic nature of the motif allows for deamination events on both strands to occur in close proximity to promote double strand break formation versus mutagenic repair.
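The hotspot motifs described above are short degenerate sequences, so their positions on either strand can be located with a simple pattern search. The following is a minimal sketch, not a tool used in any of the cited studies: the function names and the example sequence are hypothetical, and only the IUPAC definitions W = A/T and R = A/G are taken from the text.

```python
import re

# IUPAC codes used in the text: W = A/T, R = A/G.
IUPAC = {"W": "[AT]", "R": "[AG]", "A": "A", "C": "C", "G": "G", "T": "T"}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the sequence of the opposite strand, read 5' to 3'."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def find_motif(seq: str, motif: str) -> list[int]:
    """Return 0-based start positions of every (possibly overlapping) match."""
    pattern = "".join(IUPAC[base] for base in motif)
    # A zero-width lookahead lets finditer report overlapping matches.
    return [m.start() for m in re.finditer(f"(?={pattern})", seq.upper())]


def hotspots_both_strands(seq: str, motif: str) -> tuple[list[int], list[int]]:
    """Motif positions on the given strand and on its reverse complement."""
    return find_motif(seq, motif), find_motif(reverse_complement(seq), motif)


# Hypothetical example sequence, chosen only for illustration.
example = "GGAGCTGGGTGCATT"
top, bottom = hotspots_both_strands(example, "WGCW")
print("WGCW on top strand:   ", top)
print("WGCW on bottom strand:", bottom)
```

Because the reverse complement of any WGCW sequence is itself a WGCW sequence, every hit on one strand has a partner on the other strand at the same site. This illustrates how the palindromic motif can place deaminated cytosines on both strands in close proximity. The broader WRC motif is not palindromic, so the two strands can give different hit counts.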
In addition to the catalytic residues in the AID protein, a loop at residues 113–123 has recently been identified as the main determinant in directing activity to the WRC hotspot (Fig. 1). Altering this loop to resemble the homologous loop from APOBEC family members switches the hotspot motif to that of the APOBEC enzyme (Kohli et al., 2009; Wang et al., 2010). Taken together, this indicates a co-evolution of both the Ig loci sequence and the AID enzyme.
An additional phenomenon of AID activity within the Ig loci is the ability of AID to access and mutate both strands. Mutational analysis in Ung−/−Msh2−/− and Ung−/−Msh6−/− mice indicates that both the transcribed and non-transcribed strands are mutated at an equal frequency (Rada et al., 2004; Shen et al., 2006; Xue et al., 2006). Most models for AID deamination suggest that AID can access single strand DNA within transcription bubbles and/or R-loops. However, these models would only allow for deamination of the non-template strand, as the template strand would be associated with either the RNA polymerase or contained within an RNA-DNA complex. To achieve access to both strands, it has been proposed that antisense transcription occurs throughout the Ig loci. In support of this model, RT-PCR has been utilized to identify low levels of antisense transcripts in V and S regions (Chowdhury et al., 2008; Perlot et al., 2008; Ronai et al., 2007). Although these findings have been called into question (Zhao et al., 2009), the identification of ~11 nt single strand DNA patches on both strands of DNA in Ramos cells supports the presence of transient transcription bubbles moving in opposite directions (Ronai et al., 2007). However, the finding that cytosines are also mutated on both strands in the S regions (Xue et al., 2006) is particularly perplexing, since the transcribed strand contains an RNA-DNA hybrid in the R-loop structures. Two other theories have been proposed to make this strand available for attack by AID. (1) The DNA upstream of an elongating RNA polymerase II may be supercoiled and unwound, which allows AID access to both strands (Shen and Storb, 2004). (2) The DNA in R-loops may be collapsed by endogenous RNase H digestion, which would expose single strand regions on the transcribed strand (Huang et al., 2007).
Taken together, this data suggests that the models for AID activity within the Ig loci are still in a state of flux and require further experimentation to fully define AID targeting.
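To make the strand question concrete, the following is a minimal sketch of how WRC hotspot cytosines could be located on both strands of a sequence. It assumes the standard IUPAC reading of the motif (W = A or T, R = A or G), under which a WRC on the complementary strand appears as GYW (Y = C or T) on the strand being read. The function name and the example sequence are hypothetical and are not taken from any of the cited studies.

```python
import re

# IUPAC codes used by the hotspot motif: W = A or T, R = A or G, Y = C or T.
# The zero-width lookahead lets finditer report overlapping motifs.
TOP_WRC = re.compile(r"(?=[AT][AG]C)")     # WRC on the strand as written
BOTTOM_WRC = re.compile(r"(?=G[CT][AT])")  # GYW: a WRC on the complementary strand


def hotspot_cytosines(seq):
    """Return 0-based positions of hotspot cytosines, split by strand.

    All positions refer to the strand as written; a bottom-strand
    cytosine is reported at the position of the G it pairs with.
    """
    seq = seq.upper()
    top = [m.start() + 2 for m in TOP_WRC.finditer(seq)]
    bottom = [m.start() for m in BOTTOM_WRC.finditer(seq)]
    return {"top": top, "bottom": bottom}


# Hypothetical example: AGCT carries a hotspot on each strand.
print(hotspot_cytosines("AGCTAGCT"))
```

A scan of this kind only lists candidate target sites by sequence. It says nothing about whether a given strand is actually accessible to AID, which is the open question discussed above.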
In addition to the mechanisms discussed above, extensive work has been performed in an attempt to identify AID protein partners. It has been hypothesized that targeting will be tightly regulated by protein co-factors which coordinate the recruitment and activity of AID. With recent advances in AID protein biochemistry, the intricate network of AID interactions is just beginning to emerge.
As discussed in section 2.4, AID is phosphorylated by PKA and PKC (Basu et al., 2005; McBride et al., 2008; Pasqualucci et al., 2006). The phosphorylation by PKA is required for interaction with RPA, and disruption of this interaction inhibits AID activity (Basu et al., 2005; Cheng et al., 2009; McBride et al., 2006; Vuong et al., 2009). While these interactions are well characterized, the precise mechanism by which RPA assists AID is unclear. Recently, Chaudhuri and colleagues found that neither PKA nor RPA is required to physically recruit AID to DNA (Vuong et al., 2009). In vitro analysis suggests that RPA stabilizes the transcription bubble to allow AID activity on single strand DNA (Chaudhuri et al., 2004). Yet it remains unclear whether the interaction promotes a coordinated hand-off of the DNA between the two proteins, and whether the RPA-AID interaction exists during the deamination step.
While a genetic interaction between transcription and SHM has been well documented, very little is known about the role of the transcription complex in physically recruiting AID to the DNA. AID has been shown to physically interact with RNA polymerase II; however, no further analysis was done to examine which subunit is responsible for this interaction (Nambu et al., 2003). Recently, Neuberger and colleagues identified an interaction between AID and CTNNBL1. Deletion of CTNNBL1 or mutation of AID residues 39–42 abolished AID activity (Conticello et al., 2008). Interestingly, CTNNBL1 physically interacts with proteins associated with the RNA polymerase II spliceosome, suggesting that AID may travel with the transcription complex. This interaction also highlights the potential role for splice sites in initiating AID activity on DNA. In a critical experiment, Radbruch and colleagues (Hein et al., 1998) reported that switching was abrogated when the splice donor site for Iγ1 was deleted, even though Iγ1 transcripts were made. Another study showed that mice lacking the splice donor site for the Iμ exon had switching, but splicing still occurred in transcripts using pseudo-splice donor sites (Kuzin et al., 2000). Thus, AID could be brought to the Ig loci through interaction with cis-acting elements (potentially using E2A family members (Michael et al., 2003; Schoetz et al., 2006; Tanaka et al., 2010)), bind to the RNA polymerase II spliceosome complex through CTNNBL1, and load onto DNA at donor splice sites to interact with RPA and deaminate cytosine residues. This may explain why mutation is highest in V genes after donor splice sites in the leader and V exons, and in S regions after donor splice sites in intronic exons preceding Sμ, Sγ, and Sα.
Since the first identification of AID, extensive work has been performed in an attempt to elucidate the mechanism of how it promotes genomic mutation. Initial identification of the sequence similarity between AID and APOBEC-1 suggested that AID may function as an RNA deaminase (Muramatsu et al., 1999). Honjo and colleagues proposed an RNA editing model where AID binds to an unidentified mRNA partner in the cytoplasm and deaminates C to U. The edited mRNA would then produce a protein, perhaps an endonuclease, that cleaves DNA during the immune response (Honjo et al., 2005). Alternatively, identification of a mutational hotspot in SHM (WRC, discussed above) suggested a mechanism where alterations occur directly at dC:dG basepairs (Rada et al., 1998). Neuberger and colleagues proposed a DNA deamination model where AID deaminates dC bases to dU, which initiates error-prone processing by some proteins in the base excision repair (BER) and mismatch repair (MMR) pathways. Support for direct deamination of DNA came from the finding that uracil DNA glycosylase (UNG or UDG) is required for CSR and GC, and alters SHM frequency and mutational spectra (Di Noia and Neuberger, 2002; Di Noia and Neuberger, 2004; Petersen-Mahrt et al., 2002; Rada et al., 2002b; Saribasak et al., 2006). The importance of UNG to the mechanism of CSR is further confirmed by genetic mutations in the human UNG gene that block CSR and cause hyper-IgM syndrome (Imai et al., 2003; Kavli et al., 2005). During classical BER, UNG binds to dU:dG mispairs in DNA, and the uracil base is cleaved to form an abasic site. Abasic sites are then cleaved by apurinic/apyrimidinic endonuclease (APE1) to remove the abasic nucleotide, and DNA polymerase (Pol) β re-synthesizes the DNA strand. However, during SHM and CSR, the canonical mechanism of BER is impaired by altering the re-synthesis step with low-fidelity polymerases, which introduce mutations and single strand DNA breaks. Consistent with this model shown in Fig. 3, deletion or inhibition of UNG and APE1 results in decreased CSR (Guikema et al., 2007; Rada et al., 2002b; Schrader et al., 2005). Alternatively, deletion of Polβ supports increased CSR and double strand breaks by inhibiting the faithful re-synthesis step of canonical BER (Wu and Stavnezer, 2007).
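The simplest outcome of the DNA deamination model, in which a dU escapes repair and is copied as if it were thymidine, can be illustrated with a minimal sketch. It assumes a toy representation of a strand as a string and models only replication across an unrepaired dU. It does not model UNG, APE1, or the low-fidelity polymerases described above, and all function names are hypothetical.

```python
# Base pairing used during copying; U templates the insertion of A, like T.
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G", "U": "A"}


def deaminate(strand, position):
    """Convert the C at `position` to U, as AID is proposed to do on DNA."""
    if strand[position] != "C":
        raise ValueError("only cytosine can be deaminated")
    return strand[:position] + "U" + strand[position + 1:]


def replicate(template):
    """Copy a template strand base by base.

    Both strings are written in register (position i pairs with
    position i), so no reversal is applied in this toy model.
    """
    return "".join(COMPLEMENT[base] for base in template)


def fix_mutation(strand, position):
    """Follow an unrepaired dU through two rounds of replication."""
    damaged = deaminate(strand, position)
    daughter = replicate(damaged)  # carries A opposite the U
    return replicate(daughter)     # the original C position now reads T


# Hypothetical example: the C at index 2 becomes a T.
print(fix_mutation("AGCTA", 2))
```

In this sketch the only possible result is a C to T transition at the deaminated site. The wider range of mutations seen in SHM depends on the repair pathways discussed in the text, which are deliberately left out here.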
In opposition to the existence of uracil in DNA, several reports from Honjo and colleagues have suggested that UNG is important for CSR through an alternative mechanism not requiring DNA glycosylase activity (Begum et al., 2004). Mutants of active site residues in UNG had no detectable glycosylase activity in vitro, but they were proficient for CSR when complementing Ung−/− cells (Begum et al., 2004; Begum et al., 2009). Additionally, the identification of normal γH2AX foci formation and strand break junctions in the absence of UNG supports a model by which UNG is involved in resolving the double strand breaks, perhaps as a scaffold for other proteins, but not in the direct formation of the breaks (Begum et al., 2007). However, these results have been called into question due to a possible dissociation between in vitro UNG glycosylase activity and in vivo CSR (Di Noia et al., 2007; Kavli et al., 2005; Stivers, 2004). It has been shown that several UNG active site mutants with severely diminished in vitro activity may still retain enough glycosylase activity in vivo to promote CSR. Honjo and colleagues also report that deletion of either APE1 or APE2 had no effect on CSR (Sabouri et al., 2009), in contrast to an earlier finding by Stavnezer and colleagues (Guikema et al., 2007). Furthermore, the results looking at γH2AX and strand breaks did not take into account the interplay between BER and MMR in causing double strand breaks and CSR in the S region, since it is possible that breaks were still being formed due to cleavage during MMR (Di Noia et al., 2007; Rada et al., 2004; Shen et al., 2006).
Additional evidence for AID acting as a DNA deaminase comes from direct analysis of the purified protein in vitro. Using specific single strand DNA substrates, recombinant AID is able to convert a single dC residue to dU, creating a UNG/APE1 sensitive substrate (Bransteitter et al., 2003; Dickerson et al., 2003). Additionally, in whole cellular extracts from either splenic B cells or HEK293T cells expressing recombinant AID, single strand DNA oligomers containing multiple hotspot motifs become susceptible to treatment with UNG and APE1 (Chaudhuri et al., 2003). Furthermore, double strand DNA substrates were protected from AID deamination except in the presence of transcription, which explains the requirement for a single strand DNA substrate (Chaudhuri et al., 2003; Ramiro et al., 2003). However, in these experiments, AID was able to bind to RNA molecules, albeit with lower affinity than the single strand DNA template, allowing for the slight possibility that AID may also function on RNA (Dickerson et al., 2003). Additionally, the known RNA editing enzyme APOBEC1 has residual activity on single strand DNA in similar assays, suggesting these enzymes might act promiscuously (Harris et al., 2002). Taken together with the genetic data, most evidence suggests that AID functions as a DNA deaminase. However, the true test of AID activity would be to directly detect the accumulation of dU residues in genomic DNA during an immune response, which has yet to be demonstrated.
In addition to being processed by UNG, deoxyuracil can be recognized by some proteins in the MMR pathway (Fig. 3). The canonical role for MMR is to remove DNA mismatches and repair DNA in an accurate manner. The MSH2-MSH6 or MSH2-MSH3 heterodimer binds to a mismatch, and recruits MLH1-PMS2 to the site. This complex nicks the DNA downstream of the mismatch, and attracts exonuclease 1 (Exo1) to remove the strand containing the mismatch. The gap is then filled in by high fidelity Polδ to restore the original sequence. However, during the immune response, deficiency in some MMR proteins resulted in decreased, not increased, mutation frequencies, suggesting that these proteins actually generate mutations. Extensive analyses have examined the roles of the above proteins in processing mismatches in V and S regions (Bardwell et al., 2004; Ehrenstein and Neuberger, 1999; Ehrenstein et al., 2001; Frey et al., 1998; Jacobs et al., 1998; Kim et al., 1999; Kong and Maizels, 1999; Li et al., 2004b; Li et al., 2006; Martin et al., 2003; Martomo et al., 2004; Phung et al., 1999; Phung et al., 1998; Rada et al., 2004; Rada et al., 1998; Schrader et al., 2003; Shen et al., 2006; Wiesendanger et al., 2000; Winter et al., 1998). The prevailing model is that MSH2-MSH6 binds to a U:G mismatch and recruits DNA Polη (Wilson et al., 2005), a low-fidelity polymerase that preferentially synthesizes mispairs when copying T nucleotides (Matsuda et al., 2001). This explains why the frequency of mutations at A:T bp drops dramatically in the absence of MSH2, MSH6, and Exo1, which interact with Polη. In contrast, mice deficient for the other MMR proteins had no alteration in the SHM spectra.
Throughout evolution, specialized DNA polymerases have evolved to copy DNA with low fidelity to bypass DNA damaging lesions. To see if these polymerases are recruited to the Ig loci to increase sequence diversity, SHM was examined in mice deficient for 8 polymerases. Pols β, μ, λ, and ι are not involved in SHM (Bertocci et al., 2002; Esposito et al., 2000; Martomo et al., 2006; McDonald et al., 2003), while the role of Polθ is currently unclear due to conflicting results (Martomo et al., 2008; Masuda et al., 2007; Masuda et al., 2006; Masuda et al., 2005; Zan et al., 2005). However, there is well defined evidence that Polζ, Rev1, and Polη have distinct roles during SHM. Conditional inactivation of Polζ in mice resulted in a ~2–3 fold decrease in mutation frequency without altering mutation spectra, consistent with a role for Polζ in extending DNA mismatches (Diaz et al., 2001; Schenten et al., 2009; Zan et al., 2001). However, due to the embryonic lethality and genomic instability seen in Polζ deficient mice and B cells, the full extent of the role of Polζ has yet to be defined. Rev1 is a deoxycytidyl transferase which causes G:C to C:G transversions in SHM (Arakawa et al., 2006; Jansen et al., 2006; Masuda et al., 2009; Ross and Sale, 2006). More recently, a catalytically inactive Rev1 mutant has been examined, and the results suggest a minor role for mouse Rev1 in contributing to transition mutations as well (Masuda et al., 2009); however, this was not seen in DT40 cells (Ross and Sale, 2006).
As compared to the modest roles for Polζ and Rev1, Polη has been shown to contribute significantly to diversity during SHM. Genetic mutation of Polη in humans with xeroderma pigmentosum variant disease, or deletion of the gene in mice, resulted in a dramatic decrease in mutations at A:T residues with a modest decrease in overall mutation frequency (Delbos et al., 2005; Faili et al., 2004; Zeng et al., 2004; Zeng et al., 2001). Significantly, the decrease in A:T mutations is similar to the effects seen in mice with deficiencies in MSH2, MSH6, and Exo1, suggesting that they all act in the same pathway. Additionally, the MSH2-MSH6 heterodimer physically and functionally interacts with Polη, suggesting a shared role in producing A:T mutations during SHM (Wilson et al., 2005). However, close examination of the spectra from Msh6−/− mice or double deletion of both MSH2 and Polη indicates that the effects are not completely overlapping (Delbos et al., 2007; Martomo et al., 2005). Individually, Polh−/− (Polh encodes Polη) or Msh2−/− mice display ~15% residual A:T mutations, while the double knockout shows only ~1% A:T mutations. This suggests that while Polη and MSH2 function together, they are also able to contribute to SHM independently of each other. Interestingly, a recent paper by Reynaud and colleagues analyzed a Polk−/− Polh−/− mouse strain which shows ~7% mutations at A:T, suggesting that Polκ may function during SHM and contribute a modest amount of mutations to the overall spectra (Faili et al., 2009). The identification of Polκ dependent mutations is significant as previous reports failed to identify such a role (Schenten et al., 2002; Shimizu et al., 2005; Shimizu et al., 2003). However, it is difficult to distinguish whether Polκ function is obscured by the high mutation rate of Polη, or whether the normal presence of Polη blocks Polκ access in wildtype mice. The residual A:T mutations seen in the absence of both Polη and Polκ also suggest that a third polymerase can function in SHM. The near-total lack of A:T mutations in the Polh−/−Msh2−/− mice indicates that MSH2 is responsible for recruiting Polη, Polκ and perhaps other polymerases.
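Comparisons such as the A:T percentages above rest on tabulating a mutation spectrum from sequenced clones. As a minimal sketch, assuming each mutation is recorded as a (reference base, mutant base) pair, the fraction of mutations at A:T base pairs and the fraction of transitions could be computed as follows. The function name and the example data are hypothetical and do not reproduce any dataset from the cited studies.

```python
from collections import Counter

# The four transition substitutions; all other changes are transversions.
TRANSITIONS = {("A", "G"), ("G", "A"), ("C", "T"), ("T", "C")}
BASES = {"A", "C", "G", "T"}


def summarize_spectrum(mutations):
    """Summarize (reference_base, mutant_base) pairs from sequenced clones.

    Returns the fraction of mutations that fall at A:T base pairs and
    the fraction that are transitions.
    """
    if not mutations:
        raise ValueError("no mutations to summarize")
    for ref, alt in mutations:
        if ref not in BASES or alt not in BASES or ref == alt:
            raise ValueError(f"not a base substitution: {ref}>{alt}")
    counts = Counter(mutations)
    total = sum(counts.values())
    at = sum(n for (ref, _), n in counts.items() if ref in {"A", "T"})
    transitions = sum(n for pair, n in counts.items() if pair in TRANSITIONS)
    return {"at_fraction": at / total, "transition_fraction": transitions / total}


# Hypothetical data: four mutations, one of them at an A:T base pair.
example = [("C", "T"), ("G", "A"), ("A", "G"), ("G", "C")]
print(summarize_spectrum(example))
```

Published spectra are usually also corrected for the base composition of the sequenced region. That correction is omitted from this sketch.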
One key aspect which promotes error-prone replication is the role of proliferating cell nuclear antigen (PCNA) monoubiquitination. PCNA is a replication accessory factor which functions in recruiting, tethering, and switching DNA polymerases at the primer-template junction. To coordinate these events, PCNA is post-translationally modified at residue K164 to initiate either error-free or error-prone repair (Ulrich, 2009). Mutation of the K164 residue, or deletion of the PCNA ubiquitin ligase Rad18, resulted in a dramatic decrease in A:T mutations during SHM (Arakawa et al., 2006; Bachl et al., 2006; Langerak et al., 2007; Roa et al., 2008). This suggests that PCNA modification is regulated to cause DNA synthesis by Polη in activated B cells. Additionally, loss of PCNA ubiquitination in DT40 cell lines, but not mice, showed a decrease in overall mutation frequency, indicating an increased utilization of PCNA-Ub in DT40 (Arakawa et al., 2006; Bachl et al., 2006). Interestingly, in DT40 cells, the combination of a PCNA mutant and deletion of Rev1 showed almost complete loss of SHM, suggesting a potential inability of the canonical DNA replication machinery to bypass dU (Arakawa et al., 2006). Therefore it would be interesting to further understand the role of PCNA modification in SHM, as the K164 mutation also affects other modifications such as sumoylation and polyubiquitination.
Finally, do the UNG and MSH2-MSH6 pathways shown in Fig. 3 operate at the same time and compete for the same dU? Some models suggest they do (Rada et al., 2004; Schanz et al., 2009), whereas others propose the pathways are temporally separated during the G1 and S phases of the cell cycle (Krijger et al., 2009; Weill and Reynaud, 2008). A non-competitive model is appealing in that MSH2-MSH6 could recognize U:G in double strand DNA during G1, and UNG would be most active on single strand DNA in replication during S phase. This latter model is based on the recent finding that UNG is weakly expressed during G1 and is upregulated during early S phase (Hagen et al., 2008). However, Stavnezer and colleagues show that UNG is fully active during the G1 phase and that double strand DNA breaks are produced during G1 (Schrader et al., 2007). The formation of these breaks is dependent upon both UNG and MSH2-MSH6, suggesting that both pathways function in G1. Taken together, it is currently unclear when and how UNG and MSH2-MSH6 function in relation to each other when processing AID-generated uracils.
Even with the sheer volume of information on AID biology, it is still unclear how the cell fully coordinates deamination events with mutagenic repair. As mentioned above, the mechanisms of AM, GC, and CSR start with a single protein, yet require extensive cellular coordination to produce the initiating deamination. It has been established that AID is tightly regulated at the levels of transcription, translation, phosphorylation, ubiquitination, cellular localization, protein stability, and protein-protein interaction. While much has been discovered, many components are still unknown. It is clear that cis-regulatory elements and transcription are involved, yet no true recruiting factor has been identified for AID. AID specifically interacts with RPA, yet it is unclear how or if this interaction assists in AID localizing to single strand DNA. Does AID travel with the transcription machinery in association with CTNNBL1 alone, or is it more complicated? These questions and many more require further studies to understand how AID is targeted to the Ig loci to cause SHM.
In addition to regulation of AID deamination, the processing of dU also plays a significant role in achieving efficient antibody diversity. Of specific interest is how a B cell can manipulate DNA repair to function either less efficiently or less faithfully at the Ig loci, while still maintaining overall genomic integrity. Does the frequency and/or proximity of deamination events overwhelm the faithful capacities of BER and MMR? Is DNA repair specifically inhibited at the Ig loci during an immune response, or does repair become error-prone throughout the cell? It has been reported that DNA repair can be more or less efficient in different regions of the genome (Alrefai et al., 2007; Liu et al., 2008); however, it is not known what mechanisms coordinate this phenotype. Therefore, it remains to be fully understood what role DNA repair plays in transforming AID-dependent uracils into mutations.
This work was supported entirely by the Intramural Research Program of the NIH, National Institute on Aging. We gratefully thank Sebastian Fugmann and Huseyin Saribasak for insightful comments.