Activation induced deaminase (AID) is the single B cell specific factor mediating class switch recombination and somatic hypermutation. Numerous studies have shown that AID preferentially targets Ig substrates and also attacks non-Ig substrates to create DNA damage that contributes to lymphomagenesis. AID targeting to Ig loci is linked to transcription, but the mechanism governing this process has been obscure. Here we discuss research that illustrates the connection between AID targeting to DNA substrates and transcription processes to reveal rules governing the specificity of AID attack. These observations are woven together to provide an integrated view of AID function and a surprising linkage with global regulation of gene expression.
Humoral immunity is mediated in B cells by antigen receptors (BCR) that are composed of immunoglobulin (Ig) heavy (H) and light (L) chains. The antigen receptor component of Ig is assembled from multiple gene segments via V(D)J recombination during early B cell development in the bone marrow, a process mediated by the RAG1 and RAG2 recombinases. In the peripheral lymphoid organs, B cells become activated by antigen, undergo somatic hypermutation (SHM) and class switch recombination (CSR), and upon terminal differentiation secrete Ig as antibody [2,3]. Antibody repertoires are diversified at high frequency during SHM by the introduction of mutations into Igh and Igl V(D)J exons. SHM occurs in germinal centers (GC) of the peripheral lymphoid organs, where mutations are selected to produce BCR with increased affinity. Constant (CH) region genes encode the C-terminal domains of the Ig chains and determine IgH effector activities. IgH effector function is diversified through CSR, while the original antigen binding specificity arising during V(D)J recombination is retained. Activation induced deaminase (AID), a cytosine deaminase, is essential for initiating both SHM and CSR in mature B cells.
The mouse Igh locus includes eight CH genes, encoding μ, δ, γ3, γ1, γ2b, γ2a, ε, and α chains, and each is paired with repetitive switch (S) DNA (with the exception of Cδ). The CH region genomic area spans 220 kb and is flanked by the intronic Eμ and 3′Eα enhancers. CSR is focused on S regions and involves intra-chromosomal deletional rearrangements that replace the initial Cμ with a downstream CH region gene (fig. 1). Prior to and during CSR, an S-S synaptosome is formed to facilitate proximity between the donor Sμ and a downstream acceptor S region [6–9]. Formation of the S-S synaptosome is critically dependent on long range chromatin interactions that are tethered by key transcriptional elements. AID initiates CSR by deaminating cytosines in the donor Sμ and a downstream acceptor S region (Box 1). AID dependent DNA damage is focused to highly degenerate WRC hotspot motifs (W=A/T, R=A/G) that are found in V genes and at high density in S regions. Conversion of AID induced lesions to DNA double strand breaks by general DNA repair factors has been extensively reviewed [2,3,11–17]. Degeneracy of the AID hotspot motif raises questions regarding the mechanism by which AID is targeted to its Ig substrates.
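The degeneracy of the WRC motif can be made concrete with a short scan. The following is a minimal sketch, not a published tool: the function names `wrc_hotspots` and `hotspot_density` are hypothetical, and the only inputs taken from the text are the motif definition (W=A/T, R=A/G, target C). Hotspots on the opposite strand are found by searching for the reverse complement GYW (Y=C/T), which is an assumption of this sketch about how one might count both strands.

```python
import re

# Motif from the text: W = A/T, R = A/G, followed by the target C.
# Lookahead patterns are used so that overlapping motifs are all reported.
WRC = re.compile(r"(?=([AT][AG]C))")
# Reverse complement of WRC is GYW (Y = C/T); it marks a WRC on the opposite strand.
GYW = re.compile(r"(?=(G[CT][AT]))")


def wrc_hotspots(seq):
    """Return 0-based positions of hotspot cytosines on both strands.

    top:    index of the C in each WRC on the given strand
    bottom: index of the G that pairs with the target C of a WRC
            on the complementary strand
    """
    seq = seq.upper()
    top = [m.start() + 2 for m in WRC.finditer(seq)]
    bottom = [m.start() for m in GYW.finditer(seq)]
    return top, bottom


def hotspot_density(seq):
    """Hotspots per base pair, counting both strands (hypothetical metric)."""
    if not seq:
        return 0.0
    top, bottom = wrc_hotspots(seq)
    return (len(top) + len(bottom)) / len(seq)


print(wrc_hotspots("AGCT"))     # ([2], [1])
print(hotspot_density("AGCT"))  # 0.5
```

Because one in every sixteen random trinucleotides matches WRC, such a scan finds hotspots in essentially any gene, which illustrates why the motif alone cannot explain preferential targeting of Ig loci.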
AID introduces DNA damage by converting deoxycytidine (dC) to deoxyuridine (dU). The AID initiated DNA lesions are processed by the base excision repair (BER) and mismatch repair (MMR) pathways to create the mutations required for SHM and the DNA double strand break (DSB) intermediates that are consumed in CSR (reviewed in [2,3,11,14]). The observations that AID dependent dU residues are detected in the 5′Sμ region and that uracil DNA glycosylase (UNG) is required for formation of DSBs and mutations demonstrate that AID deamination initiates SHM and CSR [100,101]. Conversion of AID induced lesions to DSBs is mediated by general DNA repair factors.
Transcription is a hallmark of SHM and CSR, and transcriptional elements are candidates for providing specificity to AID targeting [5,18]. V region transcription initiates from a promoter 5′ proximal to the rearranged V(D)J exon and terminates 3′ of the CH region exons. CSR is focused to specific S regions by differential activation of germline transcription [10,19]. CH gene transcription units are composed of the noncoding intervening (I) exon, an S region and a CH coding region. Germline transcription initiates at a transcription start site (TSS) 5′ of each I exon, proceeds through the S region and terminates downstream of the corresponding CH gene (fig. 1A). The V(D)J mutation profile has a sharp 5′ boundary ~120 bp downstream of the TSS and a less defined 3′ border ~1 kb downstream of the promoter. Alteration of the V region promoter position displaces transcription initiation and perturbs the mutation distribution [20*,21], thereby linking induction of AID dependent mutations with transcription. Similarly, AID induced DNA lesions in S regions begin ~150 bp downstream of the I exon TSS. A recent study shows that sequence intrinsic features target AID dependent DNA lesions to Ig templates. Although AID targeting to Ig substrates requires transcription, the unique transcriptional features that determine preferential AID attack at Ig templates have been difficult to discern.
AID induced double strand breaks (DSBs) in normal B cells occur at hundreds of non-Ig sites, many of which are syntenic with sites of translocations, deletions, and amplifications found in human B cell lymphomas. Physiological levels of AID in GC B cells have been linked to deamination of a large cohort of non-Ig genes [25*–28], where the mutation rate is 20–100 fold lower than at Ig loci. The findings that SHM and CSR are tightly linked to transcription [20*,29**,30] and that AID interacts with RNA Pol II (RNAP II) [27*,31] have led to the notion that transcription and RNAP II-associated proteins might facilitate the binding of AID to target DNA sequences. Based on provocative new studies, we address three important interrelated questions regarding AID targeting. What is the mechanism by which AID preferentially targets Ig substrates? As a corollary, how is AID attack at non-Ig substrates directed? Finally, is there an adaptive advantage for AID attack at non-Ig templates? We focus primarily on CSR as this area has been the most intensively investigated. Recent work implies that AID attack on non-Ig substrates may mediate functional outcomes that contribute to a biological “jackpot”.
S regions are repetitive, nonidentical, 1–12 kb long and guanine rich on the nontemplate strand. Deletion, inversion or replacement of S regions reduces CSR frequency, indicating that S regions are specialized targets of CSR [33–36]. However, CSR S/S junctions are notable for lacking the consensus sequences or homology that would be expected from site specific or homologous recombination. The degeneracy of the S region repeats and the absence of discernable recombination signal motifs have led to models in which higher order structures provide recognition motifs for the CSR machinery. In vitro transcription studies indicated that the looped out ssDNA nontemplate strand can assume specialized structures including stem loops, four-stranded G quartets and R-loops (Box 2) [39,40]. In vivo studies confirm that transcription through S regions creates G quartets [41,42] and generates long stretches of R-loops that form on the nontemplate strand [33,43*,44] and that enhance CSR efficiency.
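The guanine richness of the nontemplate strand can be summarized with two simple statistics. This is an illustrative sketch only: the function names `g_fraction` and `g_runs` are hypothetical, and treating runs of three or more consecutive G residues as a crude indicator of G quartet potential is an assumption of this example, not a criterion stated in the studies discussed here.

```python
import re


def g_fraction(seq):
    """Fraction of bases that are G on the strand supplied (e.g. the nontemplate strand)."""
    seq = seq.upper()
    return seq.count("G") / len(seq) if seq else 0.0


def g_runs(seq, min_len=3):
    """List (start, length) for every run of at least min_len consecutive G residues.

    The default min_len=3 is an assumed threshold chosen for illustration.
    """
    pattern = re.compile(r"G{%d,}" % min_len)
    return [(m.start(), m.end() - m.start()) for m in pattern.finditer(seq.upper())]


print(g_fraction("GGGA"))      # 0.75
print(g_runs("AGGGTGGGGCGG"))  # [(1, 3), (5, 4)]
```

Applying both functions to each strand of a candidate region separately would show the strand asymmetry described above, since a G rich nontemplate strand implies a C rich template strand.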
R-loops are composed of RNA:DNA hybrids in which the nontemplate ssDNA strand is looped out while the template strand is stably annealed to nascent S region RNA transcripts [43,102]. The presence of R-loops within a transcription unit can cause RNAP II stalling. Frequent R-loop mediated RNAP II stalling may account, at least in part, for enrichment of initiating RNAP II occupancy in S regions that in turn mediates the introduction of activating histone modifications and increased chromatin accessibility [45,104]. Accordingly, targeted inversion of an S region leads to loss of R-loop formation and reduction of CSR frequency. Furthermore, S region deletion leads to reduced RNAP II occupancy at flanking sites [45,104] and loss of activating histone marks. The enrichment of initiating RNAP II at promoter distal sites provides a plausible explanation for increased levels of histone activating marks and chromatin accessibility throughout the S region. Biochemical studies indicated that transcription generated R-loops within dsDNA permit AID directed deamination on the ssDNA nontemplate strand while the template strand remained blocked by the nascent RNA transcript (reviewed in [10,11]).
Epigenetic studies indicate that the initiating form of RNAP II phosphorylated on serine 5 (p-ser5) and activating histone marks decorate the length of transcribed S regions [45*,46]. In contrast, genome wide studies show that transcriptionally active genes are associated with promoter proximal enrichment of RNAP II p-ser5 coupled with activating histone modifications [47–51]. R-loops within a transcription unit can act as a structural impediment for RNAP II elongation and cause stalling. S region inversion led to loss of R-loops and reduced CSR in vivo, thereby linking R-loops and CSR. However, the contribution of RNAP II stalling in S regions to the CSR reaction has remained unclear.
Several AID binding proteins that facilitate association of AID with transcribed S DNA and modulate CSR have been identified in genetic screens, including the adaptor 14-3-3 protein, the RNA processing and/or splicing factor polypyrimidine-tract binding protein (PTBP2), the RNAP II stalling cofactor SPT5, and the 11-subunit cellular noncoding RNA 3′–5′ exonucleolytic processing complex, the RNA exosome. SPT5 collaborates with the DSIF complex and negative elongation factor (NELF) to stall RNAP II p-ser5 at promoter proximal sites and links RNAP II to splicing factors, capping enzyme [57,58], and the RNA exosome. The RNA exosome contains 3′–5′ exoribonucleases that process structural RNA and degrade improperly processed pre-mRNAs and some long noncoding (lnc) RNAs, of which germline transcript (GLT) RNA hybridized to the S region in an R-loop is an example [52,59]. The RNA exosome interacts indirectly with RNAP II via SPT5 and SPT6. Deletion of PTBP2, SPT5 or RNA exosome components impairs CSR and links transcription processes with AID recruitment and function at S DNA. These observations demonstrate that the transcriptional machinery associated with RNAP II stalling, the presence of RNA:DNA hybrid structures as impediments to RNAP II elongation, and the release of those obstacles by means of the RNA exosome are integral to targeting AID to S region DNA.
A recent provocative study has identified another regulatory feature contributing to AID targeting specificity that is directly dependent on expression of the long noncoding GLT RNA from transcribed S regions. Evidence indicates that AID functions as an RNA binding protein with specificity for S region GLTs, which it recognizes through G-quadruplex structures, and that this function is required for CSR. Indeed, AID is one of 12 members of the APOBEC family of DNA/RNA cytidine deaminases and has recently been shown to mutate small RNA genes when expressed in yeast, suggesting a role for RNA in the recruitment of AID to S regions. Furthermore, short RNA segments have been shown to function as specific guides for nuclease modification of the genome and to regulate DNA rearrangements in ciliates. The requirement for G rich S region guide-RNAs for AID targeting implies that AID off target genes might also express non-coding RNA that functions in a similar capacity. Thus, AID targeting to Ig loci is dependent on components of the transcription machinery.
Mistargeting of AID to non-Ig substrates has been implicated in the pathogenesis of B cell lymphomas and chromosomal translocations [63,64]. In the Western world, the vast majority of lymphomas arise in B cells that actively engage in SHM and CSR, processes that in turn generate recurrent chromosomal translocations. Normal mature B cells are particularly prone to dynamic AID dependent chromosomal translocations that juxtapose Ig genes and proto-oncogenes. The high frequency of these events allows detection in the absence of selection. The preferential focus of AID to Ig loci and the nonrandom aspect of AID attack on non-Ig genes imply a determinate set of rules governing AID targeting. Recent GRO-seq studies indicate that most AID induced off-target translocations occur at defined regions of target genes in which sense and anti-sense transcription converge. Strikingly, convergent transcription is due to antisense transcription originating from super-enhancers within sense transcribed gene bodies. Super-enhancers are clusters of enhancers that have prominent roles in cell type specific processes [68,69] and express relatively high levels of enhancer RNAs (eRNA). Super-enhancers have the propensity to associate in three dimensional nuclear space via long range chromatin interactions that are mediated by lineage specific transcription factors (TFs). Hence, lineage specific TFs indirectly facilitate AID initiated chromosomal translocations.
The RNA exosome is an RNA surveillance complex that degrades a variety of non-coding RNAs. RNA species arising in response to exosome subunit deficiencies have revealed the identities of non-Ig genes and intergenic regions that are the focus of AID mediated translocations [52,72]. Several types of non-coding exosome substrate RNAs accumulate in the transcriptomes of exosome-deficient B cells, including transcription start site (TSS)-associated antisense transcripts (xTSS-RNAs). The xTSS-RNAs are divergently transcribed from cognate coding gene transcripts that accumulate R-loops, AID-mediated mutations, and/or are frequent translocation partners of Igh DNA DSBs in B cells. A subset of xTSS-RNAs originate as overlapping sense and antisense x-eRNAs at super-enhancers and are located at recurrent translocation hotspots. It is not clear whether the characterization of these transcripts as divergent represents a real or semantic difference with the “convergent” transcripts defined by Meng et al. described above. Collectively, these findings begin to define a set of predictive transcriptional features that characterize Ig and non-Ig substrates of AID.
The prevalence of AID off-target sites implies that this phenomenon may have adaptive advantages. Low levels of AID have been detected in a number of tissues and cell types, including early stage B cells [73,74] and pluripotent cells such as embryonic stem cells and spermatocytes, where it may have an unanticipated function in cellular reprogramming [77–80]. AID expression can be upregulated in some instances of infection and under conditions of chronic inflammation that promote tumorigenesis [81–85]. Intriguingly, inflammatory stimuli (LPS) robustly induce AID in primary pro-B cells and Abelson transformed B cell lines, leading to CSR prior to V(D)J joining and to mutations associated with acute lymphoblastic leukemia [87,88]. Strikingly, AID expression in transitional B cells appears to mediate tolerance [89,90] by a B cell intrinsic mechanism. It is intriguing to speculate that, by analogy to new gene expression linked to RAG recombinase induced DNA DSBs, AID induced DSBs direct new gene expression leading to functional outcomes including immune tolerance.
One provocative explanation for AID expression during development and in early B cells is related to DNA demethylation. DNA methylation is an epigenetic modification that is considered central to the establishment and maintenance of stable cellular identities. The processes by which methylation is removed from cytosine were unclear until recent studies indicated active modes of DNA demethylation that involve modification of the 5-methylcytosine (5meC) base coupled to DNA repair. One pathway proceeds through oxidation catalyzed by the TET (ten eleven translocation) enzymes [93,94]. A second pathway uses AID, which promotes DNA demethylation through direct deamination of 5meC to thymidine and subsequent repair of the resultant T:G mismatch by classical repair pathways [77,79,80,95]. Evidence suggests that AID’s demethylation activity is required for reprogramming in zebrafish embryos and in mice [79,96]. AID interacts with and demethylates the promoters of the OCT4 and NANOG genes during reprogramming of human fibroblasts fused to mouse ES cells. In mice, DNA demethylation is mediated by base excision repair (BER) through AID/Apobec deamination of 5meC to thymidine followed by G:T mismatch repair by TDG [75*,79,95] (Box 1). Notably, cytosines that are differentially methylated between naïve and GC B cells are enriched in non-Ig genes that are targeted by AID for SHM, and these genes form networks required for B cell development and proliferation. Thus, emerging evidence implies that genome wide DNA demethylation by AID may be an important mechanism for global regulation of gene expression programs.
Recent discoveries indicate that AID targeting to Ig and non-Ig loci is highly integrated with the transcription machinery and with non-coding RNA biogenesis and degradation, implying that off-target events are regulated and might be adaptive. Parallel studies link AID to active genome wide DNA demethylation. The final link in this chain is a new study showing that gene loci that are hotspots for AID dependent mutagenesis are also DNA demethylated and that their expression is networked. This raises the intriguing possibility that what we had previously considered AID off-targeting is actually integral to functional gene regulation. More work is necessary to define the molecular mechanism mediating apparently non-canonical AID effects at non-Ig loci.
This work was supported in part by the National Institutes of Health (R01AI121286, R21AI117687, R21AI117687) to A.L.K.