While it is accepted that AID mediates SHM and CSR by deaminating ssDNA within Ig genes, the nature of this ssDNA and how it is generated have remained elusive. In this report, we reveal that Ig genes and a non-Ig transgene that is mutated by AID are enriched for short (~7-nucleotide) ssDNA patches. These ssDNA patches are reactive to sodium bisulfite only in intact nuclei
[18]. Indeed, the ssDNA patches described in this report are consistent with the parameters necessary for an
in vivo AID substrate, namely that: (1) these patches are found on both DNA strands thereby providing an explanation for the unbiased strand activity of AID during both SHM and CSR
[10], [11] (Figure S2D), (2) the size of these patches corresponds to the preferred
in vitro substrate size for AID
[49], and (3) patch formation is dependent on transcription elongation, consistent with the requirement of transcription for SHM
[5],
[6]. Furthermore, we found that the frequency of these ssDNA patches strongly correlates with SHM rates within the Ig V-region and within a non-Ig transgene (GFP) that is mutated by AID. Together, these data not only provide better insight into the role of DNA accessibility during SHM and CSR but also provide a model and mechanism of formation for one of the possible
in vivo substrates of AID deamination.
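The patch analysis described above can be illustrated computationally. The following is a minimal Python sketch, not the pipeline used in this report. It assumes that each bisulfite-sequenced clone is already aligned base-for-base to the unconverted reference, and it assumes a working definition of a patch as a run of at least two consecutive converted cytosines (C read as T for the top strand, G read as A for the bottom strand). The function names, the `min_converted` threshold, and the toy sequences are hypothetical.

```python
from typing import List, Tuple

# Reference base and its bisulfite-converted read-out, per strand.
# Bottom-strand C-to-U conversions appear as G-to-A on the top-strand read.
CONVERSION = {"top": ("C", "T"), "bottom": ("G", "A")}


def find_patches(reference: str, read: str, strand: str = "top",
                 min_converted: int = 2) -> List[Tuple[int, int]]:
    """Return 0-based inclusive (start, end) spans of candidate ssDNA patches.

    Assumed definition (not taken from the report): a patch is a run of at
    least `min_converted` consecutive cytosine positions on the chosen strand
    that all read as converted; any other base at such a position ends the run.
    """
    if len(reference) != len(read):
        raise ValueError("read must be aligned base-for-base to the reference")
    target, converted = CONVERSION[strand]
    patches: List[Tuple[int, int]] = []
    run: List[int] = []  # positions of consecutive converted cytosines
    for pos, (ref_base, read_base) in enumerate(zip(reference.upper(), read.upper())):
        if ref_base != target:
            continue  # only cytosines on the chosen strand are informative
        if read_base == converted:
            run.append(pos)
            continue
        if len(run) >= min_converted:
            patches.append((run[0], run[-1]))
        run = []
    if len(run) >= min_converted:
        patches.append((run[0], run[-1]))
    return patches


def patches_per_kb(reference: str, reads: List[str], strand: str = "top",
                   min_converted: int = 2) -> float:
    """Patch count normalised to 1,000 sequenced reference bases."""
    total = sum(len(find_patches(reference, r, strand, min_converted)) for r in reads)
    sequenced = len(reference) * len(reads)
    return 1000.0 * total / sequenced if sequenced else 0.0


# Hypothetical toy data: one clone with a converted run, one unconverted clone.
reference = "ACGCCTACGTCA"
reads = ["ATGTTTACGTCA", "ACGCCTACGTCA"]
for read in reads:
    print(find_patches(reference, read, strand="top"))
print(patches_per_kb(reference, reads, strand="top"))
```

Running the same functions with `strand="bottom"` scores G-to-A changes instead, so that patch frequencies on the two strands can be tallied separately and compared with mutation frequencies across the same region.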
The Ig switch regions have been shown to adopt a structure whereby the nascent sense transcript from the RNA polymerase II complex forms an RNA:DNA hybrid (i.e., an R-loop) with the bottom DNA strand. The locations of these R-loops correlate with CSR
[24],
[50] and a large deletion of the switch region reduces both R-loop formation and CSR
[30]. In addition, inverting the γ1 switch region reduces CSR to γ1 by ~3-fold, and this inversion is thought to reduce R-loop formation
[51]. Thus, R-loops play an important role in producing ssDNA during CSR. However, while the R-loop model explains how the top strand becomes accessible to AID, on its own it fails to explain how the bottom strand would be targeted for deamination, a prerequisite for CSR. Other processes must therefore be taking place to allow AID to access the bottom strand. It was recently shown that sense transcription through the switch region is sufficient for CSR, whereas antisense transcripts are dispensable for this process
[52]. This suggests that secondary structures produced during sense transcription elongation provide AID with access to both strands. In addition, Exo1 excision tracts can potentially expose ssDNA on the bottom strand during CSR
[25]. Previous findings showed that the bottom strand in the Ig switch region does not harbour ssDNA
[24],
[53]. This result is most likely due to the method of bisulfite deamination, which was carried out on purified nucleic acid
[24],
[53]. By carrying out the sodium bisulfite assay on intact nuclei, we show that the region immediately upstream of the μ switch region harbours short ssDNA patches on both DNA strands that increase in frequency when primary B cells or CH12F3-2 cells are stimulated to undergo CSR. This assay also identifies long stretches of sodium-bisulfite conversion within switch regions in murine B cells
[18] or CH12F3-2 cells, which are likely caused by R-loop formation. Although the ssDNA patches within the switch region are shorter (~7 nucleotides) than the ssDNA observed within R-loops, which can be kilobases in length
[24], the frequency of these patches is significantly higher (~1 patch per kilobase within the 5′μ switch region) than the frequency of R-loops in primary murine B cells (4% of switch regions contain R-loops)
[30]. Furthermore, since R-loops only form in G-rich sequences
[24], rare A:T rich switch sequences that are enriched for AID hotspot motifs, such as in
Xenopus laevis, are unlikely to adopt an R-loop configuration, but nonetheless support CSR
[54], indicating that they are accessed by AID through an unknown mechanism. In scenarios where R-loops are either decreased or absent, we suggest that the ssDNA patches found on both DNA strands are sufficient to produce the AID-initiated staggered dsDNA breaks associated with CSR events. It is also possible that an AID-induced mutation leads to Exo1-mediated excision of the top strand, exposing the bottom strand to AID attack
[25]. In the context of normal R-loop formation, we suggest that all of these processes cooperate to allow AID access to the non-template (top) and template (bottom) DNA strands.
It has been appreciated for some time that transcription of the Ig gene is required for both SHM and CSR processes
[5],
[6]. It was assumed that transcription produces the ssDNA necessary for AID reactivity
[7],
[15],
[16],
[32]. Our findings that transcription initiation
[18] and elongation inhibitors ablate ssDNA patches in the V-region in Ramos cells and the 5′μ switch region in CH12F3-2 cells, and that nontranscribed genes contain no ssDNA patches (CD4 gene), provide additional independent evidence that the short ssDNA patches observed in Ig sequences are transcription-dependent. However, it is unlikely that the ssDNA patches are produced by bisulfite conversion of ssDNA tracts within the RNA polymerase II transcription bubble itself. First, the ssDNA patches observed in this report are shorter than the predicted transcription bubble size of ~11 nucleotides
[19]. Second, a recent report shows that the RNA polymerase II complex transcribes the Ramos V-region only in the sense direction
[23], which would lead to ssDNA formation on the top strand only, since the nascent transcript is expected to protect the bottom strand from bisulfite conversion. However, ssDNA patches are observed on both strands at approximately equal frequency in the Ramos V-region
[11], arguing against the transcription bubble as the source of ssDNA. Moreover, it is unlikely that AID can gain access to dC within the transcription bubble, since it is largely occupied by the RNA polymerase II complex
[19],
[20]. In the context of the AID targeting factor Spt5, crystallographic studies of Spt5 complexed to RNA polymerase II show that ssDNA is buried within the active centre cleft of the transcription machinery
[21],
[22].
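The statement that patches occur on both strands at approximately equal frequency can be framed as a test of strand bias. As a minimal sketch (an illustration, not an analysis performed in this report), the following Python code computes an exact two-sided binomial p-value for the null hypothesis that a patch is equally likely to fall on either strand. The function name and the counts are hypothetical placeholders, not data from this study.

```python
from math import comb


def binomial_two_sided_p(k: int, n: int, p: float = 0.5) -> float:
    """Exact two-sided binomial p-value.

    Sums the probabilities of all outcomes that are no more likely than the
    observed count `k` out of `n` trials under success probability `p`.
    """
    if not 0 <= k <= n:
        raise ValueError("k must lie between 0 and n")
    pmf = [comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(n + 1)]
    # Small relative tolerance so that ties are not lost to rounding error.
    threshold = pmf[k] * (1 + 1e-9)
    return min(1.0, sum(q for q in pmf if q <= threshold))


# Hypothetical counts of patches scored on each strand (not data from this report).
top_strand_patches = 48
bottom_strand_patches = 52
p_value = binomial_two_sided_p(top_strand_patches,
                               top_strand_patches + bottom_strand_patches)
print(f"two-sided p-value for strand bias: {p_value:.3f}")
```

A large p-value would indicate no detectable strand bias at the given sample size; it would not by itself demonstrate that the two frequencies are equal.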
DNA supercoiling caused by transcription is a potential explanation for the ssDNA patches that we observe in Ig genes. Indeed, we observe an alteration in ssDNA frequencies in mammalian cells treated with the topoisomerase I inhibitor camptothecin, which can alter local DNA superhelicity. Furthermore, TopA-deficiency in
E. coli results in hyper-negative supercoiling of transcribed plasmid DNA, an increased frequency of ssDNA patches, and increased AID mutagenic activity. It is known that TopA-deficiency can lead to R-loop formation upon induction of transcription. However, R-loop formation does not occur during transcription-induced negative supercoiling in a plasmid system when the nascent mRNA is translated
[55]. In our system, AID is actively transcribed and translated upon IPTG induction. In addition, TopA-deficiency did not lead to increased ssDNA patch lengths, which would be expected if these were R-loops. Furthermore, we did not observe an increase in ssDNA patches on the top strand in the TopA-deficient clones (Figure S4B), which would be expected for R-loops since the sense strand would be displaced by the transcript. These data are consistent with the notion that negative supercoiling is the likely source of the ssDNA patches.
Further support for negative supercoiling as the source of the ssDNA patches that we observed comes from the finding that AID and sodium bisulfite can deaminate supercoiled plasmid DNA, but not relaxed linearized DNA, in vitro
[26]. During transcription, negative supercoiling develops upstream of the transcription complex
[40],
[41] and has been associated with melted DNA that can in turn lead to the formation of secondary DNA structures, such as stem loops and cruciforms
[42],
[43]. In contrast, positive supercoiling occurs downstream of the transcription complex
[40]. The dual effects of positive and negative supercoiling may work in concert to increase targets for AID. That is, as the transcription complex progresses through the gene, leaving underwound and melted DNA in its wake, positive supercoiling downstream of the transcription complex may act to slow down or pause RNA polymerase II. Indeed, Canugovi
et al. observed that inducing pausing/stalling of T7 RNA polymerase resulted in the accumulation of multiple clustered AID-induced mutations
in vitro
[56], and AID was recently found to interact with Spt5, a factor associated with stalled RNA polymerase II
[3]. Rajagopal
et al. recently showed that RNA polymerase II complexes pause and accumulate upstream of the μ switch region
[57], which might serve to provide DNA structures that act as targets for AID during CSR. Indeed, the slight increase in ssDNA patch frequency that we observed in sequences that contained R-loops in the CH12F3-2 cells is consistent with the findings of Rajagopal
et al.
[57], who showed an accumulation of RNA pol II just 5′ of the Sμ region, possibly due to R-loop formation. Furthermore, Wang
et al. have shown histone marks at the switch regions indicative of open and accessible chromatin and this finding was also associated with RNA polymerase II presence and stalling at switch regions
[58]. Stalling of RNA polymerase II may not only result in the production and maintenance of R-loops, but may also act to sustain secondary structures in the DNA produced by negative supercoiling. These findings suggest that the activity of AID on short ssDNA patches would largely be limited by the activity of topoisomerases, which remove transcription-induced supercoiling and thus eliminate the ssDNA patches for AID to act on. Indeed, Kobayashi
et al. showed that topoisomerase 1 mRNA and protein levels were reduced upon AID expression and this reduction was associated with altered DNA structure at the μ switch region, increased switch region cleavage, and increased CSR
[37]. The reduction in topoisomerase 1 may therefore lead to transcriptional pausing, prolonging the persistence of negatively supercoiled DNA that can be mutated by AID. On the other hand, complete inhibition of topoisomerase 1 by camptothecin might indirectly lead to the cessation of all RNA polymerase II transcription, as the RNA polymerase may not be able to bypass the complex or lesion produced by camptothecin, thereby resulting in a reduction in ssDNA patches and AID-induced DNA breaks
[37]. Another potential source of these ssDNA patches is the RNA exosome, which was recently reported to associate with AID and stimulate AID activity on both DNA strands in a manner that is independent of replication protein A (RPA) or the phosphorylation status of AID
[4]. Our current and previous findings
[18] that the immunoglobulin genes are enriched for ssDNA patches on both strands are consistent with the activity proposed for the RNA exosome. Nevertheless, our current findings support the role of negative supercoiling in the generation of these patches, but do not preclude the involvement of the RNA exosome or the Spt5 factor. Future work will reveal whether these factors are in part or in whole responsible for the generation of the ssDNA patches observed in this report. Furthermore, it is important to note that while we suggest that the ssDNA patches observed in this report are produced by transcription-induced negative supercoiling, ssDNA could be produced by other mechanisms, such as transcription-induced G4 DNA formation
[59], melting of DNA during replication, interaction of transcription factors with DNA, and DNA repair intermediates
[25].
While the evidence supports the notion that the bisulfite-accessible ssDNA patches observed in Ig genes are substrates for AID, their frequency is likely not the sole determinant of mutability. First, while the 5′ end of the V-region is enriched in ssDNA patches, it does not harbour many mutations (Figure S2D). Thus, near the V-region promoter, there is poor correlation between ssDNA patches and mutation frequency. In fact, previous studies have shown that the region near the promoter is spared from mutation (e.g.
[60]) for reasons that are not known. Our data clearly show that this is not due to an absence of ssDNA patches there, and hence there must be another reason for this result. Perhaps the explanation is that AID associates with the elongating RNA polymerase II complex, or with stalled RNA polymerases, neither of which occurs near the promoter region. Second, non-mutating genes also harbour ssDNA patches. Rather, our work suggests that ssDNA patches, which occur in transcribed genes and are produced by negative supercoiling, render DNA accessible to AID; however, some other molecular feature is required to target AID to Ig genes to mediate SHM and CSR. Thus, B cells have likely evolved several mechanisms to ensure enhanced targeting of AID to Ig genes. It is likely that multiple conditions must be met in a gene in order to produce the potential for high mutagenic activity by AID; these include a high frequency of ssDNA, the presence of specific cis-acting sequences
[61],
[62], a high degree of transcriptional pausing
[3],
[56], association with the RNA exosome
[4], and the association of AID with trans-acting factors that link AID to each of the above-mentioned conditions (e.g., transcriptional pausing and Spt5). Integration of these distinct targeting mechanisms would ensure that the Ig locus is preferentially mutated by AID over other genomic regions, whereas a gene that meets only one of these conditions could be subject to low levels of AID activity
[38],
[63],
[64].