By means of deep sequencing, we have defined the global occupancy of the B cell genome by AID and its cofactor RPA. To better characterize the data, we further annotated the B cell genome by comprehensively mapping 36 epigenetic marks, the mRNA transcriptome, and the binding of PolII, p300 and CTCF. We found that the association of AID with genes across the genome was widespread and correlated with activating chromatin marks. These included acetylation of H3, H4 and H2B, as well as all three methylated forms of H3K4. By extension, genes with an inhibitory chromatin configuration (such as H3K27me3 and H3K20me3) failed to recruit AID or hypermutate. One example of this was Mycn, which is epigenetically and transcriptionally silent, does not bind AID and is rarely involved in chromosomal translocations in the mature B cell compartment46,47.
Our data have established a tight correlation between AID and PolII, as predicted by the coimmunoprecipitation of AID and PolII from ex vivo–activated B cells29. The pausing factor Spt5 is required for the linkage of AID to the transcription apparatus30. In agreement with those data, we have now shown that AID associated mainly with paused polymerases at promoter-proximal sequences across the genome. Published observations have suggested a link between AID activity and pausing of gene transcription. For example, stalling of PolII at tandem repeats of the immunoglobulin S domain has been associated with AID activity during CSR41,42. One interpretation of those results was that pausing of the transcription machinery might promote DNA deamination by facilitating the interaction of AID with ssDNA substrates41,42. Consistent with that hypothesis, we found that AID occupancy and hypermutation coincided with stalling of divergent polymerases upstream of TSSs, a feature not previously appreciated. In addition, deep-sequencing studies have shown a tight correlation between genome-wide recruitment of AID and sites of ssDNA (A.Y. and R.C., unpublished data). On the basis of these findings, we postulate that Spt5-stalled polymerases recruit AID across the genome, thus explaining the degree of AID’s promiscuity in B cells.
The large number of AID targets explains the broad genomic instability observed in primary and premalignant cells after sustained or aberrant AID expression14. The data also underscore the decisive role of base-excision repair and mismatch repair in safeguarding the genome from promiscuous SHM. A pertinent example is the Myc proto-oncogene, which accumulates substantial hypermutation in Ung−/− cells but is fully protected in wild-type germinal center or activated B cells9,12, as shown here. Despite efficient repair, however, the Myc locus often participates in large-scale chromosomal alterations and translocations that are dependent on AID14,46,48,49. Thus, high-fidelity repair at off-target sites is not sufficient to prevent (and perhaps even promotes) DNA breaks, chromosomal translocations and B cell malignancy. In this context, we have shown that AID occupancy at Myc coincided precisely with mapped sites for canonical translocation breakpoints. We anticipate that the AID ChIP-seq data will help identify new tumor-inducing targets of AID.
The broad recruitment of AID in the B cell genome raises the question of whether AID has additional functions beyond diversification of immunoglobulin genes. Studies have linked cytidine deamination and AID to the elusive mechanism of DNA demethylation. In zebrafish, AID and Apobec2 seem to be required for the demethylation of exogenous DNA21, and AID deficiency is reported to result in genome-wide hypermethylation of mouse primordial germ cells20. The reprogramming of mouse-human heterokaryons, which requires the demethylation of promoters of genes encoding key transcription factors, is also facilitated by AID19. On the basis of those observations, it has been proposed that high AID expression in germinal center B cells might also engage in active demethylation of the B cell genome19,20. Our observation that AID deaminated basal promoters would be consistent with that hypothesis.
A prominent feature of our results is that whereas AID was highly promiscuous, its cofactor RPA seemed to be specific for immunoglobulin genes under the conditions tested. However, our data do not exclude the possibility that RPA is recruited to any given off-target site in only a fraction of the cells being assayed. A signal present in only some cells and in different genomic locations in subpopulations of cells would not be detected above background. Heterogeneity among dividing cells is probably the reason we did not detect interaction of RPA with the DNA-replication machinery in our nonsynchronized cultures. As expected from published work24, the localization of RPA to Igh requires phosphorylation of AID. The disparity between the genome-wide recruitment of AID and RPA is also consistent with the idea that RPA may function as an amplifier of AID activity on the immunoglobulin locus22,23. In biochemical assays, RPA seems to stabilize ssDNA displaced by the transcribing holoenzyme22,23,50, thus providing AID with a window of opportunity to initiate cytidine deamination. Given the established role of RPA in DNA repair27, an additional, not mutually exclusive possibility is that RPA functions downstream of the initial DNA damage, for example, by stabilizing ssDNA exposed during the repair phase of an AID lesion. In conclusion, we have identified here the broad range of genes targeted by AID, and the epigenetic and PolII stalling signature associated with targeting. We have demonstrated that AID recruitment alone is insufficient to explain the difference in mutator activity at immunoglobulin and off-target genes.