We have assembled, annotated, and analyzed a database of over 1700 breakpoints from the most common chromosomal rearrangements in human leukemias and lymphomas. Using this database, we show that although the CpG dinucleotide constitutes only 1% of the human genome, it accounts for 40–70% of breakpoints at pro-B/pre-B stage translocation regions – specifically, those near the bcl-2, bcl-1, and E2A genes. We do not observe CpG hotspots in rearrangements involving lymphoid-myeloid progenitors, mature B cells, or T cells. The stage-specificity, lineage-specificity, CpG targeting, and unique breakpoint distributions at these cluster regions may be explained by a lesion-specific double-strand breakage mechanism involving the RAG complex acting at AID-deaminated methyl-CpGs.
Chromosomal rearrangements arise when DNA double-strand breaks (DSB) occur at two separate regions of the genome and the resultant DNA ends are aberrantly joined into a new configuration. Pathologic rearrangements are especially frequent in human lymphomas and leukemias, and the most common are the bcl-2 and bcl-1 translocations. However, the molecular mechanisms underlying the DSBs at the bcl-2 and bcl-1 regions are unknown.
Three main types of DSB mechanisms have been identified in human lymphoid malignancies: (1) V(D)J-type, in which the RAG endonuclease creates a site-specific DSB 5′ of a recombination signal sequence (RSS) with a consensus of CACAGTG (12 or 23 spacer) ACAAAAACC in early B or T cells (Bassing et al., 2002; Dudley et al., 2005; Schatz, 2004; Schlissel et al., 2006; Tonegawa, 1983); (2) class-switch recombination (CSR) type, in which activation-induced cytidine deaminase (AICDA or AID) deaminates cytidines on the single-stranded portions of kilobases-long R-loops at immunoglobulin (Ig) switch regions, leading to DSBs in activated B cells (DiNoia and Neuberger, 2007; Ramiro et al., 2004; Unniraman and Schatz, 2006); and (3) random-type breakage, presumed to be due to sequence-independent mechanisms such as reactive oxygen species (ROS) or ionizing radiation which generate DSBs randomly throughout the genome (Greaves and Wiemels, 2003; Lieber et al., 2006).
As we and others have shown, DSBs at the bcl-2 and bcl-1 regions do not fit into any of these three mechanisms (Jaeger et al., 2000; Marculescu et al., 2002; Raghavan et al., 2004a; Welzel et al., 2001). Rather, breakpoint sequence analysis reveals that DSBs at the bcl-2 and bcl-1 regions, and additionally a third region, E2A, fall into a fourth type of DSB mechanism, CpG-type, marked by a high degree of focusing to the dinucleotide sequence CpG and by occurrence only in the pro-B/pre-B stage. We propose a novel mechanism of translocations at methyl-CpG sequences in early B cells due to the sequential action of AID and the RAG complex.
Human lymphoid malignancies are predominantly B cell in origin, and many involve chromosomal translocations between one site and the Ig loci, or between two non-Ig loci (Fig. 1). Though the DSBs at the Ig loci are well-characterized, the cause of the DSBs at non-Ig loci, such as bcl-2, bcl-1, and E2A, is poorly understood.
The first clue for determining how a DSB occurred is its precise location. Free DNA ends are often degraded prior to joining, thus the position of the initial DSB, before degradation, can be difficult to determine. The closest approximation is the breakpoint, or exact position where the rearranged allele ceases to match the normal alleles. Analysis of 695 human translocation breakpoints on bcl-2, bcl-1, and E2A revealed a distinctive pattern of clustering around CpGs (Table 1, Fig. 2, Suppl. Table 1, Suppl. Figs. 1–3). Breakpoints at CpGs account for 43% of events at bcl-2, 35% of events at bcl-1, and 53% of events at E2A (Fig. 2, Suppl. Fig. 3). These frequencies are dramatically higher than the 6% frequency one would expect if breakpoints occurred randomly throughout the breakpoint region (Suppl. Tab. 1, Suppl. Fig. 3) and may be underestimates of the true frequency occurring at CpGs. Due to DNA end degradation prior to joining, breakpoints close to, but not directly at, CpGs may have originated from CpG positions (Suppl. Fig. 3).
Within the bcl-2 region, breakpoints are highly concentrated at three small clusters. Fifty percent occur in the major breakpoint region (MBR); 13% in the intermediate cluster region (icr) about 18 kb centromeric to the MBR; and 5% in the minor cluster region (mcr) about 11 kb centromeric to the icr (Suppl. Fig. 1A) (Weinberg et al., 2007).
Of the bcl-2 translocations sequenced, 88% occur in the 175 bp MBR, which itself contains three breakpoint peaks (Fig. 2A). Strikingly, CpGs are found at the highest spikes of all three peaks. Of 487 breakpoints in the MBR, 208 (43%) occur precisely at one of the 5 CpG sites in the region (p<10^-95). The average MBR breakpoint is 4.4 bp from the nearest CpG, compared to 11.2 bp if the breakpoints were randomly distributed (p<10^-41) (Suppl. Table 1).
At the 105 bp icr, eight of the 11 sequenced breakpoints occur at one of the two CpGs (p<10^-7), and the remainder are less than 6 bp away from a CpG (Fig. 2B). In contrast, none are observed in the 75 bp region between the two CpGs. Overall, the average distance of these 11 breakpoints from a CpG site is 0.8 bp, versus 22 bp if they were randomly distributed (p<10^-6) (Suppl. Table 1).
The 561 bp mcr contains six CpGs, spaced widely apart (Fig. 2C). Again, breakpoints are located very close to CpGs, while intervening regions are devoid of breakpoints (Fig. 2C). Of the 19 sequenced breakpoints, 14 are located directly at CpGs (p<10^-17). The average distance between a breakpoint and a CpG site is 0.6 bp, in contrast to 40 bp if these breakpoints were randomly distributed (p<10^-12) (Suppl. Table 1).
In addition to the breakpoints located in the three cluster regions described above, 27 sequenced bcl-2 breakpoints are located between these three cluster regions. Of these, 13 (48%) are located less than 10 bp away from a CpG, including 5 (19%) directly at CpGs (p=0.002) (Suppl. Table 1).
These findings strongly suggest that CpG is a DSB hotspot in the MBR, icr, and mcr clusters of bcl-2. Statistical analysis using the binomial distribution, Student’s t-test, and Mann-Whitney U-test indicates that the proximity of these breakpoints to CpGs is highly significant (Suppl. Table 1).
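To make the binomial reasoning concrete, the following is a minimal Python sketch of an upper-tail binomial probability applied to the MBR counts quoted above (208 of 487 breakpoints at CpGs). The function name `log10_binomial_tail` is hypothetical, and the null probability of 0.06 is an assumption taken from the roughly 6% random expectation mentioned earlier in the text; the exact null probabilities used for each region are those in Suppl. Table 1, so the value printed here is illustrative rather than a reproduction of the reported p-value.

```python
import math

def log10_binomial_tail(k, n, p):
    """Return log10 of P(X >= k) for X ~ Binomial(n, p), with 0 < p < 1.

    Terms are summed in log space so that very small tails do not underflow.
    """
    log_terms = [
        math.lgamma(n + 1) - math.lgamma(i + 1) - math.lgamma(n - i + 1)
        + i * math.log(p) + (n - i) * math.log1p(-p)
        for i in range(k, n + 1)
    ]
    peak = max(log_terms)
    total = peak + math.log(sum(math.exp(t - peak) for t in log_terms))
    return total / math.log(10)

# Counts quoted in the text for the bcl-2 MBR.
n_breakpoints = 487
n_at_cpg = 208

# ASSUMPTION: per-breakpoint probability of landing at a CpG by chance,
# taken from the ~6% random expectation quoted in the text.
p_null = 0.06

log10_p = log10_binomial_tail(n_at_cpg, n_breakpoints, p_null)
print(f"log10 P(X >= {n_at_cpg} of {n_breakpoints}) = {log10_p:.1f}")
```

Working in log space is a design choice: the individual terms are far below the smallest representable double-precision value for counts of this size, so a direct product of probabilities would round to zero.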
In mantle cell lymphomas, 30% of translocations occur within the 150 bp major translocation cluster (MTC) (Bertoni et al., 2004) which contains 7 CpGs (Suppl. Figs. 1B and 2). The MTC contains 104 of the 114 bcl-1 translocation breakpoints sequenced, 38 (37%) of which occur at CpGs (p<10^-8). MTC breakpoints average 2.5 bp from CpGs, compared to 7.8 bp if breakpoints were distributed randomly over the 150 bp region (p<10^-11). Notably, the ten breakpoints outside the MTC average 8.8 bp from CpGs, in contrast to 45.3 bp if randomly distributed (p=0.0009) (Suppl. Table 1). Hence, CpGs appear to be DSB hotspots in the MTC as well.
Most of the breakpoints at E2A occur within a 23 bp zone containing two CpGs (Wiemels et al., 2002) (Suppl. Fig. 1). Fifteen of the 24 sequenced breakpoints occur at CpGs (p=0.0001), averaging 1.1 bp to the closest CpG, versus 3.3 bp if they were randomly distributed in the 23 bp region (p=0.0005). Interestingly, the corresponding breakpoints on PBX1 do not have a tendency to occur at or close to CpGs (p>0.9) and have only a slight tendency towards CAC (p=0.05) (the minimal V(D)J recombination motif), though not directly at CAC (p>0.1) (Suppl. Table 1). Therefore, CpGs appear to be DSB hotspots in the E2A cluster but not in PBX1.
To confirm our observation, we examined breakpoint proximity to all the other dinucleotide motifs, to A, C, G, T, W, and S, and to CAC/GTG (Suppl. Table 1 and Supplementary text). While certain motifs have lower average distances than CpG (e.g., the mononucleotide motif S (C or G, of which CpG is a subset)), they are not statistically significant. Certain motifs score significantly because they happen to occur next to CpGs but not nearly to the same level as CpG, and these never extend to other translocations or clusters. Among the bcl-2 MBR, bcl-2 icr, bcl-2 mcr, bcl-1 MTC, and E2A breakpoint clusters, and to a lesser degree the unclustered breakpoints (the ones at the bcl-2, bcl-1 or E2A gene loci, but not within the preferred breakpoint zones), CpG is consistently the most significant motif in three different statistical tests.
Examination of both derivatives of a reciprocal translocation helps localize, more precisely, the position of the originating DSB (Suppl. Fig. 4A-B). This is because the two breakpoints define a window within which initial breakage likely occurred. This information is only available for a subset of the translocations analyzed above: 46 cases from bcl-2, 23 from bcl-1, and 6 from E2A (Suppl. Fig. 2). Of these, 32 (70%) from bcl-2, 15 (65%) from bcl-1, and 4 (67%) from E2A have one or both breakpoints at CpGs. Significantly, 1 case from bcl-1 and 9 cases from bcl-2, including one case from the bcl-2 mcr, exhibit a breakage window of zero. This strongly suggests that the precise position of the initial DSB was at CpG in these latter 10 cases.
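The idea of a breakage window can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the procedure used for the analysis: breakpoints are represented as junction indices (junction j lies between bases j-1 and j of a reference sequence), the function names `cpg_junctions` and `classify_reciprocal_pair` are hypothetical, and the toy sequence is invented for the example.

```python
def cpg_junctions(seq):
    """Return the set of junction indices that count as 'at' a CpG.

    A CG starting at index i contributes junctions i (5' of the C),
    i + 1 (between C and G), and i + 2 (3' of the G).
    """
    seq = seq.upper()
    sites = set()
    start = seq.find("CG")
    while start != -1:
        sites.update((start, start + 1, start + 2))
        start = seq.find("CG", start + 1)
    return sites

def classify_reciprocal_pair(seq, der_a_break, der_b_break):
    """Summarize the window bounded by the breakpoints of the two derivatives."""
    sites = cpg_junctions(seq)
    start, end = sorted((der_a_break, der_b_break))
    return {
        "window": (start, end),
        "width": end - start,  # zero means both derivatives break at the same junction
        "breakpoints_at_cpg": sum(b in sites for b in (der_a_break, der_b_break)),
    }

# Toy example (invented sequence): both derivatives break between the C and the G.
toy_seq = "AATTCGAATT"
zero_window = classify_reciprocal_pair(toy_seq, 5, 5)
wide_window = classify_reciprocal_pair(toy_seq, 8, 2)
```

A pair with `width` equal to zero and both breakpoints at a CpG corresponds to the cases described above in which the initial break can be placed at the CpG itself.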
In order to elucidate the mechanism behind CpG-type translocations, we sought to determine the developmental stage(s) in which they occur. An overview of the major recurrent rearrangements in myeloid and lymphoid tumors is provided in the supplemental text, and the results of our and others’ analyses are shown in Fig. 4.
Staging translocations is not obvious because tumors can be at one stage of development but contain translocations from prior stages (Fig. 4B–C). For instance, bcl-2, bcl-1, c-myc, and bcl-6 translocations are all found in mature B-cell lymphomas. However, we know that bcl-2 and bcl-1 translocations occur at the preceding pro-B stage because they join to IgH RSSs, characteristic of V(D)J recombination which occurs during the pro-B stage. On the other hand, c-myc and bcl-6 translocations occur at the mature/activated B-cell stage, as they involve the IgH switch regions, characteristic of CSR, which occurs during the activated B-cell stage. Similarly, LMO2, HOX11, TTG1, and SCL translocate with TCR RSSs during the pro-T or pre-T stage. But for translocations where neither of the DSBs can be assigned to a stage of development, additional information is required.
Junctional additions at translocation junctions provide such a clue. The presence of apparently random nucleotides between the two breakpoints at a translocation junction is characteristic for the activity of terminal deoxynucleotidyl transferase (TdT), which is normally expressed only in pro-B/pre-B or pro-T/pre-T cells, and aberrantly in some AMLs. As expected, bcl-2, bcl-1, LMO2, HOX11, TTG1, and SCL translocations contain high proportions of junctional additions, while bcl-6 and c-myc translocations contain relatively low proportions (Fig. 4A).
The high proportion of junctional additions in E2A-PBX1 translocations indicates that they occur at the pro-B/pre-B stage, as has been described previously (Wiemels et al., 2002). Additionally, TEL-AML1 and MLL translocations likely occur before the pro-B stage, prior to TdT and RAG expression, as previously inferred biologically (Castor et al., 2005; Hong et al., 2008; Hotfilder et al., 2005; Jansen et al., 2007; Pine et al., 2003) (Fig. 4). BCR-ABL translocations in primary ALLs (not ALLs resulting from CML blast crisis) have been proposed to originate in B-cell progenitors, as opposed to the lymphoid-myeloid hematopoietic stem cell (LM-HSC) origin of BCR-ABL translocations of CMLs (Castor et al., 2005). Nevertheless, BCR-ABL translocations of primary ALLs appear to occur prior to expression of TdT and RAGs, as none of the nine junctions obtained contained nucleotide additions. Based on these analyses, we stage the lymphoid and myeloid translocations as in Fig. 4C.
While CpG-type DSBs are prominent in the pro-B/pre-B stage translocations (i.e. bcl-2, bcl-1, and E2A), no significant CpG proximity is found in rearrangements at the LM-HSC stage. In TEL-AML1 translocations, there is no significant CpG proximity among 53 breakpoints spread across 13 kb of TEL or their partners scattered over 165 kb of AML1. The same is true for breakpoints involving the 8.6 kb MLL region in 291 cases of primary ALL, 24 cases of primary AML, and 35 leukemias secondary to topoisomerase II-inhibitor treatment. AML1-ETO translocations show no significant CpG proximity among 67 breakpoints strewn over 24 kb of AML1 or 66 breakpoints dispersed across 57 kb of ETO. Philadelphia chromosome t(9;22)(q34;q11) translocations from CMLs are no different, at either 35 breakpoints found in the 2.8 kb major breakpoint cluster region (M-BCR), or the 25 locatable breakpoints stretched over 125 kb of ABL.
No significant CpG proximity is found in mature/activated B-cell translocations. Neither 125 breakpoints on c-myc, nor 37 breakpoints on bcl-6 spanning 2 kb are significantly close to CpG motifs. Interestingly, c-myc and bcl-6 have higher CpG densities than the bcl-2 MBR and bcl-1 MTC, yet only 17% of breakpoints occur at CpGs in c-myc and only 19% in bcl-6, compared to 43% for the MBR and 37% for the MTC.
T-cell and some pre-B cell rearrangements tend to use a V(D)J-type rather than a CpG-type mechanism, occurring very nonrandomly next to CACs (the minimal heptamer sequence of the RSS) rather than CpGs (Suppl. Table 1, Fig. 3, Suppl. Fig. 3). 208 of 209 SCL-SIL breakpoints spanning 89 kb, 35 of 41 lymphoid Δp16 breakpoints spanning 231 kb, all 13 HOX11 breakpoints, and all 6 TTG-1 breakpoints are compatible with a typical V(D)J-type breakage mechanism. Δp16s from B-ALLs behave similarly (Kitagawa et al., 2002). CpGs are often found in the vicinity of the applicable CACs, but are not used as the site of the breakage.
The 19 SCL breakpoints from SCL-TCR translocations cannot be confidently assigned to a single breakage type, and could be a mixture which includes random-type breakage. Statistical analysis is ambiguous, being significant for both CpG and CAC. CAC is more dominant in 31 LMO2 breakpoints, of which 21 are compatible with V(D)J recombination; however, the remainder could be other types (CpG plus random breaks).
CpG-type breakpoint clusters differ in width and shape from V(D)J-type, CSR-type, and random-type DSBs. In rearrangements such as BCR-ABL and TEL-AML1, breakpoints extend over tens or hundreds of kilobases and different leukemias rarely share the same breakpoint. Presumably, the breakage mechanism is sequence-independent and, thus, any clustering of breakpoints is due to random chance, selective growth advantage, and/or chromatin accessibility (Greaves and Wiemels, 2003). Such broad clustering effects, however, do not explain how V(D)J-type and CpG-type rearrangements can occur over equally large or larger regions but strongly prefer small windows and even specific nucleotide positions.
The distributions of V(D)J-type and CpG-type breakpoint clusters are also very different from one another, suggesting that they arise by very different mechanisms. V(D)J-type clusters have a “spike-and-single-sided tail” distribution, because a site-specific break first occurs 5′ to the CAC of the RSS, followed by limited recessing only into the pseudo-coding end side of the break (Fig. 2D and E, Suppl. Fig. 5). However, CpG-type breakpoint hotspots are more bell-shaped, with breakpoints on either side of the highest CpG spike; therefore, initial breakage likely occurs around CpG sites with less precision than the V(D)J-type (Fig. 2A, Suppl. Fig. 5). Heterogeneity in nick sites is not typical of sequence-specific processes such as V(D)J recombination, or sequence-specific nucleases such as restriction enzymes; it is, however, consistent with structure-specific nucleases.
In addition to cutting sequence-specifically at an RSS, the RAG complex is also a structure-specific endonuclease (Raghavan et al., 2007; Raghavan et al., 2005a; Santagata et al., 1999). Moreover, RAGs are expressed during the pro-B/pre-B stage when CpG-type DSBs occur, and not expressed during the LM-HSC and mature/activated B-cell stages when CpG-type DSBs do not occur. Thus, the RAG complex is a natural candidate for involvement in CpG-type DSBs. We sought to determine the flexibility of the structure-specific nuclease activity of the RAG complex as it could relate to CpGs.
In accord with its structure-specific endonuclease activity, we have previously shown that the RAG complex can nick large structural deviations such as three to ten-base bubbles and heteroduplexes, but not one-base bubbles (Raghavan et al., 2007; Raghavan et al., 2005a). However, after lengthening the oligonucleotide substrates, we find nicking can occur at one-base bubbles (mismatches) and nicking efficiency appears to correlate with structural instability (Fig. 5A, Suppl. Fig. 6) (Peyret et al., 1999). The most unstable mismatches, C:T and C:A, are nicked at about 6% the efficiency of a very strong RSS. The more stable mismatches T:T and T:G are nicked about six-fold less efficiently, while control C:G and T:A duplexes were not detectably nicked. Although nicking of the T:G mismatch is less efficient than that at C:T and C:A, it is still quite substantial and greater than at certain known sites of V(D)J recombination such as SCL and SIL (Zhang and Swanson, 2008).
The RAG complex also cuts on the anti-parallel strand across from nicks, gaps, and flaps at a rate similar to or higher than that for unstable mismatches in vitro (Fig. 5B). Such activity results in DSBs. Importantly, the nick, gap, and flap substrates display a heterogeneous nicking pattern around the lesion, similar to the pattern of breakpoints seen around CpG at the bcl-2 MBR.
Mechanistically, the RAG complex is capable of generating DSBs in a manner consistent with CpG hotspots.
In vertebrates, CG is a very special dinucleotide due to methylation of the C5 position of cytosines in the CG context. Spontaneous deamination of 5-methylcytosine results in thymine, and thus a T:G mismatch. Unmethylated cytosines also deaminate, but result in U:G mismatches (Suppl. Fig. 7). Normally, both types of lesions are corrected by base-excision repair (BER). A glycosylase first removes the damaged base, creating an abasic site. APE1 or spontaneous β-elimination then creates a single-strand gap, which is filled-in by polymerase β, and the resulting nick is sealed by ligase III:XRCC1 (Friedberg et al., 2006; Lindahl et al., 1997; Nash et al., 1997). But when DNA replication occurs before BER can be completed, a polymerase places adenine across from the deaminated base, resulting in a stable point mutation.
T:G mismatches, however, are thought to be more persistent than U:G mismatches. T:G mismatch processing glycosylases, thymine DNA glycosylase (TDG) and methyl-CpG binding domain protein 4 (MBD4), are >2500-fold less efficient in vitro than uracil DNA glycosylase (UDG), which processes U:G (Schmutte et al., 1995; Walsh and Xu, 2006). As a result, CpG to TpG transitions account for a disproportionate number of mutations in p53, BRCA1, BRCA2, p16, and Rb across a wide range of neoplasms (Pfeifer, 2006). Moreover, over evolutionary time, CG has become statistically underrepresented in vertebrate genomes, but is preserved in small and functionally important regions around promoters termed CpG islands (Antequera et al., 1990; Bestor, 2003; Bird, 1992; Takai and Jones, 2003; Yang et al., 1996).
Persistent T:G mismatches, per se, do not lead to DSBs. However, as demonstrated above, the RAG endonuclease could act at the T:G mismatch and/or the vulnerable nick and gap intermediates during the BER process. Direct nicking of the T:G mismatch would result in flap-mismatch lesions, which unlike normal V(D)J-type nicks at CACs, would not be religated readily by ligase I due to the mismatch at the point of ligation (Tomkinson et al., 2006). Alternatively, a one-base gap could first be generated from normal processing by TDG or MBD4 and APE. A subsequent RAG nick across from such a nick or gap would then result in a DSB. We observe that these nick sites are heterogeneous rather than site-specific in vitro, consistent with the heterogeneity in patient breakpoints at CpG hotspots. Involvement of the RAG complex would therefore explain two key aspects of CpG hotspots: (1) why CpG-type rearrangements only occur at the pro-B/pre-B stage during which the RAG complex is expressed, and (2) why the distribution of CpG-type hotspots is heterogeneous around CpGs (Figs. 3, 6 & Suppl. Fig. 5).
However, T cells also express RAGs but do not appear to have CpG-type rearrangements. One way to account for this may be the recent finding of activation-induced deaminase (AID) expression in early B cells (Crouch et al., 2007; Feldhahn et al., 2007; Mao et al., 2004), in addition to its established pattern of expression in mature activated B cells (and its known absence from T cells). AID in early B cells is consistent with previous work showing that pre-B cell lines can undergo class switch recombination (Alt et al., 1982; Burrows et al., 1983). AID deaminates cytosines in single-stranded DNA in mature B cells during class switch recombination (at R-loops) and somatic hypermutation (DiNoia and Neuberger, 2007). Lower but still significant rates of deamination exist for genomic regions which are outside of the Ig loci, such as c-myc and bcl-6 (Liu et al., 2008). Importantly, AID can act on methylated cytosines (Bransteitter et al., 2003; Morgan et al., 2004). Therefore, involvement of AID would explain why CpG-type rearrangements only occur in the B-cell lineage, and its cytosine deamination activity is exactly the same as the basis for enrichment of point mutations at CpGs in cancers and CpG suppression.
Based on this evidence, we propose the lesion-specific DSB mechanism diagrammed in Fig. 6. A low level of AID expression in early B cells leads to an increased rate of cytosine deaminations, which are adequately repaired at unmethylated cytosines but persist at methylated cytosines as T:G mismatches. These sites are then converted to DSBs by RAGs, before or during attempted repair by T:G mismatch repair enzymes. The resulting DNA ends join to ends from a V(D)J recombination or other double-strand break event and lead to the chromosomal translocations observed so frequently in human B-cell lymphomas.
For the most part, mice do not develop lymphomas and leukemias in the same way as humans. In the United States, human lymphomas are overwhelmingly of B-cell origin, tend to occur in older individuals, and most contain clonal translocations (Jaffe, 2001; Wada et al., 1993). By contrast, lymphomas in mice, including p53−/− and p53+/− mice, are mostly of T-cell origin and lack clonal translocations (Bassing et al., 2008; Liao et al., 1998). The one exception is mouse plasmacytomas with t(12;15) c-myc-IgH translocations induced by pristane and IL-6 (DiNoia and Neuberger, 2007; Muller et al., 1995). The equivalent in humans is the t(8;14) described previously, which occurs at the mature/activated B-cell stage and not the pro-B/pre-B stage of interest.
The observed incongruency between mouse and human lymphomagenesis may be explained in several ways. First, point mutations of critical genes are sufficient to drive T-cell lymphomagenesis at early ages in mice, masking slower-developing tumors. Experimental demonstration for this comes from a study where mice constitutively expressing c-myc developed only T-cell lymphomas, while RAG1-knockouts constitutively expressing c-myc developed a range of hematological malignancies, including B-LBLs, pre-T cell lymphomas, myeloid progenitor tumors, and cutaneous macrophage tumors (Smith et al., 2005).
Second, translocations and even oncogene overexpression often have no effect on mouse lymphomagenesis. Mice with genome-wide instability, such as H2AX−/−, develop many different translocations, but do not develop tumors (Bassing et al., 2008). Overexpression of the bcl-2 or bcl-1 oncogenes in mice can cause hyperplasia but does not lead to lymphoma (Lovec et al., 1994; McDonnell et al., 1989). Even asymptomatic humans can harbor B cells with t(14;18) translocations (Liu et al., 1994), suggesting additional mutations may be required to drive these translocation-bearing B cells to malignancy.
Third, few mouse models are checked past one year. It may be of some value to follow mice into older ages in an attempt to detect slower-developing tumors. However, even calorie-restricted mice do not live past three years, which may still be simply too short in comparison to human lymphomas which develop in the 5th or 6th decades of life.
According to the proposed mechanism, AID-overexpressing mice should have increased rates of translocation in both the B and T lineages. Similarly, RAG-overexpressing, MBD4-knockout, UNG-knockout, and MSH2-knockout mice should develop more translocations in germinal center B-cells.
AID-overexpressors developed only T-cell lymphomas with no clonal translocations (Okazaki et al., 2003). As described above, point mutations may be driving lymphomas in these mice and masking latent tumors.
Mouse B-cell follicular lymphomas arose in UNG−/− mice after 12 months, but these were not characterized for translocations (Nilsen et al., 2003). MSH2−/− mice developed B and T lymphoblastic lymphomas, and these were not examined for translocations either (Reitmair et al., 1995). It may be valuable to assess them for translocations, though it would not be surprising if AID-induced p53 mutations or c-myc-IgH translocation events were driving these lymphomas. MBD4 knockout mice are not especially susceptible to tumors (Wong et al., 2002). RAG1/2-overexpressing mice are small and die early (Barreto et al., 2001).
Thus CpG-type translocations, as they occur in humans, are not observed in existing mouse models. Part of the problem may be that most studies focus on lymphomagenesis and, with the exception of c-myc, translocations in mice typically do not cause lymphomas. More sensitive assays, such as high-throughput methods, may be required to detect any increases in low-frequency translocation events.
If the theory is correct, it has some intriguing implications. First, individuals whose pro-B/pre-B cells have higher AID activity would be more susceptible to these rearrangements. Second, individuals with SNPs at CpGs within the MBR and MTC, as well as individuals whose pro-B/pre-B cells are unmethylated at those CpGs, would be less susceptible to these rearrangements than individuals in which those CpGs are methylated. However, it is important to note that removal of any single CpG would result in only a small drop in DNA breakage in the MBR and MTC. The model predicts that individuals with SNPs at specific CpGs would not have chromosomal translocations at those SNP sites. The emerging affordability of high-throughput sequencing may allow one to investigate whether the normal pro-B/pre-B cells in patients with such lymphomas have higher levels of AID activity and/or DNA methylation at the breakpoint cluster regions.
There are a multitude of approaches one may take to test the theory by conventional experimental means. The main problem seems to be the exceedingly low frequency of deamination events and rearrangement events. Without accounting for repair, the frequency of spontaneous hydrolytic deamination of cytidines in duplex DNA is on the order of 10^-6 per cytosine over 14 days (Frederico et al., 1990). In mice, the highest reported rate of AID-catalyzed mutation at non-V(D)J loci was about 0.1% per base in mature/activated B-cells (Liu et al., 2008; Mao et al., 2004). However, with high-throughput sequencing, AID-induced increases in C to T mutations at CpG sites may be detectable.
As mentioned above, B cell lymphomas arising in UNG−/− or MSH2−/− mice may be worth analyzing for chromosomal translocations (Nilsen et al., 2003; Reitmair et al., 1995). Using high-throughput sequencing, mutated CpG sites could be analyzed for evidence of double-strand breaks in the form of small deletions and additions, indicative of DSBs that were repaired by nonhomologous DNA end joining.
The proposed mechanism also has implications for a long-standing mystery: why the translocations cluster in the first place; that is, why they occur so frequently at CpGs within the MBR and MTC, as opposed to distributing across all the other CpGs in the surrounding tens or hundreds of kilobases. The very same problem is seen for CACs in V(D)J-type rearrangement clusters, where breakpoints occur at only a fraction of the available CACs, and not the ones predicted by experimental systems (Marculescu et al., 2002; Raghavan et al., 2001; Zhang and Swanson, 2008). As for clustering at CpG-type hotspots, possible explanations can be grouped into those which promote breakage and recombination (such as increased methylation, selective targeting by AID or the RAG complex, slow repair rate, chromosomal positioning, and failure to sequester free ends at certain loci) and those which prevent recombination events from appearing in cancers (such as lack of a growth advantage). As the other possibilities are discussed elsewhere, only the first two will be discussed below.
Though unlikely, the most obvious explanation would be selective hypermethylation within the MBR and MTC. Methylation analysis of the MBR and MTC showed considerable variation within and between FACS-sorted pre-B cells from five healthy individuals of similar age. CpGs outside the clusters were methylated at the same or higher rate (Suppl. Fig. 8). Hence, while there are plenty of methylated CpG targets for deamination-based translocations, selective hypermethylation does not appear to explain targeting to the MBR or MTC – at least to the extent that our samples are similar to those which develop into lymphoma (Fig. 4A).
Hydrolytic deamination of methylcytosine is extremely slow in duplex DNA, but is accelerated 500 to 1000-fold when the base is unpaired or single-stranded (Frederico et al., 1990). Moreover, AID is much more active on unpaired cytosines than on base-paired cytosines (Bransteitter et al., 2003; DiNoia and Neuberger, 2007; Pham et al., 2003; Yu et al., 2004). Therefore, one would expect CpGs in single-stranded regions to be more susceptible to deamination. We have previously shown that the bcl-2 MBR reacts with the single-strand-specific chemical probe bisulfite, suggesting some degree of single-stranded character inherent to the DNA in this region (Raghavan et al., 2004a). The centromeric end of the 150 bp bcl-1 MTC also shows a high degree of bisulfite reactivity (AT & MRL, unpublished). This naturally leads to the hypothesis that AID may target cytosines within these regions due to their increased propensity for single-strandedness.
Single-stranded character may also be caused by slippage at repeat sequences, induced by a passing RNA or DNA polymerase. Such a mechanism has been suggested based on sequence data from a yeast LYS2 reporter system (Kim et al., 2007). AID is more active on small bubble structures (such as 3 bp to 7 bp) than on larger bubble structures, and more active on supercoiled DNA than nicked circular DNA (Larijani and Martin, 2007; Shen and Storb, 2004). Therefore, even slight perturbations in the DNA can significantly enhance AID action. At the bcl-2 mcr, there is an eight base pair direct repeat on each side of one CpG hotspot. A slippage event between the top and bottom strand would place the CpG within a loop, thereby making it vulnerable to hydrolytic deamination or to AID.
In summary, we have analyzed over 1700 human chromosomal rearrangement breakpoints grouped according to tissue type, lineage, and developmental stage. Based on this, we have identified a new type of double-strand breakage motif, which appears to shape the breakpoint distributions within the bcl-2 MBR, bcl-2 icr, bcl-2 mcr, bcl-1 MTC, and E2A cluster regions. This CpG-type breakage is prominent in pro-B/pre-B cells and apparently absent or heavily diminished in other cell types. We propose that deaminated methyl-CpGs within cluster regions are intercepted by RAGs before repair can be completed, generating the requisite double-strand breaks in several of the most common translocations in human lymphoma.
Junctions and breakpoints were obtained from literature and GenBank searches and aligned with hg18 from the UCSC genome browser (Kent et al., 2002) using BLAT (Kent, 2002) and verified by eye. A breakpoint “at” a CpG is defined as being either 5′ to the C, between the C and G, or 3′ to the G, i.e. |CG, C|G, or CG|, where | marks the breakpoint position. For binomial probabilities, the random probability p is calculated as the number of sites at motifs divided by the total number of sites from the most 5′ to the most 3′ breakpoint. “Distance” to a motif is calculated as the number of nucleotides between the breakpoint and the closest motif site. Distance distributions for uniformly or “randomly” distributed breakpoints were generated by traversing the region from the most 5′ to the most 3′ breakpoint, treating each strand independently, and calculating the distance to the closest motif for each site along the way. These distributions were used for a Student’s t-test with a standard Box-Cox log (x+1) transformation to correct for non-normality, and for a Mann-Whitney U-test. t- and z-values were converted to p-values in Excel. Details of this analysis are discussed in the supplementary text. The database of annotated breakpoints and junctions will be available online in computer-readable format.
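The definitions above can be illustrated with a short Python sketch. It is a simplified reading of the procedure rather than the analysis code itself: breakpoints are assumed to be junction indices (junction j lies between bases j-1 and j), only one strand is traversed (sufficient for the palindromic CG motif, whereas a motif such as CAC would also need its reverse complement GTG), and the function names and the toy sequence are hypothetical.

```python
from statistics import mean

def motif_junctions(seq, motif="CG"):
    """Return the set of junction indices that count as 'at' the motif.

    For CG starting at index i these are i, i + 1, and i + 2:
    5' of the C, between C and G, and 3' of the G.
    """
    seq = seq.upper()
    sites = set()
    start = seq.find(motif)
    while start != -1:
        sites.update(range(start, start + len(motif) + 1))
        start = seq.find(motif, start + 1)
    return sites

def distance_to_motif(junction, sites):
    """Number of nucleotides between a junction and the closest motif site."""
    if not sites:
        raise ValueError("no motif occurrence in the sequence")
    return min(abs(junction - s) for s in sites)

def null_model(seq, breakpoints, motif="CG"):
    """Return (p_random, observed distances, null distances).

    The null treats every junction from the most 5' to the most 3'
    observed breakpoint as equally likely.
    """
    sites = motif_junctions(seq, motif)
    window = range(min(breakpoints), max(breakpoints) + 1)
    p_random = sum(j in sites for j in window) / len(window)
    null_distances = [distance_to_motif(j, sites) for j in window]
    observed = [distance_to_motif(b, sites) for b in breakpoints]
    return p_random, observed, null_distances

# Toy example with an invented 10-base sequence and six breakpoints.
toy_seq = "AATTCGAATT"
toy_breakpoints = [2, 4, 5, 5, 6, 8]
p_random, observed, null_distances = null_model(toy_seq, toy_breakpoints)
print(p_random, mean(observed), mean(null_distances))
```

The observed and null distance lists are the inputs one would pass, after a log(x+1) transformation where appropriate, to a t-test or a Mann-Whitney U-test, and `p_random` is the per-breakpoint probability used for the binomial calculation.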
We thank Dr. Susan Groshen and the USC Norris Cancer Center Biostatistics Core Shared Resources; Drs. Darryl Shibata and Allen Yang for comments on the manuscripts; and members of the Lieber lab.