Upon transcription of some sequences by RNA polymerases in vitro or in vivo, the RNA transcript can thread back onto the template DNA strand, resulting in an R loop. Previously, we showed that initiation of R-loop formation at an R-loop initiation zone (RIZ) is favored by G clusters. Here, using a purified in vitro system with T7 RNA polymerase, we show that increased distance between the promoter and the R-loop-supporting G-rich region reduces R-loop formation. When the G-rich portion of the RNA transcript is downstream from the 5′ end of the transcript, the ability of this portion of the transcript to anneal to the template DNA strand is reduced. When we nucleolytically resect the beginning of the transcript, R-loop formation increases because the G-rich portion of the RNA is now closer to the 5′ end of the transcript. Short G-clustered regions can act as RIZs and reduce the distance-induced suppression of R-loop formation. Supercoiled DNA is known to favor transient separation of the two DNA strands, and we find that this favors R-loop formation even in non-G-rich regions. Most strikingly, a nick can serve as a strong RIZ, even in regions with no G richness. This has important implications for class switch recombination and somatic hypermutation and possibly for other biological processes in transcribed regions.
R loops are structures in which an RNA strand is base paired to the template DNA strand and the nontemplate DNA strand remains single stranded. R loops were named by analogy to D loops, where all three strands are DNA. A wide range of prokaryotic and eukaryotic RNA polymerases can permit the nascent RNA transcript to anneal upstream of the polymerase to the template DNA strand, resulting in an R loop. This indicates that R-loop formation is a consequence of transcription and the participating nucleic acid molecules rather than of the nature of the RNA polymerase. In vivo, R-loop formation has been documented in bacteria at any sequence when the DNA is hypernegatively supercoiled (1-3, 24, 35). In mammalian cells, immunoglobulin (Ig) class switch sequences encode G-rich transcripts (5, 15, 23, 33, 38, 39). Transcription of Ig class switch sequences in vitro results in R-loop formation, and our laboratory, and subsequently others, demonstrated R loops at Ig switch sequences in vivo (4, 10, 11, 25, 26, 28-30, 37). In addition to serving a role in the physiologic DNA breakage-and-rejoining process of class switch recombination, R-loop formation can also be responsible for hot spots of mitotic recombination and genomic instability, as beautiful work has demonstrated (7-9, 12, 13, 16-20).
Thus far, all of the key parameters that dictate R-loop formation in vitro appear to be relevant to R-loop formation in vivo. For Ig class switch sequences, this includes initiation of R loops at locations (called R-loop initiation zones [RIZs]) where there are clusters of Gs on the nontemplate strand and therefore in the RNA transcript (11, 29, 30). Elongation of R loops at regions called R-loop elongation zones (REZs) requires only a high G density. R loops terminate in vitro and in vivo when the G density falls to the normal or random sequence level (11, 29, 30). Removal of the G-rich zone eliminates R-loop formation in vitro and in vivo (38). Transcription is required for R-loop formation in vitro (4), and this appears to be the case in vivo as well (31, 36, 37). R loops form only cotranscriptionally on the cis-transcribed template and not by annealing of the RNA transcript in trans to the template DNA strand (6).
Given the very close correspondence in features between in vitro and in vivo R-loop formation at Ig switch sequences in mammalian B cells, we have examined several mechanistic features with prokaryotic RNA polymerases in defined biochemical systems. In one earlier study, we found that R loops form by a thread-back mechanism and that G-quartet formation is not essential (11, 29, 30). Moreover, the length and G richness of the R-loop region are important. In a second previous study, we were able to dissect the importance of the initial portion of the RNA transcript (encoded within an RIZ) from the remainder of the RNA transcript (REZ). As mentioned above, the RIZ must have clusters of Gs (G clusters) in order to form R loops (11, 29, 30). The REZ merely needs to have a high G density (not G clusters) to permit extension of the R loop.
In this in vitro mechanistic study, we (i) examined the effect of strand length on the competition between the 5′ end of the RNA transcript and the nontemplate DNA strand during R looping, (ii) compared linear versus supercoiled DNA templates for R-loop formation, and (iii) examined the impact of a nick in the nontemplate strand on R looping. We find that all factors that favor the ability of the RNA strand to anneal to the template strand also favor R-loop formation. This suggests that a higher collision frequency (shorter distance) of the 5′ end of the RNA with the template strand improves R-loop formation. Negative supercoiling favors R-loop formation by favoring breathing of the two DNA strands. Finally, a nick in the nontemplate strand reduces the collision frequency of the resulting 5′ end of the nontemplate DNA strand with the template DNA strand, thereby disfavoring DNA:DNA duplex formation and favoring R-loop formation. This means that a nick, even in a sequence without G clusters, can serve as an efficient RIZ. This last result provides a theoretical basis for a model of short R-loop formation and has potentially profound implications for class switch recombination and even for the related process of somatic hypermutation.
pDR72 was constructed by cloning the non-G-rich (random-sequence) double-stranded DNA formed by annealing DR120 (5′-TGCAGCTTGCAATCGCTAGCTCAATATCCTAACTATCATACCTATCACATAATCAC-3′) and DR121 (5′-TGCAGTGATTATGTGATAGGTATGATAGTTAGGATATTGAGATTGCAAGC-3′) in the XhoI site of pDR51 so that the DR120 sequence is the nontemplate strand during transcription from the T7 promoter (30). To make pDR72A, pDR72C, and pDR72B, pDR72 was digested with SacI, and the 3′ overhangs were blunted with T4 DNA polymerase before ligating the short double-stranded inserts containing sequences for motif A (two four-G clusters), C (one four-G cluster), or B (no G clusters). Motif A was made by annealing oligonucleotides DR122 (5′-GGTGCTGGGGTAGG-3′) and DR123 (5′-CCTACCCCAGCACC-3′). Motif C was made by annealing DR126 (5′-GGTGCTCGACTACA-3′) and DR127 (5′-TGTAGTCGAGCACC-3′), while motif B was made by annealing DR124 (5′-TGCACTCGATCTAT-3′) and DR125 (5′-ATAGATCGAGTGCA-3′). pDR72A has DR122 sequence in its nontemplate strand, while pDR72C and pDR72B have the sequences of DR126 and DR124 in their respective nontemplate strands. pDR87 was made by site-directed mutagenesis of pDR18 (described in reference 30) using oligonucleotides DR130 (5′-GAGCTGGTTTAGTGAACCTCAGCTCCGCTAGCGCTACCGG-3′) and DR131 (5′-CCGGTAGCGCTAGCGGAGCTGAGGTTCACTAAACCAGCTC-3′) to introduce a BbvCI site (CC/TCAGC) on the nontemplate strand DNA upstream of the T7 promoter. Nt.BbvCI (New England Biolabs, MA), an altered form of BbvCI, recognizes this sequence and introduces a single-stranded cut after the second C in the recognition sequence while leaving the other strand intact. The T after the cut site is at position −39 of the transcription start site. 
pDR88 was made in a similar manner to introduce a BbvCI site between the T7 promoter and the start of the switch repeats so that the T after the cut site is now at +10, using oligonucleotides DR132 (5′-GACTCACTATAGGGCGAACCTCAGCTCTCGAGGGGTGCTG-3′) and DR133 (5′-CAGCACCCCTCGAGAGCTGAGGTTCGCCCTATAGTGAGTC-3′). pDR89 was made by digesting pDR87 with XhoI to release the four Sγ3 repeats and religating the backbone. pDR90 was constructed similarly by removing the XhoI fragment containing the four Sγ3 repeats from pDR88. pZZ6 was made by taking pDR72 and performing site-directed mutagenesis to introduce a BbvCI site immediately after the T7 promoter using oligonucleotides ZZ07 (5′-CTCACTATAGGGCGAACCTCAGCTCTCGAGCTTGCAATCT-3′) and ZZ08 (5′-AGATTGCAAGCTCGAGAGCTGAGGTTCGCCCTATAGTGAG-3′). The T after the cut site in pZZ6 is at position +10. pZZ7 was made similarly using ZZ09 (5′-ACTATCATACCTATCACCTCAGCACTCGAGGGGTGCTGGG-3′) and ZZ10 (5′-CCCAGCACCCCTCGAGTGCTGAGGTGATAGGTATGATAGT-3′) to introduce the nick site immediately upstream of the start of the switch repeats. In this construct, the T after the cut site is at position +70.
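The stated nick positions can be cross-checked directly from the mutagenic oligonucleotides. The following is a minimal sketch, not part of the original methods: the function names are hypothetical, and it assumes that T7 transcription starts (+1) at the G that ends the TATAG motif of the promoter sequence carried on the oligonucleotide. It locates the Nt.BbvCI site (CC/TCAGC) on the nontemplate-strand oligonucleotide, reports the position of the T after the cut relative to the assumed start site, and confirms that the paired oligonucleotide is the reverse complement.

```python
NT_BBVCI_SITE = "CCTCAGC"   # recognition sequence given in the text
CUT_AFTER = 2               # the nick falls after the second C (CC/TCAGC)
T7_START_MOTIF = "TATAG"    # assumption: +1 is the G that ends this motif

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an uppercase DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def nick_index(seq: str) -> int:
    """0-based index of the first base 3' of the nick on the strand carrying the site."""
    site = seq.find(NT_BBVCI_SITE)
    if site < 0:
        raise ValueError("no Nt.BbvCI site found on this strand")
    return site + CUT_AFTER


def downstream_position(seq: str) -> int:
    """Position (+1 = assumed start site) of the base after the nick; downstream nicks only."""
    motif = seq.find(T7_START_MOTIF)
    if motif < 0:
        raise ValueError("no T7 start motif found in this sequence")
    start = motif + len(T7_START_MOTIF) - 1  # index of the +1 G
    nick = nick_index(seq)
    if nick < start:
        raise ValueError("nick lies upstream of the assumed start site")
    return nick - start + 1


# Oligonucleotide sequences copied from the text above (5' to 3').
DR132 = "GACTCACTATAGGGCGAACCTCAGCTCTCGAGGGGTGCTG"
DR133 = "CAGCACCCCTCGAGAGCTGAGGTTCGCCCTATAGTGAGTC"
ZZ07 = "CTCACTATAGGGCGAACCTCAGCTCTCGAGCTTGCAATCT"

if __name__ == "__main__":
    print("DR133 is reverse complement of DR132:", reverse_complement(DR132) == DR133)
    print("pDR88 nick position:", downstream_position(DR132))
    print("pZZ6 nick position:", downstream_position(ZZ07))
```

Under the stated start-site assumption, both DR132 (pDR88) and ZZ07 (pZZ6) place the T after the cut site at +10, in agreement with the positions given above.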
In vitro transcription with T7 RNA polymerase was done the same way as described in previous studies (29, 30). For nicking experiments, the substrates were incubated with or without Nt.BbvCI overnight, and then the enzyme was heat inactivated at 72°C for 20 min. Nicking of the substrates was assessed by running the samples on agarose gels and poststaining with ethidium bromide to locate the “nicked circular” band. The nicked substrates were then linearized with SalI and assessed for linearization before being used for transcription.
All other methods used (in vitro transcription in the presence of [α-32P]UTP, gel analysis of transcription-induced shifted species, sodium bisulfite treatment, and colony lift hybridization) were the same as previously described (29, 30). Oligonucleotide DR075 (5′-CAAAACTATCCAACCTGATTCCCATACTC-3′) was radiolabeled and used as a probe to look for R-loop molecules in the colony lift hybridization experiments with linearized or supercoiled pDR72A.
Though we have reported on R loops in the genomes of B cells at switch regions, many physical parameters that influence R-loop formation cannot be readily examined in vivo. We have described an in vitro system using T7 RNA polymerase and defined Ig class switch substrates (29, 30), which we use here in this in vitro study. Each mouse γ3 switch repeat averages 49 bp, and in the studies here, we position up to four repeats downstream of the T7 promoter. For most studies, the promoter and switch repeats are on a 1.2-kb linear fragment, except where specified as a corresponding supercoiled or nicked circular substrate. After transcription incubation, we purify the nucleic acid and analyze on DNA gels and using sodium bisulfite treatment and subsequent sequencing to determine the locations of any R loops. Colony lift hybridization can also be used to determine the percentage of templates (substrates) that have acquired an R-loop conformation.
In the genomic Ig heavy-chain locus, the switch regions are several hundred base pairs downstream from their respective cytokine promoters and I exons. Unlike the G-rich switch regions, the DNA sequence between the promoter and the switch regions is not G rich. For example, the mouse Sγ3 region is about 0.9 kb downstream of the Iγ3 sterile transcript promoter. This raises the question of how such a long distance between the promoter and the G-rich switch region affects R-loop formation.
To study the effect of distance, we used our switch substrate pDR51 with three Sγ3 repeats (as defined previously) and modified it by introducing a one-repeat-long random DNA sequence between the T7 promoter and the first switch repeat. This plasmid is called pDR72. Therefore, while both pDR51 and pDR72 have three wild-type Sγ3 repeats, the repeats in pDR72 are located one repeat length further downstream from their position in pDR51. Upon transcription, pDR51 normally forms an R loop within the switch repeat zone, as we have described previously. For pDR72, we found that the amount of transcription-induced shift decreases markedly compared with pDR51 (Fig. 1A, compare lanes 2 to 4 for pDR51 to lanes 7 to 9 for pDR72). Therefore, increasing the distance between the promoter and the G-rich portion of the transcript reduces R-loop formation.
During transcription, pDR72 generates a transcript that is not G rich in the first ~50 nucleotides (nt) but is followed by a G-rich region corresponding to the transcription of the switch regions (top panel of Fig. 1B). The non-G-rich part of the transcript is inefficient at forming a stable RNA:DNA hybrid, despite having greater molecular mobility (owing to its position close to the free RNA end) and therefore a greater ability to anneal with the DNA template strand to form the RNA:DNA hybrid portion of an R loop. The G-rich part of the transcript can form a hybrid with the template strand DNA, but being internal in position in the transcript, the reduced molecular motion of this region (due to an overall lower diffusion rate) may reduce the frequency of productive collisions between the G-rich part of the transcript and the corresponding C-rich region in the template DNA strand, thereby reducing R-loop formation. In addition to this factor, the more 5′ non-G-rich portion of the RNA strand, being free in solution, may reduce R-loop formation efficiency by exerting a physical drag that might reduce R-loop stability at the initiation or elongation regions even further. Therefore, it is not surprising that pDR72 shows a much-reduced R-loop formation efficiency compared to pDR51, given that pDR51 has G-rich zones closer to the promoter.
We reasoned that if the nonhybridizing portion of the transcript is refractory to R-loop formation, a selective removal of this region, while keeping the G-rich region of the transcript intact, would improve R-loop formation efficiency at the distant switch regions. To test this hypothesis, we performed in vitro transcription using pDR51 and pDR72 in the absence or presence of RNase A during the transcription. RNase A cuts at Cs and Us in the RNA and therefore is inefficient at digesting the G-rich regions in RNAs. RNase T1 would be unsuitable for such a study because RNase T1 cuts at Gs in the transcript, and the presence of RNase T1 would therefore destroy the G-rich part of the transcript. The initial ~50 nt of the RNA generated during transcription of pDR72 is non-G rich (nonhybridizing portion of the transcript), whereas the next ~150 nt (corresponding to the three switch repeats) is G rich. Although the presence of RNase A during transcription would affect overall R-loop formation at both substrates, the non-G-rich part of the transcript would be digested more than the G-rich portion, thereby causing a relative increase in R-loop formation efficiency at pDR72 in the presence of RNase A by removal of the nonhybridizing portion of the transcript (bottom panel of Fig. 1B). Removal of the nonhybridizing portion of the transcript would permit greater molecular motion at the downstream G-rich, hybridizing portion of the transcript, and this would favor R-loop formation. When RNase A was not present during transcription, the amount of shift observed for pDR72 (Fig. 1A, lanes 7 to 9) was ~12% of the shift observed for pDR51 (Fig. 1A, lanes 2 to 4) based on the radiolabel intensities (calculated by dividing the radiolabel intensities at the shifted positions for pDR72 by that for pDR51 for each set).
However, in the presence of RNase A during transcription, the amount of shift at pDR72 was ~55% of the shift observed for pDR51, indicating a more than fourfold increase in relative R-loop efficiency (compare lanes 17 to 19 for pDR72 with lanes 12 to 14 for pDR51 in Fig. 1A). The overall decrease in shift values between transcription without and with RNase A for pDR51 (Fig. 1A, compare lanes 2 to 4 with lanes 12 to 14, for presence of RNase A after or during transcription, respectively) indicates a reduction in R-loop formation because of RNase A digestion of the G-rich portion of the transcript. Hence, we find that removal of the non-G-rich portion of the transcript improves R-loop formation.
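The comparison above reduces to two ratios: the shifted-band intensity of pDR72 relative to pDR51 within each condition, and the change in that ratio between conditions. As a minimal sketch (the function names are hypothetical, and the inputs are the approximate percentages quoted in the text rather than raw band intensities), the fold increase can be checked as follows:

```python
def relative_shift(test_intensity: float, reference_intensity: float) -> float:
    """Shifted-band radiolabel intensity of a test substrate as a fraction of the reference."""
    if reference_intensity <= 0:
        raise ValueError("reference intensity must be positive")
    return test_intensity / reference_intensity


def fold_change(before: float, after: float) -> float:
    """Ratio of two relative-shift values (after / before)."""
    if before <= 0:
        raise ValueError("baseline value must be positive")
    return after / before


# Approximate pDR72/pDR51 relative shifts reported in the text.
WITHOUT_RNASE_A = 0.12  # RNase A absent during transcription
WITH_RNASE_A = 0.55     # RNase A present during transcription

if __name__ == "__main__":
    change = fold_change(WITHOUT_RNASE_A, WITH_RNASE_A)
    print(f"Relative R-loop efficiency of pDR72 changes {change:.1f}-fold")
```

With the rounded values from the text, the ratio is about 4.6, consistent with the statement of a more than fourfold increase.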
In the genome, although the defined G-rich switch repeats are located at large distances from the promoters, G clusters often are found between the promoter and the switch repeats, sometimes as parts of degenerate repeats. We wondered if these types of G clusters that are located far upstream of the switch repeats can influence R-loop formation. We introduced short sequences containing two, one, or no GGGG clusters (defined in our previous paper as motifs A, C, or B, respectively) in pDR72 at a location that was downstream of the T7 promoter but ~50 bp upstream of the start of the three wild-type switch repeats. In effect, we made three more substrates derived from pDR72 where the RIZ motif A (two four-G clusters), C (one four-G cluster), or B (no four-G clusters, random sequence) was placed upstream of the switch repeats (REZ region) and separated by a random spacer sequence of ~50 bp. The new substrates were called pDR72A, pDR72C, and pDR72B, respectively (see the line drawing of the substrates in Fig. 2A). Transcription-induced shift assays on linearized substrates revealed that the G clusters could improve R-loop formation efficiencies, even if they were located at a distance from the switch repeats. While pDR72B showed the amount of shift without any G clustering upstream (Fig. 2A, lanes 12 to 14), R-loop formation was seen to improve for both substrates pDR72A (two four-G clusters) and pDR72C (one four-G cluster) (Fig. 2A, lanes 2 to 4 for pDR72A and lanes 7 to 9 for pDR72C). The improved shift of pDR72C showed that the presence of even one G cluster is capable of improving R-loop formation efficiencies at downstream switch regions separated by random sequences. Figure 2B (top panel) shows a model for R-loop formation and the regions of RNA:DNA hybridization at the G-cluster motif and the G-rich switch regions.
The subsequent steps and the model were constructed based on observations from R-loop mapping experiments that are explained in further detail in the next section (see also Fig. 2B legend).
To determine the location of R-loop regions in these substrates, we sequenced R-loop derivative molecules after treating the linearized and transcribed pDR72A substrate with sodium bisulfite, TA cloning, and then performing colony lift hybridization to select molecules with R-loop conformations. We found that of the 20 molecules which were R-looped (single stranded on the nontemplate strand) at the switch repeats (REZ), none showed single strandedness extending to the random sequence spacer region or to the A motif RIZ (two four-G clusters), located ~50 nt upstream of the start of the switch repeats (molecules shown in Fig. 3A). We infer that G clusters can improve R-loop formation at the switch repeats by acting as RIZs and keeping the transcript transiently hybridized in the vicinity of the transcribing DNA, thereby increasing the probability of R-loop formation at the switch repeats (with more G clusters corresponding to a greater ability of the RNA to hybridize, consistent with Fig. 2A). However, the R-loop structures do not start at these G clusters, presumably because of a lack of a suitable REZ region around these G clusters to provide stability. Because of the small size of the RIZ motifs and lack of a longer, R-loop-stabilizing G-rich REZ immediately downstream, the short nucleating RNA:DNA hybrids formed at the RIZ G clusters are expected to be less stable and are likely to dissociate more easily than the longer R-loop regions at the downstream switch repeats (Fig. 2B shows an explanatory model). Therefore, we do not see any R-loop-induced single strandedness on the nontemplate strand region at the G-clustered motifs upstream of the switch repeats.
Though mammalian chromosomes are linear, the structural organization of the genomic DNA is determined by various factors that affect local conformation of the DNA (14, 32). As an example, the FUSE element in the c-myc locus is thought to acquire transcription-induced dynamic negative supercoiling that can then lead to binding of structure-specific binding proteins (14, 32). The twin-domain model proposes that transient negative supercoiling trails the elongating RNA polymerase (21, 34). It is also known that negative supercoiling of DNA promotes transcription from a promoter, partly because it promotes DNA strand separation.
We wanted to test how negative supercoiling would influence R-loop formation. All of the experiments that have been described thus far have been done using linearized substrates, because linear substrates can be studied for the effect of DNA sequence features without the influence of supercoiling, which might also affect transcription. To study the effect of supercoiling, we therefore chose to study and compare the distributions of R-loop regions (extent and location of single strandedness on the nontemplate strand) instead of comparing the frequencies of R-loop formation.
We chose the negatively supercoiled plasmid version of pDR72A for these studies. Having seen the R-loop locations in the linearized version of pDR72A, we hypothesized that negative supercoiling would favor transient DNA strand separation and might cause the random sequence spacer region between the A-motif RIZ (two four-G clusters) and the switch region REZ to be in an R-loop conformation. This is in contrast to the R loops formed on linear DNA templates, which were all contained within the REZ, with none showing conversions at the random sequence spacer upstream of the REZ. We transcribed supercoiled pDR72A plasmid in vitro, did DNA bisulfite sequence analysis, and then selected R-loop molecules at the switch region (REZ) using the colony lift hybridization method. Upon sequencing, we found that out of 29 molecules that were single stranded (in an R-loop conformation) at the switch repeats (REZ), 7 molecules (~24%) showed continuity of the single strandedness, beginning from either the promoter or the RIZ motif A. Four more molecules (~14%) exhibited single strandedness of >25 nt in length at the promoter or the RIZ motif A but were not continuous with the single-stranded regions of the REZ switch repeats, thus making a total of 11 molecules (~38%) with long stretches of single strandedness upstream of the switch repeats. (Five molecules [~17%] also showed extension of the single-stranded region downstream of the end of the REZ switch repeats [the molecules are depicted in Fig. 3B], which is an increase over the number for the linear substrate, but not significantly so.) The upstream extensions of single strandedness indicate that the presence of negative supercoiling in DNA can reduce the dependence of R-loop formation on G richness of the DNA substrate. (The increased frequency of downstream extensions may also reflect this.) Figure 3C depicts the model of R-loop formation on a supercoiled version of pDR72A.
While RNA:DNA hybrids can form more readily at the G-clustered motif (RIZ) or the G-rich switch repeats (REZ) owing to their G content, the RNA:DNA hybridization at the random sequence spacer is facilitated primarily because the inherent negative supercoiling favors DNA strand separation, resulting in a relative increase in the DNA strand separation propensity (increased DNAOFF). This results in increased invasion by the transcript into the transiently opened DNA region (Fig. 3C). The experimental results described above also indicate that the distance-induced suppression of R-loop formation can be mitigated by negative supercoiling.
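The molecule fractions reported for supercoiled pDR72A follow from simple counts over the 29 sequenced R-looped molecules. A minimal sketch of that arithmetic is given below; the category labels and function name are hypothetical, and only the counts are taken from the text.

```python
TOTAL_R_LOOPED = 29  # molecules single stranded at the switch repeats (REZ)

# Counts taken from the text; the labels are illustrative only.
COUNTS = {
    "continuous upstream extension": 7,
    "separate upstream stretch (>25 nt)": 4,
    "downstream extension": 5,
}


def percent(count: int, total: int) -> float:
    """Return count as a percentage of total."""
    if total <= 0:
        raise ValueError("total must be positive")
    return 100.0 * count / total


def upstream_total(counts: dict) -> int:
    """Molecules with long single-stranded stretches upstream of the switch repeats."""
    return (counts["continuous upstream extension"]
            + counts["separate upstream stretch (>25 nt)"])


if __name__ == "__main__":
    for label, count in COUNTS.items():
        print(f"{label}: {count}/{TOTAL_R_LOOPED} = {percent(count, TOTAL_R_LOOPED):.0f}%")
    total_up = upstream_total(COUNTS)
    print(f"any upstream stretch: {total_up}/{TOTAL_R_LOOPED} = "
          f"{percent(total_up, TOTAL_R_LOOPED):.0f}%")
```

The downstream-extension category is tallied separately from the upstream categories; the text does not state whether individual molecules fall into both.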
While DNA is more stable and thought to be more conformationally homogeneous than RNA, genomic changes that affect DNA conformation play significant roles in various biological processes and possibly also in their misregulation. Single-stranded breaks can be considered one such kind of DNA lesion and, if not repaired, can cause genomic perturbations leading to deletions. We wondered if a DNA nick would have any effect on R-loop formation by affecting the competition between the RNA transcript and the nontemplate DNA strand. To test the effect of DNA nicks on R-loop formation, we made two substrates with a specific sequence upstream or downstream of the promoter where a specific nick can be introduced. We took pDR18 (described in reference 30), which contains four wild-type mouse Sγ3 repeats downstream of a T7 promoter, and made two variants, pDR87 and pDR88. In pDR87, a site-directed sequence modification was done to introduce a BbvCI recognition site upstream of the T7 promoter. Nt.BbvCI, a derivative of the BbvCI enzyme, recognizes the nonpalindromic recognition sequence CCTCAGC in its double-stranded form but cuts only the strand containing this sequence after the second C (CC/TCAGC), while leaving the other DNA strand intact. We introduced this sequence so that pDR87 and pDR88 would have this sequence in the nontemplate strand. In pDR87, the position of the nick is 39 nt upstream of the transcription start site. In pDR88, a Nt.BbvCI site was introduced downstream of the promoter at nt +10 but upstream of the start of the switch repeats.
For these experiments, the supercoiled forms of pDR18, pDR87, and pDR88 were treated with Nt.BbvCI to introduce the nick. pDR18 has no recognition site for the enzyme. After confirming that pDR87 and pDR88 had been nicked (most of the DNA in the nicked form of the plasmid runs at the “nicked circular” position as opposed to the “supercoiled” position), we linearized them and performed in vitro transcription, followed by agarose gel electrophoresis (Fig. 4). We found that the nicked substrate exhibited greatly increased R-loop formation efficiency when the nick was present downstream of the promoter and upstream of the switch repeats (Fig. 4, lane 19). This was consistently observed across multiple experiments and with many different substrates (see below and unpublished data). There was only a minimal difference in R-loop formation efficiency between the unnicked substrate and a substrate with the nick upstream of the promoter. Importantly, transcription efficiency does not account for this difference. We find that transcription is lower for the substrate that has a nick immediately downstream of the promoter (see Table S1 in the supplemental material), yet R-loop formation is much more efficient.
This result indicates that a nick downstream of a promoter (promoter-downstream nick) can help in R-loop formation, presumably by causing the transient removal of the nontemplate strand after the nicked position during transcription elongation. This gives the RNA transcript an increased opportunity to compete with the nontemplate DNA strand for hybridizing with the template strand DNA. The decrease in the ability of the nontemplate strand to form a DNA duplex with the template strand is likely due to the loss in continuity of the nontemplate strand at the nick. Loss of this continuity in the nontemplate strand would reduce the ability of the substrate to form duplex DNA upstream of the RNA polymerase after the nick. This transient displacement of the nontemplate DNA after the nick would provide an increased opportunity for the transcript to anneal with the template DNA strand. The lengths of the displaced nontemplate DNA strand after the nick and the RNA are comparable when the nick is present on the nontemplate strand very close to the transcription start site, and thus both the competing strands would have comparable molecular mobilities and consequently comparable chances for effective collisions with the template strand to nucleate a DNA:DNA or an RNA:DNA duplex. The presence of the nick, therefore, increases the chances of RNA:DNA hybrid formation compared to transcription of unnicked duplex DNA. Once formed, RNA:DNA hybrids are thermodynamically more stable than the DNA duplex. In addition, because of its displacement and dissociation from the duplexed nontemplate strand upstream of the nick, the nontemplate DNA strand downstream of the nick is even less likely to displace the RNA once the RNA is hybridized with the template strand DNA. Figure 5 presents a model of R-loop formation at nicked substrates when the nick is present upstream (top panel) or downstream (bottom panel) of the promoter.
Since the gel mobility shift for nicked pDR88 was markedly stronger than the mobility shifts for pDR18 (no nick; Fig. 4, lanes 2 and 5) or pDR87 (with or without the nick upstream of the promoter; Fig. 4, lane 12) or the unnicked version of pDR88, we wondered if a nick downstream of a promoter can initiate R-loop formation even in the absence of a G-rich switch region. Therefore, we made pDR89 and pDR90, where nick sites are introduced either upstream (in pDR89) or downstream (in pDR90) of the T7 promoter, but these substrates had no switch repeats downstream of the T7 promoter. As a control substrate, we used pDR16, which has no switch sequences and no nick site. Except for the positions of the nicking sites in pDR89 and pDR90, there is no other difference between them and pDR16. We did the same experiment as described above (for pDR18, pDR87, and pDR88) with these new substrates (pDR16, pDR89, and pDR90) and observed that while there were no differences between pDR16 (no nick; Fig. 6, lanes 2 and 5) and pDR89 (nick upstream of promoter; Fig. 6, lane 12) or between the nicked and unnicked versions of pDR89 (compare lanes 9 and 12 in Fig. 6), the nicked version of pDR90 (nick downstream of the promoter; Fig. 6, lane 19) showed the presence of a discernible shifted species (Fig. 6) that was resistant to RNase A and sensitive to RNase H treatment. This indicates that even a non-G-rich RNA can achieve some degree of hybridization (although lower than that for a G-rich transcript) with the template strand if the nontemplate strand is transiently displaced from the template strand at the nick during transcription.
We then wondered if we could test the influence of nicks at different downstream positions along the R-loop substrates. The promoter-upstream nicks, as deduced from experiments discussed above, do not have much effect on R-loop formation. In the two sets of experiments described in the previous paragraphs, the promoter-downstream nicks were positioned close to the promoter, and in pDR88, the nick was also close to the downstream four Sγ3 switch repeats. To test whether nicks are more effective closer to the promoter or closer to the switch region, we took pDR72 and made two more substrates in which we introduced a promoter-downstream nick site either close to the T7 promoter (in pZZ6) or far downstream of the promoter but immediately upstream of the three Sγ3 switch repeats (in pZZ7). If pDR72, pZZ6, and pZZ7 are aligned, the nicking sites in pZZ6 and pZZ7 are 60 nt apart and the sequence between the nick site of pZZ6 and that of pZZ7 is composed of random non-G-rich sequence.
Upon transcription, we found that linearized and nicked pZZ6 showed the most efficient R-loop formation (Fig. 7A, lane 5). Nicked and linearized pZZ7 also showed a shifted species (Fig. 7A, lane 12) but to a lesser extent than nicked and transcribed pZZ6 (compare lane 12 for pZZ7 with lane 5 for pZZ6). Nicked forms of both the substrates were more efficient in R-loop formation than their unnicked forms. Therefore, while any promoter-downstream nontemplate strand nick improves initiation of R-loop formation at a distant downstream R-loop-forming site, the improvement is greater if the nick is closer to the promoter. A promoter-downstream nick can thus mitigate the distance-induced suppression of R-loop formation (seen in Fig. 1), depending on its position with respect to the promoter (Fig. 7).
Figure 7B shows the model of R-loop formation at these nicked substrates. In pZZ6 (Fig. 7B, top panel), the position of the nick is very close to the transcription start site (and therefore to the 5′ end of the transcript; the nick site is located between positions +9 and +10). Therefore, during transcription, the lengths of the RNA and the nontemplate strand DNA (from the nick) are comparable upstream of the RNA polymerase. The molecular mobilities of the ends being comparable (as described above), the transcript and the DNA end have somewhat equal chances to hybridize with the template strand DNA. Given equal chances, RNA:DNA duplexes are stronger than DNA:DNA duplexes (27).
In contrast, in nicked pZZ7 (Fig. 7B, bottom panel), the transcription start site is 69 nt upstream of the nick. Therefore, during transcription near the transcription start site, the transcript is single stranded more often because of its lack of G richness and also because the two DNA strands can reanneal upstream of the RNA polymerase more efficiently (the nick is located further downstream). By the time the polymerase reaches the nick, the transcript is already ~70 nt long, and even though the DNA strands would be relatively less efficient in reannealing after the nick (as explained in the previous sections) compared to the case without a nick, the RNA would be even more disadvantaged because of the extra bulk of its unhybridized portion. Because of its more internal position in the transcript, the molecular mobility of the G-rich portion is also decreased. In comparison, the 5′ end of the nontemplate DNA strand after the nick is shorter than the RNA and is consequently more mobile. This increases the frequency of collision between the nicked nontemplate DNA strand and the template strand DNA, resulting in more efficient DNA duplex formation in nicked pZZ7 than in nicked pZZ6.
However, even with such factors causing suppression of R-loop formation at nicked pZZ7, the nicked version of this substrate shows improved R-loop formation efficiency compared with unnicked versions (Fig. 7A, compare lane 12 for nicked and transcribed pZZ7 with lanes 2 and 9 for unnicked transcribed pZZ6 and pZZ7, respectively). This demonstrates that an R loop forms more readily if there is a nick downstream of the promoter, although the efficiency decreases with increasing distance between the transcription start site and the nontemplate strand nick.
Our goal was to study effects of distance, superhelicity, and nicks on R-loop formation separately. In the course of this work, we noted the following observations relevant to combinations of the factors. First, when we nick our supercoiled substrates at sites downstream of the promoter, we still find R-loop formation to be much more efficient than when the nick is upstream. This is the same result as for the linear substrates (Fig. 4 and 6). Second, on supercoiled substrates with the various R-loop RIZs described above (Fig. 2A), we find that a stronger RIZ still yields more R-loop formation. We note that supercoiling as a single factor is stronger than RIZ strength. Third, the relative effects of G clusters and nicks are seen in Fig. 4 and 6 (lanes 15 to 20 in each). For Fig. 4, we use G-clustered repeats after the nick, whereas for Fig. 6, a random sequence is located after the nick. The amount of R-loop formation is much greater for the nicked G-clustered substrate in Fig. 4 (lane 19) than for the non-G-clustered but nicked substrate in Fig. 6 (lane 19).
In efforts to understand R-loop formation at class switch regions, we have studied several mechanistic aspects using defined biochemical systems. Mechanistic features determined using such systems show good correspondence with in vivo observations. Here, we find that the competition between the transcript and the nontemplate strand dictates R-loop formation efficiency. Conformational variation of the substrate (e.g., negative supercoiling or nicks) can affect R-loop formation by modulating the competition between the transcript and the nontemplate DNA strand for hybridization with the template DNA strand.
We find that a single-strand DNA nick can strongly initiate R-loop formation if it is positioned downstream of the promoter. Downstream G-rich regions can help increase the level of R-loop formation, but R loops can form even without these G-rich regions. If a nick can serve as a strong RIZ, even in a region of sequence where the transcript does not have G clusters, then this markedly broadens the range of possible locations for R-loop formation.
Increased distance between a promoter and a G-rich zone reduces R-loop formation (Fig. 1). Given this, how can R-loop formation occur when the promoter for the switch regions is 200 to 1,500 bp upstream of the switch regions (REZ)? In light of the nick observations here (Fig. 4 to 7), one possible answer to this question can now be hypothesized. A low level of activation-induced deaminase (AID)-initiated nick formation (nicks achieved by uracil glycosylase [UNG] and AP endonuclease [APE] action on AID deamination sites) can lead to R-loop formation, which would lead to more nicks and more R-loop formation. One can think of this as rounds of nick and R-loop amplification. This quite conceivably explains the existence of R loops seen in vivo at considerable distances downstream from the promoter and upstream of the G-rich class switch repeat regions (11, 29, 30). Once such nick/R-loop cycles reach the G-rich class switch repeats, more stable, rather than transient, R loops form.
The ability of a nick to serve as a strong RIZ could be of profound importance for somatic hypermutation. In that process, there are no obvious RIZ or REZ sequence motifs (22). However, AID is thought to initiate (with UNG and APE) low levels of nicking in regions of somatic hypermutation. Such initial nicks could lead to short regions of R-loop formation (possibly those reported in reference 28), even though these regions lack G richness. The single strandedness in those short R loops would then permit AID to much more efficiently initiate many more nicks and lead to amplification of rounds of R-loop formation and AID-initiated nicking, thereby facilitating somatic hypermutation.
Encounter of the RNA polymerase with AID/UNG/APE lesions is more likely because these are not simple nicks (5′ P and 3′ OH). Rather these “complex” nicks have a 3′ OH but a 5′ phosphodeoxyribosyl group (abasic). Such lesions are not repaired rapidly by simple ligation because they require nucleolytic resection. The greater longevity of such lesions would increase the probability that RNA polymerases would encounter them.
This work was supported by NIH. We thank members of the Lieber lab for assistance.
Published ahead of print on 19 October 2009.
†Supplemental material for this article may be found at http://mcb.asm.org/.