Conceived and designed the experiments: SJ YJC RP. Performed the experiments: SJ YJC. Analyzed the data: SJ YJC. Contributed reagents/materials/analysis tools: SJ YJC. Wrote the paper: SJ YJC RP.
Large-scale DNA deformation is ubiquitous in transcriptional regulation in prokaryotes and eukaryotes alike. Though much is known about how transcription factors and constellations of binding sites dictate where and how gene regulation will occur, less is known about the role played by the intervening DNA. In this work we explore the effect of sequence flexibility on transcription factor-mediated DNA looping, by drawing on sequences identified in nucleosome formation and ligase-mediated cyclization assays as being especially favorable for or resistant to large deformations. We examine a poly(dA:dT)-rich, nucleosome-repelling sequence that is often thought to belong to a class of highly inflexible DNAs; two strong nucleosome positioning sequences that share a set of particular sequence features common to nucleosome-preferring DNAs; and a CG-rich sequence representative of high G+C-content genomic regions that correlate with high nucleosome occupancy in vivo. To measure the flexibility of these sequences in the context of DNA looping, we combine the in vitro single-molecule tethered particle motion assay, a canonical looping protein, and a statistical mechanical model that allows us to quantitatively relate the looping probability to the looping free energy. We show that, in contrast to the case of nucleosome occupancy, G+C content does not positively correlate with looping probability, and that despite sharing sequence features that are thought to determine nucleosome affinity, the two strong nucleosome positioning sequences behave markedly dissimilarly in the context of looping. Most surprisingly, the poly(dA:dT)-rich DNA that is often characterized as highly inflexible in fact exhibits one of the highest propensities for looping that we have measured. 
These results argue for a need to revisit our understanding of the mechanical properties of DNA in a way that will provide a basis for understanding DNA deformation over the entire range of biologically relevant scenarios that are impacted by DNA deformability.
Although it has been known since the work of Jacob and Monod that genomes encode special regulatory sequences in the form of binding sites for proteins that modulate transcription, only recently has it become clear that genomes encode other regulatory features in their sequences as well. Further, with the advent of modern sequencing methods, it is of great interest to have a base-pair resolution understanding of the significance of the entirety of genomes, not just specific coding regions and putative regulatory sites.
One well-known example of other information present in genomes is the different sequence preferences that confer nucleosome positioning, with similar ideas at least partially relevant in the context of architectural proteins in bacteria as well. It has been shown, both from analyses of sequences isolated from natural sources and from in vitro nucleosome affinity studies with synthetic sequences, that the DNA sequence can cause the relative affinity of nucleosomes for DNA to vary over several orders of magnitude, most likely due to the intrinsic flexibility, especially bendability, of the particular DNA sequence in question. The claim that intrinsic DNA sequence flexibility determines nucleosome affinity has led not only to many theoretical and experimental studies on the relationship between sequence and flexibility, but also to the elucidation of numerous sequence “rules” that can be used to predict the likelihood that a nucleosome will prefer certain sequences over others (as summarized in recent reviews). For example, AA/TT/AT/TA steps in phase with the helical repeat of the DNA, with GG/CC/CG/GC steps five base pairs out of phase with the AA/TT/AT/TA steps, are a common motif in both naturally occurring and synthetic nucleosome-preferring sequences. Similarly, the G+C content of a sequence and the occurrence of poly(dA:dT) tracts have been very powerful parameters in predicting nucleosome occupancy in vivo. Our aim here is to explore the extent to which these sequences, when taken beyond the context of cyclization and nucleosome formation to another critical DNA deformation motif, exhibit similar effects on a distinct kind of deformation.
There has been an especially long history of the study of the intriguing sequence motifs known as poly(dA:dT) tracts, in the context of nucleosome occupancy as well as many other biological contexts. Such sequences, composed of four or more A bases in a row ($A_n$ with $n \geq 4$) or two or more A bases followed by an equal number of T bases ($A_nT_n$ with $n \geq 2$), strongly disfavor nucleosome formation, both in vivo and in vitro, and are in fact thought to be one of the primary determinants of nucleosome positions in vivo, with their presence upstream of promoters and in the downstream genes correlating with increased gene expression levels. Poly(dA:dT) tracts show unique structural and dynamic properties in a variety of in vitro and in vivo assays (as summarized in recent reviews), with one of their hallmark characteristics being a marked intrinsic curvature. There is evidence that poly(dA:dT) tracts may also be less flexible than other sequences, which is often given as the reason for their low affinity for nucleosomes, though there is some evidence that poly(dA:dT) tracts might actually be more flexible than other sequences. It is clear, however, that some special property or properties of A-tracts lead them to be especially resistant to the deformations that are required for DNA wrapped in a nucleosome and, indeed, to their important functions in several other biological contexts as well.
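The tract definition in the preceding paragraph (four or more consecutive A bases, or two or more A bases followed by an equal number of T bases) can be turned into a simple scan. The following is a minimal sketch for illustration only, not the procedure used in this work: the function name `find_a_tracts`, its thresholds, and the choice to scan only the strand as given (ignoring the reverse complement) are all assumptions.

```python
def find_a_tracts(seq, min_a=4, min_ant=2):
    """Locate A-tracts on the strand as written (hypothetical helper).

    Returns a list of (start, end, kind) tuples with 0-based, end-exclusive
    coordinates. kind is "An" for a run of at least `min_a` A bases, or
    "AnTn" for n A bases followed by n T bases with n >= `min_ant`.
    A long A run followed by T bases is reported only as "An".
    """
    seq = seq.upper()
    tracts = []
    i = 0
    while i < len(seq):
        if seq[i] != "A":
            i += 1
            continue
        # Extend over the run of A bases starting at i.
        j = i
        while j < len(seq) and seq[j] == "A":
            j += 1
        n_a = j - i
        # Count the T bases that immediately follow the A run.
        k = j
        while k < len(seq) and seq[k] == "T":
            k += 1
        n_t = k - j
        n = min(n_a, n_t)
        if n_a >= min_a:
            tracts.append((i, j, "An"))
        elif n >= min_ant:
            # Keep the last n A bases and the first n T bases.
            tracts.append((j - n, j + n, "AnTn"))
        i = j  # resume scanning after the A run
    return tracts
```

For example, `find_a_tracts("GCAAAAAGC")` reports one "An" tract spanning positions 2 to 7, while `find_a_tracts("GCATGC")` reports none because a single AT step does not meet either threshold.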
In this work, we make use of sequences that, in the context of nucleosome formation and cyclization assays, appear to be associated with distinct flexibilities as a starting point for examining the question of what sequence rules control deformations induced by a DNA-loop-forming transcription factor, as opposed to those induced in nucleosomes. We have previously argued using two synthetic sequences that DNA looping does not necessarily follow the same sequence-dependent trends as do nucleosome formation and cyclization. Here we expand our repertoire of sequences to specifically test the generalizability of three sequence features known to be important in nucleosome biology and cyclization. We focus in particular on the intriguing class of nucleosome-repelling, poly(dA:dT)-rich DNAs that are thought to be especially resistant to deformation, making use of a naturally occurring poly(dA:dT)-rich sequence that forms a nucleosome-free region at a yeast promoter. We note that the poly(dA:dT)-rich DNA we use here differs from the phased A-tracts that have been extensively characterized in the context of DNA looping, both in vivo and in vitro. Phased A-tracts contain short poly(dA:dT) tracts spaced by non-A-tract DNAs such that the poly(dA:dT) tracts are in phase with the helical period of the DNA, generating globally curved structures that are known to significantly enhance DNA looping. The poly(dA:dT)-rich sequence we examine here contains unphased A-tracts that we do not anticipate to have a sustained, global curvature.
We compare the effects on looping of this poly(dA:dT)-rich DNA not only to the effects of two synthetic sequences we have previously studied, but also to those of two additional naturally occurring, genomic sequences: the well-known, strong nucleosome positioning sequence 5S from a sea urchin ribosomal subunit, which, along with the 601TA sequence we previously studied, contains the repeating AA/TT/TA/AT and offset GG/CC/CG/GC steps that are common in nucleosome-preferring sequences; and one of the GC-rich sequences that are abundant in the exons and regulatory regions (e.g. promoters) of human genes, and that correlate with high nucleosome occupancy in vivo. The 5S sequence has been examined using both in vitro cyclization and in vitro nucleosome formation assays and, along with the two synthetic sequences E8 and 601TA, can be used as a standard for comparison between our and other in vitro assays. The five sequences used in this work and their effects on nucleosomes are summarized in Table 1.
To measure the effect of these sequences on looping rather than nucleosome formation, we made use of a combination of an in vitro single-molecule assay for DNA looping, called tethered particle motion (TPM), with the canonical E. coli Lac repressor to induce looping, and a statistical mechanical model for looping that allows us to extract a quantitative measure of DNA flexibility, called the looping J-factor, for the DNA in the loop. We have recently demonstrated that this combined method offers a powerful and complementary approach to established assays that have been used to great effect to probe the mechanical properties of DNA, particularly at short length scales, such as ligase-mediated DNA cyclization and measurements of DNA end-to-end distance by fluorescence resonance energy transfer. In particular, using the Lac repressor as a tool to probe the role of DNA deformability in loop formation allows us to examine the effect of sequence on the formation of shapes other than the roughly circular ones formed by cyclization and nucleosome formation, which we have argued may be an important caveat to discovering general flexibility rules from nucleosome formation and cyclization studies alone.
Interestingly, we find that the poly(dA:dT)-rich sequence that strongly excludes nucleosomes in vivo and that belongs to a class of sequences usually thought of as highly resistant to deformation is in fact the strongest looping sequence we have studied so far. Moreover, the 5S and TA sequences, which share sequence features important to nucleosome formation (see Figures S1 and S2 in File S1) as well as trends in apparent flexibility in in vitro cyclization and nucleosome formation assays, behave very differently from each other in the context of looping. We also find that G+C content, a good predictor of nucleosome occupancy, is not likewise positively correlated with looping, and in fact our data suggest that G+C content and looping may be anticorrelated. Taken together, these results strongly suggest that very different sequence rules determine DNA looping versus cyclization and nucleosome formation, possibly because of the protein-mediated boundary conditions that differ between looping geometries and nucleosomal geometries, and that the biophysical characteristics of poly(dA:dT)-rich DNAs and their biological functions may be more diverse and context-dependent than has been previously appreciated.
Our experimental approach to examining the effect of DNA sequence on looping combines an in vitro single-molecule assay for DNA looping, called tethered particle motion (TPM), with a statistical mechanical model that allows us to extract biological parameters from the single-molecule data. As shown schematically in Fig. 1(A), in TPM, a microscopic bead is tethered to a microscope coverslip by a linear piece of DNA, with the motion of the bead serving as a reporter of the state of the DNA tether: the formation of a protein-mediated DNA loop in the tether reduces the motion of the bead in a detectable fashion. We use the canonical Lac repressor from E. coli to induce DNA loops. Because more readily deformable sequences allow loops to form more easily, we can quantify sequence-dependent DNA flexibility by quantifying the looping probability, which we calculate as the time spent in the looped state divided by the total observation time (see Methods for details).
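As a concrete illustration of the looping probability defined above (time in the looped state divided by total observation time), here is a minimal sketch. It assumes a trajectory has already been segmented into dwells, each a (state label, duration) pair; the segmentation step, the labels "looped" and "unlooped", and the function name are hypothetical and are not taken from the Methods.

```python
def looping_probability(dwells, looped_states=("looped",)):
    """Fraction of the total observation time spent in looped states.

    `dwells` is a list of (state_label, duration) pairs, with durations in
    any consistent time unit. The labels are illustrative placeholders.
    """
    total = sum(duration for _, duration in dwells)
    if total <= 0:
        raise ValueError("total observation time must be positive")
    looped = sum(duration for state, duration in dwells
                 if state in looped_states)
    return looped / total


# Hypothetical trajectory: 100 s of observation, 20 s of it looped.
example_dwells = [("unlooped", 30.0), ("looped", 10.0),
                  ("unlooped", 50.0), ("looped", 10.0)]
example_p_loop = looping_probability(example_dwells)
```

Passing more than one label in `looped_states` lets the same function pool several distinguishable looped states into a single looping probability.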
More precisely, our statistical mechanical model (described in the Methods section) allows us to extract a parameter called the looping J-factor from looping probabilities. The J-factor is the effective concentration of one end of the loop in the vicinity of the other, analogous to the J-factor measured in ligase-mediated DNA cyclization assays, and is mathematically related to the energy required to deform the DNA into a loop, $\Delta F_{\mathrm{loop}}$, according to the relationship:

$$J_{\mathrm{loop}} = 1\,\mathrm{M} \times e^{-\beta \Delta F_{\mathrm{loop}}},$$

where $\beta = 1/(k_B T)$ ($k_B$ being Boltzmann's constant and $T$ the temperature). A higher J-factor therefore corresponds to a lower free energy of loop formation. In the case of cyclization, where the boundary conditions of the ligated circular DNA are well understood, the J-factor can be expressed in terms of parameters describing the twisting and bending flexibility of the DNA, and its helical period. However, in the case of DNA looping by the Lac repressor, where the boundary conditions are not well known, an expression for the looping J-factor in terms of the twist and bend flexibility parameters of the loop DNA has not been described. Nevertheless, by measuring the J-factors for different sequences, we can comparatively assess the effect of sequence on the energy required to deform the DNA into a loop, and thereby gain insight into the sequence rules that control this deformation.
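To make the link between J-factor and looping free energy concrete, the sketch below converts between the two, assuming the relation $J_{\mathrm{loop}} = 1\,\mathrm{M} \times e^{-\beta \Delta F_{\mathrm{loop}}}$ with a 1 M reference concentration. The function names and the example J-factor values are hypothetical and are not measurements from this work.

```python
import math


def looping_free_energy_kT(j_molar):
    """Looping free energy in units of k_B*T for a J-factor given in molar.

    Assumes J = 1 M * exp(-dF / (k_B*T)), so dF / (k_B*T) = -ln(J / 1 M).
    """
    if j_molar <= 0:
        raise ValueError("J-factor must be positive")
    return -math.log(j_molar / 1.0)


def j_factor_from_free_energy(delta_f_kT):
    """Inverse relation: J-factor in molar from a free energy in k_B*T."""
    return 1.0 * math.exp(-delta_f_kT)


# Two hypothetical J-factors that differ by one order of magnitude.
dF_high_j = looping_free_energy_kT(1e-9)
dF_low_j = looping_free_energy_kT(1e-10)
free_energy_gap_kT = dF_low_j - dF_high_j  # equals ln(10) under this relation
```

Under this relation, a tenfold change in J-factor corresponds to a free energy difference of ln(10), or roughly 2.3 in units of $k_B T$, regardless of the absolute J-factor values.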
Given that 5S and TA share both sequence features and similar trends in apparent flexibility in the contexts of nucleosome formation and cyclization, we expected these two sequences to behave similarly to each other in the context of looping. On the other hand, since poly(dA:dT)-rich sequences are supposed to assume such unique structures as to strongly disfavor nucleosome formation, while high GC content is one of the strongest predictors of high nucleosome occupancy, we expected these two sequences to behave very differently from each other in the context of looping. Given the common assumption that poly(dA:dT)-rich DNAs are highly resistant to deformation, we especially did not expect to observe much, if any, loop formation with the poly(dA:dT)-rich, nucleosome-repelling sequence.
As shown in Fig. 1, none of these expectations were borne out. TA and 5S do not behave similarly, nor do CG and poly(dA:dT) behave especially dissimilarly, nor does poly(dA:dT) resist loop formation. Moreover, the behavior of these special nucleosome-preferring or nucleosome-repelling sequences is dependent on the larger DNA context, in that the addition of the 36-bp bacterial lacUV5 promoter sequence to these roughly 100-bp loops changes the relative looping probabilities of the five sequences (see Methods for the rationale behind the inclusion of this promoter). Without this promoter sequence (Fig. 1(C)), the two synthetic sequences, E8 and TA, exhibit comparable amounts of looping, while the three natural sequences, including both 5S and poly(dA:dT), all loop more than either E8 or TA. With the promoter (Fig. 1(D)), however, TA loops more than E8, but 5S less than either E8 or TA. Both with and without the promoter the supposedly very different GC-rich and poly(dA:dT)-rich DNAs loop more than the random E8 sequence. The looping probabilities of the poly(dA:dT) sequence are especially surprising—instead of looping very little, as we expected, this sequence loops more than any other sequence without the promoter and a comparable amount to TA with the promoter.
These five sequences differ not only in looping probability, but also in the loop length at which that looping is maximal: the poly(dA:dT) sequence is maximized at 104 bp, the 5S and CG sequences at 105 bp, and the E8 and TA sequences at 106 bp. These different maxima could be explained by different helical periods for these five DNAs, though without more periods of data we cannot definitively quantify their helical periods. In the case of the poly(dA:dT) sequence, an altered helical period would not be unexpected, as pure poly(dA:dT) copolymers are known to have shorter helical periods (10.1 bp/turn) than random DNAs (10.6 bp/turn). On the other hand, 5S exhibits the same helical period as E8 and TA in cyclization assays, so it is intriguing that its looping maximum occurs at a different length than that of E8 and TA, perhaps suggesting a different helical period in the context of looping than that of E8 and TA. The promoter does not appear to alter the maximum of looping for a given sequence. As noted above, it is difficult to use these looping data to comment further on other DNA elasticity parameters, in particular any sequence-dependent differences in torsional stiffness, but in Fig. S3 in File S1 we provide evidence that these sequences may share the same twisting flexibility, even if they differ in helical period.
The effect of the promoter on loop formation can be more clearly seen when looping J-factors are compared across sequences, instead of the looping probabilities. Because the no-promoter and with-promoter loops are flanked by different combinations of operators (Fig. 1(B); see also Methods), their looping probabilities cannot be directly compared. However, as described above and in the Methods section, we can use the statistical mechanical model that we have described for this system to extract J-factors from each looping probability. These J-factors are shown in Fig. 2. Loop sequence can modulate the looping J-factor by at least an order of magnitude (compare the poly(dA:dT) J-factors to those of 5S with promoter or E8 and TA, no-promoter). The lacUV5 promoter has the largest effect on the TA and 5S sequences (though of opposite sign), but appears to have little effect on poly(dA:dT)-containing and E8-containing loops, and a moderate effect on CG-containing loops. It is intriguing how large and diverse an effect the 36-bp lacUV5 promoter has on the roughly 100-bp loops we examine here; but one possible explanation for its minimal effect on the poly(dA:dT)-rich sequence, at least, compared to the others, is that the properties of A-tract structures tend to dominate over the properties of surrounding sequences. We note that our previous results comparing the effect of sequence versus flanking operators on measured J-factors preclude the possibility that the differences between the no-promoter and with-promoter constructs are due to the difference in flanking operators. We also note that it is possible that the effect of the promoter stems not from the promoter sequence itself, but from the fact that the sequences of interest that form the rest of the loop are shorter when 36 bp of the loop are replaced by the promoter sequence. However, we consider this explanation to be less likely, because as shown in the left-hand panels of Fig. 1(C) and (D) above, we have measured the looping probabilities (and J-factors) of more than two periods of E8- and TA-containing DNAs, allowing a direct comparison of loops that contain the same amount of E8 and TA both with and without the promoter (compare, for example, no-promoter loop lengths of 90 bp to with-promoter lengths of 120 bp). In this case we still find that without the promoter the J-factors of the E8- and TA-containing loops are indistinguishable, but with the promoter the TA sequence loops more than the E8 sequence, indicating that it is the promoter, and not a shortening of some unique element(s) of the E8 or TA sequences, that causes the difference in J-factors with versus without the promoter for these two sequences.
TPM trajectories not only provide information about the free energy of loop formation, captured by the J-factors discussed in the previous section, but also contain some information about the preferred loop conformation as a function of sequence, through the observed length of the TPM tether when a loop has formed. In fact, previous work from our group and others has shown that the Lac repressor can support at least two observable loop conformations for any pair of operators, with any sequence, because these conformations lead to distinct tether lengths in TPM. Although the underlying molecular details of these two looped states, which we label the “middle” (“M”) and “bottom” (“B”) states according to their tether lengths relative to the unlooped state, are as yet unknown, they must differ in repressor and/or DNA conformation in a way that alters the boundary conditions of the loop, since they are distinguishable in TPM. It has been proposed that the two states arise from the four distinct DNA binding topologies allowed by a V-shaped Lac repressor similar to that shown in the Lac repressor crystal structure, and/or two repressor conformations, the V-shape seen in the crystal structure and a more extended “E” shape. It is likely, in fact, that the two observed looped states are each composed of more than one microstate (that is, some combination of V-shaped and E-shaped repressor conformation(s) and associated binding topologies). Even without knowing the details of the underlying molecular conformation(s) of these two states, however, we can use them to provide a window into the effect of sequence on preferred loop conformation.
In particular, by examining the relative probability of the two looped states as a function of both loop length and loop sequence, we can assess the contributions of sequence to the energy required to form the associated loop conformation(s). As shown in Figure 3, which of the two looped states predominates depends in a complicated way upon the loop sequence, the presence versus absence of the lacUV5 promoter, and the loop length. In previous work, we showed that having E8 or TA in the loop region, over two to three helical periods, leads to alternating preferences for the middle versus the bottom looped state, with the middle state predominating when the operators are in-phase and looping is maximal, but the bottom state predominating when the operators are out-of-phase. The inclusion of the promoter in the loop increases the preference for the middle state for out-of-phase operators. These trends are captured in the top left panel of Fig. 3.
These trends do not hold for the three genomically sourced loop sequences (CG, dA, and 5S). For the poly(dA:dT)-rich sequence, as with E8 and TA, the promoter increases the preference for the middle looped state for out-of-phase operators; for 5S, however, the presence of the promoter decreases the preference for the middle state. The preferred state of the CG sequence is mostly insensitive to the presence versus absence of the promoter. Both with and without the promoter, though, the middle state is generally preferred at more loop lengths for the genomically sourced DNAs than for the synthetic sequences, insofar as we are able to determine from the lengths shown in Fig. 3. These results demonstrate a complicated dependence of preferred loop state on sequence that does not always follow overall trends in looping free energy: for example, 5S and TA are the two sequences that show the largest change in J-factor with the inclusion of the promoter, but E8 and TA are the sequences that show the largest change in preferred looped state with the promoter. However, the trend seen in the preceding section with CG and poly(dA:dT) having more in common than 5S and TA holds true for preferred loop conformation as well.
A different measure of loop conformation can be derived from the TPM tether lengths themselves, that is, from the measured root-mean-squared (RMS) motion of the bead, as in the example trajectory shown in Fig. 4(A), which exhibits three clear states: the two looped states and the unlooped state. Because of variability in initial tether length, even in the absence of Lac repressor, we calculate a relative measure of tether length for the unlooped and looped states, where the motion of each bead is normalized to its motion in the absence of repressor. We might expect, then, that in the presence of repressor, the unlooped state would fall at a relative RMS of zero, and the looped states at negative values. However, as can be seen in the sample trace in Fig. 4(A) and in the left-hand panels of Fig. 4(B), the unlooped state in the presence of repressor is actually shorter than the tether in the absence of repressor (i.e., the horizontal black dashed line in Fig. 4(A) lies above the mean of the unlooped state in the blue data). Elsewhere we present evidence that this shortening of the unlooped state in the presence of repressor is due to the bending of the operators induced by the Lac repressor protein that is observed in the crystal structure of the Lac repressor complexed with DNA. (We note that this is a Lac repressor-specific result; compare, for example, the recent results from Manzo and coworkers with the lambda repressor, where a similar shortening of the unlooped state is shown to be due to nonspecific binding. For example, the Lac repressor does not exhibit the dependence of the looped tether length on repressor concentration that is seen with the lambda repressor.)
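The relative tether-length measure described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the analysis pipeline used here: it assumes bead positions are available as drift-corrected (x, y) displacements from the tether anchor, and it assumes the normalization is a simple subtraction of the same bead's no-repressor RMS, which is inferred from the expectation that the unlooped state would sit at zero and looped states at negative values. The function names and numbers are hypothetical.

```python
import math


def rms_motion(displacements):
    """Root-mean-squared in-plane displacement of a bead.

    `displacements` is a list of (x, y) positions relative to the tether
    anchor, assumed to be drift-corrected already.
    """
    if not displacements:
        raise ValueError("need at least one position")
    mean_sq = sum(x * x + y * y for x, y in displacements) / len(displacements)
    return math.sqrt(mean_sq)


def relative_rms(state_rms, no_repressor_rms):
    """RMS of a state relative to the same bead without repressor.

    Assumed to be a difference: zero means no change in apparent tether
    length, negative values mean a shorter apparent tether.
    """
    return state_rms - no_repressor_rms


# Hypothetical values in nm for one bead.
example_relative = relative_rms(state_rms=150.0, no_repressor_rms=160.0)
```

With this convention, an unlooped state that is shortened by operator bending shows up as a small negative relative RMS rather than zero.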
As shown in Fig. 4(B), the length of the TPM tether in both the unlooped and looped states is similar but not identical for the five sequences and eight lengths that we examine here. The most obvious modulation of tether length correlates with loop length, with the shortest unlooped- and looped-state tether lengths occurring near the maxima of the looping probability. We believe this modulation with length is due to the phasing of the bends of the DNA tether as it exits the repressor-bound operators in the looped state, or the phasing of the bent operators in the unlooped state. At the repressor concentration we use here, the unlooped state should be primarily composed of the doubly-bound state, meaning that the two operators are both bent by bound repressor. As shown schematically in Fig. 4(C), when these bends are in-phase, the tether length should be shortest (and also the looping probability is highest, because the operators are in-phase). A similar argument can be made for the modulation of the looped state, regarding the relative phases of the tangents of the DNA exiting the loop.
It is interesting to consider how the sequence of the loop might influence the length of the tether in the unlooped state, when no loop has formed; see, for example, the CG with-promoter versus 5S with-promoter sequences, where the latter is consistently longer than the former (Fig. 4(B)). We do not see a sequence dependence to tether length in the absence of repressor, ruling out the possibility of a detectable intrinsic curvature to the CG sequence. We speculate instead that CG alters the trajectory of the DNA as it exits the bend in the operators in the unlooped state, compared to the trajectory when the sequence next to the operators is 5S, leading to a consistent difference in unlooped tether lengths.
Interestingly, in contrast to its influence on preferred looped state (middle versus bottom), the promoter does not alter the length of the tether for a given sequence at a given loop length (see also the bottom left panel of Fig. S5 and Fig. S6 in File S1). On the other hand, as shown in Fig. 4(D), the poly(dA:dT)-rich sequence, noticeably more so than the other sequences, stands out as a sequence that does strongly affect the tether length of the loop, in that it mandates a very narrow range of tether lengths as a function of looping J-factor (related, for a particular sequence, to the loop length or equivalently the operator spacing). A similar but less pronounced trend can be observed for the unlooped state with the GC-rich sequence (Fig. 4(D)). The other sequences allow much more variability in tether length as a function of J-factor/operator spacing (see Figure S6 in File S1). This strong trend in tether length as a function of J-factor could be evidence of the formation of special, defined loop structures with the GC-rich and poly(dA:dT)-rich sequences that constrain the allowed loop conformations as a function of operator spacing more than the other sequences do.
Further computational and modeling efforts will be required to relate these data on tether lengths and preferred loop length to loop structure, similarly to how Towles and coworkers have used TPM tether lengths to show that different DNA loop topologies can explain the observed tether lengths of the two looped states. However, even without currently knowing the underlying molecular details causing these sequence-specific trends in tether length and preferred loop state, and therefore in loop conformation, it is clear that it is the loop sequence, and not the Lac repressor itself, that determines the loop conformation to a large degree. It has been shown recently that the Lac repressor is capable of accommodating many different loop conformations, which is consistent with the results we present here. We hope that computational and modeling efforts with these data, as well as continued efforts to use assays such as FRET to directly probe loop conformation, will shed light on this complex interplay between sequence and loop conformation.
In previous work we showed that the synthetic E8 and TA sequences show no sequence dependence to looping in the absence of the lacUV5 promoter but a nucleosome-like sequence dependence in the presence of the promoter. We hypothesized that perhaps the promoter alters the preferred state of the loop to one whose shape is more similar to that of DNA in a nucleosome or DNA minicircle formed by cyclization, leading to similar sequence trends with the promoter as with nucleosomes. We still attribute the difference in the patterns of sequence dependence that we observe between looping and nucleosome formation to the role of the shape of the deformation in determining the observed deformability of a particular sequence. However, we have shown here with a broader range of sequences that the role of the promoter in controlling loopability is more complicated than we had previously hypothesized. Neither with nor without the promoter does loop formation follow the sequence trends of nucleosome formation. As shown in Figure 5, if looping J-factors did follow the same patterns of sequence preference as do cyclization J-factors and nucleosome formation free energies, a plot of the looping J-factors versus cyclization J-factors for the various sequences we have studied here would fall on a line with a positive slope. We find that this is not the case; in fact, without the promoter there is perhaps a slight anticorrelation between looping J-factors and cyclization J-factors (and no discernible correlation with the promoter).
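One simple way to quantify whether looping J-factors track cyclization J-factors across sequences is a Pearson correlation of their logarithms, since J-factors span orders of magnitude. The sketch below is an illustrative assumption about how such a comparison could be done, not the analysis behind Figure 5; the function names and the J-factor values are hypothetical placeholders.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for constant input")
    return sxy / math.sqrt(sxx * syy)


def log_j_correlation(looping_j, cyclization_j):
    """Correlate log10 J-factors from two assays, one pair per sequence."""
    return pearson_r([math.log10(j) for j in looping_j],
                     [math.log10(j) for j in cyclization_j])


# Hypothetical J-factors (molar) for three sequences, chosen so that the
# two assays are perfectly anticorrelated on a log scale.
example_r = log_j_correlation([1e-9, 1e-10, 1e-11], [1e-8, 1e-7, 1e-6])
```

A value near +1 would indicate that the two assays rank sequences the same way, while a value near zero or below would indicate that they do not. With only a handful of sequences, any such coefficient carries large uncertainty.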
The strong correlation between a sequence's ease of cyclization and of nucleosome formation, as shown in Fig. 5(A), has been used to argue that nucleosome sequence preferences depend largely on the intrinsic mechanical properties of a DNA, particularly its bendability, though other mechanisms have also been proposed, such as that described by Rohs and coworkers, which depends not on sequence-dependent DNA flexibility but on sequence-dependent minor groove shape. We have shown here that three sequence features that commonly determine nucleosome preferences, either through their effect on DNA flexibility or on other structural aspects recognized by the nucleosome, do not likewise determine looping, arguing for the need to identify a different set of sequence features that determine loopability. The most striking contrast between previously established sequence “rules” derived from nucleosome studies and the trends in looping J-factors that we observe here is that of the nucleosome-repelling, poly(dA:dT) sequence, which has the lowest looping free energy that we have quantified so far. Other in vitro assays predominantly show poly(dA:dT) copolymers to be highly resistant to deformations; for example, Vafabakhsh and coworkers recently used a FRET-based cyclization assay, analogous to traditional ligase-mediated cyclization assays, to show that poly(dA:dT)-rich sequences have cyclization rates significantly smaller than other sequences such as E8 and TA. Although ease of cyclization is often equated with bendability, it appears that such observed bendability is more context-dependent than has been previously appreciated: that is, the simplest model that one would write down to describe the energetics of these different deformed DNAs would feature the persistence length as the governing parameter that is used to characterize bendability, and yet the distinct responses seen in looping, nucleosomes and cyclization belie that simplest model.
It will be informative to extend this study of an unphased poly(dA:dT) tract in DNA loops to include more sequences containing both pure poly(dA:dT) copolymers and naturally occurring poly(dA:dT)-rich DNAs that exclude nucleosomes in vivo, in order to elucidate the precise role of poly(dA:dT) tracts in determining looping. It is clear, however, that poly(dA:dT)-rich DNAs should not be thought of as stiff or resistant to bending in all biological contexts.
A second striking contrast between our results here and previously established rules for nucleosome formation concerns the role of G+C content in determining loop formation. The G+C content of a DNA is one of the most powerful parameters for predicting nucleosome occupancy in vivo, with higher G+C content correlating with higher occupancy. However, as shown in Fig. 6, G+C content either offers little predictive power for loopability or is anticorrelated with looping. We note that a recent, systematic DNA cyclization study demonstrated a quadratic dependence of DNA bending stiffness on G+C content. In our case of protein-mediated DNA looping, the looping J-factor contains contributions from protein elasticity in addition to those from DNA elasticity, and our DNA sequences contain A-tracts and GGGCCC motifs that were excluded in that study, making a direct comparison between our results and theirs difficult; but it is possible that the looping J-factor is neither correlated nor anticorrelated with G+C content but instead depends quadratically on G+C content, as do cyclization J-factors. More data will be necessary to make a strong statistical statement about the anticorrelation or lack of correlation between the looping J-factor and G+C content, and to determine the form of the relationship between the looping J-factor and G+C content (e.g. quadratic versus linear), but we propose low G+C content as the starting point of a potential new sequence “rule” for predicting looping J-factors, and a fertile realm of further investigation. Finally, we have shown that the repeating AA/TT/TA/AT and GG/CC/GC/CG steps that characterize the 5S and TA sequences, as well as many nucleosome-preferring sequences, do not likewise determine looping J-factors, as these two sequences behave very differently from each other in the context of transcription factor-mediated DNA looping.
Here we have extended our previous work on the sequence dependence of loop formation by the Lac repressor to include three naturally occurring, genomic sequences that have either nucleosome-repelling or nucleosome-attracting functions in vivo, in addition to the two synthetic sequences we described previously. We find that two sequences that share sequence features important to nucleosome formation and that share trends in observed flexibility in cyclization and nucleosome formation assays, the 601TA and 5S sequences, behave less similarly in the context of DNA looping than the two sequences that should have least in common, the GC-rich, nucleosome-attracting sequence and the poly(dA:dT)-rich, nucleosome-repelling sequence. 5S and TA share neither trends in looping free energy relative to the random E8 sequence, nor the loop length where looping is maximal, nor preferred loop conformation, nor their response to the larger sequence context (as evidenced by the fact that the inclusion of the lacUV5 promoter sequence in the loop increases the looping J-factor for TA but decreases it for 5S).
We have also shown that a poly(dA:dT)-rich DNA that forms a nucleosome-free region in yeast is actually extremely deformable in the context of looping by a transcription factor. The rest of the sequences show a range of J-factors that does not correlate with any observed trends in flexibility as measured by ligase-mediated cyclization assays, nor with the observation that high G+C content correlates with nucleosome occupancy. The diversity of the effects on DNA looping that we observe with these five sequences (ten, if the inclusion of the promoter is considered to create a “new” sequence) underscores the necessity of a large-scale screen for sequences that control loop formation both in vivo and in vitro, much as has been done in the context of nucleosome formation to help establish the sequence-dependence rules of that field.
Our work in no way undermines previous claims of the sequence dependence to nucleosome formation and/or occupancy either in vivo or in vitro; rather, it demonstrates that the “rules” of sequence flexibility derived from cyclization and nucleosome formation studies are inapplicable to DNA looping, possibly due to the difference in the boundary conditions and therefore DNA conformations involved in forming a protein-mediated loop versus a DNA minicircle or a nucleosome. It will be interesting to extend these studies of the role of sequence in loop formation to other DNA looping proteins besides the Lac repressor. As noted above, it has been shown recently that the Lac repressor can accommodate many different loop conformations. The variety in tether lengths and preferred looped states that we observe is consistent with a forgiving Lac repressor protein. Nucleosomes, on the other hand, have a more fixed structure that should not be as accommodating to a range of helical periods and DNA polymer conformations (hence the hypothesis that poly(dA:dT)-rich DNAs disfavor nucleosome formation because they adopt a geometry that is incompatible with the structure of the DNA in a nucleosome). It would be informative to measure the looping J-factors of these same sequences with a more rigid looping protein. It will also be interesting to see if other bacterial promoter sequences have a similar effect of altering the looping boundary condition as the very strong and synthetic lacUV5 promoter. In fact, the lacUV5 promoter should be a key starting point for identifying sequences that have a strong effect on looping, since it can have significant effects on the behavior of a loop, even when it comprises only one-third of the loop length.
The poly(dA:dT)-rich sequence (from Fig. 4 of its source reference), GC-rich sequence (from “Human 2” at http://genie.weizmann.ac.il/pubs/field08/field08_data.html), and 5S sequences (from Fig. 1 of their source reference) were cloned into the pZS25 plasmid used previously, with these eukaryotic sequences replacing the E8 or TA sequences in that plasmid. In cases where the loop lengths used in this study were shorter than the 147 bp that are wrapped in nucleosomes, the corresponding looping sequences used in TPM were taken from the middle of these sequences (relative to the nucleosomal dyad); in cases where the nucleosomal sequences were shorter than the desired loop length, they were padded at one end with the random E8 sequence. See Figures S1 and S2 in File S1 for details. As in previous work, “no-promoter” loops were flanked by the synthetic, strongest known operator (repressor binding site) and the strongest naturally occurring operator; “with-promoter” loops were flanked by the same synthetic operator and a weaker naturally occurring operator, because these with-promoter constructs are also used in in vivo studies of the effect of loop architecture on YFP expression, in which case the weaker operator is a more convenient choice than the strongest naturally occurring one. Similarly, the motivation to include the lacUV5 promoter in the loop stems from parallel in vivo studies, in which the promoter is a natural part of the looping architecture. The promoter is included in the loop between the sequence of interest and the naturally occurring operator. Figures S1 and S2 in File S1 give the exact sequences used in this work; Fig. 1(B) shows the TPM constructs schematically.
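As an illustration only, the fragment-selection rule described above (take the middle of the nucleosomal sequence when the loop is shorter, pad with E8 when it is longer) can be sketched in a few lines of Python. The helper name `loop_insert` is hypothetical, and the sketch makes two assumptions that the text does not settle: that the dyad sits at the center of the supplied sequence, and that padding is appended to the 3' end. The exact sequences actually used are those in Figures S1 and S2 of File S1.

```python
def loop_insert(source_seq: str, loop_len: int, pad_seq: str) -> str:
    """Return a loop_len-bp insert derived from source_seq.

    Hypothetical helper, not the authors' cloning workflow.
    Assumption 1: the nucleosomal dyad lies at the center of source_seq.
    Assumption 2: padding is appended at the 3' end (the text says only
    "at one end"), taking bases from the start of pad_seq.
    """
    if loop_len <= 0:
        raise ValueError("loop_len must be positive")

    if loop_len <= len(source_seq):
        # Loop shorter than the source: keep the central loop_len bases.
        start = (len(source_seq) - loop_len) // 2
        return source_seq[start:start + loop_len]

    # Loop longer than the source: pad with the filler sequence.
    shortfall = loop_len - len(source_seq)
    if shortfall > len(pad_seq):
        raise ValueError("pad_seq is too short to reach loop_len")
    return source_seq + pad_seq[:shortfall]
```

When the difference between the source length and the loop length is odd, the integer division places the extra base on the 3' side; that tie-breaking choice is also an assumption of the sketch.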
Cloning of the sequences of interest into the pZS25 plasmid was accomplished in either one or two steps. For the 5S sequences, oligomers were first ordered from Integrated DNA Technologies as single-stranded forward and reverse complements, consisting of 69 bp (for the “with-promoter” constructs) or 105 bp (for the “no-promoter” constructs) of the 5S sequence, plus the two flanking operators and, where applicable, the lacUV5 promoter sequence. These oligomers were annealed and then ligated into the pZS25 plasmid at the AatII and EcoRI restriction sites that fall just outside the operators that flank the E8 or TA sequences in the original pZS25 plasmids. Second, QuikChange mutagenesis (Agilent Technologies) was performed to generate additional lengths (that is, to introduce insertions or deletions) of the 5S sequence from the initial 105 bp loop lengths. However, we found that this site-directed mutagenesis step generated distributions of products for the poly(dA:dT) constructs, possibly due to replication slipped mispairing over repetitive sequences. Therefore all lengths of the poly(dA:dT) sequence, as well as of the GC-rich sequence, which also has the potential to contain such “slippery” regions, were created by ligation of synthesized oligomers into the pZS25 plasmid. All constructs were confirmed by sequencing (Laragen Inc.) to have clean sequence reads, and the approximately 450 bp digoxigenin- and biotin-labeled TPM constructs were created by PCR as described previously for the E8- and TA-containing constructs. Sequences of TPM constructs were again confirmed by sequencing before use.
Tethered particle motion assays were performed as described previously. Briefly, linear DNAs, labeled on one end with digoxigenin and on the other end with biotin, were introduced into chambers created between a microscope slide and coverslip, with the coverslip coated nonspecifically with anti-digoxigenin. Streptavidin-coated beads (Bangs Laboratories, Inc.) were then introduced into the chamber to complete the formation of tethered particles. The motion of the beads was tracked using custom Matlab code that calculated each bead's root-mean-squared (RMS) motion in the plane of the coverslip, and looping probabilities were extracted from these RMS-versus-time trajectories as the time spent in the looped state (reduced RMS), divided by the total observation time. Similarly, the probabilities of the “bottom” versus “middle” states (see Results section) were defined as the time spent in a particular state, divided by the total observation time.
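The definition of a state probability as time in the state divided by total observation time can be illustrated with a minimal Python sketch. This is not the custom Matlab code used for the analysis. It assumes a uniformly sampled RMS trajectory (so that the fraction of frames equals the fraction of time) and assigns states by simple thresholding; the function name `state_fractions`, the threshold values and the ordering of states by RMS are all assumptions made for illustration.

```python
from bisect import bisect_right
from typing import List, Sequence


def state_fractions(rms_trace: Sequence[float],
                    thresholds: Sequence[float]) -> List[float]:
    """Fraction of frames spent in each RMS-defined state.

    Hypothetical helper. `thresholds` must be sorted in ascending order
    and splits the RMS axis into len(thresholds) + 1 states, returned
    from lowest RMS to highest. With two thresholds the three states
    could be read as "bottom" loop, "middle" loop and unlooped, which is
    an assumed ordering. Uniform sampling in time is assumed.
    """
    if len(rms_trace) == 0:
        raise ValueError("rms_trace is empty")
    if list(thresholds) != sorted(thresholds):
        raise ValueError("thresholds must be sorted in ascending order")

    counts = [0] * (len(thresholds) + 1)
    for value in rms_trace:
        # bisect_right returns how many thresholds are <= value,
        # which is the index of the state that contains the value.
        counts[bisect_right(thresholds, value)] += 1

    total = len(rms_trace)
    return [count / total for count in counts]


def looping_probability(rms_trace: Sequence[float],
                        thresholds: Sequence[float]) -> float:
    """Total looped fraction: every state except the highest-RMS one."""
    return sum(state_fractions(rms_trace, thresholds)[:-1])
```

Real trajectories are noisy, so a hard threshold like this one would normally be applied only after smoothing or replaced by a fit to the RMS histogram; the sketch is meant only to make the ratio-of-times definition concrete.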
By measuring the looping probability of a construct at a particular repressor concentration, and using previously reported repressor-operator dissociation constants for the operators used here, we can calculate the J-factor for that construct. All measurements in this work were carried out at 100 pM repressor, using repressor purified in-house. The relationship between the looping probability measured in TPM ($p_{\mathrm{loop}}$), the repressor-operator dissociation constants for the two operators that flank the loop ($K_A$ and $K_B$), and the looping J-factor of the DNA in the loop ($J_{\mathrm{loop}}$) can be described as

$$p_{\mathrm{loop}} = \frac{\dfrac{[R]\,J_{\mathrm{loop}}}{2 K_A K_B}}{1 + \dfrac{[R]}{K_A} + \dfrac{[R]}{K_B} + \dfrac{[R]^2}{K_A K_B} + \dfrac{[R]\,J_{\mathrm{loop}}}{2 K_A K_B}},$$

where $[R]$ is the concentration of Lac repressor, and $K_A$ and $K_B$ are the repressor-operator dissociation constants of the two operators flanking the loop (the synthetic operator and one of the two naturally occurring operators). A similar expression can be derived for the J-factors of the individual “bottom” and “middle” looped states.
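To make the conversion from a measured looping probability to a J-factor concrete, the following Python sketch inverts a five-state statistical mechanical model (no repressor bound, repressor on either operator, two repressors bound, and the looped state). The functional form, including the factor of 1/2 in the looped-state weight, is an assumption of this sketch, taken from the form commonly used for this assay, and the function and parameter names are hypothetical. Concentrations, dissociation constants and the J-factor must all be in the same units (for example molar).

```python
def unlooped_weight(R: float, K_A: float, K_B: float) -> float:
    """Sum of the statistical weights of the four unlooped states."""
    return 1.0 + R / K_A + R / K_B + R * R / (K_A * K_B)


def looping_probability(R: float, K_A: float, K_B: float, J: float) -> float:
    """Looping probability for repressor concentration R and J-factor J.

    Assumed model: the looped-state weight is R * J / (2 * K_A * K_B).
    """
    looped = R * J / (2.0 * K_A * K_B)
    return looped / (unlooped_weight(R, K_A, K_B) + looped)


def j_factor(p_loop: float, R: float, K_A: float, K_B: float) -> float:
    """Invert the model above: J-factor from a measured looping probability."""
    if not 0.0 <= p_loop < 1.0:
        raise ValueError("p_loop must satisfy 0 <= p_loop < 1")
    if min(R, K_A, K_B) <= 0.0:
        raise ValueError("R, K_A and K_B must be positive")
    # Solve p = w / (Z0 + w) for w, then w = R * J / (2 * K_A * K_B) for J.
    looped = p_loop * unlooped_weight(R, K_A, K_B) / (1.0 - p_loop)
    return 2.0 * K_A * K_B * looped / R
```

A call would look like `j_factor(p_loop, R=100e-12, K_A=..., K_B=...)`, with the 100 pM repressor concentration stated above and the two dissociation constants supplied from the measurements for the operators in the construct. Because the inversion divides by `1 - p_loop`, J-factor estimates become very sensitive to measurement error as the looping probability approaches 1.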
The J-factors plotted in Figure 5 are the maximum looping or cyclization J-factors over a particular period. Specifically, the looping J-factors used are those at 104 bp for dA, 105 bp for 5S and CG, and 106 bp for E8 and TA; the cyclization J-factors are for 94 bp of the E8, 5S or TA sequences and are taken from previously published measurements. Although we are not directly comparing identical lengths between cyclization and looping, the general trends hold regardless of the lengths chosen. In fact, identifying the loop length that corresponds to a particular cyclization length is difficult, given that the flanking operators for looping must be taken into account in some fashion. For cyclization, DNA length is easy to compute: it is simply the length of the oligomer used in the ligation reactions. In the case of looping, however, it is unclear whether the appropriate length for comparison is just the DNA in the loop (excluding the operators), the length between the midpoints of the operators, or the full length including both operators. Similarly, we are not comparing identical loop lengths across sequences; we chose to compare loop flexibility at the looping maximum for each sequence in an attempt to compare lengths at which the operators are most likely to be in phase, such that we are comparing only bending and not twisting flexibility. Finally, we note that here we are interested in the same kind of comparison that Cloutier and Widom made in the work that inspired this figure, in which they compared cyclization and nucleosome formation free energies even though the cyclization experiments were performed with roughly 100 bp DNAs and the nucleosome formation assays with roughly 150 bp DNAs.
Likewise, we do not expect that the fragments of nucleosome-preferring or nucleosome-repelling sequences that we examine here in the context of looping will necessarily have exactly the same characteristics as the full-length nucleosomal sequences from which they were derived; but we are interested in comparing general trends in observed flexibility of these roughly 110 bp loops with those of roughly 100 bp ligated minicircles and of roughly 150 bp nucleosomal DNAs.
Supporting figures. Figure S1: “No-promoter” looping sequences used in this work. Figure S2: “With-promoter” looping sequences used in this work. Figure S3: Sequence-dependent twist stiffness. Figure S4: Looping probabilities and J-factors for the two looped states separately. Figure S5: Tether lengths of looped and unlooped states as a function of loop length and sequence. Figure S6: Tether length as a function of J-factor.
We are indebted to the late Jon Widom for the inspiration of this project and for his guidance, mentorship and friendship over many years. We thank Chao Liu, David Wu, David Van Valen, Hernan Garcia, Martin Lindén, Mattias Rydenfelt, Yun Mou, Tsui-Fen Chou, Eugene Lee, Matthew Raab, Daniel Grilley, Niv Antonovsky, Lior Zelcbuch, Matthew Moore, Ron Milo, Eran Segal, and the Phillips, Mayo, Pierce and Elowitz labs for insightful discussions, equipment and technical help; and Winston Warman at Transgenomic, Inc. (Omaha, NE, USA) and Jin Li at Laragen, Inc (Culver City, CA, USA) for special help with sequencing the poly(dA:dT)-rich DNAs.