It is known that DNA-binding proteins can slide along the DNA helix while searching for specific binding sites, but their path of motion remains obscure. Do these proteins undergo simple one-dimensional (1D) translational diffusion, or do they rotate to maintain a specific orientation with respect to the DNA helix? We measured 1D diffusion constants as a function of protein size while maintaining the DNA-protein interface. Using bootstrap analysis of single-molecule diffusion data, we compared the results to theoretical predictions for pure translational motion and rotation-coupled sliding along the DNA. The data indicate that DNA-binding proteins undergo rotation-coupled sliding along the DNA helix and can be described by a model of diffusion along the DNA helix on a rugged free-energy landscape. A similar analysis including the 1D diffusion constants of eight proteins of varying size shows that rotation-coupled sliding is a general phenomenon. The average free-energy barrier for sliding along the DNA was 1.1 ± 0.2 kBT. Such small barriers facilitate rapid search for binding sites.
Many nucleic acid enzymes and proteins that act on DNA quickly locate target sites by diffusing along nonspecific DNA. It has been shown that proteins can both hop and slide along double-stranded DNA1–3, although the microscopic mechanism of protein motion along DNA molecules is still not understood in molecular detail. In particular, the path traced by a sliding protein molecule along the surface of DNA has not been established. Both linear paths, parallel to the DNA axis, and helical paths, following a strand or groove of the DNA around the DNA axis, have been taken as assumptions in biophysical and biochemical models. Although rotation of sliding proteins around the DNA helix was implicitly4 and explicitly5,6 anticipated, such rotation was not shown to occur during diffusive sliding. The concept of rotational coupling has also arisen among structural biologists based on concepts of molecular recognition and observations of detailed structural complementarity between proteins and DNA7–9. Despite the persistent high profile of this question in the literature, it remains unknown whether sliding proteins track the DNA helix. Such tracking would have major biophysical and biochemical implications: for example, only a limited set of enzyme-helix juxtapositions would need to be considered in questions of protein-DNA interaction. In this work, we examine the dependence on protein size of the diffusion constant for sliding along DNA in order to distinguish pure translational diffusion (Fig. 1a) along DNA from rotation-coupled (or -slaved) diffusion (Fig. 1b). The result offers insights into the mechanism of target search and recognition of all DNA-binding proteins.
As a protein moves along DNA, it experiences three different frictional forces arising from random collision with the solvent molecules, and all three are proportional to the solvent viscosity, retarding the protein’s diffusive motion. One is the friction on colinear motion parallel to the DNA axis. In addition, if the protein spins along the DNA helix, there are two rotational components of the friction: the rotational friction for motion along the offset helical path due to circumnavigation of the DNA axis, and the additional rotational friction that arises from the body-centric protein rotation.
Einstein’s treatment of translational diffusion as a Brownian motion, together with Stokes’ expression for viscous friction, indicates that the diffusion constant of a protein sliding along DNA should vary with protein size as 1/R, where R is the radius of the protein. This 1/R dependence of the 1D diffusion constant (D1) is valid if the protein experiences only translational friction as it slides along DNA, regardless of the magnitude of this friction. However, if the protein is constrained to track the DNA helix (for instance, in order to maintain optimum contact between its DNA-binding patch and the surface of the DNA helix), the protein will be forced to rotate while translating, and, as a result, the size dependence of the diffusion constant can be quite different. Inclusion of protein rotational friction leads to a much stronger dependence of the diffusion constant on R, of the order 1/R³, typical of rotational diffusion. From the cocrystal structures of DNA-bound protein molecules, we know that many proteins bind DNA with a significant offset from the DNA axis. When a protein so bound undergoes motion along the DNA double helix, the path it traces through space is not a straight line. We recently developed a theory to take the nonlinear path traced by offset protein molecules into account10. The new theory differs from Schurr’s original treatment5 by the incorporation of a helical path for sliding, as parameterized by ROC, the minimum distance from the protein center of mass to the DNA axis (Fig. 1b). The resulting expression (ref. 10 and below) provides a different numerical estimate of the friction experienced by a sliding protein molecule, although the size dependence remains close to 1/R³ when ROC approximates R.
In addition to the frictional forces, interactions between the protein and the DNA retard the diffusion of proteins. The interaction between the protein and the DNA is a sum over a large number of two-body interactions involving the atoms of the protein and the DNA, and it depends both on the distance of separation between the protein and the DNA and on their relative orientation. We can divide this complex protein-DNA interaction into an average part and a fluctuating part. The average part defines a binding potential that governs the pathway along the helix. The fluctuating part gives rise to potential energy barriers along the diffusion pathway. The average of the fluctuating potential is zero; however, because of the heterogeneity of the DNA base sequence and resulting variation in the DNA helical structure, the protein experiences random minima and barriers on its path along the DNA helix. Therefore, it is a rough (or rugged) energy landscape that the protein must navigate by diffusion (Fig. 1c).
We now quantify the effects of the rough energy landscape on the diffusion of a protein moving along the DNA helix. It is known that a rough energy landscape with small barriers can retard diffusive motion. We express this retarding factor as F(ε), as did Zwanzig11, who showed that the actual diffusion coefficient can be written as the product of the hydrodynamic diffusion constant (the value obtained in the absence of the rough potential) and F(ε). Zwanzig also showed that if the fluctuating part of the potential obeys a Gaussian distribution, then F(ε) = exp[−(ε/kBT)²], where ε denotes the rms variation of the fluctuating portion of the potential, indicating the average energy of the barriers that the protein crosses while sliding. Using the expression for the hydrodynamic friction10, we arrive at the following expression for the diffusion constant along a helix, which is dependent on the protein radius (R) and the minimum distance between the protein center of mass and the DNA axis (ROC) (Fig. 1b and ref. 10):
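The displayed equation (1) did not survive in this copy of the text. The form below is a reconstruction consistent with the description above and with the limits discussed in the following paragraphs. It assumes the standard Stokes results for a sphere of radius R, namely translational friction 6πηR and rotational friction 8πηR³, where η is the solvent viscosity (a symbol not defined in the text) and R_OC stands for ROC. It should be checked against ref. 10 before being relied on.

```latex
D_1 \;=\; \frac{k_B T}{\,6\pi\eta R \;+\; \left(\dfrac{2\pi}{b}\right)^{2}\left[\,8\pi\eta R^{3} + 6\pi\eta R\,R_{OC}^{2}\,\right]}\;F(\varepsilon),
\qquad
F(\varepsilon) \;=\; \exp\!\left[-\left(\frac{\varepsilon}{k_B T}\right)^{2}\right]
\tag{1}
```

Multiplying numerator and denominator by b² gives an equivalent form in which b² appears explicitly. Setting R_OC = 0 leaves only the 6πηR and 8πηR³ friction terms, which corresponds to the reduction to Schurr's form described in the text.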
b² describes the effect of the helix pitch on the diffusion constant for sliding. b is given with dimensions of distance per full rotation. Specifically, motion following the DNA helix would correspond to a value of 10.5 base pairs (bp) for b, or about 3.4 nm. We note that, although ε is most naturally evaluated when the protein translocates a distance of 1 bp along the DNA (Fig. 1c), ε is scale invariant (independent of b) because the measured diffusion coefficient is scale invariant for Brownian motions. If ROC is diminished to a value of zero in equation (1), Schurr’s form is recovered (without the F(ε) term on the right-hand side).
For protein-sized objects, the magnitude of the rotational friction dominates the translational friction by 1–3 orders of magnitude, and we may then ignore the translational contribution to obtain the simplified form:
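The displayed equation (2) is also missing from this copy. A reconstruction consistent with the 1/[R³ + ¾R(ROC)²] scaling quoted in the text is given below. It assumes Stokes rotational friction 8πηR³ for a sphere, with η the solvent viscosity (not defined in the text) and R_OC written for ROC, and should be verified against ref. 10.

```latex
D_1 \;\simeq\; \frac{b^{2}}{(2\pi)^{2}}\;
\frac{k_B T}{\,8\pi\eta\left[\,R^{3} + \tfrac{3}{4}\,R\,R_{OC}^{2}\,\right]}\;
\exp\!\left[-\left(\frac{\varepsilon}{k_B T}\right)^{2}\right]
\tag{2}
```

In this form, R_OC = R makes the bracket equal to 1.75 R³, that is, 1.75 times the friction of the R_OC = 0 case, in line with the statement that the friction is nearly twice Schurr's prediction.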
Many DNA-protein complexes show ROC ≈ R, in which case 1/R³ scaling of D1 is obtained, albeit with nearly twice the hydrodynamic friction predicted by Schurr. When ROC differs from R, a different (but still 1/R³-like) scaling, 1/[R³ + ¾R(ROC)²], is obtained.
However, if protein rotation is not required for motion along the DNA, then diffusion should exhibit a vastly different dependence, 1/R, on the size of the diffusing protein molecule. This difference in size dependence, 1/R versus 1/[R³ + ¾R(ROC)²], can thus serve as a tool to identify the basic mechanism of protein transport along DNA, and is used below.
We used a single-molecule fluorescence tracking assay to obtain experimental values of D1 for labeled human oxoguanine DNA glycosylase 1 (hOgg1) (R = 3.2 nm, ROC = 2.5 nm) sliding along double-stranded DNA3,12,13 (Fig. 2 and Fig. 3). The assay takes advantage of a simple flow-stretching method to prepare linear DNA templates and of total internal reflection fluorescence excitation to illuminate sliding protein molecules while excluding much of the specific fluorescence background. Because of the limited spatiotemporal resolution of this optical assay, very fine or very fast motions of the protein are not observed and the possible rotation of the protein is not easily discerned from the molecular trajectories, which appear as 1D diffusion. Thus, an indirect method (analysis of R and ROC dependence) that takes advantage of accurately determined apparent 1D diffusion constants is used.
We first chose the human DNA repair protein hOgg1 as a platform to test for protein rotation. hOgg1 has a modest interaction area with DNA14–16 and is therefore a stringent test case for helical coupling, as rotation is unlikely to be enforced by steric constraints alone. To vary the size of hOgg1, we replaced the small fluorescent label (the Cy3B dye molecule) on hOgg1 with a larger one, streptavidin decorated with dye molecules.
All labeling of hOgg1 reported here was carried out at the protein C terminus. This site is well characterized, having been designed for, and experimentally demonstrated to have, a minimal impact on the hOgg1-DNA interaction and specifically on D1 (ref. 3). Notably, the strong reactivity of a C-terminal, engineered cysteine residue relative to the internal cysteines was demonstrated by mass spectrometry. This reactivity allows quantitative labeling of hOgg1 at the engineered site with complete specificity. The C terminus of full-length hOgg1 (345 amino acids) is well removed from the DNA-binding interface, although the C terminus can be truncated to bring a label nearer to the DNA-binding interface and active site. In fact, this was done, in order to test the sensitivity of the hOgg1-DNA interaction to C-terminal labeling. hOgg1, C-terminally truncated at position 322, was labeled with small-molecule fluorophores of varying shape, chemistry and charge (both cationic and anionic dyes were tested), and the diffusion constant for sliding along flow-stretched DNA was determined in the single-molecule assay. No difference in D1 was found with respect to the position or identity of the small-molecule label, demonstrating the notable insensitivity of the hOgg1 DNA-binding interface to C-terminal labeling in general, and specifically to the presence or absence of macromolecular structures in the region of the C terminus, such as the truncated residues 323–345.
We chose streptavidin as a steric and fluorescent label for hOgg1 based on a number of considerations. First, the streptavidin tetramer has a net negative surface charge at the assay pH and very low affinity for DNA. Second, a specific and very robust attachment chemistry, the biotin-streptavidin interaction, is intrinsic to the protein and simplifies the preparation of monodisperse conjugates. Employing the same excess label-hOgg1-specific affinity chromatography strategy used to generate 1:1 hOgg1-Cy3B conjugates3, we attached labeled streptavidin to the C terminus of hOgg1 using a flexible, biotinylated poly(ethylene glycol) (PEG) linker, resulting in conjugates with R = 4.4 nm and ROC = 3.3 nm (Fig. 2 and Fig. 3).
A number of experimental observations validate our expectations that streptavidin does not interact with DNA in the single-molecule assay and that the DNA-binding constant of hOgg1 is not substantially altered by the conjugation of streptavidin. First, in a control experiment with fluorescently labeled streptavidin and hOgg1 (nonbiotinylated) in assay buffer, no binding of streptavidin to DNA was observed. Second, the average numbers of hOgg1-streptavidin and hOgg1-Cy3B molecules bound per DNA at steady state in the single-molecule assay are comparable, given similar experimental conditions (as described for hOgg1 in Online Methods with 50 pM labeled molecules): 0.47 ± 0.79 and 0.78 ± 0.92, respectively. Finally, hOgg1, hOgg1-Cy3B and hOgg1-streptavidin all elute from GE’s HiTrap SP HP resin at about the same salt concentration (130 mM NaCl) at pH 8.0, providing another indication that the electrostatic binding properties of the two conjugates are equivalent and, furthermore, are equivalent to those of the native enzyme.
The PEG linker of roughly 68 ethylene glycol units has several desirable properties that simplify the interpretation of our measurements. The linker is highly flexible and soluble (solvent-like), making it unlikely to impose undesired intramolecular interaction between hOgg1 and streptavidin, or between streptavidin and the DNA, that could affect the diffusion constant in a manner superfluous to the intended hydrodynamic perturbation. Furthermore, the high solubility of PEG gives it well-defined conformational properties that have been extensively studied and described in the polymer physics literature. For instance, the radius of gyration of a 3,000-Dalton PEG chain in aqueous buffer can be accurately calculated at 2.25 nm. The linker’s bulk serves to extend the attachment point for streptavidin away from the hOgg1-DNA interface, further reducing the chance of streptavidin-DNA interactions. Finally, the rapid relaxation of PEG chains maintains streptavidin at a constant, well-defined position relative to hOgg1 as hOgg1 undergoes translational and possibly rotational dynamics changes during sliding. Specifically, the maximum relaxation time of the linker is under 50 ns, easily fast enough to keep pace with the hOgg1-streptavidin conjugate, which takes on average more than 200 ns to slide 1 bp based on the data reported below.
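The 200 ns figure can be checked with the 1D mean-square-displacement relation ⟨Δx²⟩ = 2·D1·τ, taking D1 = 2.21 M(bp)² s⁻¹, the value reported for the hOgg1-streptavidin conjugate. This is a back-of-the-envelope estimate and not necessarily the authors' own calculation:

```latex
\tau_{1\,\mathrm{bp}} \;\approx\; \frac{(1\ \mathrm{bp})^{2}}{2 D_1}
\;=\; \frac{1\ \mathrm{bp}^{2}}{2 \times 2.21\times10^{6}\ \mathrm{bp}^{2}\,\mathrm{s}^{-1}}
\;\approx\; 2.3\times10^{-7}\ \mathrm{s} \;\approx\; 230\ \mathrm{ns}
```

On this estimate the conjugate needs more than four times the 50 ns upper bound on the linker relaxation time to advance by 1 bp.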
We now introduce a broadly accepted statistical method, the bootstrap17, and apply it to single-molecule diffusion data to extract diffusion constants. Bootstrap resampling is a simulation-based statistical inference method that can be used to obtain estimates of diffusion constants based on nearly independent, identically distributed protein displacements (see Online Methods and Supplementary Note for detail). The bootstrap estimates generally agree with diffusion constants determined using the traditional method, where trajectory average and ensemble average mean-square displacement (⟨Δx²(τ)⟩) functions are constructed and subjected to linear regression. However, the bootstrap method is more efficient (with smaller error bars) than regression, and converges to the true value faster than regression with little dependence on model assumptions (for example, Gaussian-distributed errors) while providing a straightforward, empirical means for the analysis of errors (Supplementary Fig. 1). To obtain a more accurate estimate of the hOgg1 D1, we used the bootstrap to reanalyze the hOgg1-Cy3B dataset reported in 2006 (ref. 3), and obtained D1 = 5.87 ± 0.07 M(bp)² s⁻¹, a value that falls within the error bar of the originally reported diffusion constant (4.8 ± 1.1 M(bp)² s⁻¹).
Figure 2d depicts the trajectories of hOgg1-Cy3B and the larger conjugate, hOgg1-streptavidin, diffusing along DNA. Clearly, hOgg1-streptavidin diffuses more slowly than hOgg1-Cy3B, with a diffusion constant of only 2.21 ± 0.09 M(bp)² s⁻¹. In all cases reported here, the diffusion trajectories lack long-term drift, indicating that there is no significant bias induced by the flow stream used to stretch the DNA.
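To make the resampling idea concrete, the following is a minimal, generic sketch of a bootstrap estimate of a 1D diffusion constant. It is not the authors' procedure, which is described in the Online Methods and Supplementary Note. It assumes displacements measured at a single fixed lag time, uses ⟨Δx²⟩ = 2·D1·τ, and runs on simulated data; the function names, the simulated D1 and the lag time are all hypothetical.

```python
import random
import statistics


def bootstrap_diffusion_constant(displacements, lag_time, n_resamples=1000, seed=0):
    """Bootstrap estimate of a 1D diffusion constant.

    Each resample is drawn with replacement from the observed displacements,
    and D is computed from <dx^2> = 2 * D * lag_time. Returns the mean and
    the standard deviation of the resampled estimates.
    """
    rng = random.Random(seed)
    n = len(displacements)
    estimates = []
    for _ in range(n_resamples):
        resample = rng.choices(displacements, k=n)
        msd = sum(dx * dx for dx in resample) / n
        estimates.append(msd / (2.0 * lag_time))
    return statistics.mean(estimates), statistics.stdev(estimates)


def simulate_displacements(d_true, lag_time, n, seed=1):
    """Synthetic Brownian displacements: Gaussian with variance 2 * D * lag_time."""
    rng = random.Random(seed)
    sigma = (2.0 * d_true * lag_time) ** 0.5
    return [rng.gauss(0.0, sigma) for _ in range(n)]


if __name__ == "__main__":
    D_TRUE = 5.0e6  # bp^2/s, hypothetical value used only for the simulation
    LAG = 0.01      # s, hypothetical frame interval
    steps = simulate_displacements(D_TRUE, LAG, n=2000)
    d_hat, d_err = bootstrap_diffusion_constant(steps, LAG)
    print(f"D1 = {d_hat:.3g} +/- {d_err:.2g} bp^2/s")
```

The spread of the resampled estimates serves as an empirical error bar, which is the property the text highlights; no Gaussian error model is imposed on the estimator itself.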
To make an even larger conjugate, we coupled streptavidin-coated quantum dots to hOgg1 (here with limiting hOgg1 and no purification to make 1:1 conjugates), despite concern that the outsized label could cause artifacts such as flow-induced drift, a change in the diffusion mechanism, or interactions with the surface of the flow cell. Furthermore, heterogeneity in the size of the quantum dots could falsely indicate variation in the diffusion constants among individual protein molecules and affect our analysis of size dependence. We obtained trajectories of hOgg1-quantum dot conjugates (R = 9.0 nm and ROC = 10 nm) sliding on DNA but were unable to determine a consistent estimate of the diffusion constant by using the bootstrap or by regression of mean-square displacements. For this reason, we have excluded the hOgg1-quantum dot data from our quantitative analysis of size dependence. We note that quantum dots have been used recently in studies of protein translocation along DNA, and we caution against the use of semiconductor labels in such studies because of significant reduction in the protein diffusion constant and the possible effect on the translocation mechanism.
To explore the functional dependence of D1 on R, we plotted D1 for hOgg1 and hOgg1-streptavidin versus 1/R and fit the data to a line that passes through the origin (Fig. 3a). D1 deviates strongly from 1/R across the conjugates (reduced χ² = 386), indicating that the motion of sliding hOgg1 molecules along DNA is not simple linear diffusion along the chain in a sliding, hopping or hybrid modality. A different theory is clearly required to address the observed nontrivial size dependence.
We applied our theory10 to understand the single-molecule results, noting that what we measured in the single-molecule experiments is an apparent D1. The limited temporal and spatial resolution of the experiments precludes direct observation of protein rotation, but the effect of rotation is manifest in the apparent D1. First, we plotted the measured D1 versus 1/[R³ + ¾R(ROC)²], testing for the scaling predicted by equation (2) by fitting to a line that passes through the origin (Fig. 3b). The size dependence of D1 can only show 1/[R³ + ¾R(ROC)²] scaling if hOgg1 undergoes persistent sliding with strongly dominant rotation-translation coupling. The fit is excellent (reduced χ² = 0.75), indicating that hOgg1 spins while moving along DNA. This result alone indicates that the apparent 1D motion we observe along the DNA is not mediated by hopping or microdiffusion alone, but principally by sliding of hOgg1 in close contact with the DNA, a binding interaction persistent enough to strongly couple high-friction rotation of the protein molecule to its translational motion.
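The comparison of the two scalings can be illustrated with a short weighted fit through the origin, using the R, ROC and D1 values quoted in the text for the two hOgg1 conjugates. This is a sketch under stated assumptions: the quoted ± values are used as weights, the inputs are rounded, and the authors' exact fitting procedure is not given here, so the output need not reproduce the reported reduced χ² values of 386 and 0.75. The function and variable names are illustrative.

```python
def fit_through_origin(x, y, sigma):
    """Weighted least-squares slope for y = a * x and the reduced chi-square."""
    num = sum(xi * yi / si**2 for xi, yi, si in zip(x, y, sigma))
    den = sum(xi * xi / si**2 for xi, si in zip(x, sigma))
    slope = num / den
    chi2 = sum(((yi - slope * xi) / si) ** 2 for xi, yi, si in zip(x, y, sigma))
    dof = len(x) - 1  # one fitted parameter (the slope)
    return slope, chi2 / dof


def rotation_size_factor(radius, r_oc):
    """R^3 + (3/4) * R * R_OC^2, the size term in the rotation-coupled scaling."""
    return radius**3 + 0.75 * radius * r_oc**2


# name, R (nm), R_OC (nm), D1 and its uncertainty in M(bp)^2/s, as quoted in the text
CONJUGATES = [
    ("hOgg1-Cy3B", 3.2, 2.5, 5.87, 0.07),
    ("hOgg1-streptavidin", 4.4, 3.3, 2.21, 0.09),
]

d1 = [c[3] for c in CONJUGATES]
err = [c[4] for c in CONJUGATES]
inv_r = [1.0 / c[1] for c in CONJUGATES]
inv_rot = [1.0 / rotation_size_factor(c[1], c[2]) for c in CONJUGATES]

_, chi2_translation = fit_through_origin(inv_r, d1, err)
_, chi2_rotation = fit_through_origin(inv_rot, d1, err)

print(f"reduced chi2, D1 vs 1/R:                    {chi2_translation:.1f}")
print(f"reduced chi2, D1 vs 1/[R^3 + 3/4 R R_OC^2]: {chi2_rotation:.2f}")
```

With these rounded inputs the 1/R fit gives a reduced χ² in the hundreds, whereas the rotation-coupled fit gives a value of order one, the same qualitative contrast as reported. Equivalently, the measured ratio of diffusion constants, 5.87/2.21 ≈ 2.7, is far from the ratio 4.4/3.2 ≈ 1.4 expected for 1/R scaling and close to the ratio of about 2.5 given by the rotation-coupled size factor.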
An expansive understanding of the protein-DNA interaction requires analysis of the free-energy landscape along the protein’s path of diffusion and generalization beyond observations of hOgg1 to cases where R and ROC differ significantly. Using the single-molecule method described above, we measured and analyzed the diffusion constants of eight proteins and protein conjugates of various sizes. The panel of diffusion constants consists of two published values, those for hOgg1-Cy3B3 and Escherichia coli LacI-YFP dimers18, and six newly determined values, those for hOgg1-streptavidin, Bacillus stearothermophilus MutY, E. coli MutM M74A, the adenoviral AVP-pVIc complex, the BamHI restriction endonuclease dimer, and the Klenow fragment of E. coli DNA polymerase I (raw diffusion traces for these proteins are presented in Supplementary Figure 2). This group includes proteins with diverse functions, including DNA replication, DNA cleavage, DNA repair, transcriptional regulation and proteolytic activation; variously structured DNA-binding domains, including the classical helix-hairpin-helix, zinc finger and polymerase folds; and diverse organisms of origin, including mammals, bacteria, bacteriophage and human viruses.
Recognizing that this set of proteins presents an assortment of sliding interfaces, we carried out a global analysis of R and ROC dependence, as the effect of differences in ε across the set will be muted across the broad range in size and the greater average difference between R and ROC compared with the hOgg1 conjugates. In Figure 4 we plot the measured D1 versus 1/R and 1/[R³ + ¾R(ROC)²] for these eight proteins and protein conjugates and fit the dataset with lines that pass through the origin. The result is decidedly characteristic of rotation-coupled diffusion along DNA, as D1 for this eclectic collection of proteins correlates strongly with 1/[R³ + ¾R(ROC)²] (Fig. 4a, coefficient of determination, R² = 0.94), but not with 1/R (Fig. 4b, R² = 0.52). This interesting result allows us to draw two conclusions. First, despite the fact that each protein has its own DNA-binding interface (and hence, its own ε), the F(ε) values are similar. Second, the broad diversity of the sliding interface among the eight proteins suggests that rotation-translation coupling is a mechanism common to all DNA-binding proteins.
We used equation (1) to obtain values of ε for each protein, taking advantage of the experimental values of D1 and diffusion constants calculated from the helical sliding model using R and ROC, with b fixed at 10.5 bp, indicating helical sliding that matches the helical pitch of the DNA (Fig. 4c). The result demonstrates that the values of ε calculated for all these proteins are small and tightly clustered between 0.75 and 1.35 kBT, averaging 1.1 ± 0.2 kBT (±s.d.) across the group. The independently derived ε values from the hOgg1 and hOgg1-streptavidin datasets agree closely, consistent with the fact that the two protein-DNA complexes share the same DNA-binding interface. This is a further indication that the labeling of hOgg1 with streptavidin was minimally perturbative of the nonspecific binding interaction. Although small barrier heights have been theoretically implicated19, our measurements represent the first experimental determination, to our knowledge, of the ruggedness parameter, ε.
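The inversion for ε can be sketched as follows for the two hOgg1 conjugates. Several inputs are assumptions rather than values from the text: the solvent viscosity (taken as that of water near 25 °C), the temperature, and the use of a rotation-dominated hydrodynamic diffusion constant of the form (b/2π)²·kBT/(8πη[R³ + ¾R(ROC)²]), which neglects translational friction. The resulting numbers are therefore illustrative and need not match Fig. 4c.

```python
import math

K_B = 1.380649e-23   # J/K, Boltzmann constant
TEMPERATURE = 298.0  # K, assumed
VISCOSITY = 0.89e-3  # Pa*s, assumed (water near 25 C)
PITCH_BP = 10.5      # bp per full rotation, from the text


def hydrodynamic_d1(radius_nm, r_oc_nm):
    """Rotation-dominated hydrodynamic D1 in bp^2/s, without the roughness factor."""
    r = radius_nm * 1e-9
    r_oc = r_oc_nm * 1e-9
    rot_friction = 8.0 * math.pi * VISCOSITY * (r**3 + 0.75 * r * r_oc**2)  # J*s
    rot_diffusion = K_B * TEMPERATURE / rot_friction  # rad^2/s
    # One full turn (2*pi rad) advances the protein by PITCH_BP base pairs.
    return (PITCH_BP / (2.0 * math.pi)) ** 2 * rot_diffusion


def roughness(d1_measured, radius_nm, r_oc_nm):
    """Solve D1 = D_hydro * exp(-(eps/kBT)^2) for eps, returned in units of kBT."""
    ratio = d1_measured / hydrodynamic_d1(radius_nm, r_oc_nm)
    if not 0.0 < ratio <= 1.0:
        raise ValueError("measured D1 must be positive and not exceed the hydrodynamic value")
    return math.sqrt(-math.log(ratio))


if __name__ == "__main__":
    # D1 in bp^2/s, R and R_OC in nm, as quoted in the text
    for name, d1, r, r_oc in [
        ("hOgg1-Cy3B", 5.87e6, 3.2, 2.5),
        ("hOgg1-streptavidin", 2.21e6, 4.4, 3.3),
    ]:
        print(f"{name}: eps = {roughness(d1, r, r_oc):.2f} kBT")
```

Under these assumptions the two conjugates yield ε values slightly below 1 kBT that differ by only a few hundredths of kBT, consistent with the statement that the two datasets agree closely. Because ε depends on the logarithm of the ratio of measured to hydrodynamic D1, the result is only moderately sensitive to the assumed viscosity.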
The data presented here reveal small average barriers tightly clustered around 1 kBT, which explain how these proteins can slide rapidly on the rugged energy landscape in search of their targets. Because of the strong dependence of the diffusion constant on ε, proteins whose cellular function depends on fast sliding have evolved to minimize ε. The remaining small yet statistically significant differences among the calculated ε values within this diverse set of proteins reflect the differing physical and functional constraints under which each protein’s sliding activity has evolved. Recent papers reporting sliding by several additional proteins20–23, including a study implicating helical sliding by proliferating cell nuclear antigen, are consistent with our conclusion that rotation-coupled translation is a general feature of protein sliding.
It is conceivable that differences exist among the helical paths traced by different protein types. Such differences may be important for both sliding activity and protein function. For example, different proteins may track different parts of the DNA helix, such as the major groove, the minor groove, both grooves or neither groove (in the case of backbone tracking). The degree of interaction with the bases is also likely to vary widely with protein type, the DNA state, DNA sequence, the presence or absence of cofactors and the solution conditions. New structural and biophysical approaches using sophisticated biochemical methods and perturbations can reveal the interplay between rotation-coupled sliding activity and protein function (recognition and/or catalytic activity) in thermodynamic and kinetic terms.
We have shown that nonspecifically bound protein molecules diffuse along the helical path defined by DNA and rotate in order to keep the DNA-binding face of the protein in contact with DNA during fast 1D sliding along DNA. Obligate tracking of the DNA helix by sliding protein molecules, and the accompanying 360-degree rotation per helical turn, are prerequisites for efficient recognition of targets in DNA3,16 and are necessary for the conveyance of some types of information by proteins along DNA, as recently proposed for the type III restriction enzymes24. For each protein type, a particular helical track along the DNA represents a locus of low free-energy nonspecific binding states that support rapid sliding along the rugged free-energy landscape with small barriers along the DNA. By maintaining the protein position and orientation with respect to targeted base pairs, the protein-DNA complex is able to retain a kinetically efficient target-recognizing configuration while rapidly scanning a DNA substrate.
Methods and any associated references are available in the online version of the paper at http://www.nature.com/nsmb/.
We would like to thank Y. Qi (Harvard Univ.) for providing labeled E. coli MutM, W.J. McGrath and V. Graziano (Brookhaven Natl. Lab.) for a gift of labeled AVP-pVIc, and E.S. Vanamee and A. Aggarwal (Mount Sinai School of Medicine) for a gift of labeled BamHI. We also thank G.W. Li and J. Elf for contributing LacI-Venus diffusion data. B.B. was supported partly by a grant from DST (India) and by a JC Bose Fellowship. Work at Harvard was funded by the NIH Director’s Pioneer Award and by NSF.
Note: Supplementary information is available on the Nature Structural & Molecular Biology website.
Reprints and permissions information is available online at http://npg.nature.com/reprintsandpermissions/.