Energy landscape theory is a powerful tool for understanding the structure and dynamics of complex molecular systems, in particular biological macromolecules1. The primary sequence of a protein defines its free energy landscape, and thus determines the folding pathway and the rate constants of folding and unfolding, as well as its native structure. Theory has shown that roughness in the energy landscape will lead to slower folding1, but derivation of detailed experimental descriptions of this landscape is challenging. Simple folding models2,3 show that folding is significantly influenced by chain entropy; proteins where the contacts are local fold fast, due to the low entropy cost of forming stabilising, native contacts during folding4,5. For some protein families, stability is also a determinant of folding rate constants6. Where these simple metrics fail to predict folding behaviour it is probable that there are features in the energy landscape that are unusual. Such general observations cannot explain the folding behaviour of the R15, R16 and R17 domains of α-spectrin. R15 folds ~3000 times faster than its homologues, although they have similar structures, stabilities and, as far as can be determined, transition state stabilities7-10. Here we show that landscape roughness (internal friction) is responsible for the slower folding and unfolding of R16 and R17. We use chimeric domains to demonstrate that this internal friction is a property of the core, and suggest that frustration in the landscape of the slow folding spectrin domains may be due to mis-docking of the long helices during folding. Although theoretical studies have suggested that rugged landscapes will result in slower folding, this is the first time that such a phenomenon has been shown experimentally to directly influence the folding kinetics of a “normal” protein with a significant energy barrier – one which folds on a relatively slow ms-s timescale.
The folding rate constants of all but the fastest folding proteins are assumed to be determined by the free energy barrier between the unfolded and transition states and a kinetic pre-factor. This is reflected in the ability of simple structural parameters, such as contact order or long-range order2,3, to predict folding rate constants. R15, R16 and R17 (~30% pairwise sequence identity) have the same three-helix bundle structure (Fig. 1a), and similar thermodynamic stabilities, but R15 folds and unfolds ~3 orders of magnitude faster than R16 and R17 (Fig. 2a)7. The folding rate constant for R15 is well predicted by contact order and long-range order plots, but R16 and R17 are outliers (Supplementary Fig. 1). Since the difference is evident in both folding and unfolding, the simplest explanation is that the transition state (TS) of R15 is more structured (and thus more stable) than that of its homologues. It is not possible to measure the free energy of the TS directly; however, experimental data show that the three TSs are similar in terms of compactness (βT), and protein engineering Φ-value analysis of all three suggests that they are generally similar in overall structural and energetic terms (same average Φ-values, and same regions of the structure folded)8-10. If such significant differences in folding kinetics cannot easily be ascribed to differences in TS structure, it is necessary to look for an alternative explanation.
Energy landscape theory has shown that protein folding is best described as a Kramers-like11 diffusive process across the energy landscape, and that the folding rate constant is dependent on a number of aspects of the landscape1. Explicitly, for a one-dimensional free-energy surface with harmonic wells, the folding time (τf) is related to the shape of the energy landscape, the height of the energy barrier (ΔG‡) and the diffusion coefficient (D0):
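For reference, a standard Kramers-type expression consistent with the quantities named above can be written as follows. This is a sketch: the curvature symbols ω_min (unfolded well) and ω_max (barrier top) are assumed notation for the "shape of the energy landscape", and k_B T denotes the thermal energy.

```latex
\tau_f \;=\; \frac{2\pi k_B T}{\omega_{\min}\,\omega_{\max}\,D_0}\,
\exp\!\left(\frac{\Delta G^{\ddagger}}{k_B T}\right)
```

In this form the folding time increases exponentially with barrier height and is inversely proportional to the diffusion coefficient, which is the dependence the following discussion relies on.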
One principal component of the diffusion coefficient is solvent friction, which slows folding. Diffusion over smooth landscapes is relatively fast, so that the folding time of a given protein generally depends only on the height of the free energy barrier and solvent viscosity. However, theory suggests that when the landscape is rough, diffusion will be slowed by the time taken for the chain to escape from the local minima which constitute the rougher (more frustrated) landscape1. One possible explanation for the slow folding of R16 and R17 is that diffusion is impeded by kinetic traps in the energy landscape. This would be remarkable because landscape roughness has never been observed explicitly for any two-state protein with a significant energy barrier (i.e. a relatively slow folding protein with folding times on the order of ms to s), although frustration resulting from formation of stable misfolded intermediates has been described12.
Equation 2 relates the rate constant for folding (or unfolding) (kf, ku) to solvent viscosity, η, and the so-termed internal friction of the protein, σ. Thus for a system with a smooth energy landscape (where internal friction is negligible, i.e. σ << η), kf will be inversely proportional to solvent viscosity. Such a relationship has been observed for a number of small proteins16-18 1.
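A form of this relationship that is widely used in the literature, usually attributed to Ansari and co-workers, is sketched below. It is assumed here rather than quoted from this text; C is an assumed viscosity-independent constant, and the second line follows only if ΔG‡ is held fixed (isostability), so that the exponential cancels in the ratio.

```latex
k \;=\; \frac{C}{\sigma + \eta}\,\exp\!\left(-\frac{\Delta G^{\ddagger}}{RT}\right)
\qquad\Longrightarrow\qquad
\frac{k_0}{k} \;=\; \frac{\sigma + \eta}{\sigma + \eta_0}
\;=\; \frac{\sigma}{\sigma+\eta_0} \;+\; \frac{\eta_0}{\sigma+\eta_0}\,\frac{\eta}{\eta_0}
```

Under this assumed form, a plot of k0/k against η/η0 is a straight line with gradient η0/(σ + η0): close to 1 when σ << η0 and well below 1 when σ is comparable to or larger than η0.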
Solvent viscosity is easily controlled through addition of small-molecule viscogens; however, these tend to increase the stability of proteins. The isostability approach, whereby the stabilisation caused by the viscogen is counteracted by the addition of a chemical denaturant, has been widely used16-18,20,21. We applied this approach to investigate the hypothesis that internal friction slows the folding of R16 and R17, but is insignificant in the fast folding R15.
The equilibrium stability and folding kinetics of all three proteins were investigated over a range of solvent viscosities using glucose as the viscogen, and guanidinium chloride (GdmCl) as denaturant (Supplementary Figs 2 and 3). Glucose increases the stability of all the proteins but the m-value decreases. This decrease in m-value is associated with a decrease in the refolding m-value, but the unfolding m-value is constant, suggesting that glucose causes a collapse of the denatured states (Supplementary Fig. 4). This effect is seen in all three proteins, and more importantly, since the unfolding m-value is unaffected by the viscogen this suggests that the TS position (and structure) is unaffected by glucose in all cases. Folding and unfolding rate constants were determined at isostability in two ways. (i) The equilibrium data were used to calculate the concentration of GdmCl at which ΔGD-N = 1.5 kcal mol−1 for each glucose concentration. kf and ku values at this GdmCl concentration were determined from the fits of the chevron plots. (ii) The chevron plots alone were used to determine rate constants at ΔGD-N = 0 (i.e. where kf = ku).
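The two isostability routes described above can be sketched in a few lines of Python. This is an illustration only: it assumes the linear extrapolation model for the equilibrium data and a linear two-state chevron (which the text applies only to R15), the function names are hypothetical, and the numbers are invented for the example rather than taken from the paper.

```python
import math


def gdmcl_for_target_stability(dg_water, m_eq, dg_target=1.5):
    """Denaturant concentration (M) at which dG_D-N equals dg_target.

    Assumes the linear extrapolation model dG([D]) = dg_water - m_eq * [D],
    with energies in kcal/mol and m_eq in kcal/mol/M.
    """
    if m_eq <= 0:
        raise ValueError("m_eq must be positive")
    return (dg_water - dg_target) / m_eq


def rates_at(d, kf_water, ku_water, m_kf, m_ku):
    """kf and ku at denaturant concentration d for a linear chevron.

    ln kf = ln kf_water - m_kf * d and ln ku = ln ku_water + m_ku * d,
    with m_kf and m_ku given as positive magnitudes (per M).
    """
    return kf_water * math.exp(-m_kf * d), ku_water * math.exp(m_ku * d)


def midpoint_from_chevron(kf_water, ku_water, m_kf, m_ku):
    """Denaturant concentration where kf = ku, i.e. dG_D-N = 0."""
    return math.log(kf_water / ku_water) / (m_kf + m_ku)


# Hypothetical parameters, for illustration only (not values from the paper).
d_iso = gdmcl_for_target_stability(dg_water=6.0, m_eq=1.5)   # route (i)
d_mid = midpoint_from_chevron(1.0e3, 1.0e-3, 2.0, 1.0)       # route (ii)
kf_mid, ku_mid = rates_at(d_mid, 1.0e3, 1.0e-3, 2.0, 1.0)
print(d_iso, d_mid, kf_mid, ku_mid)
```

Repeating either calculation at each glucose concentration gives the rate constants that are compared at constant stability.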
Relative rate constants (k0/k) were plotted vs. relative viscosity (η/η0) (Fig. 2b). Gradients for folding and unfolding rate constants were consistent. From Equation 2, at isostability, if σ is small relative to η, the gradient of this plot will be approximately 1. If, however, σ is large relative to η, the gradient will be significantly < 1. The data for R15 show that both the folding and unfolding rate constants are strongly dependent on solvent viscosity, with gradients close to 1 (Supplementary Table 1), suggesting that for R15 internal friction is negligible. However, the rate constants of R16 and R17 show only a very weak dependence on solvent viscosity (mean slopes ~0.2). This provides strong evidence that R16 and R17 fold so much more slowly than R15 at least in part because of internal friction, i.e. roughness in the energy landscape. Similar investigations have been undertaken for four other proteins which fold on timescales comparable to those of the spectrin domains16-18,20. As seen here for R15, there is no evidence to suggest that internal friction plays any role in determining the rate constant for folding of these proteins. R16 and R17 are unexpectedly atypical.
Using Equation 2 it is possible to estimate the internal friction, σ, in the transition states of the spectrin domains (see Supplementary Results/Discussion). R15 has a value of σ of 0.25 ± 0.16 cP, significantly lower than water (η ~1 cP). By contrast, the σ values for R16 and R17 are significantly higher than water (4.4 ± 1.6 cP and 12.0 ± 6.6 cP, respectively). These high values of internal friction are similar to those found in studies of dynamics of essentially fully folded proteins15,22.
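To make the link between gradient and internal friction concrete, the sketch below converts between the two. It assumes that Equation 2 has the form k ∝ 1/(σ + η), so that the gradient of k0/k vs. η/η0 is η0/(σ + η0), and it takes η0 = 1 cP as a nominal reference viscosity. Both are assumptions, the function names are hypothetical, and the authors' actual procedure is described in their Supplementary Results/Discussion.

```python
def slope_from_sigma(sigma_cp, eta0_cp=1.0):
    """Gradient of k0/k vs. eta/eta0 implied by an internal friction sigma.

    Assumes k is proportional to 1 / (sigma + eta) at constant stability.
    """
    if sigma_cp < 0 or eta0_cp <= 0:
        raise ValueError("sigma must be >= 0 and eta0 must be > 0")
    return eta0_cp / (sigma_cp + eta0_cp)


def sigma_from_slope(slope, eta0_cp=1.0):
    """Internal friction (cP) implied by a measured gradient (0 < slope <= 1)."""
    if not 0.0 < slope <= 1.0:
        raise ValueError("slope must lie in (0, 1]")
    return eta0_cp * (1.0 / slope - 1.0)


# Sigma values quoted in the text; the slopes printed are what the assumed
# form implies, not the fitted gradients reported in Supplementary Table 1.
for name, sigma in [("R15", 0.25), ("R16", 4.4), ("R17", 12.0)]:
    print(name, round(slope_from_sigma(sigma), 3))
```

Under these assumptions a small σ gives a gradient near 1 and a σ of several cP gives a gradient well below 1, which is the qualitative pattern described for R15 versus R16 and R17.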
These values of σ can be used to evaluate the relative magnitude of the ruggedness of the energy landscapes of our spectrin domains. Assuming randomly (Gaussian) distributed roughness, Zwanzig23 obtained an expression relating the amplitude of the roughness (characterised by variance ε2) on a one-dimensional energy landscape to the effective diffusion coefficient, D*:
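For reference, Zwanzig's result for Gaussian-distributed roughness is usually written in the following form. The thermal energy is written here as RT to match the molar units used in this text; in the original derivation it appears as k_B T.

```latex
D^{*} \;=\; D\,\exp\!\left[-\left(\frac{\varepsilon}{RT}\right)^{2}\right]
```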
where D is the diffusion coefficient across a given smooth landscape and ε is the characteristic magnitude of the roughness (in energy units). This expression can be used to estimate the relative landscape roughness of the slow folding proteins R16 and R17, compared to R15, Δε (see Supplementary Results/Discussion). Δε is ~1.7 RT for R16 and ~2.0 RT for R17, similar to the value observed in experimental studies on peptides, denatured proteins and small, fast folding proteins and from theory24-28. As discussed by Zwanzig, Gaussian noise gives one limit for the microscopic barrier heights; the alternative extreme of periodic barriers of equal height results in a slightly larger Δε. It is remarkable that such a small increase in landscape roughness can result in such a significant change in the viscosity dependence of the folding kinetics. In fact, these microscopic barriers must be relatively small; kinetic traps involving barriers much larger than about 3-4 RT would result in accumulation of intermediates, not observed in spectrin domains1,12,27.
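One simple reading that reproduces the Δε values quoted above is sketched below. It rests on two assumptions that are not stated in the main text: that the effective diffusion coefficient scales inversely with the internal friction σ, and that Zwanzig's expression D*/D = exp[−(ε/RT)²] applies, so that Δε² = (RT)² ln(σ_slow/σ_fast). The authors' own derivation is in their Supplementary Results/Discussion and may differ; the function name is hypothetical.

```python
import math


def relative_roughness_rt(sigma_slow_cp, sigma_fast_cp):
    """Excess landscape roughness of a slow folder, in units of RT.

    Assumes D_eff is proportional to 1/sigma and D*/D = exp(-(eps/RT)**2),
    which gives delta_eps**2 = ln(sigma_slow / sigma_fast).
    """
    if sigma_fast_cp <= 0 or sigma_slow_cp < sigma_fast_cp:
        raise ValueError("require 0 < sigma_fast <= sigma_slow")
    return math.sqrt(math.log(sigma_slow_cp / sigma_fast_cp))


# Sigma values (cP) quoted in the text for R15, R16 and R17.
print(round(relative_roughness_rt(4.4, 0.25), 2))   # R16 relative to R15
print(round(relative_roughness_rt(12.0, 0.25), 2))  # R17 relative to R15
```

Because Δε depends only on the square root of a logarithm, a roughness difference of about 2 RT already corresponds to a large ratio of internal frictions.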
What, then, is the source of the frustration in the energy landscape observed in R16 and R17? More than 200 variants of the spectrin domains have been investigated8-10. None significantly speeds the folding of R16 / R17 nor significantly slows the folding of R15. Four core-swap proteins were designed (Fig. 1b,c and Supplementary Fig. 5) containing the core residues from one parent, the “minor parent”, grafted into the “major parent” which contributes the “outside” of the core-swapped protein. R15o16c (the outside of R15 and the core of R16) and R15o17c could not be expressed solubly. However, both proteins containing the core residues from R15, R16o15c and R17o15c, fold and unfold significantly faster than their major parent (R16 and R17), suggesting that the origin of fast/slow folding lies within the core (Figs. 3a & 3b).
Furthermore, for both R16o15c and, more notably, R17o15c the increase in the rate constants is accompanied by an increase in the dependence of the rate constants on solvent viscosity, compared to their major parent (Fig. 3c, Supplementary Fig. 6a,b, Supplementary Table 1). The slopes of the relative rate constant vs. relative viscosity plots for R17o15c are comparable to those of R15, and significantly different to its major parent R17. For R16o15c, the slopes are significantly higher than those of R16, but lower than R15 or R17o15c. Thus faster folding is associated with a decrease in internal friction.
Mechanistic differences may offer insight into the source of the roughness in the energy landscapes of the slow-folding spectrin domains. R16 and R17 fold via a framework-like mechanism, in which formation of the helices precedes helix packing8,9. However, R15 folds by nucleation-condensation, where secondary and tertiary structure form concomitantly10. Evidence for different folding mechanisms is seen most clearly in the C helix (Fig. 4a). Φ-value analysis of the C helix of R16o15c clearly indicates that the pattern of Φ-values resembles R15, not R16 (Fig. 4, Supplementary Table 2, Supplementary Fig. 7, Supplementary Fig. 8). Thus slow folding and increased internal friction may be related to a framework-like folding mechanism. (Note that R16o15c has significantly reduced helical propensity in the C-helix, compared to R16 (Supplementary Fig. 9)).
In R15 nucleation, involving the central regions of the A and C helices, establishes the correct register for the docking of these long helices. In R16 and R17 however, the C helix (and to lesser extent A helix) is apparently pre-formed, and must find the correct register to dock. A potential source of conformational frustration is the occurrence of a number of non-native docking events as the polypeptide chain crosses the TS barrier. Indeed, early, out-of-register mis-docking events are seen in MD simulations of R17 folding9. We propose that this misfolding is a likely source of the frustration in the folding landscapes of R16 and R17, and note that transient mis-docking would be likely to result in roughness of the magnitude observed for transient contact formation in unfolded peptides (0.5-2 RT) as we find here. Paradoxically, studies of the 3-helix bundle homeodomain family of proteins suggest that folding via a framework mechanism results in faster folding than nucleation condensation29. The difference between the spectrin and homeodomain proteins is the size of the helices. There is, perhaps, little scope for mis-docking in the small, ~12 residue homeodomain helices whereas for the long (~30 residue) spectrin helices it is more difficult to establish the correct alignment.
All our results are consistent with the hypothesis that the slow folding and unfolding of R16 and R17 are due to roughness in the energy landscapes. We suggest that this friction results from residue-specific phenomena, such as frustration caused by mis-docking of helices. Although theoretical studies have long suggested that frustrated landscapes will result in slower folding, this is the first time that such a phenomenon has been shown experimentally to directly influence the folding kinetics of a two-state protein which folds on a relatively slow ms - s timescale. It is possible that slow folding / unfolding kinetics might be advantageous to proteins, such as spectrins, which have very long half-lives in vivo. Spectrin is a protein of the intracellular matrix of red-blood cells where it is important for membrane elasticity. Red blood cells live for ~120 days. Slow unfolding kinetics will result in far fewer domain unfolding events during this lifetime, perhaps decreasing the likelihood of degradation, or other detrimental effects. We note that mutations in spectrin domains which reduce inter-domain cooperativity, and thus also increase the likelihood of domain unfolding, result in disease30.
Protein expression and purification were carried out as described elsewhere7,10. Design of the core-swapped proteins is described in the Methods. Equilibrium stability was determined by monitoring the CD at 222 nm, and kinetics were followed by changes in fluorescence as described in ref. 7. Methods of fitting the kinetic data are described in detail in the Methods. Note that our previous work on these domains has been carried out in urea7-10; however, owing to the stabilising effect of the glucose, the stronger denaturant GdmCl was used. The exception was R17o15c: due to a combination of the destabilisation and the effect of ionic strength on its stability, all analysis of this domain was carried out in urea.
It is important to note that in the strictly comparative studies done here, all five proteins respond in the same manner to denaturant / viscogen. In all cases the position of the TS (relative to N) is unaffected by the viscogen, the denatured states show similar evidence for collapse and the free energy of unfolding is affected in the same way (Supplementary Figs 2 & 4).
Kinematic viscosity was measured using U-tube viscometers (Poulten Selfe & Lee Ltd), and multiplied by the density to find the dynamic viscosity.
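The conversion described above is a single multiplication; a minimal sketch follows, with hypothetical function names and example readings. It relies on the unit identity 1 cSt × 1 g/mL = 1 cP.

```python
def dynamic_viscosity_cp(kinematic_cst, density_g_per_ml):
    """Dynamic viscosity (cP) from kinematic viscosity (cSt) and density (g/mL)."""
    if kinematic_cst <= 0 or density_g_per_ml <= 0:
        raise ValueError("inputs must be positive")
    return kinematic_cst * density_g_per_ml


def relative_viscosity(eta_cp, eta0_cp):
    """eta/eta0, the x-axis quantity in the relative-rate plots."""
    return eta_cp / eta0_cp


# Hypothetical readings, for illustration only.
eta0 = dynamic_viscosity_cp(1.00, 1.00)   # reference buffer
eta = dynamic_viscosity_cp(1.50, 1.10)    # buffer plus viscogen
print(relative_viscosity(eta, eta0))
```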
All experiments were carried out at 25 ± 0.1 °C in 50 mM sodium phosphate buffer (except the R16o15c Φ-value analysis, which was carried out at 10 °C). For R17 and R17o15c, 5 mM DTT was added to the buffers.
R16 and R16o15c have a single proline residue, and so refolding data were described by a double exponential equation, with the major, fast folding phase accounting for ~ 80% of the amplitude. The slow phase has been shown to result from proline isomerisation7. All unfolding data and refolding data for R15, R17, and R17o15c were well described by a single exponential process.
All the chevron plots, with fits, are shown in Supplementary Fig. 3. For each domain, the sets of eight chevron plots at varying [glucose] were fitted globally. This was both to reduce error in the fits, and to allow fitting of the curvature seen in all the chevrons except those of R15.
Data collection for R15 was limited by the fast folding and the dead time of our stopped-flow instrument. As a result, the arms of the R15 chevron plot are short (Supplementary Fig. 3a), and the curvature we have previously inferred in the unfolding arm is not seen10, so the chevrons were fitted to a linear chevron fit. The very short arms made accurate fitting of the gradient of the folding arm, mkf, and the gradient of the unfolding arm, mku, difficult. Individual fitting showed that mku is unaffected by increasing [glucose], but mkf decreases (Supplementary Fig. 4a). This leads to a decrease in the kinetic m-value, mkin, where mkin = RT(mkf + mku). This decrease is comparable to the decrease in equilibrium m-value, meq, with increasing [glucose] seen in the equilibrium data (Supplementary Fig. 4a). As mkin = meq, within error, for R15, mkf was shared, and mku constrained such that RT (mkf + mku) = meq in the global fitting of R15 (Supplementary Fig. 3a; Supplementary Fig. 4b).
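The linear chevron and the m-value constraint described above can be sketched as follows. This is an illustration under stated assumptions: the chevron gradients are treated as positive magnitudes (per M), so that mkin = RT(mkf + mku) as written in the text, R is taken in kcal mol−1 K−1, and all function names and numbers are hypothetical.

```python
import math

R_KCAL = 1.987e-3  # gas constant in kcal mol^-1 K^-1


def ln_kobs_linear_chevron(d, kf_water, ku_water, m_kf, m_ku):
    """ln(k_obs) at denaturant concentration d for a two-state chevron.

    k_obs = kf_water * exp(-m_kf * d) + ku_water * exp(m_ku * d),
    with both gradients given as positive magnitudes.
    """
    return math.log(kf_water * math.exp(-m_kf * d) + ku_water * math.exp(m_ku * d))


def kinetic_m_value(m_kf, m_ku, temp_k=298.15):
    """m_kin = RT (m_kf + m_ku), in kcal/mol/M."""
    return R_KCAL * temp_k * (m_kf + m_ku)


def constrained_gradient(m_eq, m_other, temp_k=298.15):
    """Chevron gradient fixed by requiring RT (m_kf + m_ku) = m_eq."""
    return m_eq / (R_KCAL * temp_k) - m_other


# Hypothetical values: fix one gradient, derive the other from m_eq.
m_a = 2.0
m_b = constrained_gradient(m_eq=1.8, m_other=m_a)
print(m_b, kinetic_m_value(m_a, m_b))
```

In a global fit the constrained gradient is not a free parameter, which is one way of stabilising fits to short chevron arms.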
Because R16 and R17 fold so much more slowly than R15, curvature can clearly be seen in the unfolding chevron arms. These data are best fitted using a sequential transition state model31-34. R16 has been fitted in this manner previously8,35, and the longer unfolding arms seen for R17 through the use of the denaturant GdmCl, rather than urea, make it possible to fit R17 in the same way. The fitting was carried out as described in ref. 8. As for R15, the unfolding m-values (in this case m−1, m2 and m−2) were shared for each set of eight chevron plots (Supplementary Fig. 3b,c). The m-values shared correspond to the mku that was shared for R15. In R16 the kinetic and equilibrium m-values were similar, which is an indication that the fitting method is appropriate. In R17, mkin > meq (Supplementary Fig. 4d); this has been seen before for R17 using other fitting methods9 and is a characteristic of the domain, so it was not considered further here.
Curvature was seen in both chevron arms for R16o15c. At high [glucose] the curvature in the refolding arm became negative and is probably due to aggregation of a non-evolved domain. This curvature has been excluded from the analysis. Although R16o15c folds faster than R16 and R17, enough curvature was seen in the unfolding arm to allow fitting using the sequential transition state model used above, again sharing the unfolding m-values. For R17o15c, short arms displayed little or no curvature, and longer arms displayed significant curvature, irrespective of which arm they are. Because of this a broad transition state barrier model was considered the most appropriate fitting method to use35-40. This describes the rate limiting transition state moving towards the native state as the concentration of denaturant increases. In this fitting method a second-order polynomial term is added to a two-state chevron fit to account for the curvature seen in the chevron limbs. The eight chevrons were fitted globally, and the curvature term and mku shared globally. Sharing these two terms was not necessary for the fitting, but reduced fitting errors considerably.
We note that the commonly used isostability approach16-18,20,21, using denaturant to counteract the stabilising effects of the viscogen, has been questioned on the grounds that the two may not be directly additive41. It is therefore most important to note that here, in these strictly comparative studies, all five proteins respond in the same manner to denaturant / viscogen. In all cases the position of the TS (relative to N) is unaffected by the viscogen, the denatured states show similar evidence for collapse and the free energy of unfolding is affected in the same way.
The pdb structure of R15, R16 and R17 as a tandem repeat, 1u4q42, was used to determine which residues had side chains with ≤ 15% solvent accessible surface area (SASA) which were defined as core. By comparing across the three domains, outliers due to very large / small side chains were excluded, and the same 35 residues were defined as core for all three domains (Supplementary Fig. 5; Fig. 1b,c). The core residues that were not identical between the domain pairs were identified and synthetic genes produced. R16o15c (a protein with the outside of R16 and the core of R15) and R15o16c were made using overlapping primers and standard PCR techniques, and R17o15c and R15o17c were purchased from GenScript. Each was inserted into the modified pRSETA vector used for R15, R16 and R17, and expressed, purified and characterised (for R16o15c and R17o15c) as for its major parent. R15o16c and R15o17c were expressed insolubly and could not be easily refolded.
Φ-value analysis is a powerful method for investigating the structure of the transition state (TS) for folding43. The regions of the protein that are significantly structured in the TS will have high Φ-values, and the Φ-values in unstructured regions are low. The Φ-value analysis of R16o15c was carried out in a very similar manner to those of R15 and R168,10. The structure of R16o15c has not been determined; however, the similarity to its parents in terms of purification and biophysical characteristics indicates that the structures are similar. Consequently, the same positions and types of mutations were chosen as for the two parents. Furthermore, the change in stability on mutation for the core residues, ΔΔGD-N, was the same as observed in R15, suggesting that the core structure was maintained (Supplementary Fig. 8). All experimental work was carried out as for R15, including working at 10 °C, to access longer chevron arms than is possible at 25 °C. Urea was used as the denaturant, to allow direct comparison with the Φ-value analysis of the two parent proteins. Equilibrium curves were analysed as described44 and kinetic traces were analysed as described for R16 (above and ref. 10). The fitting of the chevron plots was carried out as described for wild-type in GdmCl (above) using the sequential transition state model and sharing the unfolding m-values (Supplementary Fig. 7). Folding Φ-values, Φf2M, were calculated at 2 M urea, to avoid long extrapolations (Supplementary Table 2; Fig. 4a):
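For reference, the standard definition of a folding Φ-value, evaluated here at 2 M urea, is sketched below. The superscript labels wt and mut on the folding rate constants are assumed notation.

```latex
\Phi_f^{2\mathrm{M}} \;=\; \frac{\Delta\Delta G_{\mathrm{D}-\ddagger}}{\Delta\Delta G_{\mathrm{D-N}}}
\;=\; \frac{RT\,\ln\!\left(k_f^{\mathrm{wt}} / k_f^{\mathrm{mut}}\right)}{\Delta\Delta G_{\mathrm{D-N}}}
```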
where kf(wt) and kf(mut) are the folding rate constants at 2 M urea for wild-type and mutant proteins, respectively. For surface Ala-Gly Φ-values, Ala was used as the reference (wild-type) and Gly as the mutant.
The Φ-values were calculated for the first TS, which is the rate limiting TS for WT and all mutants both in water and at 2 M urea. This TS was compared with the equivalent TS in R16, and the rate limiting TS in water for R15 (Fig. 4). Note that R17o15c is not amenable to Φ-value analysis because it is too unstable to tolerate large deletion mutations.
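As a worked illustration of the calculation described above, the sketch below evaluates a folding Φ-value from two rate constants and a stability change, using the standard definition Φ = RT ln(kf,wt/kf,mut) / ΔΔGD-N. The function name and example numbers are hypothetical; the default temperature of 283.15 K corresponds to the 10 °C used for the R16o15c analysis.

```python
import math

R_KCAL = 1.987e-3  # gas constant in kcal mol^-1 K^-1


def phi_folding(kf_wt, kf_mut, ddg_dn, temp_k=283.15):
    """Folding phi-value from rate constants measured at the same [urea].

    ddg_dn is the loss of stability on mutation (kcal/mol); very small
    values make phi unreliable, so zero is rejected outright.
    """
    if kf_wt <= 0 or kf_mut <= 0:
        raise ValueError("rate constants must be positive")
    if ddg_dn == 0:
        raise ValueError("ddg_dn must be non-zero")
    ddg_d_ts = R_KCAL * temp_k * math.log(kf_wt / kf_mut)
    return ddg_d_ts / ddg_dn


# Hypothetical mutant: folding slowed 3-fold, destabilised by 1.2 kcal/mol.
print(round(phi_folding(kf_wt=30.0, kf_mut=10.0, ddg_dn=1.2), 2))
```

A value near 1 means the mutation destabilises the TS as much as the native state (the region is structured in the TS); a value near 0 means the TS is unaffected.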
This work was supported by the Wellcome Trust (Grant number 064417/Z/01/A). B.G.W. was supported by an MRC studentship. J.C. is a Senior Wellcome Trust Research Fellow. We thank William Eaton, Peter Wolynes, Robert Best and Ben Schuler for very helpful discussions.