PMCCPMCCPMCC

Search tips
Search criteria 

Advanced

 
Logo of bpapBiophysics and Physicobiology
 
Biophys Physicobiol. 2016; 13: 263–279.
Published online 2016 November 18. doi:  10.2142/biophysico.13.0_263
PMCID: PMC5221509

Characterization of protein folding by a Φ-value calculation with a statistical-mechanical model

Abstract

The Φ-value analysis approach provides information about transition-state structures along the folding pathway of a protein by measuring the effects of an amino acid mutation on folding kinetics. Here we compared the theoretically calculated Φ values of 27 proteins with their experimentally observed Φ values; the theoretical values were calculated using a simple statistical-mechanical model of protein folding. The theoretically calculated Φ values reflected the corresponding experimentally observed Φ values with reasonable accuracy for many of the proteins, but not for all. The correlation between the theoretically calculated and experimentally observed Φ values strongly depends on whether the protein-folding mechanism assumed in the model holds true in real proteins. In other words, the correlation coefficient can be expected to illuminate the folding mechanisms of proteins, providing the answer to the question of which model more accurately describes protein folding: the framework model or the nucleation-condensation model. In addition, we tried to characterize protein folding with respect to various properties of each protein apart from the size and fold class, such as the free-energy profile, contact-order profile, and sensitivity to the parameters used in the Φ-value calculation. The results showed that any one of these properties alone was not enough to explain protein folding, although each one played a significant role in it. We have confirmed the importance of characterizing protein folding from various perspectives. Our findings have also highlighted that protein folding is highly variable and unique across different proteins, and this should be considered while pursuing a unified theory of protein folding.

Keywords: free-energy profile, contact-order profile, circular permutation, framework model, nucleation-condensation mechanism

Understanding the problem of protein folding poses a great challenge, particularly from a biophysics perspective. The protein-folding problem involves three major issues: (1) how the three-dimensional (3D) structure of a protein is determined from its amino acid sequence, (2) how a protein folds from a random coil to a native structure, and (3) how the 3D structure is characterized from static and dynamic perspectives. The second problem, which concerns the folding process (or folding pathway) of a protein, is the focus of this paper.

Regarding the folding process, there are currently two major hypotheses: the three-step mechanism or framework model [14] and the nucleation-condensation model [5,6]. In the framework model, secondary-structure elements fold first, followed by the coalescence of preformed secondary-structure elements to yield the native structure. The occurrence of step-by-step folding from short- to medium- and then to long-range interactions is a key assumption. In contrast to this, the nucleation-condensation model assumes that the secondary and tertiary structures are formed in parallel, or even in reverse order. That is, the long-range interactions or tertiary structures can be formed before or at the same time as the secondary structures. Interpretations of these two hypotheses seem to differ among researchers. For convenience and clarity, in this paper we refer to these two hypotheses but with the following specific preconditions: the framework model does not allow longer-range interactions to form before shorter-range interactions, but the nucleation-condensation model does.

The above key principles have contributed significantly to the conceptual basis of protein folding. However, there is currently a critical limitation: the difficulty of quantitatively connecting theoretical predictions to experimental results [7]. Researchers have not been able to use an existing theory to consistently interpret their experimental results. Progress has been made towards eliminating this limitation by using improved simulations based on molecular dynamics and analytical calculations that use simple statistical-mechanical models. Since computer simulations of protein folding are time-consuming and can be performed only for one specific protein at a time, simple statistical-mechanical models of protein folding—which permit the direct analysis and fitting of experimental data—have been desired.

An Ising-like statistical-mechanical model of protein folding was first introduced by Wako & Saitô in 1978 [8,9]. This model was developed further by Gō & Abe [10], and Abe & Gō [11] demonstrated that it could accurately reproduce the results of computer simulations of two-dimensional lattice-protein folding. However, it was some time before a similar model could be applied to real proteins. This was first accomplished by Muñoz & Eaton in 1999 [12], who demonstrated that theoretically calculated folding rates could correlate well with experimentally observed ones.

In this paper, the Φ values for every residue of a specific protein calculated using a simple statistical-mechanical model are discussed and compared with the corresponding experimentally observed Φ values. Despite the simplicity of the statistical-mechanical model, the experimental Φ values were reasonably reproduced for many proteins, but not for others. Whereas the successful cases supported the assumptions of the model, the failures revealed the limitations of those assumptions. We will discuss protein-folding mechanisms while examining specific examples where the model succeeded and where it failed.

Φ-value analysis was introduced by Fersht and his colleagues [13,14] to provide information regarding the transition-state structures along the folding pathway of a protein by measuring the effects of amino acid mutations on folding kinetics. The Φ value is calculated as ΔΔG‡-D/ΔΔGN-D for a two-state-folding protein, where ΔG‡-D and ΔGN-D are the free energy (FE) differences between the unfolded and transition states and between the unfolded and folded states, respectively, and ΔΔ denotes the changes in these FE differences brought about by a point mutation at a specific amino acid residue. A Φ value of 1 indicates that all of the mutated residue interactions are formed in the transition state whereas a Φ value of 0 means that the residue is not involved in stabilizing the transition state. Intermediate Φ values indicate that the interactions are partially formed or that there are two populations with mostly unfolded and mostly folded states. However, because the relationship between the actual Φ value and the extent of protein structural formation is not necessarily linear, the interpretation of intermediate Φ values is still controversial [15].

Φ values have been investigated experimentally for many two-state-folding proteins (see Table 1). Φ values obtained in such experiments have provided valuable information for refining protein-folding models such as the nucleation-condensation model and the framework model [16].

Table 1
Proteins examined in this study

Φ values have also been studied from a theoretical perspective. Since folding simulations can provide information about the folding pathway of the protein, the transition-state structures in these simulations are associated with the experimentally observed Φ values [1719]. More directly, Φ values have been calculated based on the FE profiles obtained from statistical-mechanical models of protein folding [12,2027]. Although various models have been proposed, their concepts are essentially similar and assume that each residue is in one of two states: native (folded) or random-coil (unfolded). It is also assumed that one or more contiguous segments in the polypeptide chain consists of residues in the native state, and that only individual amino acids in a pair within a native-structure polypeptide segment and in contact in the native structure interact with each other (this is the so-called Gō model). Zero interactions are assumed to occur in segments consisting of residues still in the random-coil state. In these studies, the calculated Φ values generally showed a reasonable agreement with the experimental Φ values.

In this paper, we report the results of Φ-value calculations made using a previously developed, alternative, simple statistical-mechanical model of protein folding [2830]. Although this model was developed earlier than those mentioned above, various studies have recently adopted this model or others that are similar but extended, to study the protein-folding problem [25,27,3138]. We also determined the Φ values for lattice proteins using this alternative model with particular focus on the dependence of lattice proteins on amino acid sequence and folding topology [28,29]. In line with these studies, we have calculated the Φ values of real proteins in the present study. For these calculations, we selected 27 proteins (each with 54 to 128 amino acid residues in length) with known experimentally derived Φ values.

In the next section, we briefly describe a statistical-mechanical model of protein folding that we developed. The calculated Φ values are shown and discussed in Results and Discussion along with the FE profiles, contact-order (CO) profiles, and sensitivities to the parameters used in the Φ-value calculation. In this study, we investigated protein folding from various perspectives rather than assessing the specific statistical-mechanical model used here, which we have already done in a previous paper [30].

Materials and Methods

Statistical-mechanical model of protein folding

The simple statistical-mechanical model of protein folding and unfolding used in this study was constructed with the intention of introducing the following picture of protein folding [811,2830]. A protein folds in a stepwise manner along the polypeptide chain; this assumption conforms to the framework model. In this scheme, the first stage of folding includes short-range interactions that work dominantly to form small, native-like, or secondary structures such as the α-helix, β-strand, and hydrogen-bonded turn. Next, these structures grow gradually through medium-range interactions. Finally, these substructures coalesce into the native structure via long-range interactions.

It should be noted that our model is not exactly the same as the framework model. Whereas the framework model usually assumes a highly defined folding pathway in which specific local structures (e.g., secondary structures, super-secondary structures, domains) are formed in a hierarchical way, our model is more probabilistic, i.e., any local structures can form in any order according to their statistical weights, as described below. When we say that our model is based on the framework model, we refer to the important feature common to both the framework model and our own: longer-range interactions are not allowed to form without forming short-range interactions first.

The statistical-mechanical model of the folding and unfolding of a protein consisting of n residues in the above-mentioned picture is formulated as follows [2830] (see Supplementary Text S1 for details).

  1. Each amino acid residue is assumed to be in either a native (folded) or random-coil (unfolded) state.
  2. The conformation of a protein at any stage in the folding process is represented by a sequence of two types of regions of various sizes, namely a “local structure” (or “island” in Wako & Saitô [8,9]) and a random-coil region, arranged alternately along the chain. The term “local structure” is used with a specific meaning in this model. A local structure and a random-coil region are defined as continuous regions in which all amino acid residues are in the native state and random-coil state, respectively. It should be noted that the number of local structures is not restricted to one in this model; this is important because some models have restricted the number of local structures formed to only one.
  3. The key assumption of this statistical-mechanical model is that only Gō-type native interactions between amino acid residues within a local structure are considered [39]. The other interactions, such as those between the residues in different local structures and those within a random-coil region, are neglected.
  4. For the FE within a random-coil region (where no interaction between the residues exists), it is assumed that only the chain entropy, which depends on the number of residues, contributes to the partition function. The random-coil state is the reference state; i.e., its statistical weight is set to unity.
  5. The minimum size of a local structure is four consecutive residues. An amino acid residue i is considered to be in the native state if a region of four consecutive amino acid residues, (i–1) to (i+2), adopts the same conformation as the native structure (i=2, 3, ..., n–2). Otherwise, the amino acid residue is considered to be in the random-coil state. In other words, it is assumed that no interactions occur between amino acid residue i and amino acid residues i±1 and i±2.

The partition function for this statistical-mechanical model is given by the following recurrent relationship:

Z1,j=Z1,j-1+m=1j-3f(m,j)-1exp {-βE(m,j)}Z1,m+1,(j=4,5,....,n-1,n)Z1,1=0,Z1,2=Z1,3-1,
(1)

where β=1/kBT, kB is the Boltzmann constant, T is the absolute temperature, and Z1,j is the auxiliary partition function of a hypothetical protein molecule consisting of amino acid residues 1 to j. By definition, the partition function of the entire protein molecule is Z1,n ([equivalent]Z(T)).

The conformational energy of a local structure consisting of amino acid residues m to j is given as

E(m,j)=mk,ljU(ξk,ξl)Γk,l,
(2)

where U(ξk, ξl) is the interaction energy between amino acid residues ξk and ξl. The contact matrix Γk,l=1 if amino acid residues k and l are in contact with each other in the native conformation, and Γk,l=0 otherwise. Because the interactions between neighboring residues and residues separated by one residue are not considered, Γk,l=0 for |kl|≤2. U(ξk, ξl) depends on amino acid types ξk and ξl. Γk,l, i.e., whether the residue pair is in contact or not depends on the native structure. Consequently, E(m, j) depends on the amino acid sequence and the native structure. In other words, the amino acid sequence and the native structure are considered through E(m, j) in this model.

Function f(m, j) in Eq. (1) corresponds to the number of possible conformations of a segment between the amino acid residues m and j in the random-coil state; thus, kB ln f(m, j) is the chain entropy of the segment in the random-coil state, and −kB ln f(m, j) is the entropy loss of the segment when it forms the local structure.

Following our previous paper [30], we assigned a single value epsilon<0 to U(ξk, ξl), independent of the amino acid type, ξk or ξl (homogeneous contact-energy approximation). We set epsilon=−0.10 for all calculations in this study. For f(m, j), because we did not have any established values representing a real protein, we tentatively used the same formulation values that had been used for the lattice protein in the previous paper, i.e., f(m, j)=1.4084×(4.750)jm–2 or ln f(m, j)=(jm–2) B+ 0.3425, where B=ln 4.750=1.5581. In addition, we examined another B value, i.e., B=1.3 for comparison in this study.

A contact matrix Γk,l is defined as follows (as in the previous paper [30]): if a distance of at least one atom pair in two residues is shorter than a given cutoff distance Dc, this residue pair is considered to be in contact. We performed Φ-value calculations with four Dc values, 4.0, 5.0, 5.5, and 6.0 Å, in our previous study; and in that study, we found that the results were sensitive to the Dc value for some proteins but not for others. In the current study, however, two cases, Dc=4.2 and Dc=5.5 Å, were examined to study Dc dependence in the Φ-value calculations.

Free-energy profile

The energy (enthalpy) of the system, Eh, is expressed by the integer h in units of −0.01, i.e., Eh=hepsilon0 and epsilon0=−0.01, for computational convenience. Eventually, the partition function Z is given as a polynomial with two variables, t and u, as a function of the temperature T [29,30,40].

Z(T)=ηhΩ(η,h)tηuh=η=0n-3W(η,T)tη,
(3)

where

W(η,T)=hΩ(η,h)uh,
(4)
u=exp(-βɛ0),
(5)

and t is a dummy parameter introduced to count the number of amino acid residues in the native state η, and is set to unity in the last result. η runs from 0 to n–3. The coefficient Ω(η, h) for given values of η and h can be calculated using the recurrent equation, Eq. (1). W(η, T)tη is the sum of the statistical weights over all states with the given number of amino acid residues in the native state η and at a temperature T.

We define the FE for a given η from Eq. (4):

F(η,T)=-kBTln W(η,T).
(6)

Eq. (6) is used to calculate the FE profiles of the proteins with η as the folding reaction coordinate. The enthalpy and entropy at a given temperature T can also be calculated (see Supplementary Text S1), but they are not discussed in this paper.

Folding and unfolding rates

The kinetics of the folding and unfolding processes of proteins (such as their folding and unfolding rates) were formulated as a motion along a one-dimensional FE profile, with the number of amino acid residues in the native state as a reaction coordinate, as established by Muñoz & Eaton [12] and extensively examined by Henry & Eaton [41]. Because we had the FE profile F(η, T) of Eq. (6), we applied the method of Muñoz & Eaton. According to this method, using a simple approach that involved solving a system of differential equations describing reversible hopping between adjacent discrete values of reaction coordinates (η and η+1 in this study), the characteristic relaxation rate can be given as follows:

1kτ0<η>eq-<η(t)><η>eq-<η(0)>dt={<η>eq-<η>0}-1j=0n-41peq(j)sj,j+1i=j+1n-3{peq(i)-p0(i)}η=0jpeq(η)(<η>eq-η).
(7)

Here, an equilibrium value of η (at temperature T),

<η>eq=η=0n-3ηpeq(η)
(8)

can be calculated using F(η, T), where peq(η) is the probability that a conformation has η amino acid residues in the native state:

peq(η)=Z-1exp {-F(η,T)/kBT}.
(9)

<η>eq and peq(η) are functions of T, but T is omitted here for clarity.

The relaxation rate k is estimated as the mean rate of relaxation of the average number of native residues to its equilibrium value from a starting point of two initial conditions: first with the entire population in the completely unfolded state, i.e., η=0, and second with the entire population in the native state, i.e., η=n–3. In this paper, the former relaxation rate k above 1/Tm and the latter k below 1/Tm are regarded as the folding and unfolding rates, kf and ku, respectively. A transition temperature, Tm, is defined as ln kf(Tm)= ln ku(Tm).

Φ value

We considered a single amino acid substitution for each residue of a given protein. For a single amino acid substitution at the kth residue (for example, if amino acid residue ξk in the wild-type protein is replaced by amino acid wk), we simply assumed that U(ξk, ξl) is transformed to U(wk, ξl) in Eq. (2). Thus, the Φ values of the kth residue were calculated for the two cases: one where U(wk, ξl)=epsilon+0.01 and the other where U(wk, ξl)=epsilon−0.01, hereafter referred to as the “plus perturbation” and the “minus perturbation,” respectively (recall that U(ξk, ξl)=epsilon for each residue pair in the wild-type protein). The responses to the substitutions were examined based on changes in the logarithmic folding and unfolding rates, ln kf and ln ku, respectively, determined in this study. The corresponding rate changes, Δln kf and Δln ku, were used to calculate the Φ value defined by Eq. (10):

Φ=ΔΔF-DΔΔFN-D=Δln kfΔln kf-Δln ku,
(10)

where Δln kf=ln kfmut(Tm)-ln kfwild(Tm) and Δln ku=ln kumut(Tm)-ln kuwild(Tm), because ln kfwild(Tm)=ln kuwild(Tm) and Δln kf-Δln ku=ln kfmut(Tm)-ln kumut(Tm). The superscripts “wild” and “mut” denote the wild-type and mutated proteins, respectively.

From the two Φ values calculated by the plus and minus perturbations, a Φ value for a specific residue is defined as follows. (i) A Φ value greater than 1.3 or smaller than −0.3 is considered “irregular” and is then regarded as having no defined Φ value. (ii) If 1.0<Φ<1.3, then Φ is set to 1.0, and if −0.3<Φ<0.0, then Φ is set to 0.0, because the Φ value is essentially defined as 0≤Φ≤1. Furthermore, (i) and (ii) are applied separately to the Φ values obtained in the plus and minus perturbations. (iii) The two Φ values are averaged. If only one Φ value is available, then it is defined as a Φ value of this residue. If neither of them is available, then the Φ value of this residue is classified as “not defined.”

Contact-order profile

To investigate the calculated Φ values, we generated a CO profile—a concept introduced in the previous paper [28]. CO has been used to characterize the complexity of the folding topology of a protein in relation to its folding kinetics [4246]. In line with this, the CO profile is defined as the cumulative number of native contacts ck=i=1kρi plotted against k, where ρi is the number of native contacts between two residues whose mutual distance along the polypeptide chain is i. The following relationship holds [28]:

k=1n-1ck=ncn-1-k=1n-1kρk.
(11)

The second term on the right-hand side, ∑k, corresponds to the area of the upper left region of the k vs. ck curve and is equal to the CO of a given protein (see Fig. 4 below). The CO profile is more informative than the CO, as described below.

Figure 4
Contact-order profiles of 27 proteins. They were classified into three types, A, B, and C. The dashed curve is y=− (1−x)2+1, where [x] and [y] are [the distance between two residues along the chain] and [the cumulative number of residue ...

Proteins examined

Using the simple statistical-mechanical model, Φ values were calculated for 27 proteins for which Φ-value analysis had already been performed experimentally. The Protein Data Bank (PDB) entry codes, sizes, and fold classes of the proteins that we investigated are listed in Table 1. This protein set includes some proteins that belong to the same superfamily, such as 1SRM, 1SHG, and 1FYN, all of which are in the SH3 domain-containing superfamily. However, since the available data were limited, we did not omit data from homologous proteins; some would consider the data from such proteins to be redundant, but from a practical standpoint, their omission would make performing a statistical analysis far more difficult. Furthermore, comparisons between homologous proteins were expected to actually provide more useful information about the nature of protein folding. Accordingly, we used all the available data, disregarding perceptions of protein redundancy, in the following analyses. It is necessary, however, to remember this point in the statistical discussion below.

For each protein studied, the PDB entry code given in Table 1 is used as the protein name hereinafter, for convenience.

Results and Discussion

In this study, we were interested in characterizing protein folding from various perspectives. As is well known, protein size and fold class are primary characteristics of a protein that determine its folding dynamics. We can also characterize a protein by its Φ-value profile, FE profile, and CO profile. In addition, a Φ-value’s sensitivity to the parameters used in the Φ-value calculation can be examined. The Φ-value calculation was performed for four sets of parameters, hereinafter referred to as D42_13, D42_16, D55_13, and D55_16, each of which denote parameter values (Dc, B) = (4.2, 1.3), (4.2, 1.5581), (5.5, 1.3), and (5.5, 1.5581), respectively (see Materials and Methods). The following data pairs were used as described here: (D42_13, D42_16) and (D55_13, D55_16) were used to investigate the sensitivity to the chain entropy parameter B, and (D42_13, D55_13) and (D42_16, D55_16) were used to investigate the sensitivity to the cutoff distance Dc.

The results are summarized in Figure 1. We will discuss them below.

Figure 1
Classification and characterization of proteins. The proteins were classified according to their correlation coefficient (CC) values between the calculated and experimentally measured Φ values, as well as their sensitivity to plus/minus perturbation, ...

Φ-value profile

Φ-values are one of the few types of experimental data that can both A) give us information on the protein folding process, and B) be directly compared with theoretically calculated properties. In Figure 2, the Φ-value profiles calculated for every residue using the abovementioned statistical-mechanical model are shown with the corresponding experimentally observed Φ values for four example proteins: 1TEN, 1SHG, 3CI2, and 1AYE. Experimentally observed Φ values were available for some, but not all, of the residues. The Φ-value profiles for the other proteins are given in Supplementary Figure S1.

Figure 2
Φ-value profiles of proteins, (A) 1TEN, (B) 1SHG, (C) 3CI2, and (D) 1AYE. The Φ-value profiles of the other proteins are shown in Supplementary Figure S1. [1], [2], [3], and [4] indicate the parameter sets D42_13, D42_16, D55_13, and D55_16, ...

The theoretically calculated Φ-value profiles were assessed by their correlation coefficient (CC) with the experimentally observed Φ values. The results are shown in Table 2. The results from Garbuzynskiy et al. [73] are also shown, when available, for comparison. For Table 2, if the Φ-value experiments were carried out in more than one condition, we show only the case with the best CC. All the results are given in Supplementary Table S1.

Table 2
Correlation between experimentally observed and theoretically calculated Φ values

In our previous study [30], we examined CC values to assess our statistical-mechanical model of protein folding. In this study, however, we shift our perspective to the characterization of protein folding using CC values, because they may imply whether or not a specific protein folds in the scheme assumed in our statistical-mechanical model. We consider here that differences in CC values reflect variation in active protein folding mechanisms. In Figure 1, proteins are classified into three groups according to the best CC value for each protein: CC≥0.6, 0.3≤CC<0.6, and CC<0.3. We assess the groups as follows: the “CC≥0.6” group is a case where our model successfully reproduced the experimental Φ values, the “CC<0.3” group is a case where our model failed, and the “0.3≤CC<0.6” group is an intermediate case—where the experimental Φ values of some regions in a specific protein were successfully reproduced by our model, but others were not. In other words, the “CC≥0.6” group and the “CC<0.3” group are rough indicators of the two major protein-folding schemes: the framework model and the nucleation-condensation model, respectively.

As mentioned in the Introduction, the concepts of the framework model and the nucleation-condensation model somehow remain ambiguous. In this paper, we confine our attention to their difference relating to the order in which short-, medium-, and long-range interactions are formed; that is, whereas longer-range interactions are assumed never to be formed before intervening shorter-range interactions in the framework model, long-range interactions can form before shorter-range interactions in the nucleation-condensation model.

(1) CC≥0.6. Out of 27 proteins examined, ten proteins were classified into this category. Two examples are shown in Figures 2A and B. See Supplementary Figure S1 for the remaining eight proteins.

Figure 2A for protein 1TEN shows that, although the Φ-value profile shows irregular trends, the experimentally observed Φ values were accurately reproduced. As described below, 1TEN is highly sensitive to plus/minus perturbations (it is also sensitive to other parameters used in the Φ-value calculation). The minus perturbation generated irregular Φ values for many residues. Rather than using the average of the two Φ values generated by the plus and minus perturbations, the Φ values generated by the plus perturbation were assigned to the Φ values of such residues. It is interesting that despite this potential confounding issue, 1TEN was found to belong to the high CC group.

Figure 2B for protein 1SHG is another example of CC≥0.6. Among the studied proteins, 1SHG showed high sensitivity to the parameters Dc and B. Whereas the CC was very high for Dc=5.5 Å, it was very low for Dc=4.2 Å. As for B, the CC for D42_13 was less than that for D42_16, whereas no difference was found between D55_13 and D55_16. The Φ-value profiles for the four parameter-sets differed significantly. However, because the number of experimentally observed Φ values was small relative to the number of residues, the statistical significance of the CC values was low. For example, although the calculated Φ values of the residues in the long N-terminal loop differed considerably depending on the parameter sets, such differences could not be assessed because of the lack of corresponding experimental data.

(2) 0.3≤CC<0.6. This category contains ten proteins. Only one example is given in Figure 2C for 3CI2. Whereas the Φ values of the residues in the N- and C-terminal regions were accurately reproduced, those in the middle of the polypeptide chain showed significant difference from the experimental Φ values. The statistical-mechanical model of a linear chain polymer has a tendency, in general, to have lower Φ values at both of the terminals because of a boundary effect, and to have higher Φ values in the middle of the chain because of strong cooperativity (owing to more interactions being present in the middle than at the terminals). Similar situations were observed in other proteins belonging to this category, such as 2ACY, 1BF4, and 1TIU (see Supplementary Fig. S1).

Figure 2C for protein 3CI2 also demonstrates a case where there was a variety of experimentally observed Φ values for a specific residue.

(3) CC<0.3. This category is composed of the remaining seven proteins. These cases, which did not successfully reproduce experimentally determined Φ values, served to highlight the differences between the framework model and the nucleation-condensation model. Whereas our model is based on the framework model, the experiments suggested that the nucleation-condensation model is active for this protein.

Only one example is shown in Figure 2D for protein 1AYE. This protein has the secondary-structure sequence β1-α1-β2-β3-α2-β4. The experimental results suggested that while the folding nucleus is made by the packing of β1 and β3-α2 to form a β-sheet, β1-β3, in the transition state, the remaining parts, α1-β2 and β4, are completely unfolded. This means that the two separated regions, β1 and β3-α2, interact first before the formation of α1-β2 between them. Such a situation is beyond the scope of the assumptions of our model.

In other examples of CC<0.3 (Fig. 1; see also Supplementary Fig. S1), protein 2ABD has the secondary-structure sequence α1-α2-α3-α4. The experimental results suggested that the interaction between the N- and C-terminal α-helices, α1 and α4, initiates the folding process before the formation of the central part, α2-α3. Contradicting the experimental results, the calculated Φ values indicated that α2-α3 was formed but α1 and α4 were unstructured in the transition state. As a result, the CC for 2ABD was largely negative. It should be noted, however, that the experimental Φ values around residues 20–40 varied widely, and the authors of the study that generated those experimental values [58] pointed out that some of the data for these residues were not well suited for Φ-value analysis because the mutations probably significantly changed the folding pathway of the wild-type protein.

The 1BTB protein, which is another example from the “CC<0.3” group, has the secondary-structure sequence β1-α1-α2-β2-α3-α4-β3. The experimental results suggested that the β-sheet composed of β1 and β3 in the N- and C-terminals, respectively, is formed in the early stage of folding. By contrast, the Φ-value calculations indicated that whereas the C-terminal is structured in the early stage of folding, the N-terminal region is not.

In the case of protein 2PTL, which has the secondary-structure sequence β1-β2-α1-β3-β4, there may be another reason for observing CC<0.3. Whereas the N-terminal β-sheet of β1 and β2 was well predicted by the Φ-value calculations, the unstructured α-helix in the middle of the polypeptide chain could not be reproduced very well. As discussed in our previous paper [30] and also as pointed out above, our model has a strong tendency to describe the early formation of an α-helix in the middle part of the polypeptide chain, owing to the dominance of short-range interactions in an α-helix and the avoidance of boundary effects.

We should be careful, however, when using CC values for the above discussions, because CC value may not necessarily be a reliable characteristic for assessing calculated Φ values. Whereas these calculations can generate a Φ value for any residue, the available experimental Φ values were limited. It is not clear how to assess irregular Φ values or residues for which Φ values cannot be obtained experimentally. For example, calculating the CC value after removing two Φ values of the C-terminal region from the 19 experimentally observed Φ values for protein 1TTF (see Supplementary Fig. S1) would result in changing the CC value from −0.09 to 0.28 in the D42_16 case. Moreover, Φ values for a specific protein frequently differ significantly, depending on the experimental conditions (see Supplementary Table S1).

FE profile

Since Φ-value profiles are calculated based on FE profiles, it is expected that FE profile characterization should be associated with protein-folding characterization. In Figure 3, the FE profiles of three proteins, 3CI2, 1RIS, and 1URN are shown for four parameter sets, D42_13, D42_16, D55_13, and D55_16 (FE profiles for the other proteins are given in Supplementary Fig. S2). Since the profiles shifted upward or downward depending on Dc, i.e., the number of interactions, we focused on the shapes of the profiles. It was found that the profile shapes were essentially similar if their chain entropy parameter B values were equal. A few exceptions were observed, such as 1CSP and 1N88. Conversely, a change in B value causes a change in shape of the FE profile. The major differences between the shapes of the FE profiles appeared in the region from the transition state to the folded state, i.e., from the central peak to the valley on the left side of the FE profile. As far as the unfolded region is concerned, the profile shape is conserved, whereas the distance from the local minimum to the peak at the transition state can change.

Figure 3
Free-energy profiles of proteins, (A) 3CI2, (B) 1RIS, and (C) 1URN. The vertical axis on the left is used for D42_13 and D42_16, while that on the right is used for D55_13 and D55_16. The horizontal axis traces the progress of the protein folding reaction ...

It was also found that, while the FE profiles of some proteins have a single peak at the transition state, others have more than one peak (usually double peaks) or a plateau. The latter case means that a transition state cannot be assigned to a single folding reaction coordinate, but should be assigned to a range of coordinates. The secondary peak in a double-peaked profile is sometimes ambiguous, making it difficult to confidently judge its significance. We classified FE profiles into three types: single-peaked, double-peaked, and intermediate [denoted by (S), ((D)), and (D), respectively, in Fig. 1]. This classification could not be defined in a rigorous manner, but was carried out by visual inspection of the profiles. The FE profiles of 3CI2, 1RIS, and 1URN shown in Figure 3 were classified as single-peaked, intermediate, and double-peaked, respectively.

1CSP and 1N88 were exceptional cases, as mentioned above. Interestingly, while the FE profile of 1CSP with the parameter set D42_16 was single-peaked, the FE profiles calculated with the other parameter sets were double-peaked. This difference is reflected in the CC values: whereas the CC value was the lowest for D42_16, that for D55_16 was the highest. Accordingly, the FE profile of 1CSP was classified as double-peaked. For the FE profiles of 1N88, the shapes of the FE profiles for D42_16 and D55_16 differed considerably. While the former profile had a plateau, the latter was double-peaked. The CC values for the former profile were slightly better than those for the latter one.

According to Figure 1, whereas in the “0.3≤CC<0.6” and “CC≥0.6” groups, more proteins had a single-peaked FE profile than a double-peaked FE profile (10 vs. 8), this was reversed in the “CC<0.3” group (2 vs. 4). However, this difference was not statistically significant. Consequently, whether the FE profile was single-peaked or double-peaked was not a crucial factor for the CC values. The Φ values were essentially defined based on the single-peaked FE profiles. However, since Φ values can be calculated in the fashion described above, regardless of the double-peaked FE profile at the transition state, it may be necessary to reconsider Φ-value calculation for double-peaked FE profiles. The theory of multidimensional representation of an FE profile of protein folding that was proposed by Itoh & Sasai [27] would be highly relevant to any further considerations of this issue.

At the outset of this study, we expected to obtain useful information about protein folding by determining the FE profiles of the evaluated proteins. In our previous paper [30], we demonstrated a close relationship between a protein’s FE profile, its folding rate, and its fold class; however, in this study we realized that it is necessary to consider the fine structural changes of the FE profile to be able to predict changes in Φ values. For example, the FE profiles of 1ENH and 1IDY (Supplementary Fig. S2) seemed to be almost identical. Consequently, their calculated Φ-value profiles were grossly similar, but differed significantly in their fine details. FE profile changes that were induced by imposing perturbations were responsible for the observed changes in Φ values. However, it was hard to find such changes in the FE profiles, because such changes were not only small, but were also distributed over all regions of the profile. Similar situations existed for proteins in the SH3 domain-containing superfamily, namely 1SRM, 1SHG, and 1FYN (see Supplementary Figs. S1 and S2).

CO profile

CO is one property of proteins that is well-known to be correlated with folding kinetics. The CO profile, which was discussed above, is an extension of the CO; the cumulative number of native contacts ck, is plotted against residue-residue distances along a chain, k. This profile provides information about the respective contribution ratios of the different interaction types (short-, medium-, and long-range) to the CO. Figure 4 shows the CO profiles of 27 proteins, in which ck and k were normalized between zero and one to adjust their protein sizes. It was found that the CO profiles could be classified into three types, referred to as Types A, B, and C (Fig. 4A, B, and C, respectively). In this classification, the reference curve (a dashed line in Fig. 4), ck=− (k–1)2+1 (with normalized ck and k), was used. This reference curve can be utilized if the number of native contacts of a residue-residue distance along the chain i decreases in inverse proportion to i, i.e., ρini, where n is the number of residues of a given protein.

If a whole CO-profile curve was located above the reference curve, the protein was classified as Type A. If a CO-profile curve for small k was located above the reference curve, but for large k, it dropped below the reference curve, the protein was classified as either Type B or Type C. If the CO-profile curve cut across the reference curve around middle k and was close to the reference curve throughout, the protein was classified as Type B; if it cut across the reference curve around large k, the protein was classified as Type C. Short-range interactions were dominant in Type A proteins. The CO-profile curves of Type B proteins contained, as the reference curve, the short-, medium-, and long-range interactions that decreased in inverse proportion to residue-residue distance. Type C proteins were characterized by very long-range interactions such as interactions between N- and C-terminal regions. Since the upper left area of the curve corresponds to CO as pointed out by Eq. (11), the CO values of Type A proteins were the smallest and those of Type C were the largest.

Figure 1 shows that the CC values of Type A proteins were greater than those of Type C proteins (while the ratio of Type A to Type C was 9:5 in the “CC≥0.6” and “0.3≤CC<0.6” groups, it was 1:4 in the “CC<0.3” group). The CC values of the Type B proteins were of intermediate magnitude. Since the Type A proteins were rich in short-range interactions, it would be natural to expect that such proteins would be more likely to adopt protein-folding mechanisms consistent with the framework model. Another remarkable finding from this study was that Type A proteins were sensitive to the parameters used in the Φ-value calculation. The most plausible reason for this is that smaller proteins, which are more sensitive to these parameters, were more likely to be classified into Type A than into other groups. This finding indicates that CO-profile type, as a method of categorizing proteins, is probably not directly related to parameter sensitivity— although this cannot be completely ruled out.

The Type C proteins in the “CC<0.3” group were interesting because the long-range interactions, such as those between the N- and C-terminal regions, were formed at an early stage of the folding process without being preceded by the formation of secondary structures in the middle of the chain; that is, their protein-folding proceeded in accordance with the nucleation-condensation model. Unfortunately, the model studied in this paper cannot take into account the nucleation-condensation model. This is not because we deny that the nucleation-condensation mechanism can be active in protein folding, but simply because it is very difficult to construct a solvable statistical-mechanical model that can simulate this mechanism. However, we are interested in examining how proteins can fold in accordance with the nucleation-condensation model in future work.

CO has previously been discussed in terms of the folding rate of a protein [4246]. In our previous paper [30], we demonstrated that the folding rates calculated by the present model were also well correlated with the experimentally determined folding rates of 72 proteins (with CC=0.81). Because Φ values were defined based on folding and unfolding rates, and such rates are related to the CO, the Φ-value profile should be related to the CO or the CO profile; and we did indeed find some relationships between them. However, it was also apparent that these are not the only factors that determine Φ values and protein folding.

Sensitivity to parameters

A single set of parameter values to be used in our model has not yet been established. In fact, Table 2 shows that the best CC values for individual proteins (see underlined figures) were obtained by using different parameter sets, i.e., D42_13, D42_16, D55_13, or D55_16. In other words, Φ-value calculations are sensitive to the parameters chosen. The cutoff distance Dc that defines a residue-residue contact is one of the most important parameters, because it defines the 3D structure of a specific protein in the model. Parameter B, chain entropy, is also important, because protein-folding transitions occur in the balance between enthalpy gain that occurs with residue-residue interactions and entropy loss—particularly that associated with chain entropy. In addition, we do not have a sufficient number of experimentally determined Φ values. Consequently, it is hard to determine the best parameter set for our model at present. We consider sensitivity of proteins to such parameters as being one of the intrinsic characteristics of protein folding.

Parameter sensitivity was assessed using the root-mean-square difference (RMSD) between two Φ-value profiles that were calculated with different parameter sets. For analysis of plus/minus perturbations, Φ-value profiles for plus and minus perturbations were compared. For analysis of the distance cutoff Dc, Φ-value profiles of [D42_13 and D55_13] and [D42_16 and D55_16] were compared. For analysis of chain entropy parameter B, Φ-value profiles of [D42_13 and D42_16] and [D55_13 and D55_16] were compared. The results are shown in Table 3. The relatively larger RMSD values are indicated by an underline.

Table 3
Root-mean-square difference between Φ-value profiles calculated with different parameter setsa)

Half of the proteins were sensitive to Dc. In particular, small proteins (smaller than 63 residues) such as 1ENH were sensitive to Dc. A natural expectation would be that changes in the residue-residue contact matrix (Γij) caused by a change in Dc would have a significant effect on smaller proteins (see Eq. (2)), and that relatively larger proteins, two examples being “all-β” proteins (such as 1CSP, 1TIU, and 1TEN) and “α+β” proteins (such as 1URN, 1N88, 1FKB, and 2VIL), would be sensitive to Dc. However, we found that many larger proteins and “α+β” proteins were classified into the “not sensitive” group.

Most proteins that were sensitive to chain entropy parameter B were also sensitive to Dc. Smaller proteins such as 1ENH, 1IDY, 1SRM, 1SHG, and 1FYN were easily influenced by changes in the parameters. It is understandable that “all-β” proteins such as 1SRM, 1TEN, 1SHG, 1FYN, and 1CSP would be sensitive to B, because chain entropy plays an important role in the formation of β-sheets. In the formation of a β-sheet, a larger loss of chain entropy must be overcome than in the formation of an α-helix.

It is possible to mutate any protein. Mutations may change a protein’s 3D structure to some extent, and consequently may change the residue-residue contact matrix. A set of proteins with similar 3D structures can be regarded as an example of the outcome of such mutations. As discussed above, 1ENH and 1IDY have similar 3D structures, and their overall Φ-value profiles resembled each other. However, they differed significantly in some specific regions. In this study, we show that their calculated Φ-value profiles reproduce these experimentally observed differences, at least to some extent. We also show that the same is true for the SH3 domain-containing superfamily proteins, 1SRM, 1SHG, and 1FYN.

Beyond this, it is interesting that Φ-value profiles can provide information that is helpful for identifying residues that are sensitive to the model parameters. For example, Φ-value profiles allowed us to determine that residues in the N-terminal regions of 1SRM, 1SHG, and 1FYN were sensitive to model parameters (Supplementary Fig. S1), as were almost all residues in 1SHG. In the Φ-value experiments, the Φ values obtained for some specific residues sometimes differed significantly, depending on substituted amino acid types or on experimental conditions (see for example Fig. 2C). These results imply that mutations of certain residues in each protein have significant effects on protein folding. The sensitivity of specific residues to model parameters can reveal such characteristics of a protein.

Plus/minus perturbation

In this paper, Φ values were calculated by imposing perturbations of protein structure via single amino acid substitutions. These perturbations changed the predicted interactions of a specific residue with the other residues of the protein. The two perturbation types “plus” and “minus” were defined according to whether the interactions were weakened or strengthened by them, respectively. We hypothesized that when the degree of perturbation is small, the difference in the effect of the plus and minus perturbations should be small; and during the current study, this presumption held true for most proteins. However, it was found that the two types of perturbation produced significantly different results for some proteins: i.e., 1URN, 1TEN, 1SHG, 2VIL, and 1FKB (Fig. 1 and Table 2). Note that these are “all-β” or “α+β” proteins. In these proteins, some residues showed large differences between Φ values calculated in the presence of plus perturbations vs. minus perturbations.

In some cases, either of the two perturbations gave an irregular Φ value, i.e., Φ much greater than one or much smaller than zero. Interestingly, when an irregular Φ value was obtained, it was almost always in the presence of a minus perturbation. In particular, the RMSDs between Φ-value profiles were larger in 1TEN and 1FKB than in other proteins (Fig. 5). Irregular Φ values were obtained for many more residues in these two proteins than in the other proteins. Curiously, the CC values for these two proteins differed considerably. Whereas the CC value for 1TEN was very high, that for 1FKB was very low (see Table 2).

Figure 5
Φ-value profiles of plus/minus perturbations. (A) 1TEN, D42_16 and (B) 1FKB, D55_13. See also the caption of Figure 2 for the legend. “+” and “−” are Φ-value profiles obtained by plus and minus perturbations, ...

In the experimental situation, the substitution of an amino acid residue with a larger side chain is one of the strategies employed to introduce stronger interactions than exist in the wild-type protein. However, if such a residue disrupts the proper folding of the protein owing to its side chain size, the Φ value may be undeterminable or irregular. In the theoretical model, however, the plus/minus perturbations only change the ensemble of conformations at every step along the folding reaction coordinate. Our results suggest that irregular Φ values can be obtained without disrupting proper folding.

Although we examined the FE profiles of 1TEN and 1FKB, we could not find remarkable differences between their plus/minus perturbations. In general, an irregular Φ value is obtained if Δln kf −Δln ku, which is the denominator of Eq. (10), is too small. One possible explanation is that subtle changes occur throughout the FE profiles, and occasionally a change in the folding rate Δln kf is very close to a change in the unfolding rate Δln ku. However, it should be noted that 1TEN and 1FKB are “all-β” proteins and have complex folding topologies of β-strands. A protein’s folding topology is regarded as complex if the folding process cannot easily be represented as a step-by-step formation from short-, to medium-, and then to long-range interactions. Although the CO is introduced to explain such complexity, more details regarding folding topology may be necessary to explain the remarkable differences observed between the plus/minus perturbations for these proteins. Interestingly, in the FE profile of 1FKB, the difference between the local minimum in the unfolding region and the local maximum in the transition region was considerably larger than that for the other proteins (see Supplementary Fig. S2). Obviously, this reflects the complex topology of folding that exists for 1FKB, which affects its irregular behavior in the Φ-value calculation.

Circular permutants

The Φ-value analysis experiment involving circular permutants of the ribosomal protein S6 [74] (PDB ID: 1RIS) provided an interesting example to discuss Φ-value calculation, particularly regarding the important role of the connectivity of the polypeptide chain. A circular permutant is created by dividing a given protein into two fragments, i.e., N- and C-terminal fragments, by an incision; then the N-terminal fragment is connected after the C-terminal fragment. In this analysis, circular permutants were assumed to have the same 3D structure as the wild-type protein, despite the differences in their connectivities.

1RIS has the secondary-structure sequence β1-α1-β2-β3-α2-β4. Five circular permutants, P1314, P3334, P5455, P6869, and P81–82, were constructed, where the superscripts indicate the position of the incision. All possible incisions between the secondary-structure elements of 1RIS were examined [74].

Φ values for the circular permutants were calculated by means of the same inter-residue contact information that was used for the wild-type 1RIS, but with differences in the respective connectivity, e.g., 14–97 and 1–13 for P1314. Figure 6 shows the Φ-value profiles for the wild-type 1RIS and its circular permutant, P1314. Φ-value profiles for the other circular permutants are given in Supplementary Figure S3. CC values for the wild-type proteins and circular permutants are shown in Table 4.

Figure 6
Φ-value profiles of the wild-type 1RIS and its circular permutant, P1314. See also the caption of Figure 2 for the legend. The cross symbol in (B) indicates the location of the scission point of the circular permutant. The Φ-value ...
Table 4
Correlation between experimentally observed and theoretically calculated Φ values for wild-type 1RIS and its circular permutantsa)

The two Φ-value profiles shown in Figure 6 are useful for illustrating the significance of chain connectivity [30]. The wild-type 1RIS has the secondary-structure sequence β1-α1-β2-β3-α2-β4; it forms β-sheets with long-range interactions between β1 and β3, and between β1 and β4. When the secondary-structure sequence in the circular permutant P1314 changes to α1-β2-β3-α2-β4-β1, the interactions between β1 and β4 change to short- or medium-range ones, and the C-terminal region forms a compact domain, β3-α2-β4-β1, that contains β1 and β3. While the circular permutants P3334 and P5455 behave similarly to P1314, P6869 and P81–82 do not. Accordingly, the Φ-value profiles of P1314, P3334, and P5455 were different from those of P6869, P81–82, and the wild-type protein, as shown in Figure 6 and Supplementary Figure S3. Owing to the reduction of the long-range interactions in the first group of 1RIS circular permutants, their CC values became markedly better than those of the second group of 1RIS circular permutants. In other words, whereas the former circular permutants underwent folding according to the framework model, the latter ones underwent folding according to the nucleation-condensation model.

Recently, Inanami et al. proposed a novel method to calculate the folding pathway of a multidomain protein in an extended form of the model discussed here [37]. Their target protein was dihydrofolate reductase (DHFR), which has two domains: one comprising a continuous middle part of the polypeptide chain (ABD domain), and the other comprising discontinuous N- and C-terminal parts (DLD domain). According to the limitations of our model, the DLD domain cannot fold without ABD folding. However, Inanami et al. enhanced the model by introducing the “virtual loop-closure mechanism” to partition function in an exactly calculable form, which allowed them to successfully reproduce DHFR folding behavior. Just as the circular permutant strategy can resolve topological complexities of protein folding, the methodology of Inanami et al. seems to open up a new method of reproducing protein folding by the virtual-closure mechanism, especially when the protein folding occurs according to the nucleation-condensation model.

Conclusion

When planning this project, we expected to derive a unified view of protein folding by studying the folding process from various perspectives. We anticipated that the further characterization of proteins would provide a coherent explanation of the information presented in the table in Figure 1. All of the aspects examined in this paper should play a significant role in future studies of protein folding, and they are associated with one another. However, such connections are complex and exhibit highly nonlinear characteristics. It is known that the relationship between the Φ value and the extent of structural formation is not necessarily linear, and thus the interpretation of intermediate Φ values is still controversial. In addition, the number of available data from Φ-value analyses was too small for reliable statistical analyses at the time of this study. Consequently, it is hard to understand the meanings of the Φ values clearly at present. In this study, however, we have confirmed the importance of characterizing protein folding from various perspectives. Our findings have also highlighted that protein folding is highly variable and individual among different proteins. This should be considered when pursuing a unified theory of protein folding.

Significance

The Φ values were calculated for 27 proteins with a simple statistical-mechanical model of protein folding. We considered that the correlation between the calculated and experimentally observed Φ values should illuminate the protein-folding mechanism active for each protein, such as the framework model or the nucleation-condensation model. In addition, we investigated other properties associated with protein folding such as the free-energy profile, contact-order profile, and sensitivity to the parameters used in the Φ-value calculation of each protein. Through such analyses, we tried to characterize protein folding from various perspectives to derive a unified theory of protein folding.

Supplementary Information

Acknowledgements

This paper is respectfully dedicated to the late Professor Nobuhiko Saitô, who was a Ph.D. supervisor of H. W., and with whom the original statistical-mechanical model of protein folding, “island model”, was developed. This work was supported by JSPS Grant-in-Aid for Scientific Research (C) (Grant No. 16K00407).

Footnotes

Conflicts of Interest

The authors have declared that no competing interests exist.

Author Contributions

H. W. and H. A. contributed equally to this work. Both authors discussed the results and implications, and commented on the manuscript at all stages.

References

1. Anfinsen CB. The formation and stabilization of protein structure. Biochem J. 1972;128:737–749. [PubMed]
2. Tanaka S, Scheraga HA. Hypothesis about the mechanism of protein folding. Macromolecules. 1977;10:291–304. [PubMed]
3. Kim PS, Baldwin RL. Specific intermediates in the folding reactions of small proteins and the mechanism of protein folding. Annu Rev Biochem. 1982;51:459–489. [PubMed]
4. Ptitsyn OB. Protein folding: hypotheses and experiments. J Protein Chem. 1987;6:273–293.
5. Itzhaki LS, Otzen DE, Fersht AR. The structure of the transition state for folding of chymotrypsin inhibitor 2 analysed by protein engineering methods: Evidence for a nucleation-condensation mechanism for protein folding. J Mol Biol. 1995;254:260–288. [PubMed]
6. Fersht AR. Nucleation mechanisms in protein folding. Curr Opin Struct Biol. 1997;7:3–9. [PubMed]
7. Muñoz V. A simple theoretical model goes a long way in explaining complex behavior in protein folding. Proc Natl Acad Sci USA. 2014;111:15863–15864. [PubMed]
8. Wako H, Saitô N. Statistical mechanical theory of the protein conformation. I. General considerations and the application to homopolymers. J Phys Soc Jpn. 1978;44:1931–1938.
9. Wako H, Saitô N. Statistical mechanical theory of the protein conformation. II. Folding pathway for protein. J Phys Soc Jpn. 1978;44:1939–1945.
10. Gō N, Abe H. Noninteracting local-structure model of folding and unfolding transition in globular proteins. I. Formulation. Biopolymers. 1981;20:991–1011. [PubMed]
11. Abe H, Gō N. Noninteracting local-structure model of folding and unfolding transition in globular proteins. II. Application to two-dimensional lattice proteins. Biopolymers. 1981;20:1013–1031. [PubMed]
12. Muñoz V, Eaton WA. A simple model for calculating the kinetics of protein folding from three-dimensional structures. Proc Natl Acad Sci USA. 1999;96:11311–11316. [PubMed]
13. Matouschek A, Kellis JT, Serrano L, Fersht AR. Mapping the transition state and pathway of protein folding by protein engineering. Nature. 1989;340:122–126. [PubMed]
14. Fersht AR, Matouschek A, Serrano L. The folding of an enzyme. I. Theory of protein engineering analysis of stability and pathway of protein folding. J Mol Biol. 1992;224:771–782. [PubMed]
15. Fersht AR, Sato S. Φ-value analysis and the nature of protein-folding transition states. Proc Natl Acad Sci USA. 2004;101:7976–7981. [PubMed]
16. Nölting B, Agard DA. How general is the nucleation-condensation mechanism? Proteins. 2008;73:754–764. [PMC free article] [PubMed]
17. Alonso DO, Daggett V. Staphylococcal protein A: Unfolding pathways, unfolded states, and differences between the B and E domains. Proc Natl Acad Sci USA. 2000;97:133–138. [PubMed]
18. Settanni G, Rao F, Caflisch A. Φ-value analysis by molecular dynamics simulations of reversible folding. Proc Natl Acad Sci USA. 2005;102:628–633. [PubMed]
19. Banachewicz W, Religa TL, Schaeffer RD, Daggett V, Fersht AR. Malleability of folding intermediates in the homeodomain superfamily. Proc Natl Acad Sci USA. 2011;108:5596–5601. [PubMed]
20. Portman JJ, Takada S, Wolynes PG. Variational theory for site resolved protein folding free energy surfaces. Phys Rev Lett. 1998;81:5237–5240.
21. Galzitskaya OV, Finkelstein AV. A theoretical search for folding/unfolding nuclei in three-dimensional protein structures. Proc Natl Acad Sci USA. 1999;96:11299–11304. [PubMed]
22. Alm E, Baker D. Prediction of protein-folding mechanisms from free-energy landscapes derived from native structures. Proc Natl Acad Sci USA. 1999;96:11305–11310. [PubMed]
23. Alm E, Morozov AV, Kortemme T, Baker D. Simple physical models connect theory and experiment in protein folding kinetics. J Mol Biol. 2002;322:463–476. [PubMed]
24. Zong C, Wilson CJ, Shen T, Wolynes PG, Wittung-Stafshede P. Φ-Value analysis of apo-azurin folding: Comparison between experiment and theory. Biochemistry. 2006;45:6458–6466. [PubMed]
25. Itoh K, Sasai M. Flexibly varying folding mechanism of a nearly symmetrical protein: B domain of protein A. Proc Natl Acad Sci USA. 2006;103:7298–7303. [PubMed]
26. Kubelka J, Henry ER, Cellmer T, Hofrichter J, Eaton WA. Chemical, physical, and theoretical kinetics of an ultrafast folding protein. Proc Natl Acad Sci USA. 2008;105:18655–18662. [PubMed]
27. Itoh K, Sasai M. Multidimensional theory of protein folding. J Chem Phys. 2009;130:145104. [PubMed]
28. Abe H, Wako H. Folding/unfolding kinetics of lattice proteins studied using a simple statistical mechanical model for protein folding, I: Dependence on native structures and amino acid sequences. Physica A. 2009;383:3442–3454.
29. Wako H, Abe H. Folding/unfolding kinetics of lattice proteins by applying a simple statistical mechanical model for protein folding. In: Walters EC, editor. Protein Folding. Nova Sci. Pub. Inc.; New York: 2011. pp. 349–376.
30. Wako H, Abe H. Calculation of free-energy profiles, folding rates and Φ values by means of a simple statistical-mechanical model of protein folding. In: Hale M, editor. Advances in Protein Folding Research. Nova Sci. Pub. Inc.; New York: 2015. pp. 19–63.
31. Itoh K, Sasai M. Dynamical transition and proteinquake in photoactive yellow protein. Proc Natl Acad Sci USA. 2004;101:14736–14741. [PubMed]
32. Imparato A, Pelizzola A, Zamparo M. Ising-like model for protein mechanical unfolding. Phys Rev Lett. 2007;98:148102. [PubMed]
33. Itoh K, Sasai M. Entropic mechanism of large fluctuation in allosteric transition. Proc Natl Acad Sci USA. 2010;107:7775–7780. [PubMed]
34. Bruscolini P, Naganathan AN. Quantitative prediction of protein folding behaviors from a simple statistical model. J Am Chem Soc. 2011;133:5372–5379. [PubMed]
35. Caraglio M, Pelizzola A. Effects of confinement on thermal stability and folding kinetics in a simple Ising-like model. Phys Biol. 2012;9:016006. [PubMed]
36. Sivanandan S, Naganathan AN. A disorder-induced domino-like destabilization mechanism governs the folding and functional dynamics of the repeat protein IκBα PLoS Comput Biol. 2013;9:e1003403. [PMC free article] [PubMed]
37. Inanami T, Terada TP, Sasai M. Folding pathway of a multidomain protein depends on its topology of domain connectivity. Proc Natl Acad Sci USA. 2014;111:15969–15974. [PubMed]
38. Abe H, Wako H. Analyses of simulations of three-dimensional lattice proteins in comparison with a simplified statistical mechanical model of protein folding. Phys Rev E. 2006;74:011913. [PubMed]
39. Taketomi H, Ueda Y, Gō N. Studies on protein folding, unfolding and fluctuations by computer simulation. I. The effect of specific amino acid sequence represented by specific inter-unit interactions. Int J Pept Protein Res. 1975;7:445–459. [PubMed]
40. Abe H, Wako H. Application of a statistical-mechanical model for protein folding to a three-dimensional lattice protein. J Phys Soc Jpn. 2004;73:1143–1146.
41. Henry ER, Eaton WA. Combinatorial modeling of protein folding kinetics: free energy profiles and rates. Chem Phys. 2004;307:163–185.
42. Plaxco KW, Simons KT, Baker D. Contact order, transition state placement and the refolding rates of single domain proteins. J Mol Biol. 1978;277:985–994. [PubMed]
43. Plaxco KW, Simons KT, Ruczinski I, Baker D. Topology, stability, sequence, and length: defining the determinants of two-state protein folding kinetics. Biochemistry. 2000;39:11177–11183. [PubMed]
44. Gromiha MM, Selvaraj S. Comparison between long-range interactions and contact order in determining the folding rate of two-state proteins: application of long-range order to folding rate prediction. J Mol Biol. 2001;310:27–32. [PubMed]
45. Kamagata K, Arai M, Kuwajima K. Unification of the folding mechanisms of non-two-state and two-state proteins. J Mol Biol. 2004;339:951–965. [PubMed]
46. Kuznetsov IB, Rackovsky S. Class-specific correlations between protein folding rate, structure-derived, and sequence-derived descriptors. Proteins. 2004;54:333–341. [PubMed]
47. Gianni S, Guydosh NR, Khan F, Caldas TD, Mayor U, White GWN, et al. Unifying features in protein-folding mechanisms. Proc Natl Acad Sci USA. 2003;100:13286–13291. [PubMed]
48. McCallister EL, Alm E, Baker D. Critical role of β-hairpin formation in protein G folding. Nat Struct Biol. 2000;7:669–673. [PubMed]
49. Riddle DS, Grantcharova VP, Santiago JV, Alm E, Ruczinski I, Baker D. Experiment and theory highlight role of native state topology in SH3 folding. Nat Struct Biol. 1999;6:1016–1024. [PubMed]
50. Martinez JC, Serrano L. The folding transition state between SH3 domains is conformationally restricted and evolutionarily conserved. Nat Struct Biol. 1999;6:1010–1016. [PubMed]
51. Northey JGB, Di Nardo AA, Davidson AR. Hydrophobic core packing in the SH3 domain folding transition state. Nat Struct Biol. 2002;9:126–130. [PubMed]
52. Sato S, Religa TL, Daggett V, Fersht AR. Testing protein-folding simulations by experiment: B domain of protein A. Proc Natl Acad Sci USA. 2004;101:6952–6956. [PubMed]
53. Guerois R, Serrano L. The SH3-fold family: experimental evidence and prediction of variations in the folding pathways. J Mol Biol. 2000;304:967–982. [PubMed]
54. Kim DE, Fisher C, Baker D. A breakdown of symmetry in the folding transition state of protein L. J Mol Biol. 2000;298:971–984. [PubMed]
55. Garcia-Mira MM, Boehringer D, Schmid FX. The folding transition state of the cold shock protein is strongly polarized. J Mol Biol. 2004;339:555–569. [PubMed]
56. Went HM, Jackson SE. Ubiquitin folds through a highly polarized transition state. Protein Eng Des Sel. 2005;18:229–237. [PubMed]
57. Villegas V, Martínez JC, Avilés FX, Serrano L. Structure of the transition state in the folding process of human procarboxypeptidase A2 activation domain. J Mol Biol. 1998;283:1027–1036. [PubMed]
58. Kragelund BB, Osmark P, Neergaard TB, Schiødt J, Kristiansen K, Knudsen J, et al. The formation of a native-like structure containing eight conserved hydrophobic residues is rate limiting in two-state protein folding of ACBP. Nat Struct Biol. 1999;6:594–601. [PubMed]
59. Fowler SB, Clarke J. Mapping the folding pathway of an immunoglobulin domain: structural detail from phi value analysis and movement of the transition state. Structure. 2001;9:355–366. [PubMed]
60. Nölting B, Andert K. Mechanism of protein folding. Proteins. 2000;41:288–298. [PubMed]
61. Hamill SJ, Steward A, Clarke J. The folding of an immunoglobulin-like Greek key protein is defined by a common-core nucleus and regions constrained by topology. J Mol Biol. 2000;297:165–178. [PubMed]
62. Cota E, Steward A, Fowler SB, Clarke J. The folding nucleus of a fibronectin type III domain is composed of core residues of the immunoglobulin-like fold. J Mol Biol. 2001;305:1185–1194. [PubMed]
63. Ternström T, Mayor U, Akke M, Oliveberg M. From snapshot to movie: ϕ analysis of protein folding transition states taken one step further. Proc Natl Acad Sci USA. 1999;96:14854–14859. [PubMed]
64. Hedberg L, Oliveberg M. Scattered Hammond plots reveal second level of site-specific information in protein folding: [var phi]′ (β) Proc Natl Acad Sci USA. 2004;101:7606–7611. [PubMed]
65. Otzen DE, Oliveberg M. Conformational plasticity in folding of the split β-α-β protein S6: evidence for burst-phase disruption of the native state. J Mol Biol. 2002;317:613–627. [PubMed]
66. Chiti F, Taddei N, White PM, Bucciantini M, Magherini F, Stefani M, et al. Mutational analysis of acylphosphatase suggests the importance of topology and contact order in protein folding. Nat Struct Biol. 1999;6:1005–1009. [PubMed]
67. Fulton KF, Main ERG, Daggett V, Jackson SE. Mapping the interactions present in the transition state for unfolding/folding of FKBP12. J Mol Biol. 1999;291:445–461. [PubMed]
68. Serrano L, Matouschek A, Fersht AR. The folding of an enzyme. III. Structure of the transition state for unfolding of barnase analysed by a protein engineering procedure. J Mol Biol. 1992;224:805–818. [PubMed]
69. Choe SE, Li L, Matsudaira PT, Wagner G, Shakhnovich EI. Differential stabilization of two hydrophobic cores in the transition state of the villin 14T folding reaction. J Mol Biol. 2000;304:99–115. [PubMed]
70. Wilson CJ, Wittung-Stafshede P. Snapshots of a dynamic folding nucleus in zinc-substituted Pseudomonas aeruginosa azurin. Biochemistry. 2005;44:10054–10062. [PubMed]
71. López-Hernández E, Serrano L. Structure of the transition state for folding of the 129 aa protein CheY resembles that of a smaller protein, CI-2. Fold Des. 1996;1:43–55. [PubMed]
72. Murzin AG, Brenner SE, Hubbard T, Chothia C. SCOP: a structural classification of proteins database for the investigation of sequences and structures. J Mol Biol. 1995;247:536–540. [PubMed]
73. Garbuzynskiy SO, Finkelstein AV, Galzitskaya OV. Outlining folding nuclei in globular proteins. J Mol Biol. 2004;336:509–525. [PubMed]
74. Haglund E, Lindberg MO, Oliveberg M. Changes of protein folding pathways by circular permutation. Overlapping nuclei promote global cooperativity. J Biol Chem. 2008;283:27904–27915. [PubMed]

Articles from Biophysics and Physicobiology are provided here courtesy of The Biophysical Society of Japan