The protein folding field has witnessed significant changes and progress since the original work of Anfinsen showing that proteins can fold spontaneously [
1,
2]. Early
in vitro studies showed that the folding process typically occurs on a milliseconds-to-seconds time scale, much faster than the rate estimated assuming that folding proceeds by a random search of all possible conformations. Based upon this observation, Levinthal then proposed that a random conformation search does not occur in folding and that proteins fold by specific ‘folding pathways’ [
3]. On these pathways, the protein molecule passes through well-defined partially-structured intermediate states. Based on this view, numerous experiments and simulations were conducted to test the existence of transient folding intermediates [
4,
5]. It was expected that the determination of the structures and population of folding intermediates could help elucidate protein folding mechanisms. Earlier experimental studies on protein folding kinetics monitored the structural changes through relaxation of the protein’s spectroscopic properties after exposing the protein to folding or unfolding conditions. The data obtained from such experiments exhibit single- or multiple-exponential time decay: a single-exponential decay is interpreted as a signature of two-state kinetics between the native state and the denatured state, whereas models involving more than two states are required to explain multiple-exponential decay data. These experiments generally probe only the average behavior of proteins, and they cannot provide information about the folding/unfolding process in atomic detail.
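The distinction between single- and multiple-exponential relaxation can be illustrated numerically. The following is a minimal sketch, not an analysis taken from the studies cited. It assumes a reversible two-state scheme U ⇌ N with first-order rate constants, for which the relaxation is a single exponential with observed rate k_obs = k_fold + k_unfold, and compares it with a hypothetical two-phase signal. All function names and rate values are illustrative assumptions.

```python
import math

def two_state_native_fraction(t, k_fold, k_unfold, native0=0.0):
    """Native fraction at time t for U <-> N with first-order rate constants."""
    k_obs = k_fold + k_unfold          # observed relaxation rate
    native_eq = k_fold / k_obs         # equilibrium native fraction
    return native_eq + (native0 - native_eq) * math.exp(-k_obs * t)

def biexponential_signal(t, amp_fast, k_fast, amp_slow, k_slow, baseline):
    """Hypothetical two-phase relaxation (requires more than two states)."""
    return baseline + amp_fast * math.exp(-k_fast * t) + amp_slow * math.exp(-k_slow * t)

def apparent_rates(signal, times, plateau):
    """Local decay rate of |S(t) - plateau| between consecutive time points."""
    rates = []
    for t0, t1 in zip(times, times[1:]):
        a0 = abs(signal(t0) - plateau)
        a1 = abs(signal(t1) - plateau)
        rates.append(-math.log(a1 / a0) / (t1 - t0))
    return rates

# Illustrative (assumed) rate constants in s^-1 and sampling times in s.
K_FOLD, K_UNFOLD = 50.0, 2.0
times = [0.01 * i for i in range(1, 6)]

single = apparent_rates(
    lambda t: two_state_native_fraction(t, K_FOLD, K_UNFOLD),
    times,
    K_FOLD / (K_FOLD + K_UNFOLD),
)
double = apparent_rates(
    lambda t: biexponential_signal(t, 0.5, 200.0, 0.5, 20.0, 0.0),
    times,
    0.0,
)

print("two-state apparent rates:", single)
print("two-phase apparent rates:", double)
```

For the two-state signal the apparent rate is constant in time and equal to the sum of the folding and unfolding rate constants. For the two-phase signal it drifts from the fast toward the slow rate, which is the signature that more than two states are needed.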
The discovery of a class of simple, single-domain proteins which fold via two-state kinetics without any detectable intermediates in the early 1990s [
6,
7], the development of experimental techniques with improved spatial/temporal resolution[
8-
13], and the application of computer simulations using simplified lattice and off-lattice models[
14,
15] greatly enhanced our understanding of various aspects of the protein folding problem. Based on the nucleation theory [
16-
18], one of the early proposed mechanisms for protein folding, the nucleation-condensation model was formulated[
19-
21]. In this scenario, a small number of residues (the folding nucleus) need to form their native contacts in order for the folding reaction to proceed rapidly to the native state. The cooperativity of the protein folding process is analogous to that exhibited in first-order phase transitions, which proceed via a nucleation and growth mechanism [
22]. Because of these similarities, terminology used in studies of phase transitions, such as energy landscapes and nucleation, was introduced into the discussion of protein folding. The concepts of nucleation and the free-energy landscape have promoted much of the recent progress in understanding the process of protein folding. Proteins are generally thought to have evolved to exhibit globally funneled energy landscapes [
23-
25] which allow proteins to fold to their native states through a stochastic process in which the free energy decreases spontaneously. The unfolded state, transition state, native state and possible intermediates correspond to local minima or saddle points in the free-energy landscape.
Advances in experimental techniques such as protein engineering, nuclear magnetic resonance (NMR), mass spectrometry, hydrogen exchange, fluorescence resonance energy transfer (FRET), and atomic force microscopy (AFM) have made it possible to obtain detailed information about the different conformations occurring in the folding process[
26,
27]. At the same time, computational methods have been developed to better interpret experimental data by using simulations to obtain structural information about the states which are populated during the folding process. In Table 1, we list several advances in experimental and computational methodologies used for investigating the folding of model proteins.
| Table 1. Protein folding in select model systems |
All-atom protein models with explicit or implicit solvents were developed to study the folding thermodynamics and the unfolding dynamics of specific proteins. Technological advances in computation allowed folding simulations of small proteins and peptides at atomic detail [
28-
30]. However, due to the complexity and vast dimensionality of protein conformational space, all-atom MD simulations have severe limitations on the time and length scales that can be studied. Novel simulation protocols have been proposed to improve conformational sampling efficiency, including biased sampling of the free-energy surface and non-equilibrium unfolding simulations [
24]. In addition, world-wide parallel computing (e.g.
Folding@Home [
31]) and generalized ensemble sampling techniques that involve parallel simulations of molecular systems coupled with a Monte Carlo (MC) protocol [
32,
33] have been successfully applied to protein folding [
25,
34-
36].
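One widely used member of this family of generalized ensemble methods is replica exchange (parallel tempering), in which independent simulations run at different temperatures and periodically attempt to swap configurations through a Metropolis test. The sketch below shows only that swap step and is not the protocol of any specific study cited here. The replica data layout, the function names, and the energy units (kcal/mol) are assumptions made for illustration.

```python
import math
import random

K_B = 0.0019872041  # Boltzmann constant in kcal/(mol K)

def swap_probability(energy_i, temp_i, energy_j, temp_j):
    """Metropolis acceptance probability for exchanging two replicas.

    p = min(1, exp[(beta_i - beta_j) * (E_i - E_j)]), with beta = 1 / (k_B T).
    """
    beta_i = 1.0 / (K_B * temp_i)
    beta_j = 1.0 / (K_B * temp_j)
    exponent = (beta_i - beta_j) * (energy_i - energy_j)
    return 1.0 if exponent >= 0.0 else math.exp(exponent)

def attempt_swap(replicas, i, j, rng=random):
    """Try to exchange the configurations of replicas i and j in place.

    Each replica is a dict with keys 'temperature', 'energy' and 'coords'
    (a hypothetical layout). Temperatures stay fixed; configurations move.
    """
    p = swap_probability(
        replicas[i]["energy"], replicas[i]["temperature"],
        replicas[j]["energy"], replicas[j]["temperature"],
    )
    if rng.random() < p:
        for key in ("energy", "coords"):
            replicas[i][key], replicas[j][key] = replicas[j][key], replicas[i][key]
        return True
    return False

# Illustrative values: the cold replica currently holds the higher-energy
# configuration, so the exchange is always accepted.
replicas = [
    {"temperature": 300.0, "energy": -100.0, "coords": "conf_A"},
    {"temperature": 350.0, "energy": -105.0, "coords": "conf_B"},
]
accepted = attempt_swap(replicas, 0, 1, rng=random.Random(0))
print(accepted, replicas[0]["coords"], replicas[1]["coords"])
```

Exchanges let configurations trapped in local minima at low temperature escape by visiting higher temperatures, while the acceptance test keeps each temperature sampling its own Boltzmann distribution.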
Multi-scale modeling approaches have also been used to combine efficient conformational sampling of coarse-grained models and accuracy of all-atom models to study protein folding pathways. In this approach, iterative simulations and inter-conversion between high and low-resolution protein models are performed. Feig
et al. developed a multi-scale modeling tool set, MMTSB [
37], which integrates a simplified protein model with the MC simulation engine, MONSSTER [
38], and the all-atom MD packages AMBER [
39] or CHARMM[
40]. Using a combination of CHARMM and discrete molecular dynamics (DMD) [
41-
46], Ding
et al. reconstructed the transition state ensemble of the src-SH3 protein domain through multi-scale simulations [
47]. Protein folding studies can also be facilitated by sampling protein conformations near the native state. Several native-state sampling algorithms [
48,
49] have been successfully utilized to study plasticity [
50], cooperative interactions [
51], and allostery [
52] in proteins. Considering the native-state ensemble naturally accounts for protein flexibility, which has been shown to be crucial in structure-based drug design.
During the last five years, several tools for performing web-based analyses of protein folding dynamics have been developed. The Fold-Rate server (
http://psfs.cbrc.jp/fold-rate/) [
53] predicts rates of protein folding using the amino-acid sequence. The Parasol folding server (
http://parasol.tamu.edu/groups/amatogroup/foldingserver) [
54] predicts protein folding pathways using “probabilistic roadmaps”-based motion planning techniques. The iFold server (
http://ifold.dokhlab.org) [
55] allows discrete molecular dynamics (DMD) simulations of protein dynamics using simplified two-bead per residue protein models. These tools facilitate the second phase of protein folding research, whereby targeted simulations may be performed for probing the dynamics of protein folding and unfolding under controlled conditions.
DMD approaches [
43-
46] with simplified structural models of proteins have been extensively used for investigating general principles of protein folding and unfolding [
56-
60]. Dokholyan et al.[
61] have highlighted the differences between molecular dynamics and DMD approaches. As opposed to the traditional MD approach of iteratively solving Newtonian equations of motion to evolve a protein folding trajectory, DMD simulations solve ballistic equations of motion with a square-well approximation to inter-particle interaction potentials. The DMD algorithm gains efficiency over traditional MD simulations in multiple ways. First, due to ballistic modeling of particle dynamics, a larger time step can be used in DMD simulations on average, which corresponds to the time interval between the fastest ballistic interactions. Second, a faster inter-particle collision detection and velocity updating algorithm is used, since only the coordinates of colliding atoms need to be updated at each collision. Additionally, faster simulation speeds are attainable with the DMD approach through simplification of protein models. Overall, an increase in simulation speed of 5–10 orders of magnitude is attainable using DMD [
62]. Jang et al. [
56] used DMD and simplified protein models with Gō interactions [
63] to probe protein folding kinetics. Protein folding kinetics studies using DMD simulations are reviewed in [
42]. Recently, DMD simulations were used to uncover the structural mechanisms of protein aggregation [
64-
66]. Among the fundamental challenges in studying protein folding using computer simulations are the time-scales and length-scales that can be investigated. DMD simulations have been shown to be useful for investigating long-timescale folding dynamics of complex biological systems such as poly-alanine aggregation [
66,
67] and the nucleosome core particle [
68].
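The event-driven character of DMD described above can be sketched for a single pair of particles. Because the potentials are step functions, particles move at constant velocity between events, so the time of the next event follows from a quadratic equation rather than from numerical integration. The code below is a simplified illustration that handles only the hard-core step of a square-well potential (an elastic collision). Attractive well boundaries, event queues, and protein models are omitted, and all function names and values are assumptions rather than part of any DMD package.

```python
import math

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def time_to_contact(pos_i, vel_i, pos_j, vel_j, distance):
    """Earliest time at which two ballistic particles reach the given separation.

    Solves |r + v t| = distance for the relative position r and velocity v.
    Returns None if the particles are not approaching or never get that close.
    """
    r = [a - b for a, b in zip(pos_i, pos_j)]
    v = [a - b for a, b in zip(vel_i, vel_j)]
    b = dot(r, v)
    vv = dot(v, v)
    if b >= 0.0 or vv == 0.0:      # moving apart or no relative motion
        return None
    disc = b * b - vv * (dot(r, r) - distance * distance)
    if disc < 0.0:                 # closest approach is larger than distance
        return None
    return (-b - math.sqrt(disc)) / vv

def advance(pos, vel, dt):
    """Ballistic (constant-velocity) motion between events."""
    return [p + v * dt for p, v in zip(pos, vel)]

def elastic_collision(pos_i, vel_i, mass_i, pos_j, vel_j, mass_j):
    """Velocities after a hard-core collision, conserving momentum and energy."""
    r = [a - b for a, b in zip(pos_i, pos_j)]
    norm = math.sqrt(dot(r, r))
    n = [x / norm for x in r]                      # unit vector along the contact line
    vn = dot([a - b for a, b in zip(vel_i, vel_j)], n)
    total = mass_i + mass_j
    new_i = [a - (2.0 * mass_j / total) * vn * k for a, k in zip(vel_i, n)]
    new_j = [a + (2.0 * mass_i / total) * vn * k for a, k in zip(vel_j, n)]
    return new_i, new_j

# Two equal-mass particles approaching head-on (illustrative values).
pos_i, vel_i = [0.0, 0.0, 0.0], [1.0, 0.0, 0.0]
pos_j, vel_j = [5.0, 0.0, 0.0], [-1.0, 0.0, 0.0]
HARD_CORE = 1.0

t_hit = time_to_contact(pos_i, vel_i, pos_j, vel_j, HARD_CORE)
pos_i, pos_j = advance(pos_i, vel_i, t_hit), advance(pos_j, vel_j, t_hit)
vel_i, vel_j = elastic_collision(pos_i, vel_i, 1.0, pos_j, vel_j, 1.0)
print(t_hit, vel_i, vel_j)
```

Only the two colliding particles are updated at the event, which is the source of the efficiency gain noted above. A full simulation would keep a time-ordered queue of such events for all interacting pairs.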
In addition to the extensive
in silico and
in vitro studies of protein folding, significant progress has been made in understanding protein folding
in vivo. There are two major differences between protein folding
in vivo and
in vitro. First, protein folding
in vivo is usually assisted by molecular machinery, such as chaperones (in an ATP-dependent manner), and often involves small molecule cofactors. Molecular chaperones such as the heat shock protein Hsp70 and chaperonin proteins facilitate protein folding, in part, by isolating the proteins from bulk cytosol[
69,
70]. Hartl and Horwich pioneered the research of chaperone-mediated protein folding[
71], highlighting the differences between
in vivo and
in vitro folding mechanisms[
69]. The mechanism of chaperonin GroEL-mediated folding, including
in vivo folding intermediates, has been extensively studied by Horwich and Gierasch [
72,
73]. Work by Landry et al. [
74] showed that chaperone binding promotes α-helix formation in partially folded polypeptide chains. Horowitz et al. have investigated the role of chaperonin Cpn60-mediated hydrophobic exposure in protein folding[
75,
76]. Nearly one third of all proteins in living cells are coordinated to small molecule cofactors. The pioneering work of Wittung-Stafshede and coworkers on the role of cofactors in
in vivo protein folding[
77,
78] demonstrated that bound metals stabilize the native fold, suggesting that cofactor binding to unfolded polypeptides dramatically accelerates folding[
77].
A second notable difference between
in vivo and
in vitro protein folding is that the concentration of macromolecular solutes in cells can reach hundreds of grams per liter[
79], but most
in vitro studies are performed in buffered solution with <1% of the cellular macromolecule concentration. The crowded environment
in vivo can have a significant impact on protein stability and the native structure by changing the energy landscape of protein folding[
80,
81]. Dedmon
et al.[
82] showed that FlgM, a 97-residue protein from
Salmonella typhimurium, is unstructured in dilute solution, but in E. coli cells its C-terminal half is structured. McPhie
et al.[
83] found that a molten globular state of apomyoglobin at low pH is stabilized by high concentration of the inert polymer, dextran, compared to the unfolded state. Moreover, it was found that aggregate formation from human apolipoprotein C-II is significantly accelerated by the addition of dextran[
84], suggesting a direct effect of molecular crowding on protein aggregation.
Over the past three decades, novel experimental techniques and simulations have yielded many significant insights in protein folding research. Important advances have been made, especially toward the understanding of folding and unfolding mechanisms, the structure of folding transition states, folding kinetics, the nature of folding pathways, the structure of unfolded proteins, and protein folding in vivo. Theoretical approaches to the study of protein folding have largely complemented experiments by providing experimentally testable hypotheses. In recent years, the rational manipulation of folding pathways and the association between protein folding and disease have marked a more applied phase of protein folding research.