|Home | About | Journals | Submit | Contact Us | Français|
Biomolecular systems possess unique, structure-encoded dynamic properties that underlie their biological functions. Recent studies indicate that these dynamic properties are determined to a large extent by the topology of native contacts. In recent years, elastic network models used in conjunction with normal mode analyses have proven to be useful for elucidating the collective dynamics intrinsically accessible under native state conditions, including in particular the global modes of motions that are robustly defined by the overall architecture. With increasing availability of structural data for well-studied proteins in different forms (liganded, complexed, or free), there is increasing evidence in support of the correspondence between functional changes in structures observed in experiments and the global motions predicted by these coarse-grained analyses. These observed correlations suggest that computational methods may be advantageously employed for assessing functional changes in structure and allosteric mechanisms intrinsically favored by the native fold.
Biomolecular structures enjoy several degrees of freedom under equilibrium conditions, ranging from small fluctuations in atomic positions to collective movements of entire domains, subunits, or molecules. These motions are not random. They are dominated, if not fully determined (owing to environmental perturbations), by intra- and intermolecular interactions, which in turn depend on the three-dimensional (3D) structure, i.e., they are structure-encoded. Structure-encoded dynamics are defined at a minimum by the geometry or topology of native contacts, apart from specific interactions that distinguish between different types of amino acids. Recent years have witnessed an explosion in the number of studies that use elastic network models (ENMs) and normal mode analysis (NMA) for exploring the structure-encoded dynamics that depends exclusively on native contact topology (8, 17). Several servers are now available (1, 21, 48, 63, 64), which permit one to readily compute and visualize a range of collective motions; identify residues that play critical roles in mediating these cooperative movements; or take advantage of predicted modes for structure refinement, for improving docking algorithms, or for steering molecular dynamics (MD) simulations to explore longer timescales and larger length scales.
These studies are now inviting attention to the functional significance of structural dynamics, as also suggested by experimental data (13, 30, 61). There is now an ever-increasing volume of computational studies aimed at gaining insights into the mechanisms of function and allostery, or the links bridging 3D structure and function, with the help of ENM-based methodologies (5, 7, 65). Recent studies raise additional fundamental questions concerning the possible evolution/selection of structures to lend themselves to structure-specific dynamics relevant to their biological function (see, for example, Reference 71).
Given the extensive use of ENMs in conjunction with NMAs and other approaches based on principal component analysis (PCA), as well as graph theoretical methods, in a broad variety of applications, a critical assessment of the underlying assumptions, theory, and methods is in order. The purpose of this review is to give a brief summary of these theoretical foundations, along with illustrative examples of the use and limitations of ENM-based examination of protein dynamics. We begin by presenting an overview of the theoretical foundations and basic features of ENMs, focusing in particular on two ENMs that have been widely used in recent years, the Gaussian network model (GNM) (4, 27) and the anisotropic network model (ANM) (2, 21, 67). We then summarize the extensive studies performed for benchmarking ENMs, or optimizing their parameters, against experimental data on the equilibrium fluctuations of residues observed in structurally resolved proteins. We expose and discuss the relevance of predictions to functional changes in the structures of proteins. We conclude with recent extensions and future prospects.
Three basic features have made ENMs attractive for exploring biomolecular dynamics in recent years. First is their simplicity, both conceptual and methodological. In an ENM, the complex structures of biomolecules are reduced to a network of nodes and springs. The network topology is endowed by the native state contact topology. The positions of the nodes are known from X-ray or NMR structures, and those residues that are close to each other are connected by uniform springs. The distribution of interactions is thus a major determinant of structural dynamics. This simple model lends itself to an easily calculated and unique solution for the collective dynamics of each examined system.
A second feature of ENMs is the robustness of the predictions. The global dynamics predicted by the network are almost indistinguishable from those that would otherwise be obtained by full-fledged atomic force fields. Global dynamics refer here to collective motions that cooperatively involve large substructures, usually at the lowest frequency end of the normal mode spectrum, as opposed to local motions associated with high-frequency vibrations. Starting from the pioneering work of Tirion (69), numerous applications have demonstrated that these modes are insensitive to detailed structure and energetics. They are defined, instead, by the overall shape of the biomolecule (e.g., by the fold of the protein), as recently reviewed (5, 50, 65, 76).
The robustness of ENM predictions entails another important feature: scalability. The nodes may indeed be selected at various levels of resolution, provided that the network accurately preserves the gross features of the molecular structure. The most broadly used approach has been to represent each residue as a single node, usually coincident with the Cα-atom for proteins, but models that use only a single node per 40 residues, for example, also predict global dynamics and allosteric communication properties surprisingly well (14, 19). The ENM also lends itself to use in mixed models, in which specific regions of the molecule are modeled in detail and the remainder at low resolution (43), and to rigid blocking schemes as in the rotating-translating blocks model (66). Clearly, the scalability of ENMs makes them particularly useful for investigating large complexes/assemblies or supramolecular systems (79), or even organelles such as the nuclear pore complex (46) or intact viruses (47), i.e., bimolecular systems that are well beyond the scope of exploration via conventional molecular models and simulations.
Simplicity, scalability, and robustness are attractive features; but ENMs would not have found such broad use in molecular and structural biology if it were not for the physical insights they provided into the intrinsically accessible motions of biomolecules, the same motions that are essential to biological function. This review presents examples of the relevance of ENM predictions to such functional changes. Mainly, structures favor well-defined collective modes, and a few cooperative modes underlie the passage between the alternative functional forms of proteins, as illustrated for a few cases in Figure 1 and further elaborated below.
The ANM is the most broadly used ENM. It was inspired by the original work of Tirion (69) in which a quadratic potential with a uniform force constant for all atomic interactions within a cutoff distance reproduced almost identically the low-frequency modes of motion obtained with a detailed force field. The ANM is essentially a coarse-grained (CG) version of Tirion’s model, in which the interaction sites, or the network nodes, are the individual residues rather than atoms (2, 21, 67). The corresponding potential reads
where Rij and are the instantaneous and equilibrium separations between residues i and j; γ is the uniform force constant, also called stiffness; and Θ(x) designates the Heaviside step function, equal to 1 if x > 0, and zero otherwise. In Equation 1, Θ(x) selects all ij pairs that are within a cutoff distance, Rc, while eliminating farther neighbors. N is the number of nodes/residues in the network. The spatial locations of residues are usually identified by the coordinates of the α-carbons (in proteins), or those of selected nucleotide atoms (in DNA or RNA oligomers). We note that Hinsen et al. (32) introduced a slightly different CG ENM, where the term is replaced by a semiempirical function, the form and parameters of which optimally fit the effective force constants obtained (for crambin) with a realistic force field. This distance-weighted form has the advantage of eliminating the parameter Rc and providing a physical description of inter-residue interactions. Jernigan and coworkers also proposed a force constant that decreases with inter-residue separation, which gives a satisfactory description of equilibrium motions (75).
The expansion of the ANM potential near the equilibrium state reads
where α is x, y, or z; according to Equation 1; and EANM/αi|R0 vanishes at equilibrium. The (remaining) second order term may be written in compact notation as 1
Here ΔR is the 3N-dimensional vector of fluctuations ΔRi = (Δxi Δyi Δzi)T of all residues (1 ≤ i ≤ N) organized as ΔRi = (Δx1 Δy1 Δz1…ΔzN)T, and H is the Hessian, a 3N × 3N matrix composed of the second derivatives of the potential with respect to the components of the position vectors. The ANM yields a concise, closed form expression for the elements of H, e.g.,
for the xy element of the ijth off-diagonal block Hij with a size of 3 × 3. The diagonal blocks are the negative sum over all off-diagonal blocks in the same row/column (H is symmetrical). In NMA using a conventional force field, an expensive initial energy minimization is required prior to calculating H, which becomes a challenge as the size of the system increases. Conversely, ANM requires no such minimization and is readily applicable to supramolecular systems.
The 3N × 3N covariance matrix, C(3N) for residue fluctuations reduces to
using Gaussian integrals and to evaluate the above ensemble. H has six zero eigenvalues, corresponding to the external rotational and translational degrees of freedom of the molecule. As a result, the inversion of H is not straightforward and H−1 is a pseudoinverse expressed in terms of its nonzero eigenvalues (λk, 1 ≤ k ≤ 3N-6) and corresponding eigenvectors uk, as
The implication of Equation 6 is that the positional covariance is contributed by a set of normal modes (uk). The eigenvalues serve as the weights of these contributions; they are the squared frequencies of vibrational modes and also the stiffness (curvature of the energy landscape) as the molecule reconfigures along the mode directions (3, 27). Clearly, lower-frequency modes make a larger contribution to the covariance. The modes with the lowest frequencies are generally of interest, as they entail the most cooperative and largest amplitude motions and are the softest modes favored by the overall architecture.
The GNM is rooted in the statistical mechanical theory of elasticity originally introduced by Flory and coworkers (23, 36) for polymer networks. It uses the same assumption underlying the original theory put forward three decades ago, that of independent, Gaussian fluctuations of network nodes given by the distribution (3, 4)
which, in turn, implies Gaussian fluctuations in internode distances. In the GNM, each node is identified by a residue (of the protein or oligonucleotide), and each connector by the interaction that stabilizes the native contact topology. In the language used above for describing ANM, the isotropic Gaussian distributions of fluctuations map to harmonic potentials of entropic origin (4)
where ΔXT = (Δx1 Δx2… ΔxN), and similarly forΔYT and ΔZT. Γ is the N × N Kirchhoff matrix, fully controlling the dynamics. Its ijth off-diagonal element is Γij = −1 if node i is within a cutoff distance Rc from node j, and Γij = 0; otherwise, the diagonal elements are evaluated from and represent the residue coordination number. No information on the directions of fluctuations can be obtained from the GNM. The theory provides N-dimensional predictions on the mean-square fluctuations (MSFs) of residues and the cross-correlations between their fluctuations given by the respective diagonal and off-diagonal elements of an N × N covariance matrix C(N),
along with the collective modes in an N-dimensional space obtained by the eigenvalue decomposition of Γ. Note that Equation 8 is simply the counterpart of Equation 5 upon substitution of EGNM. It implies, upon substitution from Equation 7, an entropic cost of the form (9)
for the deformation of residue i away from its equilibrium position. Residues subject to large amplitude MSFs thus incur lower entropic cost for a given deformation.
For comparative purposes with the ANM potential, Equation 8 can be rewritten as
The difference with respect to Equation 1 is therefore the replacement of by , where ΔRij is the vectorial difference between the instantaneous and equilibrium distance vectors Rij and . The term in Equation 1 becomes zero if the instantaneous distance vector maintains its magnitude while changing its orientation. Its counterpart in the GNM, on the other hand, penalizes the change in orientation (including both internal and external rotation) in addition to the changes in distance (magnitude). In fact, Γ has one zero eigenvalue (trivial mode), and consequently, ΓI has three, instead of six, trivial modes, which correspond exclusively to the translational invariance of the results (68). Notably, the GNM analysis shares much in common with spectral graph-theoretical methods based on network connectivity that have found wide applications in other disciplines (15, 16).
The most abundant data for the equilibrium fluctuations of residues near their folded state are the X-ray crystallographic temperature factors, also known as B-factors or Debye-Waller factors, reported with the majority of Protein Data Bank (PDB)-deposited X-ray structures. The B-factors are not direct experimental measurements, however, but refined values that reflect more than pure internal motion—including rigid-body motion and static disorder in the crystal. In some structures, rigid-body motions and uniform static disorder make even larger contributions to observed B-factors than do thermal fluctuations (31, 62), as indicated by the separation of internal and external degrees of freedom in NMA-based refinement of structures (35, 57, 62). Figure 2a illustrates the results from a systematic examination of a set of 90 high-resolution structures. No correlation with rigid-body translation is observed as these modes are constants for all residues, while rigid-body rotations exhibit significant correlations.
Despite these limitations, B-factors have been used broadly for exploring whether the equilibrium fluctuations in residue positions predicted by ENM-based NMA exhibit any correlation with experimental data, and for assessing the optimal parameters for these CG studies (37, 38, 42, 62, 78, 80), given that the B-factors theoretically scale with the MSFs of individual atoms around their equilibrium positions, as Bi = (8π 2/3) ΔRi)2. Note that in the ENM analysis, (ΔRi)2 is simply the ith diagonal element of C (Equation 9) or the trace of the ith super-element of C(3N) (Equation 5). The MSFs computed by the GNM usually yield a correlation with B-factors of about 0.60 (20, 37, 42, 78) and increase to 0.66 when the contacts made with neighboring molecules in the crystal environment are taken into consideration (31, 37, 42, 62). For ANM, the mean correlation is about 0.55 and increases to 0.60 upon consideration of crystal contacts (21); the use of distance-weighted spring constants further increases the average correlation with experimental B-factors to 0.67 (60). The better correlation of GNM predictions with B-factors has been attributed to the energetic penalty on internal and external rotations inherent to the GNM. Recent studies show that correlations higher than 0.8 can be achieved in individual cases (49, 62, 81) upon optimizing ANM parameters against experimental data.
In high-resolution crystallographic structures, the diffraction data are sufficiently detailed to refine six anisotropic displacement parameters (ADPs), instead of a single isotropic parameter, per atom. The ADPs for the ith atom are simply the six distinctive elements of the ith diagonal superelement (a 3 × 3 symmetric matrix) of C(3N). Three of these, (Δxi)2, (Δyi)2, and (Δzi)2, provide information on the sizes of motions along different directions. The remaining three, Δxi Δyi, Δxi Δzi, and Δyi Δzi, are cross-correlations between fluctuations along different axes. Systematic calculations (20, 38, 49) showed that the MSFs, (ΔRi)2, are predicted with higher accuracy than the individual components (Δxi)2, (Δyi)2, and (Δzi)2; that these components are, in turn, more accurately predicted than their cross-correlations (20); and that use of detailed force fields like CHARMM slightly improves the predictions (12, 38). ADPs suffer from the same limitations as the isotropic B-factors, and should be considered with caution. Figure 2b shows, for example, that the ADPs predicted by the ANM for hen egg white lysozyme may agree better with experimental data than do two sets of experimental data obtained for the same protein from crystals with different symmetries (20), illustrating the sensitivity of ADPs to experimental conditions.
In addition to X-ray crystallographic data, GNM predictions correlate well with H/D exchange protection factors (9), folding nuclei, (28, 59), NMR order parameters (26), and root-mean-square deviations (RMSDs) between NMR models (78). Notably, the RMSDs from NMR ensembles agree better with ENM results than do the crystallographic B-factors. This difference has been explained by the more accurate sampling of low-frequency/large-amplitude movements in solution, reflected in NMR models, as opposed to their suppression in the crystal environment (78).
The comparative studies summarized above focus on equilibrium fluctuations. More significant, however, is the relevance of ENM predictions to functional changes in structure, which are elucidated by focusing on the global modes.
In principle, there is always a combination of normal modes that accounts for a given conformational change, since the normal modes form an orthonormal basis set that spans the complete space of internal motions. However, the question is whether a single mode or a small subset of low-frequency modes, predictable by ENMs, can largely explain the functional changes. This ability of ENMs could then assist in structural refinement, molecular docking, and development of drugs that target particular functional motions.
The existence of a correlation between GNM predictions and structural changes inferred from the structures of a given protein resolved in different forms (e.g., closed or open forms, or inhibitor-bound, DNA-bound, or unbound forms) was pointed out a decade ago for HIV-1 reverse transcriptase (6) and other proteins (3), along with the implications of observed domain motions on biomolecular function. We note in particular a pioneering survey by Tama & Sanejouand (67) in which 20 different proteins, all resolved in both open and closed states, were examined using the ANM to demonstrate the relevance of predicted slow modes to observed (functional) changes in structure. Since then, this important feature has been exploited for many systems (5). The idea is the following: Given two conformations R(A) and R(B) for a given protein, the question is whether one or more modes, uk, accessible to R(A) closely approximate the 3N-dimensional difference vector ΔRAB = R(B) − R(A) between the two conformers. In mathematical terms, the goal is to determine the displacement of size sk along eigenvector uk that will minimize the RMSD,
between the end points. The sk value that minimizes RMSDk is readily found by differentiating Equation 12 with respect to sk, as the projection of ΔRAB onto uk, (note that uk is normalized), i.e., sk = ΔRAB • uk, which substituted into Equation 12, leads to the smallest RMSDi that can be achieved upon moving along the kth mode axis:
Figure 1 illustrates some typical results obtained for six proteins, each resolved in at least two conformers, R(A) and R(B). We see in each case at least one mode uk, among those in the lowest frequency regime (e.g., k ≤ 20), that makes a significant contribution to the structural change ΔRAB between two conformers. This property is evidenced by (a) the decrease in the RMSD between the two conformers upon reconfiguration of one of them along that particular mode, and (b) the correlation between the square displacement profiles of residues derived from experimental (ΔRAB) and theoretically predicted (uk) changes in coordinates. For example, in Figure 1c, the second slowest mode alone allows for a reduction from 3.80 to 1.95 Å in the RMSD between the open and closed structures of the maltodextrin-binding protein. The square displacements of residues along this particular ANM mode exhibit a correlation of 0.71 with the changes experimentally observed between the two conformers. Yet, this particular mode represents just one of the 3N-6-accessible directions of reconfiguration (at the residue-level description) in the conformational space. Notably, this motion is predicted by the ANM to be one of the most readily accessible (entropically favorable) modes of motion and, when observed experimentally, exhibits a remarkable correlation with the global change in structure.
Similar features are observed in five of the proteins shown in Figure 1. However, not all conformational changes can be described by low-frequency modes, especially if the transitions involve rearrangements on a local scale. In some proteins such as calmodulin, the conformational change can be almost fully accounted for by a few slow modes; in others, such as NtrC, this is not the case (25). Among motor proteins, the functional motions of myosin and F1-ATPase can be accurately described by one or two modes, but this is not the case for the evolutionarily related kinesin (83). For the ligand-binding domain of the leukocyte-immunoglobulin-like receptor (LIR-1) (Figure 1f), mode 3 helps to reduce the RMSD between the open and closed forms, presumably due to the correctly predicted anti-correlated movements of the two domains, but the overall mobility profile exhibits poor correlation with experiments owing to inadequate description of a solvent-exposed loop region. The RMSDs and correlation coefficients thus provide complementary information.
From another perspective, we can see that correlation cosines of ≥0.90 between the theoretical and experimental mobility profiles can be obtained upon adding up the contributions of only a handful of slow modes. Gerstein and coworkers (1) have now constructed a database of more than 3800 molecular motions, which shows that the slow modes often exhibit maximum overlap with the experimentally observed directions of reconfiguration. The slow modes can be readily evaluated and visualized for any structure deposited into the PDB, or any model written in PDB format, using the ANM server (21).
We note that the proteins illustrated in Figure 1 exhibit RMSDs of 1.8–6.0 Å between their different conformers, with the largest changes occurring in the case of reverse transcriptase (Figure 1a). Reverse transcriptase is composed of two subunits (p66 and p51), each of which is composed of multiple subdomains. The large structural change observed between its inhibitor-bound and inhibitor-free forms presumably falls within the same global energy minimum represented at a CG scale. In a sense, coarse-graining of structure and energetics allows for overlooking the ruggedness of the energy landscape—which otherwise arises from specific, highly nonlinear interactions at atomic scale. Motivated by such findings, and presumably inspired by the examination of folding kinetics using Gō-models, transitions between substates have been explored, using ENMs for the endpoints (51, 54, 81). A recent study performed for the bacterial chaperonin GroEL along these lines (79) revealed that a small subset of modes is conducive to the next functional state, but there is also a need to gradually sample steps along higher mode directions as the structure moves away from the original state. Calculations demonstrated that more than half of the 12 Å RMSD undergone by the complex during its allosteric cycle was achieved upon moving along the slowest two to three modes, with minimal perturbation, if any, in the distribution of native contacts (79). The concurrence between the naturally selected (or entropically favored) modes of motion and those required for achieving its chaperonin function provides a nice example of the evolutionary optimization of structure to intrinsically favor functional modes.
Two models are broadly used in the literature for explaining allosteric changes elicited by ligand binding (see Figure 3): (a) induced fit, originally proposed by Koshland (39), which confers an active role to the ligand as the determinant of the conformational change undergone by the protein upon binding it, and (b) pre-existing equilibrium (55), in which the substrate-bound conformer results from the selection of an already existing (albeit at low proportion) conformation of the protein. The transition is an all-or-none change, cooperatively involving the entire molecule, similar to slow modes predicted by NMA. The Koshland-Néméthy-Filmer (KNF) model (40), on the other hand, proposes a sequential transition that is initiated locally and gradually engages other subunits.
An important result from ENM-based studies, with implications for molecular docking, is that the low-frequency modes calculated for the unbound protein can account for the conformational changes observed upon substrate binding (70). This observation reveals the important property of intrinsic accessibility of functional movements, endowed by the equilibrium structure, per se, prior to substrate binding. Recent analyses further suggest that this property holds not only for protein-protein interactions, but even for protein-ligand (small molecule) interactions (10, 73), and that it enables allosteric regulation in general.
A classical example is hemoglobin, the T → R transition of which has been demonstrated in both atomic (56) and ANM-based (72) NMAs to be achieved by one slow mode (second in that case) intrinsically accessible to the T state. The transition is cooperative, i.e., all subunits concertedly undergo a conformational switch, the mechanism of which is defined by the overall quaternary structure. Another prime example is the bacterial chaperonin, GroEL, an ATP-regulated machine. The conformational transitions undergone by GroEL during its allosteric cycle are highly dominated by a few low-frequency modes (79, 82). Zheng et al. (82) demonstrated that not only are ligand-induced conformational changes dominated by few low-frequency modes, but that these modes are robust to sequence variations.
An alternative way of examining the correspondence between experiments and computations is to compare existing structural data for a given protein in the presence of different ligands with snapshots from its MD simulations. This type of comparison may be particularly useful for local changes at the ligand-binding pocket that fall within the range of observation of MD simulations. Such an analysis recently performed for acetylcholine esterase showed that the side chain conformational changes upon ligand binding correspond to pre-existing states observable from MD (73). In summary, increasing experimental and computational evidence supports the dominance of intrinsic, as opposed to induced, dynamics in controlling the conformational switches and allosteric signals of proteins (5, 10, 13, 22, 25, 70, 73); but the bound conformations understandably undergo further stabilizing rearrangements induced upon substrate binding (70) consistent with an alteration in the energy landscape in favor of the bound state (61).
Perhaps the most striking evidence of consistency between the ensembles of structures observed in experiments and the structural changes predicted by the ANM is provided by PCA of structures deposited into the PDB for a given protein. The comparison is straightforward: PCA extracts the dominant modes of structural variation in a given ensemble, permitting us to directly compare these principal modes, based on experimental structural data, with the lowest-frequency normal modes predicted (for one conformation in the ensemble) by the ANM.
With increasing structural data on the same protein in different conformers, we are now in a position to make such comparisons and assess the realism of the global modes predicted by the ANM. We may also take a closer look at the differences between the structures determined by NMR and X ray for a given protein and see to what extent these differences comply with the intrinsic dynamics of the protein. Calculations recently performed (77) along those lines showed that in 20 of 24 proteins examined, the conformational differences between the NMR structures and their X-ray counterparts were consistent with the principal modes identified by PCA of NMR models and supported by ANM. This study unambiguously shows that X-ray and NMR structures simply reflect the reconfigurations of the protein along its intrinsically accessible global modes of motions (77).
Notably, global modes can be deduced from the ANM analysis of NMR models. Figure 4a,b illustrates the results from such an ANM investigation of an ensemble of NMR models determined for calmodulin complexed with myosin light chain kinase (10). The PCA of the ensemble yields two principal modes of structural variations, PC1 and PC2. ANM analysis of one representative structure, on the other hand, yields the top-ranking modes, designated ANM1 and ANM2. The plots display the ensemble of models dispersed along the top-ranking ANM and PCA directions, demonstrating the close correspondence between the experimentally determined structural variations (PCA) and theoretically predicted global modes (ANM). Precisely, ANM modes 1 and 2 exhibit respective correlations of 0.88 and 0.77 with PC1 and PC2.
In a recent study, the ensembles of structures experimentally resolved for well-studied enzymes in the presence of different ligands/substrates, or in the unbound state, were shown to correlate with ANM-predicted conformational changes (10). Figure 4c,d illustrates the results for an ensemble of 74 structures determined for p38 kinase. PCA of this ensemble revealed two principal modes of structural changes, PC1 and PC2, that essentially correspond to a concerted opening/closing of the two lobes (PC1), and a twisting of the N-lobe with respect to the C-lobe (PC2). ANM calculations performed for a representative member from this set revealed that the global modes 1 and 3 closely correlate with PC2 and PC1, respectively, as illustrated in Figure 4c,d. Again, the major, largest-scale conformational changes observed between the experimentally resolved structures are essentially reconfigurations along the respective ANM modes 1 and 3 intrinsically accessible to p38 kinase structure, irrespective of ligand binding.
These results demonstrate that the structural changes undergone by the protein in different functional forms conform to a large extent to those intrinsically encoded by the native contacts. In a recent study of HIV-1 protease, Jernigan and coworkers (74) also demonstrated the close correlation between the top-ranking PCA modes derived for experimentally resolved structures and the low-frequency ANM modes computed for a representative structure. Such comparisons are expected to become a frequent benchmarking technique for assessing and consolidating the conformational changes predicted for functional mechanisms.
These observations draw attention to two important concepts. First, functional information may be extracted from ensembles of conformations resolved for a given protein; i.e., the data in the presence of different ligands or in different crystal space groups are not redundant but actually contain valuable information on functional dynamics that may be extracted by PCA, stipulating the utility of depositing into the PDB multiple structures for a given protein. Second, the principal variations in structure are consistent with the top-ranking ANM modes, implying that information on potential changes in structure, needed for assessing protein-protein and protein-ligand interactions, can be advantageously deduced from knowledge of inter-residue contact topology.
Notably, the residues identified to be highly constrained in the PCA or NMA modes are further verified to be evolutionarily conserved, supporting the utility of the PCA of structural ensembles or ANM analysis of known structures for identifying potential functional sites. We have now constructed a portal, PCA NEST (PCA of Native Ensembles of STructures) (77), which uses as input the ensembles of structures known from experiments (or snapshots from simulations), and releases as output the corresponding principal modes of structural changes and potential functional sites via a user-friendly interface.
Despite recent advances in ab initio methods for protein structure prediction, ongoing structural genomics initiatives strive to resolve representative structures for many different folds such that most proteins can be accurately modeled by homology modeling. In this method, target proteins are modeled on the basis of evolutionarily related templates with known structures. Homology modeling methods, however, are often not accurate enough for practical purposes because of our inability to successfully refine the structure of a template protein based on the target sequence (41). Any advances in this refinement mark significant steps toward the goal of exploiting protein models for practical needs, such as drug design, in the structural genomics era.
We are currently in a position to improve the quality of homology models by refining the template structures based on their equilibrium motions using ENM and NMA. The changes in protein topology due to limited sequence variation between family members tend to take place along a set of low-frequency normal modes (44). Therefore, it should be possible to refine the template backbone using the target sequence by effectively restricting the search space to a subspace spanned by a few lower-frequency normal modes. This will increase the chance of finding correct solutions while reducing the chance of getting trapped in false solutions. Implementing these ideas, Qian et al. (58) performed a grid search for refining template structures by PCA of structures available for a given family and selecting conformations based on sensitive energy functions. Given that the principal components from experimental structures correlate well with normal modes (44, 77), it should be possible to use normal modes instead of PCA. The advantage of normal modes is that they can be calculated based on a single template structure. This endows NMA with the advantage of filling the gaps in the sequence-structure space. Han et al. (29), on the other hand, developed a combined method that utilizes both PCA and NMA. It effectively restricts the search space for homology modeling, making it feasible to adopt sensitive optimization procedures that require lower dimensionality. A recent application is the prediction of open and closed conformations of Rat heme-free oxygenase (52). Starting from the heme-bound X-ray structure, and using templates corresponding to human and incomplete rat heme-free structures, models of the rat unbound species with open and closed conformations were generated.
Finally, there is an even larger body of literature on the use of NMAs for structure refinement, beginning with original studies by Diamond (18) and Kidera & Gō (34) soon after the pioneering NMA articles in the 1980s (11, 24, 45). Owing to space limitations, we simply acknowledge here that significant advances have been made in this field, not only for X-ray structure refinement (57), but also fitting high-resolution structures into low-resolution cryo-electron microscopy density maps (33, 63).
Despite their simplicity, ENM-based approaches have shown considerable success in improving our understanding of allosteric mechanisms and explained a wealth of experimental data. Why do such simplified models work?
The answer presumably lies in two features. First, the major ingredient of the theory, native contact topology represented by the spatial distribution of network nodes, plays a dominant role in defining the collective motions at the low-frequency end of the mode spectrum, i.e., the most cooperative modes. The types (attraction or repulsion) or grades (strong or weak) of interactions are not as important as their existence or absence defined by the network topology. Alternatively, one can think that entropic effects make significant contributions to the selection of structural changes. In the GNM, the network topology defines the conformational space accessible to the network/structure via Gaussian fluctuations, as originally set forth in the statistical mechanical theory of elasticity developed for polymer networks, allowing us to assess the entropic cost of each residue’s movement (Equation 10). The agreement between GNM predictions and experimental data, which is usually better than that achieved by ANM, has raised the possibility of entropic drive playing a major role in defining optimal directions/modes of structural changes on the free energy landscape in the neighborhood of the native state (72).
Second, the network models lend themselves to analytical treatments and unique solutions for the collective modes of particular architectures. In other words, although the models are approximate, the use of rigorous statistical mechanical theories of solid-state physics and rubber elasticity, and the effective implementation of NMA and/or spectral graph theoretic methods, allow for obtaining exact analytical solutions for these approximate models. In the opposite case, we may predict some approximate behavior (from simulations that suffer from sampling inefficiencies) despite the use of highly precise models (full atomic representation and detailed force field).
Our contention is that the former group of models and methods provides superior results when exploring large-scale, long-time (microseconds or slower) processes of biomolecular systems, or highly cooperative or allosteric responses that involve long-range spatial couplings. The latter type of studies/simulations, on the other hand, appears to be suited, or required, for stochastic processes in the nanoseconds regime, where local structural heterogeneities, side chain rotameric transitions, and specific interactions can make big differences. These are in sharp contrast to the robust processes elucidated by ENM-based methods, examples of which were presented above. There is definitely room and a need for both of these complementary approaches in exploring biomolecular structural dynamics at global and local scales. In the next few years, we will likely see more examples of integrative methodologies that exploit the capabilities of these two approaches.
Support from NIH grant 1R01GM086238-01 is gratefully acknowledged by I.B. We also thank Ahmet Bakan for his help in preparing Figure 4.
The authors are not aware of any affiliations, memberships, funding, or financial holdings that might be perceived as affecting the objectivity of this review.