Home | About | Journals | Submit | Contact Us | Français |

**|**HHS Author Manuscripts**|**PMC2930202

Formats

Article sections

- Abstract
- 1 Introduction
- 2 Normal Mode Analysis Derived from Hamiltonian Mechanics
- 3 Review of cNMA
- 4 Method
- 5 Results and Discussion
- 6 Conclusion
- Supplementary Material
- References

Authors

Related links

Proteins. Author manuscript; available in PMC 2010 August 31.

Published in final edited form as:

PMCID: PMC2930202

NIHMSID: NIHMS66612

Adam D. Schuyler,^{*} Robert L. Jernigan,^{†} Pradman K. Qasba,^{‡} Boopathy Ramakrishnan,^{‡}^{§} and Gregory S. Chirikjian^{¶}

Corresponding author: Dr. Gregory S. Chirikjian, Department of Mechanical Engineering, The Johns Hopkins University, 223 Latrobe Hall, 3400 North Charles Street, Baltimore, MD 21218, Email: ude.uhj@cgerg, telephone: (410) 516-7127, fax: (410) 516-7254

Computational models provide insight into the structure-function relationship in proteins. These approaches, especially those based on normal mode analysis, can identify the accessible motion space around a given equilibrium structure. The large magnitude, collective motions identified by these methods are often well aligned with the general direction of the expected conformational transitions. However, these motions cannot realistically be extrapolated beyond the local neighborhood of the starting conformation. In this paper, the icNMA method is presented for traversing the energy landscape from a starting conformation to a desired goal conformation. This is accomplished by allowing the evolving geometry of the intermediate structures to define the local accessible motion space, and thus produce an appropriate displacement. Following the derivation of the icNMA method, a set of sample simulations are performed to probe the robustness of the model. A detailed analysis of *β*1,4-galactosyltransferase-T1 is also given, to highlight many of the capabilities of icNMA. Remarkably, during the transition, a helix is seen to be extended by an additional turn, emphasizing a new unknown role for secondary structures to absorb slack during transitions. The transition pathway for adenylate kinase, which has been frequently studied in the literature, is also discussed.

Experimental methods have produced a wealth of high resolution protein structures. At the time of writing, the Protein Data Bank (PDB, [1]) contains more than 40,000 structures solved by x-ray crystallography, nearly half of which are at a resolution better than 2Å. This detailed information has provided a solid foundation for understanding the structure-function relationship, which involves the determination of biological function from structure. This paper presents a new method, called iterative cluster normal mode analysis (icNMA), which produces a transition pathway between known conformations by following the local accessible motion space of the evolving conformation. The resulting pathway provides insight into motion-driven biological function.

Numerous experimental methods have been employed in the study of macromolecular structure and dynamics, including fluorescent resonance energy transfer [2], nuclear magnetic resonance [3, 4], hydrogen exchange [5] and crystallography. These purely experimental methods are often limited by a tradeoff between spatial and temporal resolution. In response to this obstacle, computational modeling techniques have been applied with great success.

Before icNMA is presented, it will be useful to review previous modeling techniques, especially the cluster normal mode analysis (cNMA, [6, 7]) method upon which icNMA is based. The first set of models discussed (including cNMA) are primarily concerned with identifying the accessible motion space *local* to a given equilibrium conformation. This first set serves as a foundation for understanding the second set, which are capable of exploring the *extended* energy landscape.

In one of the earliest computational models, Levitt [8] uses a simplified structure representation and a potential energy function to study protein folding. Based on similar concepts, molecular dynamics (MD) methods use complex potential functions and numerical integration to produce conformation trajectories [9, 10]. These atomic resolution models and all-inclusive potential functions come at a significant computational cost [11], often restricting simulations to timescales orders of magnitude shorter than relevant biological processes.

Normal mode analysis (NMA), which is well suited for identifying large-scale, cooperative structure motions, was introduced [12–16]. The success of NMA-based methods can be attributed to the robustness and simplicity of motions around the equilibrium conformation [17, 18]. The first NMA models were still based on complex potential functions and thus presented computational limitations. This was addressed by the elastic network model (ENM, [19]) which proposed the usage of a single parameter harmonic potential function to model all pairwise atomic interactions. The Gaussian network model [20, 21] used a coarse-grained model relying only upon the *α*-carbon trace representation of a protein along with the ENM to produce a measure of atomic mobility. This scalar model was then extended into the anisotropic network model (ANM, [22]) to capture magnitude and direction of atomic fluctuations.

The already simplified ANM representation has been even further reduced through the use of coarse graining, in which multiple atoms are grouped into single representative points. Excellent reviews of coarse grained models are given by Tozzini [23] and Bahar and Rader [17]. A comparison between varying levels of grain resolution is given by Sen et al. [24]. While these “bead models” do succeed in reducing the number of degrees of freedom (DOFs) in the structure representation, they do so at the expense of altering atomic interaction geometries (i.e. when multiple atoms are represented by a single point, all of their distinct contact interactions must be collapsed onto that representative point).

In response to the shortcomings of coarse grained NMA, the authors developed cNMA [6, 7] in which groups of atoms are represented as rigid bodies embedded in the ENM. This approach utilizes a multi-scale structure representation which includes all atoms and all pairwise atomic interactions (i.e. no deformed geometries), while at the same time, reduces the total number of DOFs needed in the parameterization. The cNMA method is the foundation for icNMA and is reviewed in Section 3. The complete derivation and comparison of cNMA to ENM-based C* _{α}*-NMA is given in Schuyler and Chirikjian [6] and its application at atomic resolution to large structures is given in Schuyler and Chirikjian [7]. The RTB method [25, 26] is similar to cNMA, but several substantial differences in coordinate system definitions and computational procedures are discussed in Schuyler and Chirikjian [7]. Clustering schemes have also been incorporated into MD methods [27] and into NMA methods which explicitly define solvent [28].

Proteins are known to sample various conformation states essential for carrying out their functions. These states range from relatively minor changes in structure to large scale rearrangements, such as those seen in hemoglobin and myosin. There have been many attempts at understanding these transitions [29–32]. It has been established that relatively few, low-frequency normal modes can identify the direction of global motions required to achieve conformational transitions [33]. This general categorization has been further refined by a database study which relates the degree of collectivity in a transition with the effectiveness of ENM normal modes to capture the transition direction [34, 35]. Motion correlation analysis across the low frequency modes has provided information on cooperative structure motion and domain stability [36].

The methods discussed thus far are only relevant to the local motion subspace around an equilibrium conformation. In order to achieve a full understanding of biologically relevant structure transitions, we now introduce a second class of methods, which are capable of exploring the extended energy landscape.

Cryo-electron microscopy (cryo-EM) has been used to study conformational changes [37]. The identification of secondary structure elements has been used to fit atomic resolution subunits onto low-resolution, cryo-EM electron density maps of large complexes [38]. There has also been success in coupling cryo-EM with NMA-based methods. The continuous valued density map has been converted into discrete mass locations, which serve as the basis for low-resolution NMA. This approach shows agreement with the lowest modes of atomic resolution NMA [39]. There have also been methods for deforming known crystal structures onto low-resolution cryo-EM electron density maps [40–42]. These methods use cost functions to optimally select a subset of normal modes and assign relative weighting coefficients that best produce the desired conformational changes. These methods start to introduce dynamics into the structure analysis, but it must be noted that the series of conformations converging to the best fit is not intended as (and can not be interpreted as) a pathway, but as a byproduct of the fitting process.

Linear interpolation of Cartesian coordinates between known conformation pairs causes obvious steric violations, but modified interpolation methods are able to avoid this. For example, the elastic network interpolation (ENI) method interpolates between contact maps of two conformations [43, 44]. One particular aspect of this and many of the other models for generating transition pathways is that they lead to a single pathway; this neglects the widely perceived stochastic nature of protein folding. Consequently these single pathway methods likely provide a most probable representative pathway, with variations around these single pathways providing a full ensemble representation. For example, ENI is able to fit a pathway generated for a core central domain of the 16S rRNA onto an MD trajectory using only the lowest 1% of the normal modes [45]. Normal modes are also able to guide conformational changes according to a small set of distance constraints [46] or an x-ray diffraction pattern [47]. Yang et al. [34] investigate the transition pathways of 170 pairs of structures and find that the normal modes succeed in specifying the directionalities of the transitions only when the collectivity of the motion is high.

The undesired, but inherent, linearity of the interpolation methods is addressed by a very successful family of methods that use switching functions to allow for the transition pathway to shift from the energy basin of one conformation to that of another [48–54]. The plastic network model (PNM, [50]) uses an ENM to define energy basins around each known state and then solves for the saddle point located at the global minimum of their intersection. The transition pathway connecting the saddle point to the neighboring equilibrium states is produced by the TReK steepest descent method in CHARMM [55]. Of particular interest are that the PNM assumes the globally optimal transition state is accessible from the equilibrium states and that the connecting pathways from the saddle point down to each equilibrium state are the preferred pathways in the reverse direction. Remarkably, the PNM pathway for adenylate kinase is consistent with intermediate crystal structures. In a similar way, Yang et al. [56] find good agreement between the normal modes and the conformational variations observed across 156 x-ray structures and 28 NMR structures (reported in one set). Notably, these sets include unbound structures as well as ones having different ligands.

The cryo-EM methods and the interpolation methods give insight into the dynamics of large conformational changes, but they are based on either forced motions or artificial sampling of mode space. These issues are addressed by icNMA, which uses the efficient, all atom cNMA method to guide the transition pathway according to the motion space described by each intermediate conformation. The remainder of the paper is structured as follows. A Hamiltonian mechanics foundation for normal mode analysis is presented in a preliminary section. This background simplifies the icNMA derivation in the method section and relates the icNMA method to several fundamentals of statistical mechanics allowing for a more powerful interpretation of the results. An analysis of *β*1,4-galactosyltransferase-T1 is given so that further details of icNMA may be discussed in context. A Q1 vs Q2 plot is given for the frequently studied adenylate kinase; it shows non-linearity of the transition path, which is in agreement with other methods [50, 51, 57, 58].

The icNMA method is based on a few fundamental theories from statistical mechanics. These concepts are presented and will allow for a more direct formulation of the icNMA method.

For small motions about an equilibrium, the potential energy of a biomolecular structure is parameterized by its generalized coordinates, ** q**, and is written as

$$V(\mathit{q})=C+\frac{1}{2}{\mathit{q}}^{T}K\mathit{q}$$

(1)

where *C* is a constant (which can be ignored by appropriate choice of the datum in the definition of potential energy) and *K* is the stiffness (or Hessian) matrix. This quadratic potential is nothing more than the first few terms in the Taylor-series expansion of the molecular potential, where the linear term drops out from the definition of equilibrium (i.e. *V*/*q _{i}* = 0 for all values of

The Hamiltonian of the system is defined as

$$H(\mathit{q},\mathit{p})=\frac{1}{2}{\mathit{p}}^{T}[{M}^{-1}(\mathit{q})]\mathit{p}+V(\mathit{q})$$

(2)

where ** p** is the vector of all conjugate momenta corresponding to the generalized coordinates and

The foundation for normal mode analysis is more easily derived from the Hamiltonian by performing the following coordinate transforms

$$\stackrel{\sim}{\mathit{q}}={M}_{0}^{1/2}\mathit{q}$$

(3)

$$\stackrel{\sim}{\mathit{p}}={M}_{0}^{-1/2}\mathit{p}$$

(4)

where the exponent of “1/2” indicates a matrix square root. The Hamiltonian is expressed in these coordinates as

$$\stackrel{\sim}{H}(\stackrel{\sim}{\mathit{q}},\stackrel{\sim}{\mathit{p}})=\frac{1}{2}{\stackrel{\sim}{\mathit{p}}}^{T}\stackrel{\sim}{\mathit{p}}+\frac{1}{2}{\stackrel{\sim}{\mathit{q}}}^{T}\stackrel{\sim}{K}\stackrel{\sim}{\mathit{q}}$$

(5)

where we have defined the mass-weighted Hessian as

$$\stackrel{\sim}{K}={M}_{0}^{-1/2}K{M}_{0}^{-1/2}$$

(6)

The Boltzmann distribution describes the accessibility of all states in an equilibrium ensemble and is given as the probability density function on phase space

$$f(\stackrel{\sim}{\mathit{q}},\stackrel{\sim}{\mathit{p}})=\frac{1}{Z(\beta )}exp\{-\beta \stackrel{\sim}{H}(\stackrel{\sim}{\mathit{q}},\stackrel{\sim}{\mathit{p}})\}$$

(7)

where

$$Z(\beta )={\int}_{\stackrel{\sim}{\mathit{q}}}{\int}_{\stackrel{\sim}{\mathit{p}}}exp\{-\beta \stackrel{\sim}{H}(\stackrel{\sim}{\mathit{q}},\stackrel{\sim}{\mathit{p}})\}d\stackrel{\sim}{\mathit{p}}d\stackrel{\sim}{\mathit{q}}$$

(8)

is the partition function, and *β* = 1/*k _{B}T* (

In the context of generating conformational ensembles, it is more desirable to have a distribution over only the generalized coordinate (**) and not over the conjugate momenta (****). The constant mass matrix has effectively decoupled the **** and **** terms in the Hamiltonian and allows the Boltzmann distribution to be integrated in closed form over **** yielding a Gaussian distribution of conformations**

$$\rho (\stackrel{\sim}{\mathit{q}})={\int}_{\stackrel{\sim}{\mathit{p}}}f(\stackrel{\sim}{\mathit{q}},\stackrel{\sim}{\mathit{p}})d\stackrel{\sim}{\mathit{p}}$$

(9)

$$=\widehat{Z}(\beta )exp\left\{-\beta \stackrel{\sim}{V}(\stackrel{\sim}{\mathit{q}})\right\}$$

(10)

where the integration over the kinetic energy portion of the Hamiltonian is incorporated into the scaling factor

$$\widehat{Z}(\beta )=\frac{1}{Z(\beta )}{\int}_{\stackrel{\sim}{\mathit{p}}}exp\left\{-\frac{\beta}{2}{\stackrel{\sim}{\mathit{p}}}^{T}\stackrel{\sim}{\mathit{p}}\right\}d\stackrel{\sim}{\mathit{p}}$$

(11)

and the remaining portion of the Hamiltonian is the potential energy in mass-weighted coordinates

$$\stackrel{\sim}{V}(\stackrel{\sim}{\mathit{q}})=\frac{1}{2}{\stackrel{\sim}{\mathit{q}}}^{T}\stackrel{\sim}{K}\stackrel{\sim}{\mathit{q}}$$

(12)

The distribution in Equation 10, indicates that the most populated conformation states are those whose displacements from the equilibrium (** = ****0**) correspond to the lowest potential energies as defined by Equation 12. These conformation displacements are more easily identified by projecting the mass-weighted generalized coordinate onto a new basis defined by the solutions to the eigenproblem

$$\stackrel{\sim}{K}{\stackrel{\sim}{\mathit{v}}}_{i}={\stackrel{\sim}{\lambda}}_{i}{\stackrel{\sim}{\mathit{v}}}_{i}$$

(13)

The significance of the eigenvectors ({* _{i}*}, unit length by convention) and eigenvalues ({

The restoring force opposing displacements away from the equilibrium state is

$${\mathit{f}}_{R}(\stackrel{\sim}{\mathit{q}})=-\frac{\partial \stackrel{\sim}{V}(\stackrel{\sim}{\mathit{q}})}{\partial \stackrel{\sim}{\mathit{q}}}=-\stackrel{\sim}{K}\stackrel{\sim}{\mathit{q}}$$

(14)

Evaluating this force for a unit magnitude displacement along one of the eigenvectors produces

$${\mathit{f}}_{R}({\stackrel{\sim}{\mathit{v}}}_{i})=-\stackrel{\sim}{K}{\stackrel{\sim}{\mathit{v}}}_{i}=-{\stackrel{\sim}{\lambda}}_{i}{\stackrel{\sim}{\mathit{v}}}_{i}$$

(15)

which indicates that the restoring force acts along the same axis as the displacement, but in the opposite direction. The eigenvectors define the axes of harmonic oscillations around the equilibrium state and the eigenvalues are the corresponding squared frequencies.

The potential energy associated with a unit magnitude displacement along one of the eigenvectors is

$$\stackrel{\sim}{V}({\stackrel{\sim}{\mathit{v}}}_{i})=\frac{{\stackrel{\sim}{\lambda}}_{i}}{2}$$

(16)

This simple relationship complements Equation 10, and indicates that the most populated conformation states are reached by displacements along the low index (i.e. low energy) eigen-vectors. is symmetric so the eigenvectors, also referred to as normal modes or mode shapes, are pairwise orthogonal and define a basis for all conformation motions around the equilibrium state. The eigenvector basis and its relationship to the system’s potential energy are the foundation for all NMA methods and will be referenced in the icNMA method section.

The reader is referred to the original publications [6, 7] for the full derivation and application of cNMA. The following section restates the interaction model and the generalized coordinates, which are both used during the formulation of icNMA.

A structure of *n* atoms is represented as *N* rigid bodies (clusters of atoms). The harmonic potential defined in Equation 1, is produced by defining an ENM, in which the clusters are interconnected by a network of springs with an atomic cutoff distance of *r _{c}* = 5Å. No springs are defined between atoms within a cluster, so this ENM is a subset of the traditional atomistic ENM.

The clustering can be defined on a per residue basis, which is the highest resolution application of cNMA, but most costly at (*n*^{3}); it can be defined according to domain, chain or subunit, which break the dependence of *N* on *n*, allowing the cNMA method to achieve (*n*) computational complexity; or it can be defined by some combination of these options as a hybrid multiscale model. As an alternative to these hierarchically-based clustering schemes, studies of protein flexibility and rigidity have been conducted [59, 60] and incorporated into coarse grained models [61]. The Vishveshwara group developed a clustering algorithm based on graph spectral analysis [62] and have used other graph theoretic methods to study structure connectivity [63].

Regardless of the clustering algorithm used with cNMA, a structure’s conformation is defined by the position and orientation of the *N* embedded rigid bodies. The translational motion of cluster *c* is measured by the displacement of its center of mass, *x** _{c}*, with respect to its location in the initial equilibrium state,
${\mathit{x}}_{c}^{\mathcal{I}}$, as defined by

$${\mathit{\chi}}_{c}={\mathit{x}}_{c}-{\mathit{x}}_{c}^{\mathcal{I}}$$

(17)

The rotational displacement of cluster *c* is parameterized by the axis-angle vector *γ*_{c}^{3}. This vector can be expressed as *γ** _{c}* =

$$R({\mathit{\gamma}}_{c})\triangleq exp\{J({\mathit{\gamma}}_{c})\}$$

(18)

$$={\mathbb{I}}_{3}+sin({\theta}_{c})\text{J}({\widehat{\mathit{a}}}_{c})+(1-cos({\theta}_{c}))\text{J}{({\widehat{\mathit{a}}}_{c})}^{2}$$

(19)

where is the 3×3 identity matrix and the skew symmetric matrix function, *J*: ^{3} → ^{3×3}, is defined by

$$J\left(\begin{array}{c}a\\ b\\ c\end{array}\right)=\left[\begin{array}{ccc}0& -c& b\\ c& 0& -a\\ -b& a& 0\end{array}\right]$$

(20)

Each cluster’s generalized coordinates are given by

$${\mathit{\delta}}_{c}={[{\mathit{\chi}}_{c}^{T},{\mathit{\gamma}}_{c}^{T}]}^{T}\in {\mathbb{R}}^{6}$$

(21)

and the whole structure’s generalized coordinates are the stacked vector

$$\mathit{\delta}={[{\mathit{\delta}}_{1}^{T},\dots ,{\mathit{\delta}}_{N}^{T}]}^{T}\in {\mathbb{R}}^{6N}$$

(22)

Given the initial, , and final, , conformations, the transition pathway is a sequence of connecting intermediate conformations, {}. The first conformation in this sequence is defined to be the initial conformation: = . The remaining pathway progresses towards and is generated by the following iterative procedure:

- Perform cNMA on .
- Compute the reference direction,
**, from to .** - Construct the global motion,
, from cNMA modes with guidance from reference direction.*g* - Generate the next conformation in the pathway by displacing the current conformation according to the global motion. This can be conveniently expressed as = +
.*g*

The cNMA computations of the first step are based on the derivations in Section 2, and are computed in the cluster coordinates defined in Section 3. The reference direction in the second step and the global motion construction in the third step, are defined in the following sections.

The reference direction points from the current conformation, , to the desired goal conformation, . The reference direction is only used to identify whether candidate motions move towards or away from the final conformation.

The translational component of cluster *c* of the reference direction is calculated as

$${\widehat{\mathit{\chi}}}_{c}={\mathit{x}}_{c}^{\mathcal{F}}-{\mathit{x}}_{c}$$

(23)

where
${\mathit{x}}_{c}^{\mathcal{F}}$ and *x** _{c}* are the center of mass positions of cluster

The rotational motion of cluster *c* of the reference direction is calculated by solving for the rotation that optimally aligns cluster *c* of the current conformation with cluster *c* of the final conformation. Each atom’s Cartesian coordinates are placed in sequence as a column vector in the matrix *A* ^{3×}* ^{n}* for the current conformation and in matrix

$${\widehat{R}}_{c}={[{BA}^{T}A{B}^{T}]}^{{\scriptstyle \frac{1}{2}}}{[A{B}^{T}]}^{-1}$$

(24)

is applied to the positions in *B* and the new atomic positions in the columns of * _{c}B* are optimally alignment in RMSD with the corresponding column positions in

The cluster’s reference direction is defined as

$${\widehat{\mathit{\delta}}}_{c}=[{\widehat{\mathit{\chi}}}_{c}^{T},{\widehat{\mathit{\gamma}}}_{c}^{T}]\in {\mathbb{R}}^{6}$$

(25)

and the conformation’s reference direction is given by the stacked vector

$$\widehat{\mathit{\delta}}=\left[{\widehat{\mathit{\delta}}}_{1}^{T},\dots ,{\widehat{\mathit{\delta}}}_{N}^{T}\right]\in {\mathbb{R}}^{6N}$$

(26)

The Hamiltonian mechanics analysis in Section 2, identifies a set of basis motions for the space of conformation displacements. The lowest energy motions from the basis lead to the most populated conformation states in the ensemble representation. This section develops a method for constructing a single conformation displacement from the set of “building block” basis motions. In general, a set of mode shapes are independent oscillations with distinct frequencies, thus precluding their superposition. However, in the case of icNMA, there are two major factors which support the representation of the mode set by the subspace that it spans rather than as a set of independent motions.

Consider the extreme case where multiple modes have the same eigenvalue. The modes degenerate into an *arbitrary*, pairwise orthogonal set. In this situation, the specific mode shapes are no longer important, they only serve as basis vectors of a subspace. This degeneration is not an all or none property. As eigenvalues become increasingly close, individual mode directions become increasingly more arbitrary. In the context of coarse grained NMA, the first few low modes, at best, may be sufficiently separated in the frequency domain allowing for analysis of their specific motions [64]. However, the ensemble of lowest modes is usually quite dense in the frequency domain and is accordingly subject to mode degeneration.

The second major factor contributing to the use of a global motion results from the ENM atomic interaction model. The ENM is a harmonic, “smooth” approximation of the energy surface. Distinguishing between mode shapes in this approximate subspace would be artificial and an over interpretation of the model. Van Wynsberghe and Cui [65] present an example illustrating the importance of motion ensembles over individual modes. They use a motion correlation analysis and demonstrate that a pair of structural components in the voltage gated ion channel, KvAP, are correlated under the lowest mode, anti-correlated under the second lowest mode and show no correlation under the ensemble of the lowest 76 modes.

The conformation space between equilibrium states is unstable and supportive of an ensemble of trajectories [48]. In agreement with this, the subspace of low frequency modes is multi-dimensional and identifies a distribution of conformations. The question that now remains is how to appropriately construct a global motion within the low mode subspace that advances a trajectory across the energy landscape to the goal conformation.

The derivations in Section 2, deal with the properties of individual modes. The global motion aims to combine multiple modes into a single representative motion and requires additional consideration. The probability density function in Equation 10, identifies the most populated conformation states by relating them to their associated potential energies in Equation 12. This result supports the construction of a global motion from the normal modes derived in Equation 13. The first 6 eigenvalues are zero valued and correspond to rigid translation and rotation of the structure; these motions are not included in the global motion. The remaining mode shapes, indexed {7, …, *d*}, are non-rigid deformations and are included in the global motion. The normal modes are derived in a mass-weighted coordinate system, but it is more appropriate to express the global motion in the non-mass-weighted cluster coordinates. This is accomplished by reversing the coordinate change in Equation 3, with the transform

$${\mathit{v}}_{i}={M}_{0}^{-1/2}{\stackrel{\sim}{\mathit{v}}}_{i}$$

(27)

The equipartition theorem states that the potential energy of a system is distributed equally, on average, across each of the system’s degrees of freedom. The normal mode solutions to Equation 13, define a basis for which each mode shape represents one degree of freedom. Conformation displacements in the direction of each mode shape must produce a constant valued energy, when averaged over the pathway. This is achieved by defining the global motion as

$$\mathit{g}=\sum _{i=7}^{d}\frac{{w}_{i}}{\sqrt{{\stackrel{\sim}{\lambda}}_{i}}}{\mathit{v}}_{i}$$

(28)

where each mode is scaled by two terms.

The
$1/\sqrt{{\stackrel{\sim}{\lambda}}_{i}}$ scaling is a foundation that produces *exactly* equal energies for all modes (Equation 16). The inverse frequency scaling corresponds to the physical interpretation of “soft” lower frequency modes moving along shallower slopes of the energy basin than the “stiff” higher frequency modes. The lower frequency modes achieve a greater magnitude of displacement than the higher frequency modes under the same fixed energy.

The *w _{i}* terms are undetermined weighting factors which allow for variability of each mode’s relative contribution, subject to the following: (i) The equipartition theorem requires the average energy contribution from each mode to be uniform (i.e. < |

$$V(\mathit{g})=\frac{1}{2}{\left(\sum _{i=7}^{d}\frac{{w}_{i}}{\sqrt{{\stackrel{\sim}{\lambda}}_{i}}}{\mathit{v}}_{i}\right)}^{T}K\left(\sum _{j=7}^{d}\frac{{w}_{j}}{\sqrt{{\stackrel{\sim}{\lambda}}_{j}}}{\mathit{v}}_{j}\right)$$

(29)

$$=\frac{1}{2}\left(\sum _{i=7}^{d}\frac{{w}_{i}}{\sqrt{{\stackrel{\sim}{\lambda}}_{i}}}{\stackrel{\sim}{\mathit{v}}}_{i}^{T}\right)\stackrel{\sim}{K}\left(\sum _{j=7}^{d}\frac{{w}_{j}}{\sqrt{{\stackrel{\sim}{\lambda}}_{j}}}{\stackrel{\sim}{\mathit{v}}}_{j}\right)$$

(30)

$$=\frac{1}{2}\left(\sum _{i=7}^{d}\frac{{w}_{i}}{\sqrt{{\stackrel{\sim}{\lambda}}_{i}}}{\stackrel{\sim}{\mathit{v}}}_{i}\right)\xb7\left(\sum _{j=7}^{d}\frac{{w}_{j}}{\sqrt{{\stackrel{\sim}{\lambda}}_{j}}}\stackrel{\sim}{K}{\stackrel{\sim}{\mathit{v}}}_{j}\right)$$

(31)

$$=\frac{1}{2}\left(\sum _{i=7}^{d}\frac{{w}_{i}}{\sqrt{{\stackrel{\sim}{\lambda}}_{i}}}{\stackrel{\sim}{\mathit{v}}}_{i}\right)\xb7\left(\sum _{j=7}^{d}{w}_{j}\sqrt{{\stackrel{\sim}{\lambda}}_{j}}{\stackrel{\sim}{\mathit{v}}}_{j}\right)$$

(32)

$$=\frac{1}{2}\sum _{i=7}^{d}\sum _{j=7}^{d}{w}_{i}{w}_{j}\frac{\sqrt{{\stackrel{\sim}{\lambda}}_{j}}}{\sqrt{{\stackrel{\sim}{\lambda}}_{i}}}({\stackrel{\sim}{\mathit{v}}}_{i}\xb7{\stackrel{\sim}{\mathit{v}}}_{j})$$

(33)

$$=\frac{1}{2}\sum _{i=7}^{d}{w}_{i}^{2}$$

(34)

The orthogonality of the eigenvectors reduces the dot product in the last parentheses to a delta function, which eliminates the summation over *j*.

Global motions of equal energy (and a weighting factor normalization constraint) are produced by setting the result of Equation 34, to a constant

$$\frac{1}{2}\sum _{i=7}^{d}{w}_{i}^{2}={s}^{2}$$

(35)

This is an equation for a hyper-sphere whose radius,
$s\sqrt{2}$, sets a constant value for the overall energy of the mode ensemble. An optimization over the *w _{i}* values on the hyper-sphere can be used to construct the global motion that most directly moves to the goal conformation. However, this optimization is too costly for iterative application and does not necessarily satisfy the constraint: < |

It is difficult to allow the modes to vary in energy along the pathway and still guarantee the final distribution is equal. This is resolved by setting all *w _{i}* values to the same constant magnitude. This computational convenience, which also reduces the dimension of the optimization space, is expressed as

$${w}_{i}={s}_{i}\xb7c\{\begin{array}{ll}{s}_{i}\hfill & =\pm 1\hfill \\ c\hfill & =s\sqrt{{\scriptstyle \frac{2}{d-6}}}\hfill \end{array}$$

(36)

where *s _{i}* is the sign of the weighting factor and

The constant magnitude, *c*, defined in Equation 36, is factored out of the global motion summation producing the expression

$$\mathit{g}=c\sum _{i=7}^{d}\frac{{s}_{i}}{\sqrt{{\stackrel{\sim}{\lambda}}_{i}}}{\mathit{v}}_{i}$$

(37)

The remainder of this section is a discussion of the weighting factor magnitude (*c*), the summation upper bound (*d*), the mode shape sign choice (*s _{i}*) and the implications of mode shape “addition”.

The weighting factor magnitude, *c*, is defined in terms of a constant energy level, *s*, which is distributed equally over the *d* − 6 non-rigid modes (Equation 36). Selecting an appropriate energy level is not trivial or even necessarily uniform across structures. A qualitatively equivalent result is achieved by scaling the global motion to produce a predetermined root-mean-square displacement (RMSD), *μ*. The reason for this approach is that the interaction model for cNMA is an ENM defined by an atomic separation *distance* (*r _{c}* = 5Å). The ENM remains valid as long as the conformation displacement is small, ensuring that the local geometries do not change significantly. Restricting the global motion to a “local neighborhood” is more naturally stated with an RMSD constraint than an energy constraint.

The summation upper bound, *d*, can be reduced from its maximum value (i.e. the total number of modes: *d* = 6*N*) to isolate the contribution from the lower modes. Before the global motion is scaled with the coefficient *c*, it has a total magnitude given by:
$m(d)={\sum}_{i=7}^{d}{\stackrel{\sim}{\lambda}}_{i}^{-1/2}$. Therefore, to achieve a particular percentage, *p*, of this total magnitude, a new upper bound, , on the summation is defined such that

$$m(\widehat{d})=p\xb7m(d)$$

(38)

The distribution of the frequency spectrum results in < *pd* and often times *pd*.

The mode shapes of cNMA (and all NMA-based techniques) are indications of principal axes of motion, not directions. The general equation for an eigenproblem *A*** v** =

A direct implementation of the RMSD minimization involves displacing the current intermediate conformation in the “plus” and “minus” directions along each mode shape, projecting the conformations from cluster coordinates into full Cartesian coordinates, computing RMSD values to and choosing the minimum. A significantly more efficient (and equivalent) method is to compute the reference direction according to Equation 26, and then a simple dot product: *s _{i}* = sign(

All that remains of the global mode construction in Equation 37, is the addition operation. In Cartesian coordinate based NMA, mode shapes can be unambiguously combined to produce the global motion (i.e. addition of translational displacements is commutative). The cNMA method is based on cluster coordinates which include translational and rotational components. The translational displacements are Cartesian and can be summed. The rotational components are given by axis-angle vectors which represent rotation matrices. After each mode’s rotational displacements have been put into rotation matrix form, the cumulative rotational displacement on each cluster is computed by matrix multiplication, which is not a commutative operation. Due to the computational expense of converting each axis-angle vector and the arbitrary order of multiplication, this process is not desirable.

The small motion assumption which the cNMA method is predicated on, allows the rotational displacement of each mode to be represented by the first order approximation of Rodrigues’ expression for the rotation matrix (Equation 18): *R*(** γ**) ≈ + J(

$$R({\mathit{\gamma}}_{1})R({\mathit{\gamma}}_{2})\approx {\mathbb{I}}_{3}+\text{J}({\mathit{\gamma}}_{1}+{\mathit{\gamma}}_{2})$$

(39)

by making use of the facts that (1) the addition of skew symmetric matrices is equivalent to summing the corresponding axis-angle vectors; and (2) the second order term J(*γ*_{1})J(*γ*_{2}) can be discarded in this first order calculation. The matrix multiplication is reduced to the addition of axis-angle vectors, which is a commutative process and computationally cheap. It is therefore a straightforward process to create a global motion from a set of cNMA mode shapes and frequencies, as defined by Equation 37.

The ENM is redefined for each intermediate structure enabling the cNMA mode set to reflect the current accessible motion space. The ENM implicitly assumes that each intermediate conformation is at equilibrium. This condition cannot be strictly observed over the course of the transition. However, the core transition motion that we aim to capture involves large coordinated structure motions, which are primarily dependent on the overall shape and density of the ENM and are substantially less sensitive to local rearrangements [18, 66]. It is for this reason that we can continue to apply the ENM as a traveling energy basin across the transition pathway. As an alternative, there are switching methods for including multiple energy basins in various network based models [48, 49], as well as MD methods [67].

The cumulative effect of cNMA recalculation provides icNMA with the flexibility to produce a non-linear pathway and to allow temporary localized unfolding. These two characteristics are critical in generating a functionally relevant transition pathway [48], and are also both inherently excluded by linear interpolation methods.

As discussed in the Introduction, there are interpolation-based methods that use interatomic distances to drive one conformation to another. These approaches guarantee a transition pathway that reaches the goal conformation and maintains atomic contact distances within the range defined by the starting and goal conformations. However, these forced motions do not take into account the local energy landscape. In contrast, the icNMA method is not artificially constrained and is able to produce conformations with undesirable local geometries (i.e. bonded atomic pairs that stretch too far apart or non-bonded atoms that approach too closely).

The icNMA method is designed to capture the core motion of the transition pathway, but if tighter geometric control over the intermediate conformations is preferred, any of the following modifications can be applied if a global motion causes an undesirable conformation: (i) Perform an energy minimization to relax the intermediate conformation; or (ii) Revert the pathway back to the previous conformation and stiffen the harmonic potential between each pair of atoms that cause an undesirable contact distance; or (iii) Revert the pathway back to the previous conformation and merge the clusters containing each pair of atoms that cause an undesirable contact distance. In Section 5, a comparison is made between an unconstrained icNMA pathway and one using the merged cluster modification.

Consider a pair of conformations that are based on the same crystal structure. By the definition of clustering, the relative atomic positions within each cluster remain fixed, so the RMSD between conformations is entirely due to differences in cluster locations. In this situation it is possible to achieve an RMSD value of zero. Now consider a pair of conformations for the same protein, but derived from different crystal structures (e.g. the open and closed conformation states). The RMSD between conformations has a contribution from differences in cluster positions *and* a contribution from differences in relative atomic positions within each cluster. The former varies as the clusters move and can be used to evaluate the global progress of a conformational transition, while the latter remains constant and is an indication of local structure geometries that have been locked into place as a result of clustering. In order to make use of these global and local quantities, it is necessary to decouple the pair of contributions. This is accomplished by introducing a new, cluster-based, metric for conformation comparison.

Given a pair of conformations and a clustering scheme, the RMSD is calculated independently for each cluster: isolate cluster *c* from each conformation, center and optimally align the pair, compute the RMSD between the clusters and define the quantity as *RMSD _{c}*. Once this procedure is performed for each cluster, the

$$\mathit{bRMSD}=\sqrt{\frac{1}{n}\sum _{c=1}^{N}[A(c)\xb7{\mathit{RMSD}}_{c}^{2}]}$$

(40)

where *A*(*c*) is the number of atoms in cluster *c*. This quantity represents the lowest possible RMSD that can be achieved by optimal cluster positioning.

Crystal structure resolution is not equivalent to resolution of atomic position. The PDB states as a guideline that resolution of atomic position is one-tenth to one-fifth of the crystal resolution for structures with an R-value (a quantification of the model’s agreement with the crystallographic data) less than 0.2. Accordingly, “good” crystal structures with resolutions of ~ 2.0Å, give atomic positions with accuracies of ~ 0.2–0.4Å. The observed bRMSD quantity for clustering by residue is ~ 0.3Å, which is well within this range. The DOFs locked into place when clustering by residue impose a restriction on the atoms in the structure, which corresponds to an RMSD value *less* than the atomic resolution of the model. Therefore, any (more aggressive) candidate clustering may be validated by performing a bRMSD calculation prior to the icNMA computations.

The progress of an icNMA transition pathway is monitored by a combination of the RMSD and bRMSD metrics. The quantities discussed below are depicted in Figure 1. Consider an icNMA intermediate conformation, , at an RMSD *d* = *RMSD*(, ) from the goal conformation. The space of all possible global motion steps around defines a hypersphere, *S _{i}*, of radius

An arbitrary icNMA pathway is depicted as a series of conformations (filled circles) connected by global motions (solid lines). The current intermediate conformation () is surrounded by the hyper-sphere *S*_{i}. The sign choice for the direction of the global **...**

$$\theta =\text{acos}(\frac{\mu}{2d})$$

(41)

This solution is obtained with the law of cosines and is based on the geometry of the configuration shown in Figure 1.

As the icNMA pathway gets closer to the goal conformation, *θ* becomes smaller, indicating a restriction on the available space of RMSD reducing global motions. In the limit of *θ*(*d* → ∞) = *π*/2 the entire hemisphere of *S _{i}* is RMSD minimizing. The

$$\pi (i)=\frac{\mathit{RMSD}({\mathcal{C}}_{I},{\mathcal{C}}_{F})-\mathit{RMSD}({\mathcal{C}}_{i},{\mathcal{C}}_{F})}{\mathit{RMSD}({\mathcal{C}}_{I},{\mathcal{C}}_{F})-(b\mathit{RMSD}({\mathcal{C}}_{I},{\mathcal{C}}_{F})+\mu )}$$

(42)

where the input parameters for the functions *RMSD* and *bRMSD* indicate the conformation pair to which they are applied. The numerator quantifies how much RMSD has been traversed. The denominator quantifies the total RMSD the pathway is expected to traverse; excluding the RMSD which cannot be reduced due to clustering (i.e. bRMSD) and excluding the RMSD associated with the step size (i.e. *μ*). It is possible for *π*(*i*) to reach a value slightly greater than 1, if the trajectory gets within *μ* RMSD of the goal conformation, but in practice, the function will approach a value of 1.

The *β*1,4-galactosyltransferase-T1 structure (open: PDB=1FR8 [68] and closed: PDB=1NKH [**?**]), commonly written with the abbreviation *β*4Gal-T1, is composed of 2209 atoms in 271 residues and is the catalytic component of the lactose synthase enzyme (Figure 2). During biological function, *β*4Gal-T1 binds uridine diphosphogalactose (UDP-Gal), thus causing a large conformational change in the loop region comprising residues 345-365. The mean RMSD value between conformations for the alpha-carbons on the loop is 9.8Å, and Lys^{352} achieves a displacement of 20Å. In comparison, the rest of the structure experiences a mere 0.6Å displacement [**?**]. Trp^{314} is opposite the large loop and swings down over the pocket assisting in the ligand binding [**?**].

Cartoon representation of the open conformation of *β*4Gal-T1. The loop residues are shown in yellow, the UDP-Gal ligand is space filled shown in red, and Trp^{314} is shown in green sticks. (All structure representations are created with PyMOL [**?** **...**

*β*4Gal-T1 belongs to a superfamily of enzymes called glycosyltransferases that are involved in the synthesis of sugar moieties of glycoproteins and glycolipids. Crystal structures of many of these enzymes are available and, similar to *β*4Gal-T1, they exhibit conformational changes involving at least one flexible loop [**?**]. Analyses of the other glycosyltransferases is possible and is expected to provide insight into the conformational dynamics of these enzymes.

The icNMA simulations are performed on the open conformation without the ligand; clustering is by residue and the global motion step size is *μ* = 0.1Å, producing an expected RMSD of: (*bRMSD* + *μ*) = 0.60Å. The control simulation is run with *p* = 100% (Equation 38) and no contact distance constraints. A second simulation is run with *p* = 50% and no contact constraints, to probe the relevance of the higher frequency modes on the pathway construction. A third simulation is run with *p* = 100% and a minimum allowable contact distance of 1.1Å and a maximum allowable peptide bond length of 2.5Å, to test the effects of constrained geometries. The three simulations, now referred to as , and , respectively, are listed in Table 1, along with their DOF usage and *π* values. The following sections discuss pathway RMSD, mode decompositions, local geometry, pathway features and energy.

The RMSD evolutions (Figure 3) show that all simulations experience an initial steep descent in RMSD, followed by asymptotic approaches to their final values. The early success of is explained by the fact that the lowest modes are contributing the majority of the desired transition and the extra modes utilized by , which come from the higher frequency range, are extraneous. The early success of indicates that, initially, the pathway naturally stays within the range of the constraints, thus performing like the control simulation.

The mode space dimension of is nearly 3 times smaller than that of , but yet it achieves a significantly better transition result (*π* = 86.5% vs *π* = 59.1%). This comparison indicates that the dimensionality of the motion subspace is less important than which region of the full motion space it represents. This concept is quantified by computing mode space decompositions.

A pair of icNMA simulations for the same initial structure, but based on different parameters, will produce pathways that explore different regions of conformation space. At any given intermediate step, the accessible motion space of one simulation is compared to that of the other by using a decomposition. Let the normalized modes of one simulation be given by = {*a*_{1}, …, *a** _{p}*} and the normalized modes of a second simulation be given by = {

$${D}_{i,j}=\mid {\mathit{a}}_{i}\xb7{\mathit{b}}_{j}\mid $$

(43)

The decomposition matrix is populated with values on the interval [0, 1] where 0 indicates linear independence and 1 indicates exact alignment of the corresponding mode pair.

The decomposition data is evaluated in two ways. First, a concentration of high values along the diagonal indicates that modes from one set are highly aligned with modes with the same index of the other set (a solid line is superimposed on the decomposition plots for reference). This feature gives a general indication of how the motion subspace of compares to that of . Second, the vector norm of column *i*, is a measure of how well mode *b** _{i}* is represented by all of . Computing this norm for all columns in

The discrepancy in *π* values is now addressed by mode decompositions with respect to the control simulation. The decompositions are based on the intermediate conformations at step 40, which is halfway through each simulation and also approximately where the RMSD plots start to diverge the fastest.

The decomposition of the modes (Figure 4) covers the reference line (i.e. the decomposition spans the same number of modes on each axis). This indicates that each of the 450 modes of is highly aligned with the mode of the same index. Therefore, the low-frequency motion space has been completely captured by . This is evident by the fact that achieves almost the same *π* value as the control. In contrast, the decomposition (Figure 5) deviates from the reference diagonal. As a result, the 1266 modes from do not correspond to an equal number of modes in , but rather are evenly distributed across all 1626 modes. In order for the lower dimension subspace of to be evenly distributed across the higher dimension subspace of , there must necessarily be motions of which are not well represented by .

The decomposition of the mode shapes from compared with the mode shapes of , at intermediate conformation 40. The data occupies the diagonal, thus indicating that has, as expected, accurately and completely reproduced the low frequency motion space **...**

The decomposition of the mode shapes from compared with the mode shapes of , at intermediate conformation 40. The data is shifted to the right of the reference line, thus indicating that modes of are uniformly shifted into the higher frequency spectrum. **...**

The plot of column vector norms (Figure 6) confirms this by clearly showing that even though the modes capture the entire frequency spectrum fairly well, the modes do a noticeably better job of capturing the low frequency modes, which are necessary for constructing the transition pathway. Further, since these modes are more highly weighted in their relative contribution to the global motion than the higher frequency modes, the differential seen in Figure 6, is that much more significant.

The preceding decomposition analysis explains why the lower DOF actually produces a better *π* value than the higher DOF . Intermediate conformation geometries and pathway energies are analyzed in the following sections to ascertain the necessity and effects of constrained simulations.

Figure 7, shows the evolution of the distances between the 5 pairs of atoms that reach the shortest contact distances during . Figure 8, shows the evolution of the lengths of the 5 peptide bonds that stretch the most during . A pair of atoms reach a contact distance of 0.25Å and a peptide bond is stretched to a length of 6.5Å – both of these situations are not possible. However, these are the extremes of the simulation and are only temporary. In fact, the bulk of the structure is acceptable. In the open conformation of *β*4Gal-T1, there are 26,340 atom pairs within a cutoff distance of 5Å, and of these, 18,840 are between atoms of different residues and are free to move relative to each other. During , 52 of these atomic pairs (0.3%) come within 1.1Å of each other and 17 of the peptide bonds (6.3%) stretch beyond 2.5Å. The results are similar (0.1% and 5.2%, respectively) and are not plotted.

Evolution of the separation distance between atomic pairs that reach one of the 5 shortest separation distances during the simulation. The atom pairs in each plot, from the top down, are: (Ala^{221}:O – Gly^{222}:N), (Asp^{350}:N – Lys^{351}:CE), **...**

Evolution of the peptide bond lengths for those bonds that become one of the 5 longest bonds during the simulation. The atom pairs in each plot, from the top down, are: (GLY^{313}:C – TRP^{314}:N), (PHE^{360}:C – ASP^{361}:N), (ARG^{359}:C – **...**

Comparing the three icNMA simulations has shown the effects of reducing the dimension of the global motion subspace and of imposing geometric constraints. In particular, the unconstrained simulation, , produces a near complete (*π* = 96.4%) transition pathway while allowing minimal local geometry violations. The constrained simulation, , removes the undesirable contacts, but at the expense of causing a mode set frequency shift, which limits the progression of the transition pathway (*π* = 59.1%).

There are several features of the *β*4Gal-T1 transition pathway that make it an excellent example structure. There are many possibilities for icNMA modification, but as demonstrated in the previous sections, successfully produces the core motions of a transition pathway from the open to the closed conformation. The discussion which follows is based on the motions of this simulation and highlight the capabilities of the icNMA method. Several AVI movie clips showing features of the pathway discussed below are available as Supplemental Material on the journal’s website.

The main loop and Trp^{314} are involved in a cooperative motion in which they move towards each other and overlap as they come down onto the ligand, holding it in the binding pocket (Figure 9). A direct interpolation of this motion would cause severe steric clash, but icNMA produces a pathway in which Trp^{314} undergoes a rotation about the backbone so that it can pass underneath the main loop and reach the closed conformation. The very fact that the cNMA parameterization is designed to represent rotational motions allows icNMA to describe this complex transitional motion. Purely Cartesian based models are not able to characterize this motion without a full atomic model, but the computational expense is prohibitive.

The superimposed intermediate conformations show the icNMA transition (blue → red) of *β*4Gal-T1. The viewpoint in (A) is from the binding pocket looking outward and shows Trp^{314} pass inside of the main loop as they approach each other. **...**

The biological function of *β*4Gal-T1 is accomplished by the large motions of relatively few residues (27 main loop residues and 6 residues around Trp^{314}), while the bulk of the structure remains stationary. This type of motion is common to globular proteins and the analysis greatly benefits from modeling techniques which allow for multiscale representations. The scope of this paper does not allow for the detailed inclusion of such a simulation, but the authors performed an icNMA simulation in which the mobile residues mentioned above are individually clustered and the remaining structure is defined as a single cluster. This model uses 198 non-rigid DOFs, which is 3% of the all-atom, non-rigid DOFs, and has *bRMSD* = 0.75Å. The computation time is cut by an order of magnitude with respect to , but yet the conformational transition pathway still achieves *π* = 93.5% and never gets more than 0.9Å RMSD away from (which is closer than either or ).

The final interesting characteristic of the *β*4Gal-T1 transition pathway included in this discussion is the extension of an alpha-helix. As the main loop undergoes its large motion, it creates slack (perhaps this is what allows the clearance for Trp^{314} to pass underneath it). This slack is converted into an additional turn (residues 358-363) on a neighboring alpha-helix as the path approaches the goal conformation (Figure 9).

An effective way to monitor pathway energy is by utilizing the ENM framework upon which icNMA is already based. The distances between all pairs of atoms in each end state are computed and serve as reference values (i.e. equilibrium states). During a transition, displacements from these reference values result in deformation of the corresponding springs in the ENM. The energy function for atom *a* in intermediate conformation is defined as

$${V}_{a}(i)=\frac{1}{2}\sum _{b=1}^{n}min{(\delta {d}_{a,b}(i,\mathcal{I}),\delta {d}_{a,b}(i,\mathcal{F}))}^{2}$$

(44)

where *δd _{a}*

The potential energy contained in the springs radiating from each C_{α} as a function of pathway intermediate index (colormap in center). Summing across the rows (i.e. over the pathway) produces the vertical plot at the left, which shows the total **...**

The potential function varies with atom index (i.e. spatial location) and conformation index (i.e. “temporal” location). Fixing one of these variables and summing over the other produces an energy distribution over either space or “time”. These plots are shown alongside the potential function in Figure 10.

Summing over conformation indices produces an energy distribution over atomic indices (vertical plot in Figure 10). This plot clearly identifies the structural components involved in the transition. The main loop and Trp^{314} are the largest contributors to system energy and the next two locations are where the main loop anchors to the structure prior to the transition (labeled “release”) and after the transition (labeled “landing”).

Summing over atomic indices produces an energy distribution over conformation indices (horizontal plot in Figure 10). This distribution reveals the energy barrier in the transition. Rather than summing over all atomic indices, summations can be taken independently on each of the four locations identified above. The peaks in each of these energy distributions (not shown, but readable from the potential function) indicates when in the transition the corresponding structure element is active.

Combining all of these findings results in the following pathway sequence: (1) Trp^{314} completes the first phase of its motion, (2) the main loop reaches an intermediate position and the landing site rearranges, both coinciding with the transition barrier, (3) immediately after the barrier is crossed, the release site rearranges, and (4) Trp^{314} completes its second phase of motion. The two stage motion of Trp^{314} is evident in Figure 9, by the sliding motion of the side chain, followed by its rotation around the backbone.

*β*4Gal-T1 was chosen as the primary example structure for this paper because its crystal structures suggest a wide range of motions which must be spatially and temporally coordinated in such a way that interpolation (i.e. linear methods) can not capture. Adenylate kinase (open: PDB=4AKE [**?** ] and closed: PDB=1AKE [**?** ]) is a common structure used to test computational methods due to the wealth of experimental data available. The icNMA pathway for the adenylate kinase transition is briefly studied and comparisons are made with existing methods. The following material is not intended as a complete analysis or as validation of icNMA, but rather as an indication that the most fundamental properties of such a transition are captured by icNMA.

Franklin et al. [51] create a transition pathway by stitching together one linear system for the initial conformation with a second linear system for the final conformation. The non-linear pathway is well represented by a Q1 vs Q2 plot, in which the authors track the percentage of initial (Q1) and final (Q2) conformation contacts that exist in each intermediate conformation. This plot shows that Q1 contacts are broken and *then* Q2 contacts are formed in the very last stages of the transition. This non-linear pathway enters the lower left region of the plot where both Q1 and Q2 are minimized. The barrier conformations which populate this region are meta-stable as they are equally distant from the stable open and closed conformations. In contrast, methods which produce linear pathways, like the interpolation-based UMMS [**? ?** ] method, force the simultaneous breaking of Q1 contacts and formation of Q2 contacts.

In order to place the icNMA method in context with other techniques, transition pathways of adenylate kinase are mapped onto a Q1 vs Q2 plot (Figure 11). The transition pathway from the open to the closed conformation is complete in RMSD (*π* = 95.3%), but it only forms 30.8% of the missing atomic contacts required in the final conformation (the Q2 value increases from 0.81 to 0.87). The application of icNMA in the reverse direction (i.e. from the closed conformation to the open conformation) produces a transition pathway that once again achieves RMSD convergence (*π* = 94.9%) while only forming 26.4% of the missing atomic contacts required in the open conformation (the Q1 value increases from 0.82 to 0.86).

Q1 vs Q2 plots for icNMA pathways generated for adenylate kinase. The transition from open to closed (●) and closed to open (+) are both non-linear and reach the barrier region in the lower left portion of the plot. Both simulations use the parameters. **...**

Both of the transition pathways enter the barrier region and then proceed towards their respective goal conformations. This demonstrates that the icNMA method is able to produce a non-linear pathway and access the population of meta-stable states in the barrier region. The *s _{i}* parameters of the icNMA global motion are computed with respect to minimizing RMSD from the goal conformation, whereas the Q1–Q2 coordinate space is directly tied to contact geometries. If desired, an alternative contact-based function for setting the

The icNMA method is founded on a traveling harmonic potential which defines the evolving global motion and guides the transition pathway. Unlike interpolation and extrapolation based methods, the iterative updating of the local accessible motion space reflects the changing geometry of the structure. The global motion allows the structure to evolve towards its destination conformation along a non-linear path while traveling only through the reduced dimensional subspace defined by the low-frequency modes of cNMA.

The analysis of *β*4Gal-T1 illustrates the features of icNMA and the resulting pathway captures a variety of different structure motions which all give insight into how the biological function is achieved. The energy analysis of the pathway reveals spatially and temporally coordinated motions. The main structural elements identified in this analysis are ideal candidates for mutagenesis studies.

The Introduction discusses NMA-based structure deformation methods which utilize distance constraints and x-ray diffraction pattern matching. These methods are powerful ways to convert partial structural data from experiments into high resolution conformations, especially in the meta-stable transition region where complete structural information is difficult to produce. The icNMA method can perform this task by replacing its mode shape sign choice with any scoring function that promotes a given set of structural features. In addition to producing transition states, the icNMA method also produces corresponding transition pathways.

The power and flexibility of icNMA is due to its effective combination of independent local (i.e. cNMA mode space) and global (i.e. RMSD minimization) constraints.

Click here to view.^{(11M, avi)}

Click here to view.^{(11M, avi)}

Click here to view.^{(11M, avi)}

Click here to view.^{(11M, avi)}

This work was supported by NIH Grant R01GM075310.

1. Berman HM, Westbrook J, Feng Z, Gilliland G, Bhat T, Weissig H, Shindyalov IN, Bourne PE. The Protein Data Bank. Nucleic Acids Research. 2000. pp. 235–242. http://www.pdb.org/ [PMC free article] [PubMed]

2. Ha T. Single-Molecule Fluorescence Resonance Energy Transfer. METHODS. 2001;25:78–86. [PubMed]

3. Palmer AG., III Probing molecular motion by NMR. Current Opinion in Structural Biology. 1997;7:732–737. [PubMed]

4. Doniach S, Eastman P. Protein dynamics simulations from nanoseconds to microseconds. Current Opinion in Structural Biology. 1999;9:157–163. [PubMed]

5. Englander SW, Krishna MM. Hydrogen exchange. Nature Structural Biology. 2001;8:741–742. [PubMed]

6. Schuyler AD, Chirikjian GS. Normal mode analysis of proteins: a comparison of rigid cluster modes with *C*_{a} coarse graining. Journal of Molecular Graphics and Modelling. 2004;22:183–193. [PubMed]

7. Schuyler AD, Chirikjian GS. Efficient determination of low-frequency normal modes of large protein structures by cluster-NMA. Journal of Molecular Graphics and Modelling. 2005;24:46–58. [PubMed]

8. Levitt M. A simplified representation of protein conformations for rapid simulation of protein folding. Journal of Molecular Biology. 1976;104:59–107. [PubMed]

9. Brooks BR, Bruccoleri RE, Olafson BD, States DJ, Swaminathan S, Karplus M. CHARMM: A Program for Macromolecular Energy, Minimization, and Dynamics Calculations. Journal of Computational Chemistry. 1983;4:187–217.

10. Pearlman DA, Case DA, Caldwell JW, Ross WS, Cheatham TE, III, DeBolt S, Ferguson D, Seibel G, Kollman P. AMBER, a package of computer programs for applying molecular mechanics, normal mode analysis, molecular dynamics and free energy calculations to simulate the structural and energetic properties of molecules. Computer Physics Communications. 1995;91:1–41.

11. Duan Y, Kollman P. Computational protein folding: From lattice to all-atom. IBM Systems Journal. 2001;40:297–309.

12. Noguti T, Go N. Collective variable description of small-amplitude conformational fluctuations in a globular protein. Nature. 1982;296:776–778. [PubMed]

13. Go N, Noguti T, Nishikawa T. Dynamics of a Small Globular Protein in Terms of Low-Frequency Vibrational Modes. Proceedings of the National Academy of Sciences of the United States of America. 1983;80:3696–3700. [PubMed]

14. Brooks B, Karplus M. Harmonic Dynamics of Proteins: Normal Modes and Fluctuations in Bovine Pancreatic Trypsin Inhibitor. Proceedings of the National Academy of Sciences of the United States of America. 1983;80:6571–6575. [PubMed]

15. Levitt M, Sander C, Stern PS. Protein normal-mode dynamics: Trypsin inhibitor, crambin, ribonuclease and lysozyme. Journal of Molecular Biology. 1985;181:423–447. [PubMed]

16. Tirion MM, ben Avraham D. Normal Mode Analysis of G-actin. Journal of Molecular Biology. 1993;230:186–195. [PubMed]

17. Bahar I, Rader A. Coarse-grained normal mode analysis in structural biology. Current Opinion in Structural Biology. 2005;15:586–592. [PMC free article] [PubMed]

18. Zheng W, Brooks BR, Thirumalai D. Low-frequency normal modes that describe allosteric transitions in biological nanomachines are robust to sequence variations. Proceedings of the National Academy of Sciences of the United States of America. 2006;103:7664–7669. [PubMed]

19. Tirion MM. Large Amplitude Elastic Motions in Proteins from a Single-Parameter, Atomic Analysis. Physical Review Letters. 1996;77:1905–1908. [PubMed]

20. Bahar I, Atilgan AR, Erman B. Direct Evaluation of Thermal Fluctuations in Proteins Using a Single-Parameter Harmonic Potential. Folding & Design. 1997;2:173–181. [PubMed]

21. Bahar I, Atilgan AR, Demirel MC, Erman B. Vibrational Dynamics of Folded Proteins: Significance of Slow and Fast Motions in Relation to Function and Stability. Physical Review Letters. 1998;80:2733–2736.

22. Atilgan A, Durell S, Jernigan R, Demirel M, Keskin O, Bahar I. Anisotropy of Fluctuation Dynamics of Proteins with an Elastic Network Model. Biophysical Journal. 2001;80:505–515. [PubMed]

23. Tozzini V. Coarse-grained models for proteins. Current Opinion in Structural Biology. 2005;15:144–150. [PubMed]

24. Sen TZ, Feng Y, Garcia JV, Kloczkowski A, Jernigan RL. The extent of cooperativity of protein motions observed with elastic network models is similar for atomic and coarser-grained models. Journal of Chemical Theory and Computation. 2006;2:696–704. [PMC free article] [PubMed]

25. Tama F, Gadea FX, Marques O, Sanejouand YH. Building-Block Approach for Determining Low-Frequency Normal Modes of Macromolecules. Proteins: Structure, Function, and Genetics. 2000;41:1–7. [PubMed]

26. Li G, Cui Q. A Coarse-Grained Normal Mode Approach For Macromolecules: An Efficient Implementation and Application to Ca^{2+}-ATPase. Biophysical Journal. 2002;83:2457–2474. [PubMed]

27. Chun HM, Padilla CE, Chin DN, Watanabe M, Karlov VI, Alper HE, Soosaar K, Blair KB, Becker OM, Caves LS, Nagle R, Haney DN, Farmer BL. MBO(N)D: A Multibody Method for Long-Time Molecular Dynamics Simulations. Journal of Computational Chemistry. 2000;21:159–184.

28. Zhou L, Siegelbaum SA. Effects of surface water on protein dynamics studied by a novel coarse-grained normal mode approach. Biophysical Journal. 2008;94:3461–3474. [PMC free article] [PubMed]

29. Schlitter J, Engels M, Krüger P. Targeted molecular dynamics: a new approach for searching pathways of conformational transitions. Journal of Molecular Graphics. 1994;12:84–89. [PubMed]

30. Zheng W, Brooks BR. Modeling protein conformational changes by iterative fitting of distance constraints using reoriented normal modes. Biophysical Journal. 2006;90:4327–4336. [PubMed]

31. Petrone P, V, Pande S. Can conformational change be described by only a few normal modes? Biophysical Journal. 2006;90:1583–1593. [PubMed]

32. Kirillova S, Cortés J, Stefaniu A, Siméon T. An NMA-guided path planning approach for computing large-amplitude conformational changes in proteins. Proteins. 2008;70:131–143. [PubMed]

33. Krebs W, Alexandrov V, Wilson CA, Echols N, Yu H, Gerstein M. Normal Mode Analysis of Macromolecular Motions in a Database Framework: Developing Mode Concentration as a Useful Classifying Statistic. Proteins: Structure, Function, and Genetics. 2002;48:682–695. [PubMed]

34. Yang L, Song G, Jernigan RL. How well can we understand large-scale protein motions using normal modes of elastic network models? Biophysical Journal. 2007;93:920–929. [PubMed]

35. Song G, Jernigan RL. An enhanced elastic network model to represent the motions of domain-swapped proteins. Proteins: Structure, Function, and Genetics. 2006;63:197–209. [PubMed]

36. Su JG, Jiao X, Sun TG, Li CH, Chen WZ, Wang CX. Analysis of Domain Movements in Glutamine-Binding Protein with Simple Models. Biophysical Journal. 2007;92:1326–1335. [PubMed]

37. Saibil HR. Conformational changes studied by cryo-electron microscopy. Nature Structural Biology. 2000;7:711–714. [PubMed]

38. Dror O, Lasker K, Nussinov R, Wolfson H. EMatch: an efficient method for aligning atomic resolution subunits into intermediate-resolution cryo-EM maps of large macromolecular assemblies. Acta Crystallographica Section D. 2007;D63:42–49. [PMC free article] [PubMed]

39. Tama F, Wriggers W, Brooks CL., III Exploring global distortions of biological macromolecules and assemblies from low-resolution structural information and elastic network theory. Journal of Molecular Biology. 2002;321:297–305. [PubMed]

40. Tama F, Miyashita O, Brooks CL., III Flexible multi-scale fitting of atomic structures into low-resolution electron density maps with elastic network normal mode analysis. Journal of Molecular Biology. 2004;337:985–999. [PubMed]

41. Tama F, Miyashita O, Brooks CL., III Normal mode based flexible fitting of high-resolution structure into low-resolution experimental data from cryo-EM. Journal of Structural Biology. 2004;147:315–326. [PubMed]

42. Hinsen K, Reuter N, Navaza J, Stokes DL, Lacapère JJ. Normal mode-based fitting of atomic structure into electron density maps: application to sarcoplasmic reticulum Ca-ATPase. Biophysical Journal. 2005;88:818–827. [PubMed]

43. Kim MK, Chirikjian GS, Jernigan RL. Elastic Models of Conformational Transitions in Macromolecules. Journal of Molecular Graphics and Modelling. 2002;21:151–160. [PubMed]

44. Kim MK, Jernigan RL, Chirikjian GS. Rigid-cluster models of conformational transitions in macromolecular machines and assemblies. Biophysical Journal. 2005;89:43–55. [PubMed]

45. Kim MK, Li W, Shapiro BA, Chirikjian GS. A Comparison Between Elastic Network Interpolation and MD Simulation of 16S Ribosomal RNA. Journal of Biomolecular Structure and Dynamics. 2003;21:395–405. [PubMed]

46. Zheng W, Brooks BR. Normal-modes-based prediction of protein conformational changes guided by distance constraints. Biophysical Journal. 2005;88:3109–3117. [PubMed]

47. Jeong JI, Lattman EE, Chirikjian GS. A method for finding candidate conformations for molecular replacement using relative rotation between domains of a known structure. Acta Crystallographica Section D. 2006;62:398–409. [PMC free article] [PubMed]

48. Miyashita O, Onuchic J, Wolynes P. Nonlinear elasticity, proteinquakes, and the energy landscapes of functional transitions in proteins. Proceedings of the National Academy of Sciences of the United States of America. 2003;100:12570–12575. [PubMed]

49. Best RB, Chen YG, Hummer G. Slow protein conformational dynamics from multiple experimental structures: the helix/sheet transition of arc repressor. Structure. 2005;13:1755–1763. [PubMed]

50. Maragakis P, Karplus M. Large amplitude conformational change in proteins explored with a plastic network model: adenylate kinase. Journal of Molecular Biology. 2005;352:807–822. [PubMed]

51. Franklin J, Koehl P, Doniach S, Delarue M. MinActionPath: maximum likelihood trajectory for large-scale structural transitions in a coarse-grained locally harmonic energy landscape. Nucleic Acids Research. 2007;35:W477–W482. [PMC free article] [PubMed]

52. Chu JW, Voth GA. Coarse-grained free energy functions for studying protein conformational changes: a double-well network model. Biophysical Journal. 2007;93:3860–3871. [PubMed]

53. Zheng W, Brooks BR, Hummer G. Protein conformational transitions explored by mixed elastic network models. Proteins. 2007;69:43–57. [PubMed]

54. Zheng W, Brooks BR, Thirumalai D. Allosteric transitions in the chaperonin GroEL are captured by a dominant normal mode that is most robust to sequence variations. Biophysical Journal. 2007;93:2289–2299. [PubMed]

55. Fischer S, Karplus M. Conjugate peak refinement: an algorithm for finding reaction paths and accurate transition states in systems with many degrees of freedom. Chemical Physics Letters. 1992;194:252–261.

56. Yang L, Song G, Carriquiry A, Jernigan RL. Close correspondence between the motions from principal component analysis of multiple HIV-1 protease structures and elastic network modes. Structure. 2008;16:321–330. [PMC free article] [PubMed]

57. Whitford PC, Miyashita O, Levy Y, Onuchic JN. Conformational transitions of adenylate kinase: switching by cracking. Journal of Molecular Biology. 2007;366:1661–1671. [PMC free article] [PubMed]

58. Arora K, Brooks CL. Large-scale allosteric conformational transitions of adenylate kinase appear to involve a population-shift mechanism. Proceedings of the National Academy of Sciences of the United States of America. 2007;104:18496–18501. [PubMed]

59. Thorpe M, Lei M, Rader A, Jacobs DJ, Kuhn LA. Protein flexibility and dynamics using constraint theory. Journal of Molecular Graphics and Modelling. 2001;19:60–69. [PubMed]

60. Rader A, Hespenheide BM, Kuhn LA, Thorpe M. Protein unfolding: Rigidity lost. Proceedings of the National Academy of Sciences of the United States of America. 2002;99:3540–3545. [PubMed]

61. Gohlke H, Thorpe M. A natural coarse graining for simulating large biomolecular motion. Biophysical Journal. 2006;91:2115–2120. [PubMed]

62. Kannan N, Vishveshwara S. Identification of side-chain clusters in protein structures by a graph spectral method. Journal of Molecular Biology. 1999;292:441–464. [PubMed]

63. Brinda K, Vishveshwara S. A network representation of protein structures: implications for protein stability. Biophysical Journal. 2005;89:4159–4170. [PubMed]

64. Hinsen K. Mathematical and Computational Biology Series. chapter 1. Chapman & Hall/CRC; 2006. Normal Mode Analysis: Theory and Applications to Biological and Chemical Systems; pp. 1–16.

65. Van Wynsberghe AW, Cui Q. Interpreting correlated motions using normal mode analysis. Structure. 2006;14:1647–1653. [PubMed]

66. Tama F, Sanejouand Y. Conformational Change of Proteins Arising from Normal Mode Calculations. Protein Engineering. 2001;14:1–6. [PubMed]

67. Okazaki K-i, Koga N, Takada S, Onuchic JN, Wolynes PG. Multiple-basin energy landscapes for large-amplitude conformational motions of proteins: Structure-based molecular dynamics simulations. Proceedings of the National Academy of Sciences of the United States of America. 2006;103:11844–11849. [PubMed]

68. Gastinel LN, Cambillau C, Bourne Y. Crystal structures of the bovine *β*4galactosyltransferase catalytic domain and its complex with uridine diphosphogalactose. EMBO Journal. 1999;18:3546–3557. [PubMed]

PubMed Central Canada is a service of the Canadian Institutes of Health Research (CIHR) working in partnership with the National Research Council's national science library in cooperation with the National Center for Biotechnology Information at the U.S. National Library of Medicine(NCBI/NLM). It includes content provided to the PubMed Central International archive by participating publishers. |