Bioinformatics. Author manuscript; available in PMC 2006 December 28.
Published in final edited form as:
PMCID: PMC1752228
NIHMSID: NIHMS9263

iGNM: a database of protein functional motions based on Gaussian Network Model

Abstract

Motivation

The knowledge of protein structure is not sufficient for understanding and controlling its function. Function is a dynamic property. Although protein structural information has been rapidly accumulating in databases, little effort has been invested to date toward systematically characterizing protein dynamics. The recent success of analytical methods based on elastic network models, and in particular the Gaussian Network Model (GNM), permits us to perform a high-throughput analysis of the collective dynamics of proteins.

Results

We computed the GNM dynamics for 20 058 structures from the Protein Data Bank, and generated information on the equilibrium dynamics at the level of individual residues. The results are stored on a web-based system called iGNM and configured so as to permit the users to visualize or download the results through a standard web browser using a simple search engine. Static and animated images for describing the conformational mobility of proteins over a broad range of normal modes are accessible, along with an online calculation engine available for newly deposited structures. A case study of the dynamics of 20 non-homologous hydrolases is presented to illustrate the utility of the iGNM database for identifying key residues that control the cooperative motions and revealing the connection between collective dynamics and catalytic activity.

INTRODUCTION

With the rapid accumulation of protein structures in the Protein Data Bank (PDB) (Berman et al., 2000) it has become evident that structural information per se is not sufficient for gaining insights into the mechanisms of function. Protein function is a dynamic property. It is closely related to conformational mechanics which, in turn, is largely dictated by the equilibrium (native) structure. It is now widely recognized that efficient computational methods and tools are needed for understanding the dynamics, and thereby controlling the function of proteins and their complexes.

The time cost of molecular dynamics simulations has been a major obstacle to a systematic computational characterization of protein dynamics. This motivated efforts to develop efficient, but physically realistic, methods for deriving dynamic properties based on structure. The recent success of analytical methods based on normal mode analysis (NMA) combined with elastic network (EN) models, after the original studies of Tirion (1996), Bahar and coworkers (Bahar et al., 1997; Doruker et al., 2000; Atilgan et al., 2001), Hinsen (Hinsen, 1998; Hinsen and Kneller, 1999) and Tama (Tama and Sanejouand, 2001), is paving the way for overcoming the computational limitations and making a rapid assessment of proteins' collective motions (Tama, 2003; Ma, 2004).

Among the EN models of different complexities, the simplest is the Gaussian Network Model (GNM) (Bahar et al., 1997; Haliloglu et al., 1997). The GNM is entirely based on inter-residue contact topology in the folded state; it requires no a priori knowledge of empirical energy parameters, in accord with the original proposition of Tirion (1996). Most importantly, it lends itself to a unique, closed mathematical solution for each structure.

An important feature of the GNM is the possibility of dissecting the observed motion into a collection of normal modes. The GNM mode analysis is similar to, but simpler and more efficient than, conventional NMA (see Systems and Methods section). The slowest modes usually provide information on the collective motions relevant to biological function (Hinsen and Kneller, 1999; Kitao and Go, 1999; Tama and Sanejouand, 2001), as demonstrated in many applications. Despite its simplicity, the GNM has proven to yield results in good quantitative and qualitative agreement with experimental data and MD simulations (Bahar et al., 1998a,b, 1999; Demirel et al., 1998; Bahar and Jernigan, 1998, 1999; Bahar, 1999; Haliloglu and Bahar, 1999; Jaravine et al., 2000; Rader and Bahar, 2004; Kurt et al., 2003; Wu et al., 2003; Erkip and Erman, 2004; Burioni et al., 2004; Kundu et al., 2004; Lattanzi, 2004; Liao and Beratan, 2004; Micheletti et al., 2004; Temiz et al., 2004). Experimental data that have been compared and successfully reproduced with the GNM include X-ray crystallographic B-factors, H/D exchange protection factors or free energies of exchange, order parameters from 15N-NMR relaxation, hinge regions and correlations between domain motions inferred from the comparison of the different forms of a given protein, and key residues whose mutations have been observed to impede function or folding. The accumulating evidence that supports the utility of the GNM as an efficient tool for a first estimation of the machinery of proteins and their complexes led us to the construction of iGNM, a Database (DB) of GNM results compiled for >20 000 PDB structures.

The earliest attempt to establish a collection of biomolecular motions was the DB of macromolecular movements (MolMovDB; http://molmovdb.mbb.yale.edu/molmovdb/), originally known as the DB of protein motions, constructed by Gerstein and collaborators (Echols et al., 2003). Two main features of MolMovDB are the visualization and classification of molecular motions according to their size and mechanism. The displayed animations require the knowledge of starting and ending conformational states between which the molecule moves. About 17 000 movies are available in the DB, generated by morphs interpolating between pairs of known structures of proteins and RNA molecules and refined by X-PLOR (Brünger, 1993) and CNS (Brünger et al., 1998). Another resource offered by Gerstein's laboratory is the use of a simplified NMA to display the biomolecular motions in the low frequency modes (Krebs et al., 2002).

A similar online calculation tool based on a simplified NMA combined with the RTB (Rotations-Translations of Blocks) algorithm (Tama et al., 2000) has been developed by Sanejouand and coworkers (elNémo; http://igs-server.cnrs-mrs.fr/elnemo/) presenting up to 100 slowest modes of studied structures (Suhre and Sanejouand, 2004a). This website provides information on the degree of collectivity of each predicted mode, as well as the overlap with experimentally observed change in conformation. In addition, the implementation of normal mode perturbed models as templates for diffraction data phasing through molecular replacement is discussed (Suhre and Sanejouand, 2004b).

A more extensive study has been conducted by Wako and coworkers where the normal modes have been generated using the ECEPP/2 force field (Nemethy et al., 1983), and collected in the ProMode DB (http://promode.socs.waseda.ac.jp/) (Wako and Endo, 2002; Wako et al., 2003, 2004) for nearly 1400 single chain proteins from the PDB. The structures are subjected therein to a detailed energy minimization prior to NMA computation. The NMA is performed in the coordinate system of dihedral angles based on the work of Go and collaborators (Wako et al., 1995), such that each residue is subject to approximately six degrees of freedom (rotatable bonds on the backbone and sidechain), assuming the bond rotations to be independent. ProMode DB has been restricted to relatively small proteins having <200 residues in view of the time cost of energy minimization. Finally, a recent effort in this direction is the Molecular Vibrations Evaluation Server (MoViES; http://ang.cz3.nus.edu.sg/cgi-bin/prog/norm.pl), constructed by Chen and coworkers (Cao et al., 2004), for NMA of proteins and DNA/RNA containing up to 4000 heavy atoms, in a full atomic framework. The results can be obtained in 7 days through email.

Despite all these attempts, a DB of predicted mobilities for all PDB structures, ranging from small enzymes to large complexes and assemblies in a unified framework is lacking. In this paper we discuss a new internet-based system, iGNM, recently developed to address this need and to release the results from GNM computations applied to PDB structures.

The current version of iGNM consists of three modules: DB engine, GNM computations engine and visualization engine. The DB engine is presented here, which contains visual and quantitative information on the collective modes predicted by the GNM for 20 058 structures deposited in the PDB prior to September 15, 2003. The goal of constructing the DB engine has been to provide information on the dynamics of all proteins beyond those provided experimentally by B-factors (for X-ray structures), root mean square fluctuations (NMR structures), or by interpolation between existing PDB structures. We have developed an internet-based query system to allow users to retrieve information through a simple search engine by entering the PDB identifier of the protein structure of interest. The retrieved data are viewed by a Chime plug-in (for 3D visualization) or a Java applet (for graphics). The output includes the equilibrium fluctuations of residues and comparison with X-ray crystallographic B-factors, the sizes for residue motions in different collective modes, the cross-correlations between residue fluctuations or domain motions in the collective modes, the identity of residues that assume a key mechanical role (e.g. hinge) in the global dynamics, and thereby the function of the molecule, as well as those potentially participating in folding nuclei/cores (Bahar et al., 1998; Rader and Bahar, 2004). In addition to retrieving the data stored in the DB, the user has the ability to compute the GNM dynamics for newly deposited structures through an automated online calculation server.

A case study, the dynamics of hydrolases, is presented here, to illustrate the use of iGNM data for inferring functional information. The catalytic sites in a set of hydrolases are located as residues participating in low mobility regions in the global modes, which could serve as a new prediction criterion to locate catalytic pockets from a given enzyme structure. iGNM is accessible at http://ignm.ccbb.pitt.edu/.

SYSTEMS AND METHODS

Model: GNM

The GNM is built on the statistical mechanical theory developed by Flory and coworkers for describing the fluctuation dynamics of polymer networks (Flory, 1976; Mattice and Suter, 1994). Accordingly, the structure is modeled as an elastic network whose nodes are the amino acids, usually represented by their α-carbons; uniform springs of force constant γ connect the pairs of α-carbons located within an interaction cutoff distance rc. The dynamics of this network is fully defined by the N × N connectivity (or Kirchhoff) matrix of inter-residue contacts, Γ. The off-diagonal elements of Γ are defined as Γij = −1 if the distance Rij between residues i and j is shorter than rc, and zero otherwise; the i-th diagonal term is the degree of node i, i.e. the coordination number of residue i. Γ contains the same information as a contact map. The statistical thermodynamics of the network are controlled by the Hamiltonian (Bahar et al., 1998)

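$$ H = \frac{\gamma}{2}\left[\Delta X^{T}\,\Gamma\,\Delta X + \Delta Y^{T}\,\Gamma\,\Delta Y + \Delta Z^{T}\,\Gamma\,\Delta Z\right] $$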
(1)

where ΔX, ΔY and ΔZ are the N-dimensional vectors of the X-, Y- and Z- components of the fluctuation vectors {ΔR1, ΔR2, … , ΔRN} of the N residues in the examined protein. The mean-square fluctuations of residue i scale with the i-th diagonal element of the inverse of Γ (Bahar et al., 1997; Haliloglu et al., 1997), as

$$ \langle (\Delta R_i)^{2} \rangle = \frac{3 k_B T}{\gamma}\,[\Gamma^{-1}]_{ii} $$
(2)

and the cross-correlations ⟨ΔRi · ΔRj⟩ scale with the ij-th off-diagonal elements of Γ⁻¹.
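As an illustration of the definition of Γ given above, the following minimal Python sketch builds the Kirchhoff matrix from a list of α-carbon coordinates. The function name `kirchhoff`, the toy coordinates and the default cutoff of 7.3 Å (the value quoted below for the '.cont' files) are illustrative assumptions; this is not the iGNM code itself.

```python
import math

def kirchhoff(coords, cutoff=7.3):
    """Build the N x N Kirchhoff (connectivity) matrix from C-alpha coordinates.

    Off-diagonal entries are -1 for residue pairs closer than `cutoff`
    (in Angstrom) and 0 otherwise; each diagonal entry counts the contacts
    of that residue.
    """
    n = len(coords)
    gamma = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            if math.dist(coords[i], coords[j]) < cutoff:
                gamma[i][j] = gamma[j][i] = -1
                gamma[i][i] += 1
                gamma[j][j] += 1
    return gamma

# Made-up chain of four nodes spaced 3.8 A apart along the x-axis.
toy_coords = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0), (11.4, 0.0, 0.0)]
toy_gamma = kirchhoff(toy_coords)
for row in toy_gamma:
    print(row)
```

Each row of the resulting matrix sums to zero, which is why Γ always has at least one zero eigenvalue.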

The fluctuation dynamics of the structure results from N − 1 superposed GNM modes. The modes can be extracted by the eigenvalue decomposition Γ = UΛU^T, where U is an orthogonal matrix whose columns uk (1 ≤ k ≤ N) are the eigenvectors of Γ, and Λ is the diagonal matrix of the eigenvalues λk. The k-th eigenvector reflects the shape of the k-th mode as a function of residue index i; the k-th eigenvalue represents its frequency (Haliloglu et al., 1997; Bahar et al., 1999).
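The mode decomposition can be sketched with NumPy as follows. This is a minimal illustration under stated assumptions: the helper names `gnm_modes` and `msf_from_modes` are hypothetical, the four-node chain is made-up data, and the prefactor involving γ is omitted, so the fluctuations are in arbitrary units. The zero threshold of 10⁻⁶ follows the convention quoted below for the '.eigen' files.

```python
import numpy as np

def gnm_modes(gamma, zero_tol=1e-6):
    """Return the non-zero eigenvalues (ascending) and their eigenvectors."""
    evals, evecs = np.linalg.eigh(np.asarray(gamma, dtype=float))
    keep = evals > zero_tol  # drop the zero mode(s)
    return evals[keep], evecs[:, keep]

def msf_from_modes(evals, evecs):
    """Sum over modes of [u_k]_i^2 / lambda_k (diagonal of the pseudo-inverse)."""
    return (evecs ** 2 / evals).sum(axis=1)

# Kirchhoff matrix of a made-up four-node chain with nearest-neighbour contacts.
gamma = np.array([[ 1, -1,  0,  0],
                  [-1,  2, -1,  0],
                  [ 0, -1,  2, -1],
                  [ 0,  0, -1,  1]], dtype=float)

evals, evecs = gnm_modes(gamma)
msf = msf_from_modes(evals, evecs)

# Cross-check: the mode sum should match the diagonal of the pseudo-inverse.
reference = np.diag(np.linalg.pinv(gamma, rcond=1e-10))
print(np.allclose(msf, reference))
```

In this toy chain the two end nodes, which have a single contact each, come out more mobile than the two interior nodes.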

Structures

All the structures deposited in the PDB as of September 15, 2003 have been downloaded (22 549 of them) and subjected to GNM analysis. A file parser was implemented to eliminate (1) structures composed predominantly of DNA or RNA molecules and (2) carbohydrates, small organic compounds or short peptides containing <15 residues, which together removed 6.2% of the structures, and (3) a further 4.8% of the originally downloaded structures, which yielded unrealistic mode shapes owing to incomplete and/or inaccurate coordinates deposited in the PDB. Figure 1 gives a schematic description of such an occurrence, where a portion of the network is ‘disconnected’. For a fully connected structure, Γ has rank N − 1 and its eigenvalue decomposition yields N − 1 non-trivial eigenvalues and one 0 eigenvalue. However, more than one 0 eigenvalue was obtained for the disconnected networks.
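The link between disconnected networks and extra zero eigenvalues can be checked without any eigenvalue computation, because the number of zero eigenvalues of a Kirchhoff matrix equals the number of connected components of the network. The sketch below counts components by breadth-first search; the function name `count_components` and the two toy matrices are illustrative assumptions, not the filter actually used for the DB.

```python
from collections import deque

def count_components(gamma):
    """Count connected components of the network encoded by a Kirchhoff matrix."""
    n = len(gamma)
    seen = [False] * n
    components = 0
    for start in range(n):
        if seen[start]:
            continue
        components += 1
        seen[start] = True
        queue = deque([start])
        while queue:
            i = queue.popleft()
            for j in range(n):
                # A non-zero off-diagonal element marks a contact between i and j.
                if i != j and gamma[i][j] != 0 and not seen[j]:
                    seen[j] = True
                    queue.append(j)
    return components

connected = [[1, -1, 0], [-1, 2, -1], [0, -1, 1]]  # three nodes in a chain
broken = [[1, -1, 0], [-1, 1, 0], [0, 0, 0]]       # third node has no contacts
print(count_components(connected), count_components(broken))
```

A structure whose matrix gives more than one component would be expected to produce more than one 0 eigenvalue.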

Fig. 1
A schematic diagram illustrating how a discontinuity in the PDB sequence/coordinates may lead to more than one 0 eigenvalue. (a) The coordinates of residue C belonging to the A-B-C-D-E are ...

We generated the GNM results for 20 058 structures, after filtering out the above listed cases. The examined structures cover a broad range of sizes and include very large proteins, such as the contractile protein of insect flight muscle (PDB: 1o1c), with 11 730 amino acids. The size distribution of the examined structures is shown in Figure 2.

Fig. 2
Distribution of the sizes of PDB structures compiled in the iGNM DB. The number N of residues includes the number of amino acids contained in the examined PDB structures. 8.4% (1701 ...

Computations

The eigenvalue decomposition of Γ is the most time-consuming part of the computations. We have recently implemented the BLZPACK package (Marques, 1995) based on the Lanczos algorithm, which permits us to efficiently extract subsets of interesting modes at either end of the vibrational spectrum. This package reduces the computing time by at least three orders of magnitude in the case of large proteins.

RESULTS

Output files

Eleven output files can be accessed for each query structure (Fig. 3a). Users can retrieve the generated output files for structures of interest by simply entering the 4-digit PDB ID in the search engine, http://ignm.ccbb.pitt.edu/FileDownload.htm. A brief description of the output files and/or the type of information that can be extracted is presented in the following subsections.

Fig. 3
(a) The query engine to retrieve GNM data for 20 058 structures. The PDB identifier (ID) of the protein of interest is entered to retrieve the output files from the iGNM. Alternatively, a search ...

Contact topology (‘.ca’, ‘.cont’ and ‘.eigen’)

The residue types, sequence numbers, α-carbon coordinates and temperature factors reported in the PDB and used in the GNM are listed in the files with suffix ‘.ca’. The size of the protein, defined by the number of α-carbons (N) included in the computations, is listed in the last line of the file. The ‘.cont’ file lists the contact number (the number of adjacent neighbors within a cutoff rc = 7.3 Å) for each residue. A large contact number refers to a constrained environment that limits or inhibits the residue mobility. The ‘.eigen’ file lists the N − 1 non-zero eigenvalues λk in descending order, starting from the fastest mode (k = N − 1), and the zero eigenvalue λ0 is listed as the last element. Any value of the order of 10⁻⁶ or lower is treated as zero. The structures with the above described ‘discontinuity’ yielded more than one 0 eigenvalue; these were identified from the corresponding ‘.eigen’ files and removed from the DB.
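The kind of information held in a ‘.ca’ file can be extracted from a PDB entry with a few lines of code. The sketch below reads ATOM records using the fixed column layout of the PDB format; the function name `parse_ca_atoms`, the dictionary keys and the two sample records are illustrative assumptions, and the actual layout of the ‘.ca’ files is not reproduced here.

```python
def parse_ca_atoms(pdb_text):
    """Extract C-alpha records from PDB-format text (fixed-column ATOM lines)."""
    residues = []
    for line in pdb_text.splitlines():
        # Columns 13-16 hold the atom name; C-alpha atoms are named "CA".
        if line.startswith("ATOM") and line[12:16].strip() == "CA":
            residues.append({
                "resname": line[17:20].strip(),    # columns 18-20
                "chain": line[21],                 # column 22
                "resseq": int(line[22:26]),        # columns 23-26
                "xyz": (float(line[30:38]),        # columns 31-38
                        float(line[38:46]),        # columns 39-46
                        float(line[46:54])),       # columns 47-54
                "bfactor": float(line[60:66]),     # columns 61-66
            })
    return residues

# Two made-up records, assembled field by field to keep the columns aligned.
n_line = ("ATOM  " "    1" " " " N  " " " "ALA" " " "A" "   1" " " "   "
          "  10.000" "   5.000" "  -7.000" "  1.00" " 18.00")
ca_line = ("ATOM  " "    2" " " " CA " " " "ALA" " " "A" "   1" " " "   "
           "  11.104" "   6.134" "  -6.504" "  1.00" " 20.50")

records = parse_ca_atoms(n_line + "\n" + ca_line)
print(len(records), records[0]["resname"], records[0]["xyz"])
```

The number of records returned corresponds to N, the number of α-carbons that enter the GNM computation.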

Time-average fluctuations and their correlations (‘.bfactor’ and ‘.cc’)

The theoretical temperature factor (Bi) predicted by the GNM is proportional to the inverse Kirchhoff matrix and also to the summation of all modes as

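$$ B_i = \frac{8\pi^{2} k_B T}{\gamma}\,[\Gamma^{-1}]_{ii} = \frac{8\pi^{2} k_B T}{\gamma}\sum_{k=1}^{N-1} \lambda_k^{-1}\,[u_k]_i^{2} $$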
(3)

Equation (3) follows from Equation (2) and the definition Bi = (8π²/3)⟨(ΔRi)²⟩. The term [uk]i designates the i-th element (corresponding to the i-th residue) of the k-th eigenvector. The ‘.bfactor’ file contains the experimental Bi values of α-carbon atoms (if available in the PDB) and the corresponding theoretical Bi values for each residue. Figure 4c illustrates the comparison of the two sets of Bi values, as a function of residue index, for a query protein, phospholipase A2 (1BK9; Zhao et al., 1998), shown in Figure 4a. A correlation coefficient of 0.72 between the experimental (yellow curve) and theoretical results (red curve) is obtained.
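A comparison of this kind can be sketched as below, assuming that the reported correlation coefficient is the Pearson coefficient (the text does not name the estimator). The function name `pearson` and both B-factor profiles are made up for illustration. Because the coefficient is unchanged when one profile is multiplied by a positive constant, the unknown scaling factor of the theoretical values does not affect the comparison.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient between two equally long sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    norm_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    norm_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (norm_x * norm_y)

# Made-up experimental and theoretical B-factor profiles (arbitrary units).
experimental_b = [22.0, 18.5, 15.0, 14.2, 19.8, 27.1]
theoretical_b = [0.90, 0.70, 0.45, 0.50, 0.65, 1.10]

r = pearson(experimental_b, theoretical_b)
r_rescaled = pearson(experimental_b, [3.5 * b for b in theoretical_b])
print(abs(r - r_rescaled) < 1e-12)
```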

Fig. 4
Visualization of GNM dynamics for phospholipase A2 (PDB ID: 1BK9). (a) Color-coded ribbon diagram (Chime) that illustrates the mobilities in the slowest GNM mode (slow1). The structure is colored from ...

The predicted cross-correlations ⟨ΔRi · ΔRj⟩ between the fluctuations of residues i and j are listed in the ‘.cc’ files. These are reported for small-to-moderate size proteins (N < 290) owing to memory constraints. The data in these files are used to construct the color-coded correlation maps (called CCplot) (Fig. 4d). The ⟨ΔRi · ΔRj⟩ values are normalized between −1 and 1, by dividing them by [⟨(ΔRi)²⟩⟨(ΔRj)²⟩]^(1/2). A value of −1 refers to perfectly anticorrelated (i.e. concerted but in opposite direction) fluctuations undergone by residues i and j (colored blue in the map), and +1 refers to fully correlated motions (colored red).
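The normalization can be illustrated with a short sketch, assuming that each cross-correlation is divided by the square root of the product of the two corresponding mean-square fluctuations. The function name `normalize_correlations` is hypothetical, and the input is the pseudo-inverse of the Kirchhoff matrix of a made-up two-node network.

```python
import math

def normalize_correlations(cov):
    """Scale a matrix of cross-correlations so that every diagonal element is 1."""
    n = len(cov)
    return [
        [cov[i][j] / math.sqrt(cov[i][i] * cov[j][j]) for j in range(n)]
        for i in range(n)
    ]

# Pseudo-inverse of the Kirchhoff matrix [[1, -1], [-1, 1]] of two connected nodes.
toy_cov = [[0.25, -0.25], [-0.25, 0.25]]
print(normalize_correlations(toy_cov))
```

With only two nodes joined by one spring, the relative motion is the only internal mode, so the two nodes come out perfectly anticorrelated.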

Mobilities in normal modes (‘.sloweigenvector’, ‘.slowmodes’ and ‘.slowav’)

The shapes of the slowest 20 modes ([uk]i², 1 ≤ k ≤ 20, as a function of residue index i) are given in the ‘.slowmodes’ file, and the corresponding eigenvectors, uk, in the ‘.sloweigenvector’ file. Each row in these files corresponds to a given residue, and each column to a different mode, starting from the slowest (global) mode. We note that the eigenvectors are orthonormal, and consequently the k-th mode shape represents the normalized distribution of residue mobilities (square displacements) induced in mode k. The joint effect of modes 1 and 2 on mobilities can be found in the ‘.slowav’ file. The entries therein refer to the weighted average

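$$ [(\Delta R_i)^{2}]_{1\text{–}2} = \frac{\lambda_1^{-1}\,[u_1]_i^{2} + \lambda_2^{-1}\,[u_2]_i^{2}}{\lambda_1^{-1} + \lambda_2^{-1}} $$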
(4)
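A weighted average of this kind can be sketched as follows, assuming that each mode shape is weighted by the inverse of its eigenvalue so that slower modes contribute more. The function name `slow_average` is hypothetical, and the eigenvalues and eigenvectors are made-up numbers chosen only so that each vector has unit length.

```python
def slow_average(evals, evecs, modes=(0, 1)):
    """Average squared eigenvector elements over `modes`, weighted by 1/lambda_k.

    `evecs[k]` is the k-th eigenvector given as a list over residues.
    """
    weights = [1.0 / evals[k] for k in modes]
    total = sum(weights)
    n_res = len(evecs[modes[0]])
    return [
        sum(w * evecs[k][i] ** 2 for w, k in zip(weights, modes)) / total
        for i in range(n_res)
    ]

# Made-up slowest two modes of a three-residue toy system.
toy_evals = [1.0, 3.0]
toy_evecs = [[0.6, 0.8, 0.0], [0.8, -0.6, 0.0]]

profile = slow_average(toy_evals, toy_evecs)
print(profile)
```

Because every eigenvector is normalized, the averaged profile again sums to one over the residues, so it can be read as a distribution of mobility.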

Global hinge residues at crossovers between positive and negative elements of u1

The positive and negative elements of uk refer to residues moving in opposite directions along mode k. Of interest are the residues at the passage between positive and negative elements of slowest modes, which presumably act as hinges between the clusters of residues moving in opposite directions. The ‘.sloweigenvector’ files thus provide information on the identity of the residues that play a mechanically critical role in the global modes.
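Locating such crossovers amounts to scanning an eigenvector for sign changes between consecutive residues. The following sketch does this for a made-up eigenvector; the function name `hinge_candidates` is hypothetical, and elements that are exactly zero are ignored for simplicity.

```python
def hinge_candidates(u):
    """Return index pairs (i, i + 1) where consecutive elements change sign."""
    return [(i, i + 1) for i in range(len(u) - 1) if u[i] * u[i + 1] < 0]

# Made-up slow-mode eigenvector over seven residues (0-based indices).
toy_u1 = [0.5, 0.3, 0.1, -0.2, -0.4, -0.1, 0.2]
print(hinge_candidates(toy_u1))
```

The residues on either side of each reported pair are the candidates for hinge sites in that mode.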

Peaks in high frequency modes (‘.fasteigenvector’, ‘.fastmodes’, ‘.fast10av’)

The shapes of the fastest 20 modes ([uk]i², N − 20 ≤ k ≤ N − 1, as a function of residue index i) are given in the ‘.fastmodes’ file, and the corresponding eigenvectors, uk, in the ‘.fasteigenvector’ file, similar to their slow mode counterparts. We note that, contrary to the slow mode shapes, the fast modes are highly localized and exhibit sharp peaks at certain residues. The cumulative mode shape for the fastest 10 modes is presented in the file ‘.fast10av’. The peaks in the latter file are indicative of potential folding nuclei or conserved residues important for stability (Demirel et al., 1998; Rader and Bahar, 2004).

Query and visualization

iGNM allows users to conveniently query and visualize GNM output files. By typing the PDB ID in the 3D visualization module (http://ignm.ccbb.pitt.edu/3D_GNM.htm) users can view and compare the ribbon diagrams of the query structures color-coded according to the mobilities of residues in the slowest or fastest 20 modes. Similarly, the B-factors visualization module (http://ignm.ccbb.pitt.edu/BFactors.htm) provides access to ribbon diagrams colored by the mean-square fluctuations predicted and observed for all modes (B-factors).

In addition to queries using PDB IDs, iGNM is integrated with PDB SearchLite query interface for keyword-based queries (http://ignm.ccbb.pitt.edu/PDB_Integration.htm). By typing keywords related to the biological macromolecules of interest, users can browse PDB records and iGNM output files for a given protein family in an integrated environment (Fig. 3b).

The data visualization uses a Java applet in the Java 2 Runtime Environment (Sun Microsystems, Inc. http://java.sun.com/) and Chime (MDL Information Systems, Inc. http://www.mdli.com/) to produce interactive mobility plots and structure animations. These cross-platform software tools can be freely downloaded and easily installed. Chime, as a browser plug-in, allows users to manipulate color-coded structure in atomic details. The Java applet displays the residue mobility in a pop-up window (graph). The user can point the cursor to the positions of interest (minima or maxima) on the graphs and see the corresponding residue number and relative fluctuations. Links to the raw iGNM data, PDB, PDBsum, SCOP and CATH are also included for user references.

Online calculations

Currently (January 25, 2005), the PDB contains 29 326 structures. The iGNM DB has processed 22 549 of them, and generated results for 20 058 structures. When the user performs a search for a PDB ID the DB engine is checked first for the GNM files of that structure. If the structure's results are found, the results are displayed to the user through the visualization engine. For the PDB structures that are not included, an interface to perform online calculations is provided at http://ignm.ccbb.pitt.edu/gnmwebserver/index2.html.

The online calculation module is a three-tier architecture, where the user's browser communicates with iGNM, and the server communicates with the PDB server (Fig. 5). This server takes as input the 4-digit PDB ID, searches the PDB, and if the structure is found it then retrieves the file and runs the GNM calculations on it. Once the calculation is complete the results are passed to the visualization engine for graphical presentation to the user.
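The lookup order described above (stored results first, online calculation otherwise) can be summarized in a short sketch. Everything in it is a hypothetical stand-in: `lookup_or_compute`, the dictionary that plays the role of the DB engine, the callable that plays the role of the calculation server, and the ID `9xyz` are illustrative and do not reflect the actual server code.

```python
def lookup_or_compute(pdb_id, stored_results, run_gnm):
    """Return (results, source) for a PDB ID: stored results first, else a fresh run.

    `stored_results` (a dict) and `run_gnm` (a callable) are hypothetical
    stand-ins for the DB engine and the online calculation server.
    """
    key = pdb_id.strip().lower()
    if len(key) != 4 or not key.isalnum():
        raise ValueError(f"not a valid 4-character PDB ID: {pdb_id!r}")
    if key in stored_results:
        return stored_results[key], "database"
    return run_gnm(key), "online calculation"

# Made-up stand-ins: one stored entry and a dummy calculation.
stored = {"1bk9": "precomputed GNM files for 1bk9"}

def dummy_run(key):
    return f"fresh GNM run for {key}"

print(lookup_or_compute("1BK9", stored, dummy_run))
print(lookup_or_compute("9xyz", stored, dummy_run))
```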

Fig. 5
iGNM currently consists of two stand-alone servers, one that houses the DB engine with the visualization engine, and the other houses the online calculation module and visualization for structures deposited ...

Future additions to iGNM will include an automatic update module for checking the PDB for new structures, downloading the structure files, running the GNM calculations on the structure files and updating the DB with the newly computed GNM results. When this automatic update module is completed, the online calculation server will be reconfigured to utilize structure files submitted by users over the web. This will allow users to submit their own structure files for online GNM calculations, and allow them to view the results through the visualization engine. These additions are currently in the design and testing phases.

A case study: interplay between dynamics and chemistry

An application to a family of enzymes illustrating the utility of iGNM DB is presented here. To achieve this, 20 single-chain hydrolases that exhibit a range of functions (EC number) and structural subclasses (CATH) were selected from the DB (Table 1). Of these, 13 are taken from the catalytic residues dataset compiled by Thornton and collaborators (Bartlett et al., 2002), and seven (indicated by the asterisks) are additional hydrolases retrieved from the PDB. The catalytic sites are required to meet one of the following criteria set forth by Thornton and coworkers: (1) they are directly involved in catalytic function; (2) they affect residues or water molecules that are directly involved in catalysis; (3) they can stabilize a transient intermediate; or (4) they interact with a substrate or cofactor that facilitates the local chemical reaction. The amino acids which simply bind substrates or ligands are not necessarily catalytic residues.

Table 1
Properties of 20 hydrolases subjected to iGNM analysis

Participation of catalytic residues in the collective modes

Experimental B-factors (in ‘.bfactor’ files) and the weighted average mobilities [(ΔRi)²]1–2 (in ‘.slowav’) extracted from iGNM for the catalytic residues of the examined hydrolases are listed in the last two columns of Table 1. To make a quantitative assessment across the complete set, both the B-factors and mobilities of residues in a given mode were normalized in the range [0, 1]. The distributions of [(ΔRi)²]1–2 are displayed in Figure 6 for two proteins from the examined set, phospholipase A2 (1BK9) and protein tyrosine phosphatase (1YTW; Fauman et al., 1996). The catalytic residues are indicated by the arrows in Figure 6.
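The text does not state how the values were brought into the range [0, 1]. One common choice, shown here purely as an assumption, is min-max scaling within each protein; the function name `min_max` and the sample values are made up.

```python
def min_max(values):
    """Rescale values linearly so that the smallest maps to 0 and the largest to 1."""
    lowest, highest = min(values), max(values)
    if highest == lowest:
        # A flat profile carries no ranking information; map it to zeros.
        return [0.0] * len(values)
    return [(v - lowest) / (highest - lowest) for v in values]

# Made-up B-factor profile for a five-residue stretch.
print(min_max([12.0, 18.0, 24.0, 15.0, 12.0]))
```

After such scaling, values from proteins with very different overall B-factor levels can be averaged on a common footing.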

Fig. 6
Mobilities in the slowest two modes ([(ΔRi)²]1–2 versus residue index i) and corresponding color-coded structure for phospholipase A2 (1BK9) and Yersinia protein tyrosine ...

The mode shapes in Figure 6 and the values listed in the last two columns of Table 1 show that the catalytic sites occupy regions that are spatially constrained in general (evidenced by low Bi values), and this tendency becomes more pronounced in the examination of the slowest modes (indicated by even lower [(ΔRi)²]1–2 values). The average ⟨[(ΔRi)²]1–2⟩cat over all the catalytic residues of the examined hydrolases is 0.045, and the average ⟨Bi⟩cat is 0.126, as opposed to the respective averages over all residues of 0.180 and 0.244. Thus, the square fluctuations of catalytic residues are reduced by a factor of 2 on average compared with other residues, and their mobilities in the slowest modes are reduced by a factor of 4. Such severely constrained regions are usually involved in, or closely communicate with, the mechanical key sites (hinges, anchoring regions, symmetry centers, etc.) that control the collective dynamics of the enzymes. Thus, chemically active residues are found here to also participate in sites that are critical from the point of view of conformational mechanics, which invites attention to the functional coupling between catalysis and global dynamics.
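The two reduction factors quoted above follow directly from the four reported averages, as the short calculation below shows. Only the variable names are introduced here; the numbers are those given in the text.

```python
# Averages reported in the text (normalized to the range [0, 1]).
b_all, b_cat = 0.244, 0.126        # B-factors: all residues vs catalytic residues
slow_all, slow_cat = 0.180, 0.045  # slow-mode mobilities: all vs catalytic

b_ratio = b_all / b_cat
slow_ratio = slow_all / slow_cat
print(round(b_ratio, 2), round(slow_ratio, 2))
```

The first ratio is close to 2 and the second is 4, matching the factors stated in the paragraph.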

DISCUSSION

We generated information on the equilibrium dynamics of 20 058 structures, with the aim of eventually covering the entire PDB. The case study of 20 hydrolases provides a simple example of the use of the collective dynamics data to gain insights into the mobilities of catalytic residues and their participation in large scale motions of the overall enzyme. The catalytic residues are shown to preferentially occupy cooperatively constrained regions (minima in slow modes), which might be expected to efficiently transmit the effect of the chemical reaction to other regions of the enzymes. This feature, which has also been observed in other families of enzymes (Chen and Bahar, 2004; Yang and Bahar, 2005), may be advantageously used in identifying or designing active sites.

The eigenvalue decomposition of the connectivity matrix Γ is the most expensive task in GNM calculations in terms of computing time. We used a singular value decomposition (SVD) subroutine to achieve this (Press et al., 1992), the computing time of which scales with N³ for a network of N residues. For N < 1500, the computations are performed within minutes, while the CPU times increased up to 15 days in the case of the largest structures, the output of which is compiled and accessible in the DB. Although all N − 1 modes, and the mean-square fluctuations resulting from the superposition of all modes, have been compiled to date in the iGNM, we have also implemented an alternative algorithm for large structures that utilizes the BLZPACK software (Marques, 1995), based on the Block Lanczos method. The latter evaluates a subset (1 ≤ k ≤ 100) of dominant (slowest) modes within a time that scales with N², i.e. the computing time is more than 3 orders of magnitude shorter than that of the routine SVD when structures of >10³ residues are analyzed. The same algorithm will be particularly useful for generating the anisotropic network model (ANM) (Atilgan et al., 2001) data that we plan to include in the near future in the iGNM DB.
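As a rough check on the quoted speed-up, and ignoring algorithm-dependent prefactors (an assumption, since only the scaling laws are given), the ratio of the two computing times can be estimated as

```latex
\frac{t_{\mathrm{SVD}}}{t_{\mathrm{Lanczos}}} \sim \frac{N^{3}}{N^{2}} = N \geq 10^{3}
\quad \text{for } N \geq 10^{3}
```

which corresponds to three orders of magnitude at 10³ residues and grows linearly with the size of the structure.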

In previous studies, we have shown that the GNM can satisfactorily reproduce the experimentally observed fluctuations and functional motions of proteins complexed with RNA or DNA (Bahar and Jernigan, 1998; Bahar et al., 1999; Temiz and Bahar, 2002), including supramolecular structures like ribosomal complexes (Wang et al., 2004) or viral capsids (Rader et al., 2005). P and O4′ atoms of nucleotides are usually adopted as nodes to model the RNA/DNA structures. The choice of these two atoms per nucleotide provides a spatial resolution comparable to that of α-carbons in proteins, and the cutoff distances are adjusted to account for the longer range interactions of nucleotides. The iGNM DB does not yet contain the results for such complexes or assemblies containing RNA/DNA components, although a server is available at http://ignm.ccbb.pitt.edu/GNM_Online_Calculation.htm, which can generate GNM results for such cases using the nucleotide coordinates reported in the PDB.

Users have to be cautious about two facts: (1) the iGNM results reflect the equilibrium dynamics for proteins in their crystal form reported in the PDB and (2) the method is applicable to fluctuations near the native structure. Conformational changes involving the passage over an energy barrier, or other non-linear effects on the conformational dynamics, cannot be described by the GNM and necessitate more detailed MD simulations. In some cases, the crystallized form may not be the active state of the protein under physiological conditions. For instance, PDB entry 1hho contains one half of a hemoglobin (Hb) molecule (two chains) in the crystal asymmetric unit, while the bioactive Hb is a tetramer that can be generated by combining 1hho with its crystallographic 2-fold axis partner. We are currently designing a new module that will facilitate the retrieval and generation of such user-customized structures that combine the biological units or any structural parts of interest. Finally, we note that the GNM is particularly useful in the case of large structures and complexes/assemblies, although its application to small structures (a network of <30 nodes) may not always be justifiable. First, small structures are amenable to more detailed analysis with full atomic models that take account of their specific interactions. Second, the Gaussian approximation for residue fluctuations becomes more accurate with increasing size of the network, as follows from the central limit theorem.

With the number of ‘new’ folds deposited in the PDB decreasing on a yearly basis, we are close to collecting data for a large fraction of all possible folds. Although the number of biomolecular functions far exceeds the number of known folds, the types of large scale conformational motions undergone by biomolecules seem to be relatively limited, similar to the finite number of folds. A particular fold and its intrinsic global dynamics can presumably offer a versatile scaffold and mechanism for achieving a diversity of biochemical functions through amino acid substitutions that are compatible with the same fold and global dynamics. iGNM resulted from an attempt to collect those dynamic data in a DB framework to enable further exploration and establishment of biomolecular structure–dynamics–function relations.

ACKNOWLEDGEMENTS

We would like to thank Dr Rob Bell for his efforts in facilitating high-speed computing hardware for the calculations in this work, and Mr Shannching Chen for resolving the memory allocation problem in sequential computation of 22 549 structures. Partial support by the NSF-ITR grant #EIA-0225636 and the NIH grant #1 R01 LM007994–01A1 is gratefully acknowledged.

REFERENCES

  • Atilgan AR, et al. Anisotropy of fluctuation dynamics of proteins with an elastic network model. Biophys. J. 2001;80:505–515. [PubMed]
  • Bahar I. Dynamics of proteins and biomolecular complexes: inferring functional motions from structure. Rev. Chem. Eng. 1999;15:319–349.
  • Bahar I, Jernigan RL. Vibrational dynamics of transfer RNAs. Comparison of the free and enzyme-bound forms. J. Mol. Biol. 1998;281:871–884. [PubMed]
  • Bahar I, Jernigan RL. Cooperative fluctuations and subunit communication in tryptophan synthase. Biochemistry. 1999;38:3478–3490. [PubMed]
  • Bahar I, et al. Direct evaluation of thermal fluctuations in protein using a single parameter harmonic potential. Fold Des. 1997;2:173–181. [PubMed]
  • Bahar I, et al. Vibrational dynamics of proteins: Significance of slow and fast modes in relation to function and stability. Phys. Rev. Lett. 1998a;80:2733–2736.
  • Bahar I, et al. Correlation between native state hydrogen exchange and cooperative residue fluctuations from a simple model. Biochemistry. 1998b;37:1067–1075. [PubMed]
  • Bahar I, et al. Collective motions of HIV-1 reverse transcriptase. Examination of flexibility and enzyme function. J. Mol. Biol. 1999;285:1023–1037. [PubMed]
  • Bartlett G, et al. Analysis of catalytic residues in enzyme active sites. J. Mol. Biol. 2002;324:105–121. [PubMed]
  • Berman HM, et al. The Protein Data Bank. Nucleic Acids Res. 2000;28:235–242. [PMC free article] [PubMed]
  • Brooks B, Karplus M. Harmonic dynamics of proteins: normal modes and fluctuations in bovine pancreatic trypsin inhibitor. Proc. Natl. Acad. Sci. U.S.A. 1983;80:6571–6575. [PubMed]
  • Brünger AT. X-PLOR 3.1: A system for X-Ray crystallography and NMR. Yale University Press; New Haven, USA: 1993.
  • Burioni R, et al. Topological thermal instability and length of proteins. Proteins. 2004;55:529–535. [PubMed]
  • Cao ZW, et al. MoViES: molecular vibrations evaluation server for analysis of fluctuational dynamics of proteins and nucleic acids. Nucleic Acids Res. 2004;32:W679–W685. [PMC free article] [PubMed]
  • Chen SC, Bahar I. Mining frequent patterns in proteins: A study of serine proteinases. Bioinformatics. 2004;20:i77–i85. [PMC free article] [PubMed]
  • Cregut D, et al. Hinge-bending motions in annexins: molecular dynamics and essential dynamics of apo-annexin V and of calcium bound annexin V and I. Protein Eng. 1998;11:891–900. [PubMed]
  • Demirel MC, et al. Identification of kinetically hot residues in proteins. Protein Sci. 1998;7:2522–2532. [PubMed]
  • Doruker P, et al. Dynamics of proteins predicted by molecular dynamics simulations and analytical approaches: Application to α-amylase inhibitor. Proteins. 2000;40:512–524. [PubMed]
  • Echols N, et al. MolMovDB: analysis and visualization of conformational change and structural flexibility. Nucleic Acids Res. 2003;31:478–482. [PMC free article] [PubMed]
  • Erkip A, Erman B. Dynamics of large-scale fluctuations in native proteins. Analysis based on harmonic inter-residue potentials and random external noise. Polymer. 2004;45:641–648.
  • Fauman EB, et al. The X-ray crystal structures of Yersinia tyrosine phosphatase with bound tungstate and nitrate. Mechanistic Implications. J. Biol. Chem. 1996;271:18780–18788. [PubMed]
  • Flory PJ. Statistical thermodynamics of random networks. Proc. Roy. Soc. Lond. A. 1976;351:351–380.
  • Haliloglu T, Bahar I. Structure-based analysis of protein dynamics. Comparison of theoretical results for hen lysozyme with X-ray diffraction and NMR relaxation data. Proteins. 1999;37:654–667. [PubMed]
  • Haliloglu T, et al. Gaussian dynamics of folded proteins. Phys. Rev. Lett. 1997;79:3090–3093.
  • Hinsen K. Analysis of domain motions by approximate normal mode calculations. Proteins. 1998;33:417–429. [PubMed]
  • Hinsen K, Kneller GR. A simplified force field for describing vibrational protein dynamics over the whole frequency range. J. Chem. Phys. 1999;111:10766–10769.
  • Jaravine VA, et al. Microscopic stability of cold shock protein A examined by NMR native state hydrogen exchange as a function of urea and trimethylamine N-oxide. Protein Sci. 2000;9:290–301. [PubMed]
  • Kitao A, Go N. Investigating protein dynamics in collective coordinate space. Curr. Opin. Struct. Biol. 1999;9:164–169. [PubMed]
  • Krebs WG, et al. Normal mode analysis of macromolecular motions in a database framework: developing mode concentration as a useful classifying statistic. Proteins. 2002;48:682–695. [PubMed]
  • Kundu S, et al. Dynamics of proteins in crystals: comparison of experiment with simple models. Biophys. J. 2002;83:723–732. [PubMed]
  • Kundu S, et al. Automatic domain decomposition of proteins by a Gaussian Network Model. Proteins. 2004;57:725–733. [PubMed]
  • Kurt N, et al. Cooperative fluctuations of unliganded and substrate-bound HIV-1 protease: a structure-based analysis on a variety of conformations from crystallography and molecular dynamics simulations. Proteins. 2003;51:409–422. [PubMed]
  • Lattanzi G. Application of coarse grained models to the analysis of macro-molecular structures. Comput. Mat. Sci. 2004;30:163–171.
  • Liao JL, Beratan DN. How does protein architecture facilitate the transduction of ATP chemical-bond energy into mechanical work? The cases of nitrogenase and ATP binding-cassette proteins. Biophys. J. 2004;87:1369–1377. [PubMed]
  • Ma JP. New advances in normal mode analysis of supermolecular complexes and applications to structural refinement. Curr. Protein Pept. Sci. 2004;5:119–123. [PMC free article] [PubMed]
  • Marques O. BLZPACK: Description and User's Guide. CERFACS; Toulouse, France: 1995. TR/PA/95/30.
  • Mattice WL, Suter UW. Conformational theory of large molecules. John Wiley & Sons, Inc.; New York: 1994.
  • McCallum SA, et al. Ligand-induced changes in the structure and dynamics of a human class Mu glutathione S-transferase. Biochemistry. 2000;39:7343–7356. [PubMed]
  • Micheletti C, et al. Accurate and efficient description of protein vibrational dynamics: Comparing molecular dynamics and Gaussian models. Proteins. 2004;55:635–645. [PubMed]
  • Nemethy G, et al. Energy parameters in polypeptides. Updating of geometrical parameters, nonbonded interactions and hydrogen bond interactions for the naturally occurring amino acids. J. Phys. Chem. 1983;87:1883–1887.
  • Press WH, et al. Numerical Recipes in Fortran. 2nd edn, Chapter 2.6. Cambridge University Press; 1992. pp. 51–62.
  • Rader AJ, Bahar I. Folding core predictions from network models of proteins. Polymer. 2004;45:659–668.
  • Rader AJ, et al. Maturation dynamics of bacteriophage HK97 capsid. Structure. 2005;13:413–421. [PubMed]
  • Suhre K, Sanejouand Y-H. ElNémo: a normal mode web server for protein movement analysis and the generation of templates for molecular replacement. Nucleic Acids Res. 2004a;32:610–614. [PMC free article] [PubMed]
  • Suhre K, Sanejouand Y-H. On the potential of normal mode analysis for solving difficult molecular replacement problems. Acta Crystallogr., Sect. D. 2004b;60:796–799. [PubMed]
  • Tama F. Normal mode analysis with simplified models to investigate the global dynamics of biological systems. Protein Peptide Lett. 2003;10:119–132. [PubMed]
  • Tama F, Sanejouand Y-H. Conformational change of proteins arising from normal mode calculations. Protein Eng. 2001;14:1–6. [PubMed]
  • Tama F, et al. Building-block approach for determining low-frequency normal modes of macromolecules. Proteins. 2000;41:1–7. [PubMed]
  • Temiz NA, Bahar I. Inhibitor binding alters the directions of domain motions in HIV-1 reverse transcriptase. Proteins. 2002;49:61–70. [PubMed]
  • Temiz NA, et al. Escherichia coli adenylate kinase dynamics: comparison of elastic network model modes with mode-coupling 15N-NMR relaxation data. Proteins. 2004;57:468–480. [PMC free article] [PubMed]
  • Tirion MM. Large amplitude elastic motions in proteins from a single-parameter, atomic analysis. Phys. Rev. Lett. 1996;77:1905–1908. [PubMed]
  • Wako H, Endo S. ProMode: a database of normal mode analysis of proteins. Genome Informatics. 2002;13:519–520.
  • Wako H, et al. FEDER/2: program for static and dynamic conformational energy analysis of macro-molecules in dihedral angle space. Comp. Phys. Comm. 1995;91:233–251.
  • Wako H, et al. Improvements in ProMode (a Database of Normal Mode Analyses of Proteins) Genome Informatics. 2003;14:663–664.
  • Wako H, et al. ProMode: a database of normal mode analyses on protein molecules with a full-atom model. Bioinformatics. 2004;20:2035–2043. [PubMed]
  • Wang Y, et al. Global ribosome motions revealed with elastic network model. J. Struct. Biol. 2004;147:302–314. [PubMed]
  • Wu Y, et al. Universal behavior of localization of residue fluctuations in globular proteins. Phys. Rev. E. Stat. Nonlin. Soft. Matter Phys. 2003;67:041909. [PubMed]
  • Yang L-W, Bahar I. Coupling between catalytic site and collective dynamics: a requirement for mechanochemical activity of enzymes. Structure. 2005 in press. [PMC free article] [PubMed]
  • Zhang Z, et al. Molecular dynamics simulations of peptides and proteins with amplified collective motions. Biophys. J. 2003;84:3583–3593. [PubMed]
  • Zhao H, et al. Structure of a snake venom phospholipase A2 modified by p-bromo-phenacyl-bromide. Toxicon. 1998;36:875–876. [PubMed]