PMCCPMCCPMCC

Search tips
Search criteria 

Advanced

 
Logo of ploscompComputational BiologyView this ArticleSubmit to PLoSGet E-mail AlertsContact UsPublic Library of Science (PLoS)
 
PLoS Comput Biol. 2007 September; 3(9): e172.
Published online 2007 September 21. doi:  10.1371/journal.pcbi.0030172
PMCID: PMC1988854

Signal Propagation in Proteins and Relation to Equilibrium Fluctuations

Michael Levitt, Editor

Abstract

Elastic network (EN) models have been widely used in recent years for describing protein dynamics, based on the premise that the motions naturally accessible to native structures are relevant to biological function. We posit that equilibrium motions also determine communication mechanisms inherent to the network architecture. To this end, we explore the stochastics of a discrete-time, discrete-state Markov process of information transfer across the network of residues. We measure the communication abilities of residue pairs in terms of hit and commute times, i.e., the number of steps it takes on an average to send and receive signals. Functionally active residues are found to possess enhanced communication propensities, evidenced by their short hit times. Furthermore, secondary structural elements emerge as efficient mediators of communication. The present findings provide us with insights on the topological basis of communication in proteins and design principles for efficient signal transduction. While hit/commute times are information-theoretic concepts, a central contribution of this work is to rigorously show that they have physical origins directly relevant to the equilibrium fluctuations of residues predicted by EN models.

Author Summary

In recent years, there has been a surge in the number of studies using network models for understanding biomolecular systems dynamics. Essentially, two different groups of studies have been performed, driven by two different communities. The first is based on molecular biophysics and statistical mechanical concepts. Normal mode analyses using elastic network models lie in this group. The second is based on information theory and spectral graph methods. The present study demonstrates for the first time that signal transduction events directly depend on the fluctuation dynamics of the biomolecular systems, thus establishing the bridge between the (newly proposed) information-theoretic and the (well-established) physically inspired approaches. We have applied the new approach to five different enzymes. Functionally active residues are shown to possess enhanced communication propensities. Furthermore, secondary structural elements emerge as efficient mediators of communication. These results provide us with important insights for protein design and mechanisms of allostery.

Introduction

Proteins function neither as static entities nor in isolation, under physiological conditions. They are instead subject to constant motions and interactions, both within and between molecules. These motions can be either random fluctuations or concerted functional changes in conformations; and their sizes can vary from localized motions (e.g., single amino acid side chain reorientations) to large-scale global motions (e.g., domain–domain or intersubunit movements). While motions in the nanoseconds regime can be explored by full atomic simulations, understanding those involving large-scale structural rearrangements remains a challenge. In recent years, elastic network (EN) models in conjunction with modal analysis, and in particular the Gaussian Network Model (GNM) [13], have been widely used for elucidating the collective dynamics of proteins and exploring their relevance to biological function [49].

We posit that these collective motions also determine communication patterns that are inherent to the native architecture. To explore the validity and implications of this concept, we assume a discrete-time, discrete-state Markov process [1011] of “information” transfer across the network of residues and measure two basic quantities: hitting time and commute time [11]. Hitting time H(j,i) is the expected number of steps it takes to send information from residue vi to residue vj, and this may not be the same as H(i,j). Commute time C(i,j) is by definition the sum: H(i,j) + H(j,i). Hitting time has directionality, while commute time does not.

A major goal in this study is to relate the hitting (and commute) times derived from the Markovian stochastics model to the equilibrium fluctuations (mean-square fluctuations and cross-correlations) of residues predicted by EN models, thus bridging the gap between two disciplines, information theory and statistical mechanics. To this end, using the theory of generalized matrix inverses [1214], we show that hitting/commute times can be expressed in terms of the Kirchhoff matrix of inter-residue contacts that underlie the GNM methodology. Additionally, we present new insights into the signal transduction properties of enzymes, the catalytic residues of which are shown to be distinguished by their fast and precise communication abilities.

The paper is organized as follows. The Results are divided into three parts: first we present the Markovian stochastic model of information diffusion developed for exploring the inter-residue communication in proteins. The process is controlled by transition probabilities for the passage/flow of information across the nodes, which in turn is based on the internode affinities derived from atom–atom contacts in the folded structures. Second, we describe the evaluation of hit and commute times, and illustrate these concepts by presenting the application of the methodology to five different enzymes. Strikingly, active residues are distinguished by their effective communication stochastics. Third, we present a rigorous derivation of the mathematical relation (Equation 15) between inter-residue hit/commute times, and their fluctuation dynamics derived from purely statistical mechanical theory. This important relation establishes the bridge between information-theoretic quantities evaluated here for proteins and the intrinsic structural dynamics of proteins as described by physics-based models, and provides a new avenue for further examination of protein allostery using the new information-theoretic perspective.

Results

Information-Theoretic Description of Network Communication

Affinity matrices.

The protein structure is modeled as a network of n nodes, each representative of a given residue vi, for 1 < i < n. The interaction strength, or the affinity aij, between residues vi and vj is defined as

equation image

where Nij is the total number of atom–atom contacts made between residues vi and vj based on a cutoff distance of rc = 4 Å and (Ni, Nj) are the total numbers of heavy atoms in the individual residues (vi, vj). The affinity matrix is very similar to a contact map or the adjacency matrix of a graph. Non-zero entries in the adjacency matrix denote if two residues are in contact, while the entries in the affinity matrix scale with the number of atom–atom contacts between the residue pair. This representation captures to a first approximation the strong (weak) interactions expected to arise between residue pairs with large (small) number of atom–atom contacts. The denominator corrects for the biases induced by size effects, e.g., for the larger number of atom–atom contacts inherently made by larger size amino acids, thus permitting us to assign affinities purely based on differential interactions. Note that a similar expression has been adopted by Brinda and Vishveshwara [15].

The affinities provide a measure of the local interaction density dj at each residue vj as dj =An external file that holds a picture, illustration, etc.
Object name is pcbi.0030172.ex001.jpg aij. Note that we can define an equivalent mass-spring system having stiffness matrix γΓ where Γ is defined in terms of the affinity and degree matrices, A = { aij } andD = diag{dj } respectively, as

equation image

and γ is a force constant uniform over all springs.Γ is also called the Kirchhoff matrix or the combinatorial Laplacian in graph theory [16].

Markov model of network communication.

A discrete-time, discrete-state Markov process [11,16] is defined by setting the communication probability between residue pairs to be a function of their affinity. In particular, we define

equation image

as the conditional probability of transmitting information to residue vi in one time step given that the signal is initially positioned at residue vj. Note, dj serves as a normalizing factor to ensure

equation image

The conditional probability matrix M = {mij}, also called the Markov transition matrix, defines the stochastics of information diffusion over the network of residues, via

equation image

Suppose the probability of initiating the Markov propagation process at node j is pj(0). Then, the probability of reaching residue vi in one time step is mijpj(0). In matrix notation, the probability of ending up on any of the residues v = [v1, v2,...,vn] after one time step is given by the distribution p(1) = M p(0). Or, after k steps,

equation image

where p(k) = [p1(k),...,pn(k)] represents the n-dimensional vector of the probabilities of residing at node i (1 ≤ in) at step k.

Assume there is a path connecting every pair of residues in the network. Then, as the number of steps β approaches infinity, p(β) approaches a unique stationary distribution π = [π1,π2,...,πn], the elements of which are given by

equation image

Whereas the evolution of the diffusion process is a function of the starting distribution, the stationary distribution is invariant to the details of initiation.

In the continuous time limit [17], the change in probabilities follows the master equation An external file that holds a picture, illustration, etc.
Object name is pcbi.0030172.ex002.jpg = (MI) p(t) with a solution p(t) = exp[(MI)t] p(0), where I is an identity matrix of dimension n × n. Here,MI replaces the transition rate matrix, assuming the time elapsed between successive jumps obeys an exponential probability distribution [17]. Note that (i) in this limit, t replaces kδt where δt is the mean step size implicitly used in the discrete approximation, and (ii) the stationary distribution corresponding to the continuous process is identical to that obtained for the discrete process such that the detailed balance mijπj= mjiπi holds.

Hit/commute times as a function of Markov transition probabilities.

The hitting time H(j,i) is the average number of steps it takes for the information residing at residue vi to be transmitted to residue vj for the first time. We will term residue vi a broadcaster and residue vj a receiver.

The calculation of H(j,i) requires the consideration of all possible pathways on the network, each being weighted by the product of transition probabilities along the path, starting from vi and ending at vj. An efficient recursive formula can be derived for the calculation of H(j,i) as follows. Suppose the passage from vi to vj is performed in two stages, from vi to a neighbor vk that is one step away, succeeded by probabilistic passages from vk to the final destination vj. Furthermore, assume we know the hit time H(j,k) from the intermediate node vk to the destination node vj. Summing over all the intermediate nodes, the hitting time from vi to vj is simply

equation image

where Equation 4 is used on the first term on the right hand side. By definition, H(i,i) = 0. Equation 8 provides a self-consistent method for evaluating the hitting time between any two nodes.

The commute time is defined by the sum of the hitting times in both directions, i.e.,

equation image

Note that the commute time is symmetric by definition while H(i,j) is not, as will be illustrated below for example proteins. See the section “Pedagogical example to compute hit/com-mute time” in Methods for hit time analysis of a simple network.

In the calculations below, it proves convenient to define the average hitting times in both directions, as well as the average commute time, for each individual residue as

equation image

<Hr(j)> and <Hb(i)> provide a measure of the respective receiver and broadcasting properties of residues j and i, and <C(i)> provides a measure of signal transduction properties, in general. Commute time is also known by the name of “resistance distance” in computational chemistry [14,18].

Application to Enzymes

Mapping of hit/commute times between all residue pairs.

Figure 1A displays the hitting times H(j,i) computed for all residue pairs (vi, vj) for an example enzyme, snake phospholipase A2 (Protein Data Bank (PDB) [19], 1bk9 [20]). The blue regions correspond to short hit times, and red regions to long hit times, as indicated by the scales on the right. The map consists of the elements of the hitting time matrix H. Accordingly, the jth row indicates the number of steps required for a signal to hit residue vj, starting from any residue vi. The values are relatively uniform within each row, which reveals that no single residue stands out as an efficient broadcaster, i.e., <Hb(i)> = 340.3 ± 1.5 for all i. On the other hand, comparing different rows we note that some residues are much better receivers than others. <Hr(j)> values indeed vary over a broader range of 340.3 ± 124.8.

Figure 1
Distribution of Hitting Times and Commute Times for Phospholipase A2 (1bk9 [20])

The higher ability of particular residues to transduce signals is also reflected in the commute times displayed in Figure 1B. The commute time is symmetric by definition, but the hitting time is not. The blue regions along the diagonal show that there is efficient communication along sequential residues, although we also observe several sequentially distant residue pairs that efficiently communicate. While the majority of these residue pairs are spatially close, as will be shown below, there is not necessarily a one-to-one correspondence between commute times and spatial distances, and some residue pairs emerge as more efficient communicators than others despite their longer physical separation.

Communication properties of individual residues.

Figure 1C displays the mean hitting time <Hr(j)> for each residue. Minima in this curve point to residues that are effective in receiving signals. It is worth mentioning that the mean commute time <C(i)>, which involves both receive and broadcast times, will have the same profile shape as the mean hitting time, because the broadcasting ability is roughly the same for all residues (as observed in Figure 1A).

Catalytic residues distinguished by fast and precise communication.

It is of interest to examine the signal transduction properties of catalytic residues. Phospholipase A2 has three catalytic residues: His48, Tyr52, and Asp99. Notably, all three residues (indicated by blue dots) are found to be located in minima (Figure 1C), i.e., the effective time required for these residues to establish communication with others is minimal.

To additionally highlight the enhanced communication properties of the catalytic residues, we plot in Figure 2 the mean <Hr(j)> and standard deviations σ(Hr(j)) for all residues. The catalytic sites are marked as red crosses. We observe that their hitting times (as well as their commute times) are short (in terms of their mean values) and precise (in terms of their standard deviations). From the plot, we also observe that for the same mean hitting time, the precision can vary by several folds. Figure 2 also displays the ligand-binding residues by black + symbols. While ligand-binding residues also exhibit relatively short hit times and small variance, they do not appear to be as distinctive as the catalytic residues.

Figure 2
Average Hit Times < Hr(j) > versus Their Standard Deviations σ(Hr(j))

Figure 3 illustrates similar results for four other enzymes: HIV-1 protease [21], ricin [22], human rhinovirus 3C protease [23], and endo-1,4-xylanase [24] (see caption for more details). The catalytic residues (highlighted as red dots) exhibit relatively short and narrowly distributed hitting times in each case. Ligand-binding residues (blue dots), on the other hand, display a wider range of hitting times and deviations, consistent with the results for phospholipase A2. At least one of the catalytic residues, indicated by the label, is distinguished in each case by its high communication speed and precision.

Figure 3
Results from Hitting Time Analysis for Four Enzymes

Bridging Information-Theoretic Concepts and Physically Inspired Models

Fluctuations determine communication.

Consider the hitting time to the nth residue vn starting from residue vi. Using Equation 8 and substituting n for j, we can use a truncated version of the Markov transition matrix An external file that holds a picture, illustration, etc.
Object name is pcbi.0030172.ex003.jpg where the nth row and column are deleted to obtain

equation image

Here, An external file that holds a picture, illustration, etc.
Object name is pcbi.0030172.ex004.jpg denotes nth row of the hitting time matrix H truncated to the first n − 1 elements and An external file that holds a picture, illustration, etc.
Object name is pcbi.0030172.ex005.jpg is a row vector of length (n − 1) of all 1′s. An external file that holds a picture, illustration, etc.
Object name is pcbi.0030172.ex006.jpg can be expressed in terms of a similarly truncated Kirchhoff matrix An external file that holds a picture, illustration, etc.
Object name is pcbi.0030172.ex007.jpg using Equations 2 and 5, leading to

equation image

or, in component form,

equation image

As derived in Methods, An external file that holds a picture, illustration, etc.
Object name is pcbi.0030172.ex008.jpg can be expressed in terms of the pseudo-inverse An external file that holds a picture, illustration, etc.
Object name is pcbi.0030172.ex009.jpg using the theory of generalized matrix inverses [1214], to obtain

equation image

for the hitting time from residue vi to any arbitrary residue vj. Substituting from Equation 19 in Methods, where the elements of the inverse of the Kirchhoff matrix are related to residue fluctuations [25,26], we obtain

equation image

The above equation constitutes the most important result from the present study: it provides the physical basis for the hitting times obtained with the information-theoretic methodology by relating them to correlations between residue fluctuations derived from statistical mechanical theory [25,26]. The meaning of Equation 15 will be further elaborated below upon assessment of the contribution of each term in brackets.

Substitution of Equation 14 in Equation 9 yields an expression for the commute time in terms of Γ−1,

equation image

which, using Equation 19, reduces to

equation image

This is our final expression bridging commute times with fluctuations An external file that holds a picture, illustration, etc.
Object name is pcbi.0030172.ex010.jpg in inter-residue distances. Note that the term in parentheses is a constant for all pairs of residues. Thus, the commute time between residues vi and vj is directly proportional to the fluctuations in the distance between these two residues, larger fluctuations entailing longer commute times, and vice versa.

More on the physical meaning of hitting times.

The hitting time expression Equation 14 involves three different types of contributions: a one-body term that depends on the destination node, An external file that holds a picture, illustration, etc.
Object name is pcbi.0030172.ex011.jpg ; a two-body term that depends on the initial and final nodes, An external file that holds a picture, illustration, etc.
Object name is pcbi.0030172.ex012.jpg ; and a series of three-body terms that depend on intermediate nodes, in addition to the two end points, An external file that holds a picture, illustration, etc.
Object name is pcbi.0030172.ex013.jpg dk. Of interest is to understand the relative contributions of these three terms. Note that the first is always positive, and increases with the size of destination residue fluctuations; the second may be positive or negative, and the negative sign in front of this term implies that positively correlated residue pairs shorten the hitting time. Likewise, the third term may be positive or negative.

Figure 4 shows the results for phospholipase A2. Figure 4A–4C corresponds to the respective one-body, two-body, and three-body contributions. Note that Figure 4A–4C has different scales, for clearer visualization. As we demonstrate in Figure 4A, the one-body term plays by far a dominant role in determining the resulting hitting times (shown in Figure 1A), i.e., the mean-square fluctuations of the destination node largely determine the hitting time. Residues subject to large amplitude fluctuations require a longer time to be hit, while those subject to small amplitude fluctuations, usually confined to the core or high-density regions, display short hitting times.

Figure 4
Physical Meaning of Hitting Times

The two-body term may be positive or negative, depending on the type of cross-correlations between residues vi and vj. The negative sign on the two-body term implies that the hitting time is reduced if a residue pair undergoes positively correlated fluctuations (Figure 4B). Anticorrelated residues ([Γ−1]ji < 0), on the other hand, make a positive contribution to Equation 14, thus increasing the communication time. Finally, from Figure 4C we observe that the contribution from the three-body term is negligibly small.

The qualitative features observed here were verified to be valid for all examined proteins: mainly, the mean-square fluctuations of the destination node play a dominant role in determining the hitting (or commute) time, and the cross-correlations between the two end points may increase or decrease the hit/commute time, depending on the type of correlation. Anticorrelations have a retarding effect, while positive correlations reduce the hitting time. In the extreme case of the two nodes moving in phase, by the same amplitude, the effective hit/commute time approaches zero.

Discussion

Communication Distances versus Physical Distances

The commute times provide us with a means of estimating effective communication distances An external file that holds a picture, illustration, etc.
Object name is pcbi.0030172.ex014.jpg between residues vi and vj, using the simple relation

equation image

for the mean-square distance traveled by a random walk of n steps, with l being the average step size. In our case, l can be readily estimated from the average distance between connected nodes in the network. For phospholipase A2, l is evaluated to be 3.41 Å. Note that this is shorter than the distance (~3.81 Å) between consecutive α-carbons, because side chain atoms between neighboring residues may get closer to each other. The number of steps, on the other hand, is directly given by the hitting times themselves (as hitting times are expressed in terms of number of steps). For simplicity, we will use n = ½C(i,j) for the effective number of steps for communication between residues vi and vj.. The average commute time is similar to the recently proposed “diffusion distance” metric for graphs introduced in previous studies [27,28]. The diffusion distance is based on the eigenvalue decomposition of the Markov transition matrix, whereas the hit/commute times are derived from the graph Laplacian.

How Do Effective Communication Distances Correlate with Physical Distances?

Figure 5A displays the results for phospholipase A2. The effective distance An external file that holds a picture, illustration, etc.
Object name is pcbi.0030172.ex015.jpg (ordinate) is plotted therein against the physical distance An external file that holds a picture, illustration, etc.
Object name is pcbi.0030172.ex016.jpg directly evaluated from the PDB coordinates, averaged out over all atoms of residues vi and vj. As expected, the effective communication distances increase with physical distance; however, we can also see a broad variability. The points colored red refer to pairs involving the catalytic residue His48.

Figure 5
Correlation of Effective Communication Distances with Physical Distances

Figure 5B and and5C5C displays from two different perspectives, two residues (Trp31 and Ile104) located at the same physical distance (11.35 ± 0.15 Å ) from His48, but differing in their communication distances (69Å and 46.5Å) by a factor of approximately 1.5, pointing to the importance of the particular topology, or secondary structural elements, in increasing the effectiveness of communication. The communication with Trp31, located on a loop, turns out to be much slower, in this case. Figure 5D and and5E5E illustrates the opposite case of two residues (Lys69 and Asn125) that display comparable communication distances (60.5 ± 1.52 Å), while their respective physical distances (12.9 and 21.6 Å) differ by a factor of 1.7, approximately.

Effect of Secondary Structure

The comparison of the effective and actual (physical) communication distances in Figure 5 suggests that secondary structural elements possess higher abilities in processing signals. To test the validity of this conjecture, we analyzed the distributions of hitting times H(i,j) to residue i, for the three cases where residue i is α-helical, β-strand, or coiled/disordered. Figure 6 displays the distributions obtained by combining the results for the examined enzymes. Because the average hitting time increases linearly with the size of a given enzyme, the results for each enzyme are normalized with respect to the number of residues N in each protein, before combining the data. α-helical residues are observed to be the most efficient communicators, succeeded by β-strand residues (intermediate behavior), while the coiled residues are slower and exhibit a broader distribution of hitting times. Examination of the individual proteins, on the other hand, reveals that β-strands may exhibit strong dependence on their spatial location in the 3-D structure of the proteins (Figure 7).

Figure 6
Importance of Secondary Structure in Defining Effective Means of Communication
Figure 7
Effect of Secondary Structure on the Hitting Time Distributions for Individual Proteins, Each Shown in a Separate Panel

As noted above, the mean-square fluctuations of the destination node play a dominant role in determining the hitting (or commute) time. The higher communication propensity of α-helical, and to some extent β-strand residues may thus be rationalized by the smaller fluctuations of secondary structural elements compared with coiled regions commonly observed in proteins. It is worth noting, however, that these observations hold for single domain proteins. As illustrated in Figure 8 for a multidomain protein, adenylate kinase, the communication between residue pairs belonging to different domains are usually slower than that between pairs in the same domain.

Figure 8
Comparison of Effective (Commute) Distances and Physical Distances between Residue Pairs in E. coli Adenylate Kinase (PDB: 4ake)

Connecting Physically Inspired and Information Theoretic Approaches

Methods based on network models significantly helped in recent years in providing a comprehensible description of the dynamics of biomolecular systems. On the one hand, methods based on fundamental statistical mechanical principles have been proposed for delineating the collective motions of biomolecules [19]; on the other, those based on spectral graph theory and machine learning algorithms have been developed for exploring allosteric effects and response to perturbations/mutations in complex structures [29,30]. While these two methodologies concur in their objectives—understanding the complex machinery of biomolecular systems, the connection between these two approaches has been elusive due to their different originating disciplines, as well as the basic quantities they shed light onto: frequency spectrum and normal mode shapes in the former, shortest paths of communication, and hitting/commute times in the latter.

The present study offers a rigorous way of connecting the two approaches, by demonstrating that the commute times between residues vi and vj, derived from Markov propagation formalism, directly scale with the mean-square fluctuations An external file that holds a picture, illustration, etc.
Object name is pcbi.0030172.ex017.jpg in inter-residue distances (Equation 17). Alternatively, the hitting times are expressed in terms of the elements of the covariance matrix (Equation 15). The proportionality between commute times/distances and fluctuations in inter-residue distances explains the “intriguing balance” (or high correlation) recently reported between shortest path lengths and residue fluctuations [31,32]. What the hit-time/commute-time metrics show is the predisposition of the network to exhibit particular communication patterns. These metrics point to how information diffuses in the system. For a protein that behaves as an elastic body, these communication patterns are expected to be intimately connected to equilibrium dynamics, and the present approach connects the equilibrium fluctuations to the kinetic perspective of diffusion on the graph. This analysis may be viewed as a first step toward building analytical models for elucidating the pathways of energy flow in complex biomolecular systems, complementing ongoing MD efforts along the same lines [33].

Biological Implications and Future Prospects

Notably, the application to example enzymes point to the more efficient communication propensity and precision of catalytic sites (Figures 13), to the role of residue fluctuations and their correlations in transmitting information (e.g., delaying effect of anticorrelated pairs), to the structure-encoded differences in the communication abilities of residue pairs, irrespective of their physical distances; and to the importance of both tertiary contact topology and local (secondary) structure in defining effective means of communication (Figures 5 and and6).6). Also, irrespective of physical distance, interdomain communication tends to be slower than intradomain, as illustrated in Figure 8 for adenylate kinase.

The major advantage of the present stochastic model over the GNM is the fact that the new methodology lends itself to a comprehensive assessment of the communication paths and their efficiency in biomolecular structures. As such, it holds promise for identifying allosteric communication pathways as well as the sites distinguished by high allosteric potentials, thus providing insights into the design principles of biomolecular machines. The presently observed enhancement in the information transfer properties of catalytic residues and secondary structural elements suggests possible design requirements for efficient enzymatic activity. In this context, it is worth noting the relevant studies by Choe and Sun [34] and Maritan and coworkers [35], which point to the dependence of equilibrium dynamics on secondary structural content/type. It remains to be understood whether such special communication abilities of catalytic residues result from their local packing topology or more global features conferred by evolutionary pressure.

We note that finding suitable experimental setup for probing hit-times is a challenge. In general, the residues/interactions involved in information flow, or the changes in inter-residue distances (which directly define the commute times) may be assessed by site-directed mutagenesis and cross-linking experiments as well as spectroscopic methods such as site-directed fluorescence labeling [36] or FRET [37].

Finally, establishing the bridge between these two disciplines will permit us to translate the wealth of concepts and methods developed in information-theoretic approaches, to exploring the signal transduction mechanisms in complex biomolecular systems, thus complementing physically inspired models and methods.

Methods

Positional fluctuations based on the GNM.

Let Δri and Δrj be the fluctuation vectors from the mean locations of residues vi and vj. According to the GNM, the cross-correlations between the fluctuations scale as [1]

equation image

where kB is the Boltzmann constant and An external file that holds a picture, illustration, etc.
Object name is pcbi.0030172.ex018.jpg denotes the ijth element of the inverse Kirchhoff matrix Γ−1. The individual residue mean-square (ms) fluctuations are obtained by substituting i = j in Equation 3. Equation 19 is based on purely statistical mechanical considerations, originally put forward by Flory and coworkers for polymer networks [25,26].

Fluctuations in inter-residue distances.

The mean-square fluctuations An external file that holds a picture, illustration, etc.
Object name is pcbi.0030172.ex019.jpg in inter-residue distances Δrij = Δrj − Δri can be expressed in terms of the fluctuations in position vectors as

equation image

Equation 20 can be rewritten, using Equation 19, in terms of the elements of Γ−1 as

equation image

By definition, Γ is positive semi-definite, i.e., Γ has rank n − 1, so it cannot be inverted. Instead, its pseudo-inverse is computed after eliminating the contribution from the zero eigenvalue.

Pedagogical example to compute hit/commute time.

Consider a small undirected network of three nodes connected as in

equation image

Assume that aij = ajk = 1 and hence the degrees di = dk = 1; dj = 2. For a random walk initiated at i (or k), it is obvious that it takes just one time step to reach j as mji = mjk = 1. So the hitting time H(j,i) = 1. Similarly, one might be tempted to conclude that the expected number of time steps to reach k from i is 2. However, this is not the case because once the random walk reaches j, there is an equal chance of returning to i or going to k, because mij = mkj = ½. This recursive argument can be unrolled in two ways. First, using Equation 8, the hit time H(k,i) can be expressed as

equation image

Similarly,

equation image

Simultaneous solution of Equations 22 and 23 yields H(k,i) = 4 and H(k, j) = 3. The second way to unroll the recursion is to enumerate the paths between pairs of nodes, as shown in Table 1. The enumeration leads to the calculation of expected time steps

equation image

Table 1
Enumerating Some of the Paths Connecting Node i with Node k

Given the symmetry in the network here, the hitting time H(k,i) = H(i,k), but this may not be true in general. To conclude this example, consider the hitting time from j to i. Again, the random walk unrolled partially is shown in Table 2. This enumeration leads to

equation image

Clearly, the hitting time H(j,i) ≠ H(i,j). While iterative methods, using Equation 8, are one way to solve for hit/commute times, there is also a “fundamental matrix” technique [10] for computing these quantities.

Table 2
Enumerating Some of the Paths Connecting Node j with Node i

Derivation of Equation 14.

The discussion below borrows from results in [12,13]. Deriving An external file that holds a picture, illustration, etc.
Object name is pcbi.0030172.ex020.jpg from Γ−1 is a three-step process: (i) put together a matrix An external file that holds a picture, illustration, etc.
Object name is pcbi.0030172.ex021.jpg (size: n − 1 × n) from An external file that holds a picture, illustration, etc.
Object name is pcbi.0030172.ex022.jpg (size: n − 1 × n − 1) by appending a column vector p (size: n − 1 × 1):

equation image

(ii) derive Γ (size: n × n) from An external file that holds a picture, illustration, etc.
Object name is pcbi.0030172.ex023.jpg (size: n − 1 × n) by appending a row vector tT (size: 1 × n ):

equation image

(iii) following the theory of generalized inverses [12] use An external file that holds a picture, illustration, etc.
Object name is pcbi.0030172.ex024.jpg to express the inverse An external file that holds a picture, illustration, etc.
Object name is pcbi.0030172.ex025.jpg and then derive An external file that holds a picture, illustration, etc.
Object name is pcbi.0030172.ex026.jpg from Γ−1.

The vectors p and tT are easy to derive because

equation image

and

equation image

which implies that

equation image

The generalized matrix inverse of An external file that holds a picture, illustration, etc.
Object name is pcbi.0030172.ex027.jpg is given by

equation image

equation image

where

equation image

equation image

and

equation image

Substituting for p we obtain An external file that holds a picture, illustration, etc.
Object name is pcbi.0030172.ex028.jpg , and hence

equation image

An external file that holds a picture, illustration, etc.
Object name is pcbi.0030172.ex029.jpg is a rank-1 update to An external file that holds a picture, illustration, etc.
Object name is pcbi.0030172.ex030.jpg and rT is a row vector. So, we can tease out An external file that holds a picture, illustration, etc.
Object name is pcbi.0030172.ex031.jpg here by

equation image

such that in component form

equation image

equation image

Now when we write

equation image

the effect of adding a row vector tT to An external file that holds a picture, illustration, etc.
Object name is pcbi.0030172.ex032.jpg is similar to adding the column vector p to An external file that holds a picture, illustration, etc.
Object name is pcbi.0030172.ex033.jpg . So, as in Equation 38, the inverse of An external file that holds a picture, illustration, etc.
Object name is pcbi.0030172.ex034.jpg can be expressed as

equation image

Putting the inverses in Equations 38 and 39 together, we obtain

equation image

By substituting Equation 40 into Equation 13, we get

equation image

Using symmetry, the summation can be extended to n as in

equation image

This derivation of hitting time to nth residue vn from vi was for convenience. For any arbitrary residue vj, the hitting time from residue vi will be

equation image

Acknowledgments

We gratefully acknowledge the contribution of Sinem Ozel to the results on adenylate kinase.

Abbreviations

EN
elastic network
GNM
Gaussian Network Model

Footnotes

Author contributions. CC and IB conceived and designed the experiments, performed the experiments, analyzed the data, and wrote the paper.

Funding. This study was supported by US National Institutes of Health grant R33 GM068400–01.

Competing interests. The authors have declared that no competing interests exist.

References

  • Bahar I, Atilgan A, Erman B. Direct evaluation of thermal fluctuations in protein using a single parameter harmonic potential. Folding Design. 1997;2:173–181. [PubMed]
  • Haliloglu T, Bahar I, Erman B. Gaussian dynamics of folded proteins. Phys Rev Lett. 1997;79:3090–3093.
  • Bahar I, Atilgan AR, Demirel MC, Erman B. Vibrational dynamics of proteins: Significance of slow and fast modes in relation to function and stability. Phys Rev Lett. 1998;80:2733–2736.
  • Cui Q, Bahar I, editors. Normal Mode Analysis: Theory and applications to biological and chemical systems. Boca Raton (Florida): CRC Press; 2006.
  • Bahar I, Rader AJ. Coarse-grained normal modes in structural biology. Curr Opin Struct Bio. 2005;15:1–7.
  • Micheletti C, Carloni P, Maritan A. Accurate and efficient description of protein vibrational dynamics: Comparing molecular dynamics and Gaussian models. Proteins. 2004;55:635. [PubMed]
  • Nicolay S, Sanejouand YH. Functional modes of proteins are among the most robust. Phys Rev Lett. 2006;96:078104. [PubMed]
  • Yang LW, Bahar I. Coupling between catalytic sites and collective dynamics: A requirement for mechanochemical activity of enzymes. Structure. 2005;279:893–904. [PMC free article] [PubMed]
  • Ma J. Usefulness and limitations of normal mode analysis modeling dynamics of biomolecular complexes. Structure. 2005;13:373–380. [PubMed]
  • Norris JR. Markov chains. Cambridge (United Kingdom): Cambridge University Press; 1997.
  • Doyle PG, Snell JL. Random walks and electric networks. The Mathematical Association of America; 1984.
  • Barnett S. Matrices: Methods and applications. Oxford: Oxford University Press; 1992.
  • Fouss F, Pirotte A, Renders JM, Saerens M. A novel way of computing similarities between nodes of a graph, with application to collaborative recommendation; Proceedings of the IEEE International Conference on Web Intelligence; 19–22 September 2005;; Compiegne, France.. Institute of Electrical and Electronics Engineers; 2005. pp. 550–556.
  • Xiao W, Gutman I. Resistance distance and Laplacian spectrum. Theo Chem Acc. 2003;110:284–289.
  • Brinda K, Vishveshwara S. A network representation of protein structures: Implications for protein stability. Biophys J. 2005;89:4159–4170. [PubMed]
  • Chung FRK. Spectral Graph Theory; CBMS Lectures Regional Conference Series in Mathematics; February 1997;; Providence, Rhode Island.. American Mathematical Society; 1997.
  • Aldous D, Fill J. Reversible Markov chains and random walks on graphs. Chapter 2. 2001. Available: http://www.stat.berkeley.edu/~aldous/RWG/book.html.
  • Klein DJ, Randic M. Resistance distance. J Math Chem. 1993;12:81–85.
  • Berman H, Westbrook J, Feng Z, Gilliland G, Bhat T, et al. The Protein Data Bank. Nucleic Acids Res. 2000;28:235–242. [PMC free article] [PubMed]
  • Zhao H, Tang L, Wang X, Zhou Y, Lin Z. Structure of a snake venom phospholipase A2 modified by p-bromo-phenacyl-bromide. Toxicon. 1998;36:875. [PubMed]
  • Louis J, Dyda F, Nashed NT, Kimmel AR, Davies DR. Hydrophilic peptides derived from the transframe region of Gag-Pol inhibit the HIV-1 protease. Biochemistry. 1998;37:2105–2110. [PubMed]
  • Yan X, Hollis T, Svinth M, Monzingo AF, Milne GW, et al. Structure-based identification of a ricin inhibitor. J Mol Biol. 1997;266:1043–1049. [PubMed]
  • Matthews DA, Dragovich PS, Webber SA, Fuhrman SA, Patick AK, et al. Structure-assisted design of mechanism-based irreversible inhibitors of human rhinovirus 3C protease with potent antiviral activity against multiple rhinovirus serotypes. Proc Natl Acad Sci U S A. 1999;96:11000–11007. [PubMed]
  • Sidhu G, Withers SG, Nguyen NT, McIntosh NT, Ziser LP. Sugar ring distortion in the glycosyl-enzyme intermediate of a family G/11 xylanase. Biochemistry. 1999;38:5346–5354. [PubMed]
  • Flory PJ. Statistical thermodynamics of random networks. Proc Roy Soc A. 1976;351:351–380.
  • Kloczkowski A, Mark JE, Erman B. Chain dimensions and fluctuations in random elastomeric networks I: Phantom Gaussian networks in the undeformed state. Macromolecules. 1989;22:1423–1432.
  • Coifman R, Lafon S. Diffusion maps. App Comp Harmonic Analysis. 2006;21:5–30.
  • Estrada F, Jepson A, Chennubhotla C. Spectral embedding and min-cut for image segmentation; Proceedings of the British Machine Vision Conference (BMVC); 7–9 September 2004;; Kingston, United Kingdom.. 2004. In.
  • Chennubhotla C, Bahar I. Markov propagation of allosteric effects in biomolecular systems. Mol Sys Biol. 2006;2:36. [PMC free article] [PubMed]
  • Ming D, Wall ME. Quantifying allosteric effects in proteins. Proteins. 2005;59:697–707. [PubMed]
  • Atilgan AR, Akan P, Baysal C. Small world communication of residues and significance for protein dynamics. Biophys J. 2004;86:85–91. [PubMed]
  • Atilgan AR, Turgut D, Atilgan C. Screened non-bonded interactions in native proteins manipulate optimal paths for robust residue communication. Biophys J BioFAST. 2007. doi: 10.1529/biophysj.106.099440. [PubMed]
  • Ishikura T, Yamato T. Energy transfer pathways relevant for long-range intramolecular signaling of photosensory protein revealed by microscopic energy conductivity analysis. Chem Phy Lett. 2006;432:533–537.
  • Choe S, Sun S. The elasticity of α-helices. J Chem Phy. 2005;122:244912. [PubMed]
  • Micheletti C, Lattanzi G, Maritan A. Elastic properties of proteins: Insight on the folding process and evolutionary selection of native structures. J Mol Biol. 2002;321:909–921. [PubMed]
  • Mansoor S, Farrens D. High-throughput protein structural analysis using site-directed fluorescence labeling and the bimane derivative (2-pyridyl)dithiobimane. Biochemistry. 2004;43:9426–9438. [PubMed]
  • Haas E. The study of protein folding and dynamics by determination of intramolecular distance distributions and their fluctuations using ensemble and single-molecule FRET measurements. Chem Phys Chem. 2005;6:858–870. [PubMed]

Articles from PLoS Computational Biology are provided here courtesy of Public Library of Science