Using the problem of ion channel thermodynamics as an example, we illustrate the idea of building up complex thermodynamic models by successively adding physical information. We present a new formulation of information algebra that generalizes methods of both information theory and statistical mechanics. From this foundation we derive a theory for ion channel kinetics, identifying a nonequilibrium ‘process’ free energy functional in addition to the well-known integrated work functionals. The Gibbs-Maxwell relation for the free energy functional is a Green-Kubo relation, applicable arbitrarily far from equilibrium, that captures the effect of non-local and time-dependent behavior from transient thermal and mechanical driving forces. Comparing the physical significance of the Lagrange multipliers to the canonical ensemble suggests definitions of nonequilibrium ensembles at constant capacitance or inductance in addition to constant resistance. Our result is that statistical mechanical descriptions derived from a few primitive algebraic operations on information can be used to create experimentally-relevant and computable models. By construction, these models may use information from more detailed atomistic simulations. Two surprising consequences to be explored in further work are that (in)distinguishability factors are automatically predicted from the problem formulation and that a direct analogue of the second law for thermodynamic entropy production is found by considering information loss in stochastic processes. The information loss identifies a novel contribution from the instantaneous information entropy that ensures non-negative loss.
Ion channels are transmembrane proteins that allow movement of solutes between two aqueous/membrane interfaces [1]. Selective channels and transporters are critical for maintaining living cells in their nonequilibrium state. Similar functionality is a required ingredient of synthetic semi-permeable partitions, used in fuel cells, solute separation, and electrochemical sensing. The operational characteristics of these devices are determined from their response to applied pressure, electric fields, and solute concentration differences. The most easily measured response is ion conduction, available through current measurements that can be carried out on micrometer-sized patches at millisecond resolution [2, 3]. Conduction of other species, such as water, as well as structural changes in the channel and surrounding interface regions are also important, but less accessible. The most easily accessible theoretical descriptions of channel behavior center around the structural properties of the equilibrium state and its propensity for ion occupancy under no external bias (in non-conducting conditions). In this article we present a top-down view by successively adding mechanistic information to predict these propensities. This allows a construction of simplified physical interpretations of channel behavior, but uses a statistical mechanics capable of deriving all the complexities of atomistic and quantum-mechanical systems. Because no net currents are present at equilibrium [4], the fluxes in these systems must be analyzed using a nonequilibrium theory.
When the conceptualization of an ensemble was extended by Gibbs [5] from physically realizable systems with many weakly interacting particles to non-interacting replicas of systems that may contain strong internal interactions, there seemed to be two entirely different ways in which the laws of thermostatics could be produced. This conception could hardly be considered as satisfactory, and it leaves unanswered the physical reason for the weak coupling between ensembles that is supposed to bring about equilibrium [6]. Although there continues to be debate over these conceptualizations [7], attempts to prove ergodicity and convergence to maximum entropy distributions using mechanical arguments show that the most robust route is to introduce some form of uncertainty [8, 9]. It has been noted that the maximum entropy formalism follows if one assumes the existence of infinite heat-baths [10]. Such results have given way to a gradual increase in acceptance of the information-theoretic derivation popularized by Jaynes [11–13] and others [14]. These works have helped clarify the situation by making a distinction between the “delusion that an ensemble describes an ‘objectively real’ physical situation” [12] and the subjective question of determining the “agreement between the premises and the conclusions” [5].
It cannot be denied that these views call forth some objections. Perhaps the strongest criticism of this approach is associated with the use of the term, ‘subjective.’ This term seems to imply that the results of the theory cannot be considered as objectively existing in reality. In addition, Jaynes presented detailed examples applicable only to the canonical ensemble already given by Gibbs. This has left open the question of how the molecular degeneracy factor may be derived, as on this point Jaynes reverts to a functional argument requiring the entropy to be extensive with respect to the volume [15]. A similar criticism of his approach to nonequilibrium is that it is operationally similar to the projection-operator formalism [4, 14, 16, 17], and has departed from the original program of formulating universally applicable laws based on a minimal description of a thermodynamic system. Indeed, a description of nonequilibrium ensembles obtained simply by applying maximum entropy to path space, the maximum caliber approach, does not produce a causal description of mechanics [18].
It appears after these remarks that the usefulness of the information theoretic approach outside of the realm of the canonical ensemble may be called into question and that this investigation concerns important logical principles. We have therefore built up a purely statistical theory by which we have been able to show that both of the above objections may be answered.
This derivation is made possible by the assumption that a probability exists for every piece of information given a starting set of assumed information. Representing coordinates for specifying a microstate of a thermodynamic system as a logical hypothesis, we derive the machinery of statistical mechanics by building up a probability distribution for a set of possible coordinates as well as information on their relative probabilities. Every change in the thermodynamic state of the system corresponds to a change in an objective state of knowledge. A judicious use of Bayes’ theorem then allows us to build up an algebra for describing these changes. The partition functions of these states become fundamental objects for computation of averages given known information, equilibrium or otherwise.
In the next section, we define an information algebra for working with belief functions generated by successive addition of information. Section 3 applies this work to the grand-canonical distribution for ion occupancy within a channel. The informational origins of the (in)distinguishability factor in the problem symmetry group and of the thermostatic forces in experimental convention are emphasized. The next section considers the consequences of adding coordinates to the system. Rather than allowing the distribution over initial coordinates to change, Sect. 5 then asks what distribution we obtain by calibrating the newly added coordinates to the existing distribution. Applying the resulting theory to the ion channel problem then identifies a minimalistic description of a small-scale nonequilibrium system. Quantifying the information loss in this process leads to a novel form for the second law of thermodynamics, which combines both the instantaneous information entropy and the system energy flux. The nonequilibrium partition function gives a set of Green-Kubo relations valid for transient processes. Finally, we illustrate these developments with a numerical calculation for deviations from steady-state conductance.
A belief function may be represented as an unnormalized assignment of probability,
$$Z[C] \propto \text{P}(C \mid I), \tag{1}$$
to a set of statements, C [19]. We say that such a belief function represents a ‘state of knowledge’ when P(Q|C) is known up to a constant of proportionality for any logical statement, Q. In order to carry out computations, we define the logical conjunction as a single basic operation. This operation defines an algebra by forming a new state of knowledge, AC, from a given state of knowledge, C, and a new hypothesis, A. As for A and C separately, the combination AC can also be interpreted as a logical statement about a set of events, so that we may compute P(Q|AC) for any Q up to a constant of proportionality. Because the combination rule should involve P(Q|C) and P(A|C), it defines the rules of probabilistic inference and should be considered carefully.
Jaynes [13] presents a cogent interpretation of probability theory as a method for conducting logical inference in the presence of uncertainty. This interpretation is based on Pólya’s qualitative conditions for plausible reasoning in mathematics [20] combined with the consistency theorems of Cox and Aczél [21, 22] deduced by consideration of the associativity equation. If we require our system for assigning plausibilities to be associative, such that adding information in any order leads to the same probability assignment, it is possible to deduce the product rule,
$$\text{P}(AB \mid C) = \text{P}(A \mid BC)\,\text{P}(B \mid C) = \text{P}(B \mid AC)\,\text{P}(A \mid C), \tag{2}$$
for which the right equality is Bayes’ theorem. In this paper, we denote propositions using Greek or capital letters, and the symbols on the right of the | represent given information, or assumptions. This distinction is necessary to allow for propositions that represent coordinates,
X: Some property of the system is described by the number x.
Propositions always appear inside the probability or Z[ ] symbols and follow the Boolean algebra, where multiplication denotes a logical ‘and,’ while addition represents a logical ‘or.’
In order to ensure this condition is always satisfied, a generic rule must be given for carrying out the conjunction,
We can prune the summation set, {Q}, by only including statements directly relevant to deciding the plausibility of C or A. To see this, assume that these relevant statements are collected in the space $\mathbb{S}_x$. Then write $\{Q\} = \mathbb{S}_x \times \mathbb{S}_y$, where Y are irrelevant to A and C (when X is known) so that P(Y|XAC) = P(Y) and P(Q|AC) = P(XY|AC) = P(X|AC)P(Y). The sum in (3) factors into
An immediate difficulty arises using (3) in adding the first piece of information. This is because P(A|C) may only be known up to a constant. We therefore make the convention of always assuming the principle of indifference (termed I) on the right-hand side of the probability symbol. Although it may be omitted in some formulas for clarity, it is always implicitly assumed to be present. This principle assigns a default distribution, P(A|I) = constant, but does not affect the conditional assignments, P(A|CI), when C says something about A.
In order to work with likelihood ratios instead of the explicitly normalized form of (3), we introduce a null hypothesis, Φ, that is undecidable from any other information. It has the formal properties,
Now divide the set of statements, C′ (appearing above), into two sets, C′ = DC. From the two equivalent ways of composing P(DΦ|CI) using Bayes’ theorem, it is easy to see that the above is true if and only if Φ is irrelevant to conclusions about D,
We thus recognize Φ as the identity element of information algebra.
By weighing alternatives against P(Φ|I), it is possible to re-phrase (3) into
Now, if P(QC|I) is known up to normalization as Z[QC], then Z[C] = Σ_{Q ∈ {Q}} Z[QC] and
This re-casts (5) as an explicit formula for carrying out the logical conjunction,
Because this derivation is symmetric in C and A, the conjunction is commutative and associative.
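As a minimal sketch of the conjunction described above, the following Python fragment treats a state of knowledge as a table of unnormalized values Z[QC] and adds a hypothesis by multiplying each entry by a likelihood ratio. The function names (`conjoin`, `partition`, `normalize`) and the three-statement example are hypothetical and chosen only for illustration. The sketch also assumes that A and B are conditionally independent once Q is known, so that the weight for A does not change when B has already been added.

```python
from fractions import Fraction


def conjoin(z, weight):
    """Add a hypothesis to an unnormalized belief function.

    z      : dict mapping each statement Q to its unnormalized value Z[QC]
    weight : dict mapping Q to the likelihood ratio P(A|QC) / P(Phi|QC)
    Returns the unnormalized values Z[QAC].
    """
    return {q: z[q] * weight[q] for q in z}


def partition(z):
    """Z[C] is the sum of Z[QC] over all statements Q."""
    return sum(z.values())


def normalize(z):
    """Convert unnormalized values into the distribution P(Q|C)."""
    total = partition(z)
    return {q: v / total for q, v in z.items()}


# Hypothetical example: three statements with a uniform starting weight.
z_c = {"q1": Fraction(1), "q2": Fraction(1), "q3": Fraction(1)}
w_a = {"q1": Fraction(1, 2), "q2": Fraction(2), "q3": Fraction(1)}
w_b = {"q1": Fraction(3), "q2": Fraction(1), "q3": Fraction(0)}  # B excludes q3

# Adding A then B gives the same table as adding B then A.
z_abc = conjoin(conjoin(z_c, w_a), w_b)
z_bac = conjoin(conjoin(z_c, w_b), w_a)
```

Exact rational arithmetic is used so that the order independence can be checked by equality rather than by a floating-point tolerance.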
To find an inverse in this algebra, we compare the addition C → AC with C → BC. Instead of computing each of these separately, we directly find the likelihood ratio between AC and BC using
This shows that the distribution over Q|AC may be had from Q|BC via re-weighting. However, if there is a Q for which P(Q|BC) is zero, but for which $\text{P}(A \mid QC) = \frac{\text{P}(Q \mid AC)\,\text{P}(A \mid C)}{\text{P}(Q \mid C)}$ is non-zero, then (7) cannot be evaluated. Therefore if B contains a restriction on the set of allowable Q, then this restricts mutual comparison among A, B. In other words, the inverse of B relative to A only exists when P(Q|BC) is nonzero on a smaller space Ω ⊂ {Q} than {Q} on which P(Q|AC) is nonzero. An absolute inverse exists if this holds for all A, or equivalently, for A = Φ. This caveat is related to the computational problems involved in computing Z[AC] using (7) as a Monte Carlo method based on data, Q, sampled from P(Q|BC) [23, 24].
Because the conjunction formula (5) is simply (7) for the special case B = Φ, it is convenient to define
so that likelihood ratios can be expressed more simply as
As their name implies, these are weights,
It must be understood that the re-weighting is only valid when w_{B→A}(Q) < ∞ for all Q with Z[QAC] > 0.
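The re-weighting and its validity condition can be sketched as follows. The function name `reweight` and the numerical values are hypothetical. The sketch assumes that the weights for A and B are known for every Q and that the re-weighting factor is their ratio, which is one reading of the relations in this section rather than a transcription of them.

```python
from fractions import Fraction


def reweight(p_b, w_a, w_b):
    """Turn P(Q|BC) into P(Q|AC) by re-weighting with w_A(Q) / w_B(Q).

    Returns the new distribution and the ratio Z[AC] / Z[BC].
    Raises ValueError when B forbids a Q that A allows, because the
    re-weighting factor would then be infinite.
    """
    unnormalized = {}
    for q, p in p_b.items():
        if w_b[q] == 0:
            if w_a[q] != 0:
                raise ValueError(f"B excludes {q!r} but A allows it")
            unnormalized[q] = Fraction(0)
        else:
            unnormalized[q] = p * w_a[q] / w_b[q]
    ratio = sum(unnormalized.values())  # average of w_A / w_B under P(Q|BC)
    p_a = {q: u / ratio for q, u in unnormalized.items()}
    return p_a, ratio


# Hypothetical weights over three statements with a uniform starting state.
w_a = {"q1": Fraction(1, 2), "q2": Fraction(2), "q3": Fraction(1)}
w_b = {"q1": Fraction(3), "q2": Fraction(1), "q3": Fraction(1)}
p_b = {"q1": Fraction(3, 5), "q2": Fraction(1, 5), "q3": Fraction(1, 5)}

p_a, z_ratio = reweight(p_b, w_a, w_b)
```

When B assigns zero weight to a statement that A allows, the function raises an error instead of returning a silently truncated distribution. This mirrors the sampling problem noted above for Monte Carlo estimates based on data drawn from P(Q|BC).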
Propositions defined inside some set of allowable questions, Ω, can still be compared against one another, and their likelihoods computed from either the null hypothesis, Φ, or a new null hypothesis, Φ Ω, defined relating only to Q allowed by Ω. Addition of the information, B = BΩ, to a state can be represented using a commutation diagram (Fig. 1), where paths represent step-wise addition of constraints/hypotheses. Completely commuting classes share an underlying definition of coordinate space. Whenever information of the type Ω is added, it directly bears on subsequent propositions. Paths adding BΩ will therefore restrict the set of subsequent questions that may be asked without knowledge of P(Ω|C). These paths are therefore represented by a directed edge, branching from the above completely connected graph. The commutation diagram terminology is justified by noting that the multiplicative functions, (8), transforming one probability distribution into another arrive at the same distribution function for any ‘allowed’ path.
The considerations up to this point show that it is possible to define a probability assignment for any states of knowledge about a set of possible underlying causes, {Q}, by specifying likelihood ratios for successive addition of this information to each system state, Q ∈ {Q}. Update schemes taking a consistent valuation, Z[QC], to another, Z[QAC], have been derived that exploit factorability of Z[Q_{1}Q_{2} … AC], where only some parts, Q_{i}, of each complete state specification, Q, are relevant to each other and to hypotheses in A [19, 25, 26]. In reference [19] it is particularly clear that addition of evidence to a state of knowledge is carried out by successively moving new information along a causal path using an unnormalized form of (10).
Our definitions for the information algebra may be connected to the usual use of Bayesian probability theory in the following way. Given a set of possible parameters, θ, we may use their symmetries [13] to arrive at a state of knowledge, $\mathbb{S}_{\theta}$, listing unique parameter values. The principle of indifference then assigns a uniform relative weight over $\mathbb{S}_{\theta}$,
This means that the relative likelihood of the set is $\sum_{\theta} Z[\theta] = |\mathbb{S}_{\theta}|$, and the prior distribution, $Z[\theta]/Z[\mathbb{S}_{\theta}]$, is uniform.
Next, some data, D, is collected and the state of knowledge updated to $D\mathbb{S}_{\theta}$ by conjunction,
Notice that θ implies $\mathbb{S}_{\theta}$, so that the likelihood ratio $Z[D\theta]/Z[D\mathbb{S}_{\theta}]$ along the path $D\mathbb{S}_{\theta} \to D\theta$ now gives the posterior distribution. Bayes’ theorem appears as the cycle identity between successive likelihood ratios for $D\mathbb{S}_{\theta} \to \mathbb{S}_{\theta} \to \theta \to D\theta$,
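A small numerical illustration of this connection, under assumed values: three candidate coin biases stand in for the parameter set, and the data are two heads and one tail. The parameter grid, the data, and the variable names are hypothetical. The posterior is obtained purely as a ratio of unnormalized values, in the spirit of the likelihood ratios described above.

```python
from fractions import Fraction

# Hypothetical parameter set: three candidate probabilities of heads.
thetas = [Fraction(1, 4), Fraction(1, 2), Fraction(3, 4)]

# Principle of indifference: every unique parameter value gets weight one.
z_theta = {t: Fraction(1) for t in thetas}
z_set = sum(z_theta.values())  # relative likelihood of the whole set


def likelihood(theta, heads, tails):
    """Probability of the observed sequence given the bias theta."""
    return theta**heads * (1 - theta) ** tails


# Conjunction with the data D (two heads, one tail).
z_d_theta = {t: z_theta[t] * likelihood(t, 2, 1) for t in thetas}
z_d = sum(z_d_theta.values())

prior = {t: z / z_set for t, z in z_theta.items()}
posterior = {t: z / z_d for t, z in z_d_theta.items()}
```

The prior is uniform because the starting weights are equal, and the posterior is normalized by construction because it is a ratio of a term to the sum of all terms.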
The unnormalized probability Z[C] in (1) is formally a function of the state of knowledge C that can be arrived at independently from the order of information addition. We will show that up to normalization, this is identical to the thermostatic partition function, a function of the state of a thermostatic system. We assume in this section that the hypotheses are conditionally independent if the coordinates, X, are known so that w_{A}(XC) = w_{A}(X).
First we have to address the physical problem of defining w_{A}(X) for two types of information:
Once specified, these will determine how transformations between states of knowledge are carried out using the information algebra.
A general pattern forms for assigning w_{A}(XC) by first finding a minimal set of relevant information XY, implied by XC so that A is conditionally independent of C when XY is known, simplifying the weight to w_{A}(XC) = w_{A}(XY). Comparing w_{A}(XY) for different A then suggests an appropriate relative weight. This relative weight problem is similar to the problem of factoring Z[QAC] in belief networks [25].
To find w_{Ω} (Q), we start from an implicit definition of Ω as re-normalizing:
where the indicator function, I (·) is one when the condition is satisfied, and zero otherwise. Two constraints that both allow Q should be equally likely given the same starting information, C, leading to the assignment
with partition function
This is consistent with $Z[\mathbb{S}_{\theta}]$, the count of unique parameter values in the last section, as well as the free energy cost for inserting a hard core solute into solution [27] or imposing a constraint on the geometry of an ion binding site [28].
Now consider the multi-ion binding site in a K^{+} ion channel selectivity filter (Fig. 2) [29]. Four cationic binding sites are distinguished, and it is assumed that the channel presents a high enough energetic penalty to exclude the possibility of anion occupancy. We do not expect multiple ion occupancy of the same site to be possible (or highly probable) because of mutual electrostatic repulsion and geometric features of the channel. This leads us to the fermion-like default statistics,
where n particles may occupy k states in $\left(\begin{array}{c}k\\ n\end{array}\right)$ ways for a total of 2^{k} elementary states of the system. The notation [·] is used to mean that each of the referenced states are mutually exclusive.
In the absence of any other information, each state is equally likely, and
This probability distribution factors into a product of independent distributions for each site, with equal probability for occupied and unoccupied states. The distribution is shown in Fig. 3a.
The partition function is the number of states, $Z[\mathbb{S}_x] = 2^{4}$ (14). Using the same equation, the partition function of a constrained system, for example at fixed N, is $Z[N\mathbb{S}_x] = Z[\mathbb{S}_x]\sum_{X \mid N}\text{P}(NX \mid \mathbb{S}_x\,I) = \binom{k}{n}$. The much debated ‘(in)distinguishability factor’ for particle counting [30, 31], as well as a volume factor, have already crept in as a consequence of the definition in (15) since in the limit k ≫ n, $\binom{k}{n} \to k^{n}/n!$. It is easy to see from the arguments leading to (15) why $Z[N\mathbb{S}_x]$ should always be the size of the fundamental domain of symmetry or unique space over which any function can be defined (e.g., a crystallographic unit cell). This always gives division by the correct symmetry factor.
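The counting argument can be checked by direct enumeration. The sketch below lists the occupancy states of a four-site filter, counts them at fixed ion number, and compares the binomial coefficient with k^n/n! for a large number of sites. The variable and function names are hypothetical.

```python
from itertools import product
from math import comb, factorial

K_SITES = 4

# Every site is either empty (0) or singly occupied (1).
states = list(product((0, 1), repeat=K_SITES))
z_total = len(states)

# Number of states at each fixed ion count n.
z_fixed_n = {
    n: sum(1 for s in states if sum(s) == n) for n in range(K_SITES + 1)
}


def dilute_ratio(k, n):
    """Ratio of the exact count C(k, n) to the dilute-limit estimate k**n / n!."""
    return comb(k, n) / (k**n / factorial(n))
```

For example, `dilute_ratio(10**6, 3)` is within a few parts per million of one, which illustrates how the 1/n! factor emerges when sites greatly outnumber particles.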
Formulas (9) and (10) are well-known relations in statistical mechanics when Boltzmann factors are inserted for the weights
In that case, they identify P(X|F ) as the canonical distribution and generate free energy perturbation and umbrella sampling formulas [32]. To develop physical intuition, we show in Appendix A how w_{F} (X) can be related to its definition at the beginning of this section using an intuitive derivation of the relative information entropy.
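For orientation, when Boltzmann weights at a common β are used, a likelihood ratio between two states of knowledge takes the form of the standard free energy perturbation identity. The labels F_0 and F_1, the energy functions E_0(X) and E_1(X), and the symbol ΔA are notation introduced here for illustration and are not taken from the text.

```latex
\frac{Z[F_1]}{Z[F_0]}
  = \left\langle e^{-\beta\,[E_1(X) - E_0(X)]} \right\rangle_{X \mid F_0},
\qquad
-\beta\,\Delta A = \ln \frac{Z[F_1]}{Z[F_0]}
```

The average is taken over configurations distributed according to the reference state F_0, so the identity is only usable when F_0 assigns nonzero probability to every configuration that matters under F_1.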
To go beyond the uniform distribution for ion occupancies, we may add a constraint on average energy, labeled by β. A simplified energy function is constructed for the ion channel system by including a mutual Coulomb repulsion between the ions, constrained to the vertical axis and spaced at 3.5 Å, close to the spacing observed in the 1K4C crystal structure [33]. We also assume a simple stabilization energy for each ion from the protein, E^{0} ≈ −111 kcal/mol, just strong enough to give multiple ion occupancy in Fig. 3c. Abbreviating N X to X, the energy function is
Placing this constraint on the average system energy at constant N leads to the well-known canonical distribution with partition function
Here, it can be seen that the probability for $N\mathbb{S}_x$, proportional to $\binom{4}{n}$, cancels in the expression so that the increment $Z[N\beta\mathbb{S}_x]/Z[N\mathbb{S}_x]$ is an average according to (5). Removing the constraint on N also leads to the multicanonical ensemble in the same way, viz. $Z[\beta\mathbb{S}_x] = \sum_{N} Z[N\beta\mathbb{S}_x]$ (14), $\text{P}(N \mid \beta\mathbb{S}_x) = Z[N\beta\mathbb{S}_x]/Z[\beta\mathbb{S}_x]$.
In either case, we can assign the parameter β the meaning of, “there exists a physical mechanism that decreases the likelihood of the system being in a high-energy state.” To separate these energy states, we introduce a constraint on the energy, denoted by E. Thus, if a system were allowed to choose its own energy state,^{1} the force would bias this choice according to P(Eβ|A)/P(EΦ|A) = e^{−βE}. We can set this bias, β, to give a reference system with known properties by exactly balancing its internal tendency toward higher energy, [P(E + dE|A)/P(E|A)] e^{−βdE} = 1. This implies that β should solve $\beta = \frac{\partial}{\partial E}\ln Z[EA]$ for a reference system with known energy; for example, a thermometer in which energy is easily measured by size expansion. Because our reference thermometer is constantly exchanging energy with the environment, we usually observe its average energy, and β should be chosen such that $\langle E \mid \beta A\rangle = -\frac{\partial}{\partial \beta}\ln Z[\beta A]$. The difference between these values (maximum vs. average energy) is important for small systems, but becomes negligible in the limit of large system sizes [34]. Using either of these forces in the present system mimics the effect of allowing energy exchange between the thermometer at this state and the system. This explains the convention of identifying temperature with the dilation of a thermometer and its connection to the statical force, β.
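The relation between the average energy and the derivative of ln Z with respect to β can be verified numerically on a toy system. The energy levels below are hypothetical and carry no physical meaning, and the function names are chosen for illustration. The sketch compares the directly computed canonical average with a central finite difference of ln Z.

```python
import math

# Hypothetical energy levels in arbitrary units (one level is degenerate).
LEVELS = [0.0, 1.0, 1.0, 2.5]


def log_z(beta, energies):
    """Logarithm of the canonical partition function."""
    return math.log(sum(math.exp(-beta * e) for e in energies))


def mean_energy(beta, energies):
    """Average energy under the canonical distribution."""
    weights = [math.exp(-beta * e) for e in energies]
    total = sum(weights)
    return sum(e * w for e, w in zip(energies, weights)) / total


def minus_dlogz_dbeta(beta, energies, step=1e-5):
    """Central finite-difference estimate of -d ln Z / d beta."""
    upper = log_z(beta + step, energies)
    lower = log_z(beta - step, energies)
    return -(upper - lower) / (2.0 * step)
```

Calling `mean_energy(1.3, LEVELS)` and `minus_dlogz_dbeta(1.3, LEVELS)` gives values that agree to within the finite-difference error.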
By the device of a reference system, the physical nature of the Lagrange multiplier, β, has changed from an absolute constraint on the average energy of the system of interest into a force biasing its energy. The information F thus has a different quality than the information λ in (17) because the first implies that λ re-adjusts when further information is added.
Another constraint we may add is the inclusion of an external force on the total number of ions, μ. Because the n ions are more likely to choose an environment with lower energy, −μn, this changes the probability of ion occupancy by ${\scriptstyle \frac{\text{P}(\mu \mid N)}{\text{P}(\mathrm{\Phi}\mid N)}}={e}^{\beta \mu n}$. The multiplier β appears because we want to express μ in energy units. Just as above, we can choose the chemical potential, μ, to give a reference system with known properties by balancing its internal energy change on ion addition using the choice $(\beta \mu )=-ln{\scriptstyle \frac{Z[(N+1)A]}{Z[NA]}}$ [35]. We can mimic the effect of allowing K^{+} transfer from a bulk 100 mM KCl solution to a reference volume of V_{0} = 4 Å^{3} in the present system (with the corresponding Cl^{−} moved to a similar environment and its contribution neglected) by choosing μ_{K+} = −81 + β^{−1} ln(0.1V_{0}) kcal/mol [36]. Without this constraint on N, the system is effectively allowed to exchange particles with vacuum. The combination of both constraints, which we refer to as F = βμ, is shown in panel (c) of Fig. 3. The preference for the separated state (X_{1}X_{4}) in this model shows the effect of mutual ion repulsion.
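A rough numerical sketch of a four-site model with both constraints is given below. It is not a reproduction of Fig. 3. The energy function is assumed here to be a stabilization energy per ion plus an unscreened Coulomb repulsion between singly charged ions on a line. The Coulomb constant (332.06 kcal Å/mol), the thermal energy (0.593 kcal/mol), and the chemical potential value are assumptions made for illustration. Only the site spacing and the stabilization energy are taken from the text.

```python
import math
from itertools import product

KT = 0.593        # kcal/mol, thermal energy near room temperature (assumed)
E0 = -111.0       # kcal/mol, stabilization energy per ion (from the text)
SPACING = 3.5     # Angstrom, distance between adjacent sites (from the text)
COULOMB = 332.06  # kcal*Angstrom/mol, unscreened unit charges (assumed)
MU = -86.0        # kcal/mol, hypothetical chemical potential


def energy(state):
    """Assumed energy: per-ion stabilization plus pairwise Coulomb repulsion."""
    occupied = [i for i, s in enumerate(state) if s]
    repulsion = sum(
        COULOMB / (SPACING * (j - i))
        for a, i in enumerate(occupied)
        for j in occupied[a + 1:]
    )
    return E0 * len(occupied) + repulsion


def grand_canonical(mu=MU, kt=KT, n_sites=4):
    """Occupancy probabilities with weights exp(-(E - mu * n) / kT)."""
    states = list(product((0, 1), repeat=n_sites))
    exponents = [-(energy(s) - mu * sum(s)) / kt for s in states]
    shift = max(exponents)  # subtract the maximum to avoid overflow
    weights = [math.exp(x - shift) for x in exponents]
    total = sum(weights)
    return {s: w / total for s, w in zip(states, weights)}


probs = grand_canonical()
pairs = {s: p for s, p in probs.items() if sum(s) == 2}
best_pair = max(pairs, key=pairs.get)
```

Among doubly occupied states, the most probable one places the ions at the two outermost sites, which is the qualitative effect of mutual repulsion described above. The absolute probabilities depend strongly on the assumed parameters.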
The theoretical background in Sect. 2 allows us to go further than the most common relations of thermostatics. In particular, the choice of coordinate space, $\mathbb{S}_x$, is no different than any other constraint except that it is almost never moved to the left-hand side to form quantities such as $\text{P}(F\mathbb{S}_x \mid I)$ and comparisons between states are carried out almost exclusively with a fixed $\mathbb{S}_x$. The addition of coordinates is associated with the transition from canonical to multicanonical ensembles. It has served as the starting point for some very difficult reading in thermodynamics textbooks involving over/under counting and (in)distinguishability arguments.
Since the rules have already been given above, we proceed to an example: addition of protein-ion interactions by assuming a set of protein conformational states. A simplistic example is provided by assuming (in addition to an open state, O) two ‘C-type’ inactivated states in which a pinching motion of the pore prevents occupancy at site 2 (state I_{1}) or sites 2 and 3 (state I_{2}, see Fig. 2) [37]. These states are assumed to be mutually exclusive and exhaustive, so that all conformational states, Y, are members of the space Ω = (O, I_{1}, I_{2}). Before any coupling is assumed, the total number of occupancy states, $|\mathbb{S}_x|$, is multiplied |Ω| times to create the product space, $\Omega \times \mathbb{S}_x$.
To add coupling, introduce a hypothesis, G, stating the unallowed joint conformations. Using (13) and (10),
But since GΩ is just another piece of information on XY,
Summing over X gives $\text{P}(Y \mid FG\mathbb{S}_x\Omega) = Z[YFG\mathbb{S}_x]/Z[FG\mathbb{S}_x\Omega]$. The partition function again has the interpretation of an unnormalized probability. This idea forms the basis for understanding the ratio between $Z[AFG\mathbb{S}_x]$ and $Z[BFG\mathbb{S}_x]$ as a log-likelihood ratio between two Hamiltonians, and for extending a canonical ensemble into a multicanonical one. If the number of states changes for this process, then as we have shown for the grand-canonical ensemble, our definition of P(Ω|I) (14) counts each ‘state of knowledge’ once, and thus directly accounts for (in)distinguishability factors.
Incorporating the conformational state information, GΩ, into the ion channel system leads to the results shown in panels (b), no energetic constraint, and (d), constrained chemical potential and energy, of Fig. 3. Because fewer states are available to the system in conformations I_{1} and I_{2}, they appear less often. Colloquially, they are said to be entropically un-favorable. In our derivation, this entropy decrease came about from adding information G to F Ω. Using the definition of the entropy given in (34) implies that the relative entropy addition F → F Ω is zero. This should be expected for a measure of information since the ability to observe a new variable, Y, that is nevertheless completely random adds no real information. The statement, ‘I_{2} is entropically unfavorable’ is therefore expressing the fact that the accessible volume for X has decreased from some previously available volume upon changing Ω to I_{2} or upon adding information I_{2}G.
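The entropic penalty for the inactivated conformations can be illustrated by counting. The sketch below builds the product of occupancy and conformational states, removes the joint states forbidden by the coupling, and computes the conformational probabilities when no energetic constraint is applied. The site indices are zero-based, and the dictionary and function names are hypothetical.

```python
from fractions import Fraction
from itertools import product

# Sites that must be empty in each conformation (zero-based indices):
# I1 blocks site 2, I2 blocks sites 2 and 3, the open state blocks none.
BLOCKED = {"O": (), "I1": (1,), "I2": (1, 2)}


def allowed(state, conformation):
    """True when the occupancy state is compatible with the conformation."""
    return all(state[i] == 0 for i in BLOCKED[conformation])


occupancies = list(product((0, 1), repeat=4))

# Each allowed joint state counts once; forbidden joint states are dropped.
z_joint = {
    (x, y): 1 for x in occupancies for y in BLOCKED if allowed(x, y)
}
z_total = sum(z_joint.values())

p_conformation = {
    y: Fraction(sum(v for (_, yy), v in z_joint.items() if yy == y), z_total)
    for y in BLOCKED
}
```

With 16, 8, and 4 compatible occupancy states, the conformations O, I1, and I2 receive probabilities 4/7, 2/7, and 1/7, so the conformations with fewer accessible states appear less often even before any energy function is introduced.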
The conventional thermodynamic entropy implicitly defines a previously available volume, regardless of whether such a state physically exists. Instead of this behavior, it seems preferable to define the entropy relative to the completely uniform distribution, as we have done here. In this case, the probability for occupying degenerate (but distinguishable) states increases because of the counting conventions of the partition function. This dependence is made explicit in the present definitions of likelihood ratios and relative entropies.
Comparing the default model to an assignment of free energies calculated in Ref. [29] shows a stronger preference for occupancy at S_{2}, S_{4} than S_{1}, S_{4} due to a large stabilization for occupancy at S_{2}. The crystal structures of Ref. [37] show decreases in occupancy at this site due to a pore-domain conformational change, and it is interesting to speculate that this conformational change is involved in destabilizing S_{2} during ion permeation. In our analysis up to this point, an assumption for the channel conformation has had the same effect as assuming an energy function for the system. Labeling the conformations and allowing them to change gives the conformations the interpretation of an additional system coordinate. For the system to destabilize S_{2}, the I_{1}/I_{2} conformations would require an additional biasing energy from the environment. Another way to approach the problem is to use ion occupancies averaged over conformations along with information on their coupling to infer the conformational distribution. This method will be shown in the next section.
If we had assumed some experimentally known probability distribution over X instead of the energy function assumed for F in the last example, then adding information G becomes qualitatively different. To avoid interfering with the distribution over X, the information F must take priority over any other constraints we may add to the problem. However, this does not prevent us from coupling Y to X using the conventional maximum-relative entropy hypothesis,
G: The probability of XY, given that G is accepted, is the most likely observational distribution that obeys ⟨g(y; x)|AXG⟩ = G(X) for any AX.
The entropy functional (34) decomposes as
The sums in this section are all taken to be over X and Y Ω without loss of generality since we choose × Ω to be the set of all XY relevant to deciding A or G. The last term in the expansion above is a conditional entropy, which is a functional of P(Y |AGXΩ) and depends on X. Because each conditional distribution can be chosen independently from the others and from P(X|AG Ω), the entropy of each one is independently maximized when [AG Ω|A Ω] is maximum. However, the presence of Y allows [AG Ω|A ] to differ from [A |A ] = 0, since P(X|AG Ω) = Σ_{Y} P(XY |AG Ω). For these two to be equal in general requires that P(X|AG Ω) = P(X|A ), that is, that the distribution of X not be dependent on the information GΩ when A is present.
Because we want to specify the marginal distribution of X directly, it is convenient to denote this information as the compound hypothesis,
F_{X}: The probability distribution of X is determined by information F_{X} and unchanged by information GΩ.
When this hypothesis is in place, we will have P(X|F_{X} G Ω) = P(X|F_{X} ). Bayes’ theorem says that we must also have P(GΩ|XF_{X} ) = P(GΩ|F_{X} ), implying w_{GΩ} (F_{X} X) = 1. Effectively, the Y have become ‘imaginary states’ to the system in the sense that there is no free energy change for F_{X} → F_{X} G Ω.
Although there is no change to or the distribution of X, maximizing (21) results in
an expression reminiscent of the transition probability for a Markov process. The conditional entropy is
and we define as usual
These considerations are sufficient to fill out the thermodynamic cycle when F_{X} is assumed, as has been done in the left half of Fig. 4.
Imposing the distribution among ion occupancy states given in Ref. [29] (shown for reference in Fig. 3f) as F_{X}, application of this procedure to determine the conformational equilibrium shows that the channel is almost always in the open state due to the high probability for occupancy of S_{2}. The probabilities for I_{1} and I_{2} are 2.3 · 10^{−4} and 8 · 10^{−6}. Although X|F_{X} is independent from GΩ, knowledge of Y is still informative for X, as
Using this method of inference, the occupancy distribution in the open state is shown in Fig. 3e. There is a very slight increase in occupancy at S_{2} and a decrease at S_{3}, but the effect is small because the open structure is dominant. Note that our assumption that the free energies of Ref. [29] are averages over the conformational states is at odds with the crystal structures of Ref. [37], indicating that motions around the S_{2} site not seen in short simulations may play a role in destabilizing this site, enabling ion translocation through S_{1} and S_{3}.
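The inference procedure can be sketched under a simplifying assumption: that the coupling acts only as a hard exclusion, so the conditional maximum entropy distribution of Y for each X is uniform over the conformations compatible with X. The occupancy distribution below is hypothetical and is not the distribution of Ref. [29]. The names are chosen for illustration.

```python
from fractions import Fraction

# Sites that must be empty in each conformation (zero-based indices).
BLOCKED = {"O": (), "I1": (1,), "I2": (1, 2)}


def allowed_conformations(x):
    """Conformations compatible with the occupancy state x."""
    return [y for y, sites in BLOCKED.items() if all(x[i] == 0 for i in sites)]


# Hypothetical fixed marginal over three occupancy states.
p_x = {
    (1, 0, 1, 0): Fraction(1, 2),
    (0, 1, 0, 1): Fraction(2, 5),
    (1, 0, 0, 1): Fraction(1, 10),
}

# Joint distribution: the marginal of X is imposed, Y is uniform given X.
joint = {}
for x, px in p_x.items():
    conformations = allowed_conformations(x)
    for y in conformations:
        joint[(x, y)] = px / len(conformations)

p_y = {y: sum(p for (_, yy), p in joint.items() if yy == y) for y in BLOCKED}
marginal_x = {x: sum(p for (xx, _), p in joint.items() if xx == x) for x in p_x}

# Knowledge of the conformation is still informative about occupancy.
p_x_given_open = {x: joint[(x, "O")] / p_y["O"] for x in p_x}
```

The marginal over X is unchanged by construction, while conditioning on the open state raises the probability of the state with site 2 occupied from 2/5 to 24/41 in this toy example.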
We argue that addition of conditional maximum entropy information is central to nonequilibrium statistical mechanics. To derive an ensemble of trajectories, we add all possible transitions, Y, originating from each state, X. The initial state and its transitions are linked by some information, G, which determines the distribution of Y given X. This constraint determines a maximum entropy transition probability density, as considered in differential form in Refs. [38, 39] and suggested in Ref. [40]. The hypothesis F_{X} states that what we know about the starting distribution is completely determined by F_{X} and not by any possible, but unknown, future events. It is required for the process to be non-anticipating in the sense that no information about processes we may carry out in the future, GΩ, is available from X.
By focusing on the information loss during a stochastic transition, we derive fluctuation formulas for irreversible entropy production that include a contribution from the instantaneous information entropy. Figure 4 displays the duality between fixing F_{X} at the initial time and fixing its propagated distribution F_{Z}. In setting up an inference problem for Y starting from F_{X} GX, the distribution of Y is given by (22). If this distribution is used to determine F_{Z} using P(Z|F_{X} G) = Σ_{XY} P(Y|F_{X} GXΩ) P(X|F_{X}) I(Z = Z(Y)), some information loss occurs when F_{X} is discarded and only F_{Z} and information constraining the transitions between states, G, are retained. Assuming the transitions, Y, specify both end-points X, Z, the distribution of Y carries the complete information for this process. Using the information loss metric [13, 41],
The averaging is taken in the forward direction, and so L ≥ 0 evidently represents the amount by which the real distribution F_{X} G → XY F_{X} G contains information not present in a distribution guessed from F_{Z} G^{*}. Note that if G allows only one-to-one XZ, the transitions are deterministic, and zero information is lost. More generally, if forward and backward inference directions yield the same joint distribution so that F_{X} G = F_{Z} G^{*}, then there is no way to discern the direction of time’s arrow and no information is dissipated.
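The information loss described above can be illustrated numerically. The following is a minimal sketch, assuming a three-state toy system in which the forward joint distribution is P(X|F_{X})P(Z|X) and the reverse guess is P(Z|F_{Z})P(X|Z); the particular kernels, the state count, and the function names are hypothetical choices made for illustration and are not taken from the original calculation.

```python
import math

def propagate(p_x, forward):
    """Marginal over end states Z obtained by pushing p_x through the forward kernel."""
    n_z = len(forward[0])
    return [sum(p_x[x] * forward[x][z] for x in range(len(p_x))) for z in range(n_z)]

def information_loss(p_x, forward, q_z, reverse):
    """Relative entropy between forward and reverse joint distributions over (X, Z).

    forward[x][z] plays the role of the transition rule G,
    reverse[z][x] plays the role of the reverse rule G* (a free choice here).
    """
    loss = 0.0
    for x, px in enumerate(p_x):
        for z, t in enumerate(forward[x]):
            p_fwd = px * t
            if p_fwd > 0.0:
                p_rev = q_z[z] * reverse[z][x]
                loss += p_fwd * math.log(p_fwd / p_rev)
    return loss

p_x = [0.7, 0.2, 0.1]  # assumed starting distribution

# Case 1: noisy transitions, reverse guess reuses the same kernel (an assumption).
noisy = [[0.8, 0.1, 0.1], [0.2, 0.6, 0.2], [0.1, 0.3, 0.6]]
q_noisy = propagate(p_x, noisy)
loss_noisy = information_loss(p_x, noisy, q_noisy, noisy)

# Case 2: one-to-one (deterministic) transitions with the inverse map as G*.
perm = [[0, 1, 0], [0, 0, 1], [1, 0, 0]]     # x=0 -> z=1, x=1 -> z=2, x=2 -> z=0
inverse = [[0, 0, 1], [1, 0, 0], [0, 1, 0]]  # z=0 -> x=2, z=1 -> x=0, z=2 -> x=1
loss_perm = information_loss(p_x, perm, propagate(p_x, perm), inverse)

# Case 3: reverse kernel chosen so both directions give the same joint distribution.
bayes = [[p_x[x] * noisy[x][z] / q_noisy[z] for x in range(3)] for z in range(3)]
loss_bayes = information_loss(p_x, noisy, q_noisy, bayes)

print(loss_noisy, loss_perm, loss_bayes)
```

In this sketch the loss is strictly positive for the noisy kernel and vanishes in the two cases singled out in the text: deterministic one-to-one transitions, and a reverse description that reproduces the forward joint distribution.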
The above relations are purely statistical, and have been stated in terms of maximum entropy constraints for forward, G, and reverse, G^{*}, inference problems. They are generally valid for any choice of G^{*}. In derivations of the fluctuation theorem [42], a particular choice of G^{*} is made corresponding to time-reversed equations of motion. The statistical perspective expressed here shows that this operation is confined to the choice for G^{*}, and provides a suggestion as to the informational role of time-reversal. For example, the forward constraints are consistent with the Langevin equation,
so that the momentum change (Δp = p_{Z} − p_{X}) is normally distributed about F − γv to yield a Boltzmann distribution. The correct choice of G^{*} is given by changing β to −β in the above equation. The equation for Brownian motion can be similarly derived by constraining Δx^{2} with σ^{−2}/2 and −ΔxF/2 with β. In both of these equations, the same set of forward transitions is used for G^{*}, but the signs of the Lagrange multipliers constraining the fluxes are reversed. We can thus intuitively see that reversing the sign of externally applied forces gives the correct fluctuation theorems using the information loss metric (25). This relation is valid in transient stochastic dynamics, and allows for entropy to increase both by increasing the entropy of the distribution (first part of (25)) and by the presence of irreversible fluxes (last term of (25)). Such an informational perspective is required for understanding entropy increase for processes that do not have time-reversal symmetry, but nevertheless have well-defined and reproducible behavior.
Retaining only information about the end-points of a path Γ = X_{1}X_{2} … X_{N}, from F_{1} to F_{N}, we denote Γ_{i} = X_{1} … X_{i} and Γ^{i} = X_{i} … X_{N}. We also assume constant and conditional independence, P(X_{i+1}|GΓ_{i} F_{1}) = P(X_{i+1}|GΓ_{i}). If the transitions are known from Γ, the total dissipation is
where k_{B} is the Boltzmann constant. The path sum on the right is in agreement with the thermodynamic entropy production given by the ratios of forward and reverse path probabilities [42–44] as well as an expression for entropy production deduced from mechanical considerations [38] when ln P(X_{i+1}|GΓ_{i})/P(X_{i}|GΓ^{i+1}) = −λg(x_{i+1}, x_{i}), with g a generalized flux. The left side identifies a contribution from the instantaneous information entropy of the system. We have derived this result from the direction of information propagation [45], and no special treatment has been given to the multiplier, β, defining the externally applied temperature. This derivation also avoids the complications associated with defining a steady-state. A curious feature is that it does not make specific reference to heat. This may be explained by noting that the transitions associated to fluxes, g, are probabilistic and represent interaction with an external system. These transitions may add or remove energy from our system, while the external system remains at a fixed thermostatic temperature state, ${\beta}_{\text{ext}}^{-1}$. We then define the heat injected from the environment as the net energy gain, β_{ext}dQ = λg(x_{i+1}; x_{i}). This identifies (26) with the Clausius form for the second law [15, 46, 47],
The above claims relating transition probabilities to fluxes can be established for the Langevin and Brownian equations, and have been more thoroughly explored in a manuscript devoted to nonequilibrium problems [18].
The next result will be a derivation of generalized Green-Kubo relations as non-equilibrium Gibbs-Maxwell relations. Because our free energy for the process A = F_{X_{1}} G_{12} G_{123} … is simply the free energy for F_{X_{1}}, we must find an alternate free energy functional. Notice that the partition function for the transition Γ_{i} Ω → G_{i} Γ_{i} Ω is Σ_{Y|Γ_{i}} e^{−λ_{i} g(y;Γ_{i})}/|Ω| (by summing the top and bottom of (23)), so that we can define
The first derivatives generate a ‘first law’ relating time-dependent fluxes to forces for nonequilibrium processes,
The second derivatives give a time-asymmetric Green-Kubo-like formula,
The derivation is subtle, and full details are given in Appendix B.
The thermal, λ_{i}, and mechanical, g_{i}, driving protocols should be understood as specifying the properties of the external system. Constant constraints correspond to connection to a constant external driving force, while the stochastic nature of the transitions implicitly defines an external heat bath. The process defined by (22) can also be history-dependent. By analogy to the equilibrium process, either the average relation, G, or the force, λ, can be set. If these are, in turn, history-dependent, then a new set of possibilities for time-dependent driving based on the behavior of the system becomes available. For instance, setting λ as a function of the current integrated over previous times, Σ_{j<i} g(Y_{j}), connects one port of our system to a capacitor, while constraining G as a function of the integrated force, Σ_{j<i} λ_{j}, connects the system to a type of inductor [48].
For the ion channel example we have been developing, a completely new set of constraints must be developed for transitions between states. For the forward problem, we are given X_{i} as well as some set of feasible transitions, Y|X_{i}, from state i. Because the probabilities of the inactivated states are negligible, we consider only the open channel state, and single-jump transitions as shown in Fig. 2 of Ref. [29]. Five transitions from each state are possible, corresponding to doing nothing, or all sites moving up or down by the addition of a K^{+} or a water at the appropriate end.
In order to produce a system that conserves energy, we place a constraint on the energy change at each step,
This amounts to a stochastic addition of energy to the system with average value $\langle dE\mid {X}_{i}\rangle =-{\scriptstyle \frac{\partial Z[{X}_{i}{\beta}^{\prime}\mathrm{\Omega}]}{\partial {\beta}^{\prime}}}$. The steady-state distribution will differ from the canonical distribution in general because the normalization constant, Z[X_{i} β′Ω], depends on X_{i}. This difference has come about because of the addition of information limiting which transitions are possible. If all states were available during each transition, the normalization constant would again be independent of X_{i} and we would recover the canonical distribution. For the Langevin and Brownian equations with uniform applied temperature, the canonical distribution is also obtained because the normalization constant is independent of X_{i}.
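The dependence of the steady-state on the set of allowed transitions can be checked on a toy model. The sketch below assumes a transition probability proportional to e^{−β′(E_{j} − E_{i})} over the allowed end states j (including staying in place), with five hypothetical states on a line and arbitrary energies; none of these values come from the channel model. Under this assumption the stationary distribution is proportional to the Boltzmann weight multiplied by the sum of Boltzmann weights over the states reachable from i, which reduces to the canonical distribution only when that sum is the same for every state.

```python
import math

def transition_matrix(energies, neighbors, beta):
    """P(j|i) proportional to exp(-beta * (E_j - E_i)) over the allowed end states j."""
    n = len(energies)
    rows = []
    for i, e_i in enumerate(energies):
        w = {j: math.exp(-beta * (energies[j] - e_i)) for j in neighbors[i]}
        z_i = sum(w.values())  # one-step normalization; depends on the start state i
        rows.append([w.get(j, 0.0) / z_i for j in range(n)])
    return rows

def stationary(P, n_iter=5000):
    """Stationary distribution by repeated application of the transition matrix."""
    n = len(P)
    pi = [1.0 / n] * n
    for _ in range(n_iter):
        pi = [sum(pi[i] * P[i][j] for i in range(n)) for j in range(n)]
    return pi

def normalize(weights):
    total = sum(weights)
    return [w / total for w in weights]

energies = [0.0, 1.0, 0.5, 2.0, 0.2]  # hypothetical state energies (units of 1/beta)
beta = 1.0
n = len(energies)
canonical = normalize([math.exp(-beta * e) for e in energies])

# Restricted moves: stay, or hop to an adjacent state on a line.
chain = {i: [j for j in (i - 1, i, i + 1) if 0 <= j < n] for i in range(n)}
pi_chain = stationary(transition_matrix(energies, chain, beta))

# Prediction under the stated assumption: Boltzmann weight times reachable weight.
predicted = normalize([
    math.exp(-beta * energies[i]) * sum(math.exp(-beta * energies[j]) for j in chain[i])
    for i in range(n)
])

# All states reachable in one step: the canonical distribution is recovered.
full = {i: list(range(n)) for i in range(n)}
pi_full = stationary(transition_matrix(energies, full, beta))

print(pi_chain, canonical, pi_full)
```

With restricted moves the stationary distribution departs visibly from the canonical one, while allowing every state as an end point restores it, in line with the argument above.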
Because transitions are not generally spontaneous, but may have an energy barrier, we add another constraint, β′E^{†}, directly on the number of transitions per time-step, τ,
These barriers could, of course, be made to depend arbitrarily on the transition, Y. For simplicity we assume that they are present only when a transition occurs and are uniformly equal to the sum of 2 ps kcal/mol. The stochastic process specified by these two formulas has the identity matrix as the small time-step limit, and an equilibrium-like distribution as the large step limit. The energy barrier assumption differs from the usual rate equation formulation, since the Chapman-Kolmogorov equation no longer holds. Instead, the behavior of the above system is dependent on the time-scale studied, reminiscent of fractal kinetic models [49]. Because this is a novel kinetic model, it remains to be seen how well these two constraints reproduce actual dynamics; however, the form of this equation matches well the nonlinearity near t = 0 in exact transition probabilities computed for the Müller-Brown potential surface (Fig. 4 of Ref. [50]), while variations in the surface chosen to divide states can be mimicked by changes in E^{†}. We can recover a Markov model by noting that E^{†} may be a function of the time-step, τ, to give a specified average number of transitions.
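The time-step dependence of such a barrier constraint can be explored with a small numerical sketch. Because the defining equation is not reproduced here, the code assumes one plausible form: each actual jump is penalized by an extra factor e^{−β′E^{†}/τ}, on top of the energy-change weight e^{−β′ΔE}. The barrier value of 2 ps kcal/mol is taken from the text; the thermal energy of 0.596 kcal/mol (roughly 300 K), the four-state line, and the state energies are hypothetical.

```python
import math

def step_matrix(energies, neighbors, beta, barrier, tau):
    """One-step transition matrix with an assumed per-jump penalty barrier / tau."""
    n = len(energies)
    rows = []
    for i, e_i in enumerate(energies):
        w = {}
        for j in neighbors[i]:
            penalty = barrier / tau if j != i else 0.0  # only actual jumps pay the barrier
            w[j] = math.exp(-beta * (energies[j] - e_i + penalty))
        z_i = sum(w.values())
        rows.append([w.get(j, 0.0) / z_i for j in range(n)])
    return rows

def matmul(A, B):
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)] for i in range(n)]

def max_diff(A, B):
    return max(abs(a - b) for ra, rb in zip(A, B) for a, b in zip(ra, rb))

energies = [0.0, 1.0, 0.5, 2.0]   # kcal/mol, hypothetical
beta = 1.0 / 0.596                 # mol/kcal, assuming roughly 300 K
barrier = 2.0                      # ps kcal/mol, value quoted in the text
n = len(energies)
line = {i: [j for j in (i - 1, i, i + 1) if 0 <= j < n] for i in range(n)}
identity = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]

p_short = step_matrix(energies, line, beta, barrier, tau=0.01)   # very short step
p_long = step_matrix(energies, line, beta, barrier, tau=1.0e6)   # very long step
p_free = step_matrix(energies, line, beta, 0.0, tau=1.0)         # no barrier at all

# Chapman-Kolmogorov check: one step of 2 ps versus two steps of 1 ps.
p_one = step_matrix(energies, line, beta, barrier, tau=1.0)
p_two = step_matrix(energies, line, beta, barrier, tau=2.0)
ck_gap = max_diff(p_two, matmul(p_one, p_one))

print(max_diff(p_short, identity), max_diff(p_long, p_free), ck_gap)
```

Under these assumptions the short-step matrix approaches the identity, the long-step matrix approaches the barrier-free (equilibrium-like) form, and a single 2 ps step is not the square of a 1 ps step, which is the failure of the Chapman-Kolmogorov composition noted above.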
To finish our specification of nonequilibrium jump processes, we add forces on spontaneous ion creation and annihilation. Removing the possibility of a change in ion number unless it either enters or exits through an end of the channel, we can then specify the external force, μ, acting on these special events using the same type of energy constraint (and assuming for simplicity the same energy barrier) as above. This leads to
with dN_{int} and dN_{ext} representing the number of ions added to the system (±1) from the internal and external solutions, respectively. The form of this transition probability is similar to that of a recent paper on currents in boundary-driven Kawasaki dynamics [51], which were also analyzed using a cumulant-generating function similar to (28).
An outward-driving voltage can be added to the system by imposing an external field, increasing the likelihood for transitions moving ions outward by an amount e^{βΔV g(Y)}. The function $g(Y)={\scriptstyle \frac{1}{5}}{\sum}_{j}I({X}_{j}\leftarrow {X}_{j+1})$ counts the average number of ions taking a step outward during transition Y, consistent with the sign convention of Fig. 2. For ion movements internal to the channel, this has an equivalent effect on the path distribution as applying an energy constraint e^{β Σ_{j} V_{j} I(X_{j})} (I(·) is the indicator function). These constraints provide a physically motivated kinetic model for our ion channel in arbitrary solution conditions and driving voltages.
The steady-state ion occupancies at zero applied voltage and μ identical to that for (e) and (f) of Fig. 3 are plotted in panel (g). The steady-state distribution is slightly altered from the local equilibrium prediction of (e). This happens despite the fact that the transition probability obeys detailed balance with respect to the steady-state, and exactly five transitions lead into each ion occupancy state. The reason is that the transition probability is normalized by a different value for the forward and reverse transitions.
As a final note, the current can be calculated as a perturbation from a steady-state using (29)
This gives the time-dependent linear response for small changes in the holding potential. The conductance near the resting potential is the time-integral of the steady-state current autocorrelation function (at zero average current), in accordance with Onsager’s phenomenological equation [52]. The negative sign comes about because of the positive sign of the constraint (βΔV). At other voltages, this integral is the slope of the current/voltage curve. The presence of an additive constant time-asymmetry of (29) explains why Onsager reciprocity only holds near equilibrium, where the fluxes are zero. Other Legendre transforms of (28) lead to relationships at fixed currents or forces, as in the usual theory [34].
The current-voltage characteristics calculated for a single channel using a 1 ps time-step are shown in Fig. 5. The fluctuation-dissipation theorem (33) gives the slope of the current-voltage curve, plotted as a tangent line at each data point. Low transition probabilities between conduction states with high free energy barriers lead to very long relaxation times (O(10^{5}) steps ~ 0.1 μs) in this system. The (usually large) contribution to (33) from the tail region was obtained by fitting steps 500–1000 to an exponential and integrating to infinity. Although this calculation demonstrates the numerical accuracy of (33), the choice of ion occupancy free energies and transition barriers does not produce good agreement with experiment. The set of energy barriers used leads to larger current magnitudes at hyperpolarized voltages (inward-rectifying behavior), inconsistent with the net-outward rectification observed in Kv and large-conductance BK homologues. It is of interest to model the transition energy barriers more accurately and determine whether the time-dependence of dwell times for individual states is adequately represented by equations of the present, maximum entropy, form.
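The statement that a flux correlation sum gives the slope of the current-voltage curve can be checked on a small Markov jump model. The following is a sketch under stated assumptions rather than the calculation behind Fig. 5: four hypothetical states on a ring, moves limited to staying or stepping one site forward or back, a weight e^{−β(E_{j} − E_{i}) + βVg} with g = +1, 0, −1 for forward, null, and backward steps, and a sign convention in which positive V favors forward steps. The voltage V is expressed in energy units. The correlation sum pairs the flux at a later step with the deviation of an earlier flux from its conditional mean, so that only earlier transitions contribute.

```python
import math

def build(energies, beta, voltage):
    """Transition matrix P and flux table G for a biased ring (assumed toy kinetics)."""
    n = len(energies)
    P = [[0.0] * n for _ in range(n)]
    G = [[0.0] * n for _ in range(n)]
    for i in range(n):
        moves = {i: 0.0, (i + 1) % n: 1.0, (i - 1) % n: -1.0}  # end state -> flux g
        w = {j: math.exp(-beta * (energies[j] - energies[i]) + beta * voltage * g)
             for j, g in moves.items()}
        z_i = sum(w.values())
        for j, g in moves.items():
            P[i][j] = w[j] / z_i
            G[i][j] = g
    return P, G

def stationary(P, n_iter=5000):
    n = len(P)
    pi = [1.0 / n] * n
    for _ in range(n_iter):
        pi = [sum(pi[i] * P[i][j] for i in range(n)) for j in range(n)]
    return pi

def current(energies, beta, voltage):
    """Steady-state mean flux per step."""
    P, G = build(energies, beta, voltage)
    pi = stationary(P)
    n = len(P)
    return sum(pi[i] * P[i][j] * G[i][j] for i in range(n) for j in range(n))

def green_kubo_slope(energies, beta, voltage, n_lags=2000):
    """beta * sum over lags m >= 0 of < g_m * (g_0 - <g_0 | X_0>) > in the steady state."""
    P, G = build(energies, beta, voltage)
    pi = stationary(P)
    n = len(P)
    gbar = [sum(P[i][j] * G[i][j] for j in range(n)) for i in range(n)]
    # Lag 0: variance of the flux about its conditional mean.
    total = sum(pi[i] * P[i][j] * G[i][j] * (G[i][j] - gbar[i])
                for i in range(n) for j in range(n))
    # u[j]: weight carried into state j by the centered initial flux.
    u = [sum(pi[i] * P[i][j] * (G[i][j] - gbar[i]) for i in range(n)) for j in range(n)]
    for _ in range(n_lags):
        total += sum(u[x] * gbar[x] for x in range(n))
        u = [sum(u[i] * P[i][j] for i in range(n)) for j in range(n)]
    return beta * total

energies = [0.0, 1.0, 0.5, 2.0]  # hypothetical, units of 1/beta
beta = 1.0
v = 0.5
h = 1.0e-4
slope_fd = (current(energies, beta, v + h) - current(energies, beta, v - h)) / (2.0 * h)
slope_gk = green_kubo_slope(energies, beta, v)
print(slope_fd, slope_gk)
```

In this toy model the finite-difference slope and the correlation sum agree to numerical precision both at zero bias, where the mean current vanishes, and at finite bias. Only the vector of conditional mean fluxes and one propagated weight vector need to be stored, which mirrors the bookkeeping described in Appendix B.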
This work has provided a view of statistical mechanics as expressing relationships between states of knowledge. This viewpoint has interesting connections to modern information theory and its algebra. Thermostatic partition functions, Z[A ], have been identified as expressing relative likelihoods. Changes in these functions correspond to changes in information, and can be understood as a subjective probability assignment determining relative likelihoods between allowed alternative states of the system. This interpretation of the partition function leads naturally to multicanonical ensemble and umbrella sampling methods [32].
To answer objections to such a subjective theory, we note that experiments are able to compare work and heat values to find agreement with thermostatics, provided a given system behaves according to the assumptions. In exactly the same way, Euclid’s geometry is able to deduce physically measurable distances, provided these objects behave as ideal solids. Subjectivity is present in both of these cases because assumptions are always required in order to calculate one quantity from another. The term ‘subjective’ simply acknowledges that this reasoning process proceeds from assumptions derived from experience. Physical predictions of objectively real phenomena can be made from a subjective theory based on assumptions that are objectively correct. This distinction explains why the structure of statistical mechanics has persisted throughout the developments of the last century and shows the practical utility of founding statistical mechanics on a mathematical theory of information. Because its basic axioms are conventions chosen to be logically consistent and in agreement with our intuition, the maximum entropy approach operates as a device for carrying out extended logic.
Comparisons between states of knowledge can be done using the methods in this report. The picture presented here does not require the specification of a complete set of all possible states of knowledge. Instead, the relations of Sect. 2 give a basic, consistent set of equations for defining the changes between these states. The algebra already justifies the appearance of (in)distinguishability factors in the partition function, as shown in Sect. 3. We have provided a justification for the common indicator function, w_{Ω} (C) (13), for comparing purely entropic changes in phase space, as well as the Boltzmann factor, for comparing changes in maximum entropy information P(F | )/P(G| ) = Z[F ]/Z[G ] (see Appendix A).
Two new types of information were introduced, corresponding to addition of states to a system and conditional maximization of the entropy. These operations provide alternative ways of looking at multiscale and nonequilibrium problems in terms of the Bayesian probability theory of Jaynes [13]. The concept of building up thermodynamic equations of state by adding system information is important for developing multi-scale understanding of large physical systems. The predictions of the coarse-grained theory may be compared with a fully atomistic (or ab-initio electronic) molecular dynamics simulation or coarse-grained Monte Carlo sampling. Here the number of states will be greatly increased to include coordinates and momenta of all particles, with a change in the energy function to a more accurate approximation. The information entropy for adding coordinates, however, will remain zero whenever the distribution is unchanged by maximizing entropy because the entropy was defined only relative to a reference distribution. As this level of description becomes computationally intractable, the approximate potential of mean force derived from high-level considerations may be useful for locating important states for detailed study, deriving stochastic boundary conditions, and applying force or energy biasing sampling techniques. We have shown this line of reasoning for the KcsA ion channel by calculating a current/voltage curve with interesting properties at depolarized voltage due to the energy barrier in moving outward from S_{2}. Further work on conformational transitions associated with this movement should be particularly relevant to the physical mechanism limiting outward current and may have implications for Ba^{2+} ‘lock-in’ experiments [53, 54].
The addition of constrained maximum entropy information in Sect. 5.1 allows a treatment of nonequilibrium problems. Starting with a ‘trajectory space’ and adding information on allowed transitions as well as expectation values of fluxes between states leads to a state of knowledge about the process. In our formalism, the ability to directly write down the equilibrium distribution (a long-sought goal [16, 17, 55]) disappears in the same way a marginal distribution over coarse-grained variables cannot be directly produced from an equilibrium distribution over all atomistic coordinates and momenta. Instead, the transition distribution can be directly written, and the transient fluxes and eventual steady-state (if it exists) become path averages.
The Lagrange multipliers in the equilibrium theory are proxies for static forces on the constrained variables that are imposed by an external system. In the same way, the Lagrange multipliers biasing average energy exchange, number of transitions per time-step, ion currents, and particle insertion/deletion operations can be understood as dynamic properties of the external system. This implies that these dynamic forces may be determined by examining their action on a known reference system in the spirit of circuit theory, where resistors, capacitors, inductors, and memristors [48] form the prototypes for general time-dependent constraint relationships between forces, fluxes, and their integrated counterparts.
A consideration of the information loss for stochastic processes leads to a formula similar to the second law of thermodynamics (26), applicable arbitrarily far from equilibrium. An average of the one-step partition function in (28) gives a simple way to generate Green-Kubo type fluctuation-dissipation theorems. We emphasize that these formulas are not required to be extensive or local [56–58], avoid the necessity of defining a steady-state [59, 60], and are independent of how we define fluxes so that we do not have to immediately write down hydrodynamic equations [61]. This work has given a necessary statistical foundation for extending statistical thermostatics by carrying over modern equilibrium techniques such as the evaluation of free energy differences [62], and coordinate/path re-weighting techniques [63, 64]. These formulas achieve Jaynes’ goal of providing a “foundation for the predictive aspect of statistical mechanics, in which a single basic principle and method applies to all cases, equilibrium or otherwise” [45]. They imbue nonequilibrium and transient dynamic problems with the same structure as the equilibrium thermodynamics given by Gibbs [5], and open the door for a new understanding of processes far from equilibrium.
This work was supported, in part, by Sandia’s LDRD program, and, in part, by the National Institutes of Health through the NIH Road Map for Medical Research. TLB gratefully acknowledges the support of NSF grants CHE-0709560 and CHE-1011746. Sandia National Laboratories is a multi-program laboratory operated by Sandia Corporation, a wholly owned subsidiary of Lockheed Martin Corporation, for the U.S. Department of Energy’s National Nuclear Security Administration under contract DE-AC04-94AL85000.
Our definition of F utilized the most likely observational distribution with respect to the probability distribution from an initial state, C, before F has been accepted. If we use information, F, as an assumption it should come from known experimental data on the system. In order to establish F, we may therefore tabulate frequencies for X. If F C turned out to be true, scientists basing their conclusions only on C would be increasingly surprised (or skeptical if the report is second-hand) at the evidence collected after N trials. This is because the probability of these results given C would be (from the multinomial distribution),
According to C, the likelihood of such a set of observations decreases exponentially with the number of trials. This is a condensed version of the Wallis derivation for the entropy, presented in more detail in Ref. [13]. The limit taken in the second equation is as N → ∞, which is appropriate for assessing such a set of hypothetical observations or second-hand reports. Evidently, the Kullback-Leibler divergence, − ≥ 0, represents the value of the information F (or difference of opinion) to an observer who has already accepted C. The relative information entropy, , reaches its maximum, zero, when the new information does not alter the distribution. For any reasonable comparison to be made, the distributions must be compared over the same set, , which should include any observational information that A or B may predict. As in the case for likelihood ratios (5), the relative entropy is independent of the distribution over irrelevant variables, Y. This happens here because the probability assignments are identical over the subspace Q|Y for each Y.
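The exponential decay of the multinomial likelihood can be verified directly. The sketch below uses two hypothetical three-outcome distributions: `c` stands for the probabilities assigned under C, and `base` fixes the observed frequencies (0.2, 0.3, 0.5) that would be expected if F were true. It evaluates the exact multinomial log-probability of those frequencies under C and compares the decay rate per trial with the Kullback-Leibler divergence.

```python
import math

def log_multinomial(counts, probs):
    """Exact log-probability of observing `counts` in sum(counts) trials under `probs`."""
    n = sum(counts)
    out = math.lgamma(n + 1)
    for k, p in zip(counts, probs):
        out += k * math.log(p) - math.lgamma(k + 1)
    return out

def kl_divergence(f, c):
    """Kullback-Leibler divergence of the frequencies f from the assignment c."""
    return sum(fi * math.log(fi / ci) for fi, ci in zip(f, c) if fi > 0.0)

c = [0.5, 0.3, 0.2]   # hypothetical probabilities assigned under C
base = [2, 3, 5]      # observed counts per 10 trials, i.e. frequencies 0.2, 0.3, 0.5
f = [b / 10.0 for b in base]
kl = kl_divergence(f, c)

rates = {}
for n in (10, 1000, 100000):
    counts = [b * n // 10 for b in base]         # exact integer counts
    rates[n] = -log_multinomial(counts, c) / n   # decay rate per trial

print(kl, rates)
```

The per-trial decay rate approaches the divergence as N grows, with a finite-size correction of order ln(N)/N coming from the multinomial prefactor.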
According to this maximum entropy argument, the least surprising distribution given information F is the maximum entropy distribution. This distribution should satisfy the mathematical condition,
The unique solution to this condition is [13]
for some λ(C), proving that the hypothesis F (35) is logically equivalent to assuming the probability assignment of (36). The Jacobian ${\scriptstyle \frac{{dx}_{C}}{{dx}_{FC}}}$ has been explicitly shown in this equation because of the importance of continuous functions in thermodynamics. In a discrete setting, it has the effect of dividing P(X|C) to maintain its normalization.
According to Bayes’ theorem,
and since two constraints that have the same weight at a given point should be equally likely when X is given, w_{F}(X) = e^{−λf(x)}. This should again equal w_{F}(CX) as long as F is conditionally independent of C, given X.
Already a few important differences from the standard development can be noticed in the above. First, the commutativity of thermodynamic cycles is perhaps not as widely appreciated as it should be. Although it is well known that Z[C] is a state function, because of its definition in (1), this shows that a sum of relative free energy differences around any closed loop of a thermodynamic cycle totals to zero with the caveat that (9) may only be applied from a larger phase space to a smaller. The same is not true of relative entropies (34), which give a sum dependent on the path taken. Instead, it is necessary to define [ C | ] as the state function. Also, the entropy definition of (34) is independent of changes in phase-space volume because P(X|AI) transforms the same way as P(X|BI) for an injective change of variables X → Y.
We assume that the path probability can be written in the form (22),
The derivatives of − ln Z[Γ_{i}, λ_{i}] are conditional averages,
For brevity, the explicit dependence on λ_{i} has been omitted in the above. All averages in this section are taken to be over the full path distribution of (38) except for the conditions explicitly stated.
The dependence on λ_{ik} of a general path average, f (Γ, λ), is given by
Since
and
we can re-state the above as
To compute (29) in the case where g_{i} are a function of only the (i → i + 1) transitions, X_{i+1} (i.e. g_{i}(X_{i+1}; Γ_{i}) = g_{i}(X_{i+1}; X_{i})), we take advantage of starting in a steady-state,
by defining k = j − i ≥ 0. We can write the correlation function as
The last step used the fact that g(X_{k}|X_{0}) is only a function of X_{0}. This simplifies the computation of the autocorrelation function for a Markov process, since only the vector g(X_{1}) and the matrix P(X_{k}X_{1}X_{0}|A) need to be stored. The latter can be updated by taking independent steps for each initial transition X_{1}, P(X_{k+1}X_{1}X_{0}|A) = Σ_{X_{k}} P(X_{k+1}|X_{k}) P(X_{k}X_{1}X_{0}|A). When the probability loses the information on what transition occurred at X_{1}, P(X_{k}|X_{1}X_{0}A) → P(X_{k}|X_{0}A), the flux-autocorrelation becomes zero.
^{1}Alternatively, to avoid anthropomorphic terminology, if the system energy is not constrained and we compare the maximum entropy P(E|A).
David M. Rogers, Center for Biological and Materials Sciences, MS 0895, Sandia National Laboratories, Albuquerque, NM 87185, USA.
Thomas L. Beck, Department of Chemistry, University of Cincinnati, Cincinnati, OH 45221-0172, USA.
Susan B. Rempe, Center for Biological and Materials Sciences, MS 0895, Sandia National Laboratories, Albuquerque, NM 87185, USA.