 
J Stat Phys. Author manuscript; available in PMC 2012 October 1.
Published in final edited form as: J Stat Phys. 2011 October; 145(2): 385–409. doi: 10.1007/s10955-011-0358-9
PMCID: PMC3436205; NIHMSID: NIHMS335425

An Information Theory Approach to Nonlinear, Nonequilibrium Thermodynamics

Abstract

Using the problem of ion channel thermodynamics as an example, we illustrate the idea of building up complex thermodynamic models by successively adding physical information. We present a new formulation of information algebra that generalizes methods of both information theory and statistical mechanics. From this foundation we derive a theory for ion channel kinetics, identifying a nonequilibrium ‘process’ free energy functional in addition to the well-known integrated work functionals. The Gibbs-Maxwell relation for the free energy functional is a Green-Kubo relation, applicable arbitrarily far from equilibrium, that captures the effect of non-local and time-dependent behavior from transient thermal and mechanical driving forces. Comparing the physical significance of the Lagrange multipliers to the canonical ensemble suggests definitions of nonequilibrium ensembles at constant capacitance or inductance in addition to constant resistance. Our result is that statistical mechanical descriptions derived from a few primitive algebraic operations on information can be used to create experimentally-relevant and computable models. By construction, these models may use information from more detailed atomistic simulations. Two surprising consequences to be explored in further work are that (in)distinguishability factors are automatically predicted from the problem formulation and that a direct analogue of the second law for thermodynamic entropy production is found by considering information loss in stochastic processes. The information loss identifies a novel contribution from the instantaneous information entropy that ensures non-negative loss.

Keywords: Predictive statistical mechanics, Maximum entropy, Likelihood, Probability, Information entropy

1 Introduction

Ion channels are transmembrane proteins that allow movement of solutes between two aqueous/membrane interfaces [1]. Selective channels and transporters are critical for maintaining living cells in their nonequilibrium state. Similar functionality is a required ingredient of synthetic semi-permeable partitions, used in fuel cells, solute separation, and electrochemical sensing. The operational characteristics of these devices are determined from their response to applied pressure, electric fields, and solute concentration differences. The most easily measured response is ion conduction, available through current measurements that can be carried out on micrometer-sized patches at millisecond resolution [2, 3]. Conduction of other species, such as water, as well as structural changes in the channel and surrounding interface regions are also important, but less accessible. The most easily accessible theoretical descriptions of channel behavior center on the structural properties of the equilibrium state and its propensity for ion occupancy under no external bias (in non-conducting conditions). In this article we present a top-down view by successively adding mechanistic information to predict these propensities. This allows a construction of simplified physical interpretations of channel behavior, but uses a statistical mechanics capable of deriving all the complexities of atomistic and quantum-mechanical systems. Because no net currents are present at equilibrium [4], the fluxes in these systems must be analyzed using a nonequilibrium theory.

When the conceptualization of an ensemble was extended by Gibbs [5] from physically realizable systems with many weakly interacting particles to non-interacting replicas of systems that may contain strong internal interactions, there seemed to be two entirely different ways in which the laws of thermostatics could be produced. This conception could hardly be considered as satisfactory, and it leaves unanswered the physical reason for the weak coupling between ensembles that is supposed to bring about equilibrium [6]. Although there continues to be debate over these conceptualizations [7], attempts to prove ergodicity and convergence to maximum entropy distributions using mechanical arguments show that the most robust route is to introduce some form of uncertainty [8, 9]. It has been noted that the maximum entropy formalism follows if one assumes the existence of infinite heat-baths [10]. Such results have led to a gradual increase in acceptance of the information-theoretic derivation popularized by Jaynes [11–13] and others [14]. These works have helped clarify the situation by making a distinction between the “delusion that an ensemble describes an ‘objectively real’ physical situation” [12] and the subjective question of determining the “agreement between the premises and the conclusions” [5].

It cannot be denied that these views call forth some objections. Perhaps the strongest criticism of this approach is associated with the use of the term, ‘subjective.’ This term seems to imply that the results of the theory cannot be considered as objectively existing in reality. Again, Jaynes presented detailed examples applicable only to the canonical ensemble already given by Gibbs. This has left open the question of how the molecular degeneracy factor may be derived, since on this point Jaynes reverts to a functional argument requiring the entropy to be extensive with respect to the volume [15]. A similar criticism of his approach to nonequilibrium is that it is operationally similar to the projector-operator formalism [4, 14, 16, 17], and has departed from the original program of formulating universally applicable laws based on a minimal description of a thermodynamic system. Indeed, a description of nonequilibrium ensembles obtained simply by applying maximum entropy to path space, the maximum caliber approach, does not produce a causal description of mechanics [18].

It appears after these remarks that the usefulness of the information theoretic approach outside of the realm of the canonical ensemble may be called into question and that this investigation concerns important logical principles. We have therefore built up a purely statistical theory by which we have been able to show that both of the above objections may be answered.

This derivation is made possible by the assumption that a probability exists for every piece of information given a starting set of assumed information. Representing coordinates for specifying a microstate of a thermodynamic system as a logical hypothesis, we derive the machinery of statistical mechanics by building up a probability distribution for a set of possible coordinates as well as information on their relative probabilities. Every change in the thermodynamic state of the system corresponds to a change in an objective state of knowledge. A judicious use of Bayes’ theorem then allows us to build up an algebra for describing these changes. The partition functions of these states become fundamental objects for computation of averages given known information, equilibrium or otherwise.

In the next section, we define an information algebra for working with belief functions generated by successive addition of information. Section 3 applies this work to the grand-canonical distribution for ion occupancy within a channel. The informational origins of the (in)distinguishability factor in the problem symmetry group and of the thermostatic forces in experimental convention are emphasized. The next section considers the consequences of adding coordinates to the system. Rather than allowing the distribution over initial coordinates to change, Sect. 5 then asks what distribution we obtain by calibrating the newly added coordinates to the existing distribution. Applying the resulting theory to the ion channel problem then identifies a minimalistic description of a small-scale nonequilibrium system. Quantifying the information loss in this process leads to a novel form for the second law of thermodynamics, which combines both the instantaneous information entropy and the system energy flux. The nonequilibrium partition function gives a set of Green-Kubo relations valid for transient processes. Finally, we illustrate these developments with a numerical calculation for deviations from steady-state conductance.

2 Information Algebra

A belief function may be represented as an unnormalized assignment of probability,

$$Z[QC] \propto P(Q|C) \tag{1}$$

to a set of statements, C [19]. We say that such a belief function represents a ‘state of knowledge’ when P(Q|C) is known up to a constant of proportionality for any logical statement, Q. In order to carry out computations, we define the logical conjunction as a single basic operation. This operation defines an algebra by forming a new state of knowledge, AC, from a given state of knowledge, C, and a new hypothesis, A. As for A and C separately, the combination AC can also be interpreted as a logical statement about a set of events, so that we may compute P(Q|AC) for any Q up to a constant of proportionality. Because the combination rule should involve P(Q|C) and P(A|C), it defines the rules of probabilistic inference and should be considered carefully.

Jaynes [13] presents a cogent interpretation of probability theory as a method for conducting logical inference in the presence of uncertainty. This interpretation is based on Pólya’s qualitative conditions for plausible reasoning in mathematics [20] combined with the consistency theorems of Cox and Aczél [21, 22] deduced by consideration of the associativity equation. Requiring our system for assigning plausibilities to be associative, such that adding information in any order leads to the same probability assignment, it is possible to deduce the product rule,

$$P(QA|C) = P(Q|AC)\,P(A|C) = P(A|QC)\,P(Q|C) \tag{2}$$

for which the right equality is Bayes’ theorem. In this paper, we denote propositions using Greek or capital letters, and the symbols on the right of the | represent given information, or assumptions. This distinction is necessary to allow for propositions that represent coordinates,

X: Some property of the system is described by the number x.

Propositions always appear inside the probability or Z[ ] symbols and follow the Boolean algebra, where multiplication denotes a logical ‘and,’ while addition represents a logical ‘or.’

In order to ensure this condition is always satisfied, a generic rule must be given for carrying out the conjunction,

$$P(Q|AC) = \frac{P(A|QC)\,P(Q|C)}{\sum_{\{Q\}} P(A|QC)\,P(Q|C)} \tag{3}$$

We can prune the summation set, {Q}, by only including statements directly relevant to deciding the plausibility of C or A. To see this, assume that these relevant statements are collected in the space 𝒳. Then write {Q} = 𝒳 × 𝒴, where Y ∈ 𝒴 are irrelevant to A and C (when X is known) so that P(Y|XAC) = P(Y) and P(Q|AC) = P(XY|AC) = P(X|AC)P(Y). The sum in (3) factors into

$$\sum_{\{Q\}} P(A|QC)\,P(Q|C) = \Big[\sum_{X\in\mathcal{X}} P(A|XC)\,P(X|C)\Big]\Big[\sum_{Y\in\mathcal{Y}} P(Y)\Big]$$

An immediate difficulty arises using (3) in adding the first piece of information. This is because P(A|C) may only be known up to a constant. We therefore make the convention of always assuming the principle of indifference (termed I) on the right-hand side of the probability symbol. Although it may be omitted in some formulas for clarity, it is always implicitly assumed to be present. This principle assigns a default distribution, P(A|I) = constant, but does not affect the conditional assignments, P(A|CI), when C says something about A.

In order to work with likelihood ratios instead of the explicitly normalized form of (3), we introduce a null hypothesis, Φ, that is undecidable from any other information. It has the formal properties,

$$P(\Phi|C'I) = P(\Phi|I) \quad \text{for any } C' \tag{4}$$

Now divide the set of statements, C′ (appearing above), into two sets, C′ = DC. From the two equivalent ways of composing P(ΦD|CI) using Bayes’ theorem, it is easy to see that the above is true if and only if Φ is irrelevant to conclusions about D,

$$P(D|\Phi CI) = P(D|CI)$$

We thus recognize Φ as the identity element of information algebra.

By weighing alternatives against P(Φ|I), it is possible to re-phrase (3) into

$$\frac{P(A|CI)}{P(\Phi|CI)} = \sum_{\{Q\}} \frac{P(A|QCI)}{P(\Phi|QCI)}\,P(Q|CI) \tag{5}$$

Now, if P(QC|I) is known up to normalization as Z[QC], then Z[C] = Σ{Q} Z[QC] and

$$P(Q|CI) = \frac{Z[QC]}{Z[C]} \tag{6}$$

This re-casts (5) as an explicit formula for carrying out the logical conjunction,

$$Z[AC] = \sum_{\{Q\}} \frac{P(A|QCI)}{P(\Phi|QCI)}\,Z[QC]$$

Because this derivation is symmetric in C and A, the conjunction is commutative and associative.

2.1 Inverse Elements

To find an inverse in this algebra, we compare the addition C → AC with C → BC. Instead of computing each of these separately, we directly find the likelihood ratio between AC and BC using

$$\frac{P(A|CI)}{P(B|CI)} = \sum_{\{Q\}} \frac{P(A|QCI)}{P(B|QCI)}\,P(Q|BCI) \tag{7}$$

This shows that the distribution over Q|AC may be had from Q|BC via re-weighting. However, if there is a Q for which P(Q|BC) is zero, but for which P(Q|AC) is non-zero, then (7) cannot be evaluated. Therefore if B contains a restriction on the set of allowable Q, then this restricts mutual comparison among A, B. In other words, the inverse of B relative to A only exists when P(Q|AC) is nonzero only within the space Ω ⊆ {Q} on which P(Q|BC) is nonzero. An absolute inverse exists if this holds for all A, or equivalently, for A = Φ. This caveat is related to the computational problems involved in computing Z[AC] using (7) as a Monte Carlo method based on data, Q, sampled from P(Q|BC) [23, 24].

Because the conjunction formula (5) is simply (7) for the special case B = Φ, it is convenient to define

$$w_{BA}(Q) \equiv \frac{P(A|QCI)}{P(B|QCI)}, \qquad w_A(Q) \equiv w_{\Phi A}(Q) \tag{8}$$

so that likelihood ratios can be expressed more simply as

$$\frac{Z[AC]}{Z[BC]} = \sum_{\{Q\}} w_{BA}(Q)\,P(Q|BCI) = \langle w_{BA}\rangle_{BC} \tag{9}$$

As their name implies, these are weights,

$$P(Q|ACI) = \frac{w_{BA}(Q)\,P(Q|BCI)}{\langle w_{BA}\rangle_{BC}} \tag{10}$$

It must be understood that the re-weighting is only valid when wBA(Q) < ∞ for all Q with Z[QAC] > 0.
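As a minimal numerical sketch of the re-weighting machinery in (7)–(10) (all names here are hypothetical, not from the original), the function below transforms P(Q|BC) into P(Q|AC) on a discrete space and returns the likelihood ratio Z[AC]/Z[BC]; the guard clause enforces the finiteness caveat just stated.

```python
import numpy as np

def reweight(p_B, w_BA):
    """Re-weight P(Q|BC) into P(Q|AC) following eq. (10).

    p_B  : probabilities P(Q|BC) over a discrete set {Q}
    w_BA : weights w_BA(Q) = P(A|QC)/P(B|QC)
    Returns P(Q|AC) and the likelihood ratio Z[AC]/Z[BC] of eq. (9).
    """
    p_B, w_BA = np.asarray(p_B, float), np.asarray(w_BA, float)
    # An infinite weight signals a Q allowed by A but excluded by B,
    # the case in which the re-weighting is invalid.
    if not np.all(np.isfinite(w_BA)):
        raise ValueError("support mismatch: eq. (7) cannot be evaluated")
    Z_ratio = np.dot(w_BA, p_B)            # <w_BA>_BC, eq. (9)
    return w_BA * p_B / Z_ratio, Z_ratio   # eq. (10)

# Toy example: B assigns uniform odds over four states; A doubles the
# odds of the first state.
p_A, Z = reweight([0.25] * 4, [2.0, 1.0, 1.0, 1.0])
print(p_A, Z)   # [0.4 0.2 0.2 0.2] 1.25
```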

Propositions defined inside some set of allowable questions, Ω, can still be compared against one another, and their likelihoods computed from either the null hypothesis, Φ, or a new null hypothesis, Φ_Ω, defined relative only to Q allowed by Ω. Addition of the information, B = Ω, to a state can be represented using a commutation diagram (Fig. 1), where paths represent step-wise addition of constraints/hypotheses. Completely commuting classes share an underlying definition of coordinate space. Whenever information of the type Ω is added, it directly bears on subsequent propositions. Paths adding Ω will therefore restrict the set of subsequent questions that may be asked without knowledge of P(Ω|C). These paths are therefore represented by a directed edge, branching from the above completely connected graph. The commutation diagram terminology is justified by noting that the multiplicative functions, (8), transforming one probability distribution into another arrive at the same distribution function for any ‘allowed’ path.

Fig. 1
Reaction diagram showing system states as nodes. Two constraints, 𝒳, defining a coordinate space, and Ω, defining some further restriction, are illustrated here. F and G are average value constraints, and their relative likelihoods can be calculated …

The considerations up to this point show that it is possible to define a probability assignment for any states of knowledge about a set of possible underlying causes, {Q}, by specifying likelihood ratios for successive addition of this information to each system state, Q ∈ {Q}. Update schemes taking a consistent valuation, Z[QC], to another, Z[QAC], have been derived that exploit factorability of Z[Q₁Q₂⋯AC], where only some parts, Q_i, of each complete state specification, Q, are relevant to each other and to hypotheses in A [19, 25, 26]. In reference [19] it is particularly clear that addition of evidence to a state of knowledge is carried out by successively moving new information along a causal path using an unnormalized form of (10).

2.2 Inference

Our definitions for the information algebra may be connected to the usual use of Bayesian probability theory in the following way. Given a set of possible parameters, θ ∈ Θ, we may use their symmetries [13] to arrive at a state of knowledge, Θ, listing unique parameter values. The principle of indifference then assigns a uniform relative weight over Θ,

$$Z[\theta\Theta] = 1 \quad \text{for every } \theta \in \Theta$$

This means that the relative likelihood of the set Θ is Σ_θ Z[θΘ] = |Θ|, and the prior distribution, Z[θΘ]/Z[Θ] = 1/|Θ|, is uniform.

Next, some data, D, is collected and the state of knowledge updated to DΘ by conjunction,

$$Z[D\Theta] = \sum_{\theta\in\Theta} \frac{P(D|\theta\Theta I)}{P(\Phi|I)}\,Z[\theta\Theta] \tag{11}$$

Notice that θ implies Θ, so that the likelihood ratio Z[θDΘ]/Z[DΘ] along the path DΘ → θDΘ now gives the posterior distribution. Bayes’ theorem appears as the cycle identity between successive likelihood ratios along the paths Θ → DΘ → θDΘ and Θ → θΘ → θDΘ,

$$P(\theta|D\Theta I) = \frac{Z[\theta D\Theta]}{Z[D\Theta]} = \frac{Z[\theta\Theta]}{Z[\Theta]}\cdot\frac{Z[\theta D\Theta]}{Z[\theta\Theta]}\cdot\frac{Z[\Theta]}{Z[D\Theta]} = \frac{P(\theta|\Theta I)\,P(D|\theta\Theta I)}{P(D|\Theta I)}$$
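A small numerical sketch of this update cycle (the likelihood values are made up for illustration): the posterior appears as a ratio of unnormalized valuations, exactly as in the cycle identity above.

```python
import numpy as np

# Principle of indifference over three unique parameter values:
# Z[theta Theta] = 1 for each theta, so Z[Theta] = |Theta| = 3.
Z_theta = np.ones(3)

# Hypothetical likelihoods P(D|theta Theta I) for some observed data D.
likelihood = np.array([0.6, 0.3, 0.1])

# Conjunction update of eq. (11): Z[theta D Theta] = w_D(theta) Z[theta Theta].
Z_theta_D = likelihood * Z_theta

# Posterior = likelihood ratio Z[theta D Theta]/Z[D Theta].
posterior = Z_theta_D / Z_theta_D.sum()
print(posterior)   # [0.6 0.3 0.1]
```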

3 Canonical Ensemble Model for a Channel Binding Site

The unnormalized probability Z[C] in (1) is formally a function of the state of knowledge C that can be arrived at independently from the order of information addition. We will show that up to normalization, this is identical to the thermostatic partition function, a function of the state of a thermostatic system. We assume in this section that the hypotheses are conditionally independent if the coordinates, X, are known so that wA(XC) = wA(X).

First we have to address the physical problem of defining wA(X) for two types of information:

  • Constraints limiting hypothesis space (or focusing operations) [26], Ω: The set of allowed states is limited to those in which Q is a member of the set, Ω; and
  • Maximum entropy constraints, F: The probability distribution of the system, given that F is accepted, is the most likely observational distribution that obeys ⟨f(x)|FC⟩ = F for any C.

Once specified, these will determine how transformations between states of knowledge are carried out using the information algebra.

A general pattern forms for assigning wA(XC) by first finding a minimal set of relevant information XY, implied by XC so that A is conditionally independent of C when XY is known, simplifying the weight to wA(XC) = wA(XY). Comparing wA(XY) for different A then suggests an appropriate relative weight. This relative weight problem is similar to the problem of factoring Z[QAC] in belief networks [25].

3.1 Degeneracy Factors

To find wΩ (Q), we start from an implicit definition of Ω as re-normalizing:

$$P(Q|\Omega C) = \frac{I(Q \in \Omega)\,P(Q|C)}{\sum_{\{Q\}} I(Q \in \Omega)\,P(Q|C)} \tag{12}$$

where the indicator function, I (·) is one when the condition is satisfied, and zero otherwise. Two constraints that both allow Q should be equally likely given the same starting information, C, leading to the assignment

$$w_\Omega(Q) = I(Q \in \Omega) \tag{13}$$

with partition function

$$Z[\Omega C] = \sum_{\{Q\}} I(Q \in \Omega)\,Z[QC] \tag{14}$$

This is consistent with Z[Θ] of the last section as well as the free energy cost for inserting a hard core solute into solution [27] or imposing a constraint on the geometry of an ion binding site [28].

Now consider the multi-ion binding site in a K+ ion channel selectivity filter (Fig. 2) [29]. Four cationic binding sites are distinguished, and it is assumed that the channel presents a high enough energetic penalty to exclude the possibility of anion occupancy. We do not expect multiple ion occupancy of the same site to be possible (or highly probable) because of mutual electrostatic repulsion and geometric features of the channel. This leads us to the fermion-like default statistics,

Fig. 2
KcsA ion channel selectivity filter in its biological orientation (intracellular solution below) showing ion binding sites S1–S4. For visual clarity, two of the four identical monomer units are not shown. Physiological conventions for the potential ...

$$\mathcal{X} = \oplus\,[\,n_1 n_2 \cdots n_k : n_i \in \{0,1\}\,] \tag{15}$$

where n particles may occupy k states in $\binom{k}{n}$ ways for a total of 2^k elementary states of the system. The notation ⊕[·] is used to mean that each of the referenced states is mutually exclusive.

In the absence of any other information, each state is equally likely, and

$$P(X|\mathcal{X}) = 2^{-k} \tag{16}$$

This probability distribution factors into a product of independent distributions for each site, with equal probability for occupied and unoccupied states. The distribution is shown in Fig. 3a.

Fig. 3
Ion occupancy distribution in successively complex models. Number distributions are plotted normally (right scale), while probabilities for occupancy of individual sites are plotted vertically (top scale). The set of figures on the left do not include ...

The partition function is the number of states, Z[𝒳] = 2⁴ (14). Using the same equation, the partition function of a constrained system, for example at fixed N, is $Z[N\mathcal{X}] = \binom{K}{N}$. The much debated ‘(in)distinguishability factor’ for particle counting [30, 31], as well as a volume factor, have already crept in as a consequence of the definition in (15) since in the limit K ≫ N, $\binom{K}{N} \to K^N/N!$. It is easy to see from the arguments leading to (15) why Z[N𝒳] should always be the size of the fundamental domain of symmetry or unique space over which any function can be defined (e.g., a crystallographic unit cell). This always gives division by the correct symmetry factor.
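A short enumeration (a sketch; only the 2^k state space of (15) is assumed) confirms the counting claims made above.

```python
from itertools import product
from math import comb, factorial

k = 4  # binding sites S1..S4, each empty (0) or singly occupied (1)
states = list(product((0, 1), repeat=k))
assert len(states) == 2 ** k                 # Z = 2^4 = 16 states

# Constrained partition function at fixed N is the binomial count, eq. (14)
for N in range(k + 1):
    assert sum(1 for s in states if sum(s) == N) == comb(k, N)

# Dilute limit K >> N: C(K, N) -> K^N / N!, the volume and
# (in)distinguishability factors
K, N = 10_000, 2
print(comb(K, N), K ** N / factorial(N))     # 49995000 vs 50000000.0
```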

3.2 Energy Functions

Formulas (9) and (10) are well-known relations in statistical mechanics when Boltzmann factors are inserted for the weights

$$w_F(X) = e^{-\lambda f(x)} \tag{17}$$

In that case, they identify P(X|F𝒳) as the canonical distribution and generate free energy perturbation and umbrella sampling formulas [32]. To develop physical intuition, we show in Appendix A how w_F(X) can be related to its definition at the beginning of this section using an intuitive derivation of the relative information entropy.

To go beyond the uniform distribution for ion occupancies, we may add a constraint on average energy, labeled by β. A simplified energy function is constructed for the ion channel system by including a mutual Coulomb repulsion between the ions, constrained to the vertical axis and spaced at 3.5 Å, close to the spacing observed in the 1K4C crystal structure [33]. We also assume a simple stabilization energy for each ion from the protein, E0 ≈ −111 kcal/mol, just strong enough to give multiple ion occupancy in Fig. 3c. Abbreviating N X to X, the energy function is

$$E(X) = \sum_{i<j} n_i n_j\,\frac{e^2}{4\pi\varepsilon_0\,d\,|i-j|} + E_0 \sum_i n_i, \qquad d = 3.5\ \text{Å} \tag{18}$$

Placing this constraint on the average system energy at constant N leads to the well-known canonical distribution with partition function

$$Z[\beta N\mathcal{X}] = \sum_{X\in N\mathcal{X}} e^{-\beta E(X)}$$

Here, it can be seen that the probability for N𝒳, proportional to Z[N𝒳], cancels in the expression so that the increment Z[βN𝒳]/Z[N𝒳] is an average according to (5). Removing the constraint on N also leads to the multicanonical ensemble in the same way, viz. Z[β𝒳] = Σ_N Z[βN𝒳] (14), P(N|β𝒳) = Z[βN𝒳]/Z[β𝒳].

In either case, we can assign the parameter β the meaning of, “there exists a physical mechanism that decreases the likelihood of the system being in a high-energy state.” To separate these energy states, we introduce a constraint on the energy, denoted by E. Thus, if a system were allowed to choose its own energy state,¹ the force would bias this choice according to P(E′|A)/P(E|A) = e^{−β(E′−E)}. We can set this bias, β, to give a reference system with known properties by exactly balancing its internal tendency toward higher energy, P(E + dE|A)/P(E|A) e^{−βdE} = 1. This implies that β should solve β = d ln Z[E𝒳]/dE for a reference system with known energy; for example, a thermometer in which energy is easily measured by size expansion. Because our reference thermometer is constantly exchanging energy with the environment, we usually observe its average energy, and β should be chosen such that ⟨E|β𝒳⟩ matches this observed average. The difference between these values (maximum vs. average energy) is important for small systems, but becomes negligible in the limit of large system sizes [34]. Using either of these forces in the present system mimics the effect of allowing energy exchange between the thermometer at this state and the system. This explains the convention of identifying temperature with the dilation of a thermometer and its connection to the statical force, β.

By the device of a reference system, the physical nature of the Lagrange multiplier, β, has changed from an absolute constraint on the average energy of the system of interest into a force biasing its energy. The information F thus has a different quality than the information λ in (17) because the first implies that λ re-adjusts when further information is added.

Another constraint we may add is the inclusion of an external force on the total number of ions, μ. Because the n ions are more likely to choose an environment with lower energy, −μn, this changes the probability of ion occupancy by e^{βμn}. The multiplier β appears because we want to express μ in energy units. Just as above, we can choose the chemical potential, μ, to give a reference system with known properties by balancing its internal energy change on ion addition, using the choice βμ = −ln(Z[β(N+1)𝒳]/Z[βN𝒳]) [35]. We can mimic the effect of allowing K⁺ transfer from a bulk 100 mM KCl solution to a reference volume of V₀ = 4 Å³ in the present system (with the corresponding Cl⁻ moved to a similar environment and its contribution neglected) by choosing μ_K⁺ = −81 + β⁻¹ ln(0.1 V₀) kcal/mol [36]. Without this constraint on N, the system is effectively allowed to exchange particles with vacuum. The combination of both constraints, which we refer to as F = βμ, is shown in panel (c) of Fig. 3. The preference for the separated state (X₁X₄) in this model shows the effect of mutual ion repulsion.
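The model of this section can be evaluated directly. The sketch below assumes a bare-Coulomb reading of (18) (332.06 kcal·Å/mol for e²/4πε₀, no dielectric screening) and takes β, E₀, d, and μ from the text; these conventions, and the literal use of 0.1·V₀ inside the logarithm, are assumptions for illustration rather than the paper's exact implementation.

```python
import numpy as np
from itertools import product

beta = 1 / 0.596            # (kcal/mol)^-1, near room temperature (assumed)
E0 = -111.0                 # per-ion stabilization by the protein, kcal/mol
d = 3.5                     # inter-site spacing, Angstroms
COUL = 332.06               # e^2/(4 pi eps0), kcal*Angstrom/mol
mu = -81.0 + np.log(0.1 * 4.0) / beta   # mu_K+ = -81 + beta^-1 ln(0.1 V0)

def energy(s):
    """Assumed form of eq. (18): pairwise Coulomb repulsion plus E0 per ion."""
    occ = [i for i, n in enumerate(s) if n]
    return E0 * len(occ) + sum(COUL / (d * (j - i))
                               for a, i in enumerate(occ) for j in occ[a+1:])

states = list(product((0, 1), repeat=4))
w = np.array([np.exp(-beta * (energy(s) - mu * sum(s))) for s in states])
P = w / w.sum()                           # P(X | beta mu), cf. Fig. 3c
for s, p in sorted(zip(states, P), key=lambda sp: -sp[1])[:4]:
    print(s, f"{p:.3g}")                  # most probable occupancy patterns
```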

4 The Generalized Ensemble Model for Including Conformations

The theoretical background in Sect. 2 allows us to go further than the most common relations of thermostatics. In particular, the choice of coordinate space, 𝒳, is no different than any other constraint except that it is almost never moved to the left-hand side to form quantities such as P(F𝒳|I), and comparisons between states are carried out almost exclusively with a fixed 𝒳. The addition of coordinates is associated with the transition from canonical to multicanonical ensembles. It has served as the starting point for some very difficult reading in thermodynamics textbooks involving over/under counting and (in)distinguishability arguments.

Since the rules have already been given above, we proceed to an example: addition of protein-ion interactions by assuming a set of protein conformational states. A simplistic example is provided by assuming (in addition to an open state, O) two ‘C-type’ inactivated states in which a pinching motion of the pore prevents occupancy at site 2 (state I1) or sites 2 and 3 (state I2, see Fig. 2) [37]. These states are assumed to be mutually exclusive and exhaustive, so that all conformational states, Y, are a member of the space Ω = ⊕(O, I1, I2). Before any coupling is assumed, the total number of occupancy states, |𝒳|, is multiplied |Ω| times to create the product space, Ω × 𝒳.

To add coupling, introduce a hypothesis, G, stating the unallowed joint conformations. Using (13) and (10),

$$P(XY|G\mathcal{X}\Omega) = \frac{I(XY \in G)\,P(XY|\mathcal{X}\Omega)}{\sum_{XY} I(XY \in G)\,P(XY|\mathcal{X}\Omega)} \tag{19}$$

But since G is just another piece of information on XY,

$$P(XY|FG\mathcal{X}\Omega) = \frac{w_F(XY)\,I(XY \in G)}{Z[FG\mathcal{X}\Omega]} \tag{20}$$

Summing over X gives P(Y|FG𝒳Ω) = Z[YFG𝒳]/Z[FG𝒳Ω]. The partition function again has the interpretation of an unnormalized probability. This idea forms the basis for understanding the ratio between Z[AFG𝒳] and Z[BFG𝒳] as a log-likelihood ratio between two Hamiltonians, and for extending a canonical ensemble into a multicanonical one. If the number of states changes for this process, then as we have shown for the grand-canonical ensemble, our definition of P(Ω|I) (14) counts each ‘state of knowledge’ once, and thus directly accounts for (in)distinguishability factors.

Incorporating the conformational state information, G, into the ion channel system leads to the results shown in panels (b), no energetic constraint, and (d), constrained chemical potential and energy, of Fig. 3. Because fewer states are available to the system in conformations I1 and I2, they appear less often. Colloquially, they are said to be entropically unfavorable. In our derivation, this entropy decrease came about from adding information G to F𝒳Ω. Using the definition of the entropy given in (34) implies that the relative entropy for the addition F𝒳 → F𝒳Ω is zero. This should be expected for a measure of information since the ability to observe a new variable, Y, that is nevertheless completely random adds no real information. The statement, ‘I2 is entropically unfavorable’ is therefore expressing the fact that the accessible volume for X has decreased from some previously available volume upon changing Ω to I2 or upon adding information I2G.

The conventional thermodynamic entropy implicitly defines a previously available volume, regardless of whether such a state physically exists. Instead of this behavior, it seems preferable to define the entropy relative to the completely uniform distribution, as we have done here. In this case, the probability for occupying degenerate (but distinguishable) states increases because of the counting conventions of the partition function. This dependence is made explicit in the present definitions of likelihood ratios and relative entropies.

Comparing the default model to an assignment of free energies calculated in Ref. [29] shows a stronger preference for occupancy at S2, S4 than S1, S4 due to a large stabilization for occupancy at S2. The crystal structures of Ref. [37] show decreases in occupancy at this site due to a pore-domain conformational change, and it is interesting to speculate that this conformational change is involved in destabilizing S2 during ion permeation. In our analysis up to this point, an assumption for the channel conformation has had the same effect as assuming an energy function for the system. Labeling the conformations and allowing them to change gives the conformations the interpretation of an additional system coordinate. For the system to destabilize S2, the I1/I2 conformations would require an additional biasing energy from the environment. Another way to approach the problem is to use ion occupancies averaged over conformations along with information on their coupling to infer the conformational distribution. This method will be shown in the next section.

5 Including Time-Dependence Using Conditional Information

If we had assumed some experimentally known probability distribution over X instead of the energy function assumed for F in the last example, then adding information G becomes qualitatively different. To avoid interfering with the distribution over X, the information F must take priority over any other constraints we may add to the problem. However, this does not prevent us from coupling Y to X using the conventional maximum-relative entropy hypothesis,

G: The probability of XY, given that G is accepted, is the most likely observational distribution that obeys ⟨g(y; x)|AXG⟩ = G(X) for any AX.

The entropy functional (34) decomposes as

$$\mathcal{S}\big[P(XY|AG\mathcal{X}\Omega)\big] = \mathcal{S}\big[P(X|AG\mathcal{X}\Omega)\big] + \sum_{X} P(X|AG\mathcal{X}\Omega)\,\mathcal{S}\big[P(Y|AGX\Omega)\big] \tag{21}$$

The sums in this section are all taken to be over X ∈ 𝒳 and Y ∈ Ω without loss of generality since we choose 𝒳 × Ω to be the set of all XY relevant to deciding A or G. The last term in the expansion above is a conditional entropy, which is a functional of P(Y|AGXΩ) and depends on X. Because each conditional distribution can be chosen independently from the others and from P(X|AG𝒳Ω), the entropy of each one is independently maximized when 𝒮[AG𝒳Ω|A𝒳Ω] is maximum. However, the presence of Y allows 𝒮[AG𝒳Ω|A𝒳] to differ from 𝒮[A𝒳|A𝒳] = 0, since P(X|AG𝒳Ω) = Σ_Y P(XY|AG𝒳Ω). For these two to be equal in general requires that P(X|AG𝒳Ω) = P(X|A𝒳), that is, that the distribution of X not be dependent on the information G when A is present.

Because we want to specify the marginal distribution of X directly, it is convenient to denote this information as the compound hypothesis,

FX: The probability distribution of X is determined by information FX and unchanged by information G.

When this hypothesis is in place, we will have P(X|FX G𝒳Ω) = P(X|FX 𝒳). Bayes’ theorem says that we must also have P(G|X FX 𝒳Ω) = P(G|FX 𝒳Ω), implying w_G(FX X) = 1. Effectively, the Y have become ‘imaginary states’ to the system in the sense that there is no free energy change for FX 𝒳 → FX G𝒳Ω.

Although there is no change to 𝒮 or the distribution of X, maximizing (21) results in

$$P(Y|F_X G X\Omega) = \frac{e^{\lambda g(y;x)}}{\sum_{Y'\in\Omega} e^{\lambda g(y';x)}} \tag{22}$$

an expression reminiscent of the transition probability for a Markov process. The conditional entropy is

$$\mathcal{S}[Y|X] = -\lambda\,\langle g(y;x)|GX\Omega\rangle + \ln\!\Big(\sum_{Y\in\Omega} e^{\lambda g(y;x)}\Big)$$

and we define as usual

$$Z[GX\Omega] \equiv \frac{1}{|\Omega|}\sum_{Y\in\Omega} e^{\lambda g(y;x)} \tag{23}$$

These considerations are sufficient to fill out the thermodynamic cycle when FX is assumed, as has been done in the left half of Fig. 4.
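A compact sketch of the conditional construction (the flux function g and multiplier λ below are hypothetical): each row of the transition matrix is an independent maximum entropy distribution, and the marginal fixed by FX is left untouched, as required.

```python
import numpy as np

def conditional_maxent(g, lam):
    """Assumed reading of eq. (22): for each fixed X, P(Y|X) is the maximum
    entropy distribution under a constraint on <g(y; x)>."""
    w = np.exp(lam * g)                    # g[x, y] couples Y to X
    return w / w.sum(axis=1, keepdims=True)

g = np.array([[0.0, 1.0],                  # hypothetical g(y; x)
              [1.0, 0.0]])
P_Y_given_X = conditional_maxent(g, lam=2.0)

p_X = np.array([0.7, 0.3])                 # marginal specified by F_X
p_XY = p_X[:, None] * P_Y_given_X          # joint distribution over XY
assert np.allclose(p_XY.sum(axis=1), p_X)  # F_X is unchanged by adding G
print(P_Y_given_X)
```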

Fig. 4
Reaction diagram for adding conditional maximum entropy information. Partition functions, determined by likelihood ratios for each transition, are written out for each state. For the ‘forward’ process FX 𝒳 → FX G𝒳Ω, there …

Imposing the distribution among ion occupancy states given in Ref. [29] (shown for reference in Fig. 3f) as FX, application of this procedure to determine the conformational equilibrium shows that the channel is almost always in the open state due to the high probability for occupancy of S2. The probabilities for I1 and I2 are 2.3 · 10⁻⁴ and 8 · 10⁻⁶. Although X|FX is independent from G, knowledge of Y is still informative for X, as

$$P(X|Y F_X G\mathcal{X}\Omega) = \frac{P(Y|F_X G X\Omega)\,P(X|F_X\mathcal{X})}{\sum_{X'} P(Y|F_X G X'\Omega)\,P(X'|F_X\mathcal{X})} \tag{24}$$

Using this method of inference, the occupancy distribution in the open state is shown in Fig. 3e. There is a very slight increase in occupancy at S2 and a decrease at S3, but the effect is small because the open structure is dominant. Note that our assumption that the free energies of Ref. [29] are averages over the conformational states is at odds with the crystal structures of Ref. [37], indicating that motions around the S2 site not seen in short simulations may play a role in destabilizing this site, enabling ion translocation through S1 and S3.

5.1 Transient, Nonequilibrium Transition Processes

We argue that addition of conditional maximum entropy information is central to nonequilibrium statistical mechanics. To derive an ensemble of trajectories, we add all possible transitions, Y, originating from each state, X. The initial state and its transitions are linked by some information, G, which determines the distribution of Y given X. This constraint determines a maximum entropy transition probability density, as considered in differential form in Refs. [38, 39] and suggested in Ref. [40]. The hypothesis FX states that what we know about the starting distribution is completely determined by FX and not by any possible, but unknown, future events. It is required for the process to be non-anticipating in the sense that no information about processes we may carry out in the future, G, is available from X.

By focusing on the information loss during a stochastic transition, we derive fluctuation formulas for irreversible entropy production that include a contribution from the instantaneous information entropy. Figure 4 displays the duality between fixing FX at the initial time and fixing its propagated distribution FZ. In setting up an inference problem for Y starting from FX GX, the distribution of Y is given by (22). If this distribution is used to determine FZ using P(Z|FX G𝒳) = Σ_{XY} P(Y|FX GXΩ) P(X|FX 𝒳) I(Z = Z(Y)), some information loss occurs when FX is discarded and only FZ and information constraining the transitions between states, G, are retained. Assuming the transitions, Y, specify both end-points X, Z, the distribution of Y carries the complete information for this process. Using the information loss metric [13, 41],

$$L = \sum_{Y} P(Y|F_X G\mathcal{X}\Omega)\,\ln\frac{P(Y|F_X G\mathcal{X}\Omega)}{P(Y|F_Z G^{*}\mathcal{X}\Omega)} \tag{25}$$

The averaging is taken in the forward direction, and so L ≥ 0 evidently represents the amount by which the real distribution, Y|FX G, contains information not present in a distribution guessed from FZ G*. Note that if G allows only one-to-one X → Z transitions, the dynamics are deterministic, and zero information is lost. More generally, if forward and backward inference directions yield the same joint distribution so that FX G = FZ G*, then there is no way to discern the direction of time’s arrow and no information is dissipated.
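For a two-state jump process the loss (25) can be computed in closed form; the sketch below also checks it against the entropy-difference-plus-path-sum decomposition of (26). The reverse guess G* used here (running the same dynamics backward from FZ) is an illustrative choice, not the paper's.

```python
import numpy as np

T = np.array([[0.9, 0.1],        # forward transition matrix P(Z|X G)
              [0.4, 0.6]])
p_X = np.array([0.5, 0.5])       # initial state of knowledge F_X

J_fwd = p_X[:, None] * T         # joint over transitions Y = (X, Z)
p_Z = J_fwd.sum(axis=0)          # propagated marginal F_Z

# Reverse-inference guess G*: the same dynamics run backward from F_Z.
J_rev = (p_Z[:, None] * T).T     # guessed joint P(Z) P(X|Z G*)

L = np.sum(J_fwd * np.log(J_fwd / J_rev))    # information loss, eq. (25)
S = lambda p: -np.sum(p * np.log(p))         # instantaneous entropy
path = np.sum(J_fwd * np.log(T / T.T))       # irreversible flux term
print(L, S(p_Z) - S(p_X) + path)             # equal, cf. eq. (26); L >= 0
```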

The above relations are purely statistical, and have been stated in terms of maximum entropy constraints for forward, G, and reverse, G*, inference problems. They are generally valid for any choice of G*. In derivations of the fluctuation theorem [42], a particular choice of G* is made corresponding to time-reversed equations of motion. The statistical perspective expressed here shows that this operation is confined to the choice for G*, and provides a suggestion as to the informational role of time-reversal. For example, the forward constraints are consistent with the Langevin equation,

$$\Delta p = (F - \gamma v)\,\Delta t + \sqrt{2\gamma\,\Delta t/\beta}\;\xi, \qquad \xi \sim \mathcal{N}(0,1)$$

so that the momentum change (Δp = p_Z − p_X) is normally distributed about F − γv to yield a Boltzmann distribution. The correct choice of G* is given by changing β to −β in the above equation. The equation for Brownian motion can be similarly derived by constraining Δx² with σ⁻²/2 and −ΔxF/2 with β. In both of these equations, the same set of forward transitions is used for G*, but the sign of the Lagrange multipliers constraining the fluxes is reversed. We can thus intuitively see that reversing the sign of externally applied forces gives the correct fluctuation theorems using the information loss metric (25). This relation is valid in transient stochastic dynamics, and allows for entropy to increase both by increasing the entropy of the distribution (first part of (25)) and by the presence of irreversible fluxes (last term of (25)). Such an informational perspective is required for understanding entropy increase for processes that do not have time-reversal symmetry, but nevertheless have well-defined and reproducible behavior.

Retaining only information about the end-points of a path Γ = X₁X₂⋯X_N, from F₁ to F_N, we denote Γ_i = X₁⋯X_i and Γ^i = X_i⋯X_N. We also assume constant 𝒳 and conditional independence, P(X_{i+1}|Γ_i F₁) = P(X_{i+1}|Γ_i). If the transitions are known from Γ, the total dissipation is

$$\frac{L}{k_B} = \mathcal{S}[F_N\mathcal{X}] - \mathcal{S}[F_1\mathcal{X}] + \sum_{i=1}^{N-1}\Big\langle \ln\frac{P(X_{i+1}|\Gamma_i\mathcal{X})}{P(X_i|\Gamma^{i+1}\mathcal{X})}\Big\rangle \ge 0 \tag{26}$$

where k_B is the Boltzmann constant. The path sum on the right is in agreement with the thermodynamic entropy production given by the ratios of forward and reverse path probabilities [42–44] as well as an expression for entropy production deduced from mechanical considerations [38] when ln P(X_{i+1}|Γ_i𝒳)/P(X_i|Γ^{i+1}𝒳) = −λg(x_{i+1}, x_i), with g a generalized flux. The left side identifies a contribution from the instantaneous information entropy of the system. We have derived this result from the direction of information propagation [45], and no special treatment has been given to the multiplier, β, defining the externally applied temperature. This derivation also avoids the complications associated with defining a steady-state. A curious feature is that it does not make specific reference to heat. This may be explained by noting that the transitions associated to fluxes, g, are probabilistic and represent interaction with an external system. These transitions may add or remove energy from our system, while the external system remains at a fixed thermostatic temperature, β_ext. We then define the heat injected from the environment as the net energy gain, β_ext dQ = ⟨λg(x_{i+1}; x_i)⟩. This identifies (26) with the Clausius form for the second law [15, 46, 47],

$$d\mathcal{S} \ge \beta_{\text{ext}}\,dQ \tag{27}$$

The above claims relating transition probabilities to fluxes can be established for the Langevin and Brownian equations, and have been more thoroughly explored in a manuscript devoted to nonequilibrium problems [18].

The next result will be a derivation of generalized Green-Kubo relations as non-equilibrium Gibbs-Maxwell relations. Because our free energy for the process A = F_{X₁}𝒳₁G₁₂𝒳₂G₁₂₃⋯ is simply the free energy for F_{X₁}𝒳₁, we must find an alternate free energy functional. Notice that the partition function for the transition Γ_iΩ → G_iΓ_iΩ is Σ_{Y|Γ_i} e^{λ_i g(y;Γ_i)}/|Ω| (by summing the top and bottom of (23)), so that we can define

$$-\mathcal{F}[\{\lambda_i\}] \equiv \sum_i \ln\Big(\frac{1}{|\Omega|}\sum_{Y|\Gamma_i} e^{\lambda_i g(y;\Gamma_i)}\Big) \tag{28}$$

The first derivatives generate a ‘first law’ relating time-dependent fluxes to forces for nonequilibrium processes,

$$-\frac{\partial\mathcal{F}}{\partial\lambda_i} = \langle g(y_i;\Gamma_i)\rangle$$

The second derivatives give a time-asymmetric Green-Kubo-like formula,

$$-\frac{\partial^2\mathcal{F}}{\partial\lambda_j\,\partial\lambda_i} = \langle g_i g_j\rangle - \langle g_i\rangle\langle g_j\rangle, \qquad j \le i \tag{29}$$

The derivation is subtle, and full details are given in Appendix B.

The thermal, λ_i, and mechanical, g_i, driving protocols should be understood as specifying the properties of the external system. Constant constraints correspond to connection to a constant external driving force, while the stochastic nature of the transitions implicitly defines an external heat bath. The process defined by (22) can also be history-dependent. By analogy to the equilibrium process, either the average relation, G, or the force, λ, can be set. If these are, in turn, history-dependent, then a new set of possibilities for time-dependent driving based on the behavior of the system arises. For instance, setting λ as a function of the current integrated over previous times, Σ_{j<i} g(Y_j), connects one port of our system to a capacitor, while constraining G as a function of the integrated force, Σ_{j<i} λ_j, connects the system to a type of inductor [48].

For the ion channel example we have been developing, a completely new set of constraints must be developed for transitions between states. For the forward problem, we are given X_i as well as some set of feasible transitions, Y|X_i, from state i. Because the probability of the inactivated states is negligible, we consider only the open channel state, and single-jump transitions as shown in Fig. 2 of Ref. [29]. Five transitions from each state are possible, corresponding to doing nothing, or all sites moving up or down with the addition of a K⁺ or a water at the appropriate end.

In order to produce a system that conserves energy, we place a constraint on the energy change at each step,

$$P(Y|X_i\beta'\Omega) = \frac{e^{-\beta'\Delta E(Y)}}{Z[X_i\beta'\Omega]} \tag{30}$$

This amounts to a stochastic addition of energy to the system with average value ⟨ΔE|X_i β′Ω⟩. The steady-state distribution will differ from the canonical distribution in general because the normalization constant, Z[X_i β′Ω], depends on X_i. This difference has come about because of the addition of information limiting which transitions are possible. If all states were available during each transition, the normalization constant would again be independent of X_i and we would recover the canonical distribution. For the Langevin and Brownian equations with uniform applied temperature, the canonical distribution is also obtained because the normalization constant is independent of X_i.

Because transitions are not generally spontaneous, but may have an energy barrier, we add another constraint, β′E‡, directly on the number of transitions per time-step, τ,

$$P(Y|X_i\beta' E^{\ddagger}\Omega) \propto e^{-\beta'\left(\Delta E(Y) + E^{\ddagger}\,I(Y \neq X_i)\right)} \tag{31}$$

These barriers could, of course, be made to depend arbitrarily on the transition, Y. For simplicity we assume that they are present only when a transition occurs and are uniformly equal to 2 kcal/mol. The stochastic process specified by these two formulas has the identity matrix as the small time-step limit, and an equilibrium-like distribution as the large step limit. The energy barrier assumption differs from the usual rate equation formulation, since the Chapman-Kolmogorov equation no longer holds. Instead, the behavior of the above system is dependent on the time-scale studied, reminiscent of fractal kinetic models [49]. Because this is a novel kinetic model, it remains to be seen how well these two constraints reproduce actual dynamics; however, the form of this equation matches well the nonlinearity near t = 0 in exact transition probabilities computed for the Müller-Brown potential surface (Fig. 4 of Ref. [50]), while variations in the surface chosen to divide states can be mimicked by changes in E‡. We can recover a Markov model by noting that E‡ may be a function of the time-step, τ, to give a specified average number of transitions.

To finish our specification of nonequilibrium jump processes, we add forces on spontaneous ion creation and annihilation. Removing the possibility of a change in ion number unless it either enters or exits through an end of the channel, we can then specify the external force, μ, acting on these special events using the same type of energy constraint (and assuming for simplicity the same energy barrier) as above. This leads to

$$P(Y|X_i\beta'\mu\Omega) \propto \exp\!\left[-\beta'\left(\Delta E(Y) + E^{\ddagger} - \mu_{\mathrm{int}}\,dN_{\mathrm{int}} - \mu_{\mathrm{ext}}\,dN_{\mathrm{ext}}\right)\right] \tag{32}$$

with dNint and dNext representing the number of ions added to the system (±1) from the internal and external solutions, respectively. The form of this transition probability is similar to that of a recent paper on currents in boundary-driven Kawasaki dynamics [51], which were also analyzed using a cumulant-generating function similar to (28).

An outward-driving voltage can be added to the system by imposing an external field, increasing the likelihood for transitions moving ions outward by an amount e^{βΔV g(Y)}. The function g(Y) counts the average number of ions taking a step outward during transition Y, consistent with the sign convention of Fig. 2. For ion movements internal to the channel, this has an equivalent effect on the path distribution as applying an energy constraint e^{β Σ_j V_j I(X_j)} (I(·) is the indicator function). These constraints provide a physically motivated kinetic model for our ion channel in arbitrary solution conditions and driving voltages.
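The full jump model can be simulated with a few lines of kinetic Monte Carlo. Everything below is a schematic reading of (30)–(32): the energy function repeats the Sect. 3.2 assumptions, and the 2 kcal/mol barrier, the single-file move set, and the sign convention for g(Y) are assumptions made for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
beta = 1 / 0.596                          # (kcal/mol)^-1
E_barrier = 2.0                           # uniform transition barrier, kcal/mol
mu = -81.0 + np.log(0.1 * 4.0) / beta     # K+ chemical potential, Sect. 3.2
dV = 0.0                                  # outward voltage bias per ion step

def energy(s):
    """Same assumed reading of eq. (18) as in Sect. 3.2."""
    occ = [i for i, n in enumerate(s) if n]
    return -111.0 * len(occ) + sum(332.06 / (3.5 * (j - i))
                                   for a, i in enumerate(occ) for j in occ[a+1:])

def moves(s):
    """Five transitions: do nothing, or all contents shift one site with a
    K+ (1) or water (0) entering at one end.  Returns tuples
    (new state, dN_int, dN_ext, g = ions stepping outward)."""
    out = [(s, 0, 0, 0)]
    for new in (0, 1):                                   # enter from inside
        out.append((s[1:] + (new,), new, -s[0], sum(s)))
    for new in (0, 1):                                   # enter from outside
        out.append(((new,) + s[:-1], -s[-1], new, -sum(s)))
    return out

def step(s):
    cand = moves(s)
    w = []
    for t, dni, dne, g in cand:
        dE = energy(t) - energy(s)                 # energy constraint, eq. (30)
        dE += E_barrier if t != s else 0.0         # transition barrier, eq. (31)
        dE -= mu * (dni + dne)                     # creation/annihilation, (32)
        dE -= dV * g                               # field bias, e^{beta dV g(Y)}
        w.append(np.exp(-beta * dE))
    w = np.array(w)
    return cand[rng.choice(len(cand), p=w / w.sum())][0]

s, counts = (0, 1, 0, 1), {}
for _ in range(20_000):
    s = step(s)
    counts[s] = counts.get(s, 0) + 1
print(sorted(counts.items(), key=lambda kv: -kv[1])[:4])
```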

The steady-state ion occupancies at zero applied voltage and μ identical to that for (e) and (f) of Fig. 3 are plotted in panel (g). The steady-state distribution is slightly altered from the local equilibrium prediction of (e). This happens despite the fact that the transition probability obeys detailed balance with respect to the steady-state, and exactly five transitions lead into each ion occupancy state. The reason is that the transition probability is normalized by a different value for the forward and reverse transitions.

As a final note, the current can be calculated as a perturbation from a steady-state using (29)

$$\delta\langle g_i\rangle = -\beta\,\delta(\Delta V)\sum_{j\le i}\left(\langle g_i g_j\rangle - \langle g_i\rangle\langle g_j\rangle\right) \tag{33}$$

This gives the time-dependent linear response for small changes in the holding potential. The conductance near the resting potential is the time-integral of the steady-state current autocorrelation function (at zero average current), in accordance with Onsager’s phenomenological equation [52]. The negative sign comes about because of the positive sign of the constraint (βΔV). At other voltages, this integral is the slope of the current/voltage curve. The additive, constant time-asymmetry of (29) explains why Onsager reciprocity only holds near equilibrium, where the fluxes are zero. Other Legendre transforms of (28) lead to relationships at fixed currents or forces, as in the usual theory [34].
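A sketch of the corresponding numerical estimate: given a recorded time series of the per-step flux g(Y_i) from a steady-state run of the model above, the conductance is approximated by a truncated time-sum of the autocorrelation (the function name and truncation scheme are assumptions, not the paper's fitting procedure).

```python
import numpy as np

def conductance(g_series, beta, n_max=500):
    """Fluctuation-dissipation estimate per the assumed reading of (33):
    the I-V slope from a discrete time-sum of the steady-state current
    autocorrelation function at fixed voltage."""
    g = np.asarray(g_series, float)
    dg = g - g.mean()
    n = len(dg)
    # one-sided autocorrelation C_j = <dg_i dg_{i-j}>
    C = np.array([np.dot(dg[j:], dg[:n - j]) / (n - j) for j in range(n_max)])
    return beta * (C[0] / 2 + C[1:].sum())   # trapezoid-like sum over j <= i

# Usage: record g(Y_i) at each step of the jump process above, then
#   sigma = conductance(g_trace, beta=1/0.596)
```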

The current-voltage characteristics calculated for a single channel using a 1 ps time-step are shown in Fig. 5. The fluctuation-dissipation theorem (33) gives the slope of the current-voltage curve, plotted as a tangent line at each data point. Low transition probabilities between conduction states with high free energy barriers lead to very long relaxation times (O(10⁵) steps ≈ 0.1 μs) in this system. The (usually large) contribution to (33) from the tail region was obtained by fitting steps 500–1000 to an exponential and integrating to infinity. Although this calculation demonstrates the numerical accuracy of (33), the choice of ion occupancy free energies and transition barriers does not produce good agreement with experiment. The set of energy barriers used leads to larger current magnitudes at hyperpolarized voltages (inward-rectifying behavior), inconsistent with the net-outward rectification observed in Kv and large-conductance BK homologues. It is of interest to model the transition energy barriers more accurately and determine whether the time-dependence of dwell times for individual states is adequately represented by equations of the present, maximum entropy, form.

Fig. 5
Current-voltage plot calculated using the free energies from Fig. 2 of Ref. [29] along with the assumptions listed in the text. The voltage plotted in this figure is the sum of the five voltage steps between S0–S5. Traces are labeled using internal/external ...

6 Conclusions

This work has provided a view of statistical mechanics as expressing relationships between states of knowledge. This viewpoint has interesting connections to modern information theory and its algebra. Thermostatic partition functions, Z[A𝒳], have been identified as expressing relative likelihoods. Changes in these functions correspond to changes in information, and can be understood as a subjective probability assignment determining relative likelihoods between allowed alternative states of the system. This interpretation of the partition function leads naturally to multicanonical ensemble and umbrella sampling methods [32].

To answer objections to such a subjective theory, we note that experiments are able to compare work and heat values to find agreement with thermostatics, provided a given system behaves according to the assumptions. In exactly the same way, Euclid’s geometry is able to deduce physically measurable distances, provided these objects behave as ideal solids. Subjectivity is present in both of these cases because assumptions are always required in order to calculate one quantity from another. The term ‘subjective’ simply acknowledges that this reasoning process proceeds from assumptions derived from experience. Physical predictions of objectively real phenomena can be made from a subjective theory based on assumptions that are objectively correct. This distinction explains why the structure of statistical mechanics has persisted throughout the developments of the last century and shows the practical utility of founding statistical mechanics on a mathematical theory of information. Because its basic axioms are conventions chosen to be logically consistent and in agreement with our intuition, the maximum entropy approach operates as a device for carrying out extended logic.

Comparisons between states of knowledge can be done using the methods in this report. The picture presented here does not require the specification of a complete set of all possible states of knowledge. Instead, the relations of Sect. 2 give a basic, consistent set of equations for defining the changes between these states. The algebra already justifies the appearance of (in)distinguishability factors in the partition function, as shown in Sect. 3. We have provided a justification for the common indicator function, $w_\Omega(C)$ (13), for comparing purely entropic changes in phase space, as well as the Boltzmann factor, for comparing changes in maximum entropy information, $P(F \mid \mathcal{X})/P(G \mid \mathcal{X}) = Z[F\,\mathcal{X}]/Z[G\,\mathcal{X}]$ (see Appendix A).

Two new types of information were introduced, corresponding to addition of states to a system and conditional maximization of the entropy. These operations provide alternative ways of looking at multiscale and nonequilibrium problems in terms of the Bayesian probability theory of Jaynes [13]. The concept of building up thermodynamic equations of state by adding system information is important for developing multi-scale understanding of large physical systems. The predictions of the coarse-grained theory may be compared with a fully atomistic (or ab-initio electronic) molecular dynamics simulation or coarse-grained Monte Carlo sampling. Here the number of states will be greatly increased to include coordinates and momenta of all particles, with a change in the energy function to a more accurate approximation. The information entropy for adding coordinates, however, will remain zero whenever the distribution is unchanged by maximizing entropy, because the entropy was defined only relative to a reference distribution. As this level of description becomes computationally intractable, the approximate potential of mean force derived from high-level considerations may be useful for locating important states for detailed study, deriving stochastic boundary conditions, and applying force or energy biasing sampling techniques. We have shown this line of reasoning for the KcsA ion channel by calculating a current/voltage curve with interesting properties at depolarized voltage due to the energy barrier in moving outward from S2. Further work on conformational transitions associated with this movement should be particularly relevant to the physical mechanism limiting outward current and may have implications for Ba2+ ‘lock-in’ experiments [53, 54].

The addition of constrained maximum entropy information in Sect. 5.1 allows a treatment of nonequilibrium problems. Starting with a ‘trajectory space’ and adding information on allowed transitions as well as expectation values of fluxes between states leads to a state of knowledge about the process. In our formalism, the ability to directly write down an equilibrium-style distribution for the driven system (a long-sought goal [16, 17, 55]) disappears, in the same way that a marginal distribution over coarse-grained variables cannot be produced directly from an equilibrium distribution over all atomistic coordinates and momenta. Instead, the transition distribution can be directly written, and the transient fluxes and eventual steady-state (if it exists) become path averages.

The Lagrange multipliers in the equilibrium theory are proxies for static forces on the constrained variables that are imposed by an external system. In the same way, the Lagrange multipliers biasing average energy exchange, number of transitions per time-step, ion currents, and particle insertion/deletion operations can be understood as dynamic properties of the external system. This implies that these dynamic forces may be determined by examining their action on a known reference system in the spirit of circuit theory, where resistors, capacitors, inductors, and memristors [48] form the prototypes for general time-dependent constraint relationships between forces, fluxes, and their integrated counterparts.

A consideration of the information loss for stochastic processes leads to a formula similar to the second law of thermodynamics (26), applicable arbitrarily far from equilibrium. An average of the one-step partition function in (28) gives a simple way to generate Green-Kubo-type fluctuation-dissipation theorems. We emphasize that these formulas are not required to be extensive or local [56–58], avoid the necessity of defining a steady-state [59, 60], and are independent of how we define fluxes, so that we do not have to immediately write down hydrodynamic equations [61]. This work has given a necessary statistical foundation for extending statistical thermostatics by carrying over modern equilibrium techniques such as the evaluation of free energy differences [62] and coordinate/path re-weighting techniques [63, 64]. These formulas achieve Jaynes’ goal of providing a “foundation for the predictive aspect of statistical mechanics, in which a single basic principle and method applies to all cases, equilibrium or otherwise” [45]. They imbue nonequilibrium and transient dynamic problems with the same structure as the equilibrium thermodynamics given by Gibbs [5], and open the door for a new understanding of processes far from equilibrium.

Acknowledgments

This work was supported, in part, by Sandia’s LDRD program, and, in part, by the National Institutes of Health through the NIH Road Map for Medical Research. TLB gratefully acknowledges the support of NSF grants CHE-0709560 and CHE-1011746. Sandia National Laboratories is a multi-program laboratory operated by Sandia Corporation, a wholly owned subsidiary of Lockheed Martin Corporation, for the U.S. Department of Energy’s National Nuclear Security Administration under contract DE-AC04-94AL85000.

Appendix A: Derivation of Maximum Entropy Weights

Our definition of F utilized the most likely observational distribution with respect to the probability distribution from an initial state, C, before F has been accepted. If we use information, F, as an assumption, it should come from known experimental data on the system. In order to establish F, we may therefore tabulate frequencies for $X \in \mathcal{X}$. If F, rather than C, turned out to be true, scientists basing their conclusions only on C would be increasingly surprised (or skeptical if the report is second-hand) at the evidence collected after N trials. This is because the probability of observing counts $n_X \approx N\,P(X \mid F)$ given $C\,\mathcal{X}$ would be (from the multinomial distribution),

$$P(\{n_X\} \mid C\,\mathcal{X}) = \frac{N!}{\prod_X n_X!} \prod_X P(X \mid C)^{n_X} \;\xrightarrow{N \to \infty}\; e^{N \mathcal{I}[F\,\mathcal{X} \mid C\,\mathcal{X}]}, \qquad \mathcal{I}[F \mid C] \equiv -\sum_X P(X \mid F) \ln \frac{P(X \mid F)}{P(X \mid C)} .
\tag{34}$$

According to C, the likelihood of such a set of observations decreases exponentially with the number of trials. This is a condensed version of the Wallis derivation for the entropy, presented in more detail in Ref. [13]. The limit taken in the second equation is as N → ∞, which is appropriate for assessing such a set of hypothetical observations or second-hand reports. Evidently, the Kullback-Leibler divergence, $-\mathcal{I} \ge 0$, represents the value of the information F (or difference of opinion) to an observer who has already accepted $C\,\mathcal{X}$. The relative information entropy, $\mathcal{I}$, reaches its maximum, zero, when the new information does not alter the distribution. For any reasonable comparison to be made, the distributions must be compared over the same set, $\mathcal{X}$, which should include any observational information that A or B may predict. As in the case for likelihood ratios (5), the relative entropy is independent of the distribution over irrelevant variables, $Y \in \mathcal{Y}$. This happens here because the probability assignments are identical over the subspace $Q \mid Y$ for each Y.
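The exponential decay of likelihood in (34) can be checked numerically: the per-trial log-probability of frequency data generated according to F, evaluated under C, converges to the (negative) Kullback-Leibler divergence. The two three-state distributions below are arbitrary illustrative choices, not quantities from the channel model.

```python
import numpy as np
from scipy.stats import multinomial

p_C = np.array([0.5, 0.3, 0.2])     # assumed state of knowledge C
p_F = np.array([0.4, 0.4, 0.2])     # distribution actually generating the data
kl = np.sum(p_F * np.log(p_F / p_C))        # -I[F|C] >= 0

for N in (100, 1_000, 10_000):
    counts = np.round(N * p_F).astype(int)
    counts[-1] = N - counts[:-1].sum()      # keep the total at N trials
    logp = multinomial.logpmf(counts, n=N, p=p_C)
    print(N, logp / N, -kl)                 # per-trial log-likelihood -> I[F|C]
```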

According to this maximum entropy argument, the least surprising distribution given information F is the maximum entropy distribution. This distribution should satisfy the mathematical condition,

$$P(X \mid F\,C) = \arg\max_{p} \Big\{ -\sum_X p(X) \ln \frac{p(X)}{P(X \mid C)} \;\Big|\; \sum_X p(X) f(X) = \bar{f} \Big\} .
\tag{35}$$

The unique solution to this condition is [13]

$$P(X \mid F\,C) = P(X \mid C)\, \frac{e^{\lambda f(X)}}{Z[F\,C]}, \qquad Z[F\,C] \equiv \sum_X P(X \mid C)\, e^{\lambda f(X)},
\tag{36}$$

for some λ(C), proving that the hypothesis F (35) is logically equivalent to assuming the probability assignment of (36). For continuous variables, a Jacobian, $\partial x/\partial X$, must accompany this equation because of the importance of continuous functions in thermodynamics. In a discrete setting, it has the effect of dividing P(X|C) to maintain its normalization.
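To make (35)–(36) concrete, the sketch below solves for λ(C) numerically in a discrete example: a fair-die prior P(X|C) updated by the constraint ⟨f⟩ = 4.5 (the classic Brandeis dice setup, used here as an illustrative assumption).

```python
import numpy as np
from scipy.optimize import brentq

p_C = np.full(6, 1 / 6)            # initial state of knowledge C: a fair die
f = np.arange(1, 7).astype(float)  # constrained observable f(X)
fbar = 4.5                         # new information F: the average of f

def constraint(lam):
    w = p_C * np.exp(lam * f)      # unnormalized weights of Eq. (36)
    return np.dot(f, w) / w.sum() - fbar

lam = brentq(constraint, -10, 10)  # lambda(C) satisfying the constraint
p_F = p_C * np.exp(lam * f)
p_F /= p_F.sum()                   # P(X|FC), normalized by Z[FC]
print(lam, p_F, np.dot(f, p_F))
```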

According to Bayes’ theorem,

$$P(X \mid F\,C) = \frac{P(F \mid X\,C)\, P(X \mid C)}{P(F \mid C)},
\tag{37}$$

and since two constraints that have the same weight at a given point should be equally likely when X is given, $w_F(X) = e^{\lambda f(X)}$. This should again equal $w_F(C\,X)$ as long as F is conditionally independent of C, given X.

Already a few important differences from the standard development can be noticed in the above. First, the commutativity of thermodynamic cycles is perhaps not as widely appreciated as it should be. Although it is well known that Z[C] is a state function because of its definition in (1), this shows that a sum of relative free energy differences around any closed loop of a thermodynamic cycle totals to zero, with the caveat that (9) may only be applied from a larger phase space to a smaller one. The same is not true of relative entropies (34), which give a sum dependent on the path taken. Instead, it is necessary to define $\mathcal{I}[\mathcal{F}\,C\,\mathcal{X} \mid \mathcal{X}]$ as the state function. Also, the entropy definition of (34) is independent of changes in phase-space volume because P(X|AI) transforms the same way as P(X|BI) for an injective change of variables X → Y.
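The contrast drawn here is easy to verify in a toy calculation: state-function differences telescope to zero around a closed loop, while the relative entropies of (34) sum to a path-dependent (generally positive) value. The three distributions below are arbitrary placeholders, not physical states from the text.

```python
import numpy as np

rng = np.random.default_rng(2)
A, B, C = (rng.random(10) for _ in range(3))
A, B, C = (p / p.sum() for p in (A, B, C))      # three states of knowledge

def kl(p, q):                                   # -I[p|q] of Eq. (34)
    return np.sum(p * np.log(p / q))

w = rng.random(10)                              # arbitrary positive weights
lnZ = lambda p: np.log(np.dot(p, w))            # a state function of p
# Differences of a state function telescope around the loop A -> B -> C -> A
print((lnZ(B) - lnZ(A)) + (lnZ(C) - lnZ(B)) + (lnZ(A) - lnZ(C)))   # exactly 0
# Relative entropies do not: their loop sum is path dependent and nonzero
print(kl(B, A) + kl(C, B) + kl(A, C))
```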

Appendix B: Differentiation of Path Averages

We assume that the path probability can be written in the form (22),

$$P(\Gamma \mid A) = P(X_0 \mid A) \prod_i \frac{P(X_{i+1} \mid X_i)\, e^{\lambda_i \cdot g_i(X_{i+1};\,\Gamma_i)}}{Z[\Gamma_i, \lambda_i]} .
\tag{38}$$

The derivatives of $-\ln Z[\Gamma_i, \lambda_i]$ are conditional averages,

$$\frac{\partial\, (-\ln Z[\Gamma_i, \lambda_i])}{\partial \lambda_{ik}} = -\big\langle g_{ik}(X_{i+1};\,\Gamma_i) \,\big|\, \Gamma_i \big\rangle .$$

For brevity, the explicit dependence on $\lambda_i$ has been omitted in the above. All averages in this section are taken to be over the full path distribution of (38), except for the conditions explicitly stated.

The dependence on $\lambda_{ik}$ of a general path average, $\langle f(\Gamma, \lambda)\rangle$, is given by

$$\frac{\partial \langle f(\Gamma,\lambda) \rangle}{\partial \lambda_{ik}} = \Big\langle \frac{\partial f}{\partial \lambda_{ik}} \Big\rangle + \sum_\Gamma f(\Gamma,\lambda)\, \frac{\partial P(\Gamma \mid A)}{\partial \lambda_{ik}} .$$

Since

$$\frac{\partial P(\Gamma \mid A)}{\partial \lambda_{ik}} = P(\Gamma \mid A) \big[\, g_{ik}(X_{i+1};\Gamma_i) - \langle g_{ik} \mid \Gamma_i \rangle \,\big]$$

and

$$\big\langle f\, \langle g_{ik} \mid \Gamma_i \rangle \big\rangle = \big\langle \langle f \mid \Gamma_i \rangle\, g_{ik} \big\rangle ,$$

we can re-state the above as

$$\frac{\partial \langle f(\Gamma,\lambda) \rangle}{\partial \lambda_{ik}} = \Big\langle \frac{\partial f}{\partial \lambda_{ik}} \Big\rangle + \big\langle \big( f - \langle f \mid \Gamma_i \rangle \big)\, g_{ik}(X_{i+1};\Gamma_i) \big\rangle .
\tag{39}$$
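A minimal single-step check of (39): for a distribution tilted by e^{λg}, the λ-derivative of an average equals its covariance with g. The three-state numbers below are arbitrary illustrative values.

```python
import numpy as np

p0 = np.array([0.2, 0.5, 0.3])     # reference single-step distribution
g = np.array([1.0, -0.5, 2.0])     # constrained transition function
f = np.array([0.3, 1.2, -0.7])     # observable, independent of lam here

def avg_f(lam):
    p = p0 * np.exp(lam * g)
    p /= p.sum()                   # tilted distribution, as in (38)
    return np.dot(f, p)

lam, h = 0.4, 1e-6
p = p0 * np.exp(lam * g); p /= p.sum()
cov = np.dot(f * g, p) - np.dot(f, p) * np.dot(g, p)      # <(f - <f>) g>
print((avg_f(lam + h) - avg_f(lam - h)) / (2 * h), cov)   # the two should agree
```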

To compute (29) in the case where the $g_i$ are functions of only the $(i \to i+1)$ transitions (i.e., $g_i(X_{i+1};\Gamma_i) = g_i(X_{i+1}; X_i)$), we take advantage of starting in a steady-state,

$$\big\langle g(X_{i+1}; X_i)\, g(X_{j+1}; X_j) \big\rangle = \big\langle g(X_1; X_0)\, g(X_{k+1}; X_k) \big\rangle ,$$

by defining $k \equiv j - i \ge 0$. We can write the correlation function as

$$\big\langle g(X_1; X_0)\, g(X_{k+1}; X_k) \big\rangle = \sum_{X_k, X_1, X_0} \big\langle g(X_{k+1}; X_k) \,\big|\, X_k \big\rangle\, P(X_k\, X_1\, X_0 \mid A)\, g(X_1; X_0) .$$

The last step used the fact that the conditional average $\langle g(X_{k+1}; X_k) \mid X_k \rangle$ is a function of $X_k$ alone. This simplifies the computation of the autocorrelation function for a Markov process, since only the vector $g(X_1; X_0)$ and the matrix $P(X_k\, X_1\, X_0 \mid A)$ need to be stored. The latter can be updated by taking independent steps for each initial transition, $P(X_{k+1}\, X_1\, X_0 \mid A) = \sum_{X_k} P(X_{k+1} \mid X_k)\, P(X_k\, X_1\, X_0 \mid A)$. When the probability loses the information on what transition occurred at $X_1$, $P(X_k \mid X_1\, X_0\, A) \to P(X_k \mid X_0\, A)$, the flux autocorrelation becomes zero.
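The storage-and-update scheme just described can be sketched for a small Markov chain as follows. The transition matrix `T` and per-step flux `g` are random placeholders rather than the KcsA rate model; the flux is centered so that, as the memory of the initial transition is lost, the printed correlation decays toward zero.

```python
import numpy as np

n = 3
rng = np.random.default_rng(3)
T = rng.random((n, n)); T /= T.sum(axis=0)    # T[y, x] = P(X_{k+1}=y | X_k=x)
g = rng.normal(size=(n, n))                   # flux carried by each x -> y step

w, v = np.linalg.eig(T)                       # steady state: eigenvector at 1
pi = np.real(v[:, np.argmin(np.abs(w - 1))]); pi /= pi.sum()

g -= np.einsum('yx,yx,x->', g, T, pi)         # center the flux: zero mean current
gbar = np.einsum('yx,yx->x', g, T)            # <g(X_{k+1}; X_k) | X_k>

# P(X_k X_1 X_0 | A): one slice per initial transition, advanced step by step
Pk = np.zeros((n, n, n))                      # indices [x_k, x_1, x_0]
for x1 in range(n):
    Pk[x1, x1, :] = T[x1, :] * pi             # at k = 1, X_k coincides with X_1

for k in range(1, 15):
    C_k = np.einsum('k,kab,ab->', gbar, Pk, g)   # <g(X_1;X_0) g(X_{k+1};X_k)>
    print(k, C_k)
    Pk = np.einsum('jk,kab->jab', T, Pk)         # one Markov step for X_k
```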

Footnotes

1. Alternatively, to avoid anthropomorphic terminology, if the system energy is not constrained and we compare the maximum entropy P(E|A).

Contributor Information

David M. Rogers, Center for Biological and Materials Sciences, MS 0895, Sandia National Laboratories, Albuquerque, NM 87185, USA.

Thomas L. Beck, Department of Chemistry, University of Cincinnati, Cincinnati, OH 45221-0172, USA.

Susan B. Rempe, Center for Biological and Materials Sciences, MS 0895, Sandia National Laboratories, Albuquerque, NM 87185, USA.

References

1. Hille B. Ion Channels of Excitable Membranes. 3. Sinauer; Sunderland: 2001.
2. Hamill OP, Marty A, Neher E, Sakmann B, Sigworth FJ. Improved patch-clamp techniques for high-resolution current recording from cells and cell-free membrane patches. Pflügers Arch Eur J Physiol. 1981;391(2):85–100. [PubMed]
3. Wonderlin WF, Finkel A, French RJ. Optimizing planar lipid bilayer single-channel recordings for high resolution with rapid voltage steps. Biophys J. 1990;58(2):289–297. [PubMed]
4. Jaynes ET. Predictive statistical mechanics. In: Moore GT, Scully MO, editors. Frontiers of Nonequilibrium Statistical Physics. Plenum Press; New York: 1986. p. 33.
5. Gibbs JW. Elementary Principles in Statistical Mechanics. Scribner’s; New York: 1902.
6. Ehrenfest P, Ehrenfest T. The Conceptual Foundations of the Statistical Approach in Mechanics. Cornell University Press; Ithaca: 1959. English translation of Encykl. Math. Wiss. 1912, by M.J. Moravcsik.
7. van Kampen NG. Stochastic Processes in Physics and Chemistry. 3. Elsevier; Amsterdam: 2007.
8. Zwanzig R. Ensemble method in the theory of irreversibility. J Chem Phys. 1960;33(5):1338–1341.
9. Mackey MC. The dynamic origin of increasing entropy. Rev Mod Phys. 1989;61(4):981.
10. Schrödinger E. Statistical Thermodynamics. Cambridge University Press; Cambridge: 1967.
11. Jaynes ET. Information theory and statistical mechanics. Phys Rev. 1957;106(4):620–630.
12. Jaynes ET, Rosenkrantz RD. Papers on Probability, Statistics and Statistical Physics. Kluwer; Boston: 1989.
13. Jaynes ET. Probability Theory: The Logic of Science. Cambridge University Press; Cambridge: 2003.
14. Grandy WT. Foundations of Statistical Mechanics. Kluwer; Boston: 1987.
15. Jaynes ET. The Gibbs paradox. In: Smith CR, Erickson GJ, Neudorfer PO, editors. Maximum Entropy and Bayesian Methods. Kluwer; Dordrecht: 1992. pp. 1–22.
16. Zubarev DN. Modern methods of the statistical theory of nonequilibrium processes. J Math Sci. 1981;16:1509–1571.
17. Robin WA. Non-equilibrium thermodynamics. J Phys A, Math Gen. 1990;23:2065–2085.
18. Rogers DM, Rempe SB. A first and second law for nonequilibrium thermodynamics: maximum entropy derivation of the fluctuation-dissipation theorem and entropy production functionals. Phys Rev E. 2011 submitted.
19. Jensen FV, Olesen KG, Andersen SK. An algebra of Bayesian belief universes for knowledge-based systems. Networks. 1990;20(5):637–659.
20. Pólya G. Mathematics and Plausible Reasoning. Princeton University Press; Princeton: 1954. 2 vols.
21. Cox RT. The Algebra of Probable Inference. Johns Hopkins University Press; Baltimore: 1961.
22. Aczél J. A Short Course on Functional Equations and Their Applications. Reidel; Dordrecht: 1987.
23. Torrie GM, Valleau JP. Nonphysical sampling distributions in Monte Carlo free-energy estimation: umbrella sampling. J Comput Phys. 1977;23(2):187–199.
24. Lu N, Kofke DA. Accuracy of free-energy perturbation calculations in molecular simulation. I. Modeling. J Chem Phys. 2001;114(17):7303–7311.
25. Pearl J. Fusion, propagation, and structuring in belief networks. Artif Intell. 1986;29(3):241–288.
26. Shenoy PP, Shafer G. Axioms for probability and belief-function propagation. In: Shachter RD, Levitt TS, Lemmer JF, Kanal LN, editors. Uncertainty in Artificial Intelligence. Vol. 4. Machine Intelligence and Pattern Recognition. North-Holland; Amsterdam: 1990. pp. 169–198.
27. Rogers DM, Beck TL. Modeling molecular and ionic absolute solvation free energies with quasi-chemical theory bounds. J Chem Phys. 2008;129(13):134505. [PubMed]
28. Rogers DM, Rempe SB. Probing the thermodynamics of competitive ion binding using minimum energy structures. J Phys Chem B. 2011;115(29):9116–9129. [PMC free article] [PubMed]
29. Åqvist J, Luzhkov V. Ion permeation mechanism of the potassium channel. Nature. 2000;404(6780):881–884. [PubMed]
30. Hestenes D. Entropy and indistinguishability. Am J Phys. 1970;38(7):840–845.
31. Saunders S. On the explanation for quantum statistics. Stud Hist Philos Sci Part B, Stud Hist Philos Mod Phys. 2006;37(1):192–211.
32. Chipot C, Pohorille A, editors. Free Energy Calculations. Springer; Berlin: 2007.
33. Zhou Y, Morais-Cabral JH, Kaufman A, MacKinnon R. Chemistry of ion coordination and hydration revealed by a K+ channel Fab complex at 2.0 Å resolution. Nature. 2001;414:43–48. [PubMed]
34. Callen HB. Thermodynamics and an Introduction to Thermostatistics. 2. Wiley; New York: 1985.
35. Beck TL, Paulaitis ME, Pratt LR. The Potential Distribution Theorem and Models of Molecular Solutions. Cambridge University Press; New York: 2006.
36. Friedman HL, Krishnan CV. Thermodynamics of ion hydration. In: Franks F, editor. Water: A Comprehensive Treatise. Plenum Press; New York: 1973.
37. Cuello LG, Jogini V, Marien Cortes D, Perozo E. Structural mechanism of C-type inactivation in K+ channels. Nature. 2010;466(7303):203–208. [PMC free article] [PubMed]
38. Bergmann PG, Lebowitz JL. New approach to nonequilibrium processes. Phys Rev. 1955;99(2):578–587.
39. Lebowitz JL. Stationary nonequilibrium Gibbsian ensembles. Phys Rev. 1959;114(5):1192–1202.
40. Filyukov AA, Karpov VYa. Method of the most probable path of evolution in the theory of stationary irreversible processes. J Eng Phys Thermophys. 1967;13:416–419.
41. Akaike H. Information theory and an extension of the maximum likelihood principle. In: 2nd International Symposium on Information Theory; 1973. pp. 267–281.
42. Kawai R, Parrondo JMR, Van den Broeck C. Dissipation: the phase-space perspective. Phys Rev Lett. 2007;98(8):080602. [PubMed]
43. Crooks GE. Entropy production fluctuation theorem and the nonequilibrium work relation for free energy differences. Phys Rev E. 1999;60(3):2721–2726. [PubMed]
44. Jarzynski C. Rare events and the convergence of exponentially averaged work values. Phys Rev E. 2006;73:046105. [PubMed]
45. Jaynes ET. Information theory and statistical mechanics. II. Phys Rev. 1957;108(2):171–190.
46. von Neumann J. Mathematical Foundations of Quantum Mechanics. Beyer RT, translator. Princeton University Press; Princeton: 1996.
47. Jaynes ET. The evolution of Carnot’s principle. EMBO Workshop on Maximum-Entropy Methods; 1984. pp. 267–282. Reprinted by Erickson & Smith in 1988.
48. Chua LO. Memristor—the missing circuit element. IEEE Trans Circuit Theory. 1971;CT-18(5):507–519.
49. Liebovitch LS, Fischbarg J, Koniarek JP. Ion channel kinetics: a model based on fractal scaling rather than multistate Markov processes. Math Biosci. 1987;84(1):37–68.
50. Zuckerman DM, Woolf TB. Dynamic reaction paths and rates through importance-sampled stochastic dynamics. J Chem Phys. 1999;111(21):9475–9484.
51. Baiesi M, Maes C, Netočný K. Computation of current cumulants for small nonequilibrium systems. J Stat Phys. 2009;135(1):57–75.
52. Onsager L. Reciprocal relations in irreversible processes. I. Phys Rev. 1931;37(4):405–426.
53. Neyton J, Miller C. Potassium blocks barium permeation through a calcium-activated potassium channel. J Gen Physiol. 1988;92(5):549–567. [PMC free article] [PubMed]
54. Neyton J, Miller C. Discrete Ba2+ block as a probe of ion occupancy and pore structure in the high-conductance Ca2+-activated K+ channel. J Gen Physiol. 1988;92(5):569–586. [PMC free article] [PubMed]
55. Niven RK. Steady state of a dissipative flow-controlled system and the maximum entropy production principle. Phys Rev E. 2009;80(2):021113. [PubMed]
56. Jou D, Casas-Vázquez J, Lebon G. Extended irreversible thermodynamics. Rep Prog Phys. 1988;51:1105–1179.
57. Jou D, Casas-Vázquez J, Lebon G. Extended irreversible thermodynamics revisited. Rep Prog Phys. 1999;62:1035–1142.
58. Kjelstrup S, Bedeaux D. Series on Advances in Statistical Mechanics. World Scientific; Singapore: 2008. Non-equilibrium Thermodynamics of Heterogeneous Systems.
59. Crooks GE. Path-ensemble averages in systems driven far from equilibrium. Phys Rev E. 2000;61(3):2361–2366.
60. Trepagnier EH, Jarzynski C, Ritort F, Crooks GE, Bustamante CJ, Liphardt J. Experimental test of Hatano and Sasa’s nonequilibrium steady-state equality. Proc Natl Acad Sci USA. 2004;101(42):15038–15041. [PubMed]
61. Luzzi R, Vasconcellos ÁR, Galvão Ramos J. Predictive Statistical Mechanics: A Nonequilibrium Ensemble Formalism. Kluwer; Dordrecht: 2002.
62. Shirts MR, Chodera JD. Statistically optimal analysis of samples from multiple equilibrium states. J Chem Phys. 2008;129(12):124105. [PubMed]
63. Ytreberg FM, Zuckerman DM. Single-ensemble nonequilibrium path-sampling estimates of free energy differences. J Chem Phys. 2004;120:10876. Note: J. Chem. Phys. 121, 5022 (2004) corrects the Metropolis criterion in the text above Eq. (9) [PubMed]
64. Minh DDL, Chodera JD. Optimal estimators and asymptotic variances for nonequilibrium path-ensemble averages. J Chem Phys. 2009;131(13):134110. [PubMed]