Search tips
Search criteria 


Logo of nihpaAbout Author manuscriptsSubmit a manuscriptHHS Public Access; Author Manuscript; Accepted for publication in peer reviewed journal;
J Comput Chem. Author manuscript; available in PMC 2013 February 5.
Published in final edited form as:
PMCID: PMC3271861

Convergence and error estimation in free energy calculations using the weighted histogram analysis method


The weighted histogram analysis method (WHAM) has become the standard technique for the analysis of umbrella sampling simulations. In this paper, we address the challenges (1) of obtaining fast and accurate solutions of the coupled nonlinear WHAM equations, (2) of quantifying the statistical errors of the resulting free energies, (3) of diagnosing possible systematic errors, and (4) of optimal allocation of the computational resources. Traditionally, the WHAM equations are solved by a fixed-point direct iteration method, despite poor convergence and possible numerical inaccuracies in the solutions. Here we instead solve the mathematically equivalent problem of maximizing a target likelihood function, by using superlinear numerical optimization algorithms with a significantly faster convergence rate. To estimate the statistical errors in one-dimensional free energy profiles obtained from WHAM, we note that for densely spaced umbrella windows with harmonic biasing potentials, the WHAM free energy profile can be approximated by a coarse-grained free energy obtained by integrating the mean restraining forces. The statistical errors of the coarse-grained free energies can be estimated straightforwardly and then used for the WHAM results. A generalization to multidimensional WHAM is described. We also propose two simple statistical criteria to test the consistency between the histograms of adjacent umbrella windows, which help identify inadequate sampling and hysteresis in the degrees of freedom orthogonal to the reaction coordinate. Together, the estimates of the statistical errors and the diagnostics of inconsistencies in the potentials of mean force provide a basis for the efficient allocation of computational resources in free energy simulations.


The calculation of free energies is one of the main quantitative applications of molecular dynamics or Monte Carlo simulations of molecular systems. In umbrella sampling simulations,1 a free energy profile (or potential of mean force, PMF) G(x) along a chosen physical or virtual coordinate x is obtained by performing a series of simulations with biasing potentials applied that act as local restraints on x. The weighted histogram analysis method (WHAM)2,3 has become the standard method to combine the results from the different simulations,4 and has accordingly been implemented in major simulation software packages.5 Variants of WHAM can be used for the analysis of replica exchange simulations,6,7 or in conjunction with the string method.8 Here we present methods (1) to obtain faster and more accurate solutions of the coupled nonlinear WHAM equations, (2) to quantify the statistical errors of the estimated free energies, (3) to diagnose possible systematic errors in the free energies that result from inadequate sampling of motions orthogonal to x, and (4) to optimize the allocation of computational resources in free energy calculations.

Traditionally, the free energy profile in WHAM is computed by direct iteration of a set of coupled equations until self-consistency is achieved. It was long acknowledged that in practical applications the convergence of the WHAM equations may require more than 100,000 iterations.9 More seriously, the convergence is typically judged by the variation of the free energy between iterations, which turns out to be only a poor estimator of the actual deviation from the true solution. When each WHAM iteration results in only a small change in the free energy, it may incorrectly appear as if convergence has been achieved. The traditional fixed-point iteration scheme normally has a linear convergence rate, which means that a constant number of iterations is required to gain one more significant digit in the precision of the solution.

To overcome this slow convergence, we note that the WHAM solution corresponds to a maximum likelihood estimate of the free energy parameters,6,10 and solving the WHAM equations is thus equivalent to finding the maximum of the likelihood function. A variety of numerical techniques with superlinear convergence rate have been developed for such optimization problems,11 including the Newton-Raphson, trust region, and nonlinear conjugate gradient methods. In these methods, the rate of convergence typically improves as the exact solution is approached. For quadratic objective functions, they can find the exact solution within a finite number of iterations.11 In this study, we demonstrate that by adopting these superior numerical methods, both the speed and the accuracy in solving the WHAM equations can be significantly improved.

Estimating the statistical errors in the WHAM results is more challenging than solving the WHAM equations. In the original method,2,3 the error is determined from the expected statistical uncertainty in each bin of the histogram, after properly accounting for correlation effects in the sampling. However, this method ignores the uncertainty in the normalization factor (or equivalently, the free energy) for each umbrella window,6 and therefore fails to account for the accumulation of errors over multiple intermediate windows, which, unfortunately, is typically the major source of error in practical applications. A more reliable approach is to divide the simulation data into a certain number of blocks, to use WHAM to calculate a PMF from the data in each block, and then to determine the statistical uncertainty from the variance of the PMFs. One may further use the bootstrap strategy to generate new random data according to the estimated distribution, and then determine the uncertainty by comparing the PMFs calculated from these hypothetical trajectories or histograms.5 In a novel method6 based on Bayesian statistics, the underlying free energy profile is taken as the unknown quantity, and the histograms as the given observed data. The uncertainty in the free energy is then determined from the posterior likelihood of the parameters. Although the method is conceptually rigorous, the uncertainty cannot be obtained analytically, and in practice has to be obtained from statistical sampling in the parameter space under proper approximations.6

In this study we propose a simple method for the error estimation in umbrella sampling simulations, based on the statistical error of the free energy gradient, or the mean force, in each individual umbrella window. For a PMF G(x) along a reaction coordinate x sampled with a series of harmonic biasing potentials K(xri)2 / 2 spaced evenly at ri =r0+iΔr, and a reference value of G(r0) = 0 in the leftmost umbrella window by definition, we show that the variance in the free energy estimator, given by the square of the cumulative statistical error, is approximately


Here var (xi) is the squared error in the estimate of the mean position of x in window i which can be obtained straightforwardly from block averages12 (see Eq. 36 below). Eq. (1) allows us to use simple statistics to estimate the error of a PMF. Furthermore, it clearly reveals the error propagation through multiple windows, and identifies the contribution of each umbrella window to the overall statistical error, thus providing a basis for systematic improvement of the accuracy with minimal computational effort.

We also introduce consistency tests between histograms in adjacent umbrella windows, or between observed and consensus histograms. In particular, we provide a diagnostic that uses information entropy as a measure of deviation between the actual observed histogram piobs(x) in window i from the consensus histogram piWHAM(x) expected from the WHAM free energy:


Large values of this Kullback-Leibler divergence or relative entropy indicate that the free energy surfaces sampled in different umbrella windows are inconsistent. Eq. (2) and an additional pair-wise test for adjacent histograms may help identify potential problems of insufficient equilibration of the degrees of freedom orthogonal to the chosen reaction coordinate.

We thus suggest the following procedure to analyze umbrella sampling simulations: (1) efficiently compute an accurate solution of the WHAM equations and obtain the PMF using a superlinear optimization algorithm; (2) compute the variance of the average reaction coordinate in each umbrella window by block averaging, and then estimate the statistical errors in the PMF using Eq. (1) or its more precise forms discussed in this article; (3) calculate the inconsistency coefficient in Eq. (2) or other similar measures discussed in this study, and identify windows with potential problems of insufficient equilibration. We note that all the analyses above are computationally inexpensive, and can be straightforwardly implemented. Furthermore, on the basis of Eqs. (1) and (2), if one intends to extend the sampling, the computational resources can be invested efficiently by concentrating on regions that contribute most to statistical uncertainties and inconsistencies.

In this study we focus on one-dimensional PMFs in umbrella sampling simulations, which have been widely adopted in a large body of studies on biological and synthetic channels as well as many other important systems. We note that WHAM can also be used on multidimensional histograms involving different state variables (such as temperature) or collective coordinates. The numerical methods for solving the WHAM equations and the consistency tests introduced in this study can be directly applied in such multidimensional cases. A generalization of our estimate of the statistical error to higher dimensions is outlined in Appendix A.


In this section, we first introduce the WHAM equations and propose new numerical algorithms to solve the equations. Then we propose a new scheme to estimate the statistical errors in the free energy obtained from WHAM, and consistency tests of the histograms to help identify the potential problem of poor equilibration of the degrees of freedom orthogonal to the chosen reaction coordinate. We conclude this section with a discussion of strategies for the optimal allocation of computational resources in umbrella sampling simulations.

WHAM Equations and Numerical Algorithms

Consider a set of S independent simulations at temperature T, each corresponding to an “umbrella window” with a harmonic biasing potential


For each simulation i, the time series of x is binned into histograms, with {nil} representing the counts in bin l centered at {xl} (l =1, …, M) and Ni=l=1Mnil the number of samples in simulation i. If the data from a simulation are correlated, one can scale the counts by an inefficiency factor (1+ 2τi)−1 determined from the correlation time τi of x in window i, as measured in units of steps in its time series.6,7 We assume in the following that proper scaling has been done so that the {nil} represent the equivalent number of effectively independent samples.

Our objective is to construct the underlying unbiased free energy G(x) that is most consistent with the observed simulation data. For this purpose we aim to determine the unbiased equilibrium probability distribution {pl} (l =1, …, M), representing the probability of finding the coordinate x in each bin when no biasing potential is applied. Then G(x) is given by


where kB is the Boltzmann constant and Δl is the width of bin l. Given the {pl}, the expected probability distribution {pil} in simulation i is also known:6


where cil is determined by the biasing potential at the center of each bin:


and fi is a normalization factor to ensure that {pil} sum to one:


fi is thus the reciprocal of the partition function of simulation i,


Given the probabilities {pil} for simulation i, the likelihood of having {nil} counts in the respective bins obeys the multinomial distribution:6


The overall likelihood for jointly observing the counts in the S independent simulations is then given by


Substituting Eqs. (5) and (9) into Eq. (10) and then taking the logarithm, we obtain6


in which the negative log-likelihood A includes all the terms containing the {pl}:


where Ml is the total count in the l -th bin, summed over all simulations:


Finding the maximum likelihood estimates of {pl} is then equivalent to the minimization of A. We note that general discussions of the maximum likelihood approach applied to biased sampling can be found in the statistical literature.1315

To obtain the minimum of A, we take the derivative of A with respect to each individual pl, noting that in Eq. (12) fi is a function of the {pl} through Eq. (7):


By demanding these derivatives to be zero, we obtain


for each bin. Eqs. (7) and (15) jointly form the set of coupled nonlinear WHAM equations, which are traditionally solved by iteratively substituting {pl} and {fi} to obtain a new set of values until self-consistency is achieved. This direct iteration method is numerically inefficient. In the following we introduce techniques with superior convergence rate that directly aim to minimize A, as given in Eq. (12). In addition, for a given problem, A can be used to compare the accuracy of different results, with those closer to the exact solution having smaller A values.

To reduce the dimensionality of the optimization problem, we rewrite the minimization of A with respect to the {pl} into an equivalent minimization with respect to the {fi}. Since there are typically far fewer simulations than bins, S < M, this reformulation reduces the computational cost. Substitution of Eq. (15) into Eq. (12) results in a new function of the {fi} alone:


By taking the gradient with respect to the {fi}, it is then straightforward to show that the minimum of this new function coincides with the solution of the WHAM equations:


where we used Eq. (15) to rewrite the gradient in terms of the {pl}. At the minimum of A~ these derivatives are zero, and we thus recover Eq. (7). Solving the WHAM equations is equivalent to minimizing Eq. (16) with respect to the {fi}, which is numerically more efficient than minimizing Eq. (12) with respect to the {pl} in typical cases.

One may also work on the logarithm of {fi} by a change of variables:



According to Eq. (8), the {gi} actually correspond to the total free energies of the system in the presence of the biasing potential i. Using the {gi} as the free variables eliminates the possibility of having (unphysical) negative fi values in the search of the minimum. The optimization function becomes:


Furthermore, it can be shown13 that A^(g1,,gS) is a convex function with nonnegative second derivatives everywhere, and thus has a single minimum.13 The derivatives of A^ with respect to the {gi} are given by


Because multiplying all {fi} by a constant factor, or adding a constant to all {gi}, does not alter the free energy profile G(x), we can set g1 to zero without loss of generality, and minimize A^ with respect to g2, …, gS.

In typical umbrella sampling setups, the histogram of a particular umbrella window only substantially overlaps with the histograms of neighboring windows. Consequently, the derivative A^gi for window i is predominantly determined by those values of {gj} with |ji| being small, and is hardly affected by distant windows. The derivative therefore mainly reflects the local consistency of the free energy across a small range of umbrella windows, but poorly indicates the real distance of gi to its true value in the exact solution. To improve the performance of the optimization, one may use the incremental changes of the {gi} over adjacent windows as free variables in the minimization:



The function A^ to be minimized and its derivatives with respect to the {Δgi} (i=1, …, S −1) are then given by



Our experimentation shows that the change of variables to {Δgi} makes the convergence considerably faster, especially when the number of umbrella windows is large.

The target function A^ can be minimized using a variety of numerical algorithms.11 In the Newton-Raphson algorithm,11 a quadratic expansion of the target function at the current search position is obtained from the local gradient and Hessian matrix, and the minimum of this approximate quadratic function is taken as the new search position. This algorithm was used on a similar problem,16 and was found to be efficient yet slightly less reliable than the direct iteration method.16 Some variants of the Newton-Raphson algorithm are both efficient and reliable, and are thus more widely adopted in practice. Here we test two such variants, the subspace trust region method and the Broyden-Fletcher-Goldfarb-Shanno (BFGS) method11 with cubic line search, as implemented in the “fminunc” function in the Matlab17 Optimization Toolbox. In the trust region method, a two-dimensional subspace is identified from the local curvature, and a trial move in the subspace is computed to minimize the approximate quadratic function, subject to the constraint that the move must lie within a given trust radius. If the trial move results in a higher value of the actual target function, the trust radius will be decreased to ensure that the quadratic expansion is sufficiently accurate. The quadratic approximation will become increasingly accurate upon approaching the true minimum, and therefore the iterations will converge rapidly. In the BFGS method, the Hessian matrix is not explicitly computed and is instead approximated and updated at each iteration step, and a line search is performed to locate the minimum along a given direction.

Error Estimation

Eq. (15) indicates that the statistical error in pl obtained from WHAM can be formally attributed to two uncertainties: (1) in the observed counts in bin l, and (2) in the normalization factors {fi}. Although the former uncertainty can be readily estimated, the latter is much more complicated to quantify. In typical umbrella sampling setups, only histograms of neighboring windows have substantial overlap, and the free energy difference between two distant windows can only be determined through a relay of intermediate windows. Consequently the uncertainties in the histograms of these intermediate windows contribute to the statistical errors in the {fi}, and in turn to the errors in G(x). In practice, the sampling within narrow windows is usually quite efficient, and the cumulative errors in the {fi} thus tend to dominate the error in the estimated free energy profile.

To estimate the statistical errors in the {fi}, we first define the total free energy of the system in the presence of a harmonic biasing potential K(xr)2 / 2 centered at r:


Using Eq. (8), we have


with gi defined in Eq. (18). Therefore the {fi} or {gi} are determined by the values of Gr(r) at the discrete points {ri}.

Furthermore, taking the derivative of Gr(r), we have:


This derivatives at the {ri} are thus given by


where Fi = K(rix) is the restraining force arising from the harmonic potential wi(x) in Eq. (3), and left angle bracketright angle bracketi denotes the ensemble average in the presence of the biasing potential wi(x). The i -th umbrella simulation actually represents such an ensemble, and thus can be used to obtain the ensemble averages:


where xi and Fi are the mean position of x and the mean restraining force in the simulation, respectively. If the biasing potentials are not harmonic, the estimators of the free energy gradient have to be modified appropriately (see Appendix B). We can integrate the gradients at the {ri} using the trapezoidal rule:


Eqs. (24) and (28) show that the {fi} or {gi} can be obtained approximately by integrating the gradients of Gr(r), or the mean restraining forces {Fi} from the simulations,18 analogous to thermodynamic integration. This method can also be conveniently applied in multidimensional cases, and therefore has been adopted in conjunction with the string method19 to obtain a free energy profile in a high-dimensional coordinate space. For the one-dimensional PMFs considered in this study, the method is expected to give similar results as WHAM does, if Gr(r) varies smoothly as a function of r between windows. Whereas the systematic errors arising from discretization of the chosen coordinate are expected to be smaller in WHAM than in the gradient-based method, the statistical errors in the latter can nonetheless be taken as reasonable estimates of the statistical errors in the WHAM result. We expect the quality of this approximation to decrease when the windows are spaced too far compared to features in Gr(r) .

Because the {fi} or {gi} can be expressed as functions of the {Fi}, their statistical errors can also be calculated from the variances of the {Fi}, which according to Eq. (27) are determined by the variances of the mean coordinates {xi}:


If the {ri} are equally spaced, i.e.,


Eq. (28) can be simplified into


The resulting variance of the estimator for the free energy difference over repeated measurements is then given simply by


The corresponding standard deviation, given by the square root of the variance, is typically taken as the statistical error of the free energy difference above. Here we assumed that the errors in the {xi} are uncorrelated. For some simulation protocols, the {xi} of different windows are correlated and one should then include their covariances in addition to the variances. In Hamiltonian replica exchange simulations20 the {xi} of different windows are statistically independent of each other in the joint probability distribution. For sufficiently long Hamiltonian replica exchange simulations that fully explore the combined phase space, the covariances of the {xi} are thus expected to vanish, and Eq. (32) approximately holds in such cases.

If the harmonic springs are relatively stiff (with large K), as is often the case in umbrella sampling simulations, p(x) would vary little within a narrow umbrella window. In such cases we have


Gr(r) in Eq. (23) can then be simplified into


Therefore under the stiff spring approximation,21 the {Gr(ri)} directly represent the PMF, and thus also share the same statistical errors as the PMF.

The variance var(xi) can be estimated from the trajectory of x in simulation i. For a correlated trajectory, this variance depends on the autocorrelation function of the data. By block averaging, however, var(xi) can be estimated without explicitly computing the autocorrelation function.12 In this method the time series xiα of the x - coordinate in window i is divided into n blocks of sufficiently large size m, so that the averages of the n blocks are independent of each other, and then var(xi) is estimated from the variance of these averages:12


For small n, this estimate tends to be noisy; for large n, the blocks are correlated. While n between 5 and 10 are typically used in practice, for accurate estimates it is advisable to plot Eq. (35) as a function of log(n) to identify a plateau12 at small n and use that for the estimate of var(xi).

From the variances of x in each individual simulation, the uncertainty in the {gi} (assuming g1=0) can be obtained via Eqs. (24) and (32):


This estimate is the central result of the second part of this paper, and clearly shows the accumulation of the statistical error over the intermediate umbrella windows. Under the stiff spring approximation discussed earlier, Gr(ri) or {gi} directly represent the PMF (Eq. 34), and therefore Eq. (36) also gives the error estimate for the PMF, thus leading to Eq. (1) except for minor differences arising from the first and last windows.

Additionally, the effective number of independent samples in the trajectory, Ni, can be estimated from the variance of xi:


where var(xi) is the variance of the coordinate x in trajectory i, and ni the length of the trajectory. We note that with this relation and var(xi) ≈ kBT / K (ignoring additional curvature effects) one can rewrite Eq. (1) in terms of the correlation time τi :


WHAM Consistency Tests

The free energy profile G(x) obtained from WHAM is a function of the reaction coordinate x, with all coordinates orthogonal to x integrated out. The determination of an accurate G(x) requires that these orthogonal coordinates be sufficiently sampled in the simulations. In each individual simulation, although the reaction coordinate x is restrained to a reference position, the subspace formed by the orthogonal coordinates may feature multiple metastable states or local minima, and infrequent transitions between these states will result in long autocorrelation in the trajectory. If the simulation is not long in comparison to the corresponding correlation time τi, the true error in xi will be underestimated by block averaging or other methods based on the autocorrelation function. In extreme cases, the system may stay in a single state without ever visiting other states in the subspace orthogonal to x during the entire simulation i, resulting in a severe underestimation of the errors in xi and the free energy. In such cases the trajectory of the single simulation alone will show no indication of the insufficient sampling; however, if other states are visited in the simulations of neighboring windows, the problem can be inferred by a comparison of the simulation results. If different states of the orthogonal coordinates are sampled in two adjacent umbrella windows, the two simulations will normally produce inconsistent probability distributions of x. The problem of insufficient sampling of the orthogonal degrees of freedom, therefore, can be detected by checking the consistency of the histograms in neighboring windows, as described below.

Consider two simulations in adjacent umbrella windows centered at r1 and r2. From the simulation trajectories, we obtain the probability distributions p1(x) and p2(x), and aim to test whether they are consistent with a common underlying unbiased probability distribution p(x). In principle, we may reconstruct p(x) from either p1(x) or p2(x), and then compare the two resulting distributions for consistency. In practice, however, the p(x) obtained from pi (x) is only accurate in the vicinity of ri, and will bear increasingly large statistical uncertainties at increasing distances from ri. Indeed, in the WHAM equations p1(x) and p2(x) primarily contribute to the shape of p(x) around the region between r1 and r2, and the comparison should therefore also be focused on this range. For this purpose, we imagine a virtual umbrella window at the midpoint r* =(r1 + r2) / 2, with the biasing potential given by w(x)=K2(xr)2 as in Eq. (3). The probability distribution of x in this virtual umbrella window can be estimated from either p1(x) or p2 (x):


If p1(x) and p2 (x) indeed arise from a common unbiased probability distribution, p1(x) and p2(x) should be identical within statistical uncertainties. Furthermore, p1(x) and p2(x) are peaked in the region where p1(x) and p2 (x) have maximum overlap, and could be compared on the same footing. A significant disagreement of p1(x) and p2(x) would then indicate that the two simulations are probably sampling different free energy surfaces.

Several statistical methods can be applied to test p1(x) and p2(x). In this study we adopt a simple protocol based on the Kolmogorov-Smirnov test. The method uses the maximum difference between the two respective cumulative distribution functions to quantify the deviation between p1(x) and p2(x). For two histograms arising from a common probability distribution, the expected value of this maximum difference asymptotically approaches zero following the inverse square root of the sample size, as the latter approaches infinity. Therefore we define an inconsistency coefficient, θ1,2, as an empirical measure for the discrepancy between p1(x) and p2(x):


Note that N1 and N2 above should be the effective number of independent samples, which can be estimated from Eq. (37) for correlated data. An abnormally high θ1,2 indicates inconsistency between the results of the two simulations. For any pair of adjacent umbrella windows i and i +1, we can similarly calculate an inconsistency coefficient θi,i+1. We note that other methods, such as Pearson's χ2 test, can also be used to examine p1(x) and p2(x).

In addition to the pair-wise consistency test above, one can also check the agreement between the actual observed histograms and the consensus histograms predicted from the WHAM results. Given the {pl} and {fi} from the WHAM calculations, the consensus histogram in each umbrella window, denoted by p^i(xl)=p^il, can be obtained according to Eq. (5) and then compared to the observed probability distribution pi (xl) in the same window. Among various possible methods, in this study we adopt the relative entropy, denoted by ηi, as a metric for the consistency between p^i(x) and pi(x):


ηi is guaranteed to be non-negative, with smaller ηi values indicating better agreement between the two probability distributions.

Optimal Allocation of Computational Resources in Umbrella Sampling

Assume that one sets up umbrella sampling simulations aiming to calculate the free energy difference between the first and last umbrella windows. For simulation i, the variance of the mean coordinate var(xi) asymptotically depends on the simulation length ni as var(xi)=vini, where vi is a constant for window i. To obtain an estimate of the {vi}, one could first simulate each umbrella window for a short length n'. We caution that these initial simulations should be sufficiently long to avoid significant systematic errors, which can be indicated by the inconsistency coefficients as mentioned earlier. Under the constraint of a fixed total length of all simulations Σini = N imposed by available computational resources, we seek to optimally allocate the {ni} for the individual simulations to minimize the statistical error, which according to Eq. (1) is determined by Σivi/ni. Utilizing the Cauchy-Schwarz inequality (Σixi2)(Σiyi2)(Σixiyi)2, we have


The two sides above are equal if and only if ni=λvi, with λ a normalization constant. This relation can also be derived by applying Lagrange multipliers. Therefore in the optimal allocation scheme one should sample each umbrella window with the simulation length proportional to vi[var(xi)]12, based on the variances of the mean position estimated from trial runs of equal length n'. We note that whereas the prescription above may serve as a general guideline, in practice other technical factors also need to be considered. In the replica exchange scheme, e.g., all simulations must have the same length. In such cases, one could either first simulate all windows with replica exchange and then perform extension runs for those windows that require further sampling, or introduce additional windows in regions of poor statistics.


In this section, we present two tests for the numerical algorithms and the error estimation scheme discussed in the Methods section. In the first test, we analyze data from a practical application, a set of trial molecular dynamics (MD) simulations in our study of ion conduction through a protein channel.22 Our focus in this test is on the performance of different numerical algorithms to solve the WHAM equations. In the second test, we carry out Monte-Carlo (MC) simulations on a simple potential designed to reproduce some common problems (such as hysteresis) in practical applications. Moreover, with the true free energy profile known, the absolute errors can be calculated, allowing us to assess the quality of the estimated statistical errors.

Umbrella Sampling of Na+ in an Ion Channel

As described in ref. 22, a total of 153 umbrella windows with a uniform spacing of 0.5 Å were employed to determine the free energy for passage of a Na+ ion through the transmembrane pore of the GLIC channel.22 In each window an umbrella potential with a spring constant of 10 kcal/mol/Å2 was applied on the z -coordinate of the Na+ ion along the membrane normal direction, and a lateral restraint on the xy plane was applied to confine the ion in the bulk region.22 For the present study, we performed new calculations in which we implemented Hamiltonian replica exchange20 between neighboring windows. Here we analyze data from a set of trial MD simulations of 1 ns in each window. We construct histograms with a uniform bin width of 0.02 Å for each of the 1-ns trajectories as the input for the WHAM calculation. We note that the resulting free energy has a considerable entropic component, due to a significant variation in the lateral area accessible to the ion at different z positions. The mean squared deviation of the ion in the xy plane from the axis varies from ~0.6 Å2 at the narrowest portion of the pore to ~28 Å2 in the bulk region.

We first compare the performance of the superlinear (trust region and BFGS) algorithms and the traditional direct iteration method in solving the WHAM equations. In all methods the iterations start with given initial values of the probabilities {pl} or the normalization factors {fi}. Although it is common practice to assign a constant initial value to all {pl} or {fi}, it was shown that using more accurate estimates of the free energy as initial values can significantly speed up the convergence in the direct iteration method.23 In fact, approximate {fi} or {gi} can be calculated by integrating the gradients, or the mean restraining forces (Eqs. 24, 31). Here we test each method, first for constant initial values, and then by using the coarse-grained profile determined from the mean forces. In our implementation of the direct iteration method, the convergence is deemed achieved when the relative change in {pl} for any l by a WHAM iteration is smaller than a given threshold δ. We test each case with a larger (10−3) and a smaller (10−6) δ value.

As shown in Table 1, the trust region and BFGS numerical algorithms yield more accurate results than the traditional direct iteration method, indicated by smaller values of the target function A. In fact, the direct iteration method with a convergence threshold of δ = 10−3 leads to a rather inaccurate free energy profile (Fig. 1A, blue dashed curve), with an error of more than 2 kBT, when starting with uniform initial values. When the gradient-based estimates of the {fi} are used as initial values, the direct iteration method indeed converges faster, with the threshold of δ = 10−3 almost satisfied already at the start of the iterations. To achieve better accuracy with a more stringent threshold of δ = 10−6, however, a large number of iterations is still required, taking a computational time at least an order of magnitude longer than the trust region and BFGS methods. The example also demonstrates that apparently small variations between WHAM iterations, on the order of δ, do not necessarily mean that convergence at that level of accuracy has been achieved. At the end of the WHAM iterations in our test, the free energies change by less than 10−6 kBT in a single iteration, although their absolute errors are still on the order of 0.01 kBT, implying that to gain one more significant digit in the result, one needs to perform thousands of iterations. In contrast, the convergence rate of superlinear algorithms increases when approaching the final solution; in our case only ~500 more equivalent iterations are required in the trust region method when the termination tolerance is decreased from 10−6 to ~2×10−16 (the minimum relative change that can be represented by a double-precision floating-point number).

Figure 1
Umbrella sampling of a Na+ ion in an ion channel. The chosen reaction coordinate is the z -coordinate of the ion, but is denoted by x here, as used consistently throughout this article. (A) Free energy profiles calculated from the WHAM equations (blue ...
Table 1
Performance benchmarks for the direct iteration, trust region, and BFGS numerical algorithms. Each algorithm was tested with two sets of initial values, from a uniform assignment or from a gradient-based estimation of the free energy (Eqs. 24 and 31), ...

The results of the free energy calculation are shown in Fig. 1. The coarse-grained free energy profile (red curve, Fig. 1A) obtained by integrating the mean forces (Eq. 31) indeed closely overlaps with that (blue solid curve, Fig. 1A) obtained by solving the WHAM equations. The statistical uncertainties in the average position xi (Fig. 1B) are largest in the region between x =−20 Å and x =0 Å, which is actually the narrow part of the channel where the ion is coupled to the motions of the protein side chains. The flat baselines at the two ends of the free energy curves (Fig. 1A) represent the bulk water regions at the two sides of the channel, respectively. Although the MD simulations were performed under 3D periodic boundary conditions, the umbrella windows do not span the entire length of the unit cell and the windows at the two ends are still too far from each other to have an overlap across the periodic boundary. Nonetheless, in the absence of a membrane potential, the two levels at both ends of the ideal PMF should match, although in a calculated PMF they may not match exactly due to the statistical errors in the finite sampling. Overall, the calculated statistical errors of the free energy shown in Fig. 1A appear to be reasonable, and offer a faithful estimate of the uncertainty in the baseline difference. However, as the true free energy profile is not known in this case, we could not thoroughly calibrate the errors. To systematically examine the validity of our error estimation method, in the following we design a model potential that allows us to obtain the absolute errors of the calculation.

Umbrella Sampling of a Model Potential with Hysteresis

As illustrated in Fig. 2A and discussed earlier, when the reaction coordinate x is fixed at a given value, a multidimensional system may still populate different metastable states separated by high barriers in directions orthogonal to x. Insufficient sampling of the relevant orthogonal coordinates (or “solvent degrees of freedom”) may give rise to systematic errors in the obtained free energy and result in hysteresis. Here, we reduce the problem further and assume fast relaxation in the orthogonal degrees of freedom (y) in each well of Fig. 2A, but slow transitions between the wells. Analogous to the Marcus theory of electron transfer, we can then collapse motion in y and treat the problem as a transition between two one-dimensional surfaces, as shown in Fig. 2B. In addition to the continuous dimensionless reaction coordinate x, our model includes an orthogonal y coordinate, which takes discrete values of 1 or 2, representing two metastable states with different potentials Ei(x):


For simplicity, we use harmonic potentials


in which a1 =−4 and a2 =4. The free energy as a function of x is then


as shown in Fig. 2C.

Figure 2
Model free energy surface. (A) Schematic illustration of a typical situation in free energy calculations, with degrees of freedom orthogonal to the reaction coordinate exhibiting multiple local minima. (B) Realization of the scheme in (A) with a simple ...

To sample the free energy, we carried out MC simulations in a total of 21 umbrella windows, covering the range from r0 =−5 to r20 =5 with a uniform spacing of Δr=0.5. In each simulation, an umbrella potential with a spring constant of K =17 kBT was applied. At every MC step we made a random choice for either a y -move with a probability of 10−4, or otherwise an x -move. In an attempted y -move, the y -coordinate is switched to the alternative value (i.e., from 1 to 2 or vice versa). In an attempted x -move, the x -coordinate is moved by a random displacement from a uniform distribution in [−0.24, 0.24]. In either case the energy at the new coordinates is calculated, and the move is accepted or rejected according to the Metropolis criterion. The center, or reference position, of each window was used as the initial x -coordinate for the corresponding simulation. The initial y -coordinate was set to 1 for the first 12 windows (at r0 =−5 to r11 =0.5), and to 2 for the remaining 9 windows (at r12 =1 to r20 =5).

In Fig. 3, we examine the WHAM results and the error estimation from simulation data of various lengths measured by the number of MC steps. We construct histograms with a uniform bin width of 0.05 for the individual simulations, and then solve the WHAM equations by the minimization technique described in Methods. The resulting coarse-grained free energy profiles Gr(x) (red curves, Fig. 3A), calculated by integrating the local gradients (Eq. 31), closely match the WHAM results G(x) (blue curves, Fig. 3A). Therefore, we expect that the errors in G(x) can be determined from the errors in the mean x -coordinate xi in each simulation window, as described in Methods.

Figure 3
Umbrella sampling MC simulations for different sampling sizes, plotted in a manner similar to Fig. 1. The three columns show results for simulations with N =10,000, N =100,000, and N =1,000,000 MC steps, respectively. (A) The free energy profiles calculated ...

For short simulations (N =10,000), we find large systematic deviations in the free energy, and significantly underestimated statistical errors. The reason for the deviations is that in these short simulations the y -coordinate typically remains unchanged and undergoes no transition to the alternative state during the entire simulation. This lack of equilibration gives rise to particularly large errors in the windows at r10 =0, with equal expected probabilities of the two states with y =1 and 2, and at r11=0.5, which sampled a different state (y =1) than the expected most probable state (y =2) because of the biased initial condition. Consequently both the barrier height and the relative difference between the two local minima deviate considerably from those in the true free energy profile (green curve, Fig. 3A). As a result of the inadequate statistics in y, the errors σ(xi)=var(xi) in xi are significantly underestimated for the two windows, which in turn leads to an underestimation of the errors in the free energy (Fig. 3A, N =10,000).

In principle, the discontinuity in y can be identified by a direct examination of the y -coordinate in neighboring windows. In practice, however, this is not always possible in simulations of complex biomolecular systems with an enormous number of degrees of freedom, in which the states with slow transitions in the orthogonal subspace can probably only be described by collective order parameters. In such cases, the inconsistency coefficient, obtained from the reaction coordinate x, can still be readily used to detect the problem. For example, in Fig. 3C (N =10,000), the inconsistency coefficient θ for the two problematic windows at r11 =0.5 and r12 =1 is found to be abnormally high, indicating that the two corresponding simulations were not sampling a common unbiased distribution of x. Indeed, the y -coordinate in these two simulations stays at 1 and 2, respectively, thus resulting in the sampling of different potential surfaces.

As the simulations were extended to N =100,000 and N =1,000,000, multiple transitions of the y -coordinate occur in the windows near r10 =0. As a result, the autocorrelation functions of x decay slowly, and the block averaging method properly reports the large statistical uncertainty of xi in these windows, as shown in Fig. 3B. Consequently, the estimated statistical error in the calculated free energy (Fig. 3A) is now consistent with the actual absolute errors. The large variances in xi for the windows near r10 =0 result in small and more reasonable values of Ni, the effective numbers of independent samples (Eq. 37) in these simulations. When these proper Ni values are used, the inconsistency coefficient θ (Eq. 40) for the two windows r11 =0.5 and r12 =1 is no longer high (Fig. 3C), indicating that the discrepancy between the two histograms is not abnormally large in comparison to expectations. Indeed, the infrequent transitions in the y -coordinate are already reflected in the estimated variances of xi, and are thus accounted for in the error estimation of the free energy.

As shown in Fig. 3D, the errors in the observed histograms are also reflected in the consistency with respect to the consensus histograms. Insufficient sampling of the orthogonal coordinate in an umbrella window usually results in higher values of the relative entropy (η) in this window and in its neighboring windows. For example, for N =10,000, as mentioned earlier, major inconsistencies occur near r11 =0.5 and r12 =1, and the η values are indeed higher in the corresponding windows. For N =100,000 and N =1,000,000, relatively large η values occur near the central window at r10 =0 which bears the largest uncertainty. Overall, the η values decrease with longer simulation length N, indicating improved consistency with more extensive sampling. We note that for a system characterized by two local minima on the energy surface that are partially overlapping in projection, the resulting free energy will typically feature a barrier at an intermediate location. In this barrier region, the sampling is more challenging because this intermediate region tends to be of high energy relative to the minima even within one well and, more seriously, both minima need to be visited with the correct proportionality. Consequently, for such systems the barrier region generally bears larger uncertainty in the free energy calculation, consistent with the results for our model system here.

To further test the quality of the error estimation, we repeated the simulations 100 times with identical initial coordinates as described above and different random seeds. When the sampling size N is relatively small, the results are biased by the particular initial condition and the systematic error in the free energy is significant due to hysteresis. As shown in Fig. 4, in this case the estimated errors are smaller than the absolute errors with respect to the exact free energy. When N becomes larger, however, the systematic error arising from a particular initial condition becomes insignificant in comparison to the statistical error, and the latter becomes the main contributor to the actual error. When N is large enough so that all umbrella windows get well equilibrated, the statistical errors obtained from Eq. (32) indeed offer a faithful estimate for the absolute errors, as shown in Fig. 4. The results here further confirm that an accurate estimate of the errors is only possible after sufficient equilibrations (i.e., in the asymptotic limit, where statistical errors dominate).

Figure 4
Comparison of the estimated errors and the absolute errors, with respect to the free energy difference between the two minima at x =±4 in the model potential. The umbrella simulations with 1,000,000 MC steps are repeated 100 times, and the first ...


In this study, we demonstrated that superior numerical optimization algorithms, such as the trust region and BFGS methods, can solve the WHAM equations with higher accuracy and significantly faster speed than the traditional direct iteration method. In the direct iteration method, the incremental change in each WHAM iteration can be smaller by orders of magnitude than the actual distance to the solution, resulting in slow convergence rates and potentially misperceptions of false convergence. In contrast, common numerical minimization algorithms take into account the curvature of the target function and make a more informed move in each iteration, thus requiring significantly fewer iterations to find the solution. Furthermore, as the search approaches the optimal solution the convergence rate of these algorithms will speed up, and their performance gain over the direct iteration method will therefore be more significant when results of higher precision are desired. In our tests we found that the BFGS method is more efficient than the trust region method, mainly because BFGS constructs an approximate Hessian matrix and thus avoids the costly computation of the second derivatives. If the number of the umbrella windows is very large, the nonlinear conjugate gradient method is another viable choice, as it avoids the construction of the Hessian matrix altogether.

One underlying reason for the poor convergence of the iterative solution to the WHAM equations is that the free energies of each umbrella run need to be adjusted globally, but the iteration operates locally. In our case of ion translocation through a membrane channel, the WHAM iterations changed the free energies of windows at one end of the channel only slowly relative to the windows at the other end. Even in a globally not yet converged state, each window was already well-matched with its neighbors, and iterations had to maintain that matching. The standard WHAM iteration thus faces similar numerical challenges as, say, local update schemes in path integral simulations or in polymer simulations. The use of minimizers successfully addresses this problem by using what amounts to gradient-based multiparticle moves in the simulation analogs.

The coupled nonlinear WHAM equations also make it difficult to directly estimate the statistical errors in the obtained free energies, especially for the free energy difference across multiple umbrella windows. To address this problem, a coarse-grained free energy profile can be calculated by integrating the mean restraining forces corresponding to the free energy gradients. The statistical errors in the coarse-grained free energy can be easily determined from the variances of the average reaction coordinate in each individual window. The coarse-grained free energy is a good approximation for the normalization factors in WHAM and, in the stiff-spring case, the desired PMF itself. On this basis, one can use the error obtained from the coarse free energy profile to estimate the error of the fine WHAM profile. In the WHAM-derived free energy profile, the statistical uncertainty in the difference between two distant positions can thus be readily estimated (Eq. 32), with an explicit expression of the error accumulation over the intermediate umbrella windows. Moreover, in this scheme it is straightforward to decompose the statistical errors into the contributions of each individual window; to further improve the accuracy of the PMF, one may focus on the windows with predominant contributions to the errors and extend the sampling in these windows.

The error estimation above relies on a proper estimate for the variance of the average reaction coordinate in each window, which can only be achieved when the other orthogonal degrees of freedom are also sufficiently sampled. Insufficient equilibration or sampling of the orthogonal coordinates is a common problem in practical applications of free energy methods, and may result in significant underestimation of the errors. This problem typically manifests itself as inconsistencies between neighboring (and more distant) histograms of the umbrella windows, as different states of the orthogonal coordinates normally correspond to different probability distributions also of the WHAM coordinate x. The inconsistency coefficients introduced in this study can help identify discontinuities in the orthogonal coordinates between neighboring umbrella windows, when data in each individual window alone give no hint of such a problem, as demonstrated in our tests. If inconsistency is detected, one may thoroughly examine all relevant coordinates in the involved windows for potential discontinuity,24 or simply extend the simulations for a better equilibration. It is therefore advisable to perform these consistency tests in addition to the error estimation in umbrella sampling simulations.

As alternatives to WHAM, some other methods aim to directly determine the normalization factors {fi} or equivalently, the free energy factors {gi}, without constructing histograms. The multistate Bennett acceptance ratio (MBAR) method,16 e.g., obtains the optimal {gi} by solving a set of coupled nonlinear equations, and provides a formula for the statistical uncertainty in the {gi}.16 In this study, we introduce other routes to determine the {gi}, by minimizing the function A^ in Eq. (19) and, more approximately, by using free energy gradients in Eq. (28). It was shown that the MBAR formulism is equivalent to the WHAM equations as the bin width in WHAM approaches zero.16 We also demonstrate in this study that WHAM and our gradient-based method give very similar results for the same dataset. Given such similarities, we therefore expect the statistical errors estimated by our method and by MBAR to be comparable. However, we have not performed a direct comparison because a full implementation of the MBAR error estimate would require manipulation of matrices of dimensions n× k, where n is the total number of data points summed over all k simulations. At least for one-dimensional umbrella sampling with harmonic bias functions, our error estimation scheme is thus much simpler to implement than that of MBAR, and clearly reveals the contribution of each umbrella window to the accumulated error.

A potential drawback of WHAM is the somewhat artificial choice of the bin width for the histograms and the associated discretization error.25 The theoretical estimate25 of such error in our test cases is well below 0.05 kBT; indeed, we find that when using different bin widths the WHAM results typically differ by less than 0.01 kBT, much smaller than the magnitude of other errors discussed in this study. However, we note that if the simulations are extended by orders of magnitude, other errors could in principle be dramatically reduced such that the discretization becomes the major error source. In such cases one can simply use a smaller bin width to reduce the error. Nonetheless, WHAM has practical advantages in certain applications. For example, the number of bins is typically much smaller than the number of the original data, and does not increase with the data size. Therefore the reduction of the data into low-dimensional histograms can have significant benefit in the speed and memory requirement when dealing with very large datasets. Also, WHAM directly provides the unbiased free energy profile, which is typically the main objective of umbrella sampling simulations. Furthermore, the histograms permit a more detailed examination of the data. It was noted that an exchange of data between different simulations will not change the MBAR result.14,16 In contrast, with the histograms one can further check whether the data in each simulation are consistently distributed to detect potential problems, as demonstrated in this study. Overall, in practical applications, major errors are almost always due to imperfect data arising from the finite sampling and the initial conditions. Having proper diagnostics of such errors and indicators of the resulting inconsistencies provides a basis for the development of optimized and refined free energy sampling strategies.


We thank Dr. Edina Rosta for helpful discussions regarding Eq. (38). This research was supported by the Intramural Research Programs of the NIDDK, NIH, and utilized the high-performance computational capabilities of the Biowulf Linux cluster at the National Institutes of Health, Bethesda, MD (


In this appendix, the estimate of the statistical error in Eqs. 1, 32, and 36 is generalized to higher-dimensional umbrella sampling in a space spanned by N order parameters xα, sampled in S simulations with added harmonic bias potentials Σα=1NKα(xαrαi)22 for simulation i. From each simulation, we obtain an estimate of the N components of the free energy gradient from the mean restraining forces, Fαi=Kα(rαixα). As in Eq. 28, we want to combine these gradients into an estimate of the relative free energies Gi=kBTgi of the restrained simulations. However, in multiple dimensions the numerically estimated free energy gradients can be integrated through multiple paths, the results of which may not match exactly, due to both statistical errors in the estimated gradients and discretization errors of the integration. A path-independent result can be obtained by minimizing the squared deviation of the free energy differences GjGi between adjacent windows i and j from the corresponding averaged gradients,


where the sum is over pairs (i,j) of simulations with nearby restraining centers rαi and rαj, such that the linear approximation in Eq. A1 is applicable. For restraining points on a rectangular grid, minimization of χ2 results in a discrete Poisson equation with boundary condition G1=0 (where we arbitrarily chose i=1 as reference of the free energies, without loss of generality). In general, setting the derivatives of χ2 with respect to the Gi to zero results in a set of linear equations that can be written in matrix form asMG = BF, where M amounts to a discrete version of the Laplace operator, G is the vector of (unknown) free energies Gi, and B is a matrix that produces the appropriate linear combination of the restraining forces Fαi forming the vector F, consistent with Eq. A1. The formal solution then is G = M−1BF. From the errors in the mean positions of the order parameters, we can again estimate the uncertainties in the corresponding restraining forces, varvar(Fαi)=Kα2vari(xα), and more generally estimate their covariances cov(Fαi,Fγi)=KαKγcovi(xα,xγ). The covariances of the estimated free energies are then obtained as linear combinations of the covariances in the restraining forces,


where cikα = (M−1B)i,kα. Here we assumed that there are no correlations between the results of different simulations i. If we assume further that the restraining forces are uncorrelated, we obtain an estimate of the uncertainty in the free energies,


In practice, it may be simpler to circumvent the matrix computations and instead calculate cikα directly by repeated minimization of the quadratic target function χ2, in what amounts to a variational construction of the Green's function M−1. Specifically, if all Fαi are set to zero in Eq. A1, except Fγk, then numerical minimization of the resulting χ2 with respect to the Gi_(as above with the reference G1=0) directly produces the required solutions cikγ = Gi. By repeated minimization of χ2 for all k and γ, one can thus build up the entire set of coefficients entering the error formulas Eqs. A2 and A3. We note that for one-dimensional umbrella sampling, with the (i,j) sum in Eq. A1 restricted to nearest neighbors, one recovers the results of the main text, in particular Eqs. 32 and 36.


In this appendix we show that our analysis of the statistical errors can be similarly applied to umbrella sampling with more general biasing potentials w(x,r), such as those with nonuniform spring constants, or anharmonic potentials. In the general case we define the coarse-grained free energy


whose derivative is


We define


The derivative of Gr(r) at ri is then given by


in which Fi is obtained by averaging F(x,ri) in simulation i with biasing potential w(x,ri). Then the difference of Gr (r) between any two windows is again given approximately by Eq. (28), with the statistical uncertainty accordingly determined from the estimated variances of the {Fi}.

For the harmonic biasing potentials w(x,r) = K(xr)2 / 2 discussed in the main text, we have F=rw(x,r)=xw(x,r), and Fi thus coincides with the average restraining force of the biasing potential.


1. Torrie GM, Valleau JP. Chem Phys Lett. 1974;28(4):578–581.
2. Ferrenberg AM, Swendsen RH. Phys Rev Lett. 1989;63(12):1195–1198. [PubMed]
3. Kumar S, Bouzida D, Swendsen RH, Kollman PA, Rosenberg JM. J Comput Chem. 1992;13(8):1011–1021.
4. Boczko EM, Brooks CL., 3rd Science. 1995;269(5222):393–396. [PubMed]
5. Hub JS, de Groot BL, van der Spoel D. J Chem Theory Comput. 2010;6(12):3713–3720.
6. Gallicchio E, Andrec M, Felts AK, Levy RM. J Phys Chem B. 2005;109(14):6722–6731. [PubMed]
7. Chodera JD, Swope WC, Pitera JW, Seok C, Dill KA. J Chem Theory Comput. 2007;3(1):26–41. [PubMed]
8. Rosta E, Nowotny M, Yang W, Hummer G. J Am Chem Soc. 2011;133(23):8934–8941. [PMC free article] [PubMed]
9. Allen TW, Andersen OS, Roux B. Biophys J. 2006;90(10):3447–3468. [PubMed]
10. Bartels C, Karplus M. J Comput Chem. 1997;18(12):1450–1462.
11. Heath MT. Scientific Computing: An Introductory Survey. McGraw-Hill; New York: 2002.
12. Flyvbjerg H, Petersen HG. J Chem Phys. 1989;91(1):461–466.
13. Pollard D. NSF-CBMS Regional Conference Series in Probability and Statistics. 1990.
14. Kong A, McCullagh P, Meng XL, Nicolae D, Tan Z. J Roy Stat Soc Ser B (Stat Method) 2003;65:585–604.
15. Tan ZQ. J Amer Statistical Assoc. 2004;99(468):1027–1036.
16. Shirts MR, Chodera JD. J Chem Phys. 2008;129(12):124105. [PubMed]
17. MATLAB. MathWorks.
18. Hummer G, Szabo A. Acc Chem Res. 2005;38(7):504–513. [PubMed]
19. Maragliano L, Fischer A, Vanden-Eijnden E, Ciccotti G. J Chem Phys. 2006;125(2):24106. [PubMed]
20. Fukunishi H, Watanabe O, Takada S. J Chem Phys. 2002;116(20):9058–9067.
21. Park S, Schulten K. J Chem Phys. 2004;120(13):5946–5961. [PubMed]
22. Zhu F, Hummer G. Proc Natl Acad Sci U S A. 2010;107(46):19814–19819. [PubMed]
23. Bereau T, Swendsen RH. J Comput Phys. 2009;228(17):6119–6129.
24. Rosta E, Woodcock HL, Brooks BR, Hummer G. J Comput Chem. 2009;30(11):1634–1641. [PMC free article] [PubMed]
25. Kobrak MN. J Comput Chem. 2003;24(12):1437–1446. [PubMed]