Estimation of variance in density maps has been attempted in several different settings. Blow and Crick (1959)
estimated the mean variance in an X-ray crystallographic density map derived by the isomorphous replacement method. Henderson and Moffat (1971)
extended the error analysis to a difference Fourier synthesis. Their estimate of the mean variance requires
is the difference in structure amplitude, F
, and δΔF
is the r.m.s. error in measurement of ΔF
. Since there is considerable error in EM structure factor amplitudes derived by Fourier transformation of images, we did not attempt a similar calculation. However, our results confirm that the variation of the variance is small throughout the map even with large errors in the structure factor amplitudes. Another attempt to estimate the projection variance map was made by Mitra et al. (1993)
who synthesized a single error map in which the standard deviation of each structure factor was randomly flipped in direction. We found such an estimator to be inconsistent, and we note that it does not reflect the fact that each image, rather than each reflection, is a sample of the distribution.
In this study, we bootstrapped the images in order to estimate the variation in the density maps. We demonstrated through analysis of simulated (Fig. and ) and real data (Figs. and ) that the calculation is practical and that the standard error map so estimated displays the expected properties when structure variation is present. The variance of the voxel-wise standard error was surprisingly low except at the symmetry axes (). Variation in the structure accounts for only part of the variance obtained from analysis of EM images; several other sources of systematic error also contribute to the variance of the reconstruction, and we consider them in the following paragraphs.
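The image-level resampling described above can be illustrated with a minimal sketch. It is not the reconstruction pipeline used in this study: the "reconstruction" is replaced by a plain pixel-wise mean, and the function names, the toy images, and the number of replicates are all assumptions made for illustration. What it keeps is the idea that whole images, not individual reflections or pixels, are drawn with replacement.

```python
import random
import statistics

def bootstrap_se_map(images, n_boot=200, seed=0):
    """Per-pixel bootstrap standard error of the mean image.

    images: list of flattened images (equal-length lists of pixel values).
    The pixel-wise mean stands in for the real reconstruction (assumption).
    """
    rng = random.Random(seed)
    n = len(images)
    n_pix = len(images[0])
    replicates = []
    for _ in range(n_boot):
        # Resample whole images with replacement.
        sample = [images[rng.randrange(n)] for _ in range(n)]
        replicates.append([sum(img[p] for img in sample) / n for p in range(n_pix)])
    # Standard deviation across replicates, pixel by pixel.
    return [statistics.stdev([rep[p] for rep in replicates]) for p in range(n_pix)]

def make_toy_images(n_images=60, seed=1):
    """Hypothetical 4-pixel images in which only pixel 2 varies strongly."""
    rng = random.Random(seed)
    images = []
    for _ in range(n_images):
        img = [rng.gauss(1.0, 0.1) for _ in range(4)]
        img[2] = rng.gauss(1.0, 1.0)
        images.append(img)
    return images

images = make_toy_images()
se_map = bootstrap_se_map(images)
print(se_map)
```

In this toy case the standard error at the variable pixel should stand out clearly from the others, which is the behaviour expected of the standard error map when structure variation is present.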
The most obvious source of variation is an artifact of symmetry averaging. We observed that the variances were larger at the special positions of the crystallographic symmetry elements: the higher the symmetry, the more dominant the variation. We rationalize this observation as follows. In the unit cell of a crystal with p6 symmetry, the density value at any non-special position is, in fact, the mean of the 6 symmetry-related positions. Density at a position on the 6-fold axis, on the other hand, has no symmetry mates to average with. Therefore, assuming that all positions in the unit cell have the same statistical properties and independent errors before symmetrization, the pixel-wise standard deviation at a non-special position in the p6-symmetrized map should be 1/√6 times that at the 6-fold axis, as suggested by Eqn. 4
. Indeed, when we divide the standard deviation value at the 6-fold axis from any of our maps by
, the resulting values are all within
. Therefore, we should always look beyond the obvious variance peaks at the symmetry axes for the true representation of the map error.
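The square-root-of-6 argument can be checked numerically. The sketch below is a toy simulation, not an analysis of our maps: it assumes independent, identically distributed Gaussian noise at every position before symmetrization, and the function name and trial count are illustrative choices.

```python
import math
import random
import statistics

def sd_ratio_axis_to_general(n_fold=6, n_trials=20000, seed=0):
    """Ratio of the standard deviation on an n-fold axis to that at a
    general position after n-fold averaging (i.i.d. noise assumed)."""
    rng = random.Random(seed)
    axis_vals = []
    general_vals = []
    for _ in range(n_trials):
        # On the axis, the value has no symmetry mates to average with.
        axis_vals.append(rng.gauss(0.0, 1.0))
        # At a general position, n_fold independent values are averaged.
        general_vals.append(sum(rng.gauss(0.0, 1.0) for _ in range(n_fold)) / n_fold)
    return statistics.stdev(axis_vals) / statistics.stdev(general_vals)

ratio = sd_ratio_axis_to_general()
print(ratio, math.sqrt(6))
```

Under these assumptions the ratio should come out close to the square root of 6; correlated noise between symmetry-related positions would reduce it.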
The analysis of 2D crystals is unique in its reconstruction method. Translational alignment error arises in two places. First, during image processing a reinterpretation method is used to “unbend” the crystal to correct for lattice distortions. This process in effect adjusts the translational alignment of the molecules so that information at the reciprocal lattice is all that is passed on to further analysis, and the adjustment itself generates error. Other sources of error in the amplitude and phase values for each reflection are rotational variation of molecules within each unit cell, white noise in the background, and variation in imaging conditions such as specimen drift and charging. Second, phase origin alignment between images may introduce variation in the reconstruction, although probably at a lower level because of the strong symmetry constraint. In addition, the tilt geometry and the CTF are refined by the best fit of the image data against the fitted lattice lines, and they too have their own uncertainty. Since the scaling among images is based on a very simple model of total scattering, its variation may also contribute to the variance in the reconstruction. Unfortunately, many of these sources of systematic error, such as drift and any of the alignment and fitting errors, cannot be separated from structure variation in the same data set, and each may affect the pixel-wise variation in a different way. For example, drift, translational alignment error, the CTF, and scaling probably add variance uniformly throughout the map. The effect of tilt geometry error, on the other hand, is probably not isotropic over the 3D volume.
The consistent difference we observed between
was puzzling at first (compare ). We expect that, in whatever way the variance map is affected by the factors described above, the bootstrap estimate of the map should be affected equally. This difference was not present when structure variance was the only source of the variance, as in the simulated data set. The defocus distribution in the data set, the parameter most likely to follow a distorted or polarized distribution, was found to be approximately symmetric and unimodal. We also checked the bootstrap distribution of the pixels for strong deviations from a normal distribution and ruled this out as the source of the bias. When we examined
is the reconstructed projection map from a single image, we found that
on average almost 5 times that of
. We therefore suspect that the complexity of the reconstruction algorithm plays an important role in this discrepancy. For example, in the MRC program AVGAMPHS, which is used to average the projected structure factors in reciprocal space, the amplitudes are combined separately from the phases, so the combination is not truly vectorial. As a result, adding two maps together in real space is not equivalent to combining the structure factors through AVGAMPHS. A similar separation of amplitude and phase components is used in LATLINE. We therefore believe it likely that our comparison between
is not meaningful for obtaining the bias level of the bootstrap method as an estimate of
. The effect is not apparent in the simulated structure because the lack of noise resulted in a very narrow distribution of the phase for a given (h, k) reflection. By adding different levels of Gaussian noise to the simulated crystal images, we confirmed that the deviation of
increases with the noise level and that the bias of
increases faster than that of
as an estimate of
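The difference between a vectorial average and an average that treats amplitudes and phases separately can be seen in a small numerical example. The sketch below is not the AVGAMPHS or LATLINE algorithm; `separate_average` is a hypothetical stand-in that averages the amplitudes arithmetically and takes the phase from the sum of unit phasors, and the unit amplitudes and Gaussian phase noise are assumptions.

```python
import cmath
import random

def vector_average(factors):
    """True vectorial (complex) mean of the structure factors."""
    return sum(factors) / len(factors)

def separate_average(factors):
    """Hypothetical non-vectorial combination: arithmetic mean of the
    amplitudes, phase taken from the sum of unit phasors."""
    mean_amp = sum(abs(f) for f in factors) / len(factors)
    mean_phase = cmath.phase(sum(f / abs(f) for f in factors))
    return cmath.rect(mean_amp, mean_phase)

def noisy_factors(phase_sd, n=500, seed=0):
    """Unit-amplitude structure factors with Gaussian phase noise (radians)."""
    rng = random.Random(seed)
    return [cmath.rect(1.0, rng.gauss(0.0, phase_sd)) for _ in range(n)]

for phase_sd in (0.0, 0.5, 1.0):
    fs = noisy_factors(phase_sd)
    print(phase_sd, abs(vector_average(fs)), abs(separate_average(fs)))
```

With no phase noise the two averages coincide, which mirrors the noise-free simulated structure. As the phase spread grows, the vectorial amplitude shrinks while the separately averaged amplitude does not, so the two combinations diverge with increasing noise.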
In our residual bootstrap attempt, many of the standard deviation peaks are elongated extensively in the z* direction (red arrows in ). We believe that the cause of the elongation is the incorrect assumption that the measured structure factor vectors along z* are homoscedastic. Plotting the fitting error against z* for the lattice lines shows that data along some of the lattice lines are heteroscedastic (Fig. S2
). In contrast, the simple bootstrap estimate does not require such an assumption of directional homoscedasticity but requires only that the variance in each voxel in the 3D map contributes to all projection images. The latter estimate therefore gave better voxel independence.
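Why pooled residuals misrepresent heteroscedastic data can be sketched with a toy lattice line. This is an illustration only: the noise model (standard deviation growing linearly with |z*|), the sampling of z*, and the function name are assumptions, and no lattice-line fitting is performed.

```python
import random
import statistics

def residual_bootstrap_sd(residuals, n_boot=2000, seed=0):
    """Per-point SD of resampled residuals when all residuals along the
    line are pooled, i.e. when homoscedasticity is assumed."""
    rng = random.Random(seed)
    n = len(residuals)
    draws = [[rng.choice(residuals) for _ in range(n)] for _ in range(n_boot)]
    return [statistics.pstdev([d[i] for d in draws]) for i in range(n)]

rng = random.Random(1)
z_values = [-0.5 + i / 40 for i in range(41)]        # hypothetical z* samples
true_sd = [0.1 + 2.0 * abs(z) for z in z_values]     # assumed heteroscedastic noise
residuals = [rng.gauss(0.0, sd) for sd in true_sd]
boot_sd = residual_bootstrap_sd(residuals)

centre, edge = 20, 0   # indices of z* = 0 and z* = -0.5
print(true_sd[centre], true_sd[edge])
print(boot_sd[centre], boot_sd[edge])
```

In this toy case the true standard deviation differs by roughly an order of magnitude between the centre and the edge of the line, whereas the pooled residual bootstrap assigns nearly the same spread everywhere along z*.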
Apart from the bootstrap estimator, the jackknife method is also commonly used in complicated systems. Not being a Monte Carlo simulation, the jackknife method has the obvious advantage of speed in comparison to the bootstrap technique. Which of the two is the better estimator of the variance often depends on the parameter being estimated. Density map reconstruction involves Fourier transformation, filtering, and complex model fitting. As a result, predicting which method yields a better estimate is difficult. The best way to choose the better estimator is to perform tests on simulated data of known variance. Our simulation () showed that the bootstrap method outperforms the jackknife method in our particular application.
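The kind of test on simulated data of known variance mentioned above can be sketched for the simplest possible estimator, the sample mean. This toy example does not reproduce our comparison for density maps; the function names, sample size, and noise level are assumptions, and for a plain mean the two methods are expected to agree closely rather than to differ.

```python
import math
import random
import statistics

def jackknife_se(data, estimator):
    """Delete-one jackknife standard error of an estimator."""
    n = len(data)
    leave_one_out = [estimator(data[:i] + data[i + 1:]) for i in range(n)]
    mean_loo = sum(leave_one_out) / n
    return math.sqrt((n - 1) / n * sum((v - mean_loo) ** 2 for v in leave_one_out))

def bootstrap_se(data, estimator, n_boot=1000, seed=0):
    """Bootstrap standard error: SD of the estimator over resampled data sets."""
    rng = random.Random(seed)
    n = len(data)
    reps = [estimator([data[rng.randrange(n)] for _ in range(n)]) for _ in range(n_boot)]
    return statistics.stdev(reps)

# Simulated data with a known standard deviation (assumed values).
rng = random.Random(42)
true_sd, n = 2.0, 100
data = [rng.gauss(0.0, true_sd) for _ in range(n)]
true_se = true_sd / math.sqrt(n)

jk = jackknife_se(data, statistics.mean)
bs = bootstrap_se(data, statistics.mean)
print(true_se, jk, bs)
```

The jackknife needs n evaluations of the estimator and the bootstrap needs n_boot, which is where the speed advantage of the jackknife comes from. For a nonlinear procedure such as a full reconstruction, the two estimates can differ, which is why a test against known variance is needed.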
The recent paper by Penczek et al. describes the successful application of the bootstrap technique to estimate the 3D variance map reconstructed from single particle projection images (Penczek et al., 2006b
). In addition, they showed that the variance map can be used to focus further classification in the volume that displays high variance (Penczek et al., 2006a
). In this way there is potential to distinguish different conformational states of molecules within a sample. Molecules in 2D crystals are usually tightly packed so that the projection of one molecule often overlaps with another, especially when tilted. Therefore, it is unlikely that we can use the variation information for separation of heterogeneous molecules within a crystal.