The Wiener filter (

Wiener, 1949) has been applied to both 2D and 3D cryo-EM image processing problems, with the goal of optimally combining noisy images into a “best” representation of the noise-free object being imaged (

Saxton, 1978;

Ludtke et al., 2001;

Zhang et al., 2008a;

Scheres, 2012). If one can obtain an estimate of the signal-to-noise ratio (SNR) of the Fourier-space representation of the data, the Wiener filter will suppress the noise in poorly measured parts of the Fourier space in order to obtain better agreement with the noise-free signal. However, the utility of the Wiener filter is compromised in single-particle imaging applications by an ambiguity in the definition of the SNR: as noted in

Sindelar and Grigorieff (2011), the SNR of a particular imaged particle can be made arbitrarily low just by increasing the field of view to include more noise in the surrounding solvent area. Thus, the behavior of the Wiener filter depends on the selected image size, for a given particle, and in general tends to give over-filtered results for images of single particles (

Sindelar and Grigorieff, 2011).

The above deficiency can be linked to the observation that the Wiener filter is only guaranteed to be optimal for stationary processes, where the expected mean and variance of the target function does not vary under translation (

Van Trees, 2001). In fact, the target function in the currently considered case, a 3D density map of a single particle, is highly non-stationary: the mean and variance of the density inside a particle will always be substantially different than the mean and variance in the solvent region. We therefore seek a modified filter that better captures the properties of single particles.

Deriving a 3D “single-particle” Wiener filter

To a first approximation, a large number of randomly oriented images will contribute a variable number of Fourier space measurements

(i=1, 2, ...

*n*_{hkl}) to each discrete point

**s**_{hkl} in the 3D discrete Fourier transform, or DFT, of the particle map ρ(

**r**) (see

Appendix A). Here,

**s**_{hkl} represents a discrete grid point in the 3D DFT having integer indices

*hkl*,

*n*_{hkl} is the number of measurements for

**s**_{hkl} contributing to this grid point. Here and in the following, bold symbols are vectors and italicized non-bold symbols refer to the length of the corresponding vectors. In particular,

*s*_{hkl} will be the radial spatial frequency corresponding to grid point

**s**_{hkl}. If the SNR of the measurements

is available as a function of

**s**_{hkl}, then the Wiener filter supplies a set of linear coefficients that minimize the average error in the resulting DFT. By Parseval's theorem, the error is also minimized in the corresponding real-space 3D map ρ

^{W}(

**r**) obtained by Fourier inversion. The Wiener expression (

Saxton, 1978) generalized for 3D is (see

Appendix A):

where CTF

_{i,hkl} are the previously estimated CTF values of the microscope for the given Fourier space measurement, accounting for the image defocus level, astigmatism, etc.

We now derive a modification to

Eq. (1) that addresses the special properties of single particles. Following our approach for the case of aligned 2D images (

Sindelar and Grigorieff, 2011), we define a 3D binary enveloping function, env

_{3D}(

**r**), outside of which the target particle density is known to be zero. We then seek the new set of linear coefficients to the measurements

that yield a real-space map where the error is specifically minimized inside the envelope. Applying a set of assumptions that are expected to be reasonable for single-particle cryo-EM data sets (for example, that the data set is sufficiently large to yield a well-localized particle map), it is straightforward to adapt the previously presented 2D SPW filter to its 3D analog (

Appendix A). After including a “gridding” formalism to account for the fact that, in the 3D case, most Fourier space measurements do not fall exactly on the discrete grid points

**s**_{hkl} (see

Appendix B) we arrive at the following expressions for the 3D SPW filter:

with

and

(note that

*f*_{particle} refers to a fraction of a 3D volume whereas in

Sindelar and Grigorieff (2011) it referred to a fraction of a 2D image). Here and in the following, PSSNR and SSNR are functions of radial spatial frequency

*s*. They approximate SNR values found at grid points

**s**_{hkl} by averaging over all values in a resolution shell. It is important to note here that

Eqs. (2) -

(4), as is the case with the equivalent 2D SPW expressions, can be applied in the absence of any specific knowledge about the shape of the envelope function. Instead, all that is required is the mean squared value of the envelope function,

*f*_{particle}, which is equal to the fractional volume occupied by the envelope within the boundary of the reconstructed box.

Eqs. (2) -

(4) will then minimize the reconstruction error of the particle density inside the envelope. Below we will describe how to find a “best” value for

*f*_{particle} that optimizes the map within the particle itself.

Accurate estimation of the image SSNR by masking

In order to implement

Eq. (2) it becomes necessary to obtain an accurate estimate of the SSNR of the raw data images. The SSNR can be most accurately obtained from a ‘masked’ FSC calculated from two volumes each containing half the data (

Harauz and Van Heel, 1986) where solvent noise surrounding the particle is suppressed with a soft-edged mask function env

_{mask}. The term env

_{mask} differs from env

_{3D} as it usually has a simpler shape such as a sphere and therefore contains substantially more volume than the actual particle volume. As shown by

Sindelar and Grigorieff (2011), for a set of aligned 2D images,

where the FRC is the 2D analog of the FSC, formed by comparing two independently averaged image data sets (

Harauz and Van Heel, 1986), S(

*s*) is a resolution shell centered around radial spatial frequency

*s*, and

*n*_{S} is the number of Fourier space pixels contained within S.

For a 3D data set, the corresponding result is (see

Appendix C):

where

is the mean-squared value of the soft-edged mask function evaluated over the 3D (real-space) reconstruction volume. This expression estimates the SSNR found in the raw data images, including the noise found in the solvent region, and thus may be combined with

Eq. (3) to obtain the PSSNR (assuming knowledge of

*f*_{particle}; see below). Here and in the following we make the assumption that the SSNR does not vary significantly between images and therefore, an average SSNR for the entire data set can be assumed. In the Discussion, we will consider the case of variable SSNR in a data set.

Derivation of a related post-processing SPW filter

The PSSNR term in the denominator in

Eq. (2) systematically down-weights structure factors

*F*^{SPW} where the number of measurements is not sufficient to overcome the measurement noise.

*F*^{SPW} thus represents an optimal estimate of the true structure factors (in the least squares sense and ignoring gridding-related artifacts), and its calculation requires incorporation of the SSNR found in the 2D image data during calculation of the final reconstruction. An alternative scheme has been described that uses a filter based on an FOM (

Rosenthal and Henderson, 2003). Unlike the Wiener filter and its SPW derivative described above, the FOM filter is not incorporated directly into the 3D reconstruction step, and is instead applied in a post-processing step after the reconstruction has been calculated. To compare these filtering methods, we relate

*F*^{SPW} to unfiltered gridded reconstruction using

Eqs. (2) and (A2.6):

where we have left out the small ε term from Eq. (A2.6), which is expected to have a negligible effect on this expression. The above expression represents a voxel-by-voxel correction to the unfiltered reconstruction

*F*^{LSQ}. We now substitute our estimate of the SSNR as a function of the masked FSC given in

Eq. (6):

The above expression still requires knowledge of the individual CTF terms in the 2D image data. To further simplify this expression, we now assume that the filter is approximately constant within a given resolution shell. This requires that the sum of squared CTF values (this can be considered as the effective number of Fourier-space measurements) is similar for all structure factors within the resolution shell. This condition will be met when (1) a sufficiently large number of images have been collected, such that every point in Fourier space is measured many times by a spread of defocus values, and (2) there are no strongly preferred orientations in the data set (note that the presence of astigmatism in the images would not affect our analysis, under the above conditions). The expected value of this filter is then

where the brackets

denote the average value for all possible instances of the noise in resolution shell S(

*s*_{hkl}). The above expression is expected, upon application to a non-filtered 3D reconstruction, to optimally filter the density map to reduce noise.

Eq. (9) describes how to obtain an approximation to the SPW algorithm (

Eq. (2)), by defining a post-processing filter to be applied to the unfiltered reconstruction (

*F*^{LSQ}). This result may be compared with the FOM filter described by

Rosenthal and Henderson (2003), which is written in our terminology as:

In contrast, we see that in the limit of

*f*_{particle} =

*f*_{mask}
Eq. (9) reduces to:

Note that while Rosenthal and Henderson applied masks to their reconstructed volumes prior to calculating the FSC, they did not explicitly consider the effects of masking in their expressions for C

_{ref}.

*De novo estimation of f*_{particle }*from* FSC *half volumes*

The above results indicate that successful application of the SPW filter requires an accurate estimate for *f*_{particle}. However, *f*_{particle} is defined by the shape of the solvent envelope of the particle, which is frequently challenging to obtain in experimental applications. Here we present a strategy for estimating *f*_{particle} using only information available from the input images. We begin with the property that the SPW filter minimizes the expected error within the particle region, compared with a noise-free reference volume. We further note that the SPW filter minimizes the reconstruction error everywhere in the particle simultaneously. In other words, if a chosen value of *f*_{particle} minimizes the error in any given region within the solvent envelope, the error should also be minimized at all other regions within the envelope as well, assuming equal quality of the map in all regions. Thus, one may restrict the above error evaluation to a small mask located within a “core” region of the particle, which is straightforward to establish even when the solvent boundary is indistinct (see below). If the noise-free reference volume is available, it is therefore possible to estimate *f*_{particle} by systematically varying this quantity during application of the SPW filter. The best estimate of *f*_{particle} will be the value that minimizes the error in the “core” region of the filtered reconstruction, with respect to the reference volume. In this way, *f*_{particle} can be estimated without knowledge of the precise shape of the particle envelope.

In experimental studies, the noise-free reference volume remains unknown, requiring further modification to the above strategy. It is straightforward to show, however, that because the SPW filter minimizes the error with respect to the noise-free reference volume, this filter also minimizes the error with respect to a noisy reference volume (so long as the added noise is random, and the reference is otherwise unfiltered). For any given experimental data set, moreover, a noisy reference volume is readily obtained by gridded Fourier inversion (Eq. (A2.6)). This observation implies that we may use an experimentally obtained reference volume in the above estimation procedure for *f*_{particle}, rather than using a noise-free reference, and still obtain the same result.

We thus arrive at the following scheme for estimating

*f*_{particle}: From one half of the data we calculate an unfiltered, noisy estimate,

*F*^{LSQ}, of the reconstruction (Eq. (A2.6)). The second half of the data is used to calculate a filtered estimate,

*F*^{SPW} in which we now allow

*f*_{particle} to vary (

Eq. (2)). We subsequently perform a series of reconstructions using values of

*f*_{particle} that range from 0 to 1, comparing the real-space cross-correlation coefficient (

*CCC*) between core regions of the filtered half-data density map and the unfiltered, noisier half-data map. We then choose the value of

*f*_{particle} that optimizes the core region

*CCC*. This value of

*f*_{particle} is thus expected to give a SPW filter whose output minimizes the error in the particle region. Moreover, this value of

*f*_{particle} is expected to correspond (at least approximately) to the fraction of the reconstructed volume that is occupied by particle density (this property will be tested below). Thus, the procedure just described will yield an approximation of the SPW filter using only the images and Euler angles that are standard input in any 3D reconstruction algorithm.

FLOW CHART: Integrated SPW filter

- Insert projections into Fourier volume via box convolution (equivalent to nearest-neighbor interpolation if box dimension is 1 × 1 × 1 in voxel units):
- Calculate sum in numerator of Eq. (2), stored on a per-voxel basis.
- Calculate sum in denominator of Eq. (2), also on a per-voxel basis (this and the preceding step are identical to the previously published FREALIGN implementation).
- Gather separate numerator, denominator tallies for two half-data-set reconstructions, for FSC computation.

- Perform Fourier inversion (Eq. (A2.6)) to obtain both half-data-set reconstructions, and compute the FSC between the two maps (using a smoothed mask where
*f*_{mask} is conservatively chosen to significantly exceed the volume of the particle) to obtain a lower bound on the reconstruction resolution. - Estimate the whole-image SSNR from the masked FSC, by Eq. (6).
- Select a ‘core region’ of the density by low-pass-filtering the reconstruction several times lower than the resolution lower bound computed in the last step, and defining the binary envelope to enclose a small fraction (i.e. ~10%) of the filtered reconstruction density.
- Perform a series of reconstructions using the second half data set, according to the formula: where
*f*, representing the unknown quantity *f*_{particle}, is varied between 0 and 1. - Estimate
*f*_{particle} as the value of *f* that maximizes the real-space *CCC* between core regions of the first (unmodified) half-data set reconstruction and the filtered reconstructions generated in step E. - Compute the full-data reconstruction by Eqs. (2) - (4).

FLOW CHART: Post-processing SPW filter

If the SPW filter is implemented with a post-processing filter rather than as an integrated reconstruction algorithm, any reconstruction algorithm may be used and fewer steps are necessary:

- Obtain unfiltered half-data-set reconstructions and compute the masked FSC and full-data-set reconstructions.
- Define a ‘core region’ of the density, as described in step D in the integrated SPW procedure.
- Apply a series of filters to the second half-data-set reconstruction, using the following form of the SPW post-processing filter: where
*f*, representing the unknown quantity *f*_{particle}, is varied between 0 and 1. Note that this post-processing filter has been modified from Eq. (9) in order to take into account the reduced signal-to-noise ratio found in a reconstruction made with half the data, compared with a full-data-set reconstruction. - Estimate
*f*_{particle} as the value of *f* that maximizes the real-space *CCC* between core regions of the first half-data set reconstruction and the filtered reconstructions generated in step C. - Apply the post-processing SPW filter (Eq. (9)), using the value for
*f*_{particle} estimated in part D, to the unfiltered full-data-set reconstruction step A.