To align and classify a collection of three-dimensional images, it is first necessary to select a metric that measures the similarity between images. Although image metrics have been extensively studied, there has been only limited analysis of the case in which images contain regions of missing information, as happens with volumes reconstructed from electron tomography. In this section, we describe the origin and the geometry of the missing-information region in electron tomographic images and introduce a way of measuring image similarity that explicitly accounts for the missing data.
In electron tomography, a series of projection images is acquired as the specimen is rotated about the tilt axis, typically covering a ±70° range for thin specimens. According to the central slice theorem (see Natterer and Wuebbeling, 2001), each measured 2D projection provides the Fourier transform of the specimen on a central slice, that is, on a plane through the origin of Fourier space orthogonal to the incident beam. The limited tilt range therefore leaves a wedge-shaped region in Fourier space where no measurements are available (see figure). Real-space reconstructions computed from the available measurements exhibit a characteristic structured noise, often referred to as the wedge effect (McIntosh et al., 2005), which reflects the absence of measurements at high tilts. Because this artifact is spread over all voxels in real space, measured and missing information become intermixed and cannot be distinguished from one another. Note that the naturally stronger weighting of low-frequency components in the Fourier decomposition may cause low-resolution features such as membranes to appear visually more distorted than smaller features, but ultimately the lack of frequency information affects all spatial scales equally. In reciprocal space, however, measured and missing information do not overlap, which makes it an optimal domain in which to compute similarities between images for classification and alignment.
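To make the wedge geometry concrete, the following NumPy sketch builds a binary Fourier-space coverage mask of the kind described above for a single-axis tilt series. It is illustrative only: the helper name `missing_wedge_mask`, the choice of the y axis as the tilt axis, the beam along z at zero tilt, the unshifted `np.fft.fftfreq` frequency layout, and the assumption of continuous angular coverage are all choices made for this example, not details taken from the text.

```python
import numpy as np


def missing_wedge_mask(n, max_tilt_deg=70.0):
    """Binary Fourier-space coverage mask for an n x n x n single-axis tilt series.

    Assumed geometry (illustrative, not from the text): tilt axis = y, beam
    along z at zero tilt, continuous coverage over [-max_tilt, +max_tilt],
    frequencies ordered as in np.fft.fftfreq (no fftshift).
    """
    f = np.fft.fftfreq(n)
    fx, _, fz = np.meshgrid(f, f, f, indexing="ij")
    # A projection at tilt t measures the central plane orthogonal to the beam
    # direction (sin t, 0, cos t), i.e. frequencies with fx*sin t + fz*cos t = 0.
    # Sweeping t over [-max_tilt, max_tilt] covers every frequency with
    # |fz| <= tan(max_tilt) * |fx|; fy is unconstrained (it lies on the tilt axis).
    covered = np.abs(fz) <= np.tan(np.deg2rad(max_tilt_deg)) * np.abs(fx)
    return covered.astype(float)
```

With `max_tilt_deg=70`, about 82% of the frequencies on a 64³ grid come out as measured; the zeros form a wedge around the $f_z$ axis.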
For simplicity, we first introduce the ideas involved in comparing images with missing data in the one-dimensional case. In the example in the figure, we consider two one-dimensional functions $v_1(x)$ and $v_2(x)$ that have missing information represented by zero-valued segments. We model these functions as products with occluding masks $m_1(x)$ and $m_2(x)$, defined to be 1 in the regions where measurements are available and 0 in areas of missing information. We now wish to define a metric that measures the dissimilarity between $v_1$ and $v_2$ in a manner consistent with the missing-data paradigm. This can be achieved by restricting the comparison between the signals to regions where both are available, that is, where $m_1(x)$ and $m_2(x)$ are simultaneously non-zero (the overlap region):

$$d(v_1,v_2)=\int m_1(x)\,m_2(x)\,\big(v_1(x)-v_2(x)\big)^2\,dx. \tag{1}$$
Without this restriction, regions where only one of the signals is available would still contribute to $d(v_1,v_2)$ by an amount proportional to either $v_1(x)^2$ or $v_2(x)^2$, depending on which signal is missing. Note that the extent of the overlap region $m_1(x)\,m_2(x)$ may change with the size or relative position of the masks, effectively changing the size of the integration domain in Eq. (1). For reasons that will become evident in the next section, it is desirable to have a standardized measure of dissimilarity that does not depend on the size of the overlap region. This can be achieved by replacing $d(v_1,v_2)$ with a normalized version that measures the average dissimilarity within the overlap region:

$$\bar{d}(v_1,v_2)=\frac{\int m_1(x)\,m_2(x)\,\big(v_1(x)-v_2(x)\big)^2\,dx}{\int m_1(x)\,m_2(x)\,dx}.$$
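The effect of the denominator can be checked numerically. The sketch below is a minimal NumPy illustration, assuming uniformly sampled signals so that the integrals become sums; it evaluates Eq. (1) and its normalized counterpart for two copies of a sine wave that differ by a constant offset of 0.1. The helper `masked_dissimilarity` and the example data are hypothetical.

```python
import numpy as np


def masked_dissimilarity(v1, v2, m1, m2, normalize=True):
    """Squared-difference dissimilarity of two sampled 1D signals restricted
    to the overlap m1*m2 (integrals replaced by sums over samples).

    normalize=False -> Eq. (1): grows with the size of the overlap.
    normalize=True  -> average dissimilarity per overlapping sample.
    """
    overlap = m1 * m2
    total = np.sum(overlap * (v1 - v2) ** 2)
    if not normalize:
        return float(total)
    n_overlap = np.sum(overlap)
    if n_overlap == 0:
        raise ValueError("the masks do not overlap")
    return float(total / n_overlap)


# Illustrative data: the same sine wave, offset by 0.1 in the second signal,
# observed through masks whose overlap has 60 or 30 samples.
x = np.arange(100) / 100.0
s = np.sin(2 * np.pi * x)
m1 = np.zeros(100); m1[:80] = 1
m2_wide = np.zeros(100); m2_wide[20:] = 1
m2_narrow = np.zeros(100); m2_narrow[50:] = 1
v1 = s * m1
v2_wide = (s + 0.1) * m2_wide
v2_narrow = (s + 0.1) * m2_narrow
```

Shrinking the overlap from 60 to 30 samples halves the unnormalized score (0.6 to 0.3) even though the signals agree equally well wherever both are observed, whereas the normalized score stays at 0.01 in both cases.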
We now extend these ideas to signals that represent 3D volumes affected by the missing wedge. As mentioned before, we have chosen to carry out all computations in Fourier space, where measured and missing information are naturally separated. Let $V_1$ and $V_2$ be two volumes reconstructed from limited-angle tomography, with 3D complex Fourier transforms $\hat{V}_1$ and $\hat{V}_2$, respectively. The missing wedge of each volume is represented by a 3D mask in reciprocal space, $M_i(f)$, $i=1,2$, with value 1 in regions where frequency measurements are available and 0 in areas with no data coverage due to the limited tilt range. Strictly speaking, each tilted projection contributes only a thin slice of information in reciprocal space, leaving small wedges of missing data between consecutive tilts. We assume that, within a limited range of frequencies (usually set by a bandpass filter), the gaps between tilts can be neglected provided that the angular sampling is sufficiently fine. As in the one-dimensional case, we define the dissimilarity between the two volumes as:

$$d(V_1,V_2)=\frac{\int_{\Omega} M_1(f)\,M_2(f)\,\nu\big(\hat{V}_1(f)-\hat{V}_2(f)\big)\,df}{\int_{\Omega} M_1(f)\,M_2(f)\,df}, \tag{2}$$
where $\nu:\mathbb{C}\to\mathbb{R}$ is a similarity kernel and $\Omega$ comprises the range of relevant frequencies, usually imposed by a bandpass filter in order to eliminate unwanted contributions from low- and high-frequency components.
As noted before, the denominator in Eq. (2) makes the measurement independent of the size of the overlap region, effectively measuring the average $\nu$-dissimilarity within the overlapping area. From now on we use the square matching kernel $\nu(x)=|x|^2$, in which case Eq. (2) measures the mean squared magnitude of the difference between the Fourier transforms within the overlap region and the frequency range $\Omega$. Note that, in this case, minimizing the numerator of Eq. (2) is equivalent to maximizing the usual cross-correlation coefficient after the images have been filtered to retain only the overlapping frequency components: since $|a-b|^2=|a|^2+|b|^2-2\,\mathrm{Re}(a\bar{b})$, the numerator reduces to a constant minus twice the cross-correlation once the filtered images are normalized to unit energy.
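A minimal NumPy sketch of Eq. (2) with the square kernel follows. It assumes that the masks and the band-pass region $\Omega$ are given as arrays on the same unshifted grid that `np.fft.fftn` returns; `bandpass_mask`, `fourier_dissimilarity`, and `overlap_correlation` are hypothetical helper names, and discrete sums stand in for the integrals. The correlation helper makes the equivalence stated above checkable: after each volume is scaled to unit energy on the overlap, the numerator of Eq. (2) equals $2 - 2\,\mathrm{cc}$.

```python
import numpy as np


def bandpass_mask(shape, f_low, f_high):
    """Binary stand-in for Omega: frequencies with f_low <= |f| <= f_high
    (cycles per voxel), laid out like the output of np.fft.fftn."""
    axes = np.meshgrid(*[np.fft.fftfreq(n) for n in shape], indexing="ij")
    radius = np.sqrt(sum(a ** 2 for a in axes))
    return ((radius >= f_low) & (radius <= f_high)).astype(float)


def fourier_dissimilarity(vol1, vol2, mask1, mask2, band):
    """Discrete version of Eq. (2) with nu(z) = |z|^2."""
    F1 = np.fft.fftn(vol1)
    F2 = np.fft.fftn(vol2)
    weight = mask1 * mask2 * band          # M1(f) M2(f), restricted to Omega
    area = weight.sum()                    # denominator of Eq. (2)
    if area == 0:
        raise ValueError("no overlapping frequencies inside the band")
    return float(np.sum(weight * np.abs(F1 - F2) ** 2) / area)


def overlap_correlation(vol1, vol2, mask1, mask2, band):
    """Correlation coefficient computed only from the overlapping, in-band
    Fourier coefficients (real part of the normalized Hermitian inner product).
    For real volumes, Hermitian-symmetric masks and a band excluding f = 0,
    this equals the correlation coefficient of the filtered images."""
    F1 = np.fft.fftn(vol1)
    F2 = np.fft.fftn(vol2)
    weight = mask1 * mask2 * band
    cross = np.sum(weight * F1 * np.conj(F2)).real
    e1 = np.sum(weight * np.abs(F1) ** 2)
    e2 = np.sum(weight * np.abs(F2) ** 2)
    return float(cross / np.sqrt(e1 * e2))
```

Keeping the overlap weights explicit in both helpers means the same `mask1 * mask2 * band` product defines both the integration domain and the normalization, which is what makes the two quantities directly comparable.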
Sub-volumes extracted from reconstructions of different tilt series acquired under different conditions usually have dissimilar intensity ranges, making direct evaluation of Eq. (2) inappropriate, since it will inevitably reflect these differences. This problem is also present in single-particle electron microscopy, where it is typically resolved by a normalization step (forcing zero mean and unit variance on all images) applied before the dissimilarity score is evaluated. For tomographic data, however, the problem is more complex because the missing information in reciprocal space prevents the unambiguous determination of statistical image properties such as the mean and the variance. For example, consider two one-dimensional signals corresponding to the same function but affected by two different occluding masks. The variances of the partially occluded signals will differ because the occluding masks differ, even though the underlying signal is identical. Assuming that the signals are aligned to each other, one can restrict the computation of the variance to the overlap region, in which case both signals will have the same variance. Although this does not hold if the signals are not aligned, one would still expect true matches to yield better scores than those attributable to scaling differences alone. We have therefore chosen to apply this type of normalization in our approach; it is equivalent to the normalization implied by the constrained cross-correlation presented in Frangakis et al. (2002) and Förster et al. (2005, 2008).
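The overlap-restricted normalization can be illustrated in the one-dimensional setting used above. In this hypothetical NumPy sketch (`normalize_on_overlap` and the test signal are assumptions for illustration), the second copy of the signal is observed through a different mask and has a different intensity range; statistics computed over each signal's own support would differ, but statistics computed over the common overlap bring the two copies into agreement. In the tomographic case the same idea would be applied to the Fourier coefficients inside the overlap $M_1(f)\,M_2(f)$ rather than to samples of a 1D signal.

```python
import numpy as np


def normalize_on_overlap(v, overlap):
    """Zero-mean, unit-variance normalization computed only on the overlap.

    Samples outside the overlap are set to 0 so they cannot influence any
    later comparison.
    """
    sel = overlap.astype(bool)
    if sel.sum() < 2:
        raise ValueError("overlap too small to estimate a variance")
    mu = v[sel].mean()
    sd = v[sel].std()
    out = np.zeros(v.shape, dtype=float)
    out[sel] = (v[sel] - mu) / sd
    return out


# Same underlying signal seen through two different occluding masks; the
# second copy is also rescaled and offset, mimicking a different intensity range.
x = np.arange(200)
s = np.sin(2 * np.pi * x / 50) + x / 100.0
m1 = np.zeros(200); m1[:150] = 1
m2 = np.zeros(200); m2[60:] = 1
v1 = s * m1
v2 = (2.0 * s + 1.0) * m2
overlap = m1 * m2
```

Because `s` contains a slow trend, its mean over `m1` differs noticeably from its mean over `m2`; after `normalize_on_overlap`, `v1` and `v2` coincide on the overlap, since they differ there only by a positive scale and an offset.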
Assuming this normalization scheme, the expression in Eq. (2) provides a formal framework that generalizes dissimilarity measures previously proposed in the literature. The constrained cross-correlation function proposed in Frangakis et al. (2002) to measure dissimilarity between a volume with a missing wedge and a reference without one can be obtained by substituting $M_2(f)=1$ in Eq. (2) and ignoring the integral in the denominator. The extension of the constrained correlation to the case of two volumes affected by the missing wedge (Förster et al., 2008) can also be derived from Eq. (2) by keeping the same integral in the numerator and eliminating the one in the denominator. As discussed in the previous section, the absence of this normalizing term can bias the alignment toward orientations that minimize the size of the overlap region $M_1(f)\,M_2(f)$, since doing so reduces the extent of the integration domain. The dissimilarity measure introduced in Schmid et al. (2006) can also be derived from Eq. (2) by removing the overlap term $M_1(f)\,M_2(f)$ from the numerator and leaving the denominator unchanged. In this case, the varying size of the overlap region is correctly accounted for, but the integral in the numerator will include contributions from frequencies inside either missing wedge, possibly biasing the alignment estimates.
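The special cases above can be compared side by side. The sketch below writes each of them in the dissimilarity form of Eq. (2) with the square kernel, following the substitutions described in the text; it is a simplified illustration, not a reimplementation of the published methods, which are formulated as correlations with their own normalizations. The function name and dictionary keys are hypothetical, and `F1`, `F2` are assumed to be already normalized as described earlier.

```python
import numpy as np


def dissimilarity_variants(F1, F2, M1, M2, band):
    """Special cases of Eq. (2) with nu(z) = |z|^2, following the substitutions
    described in the text. All arrays share one shape; M1, M2 and band are
    binary (1 = measured / inside Omega)."""
    diff2 = np.abs(F1 - F2) ** 2
    overlap = M1 * M2 * band
    n_overlap = np.sum(overlap)
    return {
        # Eq. (2): average squared difference over the overlap.
        "eq2": float(np.sum(overlap * diff2) / n_overlap),
        # Frangakis et al. (2002) form: M2 := 1 (reference has no wedge), no denominator.
        "frangakis2002": float(np.sum(M1 * band * diff2)),
        # Förster et al. (2008) form: overlap kept in the numerator, no denominator.
        "forster2008": float(np.sum(overlap * diff2)),
        # Schmid et al. (2006) form: overlap dropped from the numerator, denominator kept.
        "schmid2006": float(np.sum(band * diff2) / n_overlap),
    }
```

With a per-frequency error that is constant everywhere, only the `eq2` entry stays fixed as the overlap shrinks. The variant without a denominator decreases with the overlap, which is the bias toward small overlap regions described above, while the Schmid-style variant is inflated by errors inside the wedges.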
In this section we have introduced a way of measuring similarity between volumes reconstructed from limited-angle tomography by treating the problem as the comparison of signals with missing data. The definition of dissimilarity in Eq. (2) constitutes the foundation for the subsequent analysis of image alignment and classification and also provides a formal framework that generalizes other measures proposed in the literature.