|Home | About | Journals | Submit | Contact Us | Français|
The registration of images is a task that is at the core of many applications in computer vision. In computational neuroimaging where the automated segmentation of brain structures is frequently used to quantify change, a highly accurate registration is necessary for motion correction of images taken in the same session, or across time in longitudinal studies where changes in the images can be expected. This paper, inspired by Nestares and Heeger (2000), presents a method based on robust statistics to register images in the presence of differences, such as jaw movement, differential MR distortions and true anatomical change. The approach we present guarantees inverse consistency (symmetry), can deal with different intensity scales and automatically estimates a sensitivity parameter to detect outlier regions in the images. The resulting registrations are highly accurate due to their ability to ignore outlier regions and show superior robustness with respect to noise, to intensity scaling and outliers when compared to state-of-the-art registration tools such as FLIRT (in FSL) or the coregistration tool in SPM.
There is great potential utility for information extracted from neuroimaging data to serve as biomarkers, to quantify neurodegeneration, and to evaluate the efficacy of disease-modifying therapies. Currently, the accurate and reliable registration of images presents a major challenge, due to a number of factors. These include differential distortions that affect longitudinal time points in different ways; true, localized anatomical change that can cause global offsets in the computed registration, and the lack of inverse consistency in which the registration of multiple images depends on the order of processing, which can lead to algorithm-induced artifacts in detected changes. Thus, the development of an accurate, robust and inverse consistent method is a critical first step to quantify change in neuroimaging or medical image data in general.
Since the object of interest is typically located differently in each acquired image, accurate geometric transformations are necessary to register the input images into a common space. Approaches based on robust statistics are extremely useful in this domain, as they provide a mechanism for discounting regions in the images that contain true differences, and allow one to recover the correct alignment based on the remainder of the data. Inverse consistency is critical to avoid introducing bias into longitudinal studies. A lack of inverse consistency in registration is likely to bias subsequent processing and analysis, as documented in Yushkevicha et al. (2009). The goal of this work is thus to develop a robust and inverse consistent registration method for use in the analysis of neuroimaging data. The core application of this technique is intra-modality and intra-subject registration with important implications for:
Although the remainder of this paper deals with neuroimaging data, the method can be used for other image registration task as well.
Highly accurate rigid registrations are of importance when averaging multiple scans taken within a session to reduce the influence of noise or subject motion. Since it is nearly impossible for a person to remain motionless throughout a 20 minutes scan, image quality can be increased by taking shorter scans and performing retrospective motion correction (Kochunov et al., 2006). Many common sequences are short enough to allow for several structural scans of the same modality within a session. Here even a slightly inaccurate registration will introduce additional artifacts into the final average and likely reduce the accuracy, sensitivity and robustness of downstream analysis.
Compared with cross-sectional studies, a longitudinal design can significantly reduce the confounding effect of inter-individual morphological variability by using each subject as his or her own control. As a result, longitudinal imaging studies are becoming increasingly common in clinical and scientific neuroimaging. Degeneration in subcortical structures and cortical gray matter is, for example, manifested in aging (Jack et al., 1997; Salat et al., 1999, 2004; Sowell et al., 2003, 2004), Alzheimer's disease (Dickerson et al., 2001; Thompson et al., 2003; Lerch et al., 2005), Huntington's disease (Rosas et al., 2002), multiple sclerosis (Sailer et al., 2003) and Schizophrenia (Thompson et al., 2001; Kuperberg et al., 2003; Narr et al., 2005) and has been useful towards understanding some of the major pathophysiological mechanisms involved in these conditions. As a result, in vivo cortical thickness and subcortical volume measures are employed as biomarkers of the evolution of an array of diseases, and are thus of great utility for evaluating the efficacy of disease-modifying therapies in drug trials. To enable the information exchange at specific locations in space, highly accurate and unbiased registrations across time are necessary. They need to be capable of efficiently dealing with change in the images, which can include true neurodegeneration, differential positioning of the tongue, jaws, eyes, neck, different cutting planes as well as session-dependent imaging distortions such as susceptibility effects.
As an example see Figure 1 showing longitudinal tumor data (same slice of five acquisitions at different times, MPRAGE, 256 × 256 × 176, 1mm voxels) registered to the first time point (left) with the proposed robust method. The five time points are: 5 days prior to treatment start, 1 day prior, 1 day after treatment start, and 28, 56 days after treatment start. Despite of the significant change in these images the registration is highly accurate (verified visually in non-tumor regions). The bottom row depicts the outlier weights (red/yellow overlay), which are blurry regions of values between 0 (outlier) and 1 (regular voxel) that label differences in the images. In addition to the longitudinal change in tumor regions and consequential deformation (e.g. at the ventricles), the robust method also picks up differences in the scalp, eye region and motion artifacts in the background. In our robust approach the influence of these differences (or outliers) is reduced when constructing the registrations, while they have a detrimental influence on the final registration result in non robust methods.
Statistically, robust parameter estimation has a history of supplying solutions to several computer vision problems (Stewart, 1999) as it is capable of estimating accurate model parameters in the presence of noise, measurement error (outliers) or true differences (e.g. change over time). The approach presented here is based on robust statistics and inspired by Nestares and Heeger (2000), who describe a robust multi-resolutional registration approach to rigidly register a set of slices to a full resolution image. Our approach, however, is designed to be inverse consistent to avoid introducing a bias. It also allows the calculation of an additional global intensity scale parameter to adjust for different intensity scalings that can be present especially in longitudinal data. A more complex intensity preprocessing is therefore not needed in most cases. Furthermore, we automatically estimate the single parameter of the algorithm that controls its sensitivity to outliers. This is a necessary addition, since a fixed parameter cannot adequately deal with different image intensity scales, which are common in MRI. In addition to the multi resolutional approach described in Nestares and Heeger (2000) we use moments for an initial coarse alignment to allow for larger displacements and situations where source and target may not overlap. Finally, we describe the registration of two full resolution images (instead of only a set of slices) and explain how both rigid and affine transformation models can be used in the symmetric algorithm. We demonstrate that our approach yields highly accurate registrations in brain regions and out-performs existing state-of-the-art registration algorithms.
The remainder of this paper is organized as follows. After discussing related work and introducing the theoretical background, such as robust statistics in Section 2, we present our symmetric registration model, different transformation models as well as intensity scaling in Section 3. Then we describe the registration algorithm in detail, taking care that the properties of the theory are carried over to the implementation (Section 4). We specifically focus on maintaining inverse consistency by resampling both images into a `half-way' space in intermediate steps as opposed to resampling the source at the estimated target location. This asymmetric sampling, which is commonly used, introduces a bias as the target image will not be resampled at all, and will thus be less smooth than the resampled source. In Section 5 we demonstrate the superiority of the proposed method over existing registration algorithms with respect to symmetry, robustness and accuracy on synthetic and real data as well as a motion correction application. The software implementing the presented robust registration is publicly distributed as part of the FreeSurfer (surfer.nmr.mgh.harvard.edu) software package as mri_robust_register.
Over the last 20 years methods for the registration of images (and in particular medical images) have been studied intensely (see e.g. Maintz and Viergever (1998); Maes et al. (1999); Hill et al. (2001) for surveys and comparisons). Many different applications domains exist for registration, including multimodal intra-subject registration, cross-subject volumetric registration, surface-based registration etc…, each of which require domain-specific approaches to maximize accuracy. Some of the most prominent intensity based algorithms are Cross-Correlation (Collins et al., 1995), Mutual Information (MI) (Maes et al., 1997, 1999; Wells et al., 1996), Normalized Mutual Information (NMI), and Correlation Ratio (CR) (Roche et al., 1998). Recently (Saad et al., 2009) found registration errors when comparing CR and MI and proposed a new cost function using a local Pearson correlation.
Intensity based methods consider information from the whole image and are often deemed to be more reliable and accurate than feature based methods (West et al., 1997, 1999). Driving the optimizations based on geometrically defined features such as points (Schönemann, 1931; Evans et al., 1989; Bookstein, 1991), edges (Nack, 1977; Kerwin and Yuan, 2001), contours (Medioni and Nevatia, 1984; Shih et al., 1997) or whole surfaces (Pelizzari et al., 1989; Fischl et al., 1999; Dale et al., 1999; Greve and Fischl, 2009) has the advantage of reducing computational complexity, but introduces reliability difficulties when extracting/placing the features. Furthermore, extracting surfaces is a complicated and time consuming process in itself and not feasible in cases where only an initial rigid registration is needed or for the purpose of averaging two structural scans from the same session. Additionally, hybrid approaches exist such as Greve and Fischl (2009), a surface based approach that additionally incorporates information derived from local intensity gradients. Note that a large body of work describes rigid registration in the Fourier domain, e.g. van der Kouwe et al. (2006); Bican and Flusser (2009); Costagli et al. (2009), but since we expect and wish to detect spatial outliers/change we operate in the spatial domain.
A number of different registration methods are implemented in freely available software packages. The widely used registration tool FLIRT (Jenkinson et al., 2002), part of the FSL package (Smith et al., 2004), implements several intensity based cost functions such as standard least squares (LS), correlation ratio (CR) and mutual information (MI) as well as sophisticated optimization schemes to prevent the algorithms from being trapped in local minima. Another freely available and widely used registration tool is based on Collignon et al. (1995) and distributed within the SPM software package (Ashburner and Friston, 1999). In this paper, we use these two programs as standards to evaluate the accuracy and robustness of our technique.
Instead of applying a rigid or affine transformation model, more recent research in image registration has focused on nonlinear warps, which typically depend on an initial affine alignment. Non-linear models include higher-order polynomials (Woods et al., 1992, 1998), thin-plate splines (Bookstein, 1989, 1991), B-splines (Unser et al., 1993; Kostelec et al., 1998; Rueckert et al., 1999; Kybic et al., 2000), discrete cosine basis functions (Ashburner and Friston, 1997; Ashburner et al., 1997), linear elasticity (Navier-Stokes equilibrium) (Bajcsy and Kovavcivc, 1989; Gee et al., 1993) and viscous fluid approaches (Gee et al., 1993; Christensen et al., 1994). Specifically a method described in Periaswamy and Farid (2006) presents promising results. It is based on a linear model in a local neighborhood and employs the expectation/maximization algorithm to deal with partial data. Similar to our approach, it constructs a weighted least squares solution to deal with outlier regions, however, with an underlying globally non-linear (and usually asymmetric) transformation model.
Several inverse consistent approaches exist for nonlinear warps. Often both forward and backward warps are jointly estimated, e.g. (Christensen et al., 2001; Zeng and Chen, 2008). Others match at the midpoint (Beg and Kahn, 2007) or warp several inputs to a mean shape (Avants and Gee, 2004). Yeung et al. (2008) describe a post processing method to create a symmetric warp from the forward and backward warp fields.
While nonlinear methods are often capable of creating a perfect intensity match even for scans from different subjects (change information is stored in the deformation field), it is not trivial to model and adjust the parameters of these algorithms, in particular the trade-off between data matching and regularization. In addition, it is worth noting that perfect intensity matching does not guaranty accurate correspondence. These methods need to be designed to allow the warp enough freedom to accurately match the data while restricting the algorithm to force the warp to behave `naturally', for example preventing the merging of two gyri into one, or more simply to ensure smoothness and invertibility. Due to their robustness, transformation models with low degrees of freedom are generally better suited for tasks where no change (e.g. motion correction) or only little change (e.g. longitudinal settings) is expected. Furthermore, rigid or affine registrations are frequently used to initialize higher order warps. We therefore focus on highly accurate, low degrees of freedom, intensity based registrations in this work.
The field of robust statistics describes methods that are not excessively affected by outliers or other model violations. Classical methods rely heavily on assumptions that may not be met in real applications. Outliers in the data can have a large influence on the results. For example, the mean is influenced arbitrarily by a single outlier, while the median is robust and stays fixed even with outliers present. That is why robust parameter estimation plays an import role in computer vision applications (see e.g. Stewart (1999)).
A measure for robustness is the breakdown point that describes the fraction of incorrect (arbitrarily large) observations that can be present before the estimator produces an arbitrarily large result. The breakdown point of the mean is 0 while for the median it is 0.5, which is the maximum attainable, as for values above one half, it is impossible to distinguish between the correct and the contaminating distribution.
M-estimators are a generalization of maximum likelihood estimators (MLEs) and were introduced by Huber (1964). Instead of computing the estimator parameter θ minimizing for a family of probability density functions f of the observations x1 … xn as done for MLEs, Huber proposed to minimize any general function ρ:
The mean, for example, minimizes the sum of squared errors, so ρ(xi, θ) := (xi − θ)2 (where “:=” means “define”). The median can be understood as an M-estimator minimizing the sum of absolute errors ρ(xi, θ) := |xi − θ|. Since most commonly used ρ can be differentiated, the solution can be computed by finding the zeros to Σψ(xi, θ) with ψ(xi, θ) := ρ(xi, θ)/θ. For most ρ and ψ no closed form solutions exist and iterative methods are used for the computations. Usually an iteratively reweighted least squares (IRLS) algorithm is performed (see next section).
A specific ρ used often in robust settings is the Tukey's biweight function (see Figure 2):
For small errors the biweight is similar to the squared error, but once a specific threshold c is reached it flattens out. Therefore large errors of outliers do not have an arbitrarily large influence on the result. Often the (scaled) derivative of ρ:
is referred to as the Tukey's biweight function, as it is used in the actual computations.
To further highlight the difference between the robust and least squares approach Figure 3 depicts the distribution of the residuals after a successful registration (zoom-in into the histogram of residuals normalized by the number of voxels). For least-squares registration, the ideal residuals would be Gaussian noise, and in fact most residuals are around zero (the high peak there is cut off by the magnification). However, due to true differences in the images caused by distortion and anatomical change, larger residuals exist that cannot be explained by Gaussian noise models. These regions have extremely low probability under the Gaussian model (red curve in Fig. 3), which causes them to have a disproportionately large influence on the registration. As mentioned above, even a single large outlier can have an arbitrarily large effect on the result of the least-squares registration that is only optimal for zero-mean, unit variance Gaussian noise. Together with the residual distribution Fig. 3 shows two curves: where f (x) is either the parabola x2 (red) or the Tukey's biweight function ρ(x) (green). It can be seen that the parabola results in the Gaussian (red curve) and cuts off the tails significantly while the green function produced by the Tukey's biweight better models the larger residuals.
Consider a linear regression model with design matrix A and N observations in vector :
The M-estimator then minimizes the objective function
where vector is the i-th row of the matrix A. When using least-squares estimation we obtain the standard least squares linear regression solution, which can be solved directly. For a general ρ with derivative ψ := ρ′ one proceeds by differentiating the objective function (with respect to ) and by setting the partial derivatives to zero:
when setting the weights . These equations describe a weighted least squares problem that minimizes . Since the weights depend on the residuals ri, which in turn depend on the estimated coefficients (which depend on the weights), an iteratively reweighted least-squares algorithm is used. It selects an initial least-squares estimate (all weights equal to one), then calculates the residuals from the previous iteration and their weights, and then solves for a new weighted-least-squares estimate:
with the current weight matrix in iteration (j) (wi depends on the parameter vector ). These iterations are continued until a maximum number of iterations is reached or until the total squared error:
cannot be reduced significantly in the next iteration. It should be noted that the residuals are normalized before computing the weights in each step:
σ is a robust estimator for the standard deviation obtained by a scaled version of the median absolute deviation (MAD):
where the median is taken over all elements, i, j = 1, …, N.1 Fig. 4 shows a zoom-in of the distribution of residuals (blue) as presented in Fig. 3 of two images after successfull registration. Here also the distribution of the weighted residuals (wiri) is shown in green. It can be seen that the weights reduce the tail (large residuals) significantly.
As described above, the first step in constructing a robust simultaneous alignment of several images into an unbiased common space for a longitudinal study or for motion correction, is to register two images symmetrically. To avoid any bias, the resulting registration must be inverse consistent, i.e., the same registration (inverse transformation) should be computed by the algorithm if the time points are swapped.
We first describe our symmetric gradient based image registration setup. Instead of understanding the registration as a local shift of intensity values at specific locations from the source to the target, we transform both images: the source IS half way to the target IT and the target half way in the opposite direction towards the source. The residual at each voxel is
where is the intensity at voxel location is the local displacement from source to target and depends on the spatial parameters . This setup is symmetric in the displacement. We will explain later how an intensity scale parameter can be incorporated.
When applying a small additive change to the n parameters in vector we can write the result using a first order Taylor approximation
Since there is one such equation at each voxel, it is convenient to write this in matrix form (a row for each voxel):
We will call the design matrix containing the partial derivatives the A matrix. For N voxels and n parameters it is a N × n matrix. In the following we will simply refer to the residuals to be minimized as and the observations at the current location . Thus, equation (13) can be written .
The goal is to find the parameter adjustments that minimize Σρ(ri), which can be achieved with iteratively reweighted least squares (cf. Section 2.3). Choosing the Tukey's biweight function ρ will prevent the error from growing without bound. This will filter outlier voxels, and at the end of the iterative process we obtain the robust parameter estimate and the corresponding weights, which identify the regions of disagreement.
What remains is to set up the design matrix A, i.e. to compute the partial derivatives of (Eq. 11):
Here DI = (I1 I2 I3) denotes a row vector containing the partial derivatives of the image I in the three coordinate directions. The vector , the derivative of the displacement for each parameter pi, will be described in the following section. This formulation allows us to specify different transformation models (the ), that can easily be exchanged.
Note that common symmetric registration methods (Frackowiak et al., 2003) need to double the number of equations to set up both directions. They solve the forward and backward problem at the same time. In our approach this is not necessary, due to the symmetric construction detailed above. However, a symmetric setup like this is not su cient to guarantee symmetry. The full algorithm needs to be kept symmetric to avoid treating the source image differently from the target. Often, for example, the source is resampled to the target in each iteration, which introduces a bias. We describe below how to keep the algorithm symmetric by mapping both images into a halfway space to ensure that they are treated in the same manner, with both images being resampled into the symmetric coordinate system.
This section describes some possible transformation models (for background see e.g. Frackowiak et al. (2003)). Depending on the application, different degrees of freedom (DOF) are allowed. For within subject registration, 6 DOF are typically used to rigidly align the images (translation and rotation) across different time points or within a session for the purpose of motion correction and averaging of the individual scans. To align images of different subjects to an atlas usually 12 DOF transforms (affine registrations) or higher dimensional warps are used. However, even in higher-dimensional approaches, a linear registration is often computed for initial alignment. In the next paragraphs we will describe how to implement a transformation model with up to 12 DOF.
Generally the displacement can be seen as a function of the n dimensional model parameter vector into (for a fixed location ). Here is assumed to be linear in the parameters (or it has to be linearized) and can be written as
where M can be seen as a 3 × n Jacobian matrix containing as columns the partials , needed in the construction of the design matrix A (see Eq. 14). In the following paragraphs we will compute these Jacobians M for the affine (MA) and the rigid (MRT) case. Note also that the displacement is not equivalent with the transformation T, but it is the amount of which a location is displaced, so .
The affine 12 DOF displacement is given by a translation vector and a 3 × 3 matrix:
It is straightforward to construct a transformation matrix (in homogeneous coordinates) from these parameters:
For the rigid case, we can restrict this transform, to only allow rotation and translation. However, for small rotation it is more convenient to use the cross product to model the displacement of a rotation around the vector (p4, p5, p6)T by its length in radians:
Note that this model is used to compute the values p4…p6 in each step. It is not used to map the voxels to the new location as small amounts of stretching could accumulate. To construct the transformation, only the translation and the rotation around the vector (p4, p5, p6)T by its length are considered. With , , and (a unit quaternion) we obtain the transformation matrix T:
After specifying the displacement model, we can plug it into equation (14) and obtain the matrix equation:
Images can differ in both geometry and intensity in longitudinal settings. If the assumption that a set of images share an intensity scale is violated, many intensity based registration algorithm can exhibit degraded accuracy. Often a pre-processing stage such as histogram matching (Mishra et al., 1995; Nestares and Heeger, 2000) is employed. An alternative to preprocessing the images is to utilize a similarity measure that is insensitive to scalings of intensity such as mutual information or entropy. Due to difficulties when estimating geometric and intensity changes simultaneously only a few exceptions such as Woods et al. (1992, 1998); Ashburner and Friston (1997); Ashburner et al. (1997); Periaswamy and Farid (2006) incorporate explicit models of intensity differences obviating the need for complex intensity pre-processing.
We can easily incorporate a global intensity scale parameter s into our model in a symmetric fashion. First the intensity scale factor is applied to both source and target to adjust their intensities to their geometric mean:
Recall that the additive spacial displacement was kept symmetric by adding half the displacement to the source and half of the negative displacement to the target, to move both to towards a common half way space. The intensity scale factor is multiplicative, so instead of simply multiplying the source image's intensities by s we scale them by and the target by to map both images to their intensity (geometric) mean. This keeps the residual function symmetric with respect to the intensity scaling factor in addition to the symmetric displacement setup.
For the approximation, the corresponding partial derivative is added in the Taylor approximation:
Thus, in order to incorporate intensity scaling, one simply appends s to the parameter vector and attaches a column to matrix A, containing the partial derivative of the vector with respect to s:
The algorithm consists of the following steps:
The above algorithm will be described in more detail in the following sections.
Since the Taylor based registration can only estimate small displacements, it is necessary to employ a multiresolution approach (Roche et al., 1999; Hellier et al., 2001), together with an initial alignment (see next section). As described in Nestares and Heeger (2000) we construct a Gaussian pyramid, bisecting each dimension on each level until the image size is approximately 163. We typically obtain about 5 resolution levels with a standard adult field-of-view (FOV) for an MRI image that is approximately 1mm isotropic (i.e. an FOV of 256mm). First a standard Gaussian filter (5-tab cubic B-Spline approximation)
is applied in each direction of the image, which is then subsampled to the lower resolution. These pyramids (source and target) need to be constructed only once for the entire process.
In order to speed up the registration and increase its capture range, an initial coarse alignment is constructed using moments. Geometric moments have proven to be an efficient tool for image analysis (Del Bimbo, 1999). For a grayscale image with pixel intensities I(x1, x2, x3), the raw image moments Mijk are calculated by
where i, j, k are the exponents of the coordinates x1, x2, x3 respectively (taking the values 0 or 1 in the following equation). The centroid of an image can be derived from the raw moments:
We compute the translation needed to align the centroids and use it by default as an initial transformation to ensure overlapping images when starting the robust registration algorithm. Furthermore, it is possible to use central moments as defined below to compute an initial rotational alignment. For full head images with possibly different cropping planes, such a rotational pre-alignment can be very inaccurate and should therefore only be used when aligning skull stripped images. Central moments are defined translation invariant by using the centroid (Eq. 26):
The covariance matrix of the image I can now be defined using :
The eigenvectors of the covariance matrix correspond to the three principal axes of the image intensities (ordered according to the corresponding eigenvalues). These axes are then aligned for two images. Care needs to be taken to keep the correct orientation. This is achieved by flipping the first eigenvector if the system has left-handed orientation. Even if both systems are right-handed, it can still happen that two of the axes are pointing in the opposite direction, which can be detected and fixed by projecting each axis onto its corresponding axis in the other image and flipping it if necessary. If the angle between the corresponding axes is too large, the correct orientation cannot be determined without additional information and the initial rotational alignment is not performed. Note that initial moment based orientation alignment was never necessary and therefore not used in any of our tests, since head MRI images are usually oriented similarly.
There are three nested loops in the registration algorithm: the different resolutions of the pyramid (step 3), several iterations on each level (remapping the images (step 4), and finally the iteratively reweighted least squares algorithm for the robust parameter estimation (inside step 4(b), see Section 2.3). Note, when switching form a lower to a higher resolution in step 3, the translational parameters need to be adjusted (scaled by 2) when given in voxel coordinates.
On each resolution level there are several iterations of the resampling and robust parameter estimation as explained next.
The registration model (Eq. 11) is designed to maintain symmetry in the algorithm, however we must also ensure that all steps are performed similarly for both images. Therefore it is not sufficient to map the source to the target in each iteration and re-estimate the new parameters. In such a setup only the source would be resampled at (or close to) the target location while the target would not go through the resampling process. In order to avoid this asymmetry, which can introduce biases due to the arbitrary specification of source and target, we propose to resample both images to the half way space in each iteration step.
For a given transformation T from source to target the half way maps are constructed by approximating the square root of the matrix T (here T is again assumed to be a 4 × 4 matrix in homogeneous coordinates). For a positive definite matrix T (we don't allow reflections and projections) there exists exactly one positive definite matrix with . For its computation we use the Denman-Beavers square root iteration (Denman and Beavers, 1976; Cheng et al., 2001): Let Y0 = T and Z0 = I, where I is the identity matrix. The iteration is defined by
The matrix Yk converges quadratically to the square root , while Zk converges to its inverse, . Once has been approximated, the source image is mapped to and the target to (to ensure both get resampled at the same location). For the resampling process tri-linear interpolation is used, although other interpolation algorithms can easily be employed. Note that to maintain symmetry the square root iteration should only be stopped when the largest element of is sufficiently small.
To set up the robust estimation problem (Eq. 20), the partial derivatives and a smoothed version of both images need to be computed. Smoothing is used to prevent the algorithm from being trapped in a local minimum. For smoothing we apply a Gaussian kernel in each image direction (Nestares and Heeger, 2000):
The smoothed derivatives can be computed by applying
in the direction of the derivative and the smoothing kernel in the two other directions. Once the image derivatives DI are computed, the matrix A and vector can be constructed (see Eq. 20). If the matrix gets too large, it is often sufficient to subsample the image at the highest resolution and only select every second voxel. As the derivatives and intensity information are selected from the high resolutional image the result will still be more accurate than stopping at the previous lower resolution level in the Gaussian pyramid. For further improvement stochastic sampling algorithms can be employed to avoid aliasing. Subsampling specific regions more densely than others (e.g. depending on gradients, edges or the outlier weights) is also likely to improve accuracy. Our tests, however, show very accurate results even with the simple subsampling algorithm.
Once the system has been constructed, the iteratively reweighted least squares algorithm (Section 2.3) is employed to compute the new parameters and weights. For this reason, the saturation parameter c of the Tukey's biweight functions must be specified. In Nestares and Heeger (2000) a constant saturation value c = 4.685 is recommend (suggested for Gaussian noise in Holland and Welsch (1977)). However, a fixed value cannot adjust well to different image contrast types and SNR levels, such as non-linear deformations or larger intensity differences. In these cases it can happen that the registration fails as too many voxels are considered outliers. Therefore in order to reduce the number of detected outliers particularly in the brain, it is necessary to find a less sensitive (i.e. larger) value in these cases. The user can always adjust this parameter according to the specific image situation. For full head scans, however, we developed a method that automatically estimates the sensitivity parameter. It also works remarkably well in brain-only registrations. For full head images, a global limit on the number of outlier voxels will not be a good measure, as large outlier regions especially at the skull, jaw and neck should be permitted. The following outlier measure uses a Gaussian to weigh voxels at the center of the image more strongly than voxels further away (see also Figure 5):
where di is the distance of voxel i to the center. W is zero iff (if and only if) all weights wi are one (meaning no outliers). A large W means that many voxels in the center of the image are labeled outlier. In that case the saturation is automatically incremented and W recomputed until W < Wthresh. All of this can be computed quickly on a lower resolution level (we choose the third highest level, i.e. for a 2563 image this is 643). The threshold Wthresh will be discussed and determined in Section 5.4 below. Note that in situations with significant outliers in the center, a global unweighted threshold can be used instead or the sensitivity parameter can be adjusted manually.
In order to measure how much a new parameter estimate differs from the last iteration, the root mean square (RMS) deviation of the two corresponding transformations is computed. This measure will also be used to assess the quality of a registration when compared to some ground truth. The RMS deviation measures the average difference of voxel displacements inside a spherical volume for two given affine transformations and , where M1, M2 are two 3 × 3 linear transformation matrices and t1, t2 the corresponding 3×1 translation vectors. The RMS error for a spherical volume with radius r is then given by:
where tr is the trace (see Jenkinson (1999) for the derivation). An average displacement error is used as a quality measure for a transformation instead of, for example, the maximum displacement because it depends on all voxels contained in the sphere instead of possibly only a single voxel. The misalignment of a single voxel is not very important if the rest of the image is aligned accurately. While a translation has an equally strong effect everywhere, a rotation, for example, shifts voxels different distances depending on the distance to the rotation center. For a translation of 0.1mm (and 1mm3 voxels) both maximum displacement and average displacement are the same ERMS = 0.1. Such a displacement can easily be seen on the screen when switching between the images. Even ERMS of 0.05 and below can be noticed when magnifying the images. These displacements, however, are too small to visualize in a printed version of the image (e.g. checkerboard).
In this work the RMS error is measured on the transformations defined in RAS (right, anterior, superior) coordinates with the origin located approximately at the center of the image. The radius of the spherical volume is set to r = 100 which corresponds to 100mm, enough to include the full brain. The iterations of the parameter estimation are usually terminated once ERMS < 0.01 i.e. the average displacement consecutive estimates is below 0.01mm, which is very restrictive. To avoid long runtimes in ill conditioned cases, a maximum number of iterations can also be specified by the user (the default is 5).
This section presents results quantifying the accuracy and robustness of the robust registration in comparison to other commonly used methods. As mentioned above, the robust registration is capable of ignoring outlier regions. This can be verified when checking the weights during a successful registration, as shown in Figure 6. The top images show the (enhanced) differences between the target and registered source. The regions that contain the strongest differences are correctly detected as outliers as can be seen in Figure 6 (bottom), where the weights are overlayed (red/yellow regions). Note that the weights range from 0 to 1 and are blurry (because they are computed on smoothed images), so they can blur in from neighboring slices.
Figure 7 (left) is an example of an image that is misaligned using FLIRT with the mutual information similarity function. The visible edges in the brain regions indicate an alignment error. Figure 7 (right) shows the differences when using the robust registration, where clearly less strong edges are visible. The remaining residuals are due to resampling and noise. Figure 7 bottom shows a magnification of the target (red) and registered source (green) on top of each other. The red and green edges (left) at the ventricle and temporal lobe indicate misalignment while yellow regions are accurately aligned (right). The difference between the two transforms here is ERMS = 0.88, almost one voxel on average.
In the following sections we analyze the performance of different registration tools. We use the RMS deviation of two transformations as described in Section 4.5 to quantify the distance (the error) of a computed transformation with respect to some ground truth transformation. For the following tests we use a set of 14 healthy subjects each with two scans 14 days apart. The images are MPRAGE T1 weighted full head scans (on Siemens Sonata 1.5T) and are resampled to 2563 voxels each with 1mm side length (original dimensions 256×256×128 with 1mm ×1mm ×1.33mm voxels).
Since this algorithm is intended to compute inverse consistent registrations, we need to verify experimentally that the final transforms are exactly inverses of each other when switching source and target. For each subject we register the image from time point 2 to the time point 1 and vice versa while comparing the obtained transforms of several common registration algorithms. We compare the robust registration (with different parameter settings) with the FLIRT registrations (Jenkinson et al., 2002) from the FSL suite (Smith et al., 2004) using different cost functions: standard least squares [FLIRT-LS], correlation ratio [FLIRT-CR], mutual information [FLIRT-MI]. Furthermore, we compare with a registration tool from the SPM software (Ashburner and Friston, 1999) based on Collignon et al. (1995) [SPM]. The robust variants are: [Robust] robust rigid registration (no intensity scale), [Robust-I] with intensity scaling, and [Robust-I-SS] with additional sub-sampling at the highest resolution. We also include our implementation with standard least squares [LS] instead of the robust Tukey's biweight error function to see the effect of the robust estimation with no other differences in the algorithm. In Figure 8 the RMS deviation of the forward and inverse backward transforms are computed and compared for different image types as used in the FreeSurfer software package: full head scans (orig), intensity normalized images (T1) and normalized skull stripped images (norm).
The FLIRT registrations perform similarly. The higher mean in the mutual information method on the orig images is due to a single outlier (ERMS = 2.75). It can be seen that the robust registration methods are extremely symmetric, even with intensity scaling switched on, adding another degree of freedom and a higher chance for numerical instabilities. Also our non-robust method [LS] with the standard least squares error function is perfectly symmetric in all cases. This test, however, does not tell us anything about the accuracy of the registration.
In this section we present results using images that were transformed, intensity scaled and otherwise manipulated with known transformations, which then can be used as ground truth. We compare how well several registration algorithms perform on the same test set of the 14 MPRAGE T1 weighted full head scans (Siemens Sonata 1.5T) of the same healthy subjects. The registration methods are the same as in the previous section (FLIRT, SPM and robust registration).
A random rigid transformation (rotation, translation) was computed for each image. The parameters were chosen in a way that reflects possible (large) head movements in a scanner: 50mm translation in a random direction together with a random rotation of 25 degrees around an arbitrary axis with the origin at the center of the image. The maximum displacement of a corner of theses image was between 130mm and 140mm. The parameters were chosen, so that all methods can find approximate solutions. For larger transformations [SPM] was no longer capable of recovering the correct registrations at all, while the robust methods performed perfectly (not shown) in tests up to 100mm translation and 40 degrees rotation. These transformations move the images apart so that there is almost no overlap, furthermore parts of the face, skull, neck and jaw can be cropped because they are mapped outside the field of view. The robust approach can deal well with this kind of partial matching. Moreover, we believe that, due to the multiresolution algorithm and the initial moment based alignment, even larger transformations will be recovered accurately.
For the synthetic registration comparison, the transform that represents half the random translation and rotation is used to map and resample each image at the target location and the inverse is applied to map and resample each image at the source location. This ensures that both images (source and target) will be resampled and do not move outside the field of view as easily. An accurate registration from source to target needs to be close to the original random transform. The accuracy is measured using the RMS deviation (see Section 4.5) of the ground truth and the computed transformation matrix. Four different tests were performed. In all cases random rigid motion was applied (as described above):
The results of this experiment are given in Figure 10. It can be seen that the robust version outperforms the three different FLIRT similarity functions in all tests. [SPM] yields similar accuracy, but fails completely for larger transforms (not shown). The robust registration shows almost no influence of the outlier boxes since these are accurately detected as such. There is only little influence of noise. However, when the global intensity scale is different in the two images, the robust registration methods needs 7 DOF (one additional intensity scale parameter: [Robust-I]) to maintain accuracy, because it strongly depends on similar intensity levels. This underlines the importance of incorporating automatic intensity scaling into the robust registration method. Subsampling on the highest resolution in the robust registration [Robust-I-SS] leads to a significant reduction in memory and run time, but still yields the same registration accuracy in these tests. The simple non-robust implementation LS performs poorly in most cases.
It should be noted that the FLIRT methods produce a few individual registrations with low accuracy when outliers or noise are present (as can be seen by checking the scatter data, the small circles in Figure 10, some are too large and not shown). The SPM method on the other hand produces quite accurate results in most test cases. However, as mentioned above it fails completely for larger transformations.
In contrast to the simulations with available ground truth transformations we do not know the correct registration in advance in typical registration paradigms. Therefore we need to establish a different performance metric. This can be achieved by registering the skull stripped and intensity normalized images of a test-retest study (two time points) with different registration methods. These registrations are highly accurate as the images contain only brains of healthy normals and only small changes in the brain are involved (e.g. noise etc.). In these well behaved situations the registration of these brain images computed by the different algorithms deviate from each other only by small amounts. The goal here is to find registrations of the corresponding full head images that are as close as possible to the brain-only, intensity normalized registrations.
The group chosen for this test is the same as described above. This test will be more noisy as the `ground truth' is already defined inaccurately. Figure 11 (left) shows the distances of all other methods to the SPM registration of the skull stripped normalized image (norm). It can be seen that compared to the full head registrations, the norm registrations are on a similar low level for all methods (SPM has of course zero distance to itself). SPM has been chosen to construct the ground truth registration of the norm images, as it performed more accurately than the FLIRT methods in the previous tests. We did not choose a robust registration to establish the ground truth to not favor our method. However, we tested establishing the `ground truth' with any other method which leads to very similar results and almost exactly the same plots.
The results on the full head (orig) image (Figure 11 middle) and intensity normalized full head (T1) image (Figure 11 right) evidence behavior that is similar to the previous tests. SPM performs (here only slightly) better than the FLIRT methods, while the robust registration yields the most accurate results. As expected for the orig images intensity scaling [Robust-I] improves the registrations further, while for the normalized T1 images it is not necessary. Again subsampling [Robust-I-SS] on the highest resolution reaches the same accuracy, indicating that the expensive iterations on the highest resolution level can be avoided.
As described in Section 4.4.2 a fixed saturation level c cannot be recommended for all image types. The value c = 4.685 from Nestares and Heeger (2000) will lead to erroneous registrations in many common settings. Figure 12 (top) shows the accuracy of each robust registration of the orig images plotted vs. the selected saturation level. For some subjects the optimal registration is reached at c ≈ 6 while other need a higher value c ≈ 15. For the normalized T1 images or for [Robust-I] (with intensity scaling enabled) the results look similar (not shown), however with individual minima spread between c = 4 and c = 9. When using a fixed saturation level for all registrations, c ≈ 14 is optimal for [Robust] with an average RMS error of slightly below 0.3 and c = 8.5 is optimal for [Robust-I]. Even with a fixed saturation, both robust methods are on average better than the other non-robust registration methods (cf. Figure 12 bottom).
For [Robust] without intensity scaling, a relative high saturation value (c = 14) is particularly necessary to compensate for the differences in image intensity. Lower values might label too many voxels outlier due to the intensity differences or non-linearities, resulting in misaligned images (see Figure 13 for an example). Instead of manually inspecting the outliers and registrations while determining an optimal saturation setting per image, we introduce the center focused weight measure W (Eq. 32) for full head images to indicate when too many outliers are detected in brain regions and to adjust the sensitivity accordingly. Figure 13 (bottom row) shows the same image registration, where the automatic parameter estimation results in less detected outliers and a successful alignment.
We will now determine an optimal W for the automatic saturation estimation. Figure 14 presents scatter plots of registration accuracies [Robust] and [Robust-I] on the full head (orig) images here plotted versus W. The horizontal red line shows the average minimum error when choosing the individual saturation that leads to the best registration for each subject (with respect to the ground truth). The automatic saturation estimation can almost reach this optimum by fixing the center focused weight measure W around 0.2 (see the black curve showing the average of W between 0.05 and 0.3). Additionally, W is quite robust since the average (black dashed curve) is relatively flat. Ensuring a W around 0.2 for the tested image types in the automatic saturation estimation leads to registrations that are almost as accurate as when taking the optimal result per subject (which is of course not know a priori).
Frequently several images of a given scan type are acquired within a session and averaged in order to increase SNR. These images are not perfectly aligned due to small head movements in the scanner (for some groups of patients there can be even large differences in head location, due to uncontrolled motion) and need to be registered first. Since not only noise but other differences such as jaw movement or motion artifacts are prevalent in these images, a robust registration method should be optimally suited to align the images while discounting these outlier regions. It can be expected that except for noise, brain tissue and other rigid parts of the head will not contain any significant differences (except rigid location changes). A misalignment of the within-session scans will of course affect the average image negatively and can reduce the accuracy of results generated by downstream processes. Therefore highly accurate registrations for motion correction are the first step, for example, towards detecting subtle morphometric differences associated with disease processes or therapeutic intervention.
To test the robust registration for this type of within-session motion correction, the two scans of the first session in the longitudinal data set presented above were selected. The second scan was registered to the first with the different registration methods. It was then resampled at the target location (first scan) and an average image was created. Since these within-session scans should show no change in brain anatomy, it can be expected that the difference between scan 1 and aligned scan 2 in brain regions will be very small and mainly be due to noise (and of course scan 2 will be smoother due to resampling). Therefore a larger difference in the brain-region between the registered images implies misalignment, most likely due to image differences elsewhere (e.g. jaw, neck, eyes) or less likely due to non-linear differences between the two scans. The gradient non-linearities will badly influence all rigid registrations similarly, while possible non-brain outlier regions will influence the employed methods differently. Therefore we will evaluate the performance of full head registration only within the brain mask.2
We first quantify the registration error and compute the sum of squared errors (SSE) of the intensity values in scan I1 and aligned/resampled scan :
where the sum is taken over all brain voxels . The brain masks to specify brain regions were created automatically for each subject with the FreeSufer software package and visually inspected to ensure accuracy.
The SSE measure quantifies the intensity differences of the two images after alignment within the brain. For a perfect registration these differences should be small as they only measure noise, non-linearities and global intensity scaling (all of these should be small as the two images are from the same scan session). Figure 15 (top) shows the signed difference of SSE with respect to the result of the method [Robust-I]. The robust methods perform best on average, while [FLIRT-LS], [FLIRT-CR] and [LS] yield a better results (lower SSE) only in one single instance (white circles with negative value). To test significance of these results, we applied a Wilcoxon signed rank test (Wilcoxon, 1945) for each algorithm with respect to [Robust-I] to test if the median of the pairwise differences is equal to zero (null hypothesis). This is similar to the t-test on the pairwise differences, without the assumption of normally distributed data. We found that all non-robust methods show significant differences from [Robust-I] at a p < 0.001 while the null hypothesis cannot be rejected within the robust methods, as expected, since their performance is basically the same.
In order to test if differences can be detected in the resulting average images, we count the number of edges. Correctly aligned images should minimize the number of edges since all edges will be aligned, while misalignment increases the edge count. The edges were detected by scanning the x component of the gradient (using the Sobel filter) in the x directions and counting local maxima above a threshold of 5. Figure 15 (bottom) shows that the misalignment increases edge count on average when compared to [Robust-I]. However, due to the large variance the FLIRT results are not significant. [SPM] is significantly different at level p = 0.058 and [LS] at the p < 0.001 significance level in the Wilcoxon signed rank test.
In this work a robust registration method based on Nestares and Heeger (2000) is presented, with additional properties such as initial coarse alignment, inverse consistency, sensitivity parameter estimation and global intensity scaling. Automatic intensity scaling is necessary for the method to function when global intensity differences exist. Similarly the automatic estimation of the saturation parameter avoids misalignment in specific image situations where a fixed value potentially ignores too many voxels.
The presented method outperforms commonly used state-ofthe-art registration tools in several tests, and produces results that are optimally suited for motion correction or longitudinal studies, where images are taken at different points in time. Local differences in these images can be very large due to movement or true anatomical change. These differences will influence the registration result, if a statistically non-robust approach is employed. In contrast, the robust approach presented here maintains high accuracy and robustness in the presence of noise, outlier regions and intensity differences.
The symmetric registration model together with the `half-way' space resampling ensure inverse consistency. If an unbiased average of two images is needed, it is easily possible to resample both, target and source, at the `half-way' location and perform the averaging in this coordinate system. Furthermore, these registrations can be employed to initialize nonlinear warps without introducing a bias. Robust registration has been successfully applied in several registration tasks in our lab, including longitudinal processing and motion correction. The software is freely available within the FreeSurfer package as the mri_robust_register tool.
Future research will extend the presented registration to more than two images and incorporate these algorithms into a longitudinal processing stream, where more than two time points may be involved. In those settings instead of simply registering all images to the first time point, it is of interest to create an unbiased template image and simultaneously align all input images in order to transfer information at a specific spatial location across time. Similar to the idea in (Avants and Gee, 2004) it is possible to estimate the unbiased (intrinsic) mean image and the corresponding transforms iteratively based on the pairwise registration algorithm described in this paper.
Support for this research was provided in part by the National Center for Research Resources (P41 RR14075, and the NCRR BIRN Morphometric Project BIRN002, U24 RR021382), the National Institute for Biomedical Imaging and Bioengineering (R01 EB006758), the National Institute on Aging (R01 AG022381,U54 AG024904), the National Institute for Neurological Disorders and Stroke (R01 NS052585, R01 NS042861, P01 NS058793). Additional support was provided by The Autism & Dyslexia Project funded by the Ellison Medical Foundation. The authors would like to thank Dr. Greg Sorensen for kindly supplying the tumor sample data.
Publisher's Disclaimer: This is a PDF file of an unedited manuscript that has been accepted for publication. As a service to our customers we are providing this early version of the manuscript. The manuscript will undergo copyediting, typesetting, and review of the resulting proof before it is published in its final citable form. Please note that during the production process errors may be discovered which could affect the content, and all legal disclaimers that apply to the journal pertain.
1The constant is a necessary bias correction. The MAD alone estimates the 50% interval ω around the median rm of the distribution of r: P(|r − rm| ≤ ω) = 0.5. Under normality ω = 0.6745 σ σ = 1.4826 ω.
2In some applications it might be better to compute registrations on skull stripped brains directly. However automatic skull stripping is a complex procedure, and frequently needs the user to verify all slices manually. Furthermore, in some situations it makes sense to keep the skull, for example, when registering to a Talairach space with skull to estimate intracranial content, which depends on head size rather than brain size. Finally even skull stripped images can contain significant differences, for example in longitudinal data or simply because different bits of non-brain are included, so that the robust registration is still the best choice.