The most important finding of this paper is that the bias in DBM-based longitudinal analysis of hippocampal atrophy can largely be attributed to the asymmetry in the application of global transformations. This finding is important because it implies that the step of bias elimination can be introduced into researchers’ data processing pipelines in a fairly transparent manner, without requiring changes to the underlying complex image registration software. In particular, it suggests that specialized metrics that account for bias (Leow et al., 2007) may not be required in the context of atrophy estimation in the hippocampus.
Why does asymmetry in global transformation affect the bias in SyN experiments when other factors (asymmetry in deformable transformation, number of interpolations, the registration method) seem to have so little effect on it? One plausible explanation is that the deformable transformation between the baseline image and the followup image is largely determined by the initial gradient of the image match metric. In greedy diffeomorphic registration, the overall deformation is computed by repeatedly taking this gradient, smoothing it and composing the resulting smooth elastic deformations over multiple iterations. However, since the deformation between the baseline image and followup image is small to begin with, the initial gradient may account for much of the total deformation. Now, if the global transformation is applied asymmetrically, at the time the initial gradient is computed, one of the images has undergone a resampling/interpolation operation (which smooths the image) and the other has not. Thus, much of the initial gradient may be driven by differences in sampling and interpolation, rather than anatomical differences. When the global transformation is symmetric, the same kind of resampling/interpolation is applied to both images. So the initial gradient of the metric reflects anatomical differences, as well as noise. Whether the deformable registration is symmetric or not does not matter, because it is primarily driven by the initial gradient.
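The smoothing asymmetry described above can be illustrated with a minimal one-dimensional sketch. It assumes linear interpolation, white-noise "images" with no anatomy at all, and a half-voxel translation standing in for the global transformation. None of these choices come from the experiments in this paper, and the function names are hypothetical.

```python
import numpy as np

def resample(signal, shift):
    """Linearly interpolate `signal` at grid positions i + shift."""
    grid = np.arange(signal.size, dtype=float)
    return np.interp(grid + shift, grid, signal)

def noise_variances(shift_baseline, shift_followup, n=200_000, seed=0):
    """Variance of baseline and follow-up noise after each is resampled."""
    rng = np.random.default_rng(seed)
    baseline = rng.standard_normal(n)
    followup = rng.standard_normal(n)
    # Drop the end samples, where np.interp clamps instead of interpolating.
    b = resample(baseline, shift_baseline)[1:-1]
    f = resample(followup, shift_followup)[1:-1]
    return b.var(), f.var()

# Asymmetric: the whole half-voxel shift is applied to the follow-up only.
asym_baseline, asym_followup = noise_variances(0.0, 0.5)
# Symmetric: each image receives half of the shift, in opposite directions.
sym_baseline, sym_followup = noise_variances(-0.25, 0.25)

print(f"asymmetric: baseline {asym_baseline:.2f}, follow-up {asym_followup:.2f}")
print(f"symmetric:  baseline {sym_baseline:.2f}, follow-up {sym_followup:.2f}")
```

For unit-variance white noise, linear interpolation at a half-voxel offset averages two neighbors and halves the variance, whereas a quarter-voxel offset in either direction retains 0.625 of it (0.75² + 0.25²). In the asymmetric case the two images therefore differ in sharpness before any anatomy is compared. In the symmetric case they are matched.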
The idea of splitting the global transformation via the matrix square root operation is not new. It falls within the unbiased atlas framework proposed by Guimond et al. (2000), Davis et al. (2004), and Joshi et al. (2004) and adopted by many studies. This framework finds the Fréchet mean of the input anatomies in the space of image transformations. The Fréchet mean of the baseline image and the followup image, within the space of global transformations, is precisely the matrix square root of the global transformation estimated between these two images by global registration. Of course, the unbiased atlas formulation also applies the Fréchet mean to the diffeomorphic transformations. However, based on our findings, this step may not be required, at least in the context of hippocampal atrophy.
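The square-root split can be sketched as follows, assuming the global transformation is stored as a 4×4 homogeneous matrix that maps baseline coordinates to followup coordinates. The Denman-Beavers iteration is used here only to keep the example dependent on NumPy alone; this section does not say how the square root was computed in practice, and library routines such as `scipy.linalg.sqrtm` serve the same purpose. The example transform and function names are hypothetical.

```python
import numpy as np

def affine_sqrt(A, iterations=30):
    """Principal square root of a matrix via the Denman-Beavers iteration.

    Converges when A has no eigenvalues on the closed negative real axis,
    which holds for the small rotations typical of same-subject scans.
    """
    Y = np.array(A, dtype=float)
    Z = np.eye(Y.shape[0])
    for _ in range(iterations):
        Y_next = 0.5 * (Y + np.linalg.inv(Z))
        Z = 0.5 * (Z + np.linalg.inv(Y))
        Y = Y_next
    return Y

def example_global_transform():
    """Hypothetical rigid transform: 4 degree rotation about z plus a shift."""
    theta = np.deg2rad(4.0)
    c, s = np.cos(theta), np.sin(theta)
    A = np.eye(4)
    A[:3, :3] = [[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]]
    A[:3, 3] = [1.5, -0.8, 0.4]
    return A

A = example_global_transform()
H = affine_sqrt(A)           # baseline -> halfway space
H_inv = np.linalg.inv(H)     # followup -> halfway space

# Applying the half transform twice reproduces the full transform.
print(np.allclose(H @ H, A))
```

Under this convention each image is resampled once, by a transformation of equal magnitude, into the halfway space. Which image receives `H` and which receives its inverse depends on the direction in which the registration software reports the transform.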
The power of the MCI vs. control comparison did not substantially change under different DBM configurations. This suggests that the effect of longitudinal bias may be altogether negligible when reporting group differences in atrophy. In the context of designing clinical trials, this suggests that sample size should be calculated relative to the control atrophy rate. In other words, when we ask, “how many subjects are needed in each cohort to detect an x% reduction in atrophy in the treatment group with given statistical power and given alpha level,” the term “reduction” should refer to the relative change from the MCI rate of atrophy to the control rate of atrophy, rather than absolute reduction in the MCI rate of atrophy. However, when absolute atrophy rate is used for power calculations, severe underpowering can occur.
4.1 Relationship to Prior Work
Bias in longitudinal image registration has been the subject of several papers in recent years.
Leow et al. (2007) introduced an unbiased DBM approach based on an additional regularization term that penalizes the logarithm of the Jacobian determinant in the non-rigid transformation.
Yanovsky et al. (2009) further refined this method by introducing a symmetric unbiased DBM technique. The authors evaluated the technique on data from 10 ADNI AD subjects and 10 controls. As in the present study, Yanovsky et al. (2009) used scans acquired at short intervals to assess DBM-related bias in the absence of real atrophy. They found that the symmetric unbiased and asymmetric unbiased DBM substantially reduce bias relative to methods that do not control for bias. However, these authors did not examine the effects of asymmetry in global registration on bias.
Hua et al. (2009) compared atrophy estimation in a large ADNI cohort using different configurations of the Leow et al. unbiased registration framework, including 6-parameter and 9-parameter global registration. However, the effect of symmetry in global transformation was not considered. As such, our paper arrives at a different set of conclusions regarding bias. Our results suggest that symmetry in the application of global transformation is sufficient to eliminate significant bias. By contrast, the papers discussed above suggest that bias reduction should be incorporated into the regularization prior of deformable registration. It is important to note that our results are limited to a small anatomical region (the hippocampus) and may not extrapolate to other brain regions.
Camara et al. (2008) used a synthetic dataset with known gold standard atrophy to compare the accuracy of atrophy estimation by two global atrophy estimation techniques (Freeborough and Fox, 1997; Smith et al., 2002) and two DBM techniques. The two DBM techniques were the FFD method (Rueckert et al., 1999) and a fluid-based image registration method (Crum et al., 2005). The authors found statistically significant differences between the atrophy rates reported by DBM techniques and the gold standard in the presence of simulated deformations consistent with AD pathology (DBM techniques underestimated atrophy), but did not find significant differences when simulated atrophy was consistent with healthy aging. The paper did not discuss the specifics of how global transformations were applied to the data, nor the amount of smoothing applied to the images. Nevertheless, it is curious that the bias detected on simulated data was opposite in direction to the bias found in this paper.
One possible explanation for this difference lies in the way that the volume change induced on the hippocampus by a given deformation is calculated. We use a mesh-based calculation, where the deformation field is applied to each vertex of a volumetric tetrahedral mesh and the change in mesh volume is calculated exactly. Camara et al. (2008) and many other authors integrate the determinant of the Jacobian matrix of the deformation over the region of interest. When used in the context of non-parametric registration (e.g., SyN), the latter calculation uses deformation field values from voxels adjacent to the region of interest, because the discrete Jacobian is computed with a finite difference approximation. Many of the voxels adjacent to the hippocampus are in the cerebrospinal fluid, which expands when the hippocampus shrinks. Thus, mixing deformation field values across hippocampus boundaries can lead to underestimation of atrophy.
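The mesh-based calculation can be sketched as follows. This is a minimal illustration, not the implementation used in the study: the mesh is a unit cube split into six tetrahedra rather than a hippocampus, and the deformation is a hypothetical analytic function (a uniform 2% contraction per axis) rather than a registration output sampled at the mesh vertices.

```python
import itertools
import numpy as np

def unit_cube_mesh():
    """Vertices of the unit cube and a split of it into six tetrahedra."""
    vertices = np.array(list(itertools.product([0.0, 1.0], repeat=3)))
    # Each vertex index is a 3-bit code; every tetrahedron walks from
    # corner 0 to corner 7, switching on one axis bit at a time.
    tetrahedra = np.array(
        [[0, a, a + b, 7] for a, b, _ in itertools.permutations([1, 2, 4])]
    )
    return vertices, tetrahedra

def mesh_volume(vertices, tetrahedra):
    """Sum of tetrahedron volumes, each |det(edge vectors)| / 6."""
    origin = vertices[tetrahedra[:, 0]]
    edges = vertices[tetrahedra[:, 1:]] - origin[:, None, :]
    # abs() assumes the deformation does not invert any tetrahedron.
    return np.abs(np.linalg.det(edges)).sum() / 6.0

def percent_volume_change(vertices, tetrahedra, deformation):
    """Move only the mesh vertices and compare volumes before and after."""
    before = mesh_volume(vertices, tetrahedra)
    after = mesh_volume(deformation(vertices), tetrahedra)
    return 100.0 * (after - before) / before

def contract(points, factor=0.98):
    """Hypothetical deformation: uniform contraction about the cube center."""
    center = np.array([0.5, 0.5, 0.5])
    return center + factor * (points - center)

vertices, tetrahedra = unit_cube_mesh()
change = percent_volume_change(vertices, tetrahedra, contract)
print(f"volume change: {change:.3f}%")
```

Because only the deformation at mesh vertices enters the calculation, no finite differences are taken across the structure boundary, which is the property the paragraph above relies on.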
Other authors have argued against direct application of DBM for longitudinal atrophy estimation. Davatzikos et al. (2001) proposed RAVENS maps, which avoid Jacobian computations and instead preserve tissue density under deformable transformations. Studholme et al. (2003) argued that the Jacobian map should be spatially filtered using a measure of normalization uncertainty derived from the normalization procedure. Rohlfing (2006) examined the Jacobian fields yielded by different DBM approaches and found them to be strikingly different despite similar region-wise normalization accuracy. Despite these well-known limitations, DBM remains widely used for longitudinal atrophy analysis.
4.2 Utility for Clinical Studies
The DBM-based atrophy estimation approach, both in the absence and in the presence of bias, finds statistically significant differences between 1-year hippocampal atrophy in MCI patients and atrophy in controls. In particular, the statistical power of DBM-based analysis is substantially greater than in the analysis of ADNI data that uses independent semi-automatic segmentation of the hippocampus at multiple timepoints (Schuff et al., 2009). Based on 1.5 Tesla MRI data from 127 controls and 226 MCI patients, Schuff et al. (2009) report annual percent change of −0.8 ± 5.6 in controls and −2.6 ± 4.5 in MCI patients. In our analysis of 3 Tesla MRI, we report annual percent change of −0.7 ± 1.1 in controls and −2.0 ± 1.9 in MCI patients (these are the results for the symmetric HW/HW comparison). The change we detect in MCI is smaller in magnitude than that reported by Schuff et al. (2009), although the 95% confidence intervals for our study (1.6 – 2.5) and the Schuff et al. study (2.0 – 3.2) overlap. On the other hand, the variance in the DBM-based approach is substantially reduced. In terms of sample size calculation, our calculation (see Sec. 3.1) yields N = 1570 for the Schuff et al. (2009) study and N = 508 for DBM-based estimation. It is unlikely that these findings are due to differences in MRI field strength, as it was recently reported that field strength in ADNI does not significantly affect atrophy estimates (Ho et al., 2009). This indicates that DBM-based atrophy estimation is more sensitive than comparison of hippocampal volumes extracted using semi-automatic segmentation.
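The sample size comparison can be approximated with the standard two-sample formula. The sketch below is not the calculation from Sec. 3.1: it assumes a 25% reduction in atrophy, a two-sided alpha of 0.05, and 80% power, none of which are stated in this section, and the function and parameter names are hypothetical.

```python
from math import ceil
from statistics import NormalDist

def subjects_per_arm(mci_mean, mci_sd, control_mean,
                     reduction=0.25, alpha=0.05, power=0.80,
                     relative_to_control=True):
    """Subjects per arm needed to detect a fractional reduction in atrophy.

    With relative_to_control=True the reduction is a fraction of the gap
    between the MCI and control rates; otherwise it is a fraction of the
    MCI rate itself.
    """
    normal = NormalDist()
    z = normal.inv_cdf(1.0 - alpha / 2.0) + normal.inv_cdf(power)
    reference = mci_mean - control_mean if relative_to_control else mci_mean
    effect = reduction * reference
    return ceil(2.0 * (mci_sd * z / effect) ** 2)

# Rounded annual percent changes quoted in the text above.
n_schuff = subjects_per_arm(mci_mean=-2.6, mci_sd=4.5, control_mean=-0.8)
n_dbm = subjects_per_arm(mci_mean=-2.0, mci_sd=1.9, control_mean=-0.7)
n_dbm_absolute = subjects_per_arm(mci_mean=-2.0, mci_sd=1.9, control_mean=-0.7,
                                  relative_to_control=False)

print(n_schuff)        # 1570
print(n_dbm)
print(n_dbm_absolute)
```

Under these assumed settings the Schuff et al. figures reproduce the quoted N = 1570. The rounded DBM figures give a somewhat larger value than the quoted N = 508, presumably because the original calculation used unrounded means and standard deviations. Measuring the reduction against the MCI rate alone gives a much smaller N, which shows how that choice can leave a trial underpowered.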
4.3 Limitations
One of the limitations of the current study is that it only assesses additive bias in atrophy estimation. There are other types of bias that our methods are not capable of detecting. For example, certain DBM configurations may introduce multiplicative bias that cannot be detected by the two experiments used in this study. In the direct bias estimation experiment, true atrophy is zero, so a multiplicative effect cannot be seen. In the intercept-based experiment, multiplicative bias cannot be detected if the factor by which true atrophy is multiplied is the same at 6 months and 12 months. Multiplicative bias may explain why the average MCI atrophy detected by the symmetric DBM configuration is lower than the atrophy reported by Schuff et al. (2009).
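The distinction between additive and multiplicative bias in the intercept-based experiment can be made concrete with a small sketch. It assumes a hypothetical measurement model in which measured atrophy equals a scale factor times linear true atrophy plus a constant offset. The rates and bias values are invented for illustration.

```python
def measured_atrophy(months, true_rate_per_month, scale=1.0, offset=0.0):
    """Hypothetical model: scale * (linear true atrophy) + additive offset."""
    return scale * true_rate_per_month * months + offset

def intercept_from_two_timepoints(atrophy_6mo, atrophy_12mo):
    """Extend the line through the 6- and 12-month values back to time zero."""
    slope = (atrophy_12mo - atrophy_6mo) / (12.0 - 6.0)
    return atrophy_6mo - slope * 6.0

true_rate = 0.2  # percent per month, invented for illustration

# Additive bias of 0.5 percentage points: the intercept recovers it.
additive = intercept_from_two_timepoints(
    measured_atrophy(6, true_rate, offset=0.5),
    measured_atrophy(12, true_rate, offset=0.5),
)

# Multiplicative bias (20% underestimation): the intercept stays at zero.
multiplicative = intercept_from_two_timepoints(
    measured_atrophy(6, true_rate, scale=0.8),
    measured_atrophy(12, true_rate, scale=0.8),
)

print(f"additive bias intercept: {additive:.3f}")
print(f"multiplicative bias intercept: {multiplicative:.3f}")
```

A constant scale factor changes only the slope of the fitted line, so the intercept cannot reveal it.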
Intercept-based atrophy estimation makes an underlying assumption that atrophy is linear over time. This assumption is not uncommon in the evaluation of atrophy estimation techniques (Fox and Freeborough, 1997). The fact that in the unbiased configuration of DBM we observe intercept values not significantly different from zero supports this assumption. Additional experiments on ADNI data from all available time points would allow this assumption to be evaluated more extensively.
In the SyN experiments, the results of the direct bias estimation and intercept-based bias estimation experiments are overall very consistent. In the FFD experiment, however, there was some inconsistency between these two ways of estimating bias. Direct estimation finds significant bias in the BL/FU and HW/FU configurations, whereas intercept-based estimation finds significant bias in BL/FU but not in HW/FU. However, we do not expect bias to be zero in either of these experiments because the deformable registration (FFD) is not fully symmetric. Both configurations are less asymmetric than FU/FU, in which substantial bias is detected using both measures. So overall, the FFD results fit the pattern of SyN results. Nevertheless, a more extensive evaluation of bias in parametric registration methods is warranted.
Our analysis does not take into consideration the heterogeneity of the clinical groups, particularly the MCI subjects. The only accurate way of determining AD pathology is through autopsy, and many of the MCI patients likely do not have AD pathology. CSF biomarkers are available for a subset of ADNI subjects and could have been used to identify MCI subjects with an AD-like chemical biomarker profile. Reducing heterogeneity in the cohorts would probably reduce the variance in atrophy in each cohort as well as the sample size for the MCI-control comparisons. However, there would not be an obvious effect on the bias of DBM methodology. Hence, we felt that for the purpose of evaluating bias in DBM methodology, such partitioning of the subjects was not necessary.
The experiments in this paper cannot detect spatial biases in atrophy estimation. It is entirely possible that atrophy detected in the hippocampus is partially attributable to atrophy in surrounding structures. DBM, by design, cannot estimate change in the volume of a particular small region independently of surrounding image regions. Deformation fields in DBM are smoothed, which causes propagation of information across voxels. Our study cannot detect or measure this type of bias.