In this paper, we examined the robustness of different MRI scan types for mapping brain changes using tensor-based morphometry. We found that SPGR acquired using the birdcage design with N3 correction was the most stable sequence, with the least deviation. While in theory a phased array design increases the signal-to-noise ratio relative to a birdcage design, the latter yielded a lower deviation in our comparison test. This is probably because regularizers are always applied to deformation fields in TBM analyses. Thus, the noise level of the images, so long as it is within a certain acceptable range, plays a less crucial role in determining the performance of a particular scan type. On the other hand, phased array receive coils are more prone to image intensity inhomogeneities. In theory, the B1 correction technique should largely remove this, but our statistical tests did not support the assumption that a B1-corrected phased array design clearly outperforms a birdcage design (note that, in this paper, no statistical testing was directly performed on the effect of B1 correction, as the B1 correction was built-in for all phased array images). In fact, without N3 correction, B1-corrected phased array images had greater deviation than those acquired using a birdcage transmit/receive coil. This difference became non-significant after applying N3. It is therefore reasonable to infer that B1 correction does not entirely remove the RF inhomogeneity, as N3 correction further improves phased array images. N3 also removes most of the differences between coil types in terms of RF homogeneity, as supported by the disappearance of performance differences after the application of N3. For TBM, the most relevant difference between coil types is RF homogeneity (rather than signal-to-noise ratio), and N3 is effective in removing this artifact.

Synthetic T1 imaging gave inherently different results relative to other non-calculated pulse sequences. Synthetic T1 images showed good repeatability even without N3, but not much improvement was seen after applying N3. Moreover, repeatability was visually more comparable between Synthetic T1 images acquired using different coil types than other sequence types. The high repeatability of Synthetic T1 does not translate into a better performance, which is harder to interpret. One possible explanation is that there is a difference in the baseline mean log *J* field for this scan type compared to other non-calculated scan types. We also studied the effect of B0 correction for spatial distortion, and the results could not confirm a statistical improvement, although a larger sample of scans might be required to detect a subtle effect.

Although we concluded that high SNR matters less than good intensity homogeneity for tensor-based morphometry, the lower SNR for the MP-RAGE scans may have handicapped that sequence relative to SPGR in these analyses. We ultimately changed the acquisition parameters to boost SNR on the 1.5 T birdcage coils to correct this, and the data used for the analyses here may have under-represented the optimal performance of MP-RAGE relative to SPGR in terms of SNR. Because the poor SNR in the birdcage scans was easily correctable, it was not considered a fundamental feature/flaw of the MP-RAGE sequence.

This paper only reports the TBM analysis of longitudinal data acquired with the purpose of determining the optimal MRI pulse sequence for the Alzheimer’s Disease Neuroimaging Initiative. In addition to the longitudinal data, cross-sectional data comparing controls to subjects with Alzheimer’s disease were acquired. Furthermore, all the MRI data were also analyzed by the following methods that rely on, and exploit, different aspects of image quality: atlas-based measurements of hippocampal volume (Haller et al., 1997; Hsu et al., 2002), the boundary shift integral (Fox and Freeborough, 1997; Fox et al., 2000), voxel-based morphometry using Statistical Parametric Mapping (VBM; Ashburner and Friston, 2000), cortical thickness measures (Fischl and Dale, 2000) and tensor-based morphometry (TBM; Studholme et al., 2001; Leow et al., 2005a,b). The results of these studies, and the data and rationale for selecting MRI pulse sequences for the Alzheimer’s Disease Neuroimaging Initiative, will be reported elsewhere. A fundamental and anticipated limitation of evaluating the reproducibility of MRI scanning at short intervals is that we could assess the sequences’ insensitivity to noise (controls scanned 2 weeks apart should show little change) but not their sensitivity to real change (e.g., the ability to pick up change in Alzheimer’s disease over a year). The final decision on specific imaging protocols therefore involved both quantitative and qualitative assessments of sequence performance and was not based solely on evaluations of change over short intervals as detected using TBM.

In this study, after some discussion, we did not randomize the order of the pulse sequences because this would have been difficult logistically. This leaves open the possibility that some systematic bias might have been introduced (e.g., greater motion in later sequences). In mitigation, the subjects were normal controls, and so the problems of excessive motion or loss of concentration are not as marked as in AD patients. The imaging protocol was reviewed and approved by a panel of experienced MR professionals and was designed to reduce human errors related to data acquisition in the preparatory phase. It was decided that randomizing the scan order would increase the complexity of prescribing the acquisition parameters and increase the likelihood of technologist errors in collecting the data.

It is also not known which biological sources of variation contributed most to the residual deformations observed in this serial MRI data. Regardless of the imaging protocol, the ultimate geometrical stability of the brain scanned over time is limited by biological changes in brain tissue hydration (which also may impact T1 or T2 contrast), mechanical effects such as ventricular deformation (which may be minimized by consistent placement of the subject in the scanner), as well as other short-term physiological changes. Oatridge et al. (2001) have reported a change in CSF volume later in the menstrual cycle in women, and other studies have noted minor brain volume variations during normal pregnancy or with jet-lag or alcohol intake (Oatridge et al., 1998, 1999). Better understanding of these relatively short-term reversible biological effects may lead to improved statistical methods to model and adjust for them in studies of serial anatomical change.

In this paper, we applied a logarithmic transformation to all Jacobian maps before conducting statistical analysis. Log transformation of the Jacobian determinant field has become standard practice in most TBM papers. The Jacobian determinant of a diffeomorphic (smooth) map is bounded below by zero but unbounded above, so its distribution at any voxel is skewed; under the null hypothesis, the logged Jacobians are better approximated by a symmetric normal distribution.
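The symmetrizing effect of the log transform can be illustrated with a minimal sketch. The lognormal model for null Jacobians, the spread of 0.3 and the sample size are assumptions chosen purely for illustration and are not taken from our data; the helper `skewness` is likewise hypothetical.

```python
import math
import random

def skewness(values):
    """Sample skewness (third standardized moment); 0 for a symmetric distribution."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    return m3 / m2 ** 1.5

rng = random.Random(0)

# ASSUMPTION (illustration only): Jacobians at one voxel under the null are
# lognormal, i.e. positive, unbounded above, and centered near 1.
jacobians = [math.exp(rng.gauss(0.0, 0.3)) for _ in range(50_000)]
log_jacobians = [math.log(j) for j in jacobians]

skew_raw = skewness(jacobians)      # clearly positive: long right tail
skew_log = skewness(log_jacobians)  # close to zero: roughly symmetric

# Doubling and halving a volume are mirror images only on the log scale.
print(math.log(2.0), math.log(0.5))
print(skew_raw, skew_log)
```

On the raw scale, a doubling (J = 2) and a halving (J = 0.5) sit at unequal distances from the no-change value J = 1, whereas their logs are equal in magnitude and opposite in sign.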

Another observation supporting the use of the logarithmic transformation comes from registering two images where no difference other than noise is present. We expect the chosen (unbiased) statistic to pick up no statistically significant change between these two images. In a classical statistical setting, one would hope that the statistic used to estimate change follows a Gaussian distribution with zero mean. This again suggests that some symmetrizing should be applied to Jacobian maps, leading to the use of the logarithmic transform. Unfortunately, the resulting distribution does not have zero mean: as we showed in Leow et al. (2005a,b), log-transformed Jacobian maps always lead to biased estimates (i.e., have negative means under null conditions), and this problem occurs even if log Jacobian maps are analyzed at the voxel-by-voxel level.

We also showed, using Kullback–Leibler distances on material density functions, that the integral of the logged Jacobian map of *any* volume-preserving transformation (not just inverse-consistent mappings) with respect to an image domain is always negative inside this domain. Moreover, inverse-consistent mappings constructed by symmetrizing regularizers (in the form of differential operators) integrate to a less negative number than their inconsistent counterparts applied to the same data.
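One way to see why the sign is negative is sketched below. The notation (domain $\Omega$ with volume $|\Omega|$, Jacobian determinant $J$, densities $p$ and $q$) and the particular choice of densities are assumptions made for this sketch; the derivation in Leow et al. (2005a,b) may be set up differently.

```latex
% Assumptions: J(x) > 0 on \Omega, and the overall volume is preserved,
% i.e. \int_\Omega J(x)\,dx = |\Omega|.
% Define two probability densities on \Omega:
%   p(x) = 1/|\Omega|   (uniform),   q(x) = J(x)/|\Omega|.
\begin{align*}
  \mathrm{KL}(p \,\|\, q)
    &= \int_\Omega p(x)\,\log\frac{p(x)}{q(x)}\,dx
     = -\frac{1}{|\Omega|}\int_\Omega \log J(x)\,dx \;\ge\; 0, \\
  \text{hence}\quad
  \int_\Omega \log J(x)\,dx &\;\le\; 0,
  \quad\text{with equality only if } J \equiv 1 \text{ almost everywhere.}
\end{align*}
```

Under these assumptions, any map that preserves the overall volume but is not locally volume-preserving everywhere has a strictly negative integrated log Jacobian.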

This negative mean bias is not introduced by integrating the logged Jacobian field over a region as the same bias even occurs when considering log-transformed Jacobian maps at the voxelwise level (averaging across subjects at a single voxel). By utilizing multiple copies of the artificial Jacobian map presented in Leow et al. (2005a,b) (squeezing half of the domain of interest to an arbitrarily small size while preserving the overall volume/size of the domain), we can now easily construct a collection of Jacobian maps defined in a region stable in volume/size, whose (voxelwise) mean log Jacobian approaches minus infinity at every voxel. This would incorrectly reject the null hypothesis that the overall volume/size of this domain is unchanged and shows that log Jacobian maps are biased in the sense that they are not zero mean across subjects at each voxel.
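A toy one-dimensional analogue of this construction is sketched below. It is not the artificial map from Leow et al. (2005a,b) itself: the 8-voxel domain, the piecewise-constant Jacobians and the helper names `squeeze_map` and `voxelwise_mean_log` are all assumptions introduced for illustration.

```python
import math

def squeeze_map(n_voxels, eps, squeeze_left):
    """1D toy Jacobian map: one half scaled by eps, the other by 2 - eps.

    The Jacobians sum to n_voxels, so the overall size of the domain is preserved.
    """
    if n_voxels % 2 != 0 or not 0.0 < eps < 1.0:
        raise ValueError("need an even voxel count and 0 < eps < 1")
    half = n_voxels // 2
    squeezed, stretched = [eps] * half, [2.0 - eps] * half
    return squeezed + stretched if squeeze_left else stretched + squeezed

def voxelwise_mean_log(jacobian_maps):
    """Average the log Jacobian across maps (subjects) at each voxel."""
    n_maps = len(jacobian_maps)
    return [sum(math.log(j) for j in column) / n_maps
            for column in zip(*jacobian_maps)]

N_VOXELS = 8  # hypothetical domain size
results = {}
for eps in (0.5, 0.1, 0.001):
    # Two "subjects": one squeezes the left half, the other the right half.
    maps = [squeeze_map(N_VOXELS, eps, True), squeeze_map(N_VOXELS, eps, False)]
    results[eps] = voxelwise_mean_log(maps)
    print(eps, [round(v, 3) for v in results[eps]])
```

In this sketch, every voxel has mean log Jacobian 0.5 [log(eps) + log(2 - eps)], which is negative for any 0 < eps < 1 and decreases without bound as eps shrinks, even though each individual map leaves the total size of the domain unchanged.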

One way of alleviating this bias is to use inverse-consistent mappings and to integrate the log Jacobian over a spatial domain, as is done in this paper. We acknowledge that this integral produces a summary quantity whose geometric meaning is harder to grasp, and one could argue that it is preferable to integrate the volume change over a region first before performing the log operation (which would yield the log of deformed volume). Here, we preferred to integrate the log Jacobian, as we did not wish to apply one approach voxelwise, and a different approach when considering statistics across a region. As such, the integral of the logged Jacobian is a regional, if somewhat abstract, summary of the fluctuations in the log Jacobian measure over a region. A somewhat related approach is taken in Pennec et al. (2005), where the deformation tensor of a mapping is logged and integrated over the image domain, and this integral is used as a penalty function (cost functional) to regularize the deformation.

We would also like to comment on our choice of cost function (mutual information) for nonlinear registration. As would be expected for any registration algorithm driven by mutual information, there is a gain in the mutual information of the registered images, for each sequence type, after nonlinear registration. The algorithm works by gradient descent on the deformation parameters to make the mutual information increase from its initial value, subject to other constraints on the smoothness and symmetry of the deformation. Note that the mutual information is not necessarily monotonically increasing with better registration as the registration quality is quantified by the sum of two terms, the mutual information and the energy of the applied deformation field, which describes how much image distortion is required to attain the measured level of signal correspondence. Depending on the application, both geometric and intensity similarity may be important factors in considering image reproducibility, so it is possible that there will be no clear best sequence: some imaging sequences may provide better geometric similarity over time and others may tend to be more similar in terms of intensity. Further work is needed to help assess intensity similarity. For example, an information-theoretic metric might be used to estimate the degree to which the image at time 1 predicts the image at time 2 or how many bits of information are needed to represent the residual information. The final value for the mutual information of two registered images may be difficult to compare objectively across imaging sequences as it depends on the deformation energy allowed in the registration process.
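For readers less familiar with the similarity term, the following minimal sketch estimates mutual information from the joint histogram of two discretized intensity sequences. It is not the implementation used in our registration algorithm: the function `mutual_information`, the toy 8-voxel "images" and the choice of bits as the unit are assumptions made for illustration, and the deformation-energy term is omitted entirely.

```python
import math
from collections import Counter

def mutual_information(a, b):
    """Mutual information (in bits) between two equal-length label sequences."""
    if len(a) != len(b) or not a:
        raise ValueError("inputs must be non-empty and of equal length")
    n = len(a)
    joint = Counter(zip(a, b))   # joint histogram of paired intensities
    count_a = Counter(a)         # marginal histogram of the first image
    count_b = Counter(b)         # marginal histogram of the second image
    mi = 0.0
    for (x, y), c in joint.items():
        # p(x, y) / (p(x) p(y)) simplifies to c * n / (count_a[x] * count_b[y])
        mi += (c / n) * math.log2(c * n / (count_a[x] * count_b[y]))
    return mi

# Hypothetical 1D "images" with four tissue classes. The second scan has a
# different intensity mapping (contrast) but identical anatomy.
fixed = [0, 0, 1, 1, 2, 2, 3, 3]
moving_aligned = [5, 5, 7, 7, 9, 9, 11, 11]
moving_shifted = [11, 5, 5, 7, 7, 9, 9, 11]  # same scan, misaligned by one voxel

mi_aligned = mutual_information(fixed, moving_aligned)
mi_shifted = mutual_information(fixed, moving_shifted)
print(mi_aligned, mi_shifted)
```

In this toy case, the aligned pair attains a higher mutual information than the shifted pair even though the two scans do not share an intensity scale, which is the property that makes the measure attractive for comparing acquisitions. Because the full objective also weighs the deformation energy, a higher final mutual information alone does not establish a better registration.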

Finally, even though the TBM analysis of longitudinal data from normal subjects in this study indicated that SPGR gave the most robust results (when N3 correction was not used), this should not be interpreted to mean that SPGR is the “best sequence” for longitudinal studies. Selection of MRI pulse sequences is extremely dependent on the needs of the study, including the specific hypotheses, patients to be studied, equipment available, analysis techniques and other factors. Therefore, the major message of this report is that TBM is a useful quantitative tool when comparing different methods for studying longitudinal change of the brain. Our results provide statistical information on the baseline repeatability, reproducibility and variability of changes detected in different scan types at an interval short enough to be insensitive to any disease- or age-related structural brain change. The order of performance of different scan types was determined, providing researchers with relevant baseline information when deciding on a particular sequence/scanner or correction type. Our results will be used as a reference for our future serial scan studies of disease in individuals and groups, reducing the possibility of detecting false positive signals.