We apply the pipeline detailed above to three intrasubject and intersubject registration problems. We start with a challenging intrasubject registration experiment, after which intersubject registration is evaluated using two independent data sets. The first cross-subject evaluation is based on the work of Fischl *et al.* [42]. It has the advantage of containing detailed manual segmentation labels for both cortical and subcortical regions. The second data set was obtained from the publicly available IBSR database of the Center for Morphometric Analysis at the Massachusetts General Hospital (http://www.cma.mgh.harvard.edu/ibsr/). It contains manually labeled brain acquisitions where the available labels are mostly subcortical. In both sets of intersubject registration experiments, we use the segmentation labels to quantitatively characterize the performance of our algorithm.

In order to demonstrate the performance of our registration framework, we compare it to two standard algorithms from the literature: an affine registration algorithm, implemented in FLIRT [43], and the publicly available version of HAMMER [16]^{1}. In all of our experiments, we first arbitrarily select one of the input brains as a target and then register the rest of the data set to it. To quantify the alignment accuracy of each method, we compute the Jaccard Coefficient metric [45], which is often referred to as the Tanimoto Coefficient in the literature and is closely related to the Dice overlap measure. It describes the similarity or overlap between the labels of registered images (*V* and *T*) as

$$J(V, T) = \frac{|V \cap T|}{|V \cup T|}$$

Additionally, in order to assess overall accuracy, we also compute an extended version of this overlap measure. If *S* is a set of labels for which we want to compute the overlap, given two volumes *V* and *T*, this measure is defined as

where [*V* ∈ *S*] is by definition an abbreviated way of writing {*i* ∈ *I* : *V*(*i*) ∈ *S*}. In (24) it is implicitly assumed, by a slight abuse of notation, that *V*(*i*) represents the morphed subject labels in the target frame. (All the volumes are sampled in the same frame of reference and *I* is the set of voxel coordinates.) This generalized version of the overlap measure is similar to the generalized pair-wise multilabel Tanimoto Coefficient defined by Crum *et al.* for fuzzy labels [46].
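The two overlap measures described above can be illustrated with a minimal sketch. The single-label function follows the standard Jaccard definition. The multi-label function is an assumption about the extended measure, not the paper's equation (24): it counts the voxels where the morphed subject and the target agree on a label in *S*, divided by the voxels where either volume carries a label in *S*. The function names and the toy label values are hypothetical, and volumes are represented as flattened sequences of integer labels sampled on the same voxel grid.

```python
from typing import Collection, Sequence


def jaccard(v: Sequence[int], t: Sequence[int], label: int) -> float:
    """Single-label Jaccard overlap between two flattened label volumes."""
    if len(v) != len(t):
        raise ValueError("volumes must be sampled on the same voxel grid")
    # Voxels carrying the label in both volumes (intersection).
    both = sum(1 for a, b in zip(v, t) if a == label and b == label)
    # Voxels carrying the label in at least one volume (union).
    either = sum(1 for a, b in zip(v, t) if a == label or b == label)
    return both / either if either else float("nan")


def extended_jaccard(
    v: Sequence[int], t: Sequence[int], labels: Collection[int]
) -> float:
    """Multi-label overlap over a label set S (ASSUMED form, see lead-in)."""
    if len(v) != len(t):
        raise ValueError("volumes must be sampled on the same voxel grid")
    label_set = set(labels)
    # Voxels where both volumes agree on a label that belongs to S.
    agree = sum(1 for a, b in zip(v, t) if a == b and a in label_set)
    # Voxels where either volume has a label that belongs to S.
    either = sum(1 for a, b in zip(v, t) if a in label_set or b in label_set)
    return agree / either if either else float("nan")


# Toy 6-voxel volumes (hypothetical labels: 0 = background, 1 and 2 = structures).
morphed_subject = [0, 1, 1, 2, 2, 0]
target = [0, 1, 2, 2, 2, 1]
print(jaccard(morphed_subject, target, 1))
print(extended_jaccard(morphed_subject, target, {1, 2}))
```

Under this assumed form, a voxel labeled with two different members of *S* in the two volumes counts against the overlap, so the extended measure cannot be obtained by simply binarizing both volumes to "in *S*" and "not in *S*".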

A. Deformation Fields and Convergence

Before describing the specifics of our three different sets of registration experiments, we show an example deformation field and a graph of the residual distance (in the elastic phase) from one of our experiments. These figures provide a visual intuition of the deformation characteristics of CVS and are representative of the remainder of our results.

We start by showing, in the accompanying figure, an example deformation field obtained by forward morphing a surface in the midcoronal plane section of the target space. As expected, the surface-driven elastic displacement field smoothly diffuses from the cortical surfaces, and it is reasonably smooth in the ventricular areas. In contrast, the intensity-driven morph mostly deforms the ventricular areas, with almost no contribution in the cortical areas. For all of our warps we constrained the Jacobian at the mesh node locations to be positive definite, therefore prohibiting self-intersection. As the alignment of folding patterns in the spherical registration step requires a weakly regularized warp, it is somewhat sensitive to noise, and thus the 2-D slices displayed in the figure have regions with apparently degenerate morph components. We emphasize, however, that the warp is constrained to be diffeomorphic and these regions are not discontinuities.
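The role of the Jacobian constraint can be illustrated on a single tetrahedral element. The sketch below assumes a piecewise-linear warp on a tetrahedral mesh and reads the constraint as requiring a positive Jacobian determinant; both are assumptions made for illustration, and the function names are hypothetical. For a linear tetrahedron the deformation gradient is constant, so its determinant equals the ratio of the signed volumes after and before the warp, and a non-positive value signals a folded (self-intersecting) element.

```python
from typing import Sequence, Tuple

Point = Tuple[float, float, float]


def signed_volume(p0: Point, p1: Point, p2: Point, p3: Point) -> float:
    """Signed volume of a tetrahedron: det([p1-p0, p2-p0, p3-p0]) / 6."""
    a = [p1[i] - p0[i] for i in range(3)]
    b = [p2[i] - p0[i] for i in range(3)]
    c = [p3[i] - p0[i] for i in range(3)]
    det = (
        a[0] * (b[1] * c[2] - b[2] * c[1])
        - a[1] * (b[0] * c[2] - b[2] * c[0])
        + a[2] * (b[0] * c[1] - b[1] * c[0])
    )
    return det / 6.0


def jacobian_determinant(rest: Sequence[Point], warped: Sequence[Point]) -> float:
    """Determinant of the constant deformation gradient of one linear tetrahedron."""
    rest_volume = signed_volume(*rest)
    if rest_volume == 0.0:
        raise ValueError("degenerate rest element")
    return signed_volume(*warped) / rest_volume


def preserves_orientation(rest: Sequence[Point], warped: Sequence[Point]) -> bool:
    """True when the element is not folded by the warp (assumed reading of the constraint)."""
    return jacobian_determinant(rest, warped) > 0.0


# Hypothetical element: a unit tetrahedron, then one vertex pushed through the opposite face.
rest = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (0.0, 0.0, 1.0)]
folded = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (0.0, 0.0, -1.0)]
print(preserves_orientation(rest, rest), preserves_orientation(rest, folded))
```

A check of this kind only detects folding; how the constraint is actually enforced during the solve is not specified in this section.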

In the accompanying figure, we include five plots, corresponding to five random subjects, of the average Euclidean distance between the current position of the warped subject and the target positions (based on information provided by the surface registration), computed after each iteration of the linear solver. These traces empirically show that our numerical method converges. The number of iterations of the linear solver used in these cases is *N* = 18 (the number of total steps, as presented in Section III-E).
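The residual plotted in these convergence traces can be sketched as follows. The code assumes that the warped subject and the target are given as lists of corresponding 3-D points, with one list of warped positions per solver iteration; the function names and the toy coordinates are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, float, float]


def mean_euclidean_distance(current: Sequence[Point], target: Sequence[Point]) -> float:
    """Average Euclidean distance between corresponding points."""
    if len(current) != len(target) or not current:
        raise ValueError("need two non-empty point lists of equal length")
    return sum(math.dist(p, q) for p, q in zip(current, target)) / len(current)


def residual_trace(
    iterates: Sequence[Sequence[Point]], target: Sequence[Point]
) -> List[float]:
    """One residual value per solver iteration, in iteration order."""
    return [mean_euclidean_distance(positions, target) for positions in iterates]


# Hypothetical two-point example with two iterations approaching the target.
target_points = [(0.0, 3.0, 4.0), (1.0, 0.0, 0.0)]
iterates = [
    [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)],
    [(0.0, 3.0, 0.0), (1.0, 0.0, 0.0)],
]
print(residual_trace(iterates, target_points))
```

A trace that decreases and flattens over the iterations is the empirical convergence behavior described above.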

Finally, in the accompanying figure we display a plot of Euclidean distances between the target and subject surfaces, computed on the left and right hemispheres after the spherical registration, the elastic morph, and the full CVS registration pipeline. The distances computed for 35 subjects confirm that the elastic warp manages to keep corresponding surfaces at a sufficiently close distance (≈1 mm on average). Even though the intensity-based volumetric step somewhat reduces that accuracy (< 2 mm on average), as we show in the discussion below, the label overlap accuracy remains almost unaffected. That observation is important: in our registration framework we do not fix the solution resulting from the surface-based registration step, so the subsequent operations could in principle decrease the alignment accuracy already achieved in the cortex. As our results show, however, that does not happen.

B. Intrasubject Registration of Ex Vivo and In Vivo Scans

In the case of our intrasubject registration experiments, we use the first two stages of the CVS framework to align an *ex vivo* scan with the same hemisphere scanned *in vivo*. The imaging protocol for the *ex vivo* tissue is different from that of the *in vivo* scan. The former images are acquired using a multiecho FLASH protocol due to the reduced T1 contrast observed postmortem. This makes the preprocessing step required to obtain the surfaces for the *ex vivo* images more challenging; however, it does not affect the surfaces being used in the registration process.

We emphasize that registering these two modalities is a challenging problem given the differences between the intensities exhibited by the *ex vivo* and *in vivo* scans, the large-scale deformations induced in the *ex vivo* hemisphere during the autopsy process, and the fact that we are registering a single-hemisphere image to one of an entire head. We are currently implementing a cross-contrast intensity procedure that could complement the current intensity-based morph component of our CVS framework. However, as the results show, the elastic volume morph initialized by the surface-based registration already produces highly promising results.

In the accompanying figure we present the results of the surface-driven elastic morph applied to the *ex vivo* image so that it matches the *in vivo* one. The resulting correspondence is almost perfect, since the underlying anatomy is the same and the deformation is truly a biomechanical one. However, we remind the reader that the correspondence does contain some inaccuracies near the lateral ventricles. This is because none of the cortical surfaces used to initialize the deformation field are near this region.

C. Intersubject Registration Experiment 1

In our first set of intersubject registration experiments, we used a data set of 36 brains from the work of Fischl *et al.* [42]. All the input data were individually morphed to a randomly chosen individual considered to be the target. In order to evaluate the quality of the registration, we sampled the set of corresponding cortical segmentation labels into the volume and merged them with the subcortical ones. We used 20 cortical^{2} and 21 subcortical labels. The accompanying figure presents the mean extended overlap measures (with standard error as vertical black lines) for the labels divided into two sets: cortical and subcortical labels. Here, besides the three registration algorithms to be compared (FLIRT, HAMMER, and CVS), we also include results for the elastic registration without intensity-based volumetric alignment. For both cortical and subcortical labels, the affine method performs most poorly, followed by HAMMER. CVS clearly outperforms the elastic-only method in the case of subcortical labels and retains the same high level of accuracy for the cortical labels. We emphasize that outperforming HAMMER by such a margin with both our elastic and our combined registration frameworks, especially in the case of the subcortical segmentation labels, is significant, because HAMMER, a volumetric registration tool, should intuitively align subcortical structures more accurately than a method relying on a surface-based component. We note that although the relationship between the folds and subcortical structures is poorly understood, given the massive connectivity with cortex, their position is unlikely to be independent of cortex [47]. Therefore, we believe that the surface-based registration establishes a robust initialization for the elastic (and intensity-based registration) steps and thus helps avoid local optima. We leave the study of the relationship between the folds and subcortical structures to future work.

The poor accuracy of the affine alignment of FLIRT (which, we believe, would be true of any 12-parameter transform) is indicative of the difficulty encountered by nonlinear volumetric procedures initialized with affine transforms. Essentially, the affine initialization is poor over much of the cortex, so the nonlinear energy functionals do not start in the basin of attraction containing the true minima and hence do not converge to the correct solution. Initializing HAMMER in a more robust way, we believe, could remove most of its performance differences with our methods.

In Table I, we include the mean and standard error (indicated in parentheses) of the Jaccard Coefficient overlap metric computed for all the selected labels individually over all subjects. The labels are superiorfrontal, inferiorparietal, rostralmiddlefrontal, precentral, middletemporal, lateraloccipital, superiorparietal, superiortemporal, postcentral, inferiortemporal, fusiform, and precuneus. The three columns of the table correspond to alignment achieved with FLIRT (Affine), HAMMER, and CVS. Statistically significant (*p* < 0.05) differences between HAMMER and CVS are indicated by bold typesetting. Similarly, the mean and standard error overlap measures for the 21 subcortical labels are included in Table II. We point out that in the case of cortical labels, CVS performs statistically significantly better than HAMMER on all the selected labels, and in the case of subcortical labels, out of 12 cases where statistically significant differences are established between HAMMER and CVS, the latter is superior on nine occasions. The same information, the Jaccard overlap metric computed for individual subcortical and cortical manual segmentation labels, is displayed in the bar plots of the accompanying figure.
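The per-label summary reported in the tables (a mean with its standard error in parentheses) can be sketched as below. The sketch assumes the standard error is the sample standard deviation divided by the square root of the number of subjects. The statistical test behind the bold entries is not specified in this section, so it is not reproduced here. The function name and the overlap values are hypothetical.

```python
import math
import statistics
from typing import Sequence, Tuple


def mean_and_standard_error(values: Sequence[float]) -> Tuple[float, float]:
    """Mean and standard error of per-subject overlap values for one label."""
    if len(values) < 2:
        raise ValueError("need at least two subjects to estimate a standard error")
    mean = statistics.fmean(values)
    # ASSUMPTION: SE = sample standard deviation / sqrt(number of subjects).
    standard_error = statistics.stdev(values) / math.sqrt(len(values))
    return mean, standard_error


# Hypothetical Jaccard values of one label across three subjects.
per_subject_overlap = [0.6, 0.7, 0.8]
mean, se = mean_and_standard_error(per_subject_overlap)
print(f"{mean:.3f} ({se:.3f})")
```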

| **TABLE I** Mean Jaccard Coefficients and Standard Error Computed for 20 Cortical Labels Across Subjects in Intersubject Registration Experiment 1. We Include the Results of FLIRT (Affine), HAMMER, and CVS. Manual Segmentation Labels for the Left (L) and Right (R) ... |

| **TABLE II** Mean Jaccard Coefficient and Standard Error Computed for 21 Subcortical Labels Across Subjects in Intersubject Registration Experiment 1. We Include the Results of FLIRT (Affine), HAMMER, and CVS. Manual Segmentation Labels Are: Brain Stem (BrStem), Left ... |

We show registration results for these intersubject registration experiments in two figures. The first displays deformed intensity images of a subject and the target, and the second shows the corresponding deformed manual segmentation volumes (same slice as before). In both figures, the first column displays the target; the second represents the subject after affine registration (using FLIRT); the third displays the subject after the surface-driven elastic registration; and the fourth column shows the results of our combined surface-based and intensity-based registration. In all the images, the surfaces shown are those of the target (pial surfaces in red and gray/white surfaces in yellow). The intensity-based morph mostly affects areas away from the cortical ribbon, such as the ventricles.

D. Intersubject Registration Experiment 2

In our second set of intersubject registration experiments, we used MRI acquisitions from eleven subjects of the IBSR data set. This data set was collected by the Center for Morphometric Analysis, MGH, and is publicly available (http://www.cma.mgh.harvard.edu/ibsr/). It includes manual segmentation results for 19 principal gray and white matter structures: brain stem, left and right cerebral white matter, cerebral cortex, lateral ventricle, thalamus proper, caudate, putamen, pallidum, hippocampus, and amygdala. We note that this data set is not ideal for conveying the improved performance of our algorithm in both the cortical and subcortical domains, given that most of the available labels identify subcortical areas. Nonetheless, as a recently published registration method similar to our work [19] relied on these acquisitions for validation, we decided to include it as well for easier comparison.

The overlap measures in Table III are computed for each of the available segmentation labels. Again, we evaluated the mean Jaccard Coefficient for the three different alignment algorithms: affine alignment with FLIRT (Affine), HAMMER, and CVS. The standard error (SE) of the measurements is indicated in parentheses. The same information is presented in plot format in the accompanying figure. The plot on the top displays the extended Jaccard overlap metric, while the plots on the bottom display the overlap measures computed for each label individually in a bar- and a scatter-plot format, respectively. The latter helps the reader more easily assess the significance of individual measures.

| **TABLE III** Mean Jaccard Coefficient and Standard Error Computed Across Subjects for Experiment 2. We Include the Results of FLIRT (Affine), HAMMER, and CVS. Manual Segmentation Labels Are: Brain Stem (BrStem), Left and Right (L/R) Cerebral White Matter (CrWM), Cerebral ... |

As in the previous experiments, for most of the labels, our method outperforms the affine and HAMMER algorithms and also has a reduced standard error, indicating the stability of the registration. In six cases, the differences between the performance of HAMMER and CVS are also statistically significant (as indicated by the bold typesetting). In four of those, CVS outperforms HAMMER. Again, as the majority of the IBSR labels are subcortical, we cannot demonstrate the accuracy of the cortical alignment on this data set.

We show registration results for these intersubject registration experiments as well. The accompanying figure displays the segmentation volume of the target and the deformed segmentation volumes of a subject using the three different registration algorithms. The first column displays the target segmentation, and the second, third, and fourth columns represent the deformed segmentation volumes resulting from FLIRT (affine), HAMMER, and our new CVS registration. In all the images, the surfaces shown are those of the target (pial surfaces in red and gray/white surfaces in yellow).