Water molecules in biological tissue tend to diffuse faster along obstacle structures
than across them. Diffusion tensor imaging (DTI) noninvasively measures
this anisotropy (approximated with second-order tensors), providing information
about structures such as neural fibers. As more and more large group studies, such
as those sponsored by the Human Connectome Project [1], require nonlinear
normalization of diffusion tensor images, it is important to design deformable
registration methods that are not only theoretically rigorous but also
computationally efficient enough to process large data sets.

Registration of diffusion tensor images is more complicated than that of scalar
images, because displacing a voxel to a new place not only changes its own
diffusion tensor value, but also reorients those of its adjacent voxels [2].
As a result, the deformation force is no longer independent for adjacent voxels,
but has a sparse 3D grid structure.

Because reorientation significantly complicates the computation, methods
based on rotation-invariant features have been developed. Many of these methods
first extract scalar-valued or vector-valued features from tensors, for instance,
the fractional anisotropy (FA), and then register diffusion images by aligning these
features with multi-channel registration [3, 4].
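
To make the idea of a rotation-invariant feature concrete, the following minimal sketch computes FA from the eigenvalues of a single tensor using the standard definition, and checks that rotating the tensor leaves FA unchanged. The function names and the tensor values are illustrative assumptions, not taken from the cited methods.

```python
import numpy as np

def fractional_anisotropy(D):
    """FA of a symmetric 3x3 diffusion tensor, computed from its eigenvalues."""
    lam = np.linalg.eigvalsh(D)                # eigenvalues of a symmetric matrix
    norm = np.sqrt((lam ** 2).sum())
    if norm == 0.0:                            # zero tensor: define FA as 0
        return 0.0
    dev = np.sqrt(((lam - lam.mean()) ** 2).sum())
    return float(np.sqrt(1.5) * dev / norm)

def rotation_z(theta):
    """Rotation by angle theta (radians) about the z axis (hypothetical helper)."""
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[c, -s, 0.0],
                     [s,  c, 0.0],
                     [0.0, 0.0, 1.0]])

# Illustrative prolate tensor (assumed values, in mm^2/s)
D = np.diag([1.7e-3, 0.3e-3, 0.3e-3])
R = rotation_z(0.7)

fa_original = fractional_anisotropy(D)
fa_rotated = fractional_anisotropy(R @ D @ R.T)  # same eigenvalues, same FA
```

Because FA depends only on the eigenvalues, it discards the orientation information that tensor-based methods try to exploit, which is the trade-off these feature-based approaches accept.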

Other methods work directly on tensors, with a similarity metric defined on
tensors and reorientation incorporated, indirectly or directly, into the velocity
field optimization. Alexander and Gee [5] reoriented tensors according to the new
displacement field after each iteration, but did not directly account for the
reorientation in the optimization. Zhang *et al.* [6] applied piece-wise affine
registration to image subblocks, and then fused these transformations together by
smoothing. Since finite-strain (FS) reorientation [2] can be analytically
incorporated into an affine transformation, the
method efficiently estimates the optimal local affine parameters, but it is not
clear how the fusion step affects the total registration energy. Cao *et al.* [7]
extended the Large Deformation Diffeomorphic Metric Mapping framework to
analytically embed the preservation-of-principal-direction (PPD) reorientation [2]
into the deforming force. In 2009, Yeo *et al.* [8] embedded exact FS reorientation
into the diffeomorphic Demons algorithm [9, 10], showing that exact reorientation
considerably improved registration accuracy.
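
As an illustration of what FS reorientation computes, the sketch below extracts the rotation component of a local Jacobian by polar decomposition and applies it to a tensor. It assumes the Jacobian is written as F = I + ∇u with a positive determinant, and that the rotation is R = (F Fᵀ)^(-1/2) F, which equals U Vᵀ when F = U S Vᵀ is the singular value decomposition. The function names are hypothetical, and whether R or its transpose is applied depends on the warping convention, so this is not the implementation of any cited method.

```python
import numpy as np

def finite_strain_rotation(F):
    """Rotation part of the Jacobian F, i.e. (F F^T)^(-1/2) F, obtained via SVD.

    Assumes det(F) > 0, as expected for a diffeomorphic deformation.
    """
    U, _, Vt = np.linalg.svd(F)
    return U @ Vt

def reorient_tensor(D, F):
    """Rotate tensor D by the finite-strain rotation of F: D' = R D R^T."""
    R = finite_strain_rotation(F)
    return R @ D @ R.T

# Illustrative local Jacobian: identity plus a small displacement gradient (assumed)
grad_u = np.array([[0.00, 0.20, 0.00],
                   [-0.10, 0.05, 0.00],
                   [0.00, 0.00, 0.10]])
F = np.eye(3) + grad_u

D = np.diag([1.7e-3, 0.3e-3, 0.3e-3])   # illustrative tensor, mm^2/s
D_reoriented = reorient_tensor(D, F)     # same eigenvalues, rotated eigenvectors
```

Because R depends on the spatial gradient of the displacement field, changing the displacement at one voxel changes the rotation applied at its neighbors, which is the source of the coupling between adjacent voxels.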

Yeo *et al.*’s work [8] was based on the Demons algorithm, a fast algorithm with
*O*(*n*) complexity for scalar images, where *n* is the number of voxels. However,
because a large sparse linear system was solved at every iteration, the algorithm
became considerably slower. This slowdown is especially costly for an algorithm
whose scalar-image version enjoys linear complexity. For example, for our DTI
images of size 128×128×128, the method took about 7 hours and more than 20
gigabytes (GB) of memory on a desktop with an Intel Xeon 2.80 GHz CPU. In addition,
the regularization of the displacement field was performed as a separate Gaussian
smoothing step after the displacement field was updated, which may not always be
consistent with the diffeomorphic framework [10].
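
The linear cost of the scalar-image case can be seen in a minimal sketch of one Demons-style update: the force at each voxel depends only on that voxel's intensity difference and gradient, and the regularization is a separable Gaussian filter. The sketch assumes one common Thirion-style form of the force (sign conventions differ between papers) and a simple additive update rather than the diffeomorphic exponential update. All function names are hypothetical, and this is not the implementation used in [8], [9], or [10].

```python
import numpy as np

def gaussian_kernel_1d(sigma):
    """Normalized 1D Gaussian kernel truncated at about 3 sigma."""
    radius = int(3 * sigma + 0.5)
    x = np.arange(-radius, radius + 1)
    k = np.exp(-0.5 * (x / sigma) ** 2)
    return k / k.sum()

def smooth_separable(vol, sigma):
    """Gaussian smoothing as one 1D pass per axis (zero-padded borders).

    For a fixed sigma the cost is O(n). Each axis of `vol` is assumed to be
    at least as long as the kernel.
    """
    k = gaussian_kernel_1d(sigma)
    out = vol.astype(float)
    for axis in range(vol.ndim):
        out = np.apply_along_axis(
            lambda v: np.convolve(v, k, mode="same"), axis, out)
    return out

def demons_update(fixed, moving_warped, sigma=1.0, eps=1e-12):
    """One scalar Demons-style update field of shape (3, X, Y, Z)."""
    diff = moving_warped - fixed
    grad = np.stack(np.gradient(fixed), axis=0)   # gradient of the fixed image
    denom = (grad ** 2).sum(axis=0) + diff ** 2
    # Per-voxel force: no coupling between voxels, so this step is O(n)
    update = -diff * grad / (denom + eps)
    # Regularization as a separate smoothing step, one component at a time
    return np.stack([smooth_separable(c, sigma) for c in update], axis=0)

# Illustrative use on small random volumes (assumed data)
rng = np.random.default_rng(0)
fixed = rng.random((8, 8, 8))
moving = rng.random((8, 8, 8))
u = demons_update(fixed, moving)
```

With tensor reorientation, the per-voxel force in `demons_update` would no longer be independent across voxels, which is what turns the update into a large sparse linear system.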

This raises the question: “Can the Demons algorithm still enjoy its *O*(*n*)
complexity when the deformation force is coupled between adjacent voxels and
regularization is incorporated?” Though this question arises from diffusion tensor
images, it is generally applicable to
other diffusion images, such as high-angular-resolution diffusion images (HARDI) and
diffusion spectrum images (DSI). We are particularly interested in the Demons
algorithm, because its original linear complexity in [9, 10] is well suited for
processing large group data sets.

In this paper, we extend the Demons algorithm to incorporate both exact
reorientation and regularization into the velocity calculation, without directly
solving a large non-separable linear system. This method restores the
*O*(*n*) computational efficiency of the original
Demons algorithm to diffusion images without sacrificing registration quality,
in comparison with solving a large linear system at each iteration. In our
experiments, it reduced the computation time 10-fold and achieved
state-of-the-art registration performance.