Our segmentation method combines the global information of location and direction contained in a probabilistic atlas of known white matter tracts with the local diffusion information obtained from the image within a MRF model (

Li, 2009). In a MRF, the conditional probability distribution for the labels

*l* at

*x* is entirely determined by the conditional probability distribution in their neighborhood. Due to the Hammersley-Clifford theorem, the joint distribution follows a Gibbs distribution:

for a MRF including a unary term

*V*_{1}(

*x, l*) and a pairwise term

*V*_{2}(

*x, y, l, m*), with

*y* and

*m* the location and label of a voxel in the neighborhood of

*x, l, T* a scaling parameter and

*Z* a normalization factor. The terms

*V*_{1} and

*V*_{2} are often referred to as energy terms, and we will also use

*U*(

*x, l*) =

*V*_{1}(

*x, l*) + ∑

_{y,m} *V*_{2}(

*x, y, l, m*) to denote the overall energy at voxel

*x*. In our model, the unary term

*V*_{1}(

*x, l*) combines the tensor direction and diffusion characteristics of the data with the tract atlas to encode the likelihood of each label

*l* at

*x*, and the pairwise term

*V*_{2}(

*x, y, l, m*) encodes the local connections between neighboring diffusion tensors.

As it is well known that there is extensive overlap and crossing between multiple tracts at the current voxel resolution in diffusion MRI, it would be unrealistic to expect a given voxel to contain a single tract everywhere, and so our set of labels {*l*} must not only include labels for each separate tract but also overlapping tracts within a single voxel. To avoid combinatorial explosion, we limit ourselves to a maximum of two tracts per voxel and exclude unlikely pairs (see Section 2.4). The approach can in principle be extended to multiple overlaps, provided that the underlying diffusion information is sufficient to differentiate multiple crossings from isotropic regions.

To obtain a tract segmentation, we need to first model *V*_{1} and *V*_{2} as a function of the diffusion data for the possible labels and then to maximize *P*({*l*}) with respect to the label field {*l*}. The resulting segmented tracts are represented by membership functions (probabilistic assignments at each voxel) and can be analyzed in a variety of ways including computation of tract volume and determining average diffusion properties (e.g., FA, MD).

Our segmentation algorithm, referred to as Diffusion-Oriented Tract Segmentation (DOTS), is organized as follows. First, the diffusion weighted images are combined and pre-processed to obtain tensor images (Section 2.1). DOTS starts by registering our probabilistic atlas (described in Section 2.2) to the DTI data, and then using the MRF model to estimate the posterior probability of the voxel label given the data (Section 2.3). The model incorporates information about the local anisotropy (Section 2.3.1), the connectivity between neighboring voxels (Section 2.3.2) and the statistics of the atlas (Section 2.3.3). Section 2.3.4 brings all the different components of the MRF into the probability function *P*({*l*}) to be maximized by the algorithm. Additional optional modeling is provided to exclude anatomically irrelevant crossing configurations (Section 2.4) and to handle white matter lesions (Section 2.5). Additional implementation details are given in Section 2.6.

2.1. Diffusion-weighted image pre-processing

To process a series of diffusion-weighted images, we first perform a tensor reconstruction to get a set of diffusion eigenvectors and associated eigenvalues {

_{n}(

*x*), λ

_{n}(

*x*)}

_{1, N} at each voxel

*x* with the standard linear reconstruction method (

Basser and Jones, 2002). The images are first aligned to the b0 image and corrected for distortions against a structural image within the CATNAP software (

Landman et al., 2007). Extra-cranial tissues are removed with an automatic skull-stripping method (

Carass et al., 2007) applied to a co-registered structural image or using a semi-automatic method (

Bazin et al., 2007) applied to the mean diffusivity image if a structural image is not available.

2.2. White Matter Tract Atlas

Our statistical atlas is based on the atlas of Mori and Wakana (

Mori et al., 2005). Following the atlas tract definitions and delineation protocols, we obtained a set of semi-manual delineations of major tracts, which we augmented with additional expert delineations of a few other tracts of interest. All tracts were delineated using tractography and multiple regions of interest (ROI) selection in DTI Studio.

^{1} Our standard atlas currently includes the following fiber tracts (separated for left and right when applicable): anterior, superior, and posterior thalamic radiations, corpus callosum and tapetum, inferior and superior longitudinal and fronto-occipital fasciculi, cingulum, fornix, uncinate fasciculus, optic tract, optic radiation, cortico-spinal and cortico-pontine tracts, medial lemniscus, inferior, middle, and superior cerebellar peduncles (see ).

Our atlas includes both the spatial probability

*p*_{l}(

*x*) of the existence of label

*l* at voxel

*x* as well as the most probable diffusion direction

_{l}(

*x*) along the tract. Deterministic tractography usually provides very conservative delineations, including only the portion of the tracts that can be reconstructed through multiple ROIs. For this reason, we extended the definition of the tracts in the vicinity of the delineated regions with a smoothing function:

where

is the delineated tract region,

the smoothed delineation,

*k* a smoothing kernel,

the principal direction at voxel

*x* in image

*t* for tract

*l*, and

*N* (

*x*) is the neighborhood defined by the smoothing kernel. Note that the directions are meaningful at first only inside the delineated regions

, and must be extrapolated from higher probability to lower probability regions. The directions are orientation-independent (i.e.

and −

are the same direction), so the sums of directions are performed as follows:

where

*d⃗.gif" border="0" alt="d" title=""/>* is the oriented vector corresponding to direction

. The averaged direction in

Eq. (2) may not have unit norm, unless all the directions are the same. This provides information about the uncertainty in direction at each location in the atlas. In this work we use a linear smoothing kernel of radius 5 mm to construct our atlas. The size of the kernel has some impact on the segmentation, as a larger scale is better at representing tracts with large individual variations but also increases the risk of mislabeling regions with too many overlapping candidates. Our experiments however indicate that a small but reasonable smoothing (5 mm corresponds to about two voxels at the atlas’s original resolution) improves the results.

Examples from the atlas are given in . Although this atlas includes most of the major WM tracts in the human brain, the segmentation also includes isotropic regions like the gray matter structures or the ventricles, and some undefined regions of white matter, for instance the “U” fibers that connect neighboring gyri of the cortex. We model the isotropic regions by creating a low anisotropy mask (FA ≤ 0.1) and we capture the undefined white matter regions by subtracting from the opposite mask (FA > 0.1) any region included in one of the tracts defined above.

2.3. Markov Random Modeling of Diffusion

Given the DTI data and the statistical atlas above, we derive below the terms of the MRF model of

Eq. (12): unary terms representing local diffusion properties of the tensor and the contributions of the tract atlas in terms of shape and direction, and a binary term representing connectivity between neighboring tensors.

Our set of possible labels includes the individual tracts listed in , undefined fiber tracts and isotropic regions, as well as overlapping pairs of tracts. In the following, we describe general properties that apply to any individual tract, group of overlapping tracts or isotropic region independently of the specific label. Let us denote by *T, O*, and *I* general attributes of the single tract, overlapping tracts and isotropic regions respectively.

2.3.1. Diffusion Type At every voxel

*x*, we can have one of three types of structures in this model: isotropic diffusion, diffusion along a single tract, or diffusion along multiple tracts overlapping or crossing. To model the type of diffusion, we use indices of diffusion along a single tract (

*T*), overlapping tracts (

*O*), and isotropic regions (

*I*) derived from the linear, planar and spherical indices of (

Westin et al., 2002):

Note that water may still diffuse along a single direction for overlapping but aligned tracts, so we cannot differentiate between the two in regions of linear diffusion and

*d*_{O} ≥

*d*_{T} (see ).

2.3.2. Local tensor connectivity The evidence for fiber tracts comes from the diffusion tensors: if two tensors are aligned they likely correspond to the same tract or overlapping tracts. The connectivity between neighboring tensors is modeled as follows, assuming a single tract:

where

_{xy} represents the direction vector between voxels

*x* and

*y*,

_{1}(

*x*) and

_{1}(

*y*) represent the principal eigenvector directions at

*x* and

*y*, and

arccos(|

_{1} ·

_{2}|). The function θ(

_{1},

_{2}) gives the angle between directions

_{1} and

_{2} normalized in [0, 1]. Because it is linear, it is more sensitive to small angular variations than the product

_{1} ·

_{2}. In some of the following cases, the vectors in this formula have less than unit norm as a way to encode uncertainty in their direction as a lower bound on angles. In such cases, the function θ(

_{1},

_{2}) returns values in [α, 1], where

arccos(|υ

_{1}| |υ

_{2}|). Note that other measures involving the complete tensor are possible and they may be interesting to explore in the future.

The similarity function *s*_{T}(*x, y*) is close to +1 when the diffusion directions at *x* and *y* are aligned with each other and with the path from *x* to *y*. If both diffusion directions are orthogonal to that path, we cannot assume that they are related even if they are aligned: many fiber tracts have “kissing” fibers that follow the same direction before diverging. In such cases, *s*_{T}(*x, y*) goes to zero in order to model the uncertainty. When the diffusion directions are orthogonal and one of them is aligned with the path, then it is clear that both points cannot be part of the same tract, which translates into a negative value up to −1 (see ).

For crossing or overlapping fibers, the main diffusion direction can easily switch from

_{1} to

_{2} if the tensor is “flat”, and so we extend the connectivity to consider the secondary directions:

The first part of this equation means that we select the pair of directions at

*x* and

*y* that are best aligned among the first two eigenvectors {

_{1}(

*x*),

_{2}(

*x*),

_{1}(

*y*),

_{2}(

*y*)}.

We penalize the

_{2} by

to account for the uncertainty due to the fact it is required to be orthogonal to

_{1}. Note also that these directions are not the true fiber directions, which cannot be retrieved with the tensor representation. However, the erroneous tensors will be similar and share directions as long as the two fibers are crossing along similar directions and the diffusion is slightly stronger in one direction. In practice, many regions of overlap between the tracts that we can observe at clinical resolution involve primarily kissing fibers as opposed to crossing ones, for which the principal direction of diffusion as estimated from a tensor reconstruction is still meaningful.

Finally, isotropic regions are assumed to have uniform connectivity *s*_{I}(*x, y*) = *s*_{I}, given as a prior parameter. The value of *s*_{I} will influence how labels from isotropic regions propagate into anisotropic ones as compared to fibers and overlapping regions. Because all the regions evolve simultaneously, we set *s*_{I} = 1/*N*_{t} in our experiments, where *N*_{t} is the total number of regions in the atlas.

2.3.3. Atlas information The atlas provides a prior on tract location

*p*_{l} and tract direction

_{l}, which is integrated in the MRF model through a shape prior term

*u*_{l} and a direction coefficient

*c*_{l} as follows.

Because we allow the presence of multiple tracts at a location, the prior probability for a particular label defined by the atlas cannot be used directly. Instead, we define the shape prior term

*u*_{l}(

*x*) for a single tract

*l* as follows:

giving high values where the prior probability

*p*_{l}(

*x*) is high and likely to be a single tract (∑

_{m} *p*_{m}(

*x*) ≈

*p*_{l}(

*x*), for all tract labels

*m*).

To compute

*c*_{l}, we compare the direction prior

*d⃗.gif" border="0" alt="d" title=""/>*_{l}(

*x*) with the principal direction

_{1}(

*x*) of the tensor in the image to segment:

The coefficient

*c*_{l} is positive when the directions coincide, and negative when they are orthogonal. The uncertainty on the direction prior, represented by its norm, lowers both positive and negative values of

*c*_{l}. For two crossing tracts

*l* and

*m*, the shape prior term combines the priors of both objects with the likelihood for two overlapping tracts:

If the tracts have different orientations, the reconstructed tensor will represent a mixture pointing to a different direction than the individual diffusion tensors for each tract. However, if we assume the true tensors with principal directions vectors

*d⃗.gif" border="0" alt="d" title=""/>*_{1,l}(

*x*) and

*d⃗.gif" border="0" alt="d" title=""/>*_{1,m}(

*x*) to be prolate tensors with high anisotropy, we can derive from the Stejskal-Tanner equation that the estimated tensor for the mixture will have for its principal direction the vector

*d⃗.gif" border="0" alt="d" title=""/>*_{l,m}(

*x*) =

*d⃗.gif" border="0" alt="d" title=""/>*_{1,l}(

*x*)±

*d⃗.gif" border="0" alt="d" title=""/>*_{1,m}(

*x*) with largest norm.

The composite tensor direction

*d⃗.gif" border="0" alt="d" title=""/>*_{l,m}(

*x*) is then renormalized so that its norm is

. From this composite vector, we can define the following direction coefficient as we did before in the single tract case:

Finally, isotropic regions use the same shape prior as individual tracts and assume no preferred direction, setting

.

2.3.4. Propagation of the tract probabilities Once we have the various elements of diffusion and atlas information above, we can derive the terms of the energy function. From the diffusion and prior-based terms, we derive the following unary term:

for single tracts, overlapping tracts and isotropic regions respectively. We combine all the terms as a product since we require all three conditions to be met jointly in order to attribute a label

*l* to a given voxel (unlike with more classical probabilistic models, we cannot separate the terms of

*V*_{1} into a sum of conditionally independent elements).

The neighborhood term propagates the unary energy values along the most likely fiber directions or isotropically depending on the type of label. Let

*x*^{+} = argmax

_{yN+(x)}*s*_{T}(

*x, y*), where

*N*^{+}(

*x*) is the half of the 26-neighborhood

*N*(

*x*) of

*x* such that

_{xy} ·

_{1}(

*x*) > 0, and similarly

*x*^{−} = argmax

_{yN−(x)}*s*_{T}(

*x, y*), with

*N*^{−}(

*x*) such that

_{xy} ·

_{1}(

*x*) ≤ 0. The same definition holds for overlapping tracts, substituting

*s*_{O}(

*x, y*) to

*s*_{T}(

*x, y*). The binary term

*V*_{2}(

*x, y, l, m*) is only non-zero if

*y* =

*x*^{+} or

*y* =

*x*^{−} and

*l, m* include one same label, leading to the following formula for the energy term

*U*(

*x, l*) =

*V*_{1}(

*x, l*) + ∑

_{y,m} *V*_{2}(

*x, y, l, m*) at

*x*:

These three equations list all the possible cases: single tract labels, pairs of overlapping labels, and isotropic regions. Because regions of overlap interact with single tract regions, the energy can propagate between different labels as long as they share a common tract.

With this model, only the neighboring voxels most likely to be along the same tract are considered for single or overlapping tracts, whereas the contribution of all neighbors is averaged for isotropic regions. By using such a subset of the neighborhood, we greatly simplify the structure of the MRF and improve computational stability, as the field is locally oriented along the tracts, removing many loops in the neighborhood graph. The probability function can be efficiently maximized through an iterated conditional modes algorithm which converges quickly in practice and requires little computational overhead (

Besag, 1986). This simplified neighborhood structure enable us to estimate the MRF efficiently without the help of a more elaborate MRF solver (

Bazin et al., 2009a;

Kolmogorov and Zabih, 2004).

2.4. Anatomical constraints

Even with a limit of two overlapping tracts per voxel, the number of possible pairs grows quickly when including more white matter tracts. However, it is well known from anatomy that certain pairs of tracts will not overlap, even in the presence of deforming pathology or trauma. For instance, it is clear that the superior longitudinal fasciculus (SLF) should not interact with the uncinate tract (UNC), or the cortico-spinal tract (CST) with the frontal forceps (CCF). From this observation, we restrict the possible label pairs to a list set a priori from anatomical information. The list may depend on image resolution, as more tracts will overlap in a voxel of coarser resolution.

To obtain such information from atlases and anatomy experts is challenging when the number of tracts to be estimated grows. We define an overlap probability for each pair of tracts in the atlas as follows:

where

*p*_{m}(

*x*),

*p*_{l}(

*x*) are the spatial probabilities for tracts

*m, l*. We then restrict the possible label pairs to those such that

*P*_{O} > 1/2.

Similarly, we make the assumption that our blurred atlas priors should fully include the corresponding tracts, thus estimating only probabilities for tracts with non-zero prior at each voxel. These two modeling constraints greatly reduce the combinatorial increase and computational burden when additional tracts and potential crossings are added to the atlas, which allowed the method to scale well from an initial atlas of 10 tracts to our current atlas of 39 tracts.

2.5. Handling WM pathology

Despite the development of many diffusion MRI processing algorithms in recent years, studies of the WM structures have been difficult when pathology is present. In diseases like multiple sclerosis, Alzheimer’s disease, or even in normal aging, lesions appear inside the WM, changing the diffusion properties of the tissue (

Reich et al., 2010;

Wheeler-Kingshott and Cercignani, 2009).

Our approach is already robust to most lesions as it combines information coming from multiple directions around the area of lesion, however lesions with a sharp decrease in anisotropy will likely be segmented as isotropic regions rather than tracts. If we have an estimate of the lesion location from co-registered structural MRI (using (

Shiee et al., 2010) in our case), we can use that information as an additional prior: to compensate for the low anisotropy inside lesions, we simply update the indices in all lesion voxels

*i* by:

This update enforces that DOTS segments lesions as part of at least one tract, provided that the local diffusion direction matches the atlas. Note that DOTS includes a tract label for unidentified WM regions and allows isotropic regions to spread to their neighborhood, so the lesions are not forced to merge with a nearby tract if their tensor directions are not compatible.

2.6. Algorithmic Details

The complete segmentation process is performed as follows. First we register the shape and direction atlas to the skull-stripped tensor image to be segmented with a multi-scale gradient descent method that maximizes *E*_{R} = ∑_{x} ∑_{l} ‖*a*(*x*)*p*_{l}(*T*(*x*))‖^{2}, where *a*(*x*) is the fractional anisotropy and *T* a rigid transform. The direction atlas is rotated accordingly. Next, the estimation algorithm is initialized with *U*(*x, l*) = *V*_{1}(*x, l*), and then refined using iterated conditional modes and registration until convergence. Convergence is measured by the proportion of changed labels for each iteration.

To reduce computational overhead, we only record the energy function for the *N*_{B} labels with highest energy at a given voxel, and approximate the others to zero. In our experiments, no significant differences could be observed in the results for *N*_{B} ≥ 8. Also note that the connectivities *s*_{T}(*x, x*^{+}), *s*_{T}(*x, x*^{−}), *s*_{O}(*x, x*^{+}), *s*_{O}(*x, x*^{−}) are functions of the data alone, and can be precomputed to improve computation speed. A 181×217×181 voxel image (1mm cubic resolution) is processed with this method in less than 30 minutes, and a more typical 256×256×60 voxel image takes about 10 minutes to converge on a modern workstation with 8GB of available memory.

Once the algorithm has converged, we obtain a hard segmentation by selecting the labeling of highest energy, and attributing the voxel to the underlying tract or tracts in case of overlap; see . The corresponding membership function is given by:

where

*g*_{0} is a parameter controlling the sharpness of the membership.

The DOTS algorithm has been implemented as a plug-in for the MIPAV software package (

McAuliffe et al., 2001) and the JIST pipeline environment (

Lucas et al., 2010). The software is freely available on the NITRC repository

^{2}.