When errors are positively correlated, simply combining atlases by assigning higher weights to better registered atlases becomes inefficient. To address this problem, we propose to estimate the posterior label probability through regression.

Given the image appearance of a target image at a local patch

(

*x*), our goal is to estimate the label posterior distribution at

*x*, i.e.

*p*(

*l*|

*T*_{F}(

(

*x*))). This is a high dimensional function of local appearance. Each registered atlas provides one observed value for this function

at one data point

. One common way to estimate a function’s values at unobserved data points is interpolation or regression analysis. Note that the image similarity-based weighting methods can be interpreted as applying the nearest neighbor interpolation [

19]. However, as we show above, nearest neighbor interpolation does not properly handle correlated errors.

Since we work on high dimensional regression problems, over-fitting is a key issue to be addressed. Hence, we study low order polynomial regression for label fusion in this paper. To this end, we model the label posterior probability as a second order polynomial function of image intensities from the local patch:

and

are first order and second order weights, respectively. Since the regression model is anatomy dependent, the regression weights may be different for different locations. For a simple notation, we rewrite the equation:

and

*T* = [

*T*_{F}(

*y*_{1}); …;

*T*_{F}(

*y*_{m});

*T*_{F}(

*y*_{1})

^{2}; …;

*T*_{F}(

*y*_{m})

^{2}], where

*y*_{i}
(

*x*) for

*i* = 1, …

*m* and

*m* is the size of the local image patch

(

*x*).

*t* stands for matrix transpose.

Using the observations obtained from the registered atlases, one can estimate

*β* by solving the following linear equations:

*A* = [

*A*_{F}; (

*A*_{F})

^{2}] and

is the matrix of the local appearances from the registered atlases.

is a vector of size

*m* × 1 that stores the local image intensity patch from

.

*P*_{l} = [

*p*(

*l*|

*A*^{1},

*x*); …;

*p*(

*l*|

*A*^{n},

*x*)] is the vector of the corresponding label posterior probabilities from all atlases. When enough atlases are used, we can solve

*β* via linear least square fitting:

Note that the matrix *AA*^{t} has dimension 2*m* × 2*m*. In practice, the number of atlases used are usually limited. For instance, for small image patches with dimension 3 × 3× 3 or 5 × 5 ×5, the patch size is usually greater than the number of atlases used for label fusion. In such cases, the matrix does not have a full rank and the matrix inverse is not well defined. To address this problem, we propose the following approximation to implement the regression analysis.

(

14) is obtained by substituting (

10). When

*n* = 2

*m*,

*A*(

*A*^{t}A)

^{−1}*A*^{t} is an identity matrix. (

14) gives the accurate solution. When

*n* < 2

*m*, (

14) approximates the label posterior probability via least square fitting. With this approximation, we transfer the problem of solving inverse of a large matrix [

*AA*^{t}]

_{2}_{m}_{×2}_{m} into solving inverse of a smaller matrix [

*A*^{t}A]

_{n}_{×}_{n}. If this smaller matrix is still near singular, one can add an identity matrix weighted by a small positive number

*λ* to it. Adding a weighted identity matrix can be interpreted as penalizing large weights that exploit correlations beyond some level of precision in the data sampling process [

20].

In (

14), each registered atlas has one single weight computed by

*w*_{n}_{×1} = (

*A*^{t}A)

^{−1}*A*^{t}T, i.e.:

Hence, as previous label fusion techniques, our regression-based method applies weighted-voting as well. One main difference is that the weights computed by our method can be either positive or negative, while the weights used by other similarity-based weighting approaches are nonnegative. When the errors produced by different atlases are positively correlated, applying negative weights to some of the atlases allows to cancel out the positively correlated errors between these negatively weighted atlases and other positively weighted atlases, which results in smaller combined error (see

equation (7)). Furthermore, most similarity based weighting methods rely on some pre-selected weighting models, e.g. (

3). The optimal model parameter, e.g. the standard deviation in the Gaussian model, is usually determined empirically. By contrast, our method automatically determines the optimal weights.

To see that our approximation applies linear least square fitting, note that

*w* is the linear least square fitting solution of the following linear equations when

*n* < 2

*m*:

We compute weights such that both the original target image and its squared image can be linearly interpolated by the warped atlas reference images and their squared images, respectively. In other words, the weighted local appearance of the warped atlases center around the local appearance of the target image. Since atlas-based segmentation relies on the assumption that segmentations are strongly correlated with appearances, putting the local target image at the center of the weighted warped local atlas images can be interpreted as an effort of putting the local segmentation of the target image at the center of the weighted warped local atlas segmentations, therefore the weighted segmentation errors are uncorrelated.

4.1. Error analysis

Since we attempt to model the relationship between image appearance and segmentation labels, the errors produced by our method can be categorized into two classes: 1) the appearance-label model selection error, i.e., the errors caused by employing the polynomial function to model the appearance-label relationship and 2) the model fitting error, i.e., the target image can not be perfectly represented as a linear combination of the atlas reference images.

The model fitting errors usually can be alleviated by employing enough atlases for label fusion. The actual number of atlases required for accurate model fitting depends on the complexity of the appearance-label model. Typically, more complex appearance-label models require more atlases. Hence, when limited atlases are used, to avoid significant model fitting errors one should use simple appearance-label models, like ours. However, segmentation labels usually correlate to image appearances in a more ambiguous fashion. Due to the factors such as the limitations in the imaging process and employing local image appearance for label fusion, most local appearance patterns may have more than one plausible segmentation and vice versa. Under these circumstances, the performance of our method also depends on how well the polynomial model captures the relationship between appearances and segmentation labels.

4.2. A toy example

demonstrates applying our regression approach to a toy example in 2D. In this binary segmentation problem, we have five atlases. After registration, suppose that the foreground label patterns of the target image and the warped atlases on a local patch with size 3 × 3 have the structures shown in . For simplicity, we consider deterministic atlases. The label posterior produced by each atlas is either 1 or 0. The registration quality is assumed to be constant within the patch. Furthermore, suppose that the image appearances have the same patterns as the foreground labels such that image appearances indeed linearly correlate to segmentation labels.

The pairwise correlations of label posterior errors between the atlases are all positive, indicating a consistent bias in the atlases. Also note that *A*^{1} and *A*^{4} have the largest combined inter-atlas error correlations, indicating that they contain the most common segmentation errors produced by all atlases. As a result, the segmentation obtained from majority voting has a structure more similar to *A*^{1} and *A*^{4}, with five mislabeled pixels. Due to the strong error correlation, the optimal image similarity based weighting label fusion approach reduces to the single-atlas based segmentation, i.e. applying zero weights to all atlases except the most similar atlas *A*^{3}. Applying our regression technique, only atlas *A*^{4} receives a negative weight. This allows to cancel some of the consistent errors among all the atlases. Correspondingly, our method gives the best solution on this local patch with two mislabeled pixels. Since the polynomial model accurately captures the appearance-label relationship for this example, the errors are purely due to the model fitting error.

4.3. Improving the robustness of model fitting

Since we face a high dimensional regression problem, the computed weights may be sensitive to outliers/noises caused by registration errors. In this section, we propose two techniques to reduce the model fitting error.

4.3.1 Similarity-rankig based regularization Note that our regression approach represents the target image as a linear combination of the registered atlas images. It is reasonable to expect that the atlases whose reference images are more similar to the target image will contribute differently from those whose reference images are less similar to the target image. Such similarity-ranking based weighting strategy has been explored in the classifier combining literature, [

10,

16]. To alleviate the over-fitting problem, we propose to accommodate this prior knowledge to regularize our regression weights. To this end, we rewrite

equation (16) as follows:

where

*π*(1), …,

*π*(

*n*) is a permutation of 1, …,

*n*.

is the appearance vector of the

*i*_{th} most similar atlas to the target image

*T* = [

*T*_{F}(

(

*x*));

*T*_{F}(

(

*x*))

^{2}], measured by SSD. Now, the regression weights assigned to atlases are associated with their similarity rankings.

To estimate these similarity-ranking based weights, again we use the atlases. Since all atlases are registered to the same target image, these registrations also establish the pairwise correspondence between the atlases via the target image. Hence, the image patches

*T*_{F}(

(

*x*)) and

for

*i* = 1, …,

*n* all represent the same anatomy structure. Following the logic that one can estimate the label posterior probability for the target image using the atlases, given the label posterior probability of the target image, we should be able to regress the label posterior probability for any atlas using the target image and the remaining atlases as well. For this purpose, we still need represent the warped atlas reference image as a linear combination of the target image and the remaining warped atlas reference images. For a simpler notation, let the appearance vectors of the target image and the warped atlases be represented by

*F*_{1}, …,

*F*_{n}_{+1}. Similar to

equation (17), any image

*F*_{j} can be represented as a similarity-ranking based linear combination of the remaining images as follows:

where

*F*_{πj(i)} is the

*i*_{th} most similar image to

*F*_{j} from the remaining images and

*π*^{j}(1), …,

*π*^{j}(

*n*) is a permutation of 1, …,

*j* − 1,

*j* + 1, …,

*n* + 1.

Under this formulation, each image provides a set of linear constraints for solving the weights *w*. Simultaneously solving all these linear equations produces the desired weights. The main advantage of employing similarity-ranking based weights is that it significantly increases the training data through a cross validation process, therefore the results are less sensitive to noises.

4.3.2 Reducing extrapolation via local searching When the warped atlas images are scattered around the target image, the target image is located within the reference range defined by the atlases. The posterior label distribution can be reliably estimated via interpolating label distributions obtained from the atlases. When the warped atlases are strongly biased, most warped atlases deviate from the target segmentation in a consistent way and the target image may be completely outside the reference range. Our method is more akin to extrapolation. Extrapolation is based on a strong assumption that the fitted regression model is still valid outside the reference range defined by the training samples. Hence, extrapolation is usually more prone to errors than interpolation. To alleviate such unreliability caused by extrapolation, we propose to use the most similar image patches from each atlas for label fusion.

Note that the goal of image-based registration is to correspond the most similar image patches between the registered images. However, the correspondence obtained from registration may not give the maximal similarity between all corresponding regions. For instance, deformable image registration usually needs to balance the image matching constraint and the regularization prior on deformation fields. A global regularization constraint on the deformation fields is necessary to clarify the ambiguous appearance-label relationship arising from employing small image patches for matching. However, enforcing a global regularization constraint on the deformation fields may compromise the local image matching constraint. In such cases, the correspondence that maximizes the appearance similarity between the warped atlas and the target image may be within a small neighborhood of the registered correspondence.

Motivated by this observation, instead of using the original registered correspondence, we propose to remedy the risk of extrapolation by searching for the correspondence, that gives the most similar appearance matching, i.e. minimal SSD distance, within a small neighborhood centered around the registered correspondence in each atlas. The locally searched optimal correspondence is:

*x*^{i} is the location from

*i*_{th} warped atlas with the best image matching for location

*x* in the target image within the local area

(x). Again, we use a cubic neighborhood definition, specified by a radius

*r*_{s}.

and

may represent different neighborhoods and they are the only free parameters in our method. Instead of the registered corresponding patch

*A*^{i}(

(

*x*)), we apply the searched patch

*A*^{i}(

(

*x*^{i})) to produce the fused label at

*x* for the target image, i.e. (

2) becomes

. Note that a similar local searching technique was recently proposed by [

5] to reduce noise effects for similarity-based local weighting label fusion.

With more similar data for regression, the local searching can significantly reduce the risk of extrapolation. In this regard, larger searching neighborhoods are more desirable. However, using larger local searching windows also compromises the regularization prior on the deformation fields. This drawback makes the appearance-label relationship more ambiguous on local patches, resulting greater model selection errors. Hence, avoiding model-fitting errors and avoiding model-selection errors can not be satisfied simultaneously. It is reasonable to expect an optimal searching range that balances these two factors.