Home | About | Journals | Submit | Contact Us | Français |

**|**HHS Author Manuscripts**|**PMC3402713

Formats

Article sections

- Abstract
- I. Introduction
- II. The Proposed Framework
- III. Computationally Efficient Implementation
- IV. Experiments and Results
- V. Discussion and Conclusion
- References

Authors

Related links

IEEE Trans Med Imaging. Author manuscript; available in PMC 2012 July 24.

Published in final edited form as:

Published online 2011 February 14. doi: 10.1109/TMI.2011.2114362

PMCID: PMC3402713

NIHMSID: NIHMS387950

Tianhao Zhang, T. Zhang is with the Section of Biomedical Image Analysis, Department of Radiology, University of Pennsylvania, Philadelphia, PA 19104 USA (Email: Tianhao.Zhang/at/uphs.upenn.edu)

The publisher's final edited version of this article is available at IEEE Trans Med Imaging

See other articles in PMC that cite the published article.

Gaussian smoothing of images prior to applying voxel-based statistics is an important step in Voxel-Based Analysis and Statistical Parametric Mapping (VBA-SPM), and is used to account for registration errors, to Gaussianize the data, and to integrate imaging signals from a region around each voxel. However, it has also become a limitation of VBA-SPM based methods, since it is often chosen empirically and lacks spatial adaptivity to the shape and spatial extent of the region of interest, such as a region of atrophy or functional activity. In this paper, we propose a new framework, named Optimally-Discriminative Voxel-Based Analysis (ODVBA), for determining the optimal spatially adaptive smoothing of images, followed by applying voxel-based group analysis. In ODVBA, Nonnegative Discriminative Projection is applied regionally to get the direction that best discriminates between two groups, e.g., patients and controls; this direction is equivalent to local filtering by an optimal kernel whose coefficients define the optimally discriminative direction. By considering all the neighborhoods that contain a given voxel, we then compose this information to produce the statistic for each voxel. Finally, permutation tests are used to obtain a statistical parametric map of group differences. ODVBA has been evaluated using simulated data in which the ground truth is known and with data from an Alzheimer’s disease (AD) study. The experimental results have shown that the proposed ODVBA can precisely describe the shape and location of structural abnormality.

Voxel-Based Analysis and Statistical Parametric Mapping (VBA-SPM) [1][18] of imaging data have offered the potential to analyze structural and functional data in great spatial detail, without the need to define a priori regions of interest (ROIs). As a result, numerous studies [7][11][23][33][48][49][50] during the past decade have better investigated brain structure and function in normal and diseased populations, and have enabled the quantification of spatio-temporal imaging patterns.

A fundamentally important aspect of VBA-SPM has been the spatial smoothing of images prior to analysis. Typically, Gaussian blurring of full-width-half-max (FWHM) in the range of 8–16mm is used to account for registration errors, to Gaussianize the data, and to integrate imaging signals from a region, rather than from a single voxel. The effect of this smoothing function is critical: if the kernel is too small for the task, statistical power will be lost and large numbers of false negatives will result in missing many regions that might present group differences in structure and function; if the kernel is too large, statistical power can also be lost by blurring image measurements from regions that display group differences with measurements from regions that have no group difference. In the latter case, spatial localization is also seriously compromised, as significant smoothing blurs the measurements out and often leads to false conclusions about the origin of a functional activation or of brain atrophy. Moreover, a filter that is too large, or that is not matched with the underlying group difference, will also have reduced sensitivity in detecting group differences. As a result, Gaussian smoothing is often chosen empirically, or in an ad hoc fashion, an obvious limitation of such VBA-SPM analyses, in part because of its heuristic nature, and in part because it can lead to overfitting of the data without proper cross-validation or correction for multiple comparisons.

However, the most profound limitation of Gaussian smoothing of images prior to applying the General Linear Model (GLM) [18] is its lack of spatial adaptivity to the shape and spatial extent of a structural or functional region of interest. For example, if atrophy or functional activation in the hippocampus is to be detected, Gaussian smoothing will blur volumetric or activation measurements from the hippocampus with such measurements from surrounding tissues, including the ventricles, the fusiform gyrus, the amygdala and the white matter. Previous work in the literature [13] has shown that spatially adaptive filtering of image data can dramatically improve statistical power to detect group differences. However, little is known about how to optimally define the shape and extent of the smoothing filter, so as to maximize the ability of VBA-SPM to detect group effects.

In this paper, we present a mathematically rigorous framework for determining the optimal spatial smoothing of structural (and potentially functional) images, prior to applying voxel-based group analysis. We consider this problem in the context of determining group differences, and we therefore restrict our experiments to voxel-wise statistical hypothesis testing. However, the mathematical formalism and algorithm are generally applicable to any type of VBA. In order to determine the optimal smoothing kernel, a regional discriminative analysis, restricted by appropriate nonnegativity constraints, is applied to a spatial neighborhood around each voxel, aiming to find the direction (in a space of dimensionality equal to the size of the neighborhood) that best highlights the difference between two groups in that neighborhood. Since each voxel belongs to a large number of such neighborhoods, each centered on one of its neighboring voxels, the group difference at each voxel is determined by a composition of all these optimal smoothing directions. Permutation tests are used to obtain the statistical significance of the resulting ODVBA maps.

This approach is akin to some fundamental principles of signal and image processing, and more specifically to the matched filtering, which states that optimal detection of a signal in the presence of noise is achieved by filtering whose kernel is related to the signal itself. In the context of voxel-based statistical analysis, the “signal” is not known, as it relates to the underlying (unknown) group difference. Therefore, the purpose of our optimization is to actually find the kernel that maximizes the signal detection.

ODVBA has some similarities with the searchlight approach [29][30], however it is significantly different. The searchlight method is basically a local multivariate analysis constrained to the immediate neighborhood of a voxel, whereas ODVBA performs a high-dimensional discriminative analysis using machine learning technique over large neighborhoods, which captures anatomical and functional patterns of larger range, thereby determining the optimally discriminative spatially varying filter. ODVBA also relates to the extensive literature using linear discriminant analysis (LDA) on multivariate patterns of whole brain images. However, implementing the standard LDA directly on the images usually suffers from the singularity problem [14] because the number of images is much smaller than the number of brain voxels. To overcome the problem, Kustra and Strother [31] used the smoothness-constrained, penalized LDA as a tool not only for a strict classification task of positron emission tomography (PEI) images, but also for extracting the activation patterns. Thomaz et al. [45][46] presented a PCA plus Maximum uncertainty LDA that solves the small sample size problem for classification and visual analysis of structural MRIs. Carlson et al. [6] used PCA plus LDA to classify brain activities of different stimulus categories [20][40] and to find which voxels contribute to the activity. Unfortunately, although the singularity problem has been addressed in the above LDA-based methods, a great deal of important information in the images would be lost, since they employ the smoothness constraint or PCA to reduce the dimensionality prior to implementing LDA. However, ODVBA does not have the singularity problem since it never involves the matrix inverse computation and the discriminative analysis is conducted on the data set constructed according to the local neighborhood so it avoids the curse of dimensionality naturally. More importantly, all these three methods attempt to obtain discriminants or a “canonical image”, which is a spatial distribution of voxels that maximally differentiates between different experimental conditions, for interpretation of abnormality or activation in the groups. However, the resulting discriminants are then usually analyzed based on visual inspection or simply thresholding without determining a voxel-wise statistical value; therefore, they do not produce the *p* values for each voxel in the style of traditional SPM, whereas ODVBA does. Finally, ODVBA also employs a non-negativity constraint, which is important as it prohibits canonical images with positive and negative value cancelations, which are often difficult to interpret, especially if a brain region is involved in many canonical images with different weights.

The rest of the paper is organized as follows: Section II describes the general formulation, and its numerical optimization solution. Section III introduces a method of computationally efficient implementation for the ODVBA. Section IV presents a number of experiments with 1) simulated data of known ground truth and 2) structural images of elderly individuals with Alzheimer’s disease (AD). These experiments demonstrate that the proposed methodology significantly improves both the statistical power in detecting group differences, and the accuracy with which the spatial extent of the region of interest is determined by VBA-SPM analysis. Section V contains the discussion and conclusion. For convenience, Table I lists important notations used in the paper.

The proposed framework consists of the following stages:

For each voxel, we examine a (typically large) neighborhood centered on it (sometimes referred to as a “subvolume”), and use the Nonnegative Discriminative Projection (NDP) to find the direction that best discriminates the two groups being compared. This direction can be viewed as a spatially adaptive filter, which locally amplifies the group differences, since projection is a weighted average of all voxels in the subvolume, i.e. it is a filter.

A given voxel belongs to a number of neighborhoods centered on its neighboring voxels, so it may correspond to a collection of values, each being part of a different discriminative direction and reflecting underlying group differences. In order to eventually determine a single value for each voxel that represents the group difference at that spatial location and will be used for statistical analysis, we observe that the contribution of each voxel to a discriminative direction of a neighborhood to which it belongs is given by the respective coefficient of the optimally discriminative direction of that neighborhood. We sum up all these coefficients belonging to a given voxel to determine a quantity that reflects the voxel’s discriminating value.

Since assumptions of Gaussianity cannot be made for the derived voxel-wise discriminative measurements, we resort to permutation tests [24][39] to obtain statistical significance (*p* values).

Although not necessarily part of ODVBA, classification is demonstrated here as an additional use of ODVBA. In particular, along the lines of the COMPARE algorithm [16], the regional clusters showing the highest group differences are used as the input features to be fed to a classifier, e.g., an SVM.

For a given voxel *x* in volume *X*, we construct its neighborhood in which each voxel *x _{i}* follows ‖

The Nonnegative Discriminative Projection (NDP) algorithm which is developed based on Nonnegative Quadratic Programming [42] is used to find the optimal discriminative directions which project the high-dimensional samples onto a 1-dimensional space that maximizes the classification accuracy, and therefore the group differences. NDP is implemented on each learning set. The resultant optimally discriminant vector *$\stackrel{\u20d7}{w}$* has a special property: it is nonnegative. This stems from the nonnegativity constraint that is incorporated into the objective function of the NDP method. This constraint is used to help us interpret the group differences. Specifically, our goal is not simply to find an image contrast, prescribed by *$\stackrel{\u20d7}{w}$*, which distinguishes the two groups, but also requires that this contrast tells us something about the properties of the images we are measuring, e.g. about regional volumetrics or functional activity. We therefore limit ourselves to nonnegative, albeit arbitrarily shaped, local filters, each of which prescribes a regional weighted average of the signal being measured. In a regional volumetric analysis, for example, the optimal regional filter will reflect the shape of the region whose volume is different between two groups. In a functional image analysis, this filter might represent the region whose signal is summed up to reflect the activation. This is in contrast with the traditional methods of feature extraction for pattern classification, which are free to derive any feature obtained by any image filter that maximizes classification accuracy, but are not designed to necessarily measure (interpretable) group differences.

To better illustrate the idea of NDP, we show its results on a toy dataset before we describe the formulation. In this artificial test, we generated images that contained a square (the region of group difference) with intensity that varied from one image to another. We generated two groups of images: the first set of squares had intensities with mean 120.53 and standard deviation 5.79, while the second had 90.36 and 5.72, respectively. Fig. 2A shows the difference of means from the two groups. Fig. 2B shows the *$\stackrel{\u20d7}{w}$* obtained from the learning set constructed according to the neighborhood I; it is basically noise with very small values of (*$\stackrel{\u20d7}{w}$*)_{i}, indicating that no local filter can be found that distinguishes the two groups at that neighborhood. Fig. 2C shows the *$\stackrel{\u20d7}{w}$* obtained from the learning set constructed according to the neighborhood II; the estimated optimal filter *$\stackrel{\u20d7}{w}$* is well aligned with the underlying group difference, within which it has high values. The bottom line here is that, following the formulation to be presented next, the filter that locally amplifies the group difference in this set of images calls for a weighted average of the pixel values within the part of the square that is included in the neighborhood, correctly reflecting the underlying difference. We now present the details of the formulation.

Using one given learning set constructed by the subvolume vectors from all the involved individuals, we probe into neighborhood elements’ contributions for discrimination of the two groups. We expect to find such a nonnegative vector *$\stackrel{\u20d7}{w}$* to describe the contributions of elements in the neighborhood: the larger the value of (*$\stackrel{\u20d7}{w}$*)_{i} is, the more the corresponding element ()_{i} contributes to the discrimination. Equivalently, (*$\stackrel{\u20d7}{w}$*)_{i} is the *i ^{th}* coefficient of the regional filter denoted by

(1)

where, Ψ is the 1-dimensional projection of ^{k}. We expect that the two classes will be separated as much as possible along *$\stackrel{\u20d7}{w}$*, and at the same time the samples from the same class get more compact.

A measure of the separation between the two classes is the distance of projected class means:

(2)

where, ; *C _{i}* means the

And, the intra-class compactness is defined as follows:

(3)

where, .

*S _{B}* and

(4)

where, *A* = (γ*S _{W}* −

**Theorem 1.**
*A* is a positive definite matrix

The proof is in the Appendix A.

Since *A* is positive definite, *J* (*$\stackrel{\u20d7}{w}$*) is a convex function and has the unique global minimum. We solve the above optimization problem using the Nonnegative Quadratic Programming (NQP) [42]. According to [42], let *A*^{+} and *A*^{−} denote the nonnegative matrices described as:

(5)

and

(6)

It is clear that *A* = *A*^{+} − *A*^{−}. According to the two nonnegative matrices, the objective function in (4) can be written as the combination of three terms:

(7)

where

(8)

We define their derivatives as follows:

(9)

Note that the partial derivatives on *J _{a}* and

Using the above derivatives, multiplicative updates rule which does not involve the learning rates is introduced to minimize the objective function iteratively:

(10)

where *i* = 1, , *k*. Equation (10) means that all the elements in *$\stackrel{\u20d7}{w}$* are updated in parallel. Since *A*^{+}
*$\stackrel{\u20d7}{w}$* ≥ 0 and *A*^{−}
*$\stackrel{\u20d7}{w}$* ≥ 0, the updated *$\stackrel{\u20d7}{w}$* in (10) is always nonnegative.

**Theorem 2.** The function of *J*(*$\stackrel{\u20d7}{w}$*) in Equation (4) decreases monotonically to the value of its global minimum under the multiplicative updates in Equation (10).

The proof is in Appendix B.

For all the *M* voxels in one volume, we have *M* discriminative directions, each applied to a different neighborhood, as described in Section II-A. For a given voxel *x*, we obtain a list of (*$\stackrel{\u20d7}{w}$*)_{i} values from different *$\stackrel{\u20d7}{w}$* since *x* may belong to a number of neighborhoods, as shown in Fig. 3. (Recall that for each neighborhood to which *x* belongs, the respective coefficient in *$\stackrel{\u20d7}{w}$* reflects the discrimination power of this voxel in terms of the pattern seen in that neighborhood.). To quantify the group difference measured at voxel *x*, we use the following function, termed *discrimination degree*, which relates to the effect size [8]:

(11)

where, ϕ is a tuning parameter aiming to reduce potential outliers in the dataset. Let Δ = {|*x* } denote the set of neighborhoods that the given voxel *x* belongs to, then we define the group difference on *x* by summing up contributions from all neighborhoods to which it participates:

(12)

where, *$\stackrel{\u20d7}{w}$*_{} denotes the coefficients corresponding to voxels in , (*$\stackrel{\u20d7}{w}$*_{})_{i} denotes that *x* is the *i ^{th}* element in , and δ

It is worth noting that, in ODVBA, one learning set is constructed according to one given neighborhood (subvolume vector) by different individuals. Actually, based on one learning set, the obtained coefficient *$\stackrel{\u20d7}{w}$* has only one direction (group1>group2 / group2>group1) to maximize the classification accuracy. When ODVBA handle data that have both possible positive and negative differences, it does not need to calculate the *$\stackrel{\u20d7}{w}$* twice using Nonnegative Discriminative Projection described in Section II-A, but it has to determine two different statistical values *S _{x}* for both positive and negative cases, before implementing permutation test respectively. In particular, only having the coefficient

Moreover, for one given neighborhood, the optimal coefficients can only reflect the positive (assuming group1>group2) signals if the projected class means _{group1} > _{group2} in this subvolume. However, the negative signals will never be ignored if they are really significant, because they may be highlighted in other random neighborhood where they are stronger than the positive ones, so that _{group1} < _{group2}.

Assuming that the null hypothesis is that there is no difference between the two groups, the statistical significance can be assessed by comparison with the distribution of values obtained when the labels are randomly permuted [24][39]. In particular, we randomly assign the subjects into two groups, and then implement Section II-A and Section II-B to calculate the statistic for each voxel. The above relabeling is repeated *N _{p}* times. For one given voxel, let

(13)

where,

(14)

The original method described in Section II is based on voxel-wise computation, an extreme solution that uses all the neighborhoods corresponding to all the *M* voxels. Unfortunately, this approach would be complicated and computationally expensive. The simplest way to reduce the computational complexity is to randomly select a subset of the neighborhoods to represent all the neighborhoods, and then the number of the involved learning sets can be reduced. However, with this method, the selected neighborhoods may not cover all the voxels, and therefore, it is not guaranteed that statistics can be derived for each voxel. In this section, we use the Greedy Algorithm [10] to select the subset of the neighborhoods which include all the voxels and at the same time overlap each other under a certain rate. Moreover, to facilitate the greedy search, we introduce a new approach named Graph-assisted Greedy Neighborhoods Cover, which uses a graph to assist in seeking one neighborhood in each iteration of the greedy algorithm.

We select the minimum subset of the neighborhoods *B* as the replacement of all the ones *F*, where *B* *F*, and then the learning sets are constructed corresponding to the selected neighborhoods respectively. To get a reasonable statistic value for each voxel after obtaining the discriminative directions using NDP, the selected neighborhoods are expected: i) to cover all the voxels in *X*, that is,

(15)

ii) to overlap each other under at least an overlapping factor [51] *a* (0 < *a* < 1), that is,

(16)

for each in *B*.

The problem introduced in Section III-A can be solved using the greedy algorithm [10][51], which works by picking that covers the maximum number of remaining elements that are uncovered under certain constraint in each iteration. Let *U* contain all the remaining uncovered elements at each iteration. The algorithm chooses the neighborhood _{i} that overlaps with the selected neighborhoods in *B* and covers as many uncovered elements in *U* as possible. Then, the selected _{i} is put into *B* and the elements covered by _{i} is removed from *U*. It finally terminates when *U* is empty. The algorithm is summarized in Table II.

It is worth noting that Line 5 of the algorithm in Table II may be time-consuming because it picks _{i} from all the *M* neighborhoods without using the neighbor information for searching in each iteration. We propose a Graph-assisted Greedy Algorithm which select _{i} by implementing simple operations on the small sub-block of the graph in each iteration. The algorithm is described as follows.

A graph *G* is introduced to model the neighbor relationship between every two voxels in *X*. Each sample point in *X* is a vertex of the graph. An edge is put from *x _{i}* to

(17)

Obviously, each row of *G* represents one neighborhood. Let denote the set of the elements covered by the neighborhoods from *B*, that is = _{ B} , use *I _{B}* to denote the index set of the selected neighborhoods, and we can get the sub-matrix from

(18)

where set *L* follows

*L**X*;*L*∩*I*= ;_{B}*G*(_{sub}*i,*) ·*$\stackrel{\u20d7}{e}$*≠*k, i**L*;- (19)

Among the terms in (19), 1) means that the rows of *G _{sub}* come from

Compute

(20)

and we can know *T*(*i*) is the number of elements covered by , in the *L* {*i*}^{th} neighborhood. Let *i _{min}* denote the index of element which is the minimum in

In this section, we designed two different kinds of experiments to evaluate the performance of ODVBA compared with the original SPM [1][44] and nonparametric permutation based SPM (referred to SnPM) [24][39]. In the first experiment, we applied the three methods to the simulated atrophy data in which we know the ground truth. Here, it is easy for us to evaluate the accuracy and statistical power of ODVBA, in comparison with SPM and SnPM. The second experiment demonstrated the success of our method on the real data of AD patients from ADNI (www.loni.ucla.edu/ADNI).

The dataset consisted of real MRI scans of 60 normal controls, with a relatively small age range, obtained from ADNI. The description of the MR Image acquisition and pre-processing protocol can be found in Section IV-B. Finally, RAVENS maps [13] which quantify the regional distribution of GM, WM, and CSF, are formed for each tissue type. In this test, RAVENS of GM is employed. Next, 30 samples were randomly picked and manipulated to introduce 30% atrophy with “U”-like shape in the specific region (a gyrus). In other words, the corresponding RAVENS values in that region are decreased by 30%. Fig. 4. Shows one selected slice of RAVENS map and the location of the atrophy.

Location of simulated atrophy (red points in the dashed ellipse) in the “Blurring” case. Different colors in the background reflect different RAVENS values, that is, different tissue densities.

Regarding the simulated data as a group of patients and the remaining data as the control group, we conduct SPM, SnPM and ODVBA for the group analysis. The smoothing size (FWHM of the involved Gaussian filter) in SPM and SnPM is 7mm. The parameters in ODVBA are set as ξ = 15, ϕ = 1, μ = 1, γ = 10^{−5}, τ^{2} = 10^{−5}. The experimental results are shown in Fig. 5. The background is the RAVENS map of a sample, appearing as a gray image, and the overlapping areas are the red regions obtained by different *p* value thresholds (*p* <0.05, 0.02). We can see that the proposed ODVBA method precisely describes the “U”-like shape of the atrophy, while SPM and SnPM do not. In SPM and SnPM, the images are blurred so that the results are not accurate.

The maps of *p* values from the “Blurring” case. The background is the RAVENS map of a sample, appearing as a gray image, and the overlapping areas are the red regions obtained by different *p* value thresholds.

For the simulated data, since we know the ground truth, we employ several types of metrics to evaluate ODVBA, compared with SPM and SnPM. We denote the significant voxels obtained from the three different methods as *V _{R}*, the ground truth voxels as

(21)

where #(·) means the number of voxels.

We varied the significance level of the group analysis from [0, 1] providing a series of TPR/FPR to build the receiver-operating characteristics (ROC) curve [35], as demonstrated in Fig. 6. We also list the TPR and FPR values with different specific *p* value thresholds (*p* <0.05, 0.02) in Table IV. We can see that, for the “Blurring” case, SPM and SnPM not only produce worse accuracies but also suffer from high false positive errors. Otherwise, ODVBA offers a globally better result since for any FPR, the corresponding TPR is higher.

The data used in this experiment contain 60 normal controls which also come from ADNI. The purpose is to evaluate the capabilities of three methods for detecting the simulated atrophies with different degrees. As well as in the “Blurring” case, RAVENS maps are finally created to represent the tissue density (which is proportional to regional volume, at that location). We randomly selected 30 volumes of GM and introduced the atrophy of 10%, 15%, 20%, 25%, 30%, and 35% (RAVENS values are reduced by 10%–35%) in the region around Hippocampus, separately. Fig. 7 shows one selected slice and the location of the simulated atrophy. The remaining 30 samples were regarded as control. The smoothing size in SPM and SnPM is 8mm. The parameters in ODVBA are set as ξ = 15, ϕ = 1, μ = 1, γ = 10^{−5}, τ^{2} = 10^{−5}. Fig. 8 demonstrates the performances (TPR and FPR) of the three involved methods versus the different degrees of atrophy. The *p* value which is used to get the significant region is 0.05. We can see that ODVBA has higher TPR but lower FPR on each dataset with different degrees of atrophy than SPM and SnPM. In Fig. 9, we provide some representative slices on which we plot the ground truth and the detected significant regions obtained by the three methods. It is shown that ODVBA is much more powerful to detect the true atrophy hidden in the data.

Location of simulated atrophy (red points in the black dashed ellipse) in the “Different degrees” case. Different colors in the background reflect different RAVENS values, that is, different tissue densities.

Performance vs. the degree of atrophy in the “Different degrees” case. (a) TPR vs. the degree of atrophy; (b) FPR vs. the degree of atrophy.

In this section, we study the effect of the kernel size on the performance of the three methods, using the simulated data with 30% atrophy in the “Different degrees” case. For SPM and SnPM, the kernel size means the FWHM of the involved Gaussian filter, which is changed from 5mm to 9mm. For ODVBA, the kernel size means the size of neighborhood with varying ξ from 9mm to 17mm, with an interval of 2mm. By changing the kernel size, we can get a series of different results for the three different methods. Since the ground truth is known, we plot the corresponding ROC curve in Fig. 10. As shown, the performance of ODVBA is stable. However, the sensitivity/specificity of SPM and SnPM vary a lot with different kernel sizes and are companied with high FPR increasing.

In this section, we look into the effect of the *discrimination degree* on the performance of ODVBA, using the simulated data with 30% atrophy in the “Different degrees” case. By varying the tuning parameter ϕ described in (11), we can get a series of different resulting significant maps of ODVBA. Note that larger ϕ means the coefficient (*$\stackrel{\u20d7}{w}$*)_{i} will be combined with a stronger *discrimination degree* to determine the statistical value (referring to (12)), and ϕ = 0 means that only (*$\stackrel{\u20d7}{w}$*)_{i} is used. Since the ground truth is known, we plot the corresponding TPR and FPR versus ϕ in Fig. 11. As shown, along with the increasing of ϕ, the power of detecting the true positives is gradually enhanced, accompanied by low level FPR.

With the development of computer-aided diagnosis, Alzheimer’s disease has attracted a lot of attention [3][17][27][28][37][47] in the community of neuroimage. The data used in this experiment was obtained from the Alzheimer’s Disease Neuroimaging Initiative (ADNI) database (www.loni.ucla.edu/ADNI), which has recruited approximately 800 adults, ages 55 to 90, including 200 with Normal Control (NC), 400 with mild cognitive impairment (MCI) and 200 with AD. For up-to-date information, see www.adni-info.org.

The goal of our study is to conduct complete evaluations of the structural MR images, and identify potentially complex spatial patterns of brain atrophy in AD patients, compared with NC subjects. We randomly selected 100 subjects from the ADNI cohort. This included 50 NCs and 50 ADs, whose MRI scans were then analyzed. The datasets included standard T1-weighted images with varying resolutions. Adhering to volumetric 3D MPRAGE or equivalent protocols, only images obtained using 1.5 T scanners were used. The sagittal images were preprocessed according to a number of steps detailed on the ADNI website, which corrected for field inhomogeneities and image distortion, and were resliced to axial orientation. Images were preprocessed according to the following steps. 1) Alignment of the brain with the ACPC plane; 2) Removal of extra-cranial material (skull-stripping); 3) Tissue segmentation into grey matter (GM), white matter (WM), and cerebrospinal fluid (CSF), using [41]; 4) High-dimensional image warping [43] to a standardized coordinate system, a brain atlas (template) that was aligned with the MNI coordinate space [26]; 5) Formation of tissue density maps, i.e. RAVENS maps, typically used in the modulated SPM analysis [12] and in [2].

RAVENS maps quantify the regional distribution of GM, WM, and CSF, since one RAVENS map is formed for each tissue type. RAVENS values in the template’s (stereotaxic) space are directly proportional to the volume of the respective structures in the original brain scan [12]. Therefore, regional volumetric measurements and comparisons can be performed via measurements and comparisons of the RAVENS maps. In this experiment, we used GM for evaluation purposes.

Based on the measurements of tissue density maps, we compared the performance of SPM, SnPM, and our proposed ODVBA method. For SPM and SnPM, smoothing is performed using 8 mm FWHM kernel. For ODVBA, the parameters are set as follows: ξ = 15, ϕ = 1, μ = 1, γ = 10^{−5}, τ^{2} = 10^{−5}. Both SnPM and ODVBA were implemented with 2000 permutations. Fig. 12 shows some selected sections from the results (with the *p* value < 0.001 threshold) of SPM, SnPM, and our ODVBA, respectively. We can see that the results of ODVBA not only reflect significant GM loss in AD compared with NC, but also identify many additional regions, e.g., Precuneus (Fig. 12A), Insula Cortex (Fig. 12B), Middle Temporal Gyrus (Fig. 12C), Lentiform (Fig. 12D), Occipital Lobe (Fig. 12E), Parahippocampal Gyrus (Fig. 12F), Superior Parietal Gyrus (Fig. 12G), Middle Frontal Gyrus (Fig. 12H), and Inferior Frontal Gyrus (Fig. 12I). As shown in the above mentioned sections in Fig. 12, the significant regions detected by ODVBA are either totally or partially missing from the results of SPM and SnPM. Fig. 13 shows the surface rendering of significant regions obtained from the three different methods. Moreover, these are all regions that are generally known from histopathology studies [4][5][15][21][22][34].

Representative sections with regions of relatively reduced GM in AD compared to NC. The scale indicates the −*log*(*p*) values.

Fig. 14 shows two representative magnified regions that were discovered by SPM, SnPM and ODVBA. Among them, Fig. 14A shows the region near the Hippocampus and Fig. 14B shows the region around the Temporal Lobe. For the Fig. 14A, we can see that SPM and SnPM blurred the regions of the Hippocampus and Fusiform Gyrus. In contrast, a clear division between the two regions can be found in the results of ODVBA. For the Fig. 14B, SPM and SnPM blurred the different gyri and sulci in the region of the Temporal Lobe, while failing to detect other significant areas altogether; however, ODVBA delineates a more precise area of significant atrophy in that region.

We also employ the FDR procedure [25], a powerful approach commonly used in neuroimaging applications to adjust for statistical error. FDR aims to control the portion of false positive error, instead of excluding this error [38]. In addition, FDR is adaptable since its value is computed directly on the observed *p* value distribution. We partitioned the resulting significant maps with *p* values of 0.001 and 0.05 (FDR corrected) respectively, according to predefined anatomical regions from the Jacob Atlas; calculated the means of the tissue density maps per region for all the samples; and finally, computed the *t* value based on these means according to the class information.

Table V lists the sizes and *t* statistics of major anatomical regions. We can see that not only the sizes detected by ODVBA, but also the corresponding *t* values are generally greater than those detected by SPM and SnPM. This means that the regions found by ODVBA display a higher degree of differentiation between the two groups and that SPM and SnPM might have missed some significant information.

In a separate experiment, we used the detected significant regions as the features input to a classifier of individual scans into NC or AD, and compared the three different methods in terms of the performance of classification with SVM. We randomly divided the original 50 (NC) +50 (AD) into 5 subsets (10+10 each), and then implemented 5-fold cross validation.

For each fold, 1) we used 40+40 for training and got the significant regions (with the *p* value < 0.001 threshold) by the three different group analysis methods; 2) we calculated the means of the tissue density values of significant voxels in each predefined anatomical region (from the Jacob Atlas) for all the samples, and using the means from different regions as the input features, we got the SVM classifier [9]; 3) finally, we used the left testing set with size of 10+10 for validation.

The radial basis function (RBF) kernel is used in this study. Based on the 5-fold cross-validation, the classification rate of ODVBA (90%) is superior to that of SPM (86%) and that of SnPM (87%).

In this paper, we have introduced a new framework, termed Optimally-Discriminative Voxel-Based Analysis (ODVBA), for group analysis of medical images. In the proposed framework, Nonnegative Discriminative Projection (NDP) is introduced to find the optimal discriminative direction in each learning set constructed by the neighborhood centered at the given voxel. Subsequently, each voxel’s statistic is determined by a composition of all the smoothing directions which are associated with the given voxel. Finally permutation tests are used to obtain the statistical significance of the resulting ODVBA maps. In addition, to reduce the cost of computation, we developed a new method termed Graph-assisted Greedy Neighborhoods Cover to select a minimum subset of the neighborhoods which are used to learn the discriminative directions. We compared ODVBA to the traditional SPM and the nonparametric SPM (SnPM) with both simulated data and real AD data from the ADNI cohort. The experimental results have shown tested ODVBA against the conventional smoothing methods.

The main premise of our approach is that it effectively applies a form of matched filtering, to optimally detect a group difference. Since the shape of the target region of group difference is not known, regional discriminative analyses are used to identify voxels displaying the most significant differences. In addition to potentially improving sensitivity of detection of a structural or functional signal, this approach was shown in several experiments to better delineate the region of abnormality, in contrast with conventional smoothing approaches that blur through boundaries and dilute the signal from regions of interest with signal from regions that do not display a group difference. We use the coarse grid search over tuning parameters, e.g., ξ, ϕ, μ, and γ. Section IV-A.3) *Effect of kernel size* and Section IV-A.4) *Effect of discrimination degree* are examples of the parameter selection that we studied the performance of ODVBA on different parameters. Generally, our method is not so sensitive to the tuning parameters once its performance is stable. We can determine the optimal parameters in the simulated data in which we know the ground truth. If the simulated data and the real data come from the same data source, have similar sample size, and have similar size and degree of atrophy, we assume that the optimal parameters determined by the simulated data are also suitable for the real data. That is, the simulated data would be an appropriate reference for the real data to select the optimal parameters.

As described in the Introduction, our method are related with methods which use LDA and its variants over the entire image to determine “canonical image” that best discriminate between two groups or conditions. However our approach has significant differences from these approaches. In particular, such global LDA-based methods tend to be very limited by the small sample size and high-dimensionality problem, and therefore are typically able to only detect some of the regions that display group differences or activations, but not all. They are most suitable for classification purposes, rather than for voxel-based analysis. Most importantly, these methods produce both positive and negative loadings, which reflect cancelations in the data and therefore render it very difficult to interpret the data. In general, they are not designed to construct a precise spatial map of a group difference, but rather to find the best global discriminant. The non-negativity constraints in our ODVBA are essential. Our results can be interpreted in terms of brain activity or atrophy, for example, and the regions found match exactly the underlying regions of interest. The non-negativity constraints are by far not a trivial issue to implement, since they influence the entire optimization process.

The authors would like to thank Alzheimer’s Disease Neuroimaging Initiative (ADNI) for the data collection and sharing (supported by NIH grant U01 AG024904). ADNI is funded by the National Institute on Aging, the National Institute of Biomedical Imaging and Bioengineering, and through generous contributions from the organizations as listed at www.adni-info.org. ADNI data are disseminated by the Laboratory for Neuro Imaging at the University of California, Los Angeles. This research was also supported by NIH grants P30 AG010129, K01 AG030514, and the Dana Foundation. They also thank Mr. Kayhan Batmanghelich for his contribution to the efficient software implementation of the algorithm.

The work was supported by the National Institutes of Health under Grant R01AG14971.

If λ_{min} ≥ 0, the smallest eigenvalue of γ*S _{W}* −

Auxiliary function [32] is used to derive the rule of multiplicative updates in (10).

**Definition 1.**
*G* (, *$\stackrel{\u20d7}{w}$*) is an auxiliary function for *J* (*$\stackrel{\u20d7}{w}$*) if the conditions

(22)

are satisfied.

**Lemma 1.** if *G* (, *$\stackrel{\u20d7}{w}$*) is an auxiliary function, then *J* (*$\stackrel{\u20d7}{w}$*) is nonincreasing under the update:

(23)

One can refer to [32] for the proof of the above lemma. Following [42], we introduce such an auxiliary function for *J* (*$\stackrel{\u20d7}{w}$*):

(24)

Note that [42], so it is clear that the conditions *G* (, *$\stackrel{\u20d7}{w}$*) ≥ *J* () and *G* (*$\stackrel{\u20d7}{w}$, $\stackrel{\u20d7}{w}$*) = *J* (*$\stackrel{\u20d7}{w}$*) are met.

Rewrite the auxiliary function as the following form:

(25)

where, *G _{i}* (()

(26)

Then, the minimization of *G* (, *$\stackrel{\u20d7}{w}$*) can be achieved by minimizing each *G _{i}* (()

(27)

Since the derivative of *G _{i}* (()

(28)

and considering (*$\stackrel{\u20d7}{w}$*′)_{i} ≥ 0, we obtain the update rule as follows:

(29)

Personal use of this material is permitted. However, permission to use this material for any other purposes must be obtained from the IEEE by sending a request to pubs-permissions/at/ieee.org.

^{†}The work used data from the Alzheimer𒀙s Disease Neuroimaging Initiative (ADNI) database.

Tianhao Zhang, T. Zhang is with the Section of Biomedical Image Analysis, Department of Radiology, University of Pennsylvania, Philadelphia, PA 19104 USA (Email: Tianhao.Zhang/at/uphs.upenn.edu)

Christos Davatzikos, C. Davatzikos is with the Section of Biomedical Image Analysis, Department of Radiology, University of Pennsylvania, Philadelphia, PA 19104 USA (Email: Christos.Davatzikos/at/uphs.upenn.edu)

1. Ashburner J, Friston KJ. Voxel-Based morphometry-the methods. Neuroimage. 2000;vol. 11(no. 6):805–821. [PubMed]

2. Ashburner J, Csernansky JG, Davatzikos C, Fox NC, Frisoni GB, Thomson PM. Computer-assisted imaging to assess brain structure in healthy and diseased brains. The Lancet Neurology. 2003;vol. 2(no. 2):79–88. [PubMed]

3. Baron JC, Chetelat G, Desgranges B, Perchey G, Landeau B, de l SV, Eustache F. In vivo mapping of gray matter loss with voxel-based morphometry in mild Alzheimer’s disease. NeuroImage. 2001;vol. 14(no.2):298–309. [PubMed]

4. Braak H, Braak E. Neuropathological stageing of Alzheimer-related changes. Acta Neuropathologica. 1991;vol. 82(no. 4):239–259. [PubMed]

5. Brun A, Englund E. Regional pattern of degeneration in Alzheimer’s disease: neuronal loss and histopathological grading. Histopathology. 1981;vol. 5:548–564. [PubMed]

6. Carlson TA, Schrater P, He S. Patterns of activity in the categorical representations of objects. Journal of Cognitive Neuroscience. 2003;vol. 15(no.5):704–717. [PubMed]

7. Chen R, Herskovits E. Graphical-model-based morphometric analysis. IEEE Transactions on Medical Imaging. 2005;vol. 24(no. 10):1237–1248. [PubMed]

8. Cohen J. Statistical Power Analysis for the Behavioral Sciences. 2rd ed. Hillsdale, NJ: Lawrence Erlbaum Associates; 1988.

9. Collobert R, Bengio S. SVMTorch: Support Vector Machines for Large-Scale Regression Problems. Journal of Machine Learning Research. 2001;vol. 1:143–160.

10. Cormen TH, Leiserson CE, Rivest RL, Stein C. Introduction to Algorithms. 2rd ed. MIT Press; 2001.

11. Dale AM, Liu AK, Fischl BR, Buckner RL, Belliveau JW, Lewine JD, Halgren E. Dynamic statistical parametric mapping: combining fMRI and MEG for high-resolution imaging of cortical activity. Neuron. 2000;vol. 26(no. 1):55–67. [PubMed]

12. Davatzikos C, Genc A, Xu D, Resnick SM. Voxel-based morphometry using the RAVENS maps: Methods and validation using simulated longitudinal atrophy. NeuroImage. 2001;vol. 14(no. 6):1361–1369. [PubMed]

13. Davatzikos C, Li HH, Herskovits E, Resnick SM. Accuracy and sensitivity of detection of activation foci in the brain via statistical parametric mapping: a study using a PET simulator. NeuroImage. 2001;vol. 13(no. 1):176–184. [PubMed]

14. Duda R, Hart P, Stork D. Pattern Classification. 2rd ed. Wiley; 2000.

15. Engler H, Forsberg A, Almkvist O, Blomquist G, Larsson E, Savitcheva I, Wall A, Ringheim A, Langstrom B, Nordberg A. Two-year follow-up of amyloid deposition in patients with Alzheimer’s disease. Brain. 2006;vol. 129:2856–2866. [PubMed]

16. Fan Y, Shen D, Gur RC, Gur RE, Davatzikos C. COMPARE: Classification Of Morphological Patterns using Adaptive Regional Elements. IEEE Transaction on Medical Imaging. 2007;vol. 26(no. 1):93–105. [PubMed]

17. Frisoni GB, Testa C, Zorzan A, Sabattoli F, Beltramello A, Soininen H, Laakso MP. Detection of grey matter loss in mild alzheimer’s disease with voxel based morphometry. Journal of Neurology, Neurosurgery & Psychiatry. 2002;vol. 73(no.6):657–664. [PMC free article] [PubMed]

18. Friston KJ, Holmes AP, Worsley J-B, Poline KJ, Frith CD, Frackowiak RSJ. Statistical Parametric Maps in functional imaging: A general linear approach. Human Brain Mapping. 1995;vol. 2:189–210.

19. Goldszal AF, Davatzikos C, Pham D, Yan M, Bryan RN, Resnick SM. An image processing protocol for the analysis of MR images from an elderly population. Journal of Computer Assisted Tomography. 1998;vol. 22(no.5):827–837. [PubMed]

20. Haxby JV, Gobbini MI, Furey ML, Ishai A, Schouten JL, Pietrini P. Distributed and overlapping representations of faces and objects in ventral temporal cortex. Science. 2001;vol. 293:2425–2430. [PubMed]

21. Hardy GA, Higgins JA. Alzheimer’s disease: the amyloid cascade hypothesis. Science. 1992;vol. 256:184–185. [PubMed]

22. Hensley K, Hall NC, Subramaniam R, Cole P, Harris M, Aksenov M, Aksenova M, Gabbita SP, Wu JF, Carney JM, Lovell M, Markesbery WR, Buttereld DA. Brain regional correspondence between Alzheimer’s disease histopathology and biomarkers of protein oxidation. Journal of Neurochemistry. 1995;vol. 65:2146–2156. [PubMed]

23. Herskovits E, Peng H, Davatzikos C. A Bayesian morphometry algorithm. IEEE Transactions on Medical Imaging. 2004;vol. 23(no. 6):723–737. [PubMed]

24. Holmes AP, Blair RC, Watson JD, Ford I. Nonparametric analysis of statistic images from functional mapping experiments. Journal of Cerebral Blood Flow and Metabolism. 1996;vol. 16(no. 1):7–22. [PubMed]

25. Genovese CR, Lazar NA, Nichols T. Thresholding of statistical maps in functional neuroimaging using the false discovery rate. NeuroImage. 2002;no. 15:870–878. [PubMed]

26. Kabani N, MacDonald D, Holmes CJ, Evans A. A 3D atlas of the human brain. NeuroImage. 1998;vol. 7(no.4):S717.

27. Karas GB, Burton SA, Rombouts EJ, van Schijndel RA, O’Brien JT, Scheltens Ph, McKeith IG, Williams D, Ballard C, Barkhof F. A comprehensive study of gray matter loss in patients with Alzheimer’s disease using optimized voxel-based morphometry. Neuroimage. 2003;vol. 18(no. 4):895–907. [PubMed]

28. Kloppel S, Stonnington CM, Chu C, Draganski B, Scahill RI, Rohrer JD, Fox NC, Jack CR, Jr, Ashburner J, Frackowiak RSJ. Automatic classification of MR scans in Alzheimer’s disease. Brain. 2008;vol. 131(no.3):681–689. [PMC free article] [PubMed]

29. Kriegeskorte N, Goebel R, Bandettini P. Information-based functional brain mapping. Proceedings of the National Academy of Sciences. 2006;vol. 103:3863–3868. [PubMed]

30. Kriegeskorte N, Bandettini P. Analyzing for information, not activation, to exploit high-resolution fMRI. NeuroImage. 2007;vol. 38(no. 4):4649–4662. [PMC free article] [PubMed]

31. Kustra R, Strother SC. Penalized discriminant analysis of [^{15}*O*] water PET brain images with prediction error selection of smoothing and regularization hyperparameters. IEEE Transaction on Medical Imaging. 2001;vol. 20:376–387. [PubMed]

32. Lee DD, Seung HS. Algorithms for non-negative matrix factorization. Advances in Neural Information Processing. 2001;vol. 13

33. Matthew T, Wells WM, III, Louis CD, Arbel T. Feature-Based Morphometry: Discovering Group-related Anatomical Patterns. NeuroImage. 2010;vol. 49(no. 3):2318–2327. [PubMed]

34. McKee AC, Kosik KS, Kowall NW. Neuritic pathology and dementia in Alzheimer’s disease. Annals of Neurology. 1991;vol. 30:156–165. [PubMed]

35. Metz CE. ROC methodology in radiologic imaging. Investigative Radiology. 1986;vol. 21:720–733. [PubMed]

36. Misra C, Fan Y, Davatzikos C. Baseline and longitudinal patterns of brain atrophy in MCI patients, and their use in prediction of short-term conversion to AD: results from ADNI. Neuroimage. 2009;vol. 44(no. 4):1415–1422. [PMC free article] [PubMed]

37. Morra JH, Tu Z, Apostolova LG, Green AE, Toga AW, Thompson PM. Comparison of adaBoost and support vector machines for detecting alzheimer’s disease through automated hippocampal segmentation. IEEE Transactions on Medical Imaging. 2010;vol. 29(no. 1):30–43. [PMC free article] [PubMed]

38. Nichols T, Hayasaka S. Controlling the familywise error rate in functional neuroimaging: a comparative review. Statistical methods in medical research. 2003;vol. 12(no. 5):419–446. [PubMed]

39. Nichols AP, Holmes TE. Nonparametric permutation tests for functional neuroimaging: a primer with examples. Human Brain Mapping. 2002;vol. 15(no. 1):1–25. [PubMed]

40. Norman KA, Polyn SM, Detre GJ, Haxby JV. Beyond mind-reading: Multi-voxel pattern analysis of fMRI data. Trends in Cognitive Sciences. 2006;vol. 10(no. 9):424–430. [PubMed]

41. Pham DL, Prince JL. Adaptive fuzzy segmentation of magnetic resonance images. IEEE Transactions on Medical Imaging. 1999;vol. 18(no. 9):737–752. [PubMed]

42. Sha F, Lin Y, Saul LK, Lee DD. Multiplicative updates for nonnegative quadratic programming. Neural Computation. 2007;vol. 19(no. 8):2004–2031. [PubMed]

43. Shen D, Davatzikos C. HAMMER: Hierarchical attribute matching mechanism for elastic registration. IEEE Transactions on Medical Imaging. 2002;vol. 21(no. 11):1421–1439. [PubMed]

44. SPM. Available: www.fil.ion.ucl.ac.uk/spm/

45. Thomaz CE, Boardman JP, Counsell S, Hill DLG, Hajnal JV, Edwards AD, Rutherford MA, Gillies DF, Rueckert D. A multivariate statistical analysis of the developing human brain in preterm infants. Image and Vision Computing. 2007;vol. 25(no.6):981–994.

46. Thomaz CE, Duran F, Busatto GF, Gillies DF, Rueckert D. Multivariate statistical differences of MRI samples of the human brain. Journal of Mathematical Imaging and Vision. 2007;vol. 29(no.2–3):95–106.

47. Thompson PM, Hayashi KM, de Zubicaray G, Janke AL, Rose SE, Semple J, Herman D, Hong MS, Dittmer SS, Doddrell AW, Toga DM. Dynamics of gray matter loss in Alzheimer’s disease. Journal of Neuroscience. 2003;vol. 23(no. 3):994–1005. [PubMed]

48. Van De Ville D, Seghier ML, Lazeyras F, Blu T, Unser M. WSPM: Wavelet-Based Statistical Parametric Mapping. Neuroimage. 2007;vol. 37(no. 4):1205–1217. [PubMed]

49. Wink JBTM, Roerdink AM. Denoising functional MR Images: a comparison of wavelet denoising and Gaussian smoothing. IEEE Transactions on Medical Imaging. 2004;vol. 23(no. 3):374–387. [PubMed]

50. Xu L, Pearlson G, Calhoun V. Source Based Morphometry: The Use of Independent Component Analysis to Identify Gray Matter Differences with Application to Schizophrenia. Human Brain Mapping. 2009;vol. 30:711–724. [PMC free article] [PubMed]

51. Yang L. Alignment of Overlapping Locally Scaled Patches for Multidimensional Scaling and Dimensionality Reduction. IEEE Transactions Pattern Analysis and Machine Intelligence. 2008;vol. 30(no. 3):438–450. [PubMed]

PubMed Central Canada is a service of the Canadian Institutes of Health Research (CIHR) working in partnership with the National Research Council's Canada Institute for Scientific and Technical Information in cooperation with the National Center for Biotechnology Information at the U.S. National Library of Medicine(NCBI/NLM). It includes content provided to the PubMed Central International archive by participating publishers. |