PMCCPMCCPMCC

Search tips
Search criteria 

Advanced

 
Logo of nihpaAbout Author manuscriptsSubmit a manuscriptNIH Public Access; Author Manuscript; Accepted for publication in peer reviewed journal;
 
IEEE Trans Med Imaging. Author manuscript; available in PMC Jul 24, 2012.
Published in final edited form as:
PMCID: PMC3402713
NIHMSID: NIHMS387950
ODVBA: Optimally-Discriminative Voxel-Based Analysis
Tianhao Zhang, Member, IEEEcorresponding author and Christos Davatzikos, Senior Member, IEEE
Tianhao Zhang, T. Zhang is with the Section of Biomedical Image Analysis, Department of Radiology, University of Pennsylvania, Philadelphia, PA 19104 USA (Tianhao.Zhang/at/uphs.upenn.edu)
corresponding authorCorresponding author.
Gaussian smoothing of images prior to applying voxel-based statistics is an important step in Voxel-Based Analysis and Statistical Parametric Mapping (VBA-SPM), and is used to account for registration errors, to Gaussianize the data, and to integrate imaging signals from a region around each voxel. However, it has also become a limitation of VBA-SPM based methods, since it is often chosen empirically and lacks spatial adaptivity to the shape and spatial extent of the region of interest, such as a region of atrophy or functional activity. In this paper, we propose a new framework, named Optimally-Discriminative Voxel-Based Analysis (ODVBA), for determining the optimal spatially adaptive smoothing of images, followed by applying voxel-based group analysis. In ODVBA, Nonnegative Discriminative Projection is applied regionally to get the direction that best discriminates between two groups, e.g., patients and controls; this direction is equivalent to local filtering by an optimal kernel whose coefficients define the optimally discriminative direction. By considering all the neighborhoods that contain a given voxel, we then compose this information to produce the statistic for each voxel. Finally, permutation tests are used to obtain a statistical parametric map of group differences. ODVBA has been evaluated using simulated data in which the ground truth is known and with data from an Alzheimer’s disease (AD) study. The experimental results have shown that the proposed ODVBA can precisely describe the shape and location of structural abnormality.
Index Terms: Gaussian smoothing, Statistical Parametric Mapping, Nonnegative Discriminative Projection, Optimally-Discriminative Voxel-Based Analysis, Voxel-Based Morphometry, Alzheimer’s disease, ADNI
Voxel-Based Analysis and Statistical Parametric Mapping (VBA-SPM) [1][18] of imaging data have offered the potential to analyze structural and functional data in great spatial detail, without the need to define a priori regions of interest (ROIs). As a result, numerous studies [7][11][23][33][48][49][50] during the past decade have better investigated brain structure and function in normal and diseased populations, and have enabled the quantification of spatio-temporal imaging patterns.
A fundamentally important aspect of VBA-SPM has been the spatial smoothing of images prior to analysis. Typically, Gaussian blurring of full-width-half-max (FWHM) in the range of 8–16mm is used to account for registration errors, to Gaussianize the data, and to integrate imaging signals from a region, rather than from a single voxel. The effect of this smoothing function is critical: if the kernel is too small for the task, statistical power will be lost and large numbers of false negatives will result in missing many regions that might present group differences in structure and function; if the kernel is too large, statistical power can also be lost by blurring image measurements from regions that display group differences with measurements from regions that have no group difference. In the latter case, spatial localization is also seriously compromised, as significant smoothing blurs the measurements out and often leads to false conclusions about the origin of a functional activation or of brain atrophy. Moreover, a filter that is too large, or that is not matched with the underlying group difference, will also have reduced sensitivity in detecting group differences. As a result, Gaussian smoothing is often chosen empirically, or in an ad hoc fashion, an obvious limitation of such VBA-SPM analyses, in part because of its heuristic nature, and in part because it can lead to overfitting of the data without proper cross-validation or correction for multiple comparisons.
However, the most profound limitation of Gaussian smoothing of images prior to applying the General Linear Model (GLM) [18] is its lack of spatial adaptivity to the shape and spatial extent of a structural or functional region of interest. For example, if atrophy or functional activation in the hippocampus is to be detected, Gaussian smoothing will blur volumetric or activation measurements from the hippocampus with such measurements from surrounding tissues, including the ventricles, the fusiform gyrus, the amygdala and the white matter. Previous work in the literature [13] has shown that spatially adaptive filtering of image data can dramatically improve statistical power to detect group differences. However, little is known about how to optimally define the shape and extent of the smoothing filter, so as to maximize the ability of VBA-SPM to detect group effects.
In this paper, we present a mathematically rigorous framework for determining the optimal spatial smoothing of structural (and potentially functional) images, prior to applying voxel-based group analysis. We consider this problem in the context of determining group differences, and we therefore restrict our experiments to voxel-wise statistical hypothesis testing. However, the mathematical formalism and algorithm are generally applicable to any type of VBA. In order to determine the optimal smoothing kernel, a regional discriminative analysis, restricted by appropriate nonnegativity constraints, is applied to a spatial neighborhood around each voxel, aiming to find the direction (in a space of dimensionality equal to the size of the neighborhood) that best highlights the difference between two groups in that neighborhood. Since each voxel belongs to a large number of such neighborhoods, each centered on one of its neighboring voxels, the group difference at each voxel is determined by a composition of all these optimal smoothing directions. Permutation tests are used to obtain the statistical significance of the resulting ODVBA maps.
This approach is akin to some fundamental principles of signal and image processing, and more specifically to the matched filtering, which states that optimal detection of a signal in the presence of noise is achieved by filtering whose kernel is related to the signal itself. In the context of voxel-based statistical analysis, the “signal” is not known, as it relates to the underlying (unknown) group difference. Therefore, the purpose of our optimization is to actually find the kernel that maximizes the signal detection.
ODVBA has some similarities with the searchlight approach [29][30], however it is significantly different. The searchlight method is basically a local multivariate analysis constrained to the immediate neighborhood of a voxel, whereas ODVBA performs a high-dimensional discriminative analysis using machine learning technique over large neighborhoods, which captures anatomical and functional patterns of larger range, thereby determining the optimally discriminative spatially varying filter. ODVBA also relates to the extensive literature using linear discriminant analysis (LDA) on multivariate patterns of whole brain images. However, implementing the standard LDA directly on the images usually suffers from the singularity problem [14] because the number of images is much smaller than the number of brain voxels. To overcome the problem, Kustra and Strother [31] used the smoothness-constrained, penalized LDA as a tool not only for a strict classification task of positron emission tomography (PEI) images, but also for extracting the activation patterns. Thomaz et al. [45][46] presented a PCA plus Maximum uncertainty LDA that solves the small sample size problem for classification and visual analysis of structural MRIs. Carlson et al. [6] used PCA plus LDA to classify brain activities of different stimulus categories [20][40] and to find which voxels contribute to the activity. Unfortunately, although the singularity problem has been addressed in the above LDA-based methods, a great deal of important information in the images would be lost, since they employ the smoothness constraint or PCA to reduce the dimensionality prior to implementing LDA. However, ODVBA does not have the singularity problem since it never involves the matrix inverse computation and the discriminative analysis is conducted on the data set constructed according to the local neighborhood so it avoids the curse of dimensionality naturally. More importantly, all these three methods attempt to obtain discriminants or a “canonical image”, which is a spatial distribution of voxels that maximally differentiates between different experimental conditions, for interpretation of abnormality or activation in the groups. However, the resulting discriminants are then usually analyzed based on visual inspection or simply thresholding without determining a voxel-wise statistical value; therefore, they do not produce the p values for each voxel in the style of traditional SPM, whereas ODVBA does. Finally, ODVBA also employs a non-negativity constraint, which is important as it prohibits canonical images with positive and negative value cancelations, which are often difficult to interpret, especially if a brain region is involved in many canonical images with different weights.
The rest of the paper is organized as follows: Section II describes the general formulation, and its numerical optimization solution. Section III introduces a method of computationally efficient implementation for the ODVBA. Section IV presents a number of experiments with 1) simulated data of known ground truth and 2) structural images of elderly individuals with Alzheimer’s disease (AD). These experiments demonstrate that the proposed methodology significantly improves both the statistical power in detecting group differences, and the accuracy with which the spatial extent of the region of interest is determined by VBA-SPM analysis. Section V contains the discussion and conclusion. For convenience, Table I lists important notations used in the paper.
TABLE I
TABLE I
Important Notations Used In The Paper.
The proposed framework consists of the following stages:
1) Regional Nonnegative Discriminative Projection
For each voxel, we examine a (typically large) neighborhood centered on it (sometimes referred to as a “subvolume”), and use the Nonnegative Discriminative Projection (NDP) to find the direction that best discriminates the two groups being compared. This direction can be viewed as a spatially adaptive filter, which locally amplifies the group differences, since projection is a weighted average of all voxels in the subvolume, i.e. it is a filter.
2) Determining each voxel’s statistic
A given voxel belongs to a number of neighborhoods centered on its neighboring voxels, so it may correspond to a collection of values, each being part of a different discriminative direction and reflecting underlying group differences. In order to eventually determine a single value for each voxel that represents the group difference at that spatial location and will be used for statistical analysis, we observe that the contribution of each voxel to a discriminative direction of a neighborhood to which it belongs is given by the respective coefficient of the optimally discriminative direction of that neighborhood. We sum up all these coefficients belonging to a given voxel to determine a quantity that reflects the voxel’s discriminating value.
3) Permutation tests
Since assumptions of Gaussianity cannot be made for the derived voxel-wise discriminative measurements, we resort to permutation tests [24][39] to obtain statistical significance (p values).
4) Classification (optional)
Although not necessarily part of ODVBA, classification is demonstrated here as an additional use of ODVBA. In particular, along the lines of the COMPARE algorithm [16], the regional clusters showing the highest group differences are used as the input features to be fed to a classifier, e.g., an SVM.
A. Regional Nonnegative Discriminative Projection
For a given voxel x in volume X, we construct its neighborhood N in which each voxel xi follows ‖xxi‖ < ξ. To render this process computationally efficient when the neighborhood size is large, we randomly select k − 1 voxels x1, (...), xk−1 in this neighborhood and represent this neighborhood using a k dimensional subvolume vector: [theta w/ right arrow above] = [x, x1, (...), xk−1]T. Provided that there are N subjects, we can obtain N subvolume vectors which form a data set: Θ = [[theta w/ right arrow above]1, [theta w/ right arrow above]2, (...), [theta w/ right arrow above]N] for learning. The procedure can be illustrated as Fig. 1. If there are M voxels in each subject (i.e. if the template to which all subjects were normalized has M voxels), we will have M learning sets.
Fig. 1
Fig. 1
Learning set construction.
1) The Basic Idea of NDP
The Nonnegative Discriminative Projection (NDP) algorithm which is developed based on Nonnegative Quadratic Programming [42] is used to find the optimal discriminative directions which project the high-dimensional samples onto a 1-dimensional space that maximizes the classification accuracy, and therefore the group differences. NDP is implemented on each learning set. The resultant optimally discriminant vector w.gif" border="0" alt="w" title=""/> has a special property: it is nonnegative. This stems from the nonnegativity constraint that is incorporated into the objective function of the NDP method. This constraint is used to help us interpret the group differences. Specifically, our goal is not simply to find an image contrast, prescribed by w.gif" border="0" alt="w" title=""/>, which distinguishes the two groups, but also requires that this contrast tells us something about the properties of the images we are measuring, e.g. about regional volumetrics or functional activity. We therefore limit ourselves to nonnegative, albeit arbitrarily shaped, local filters, each of which prescribes a regional weighted average of the signal being measured. In a regional volumetric analysis, for example, the optimal regional filter will reflect the shape of the region whose volume is different between two groups. In a functional image analysis, this filter might represent the region whose signal is summed up to reflect the activation. This is in contrast with the traditional methods of feature extraction for pattern classification, which are free to derive any feature obtained by any image filter that maximizes classification accuracy, but are not designed to necessarily measure (interpretable) group differences.
To better illustrate the idea of NDP, we show its results on a toy dataset before we describe the formulation. In this artificial test, we generated images that contained a square (the region of group difference) with intensity that varied from one image to another. We generated two groups of images: the first set of squares had intensities with mean 120.53 and standard deviation 5.79, while the second had 90.36 and 5.72, respectively. Fig. 2A shows the difference of means from the two groups. Fig. 2B shows the w.gif" border="0" alt="w" title=""/> obtained from the learning set constructed according to the neighborhood I; it is basically noise with very small values of (w.gif" border="0" alt="w" title=""/>)i, indicating that no local filter can be found that distinguishes the two groups at that neighborhood. Fig. 2C shows the w.gif" border="0" alt="w" title=""/> obtained from the learning set constructed according to the neighborhood II; the estimated optimal filter w.gif" border="0" alt="w" title=""/> is well aligned with the underlying group difference, within which it has high values. The bottom line here is that, following the formulation to be presented next, the filter that locally amplifies the group difference in this set of images calls for a weighted average of the pixel values within the part of the square that is included in the neighborhood, correctly reflecting the underlying difference. We now present the details of the formulation.
Fig. 2
Fig. 2
Illustration of the basic idea of NDP using a toy dataset.
2) The Formulation of NDP
Using one given learning set constructed by the subvolume vectors from all the involved individuals, we probe into neighborhood elements’ contributions for discrimination of the two groups. We expect to find such a nonnegative vector w.gif" border="0" alt="w" title=""/> to describe the contributions of elements in the neighborhood: the larger the value of (w.gif" border="0" alt="w" title=""/>)i is, the more the corresponding element ([theta w/ right arrow above])i contributes to the discrimination. Equivalently, (w.gif" border="0" alt="w" title=""/>)i is the ith coefficient of the regional filter denoted by w.gif" border="0" alt="w" title=""/>. By exploiting w.gif" border="0" alt="w" title=""/>, the learning set can be projected from the k-dimensional space Rk onto the 1-dimensional space R to be optimally classified, such as
equation M1
(1)
where, Ψ is the 1-dimensional projection of [theta w/ right arrow above] [set membership] Rk. We expect that the two classes will be separated as much as possible along w.gif" border="0" alt="w" title=""/>, and at the same time the samples from the same class get more compact.
A measure of the separation between the two classes is the distance of projected class means:
equation M2
(2)
where, equation M3; Ci means the ith class, i = 1, 2; Ni denotes the number of samples in Ci; equation M4; SB = (m.gif" border="0" alt="m" title=""/>1m.gif" border="0" alt="m" title=""/>2) (m.gif" border="0" alt="m" title=""/>1m.gif" border="0" alt="m" title=""/>2)T.
And, the intra-class compactness is defined as follows:
equation M5
(3)
where, equation M6.
SB and SW are called the between-class scatter matrix and the within-class scatter matrix separately, according to the classic Fisher Linear Discriminant Anaylsis [14] in which the criterion function is based on the generalized Rayleigh quotient. Herein we consider SB and SW under the formulation of quadratic programming which is amenable to the nonnegative constraint as follows:
equation M7
equation M8
(4)
where, A = (γSWSB + (|λmin| + τ2)I); γ is the controlling parameter; |λmin| is the absolute value of the smallest eigenvalue of γSWSB; τ2 << 1 acts as the regularization parameter; I is the identity matrix; e.gif" border="0" alt="e" title=""/> is the vector with all ones; the second term e.gif" border="0" alt="e" title=""/>T w.gif" border="0" alt="w" title=""/> is used to achieve equation M9 which means the solutions of (w.gif" border="0" alt="w" title=""/>)i are not all zeros under the nonnegative constraint; μ is the balance parameter.
Theorem 1. A is a positive definite matrix
The proof is in the Appendix A.
Since A is positive definite, J (w.gif" border="0" alt="w" title=""/>) is a convex function and has the unique global minimum. We solve the above optimization problem using the Nonnegative Quadratic Programming (NQP) [42]. According to [42], let A+ and A denote the nonnegative matrices described as:
equation M10
(5)
and
equation M11
(6)
It is clear that A = A+A. According to the two nonnegative matrices, the objective function in (4) can be written as the combination of three terms:
equation M12
(7)
where
equation M13
equation M14
equation M15
(8)
We define their derivatives as follows:
equation M16
equation M17
equation M18
(9)
Note that the partial derivatives on Ja and Jc above are guaranteed to be nonnegative because of the nonnegativity of w.gif" border="0" alt="w" title=""/>.
Using the above derivatives, multiplicative updates rule which does not involve the learning rates is introduced to minimize the objective function iteratively:
equation M19
(10)
where i = 1, (...), k. Equation (10) means that all the elements in w.gif" border="0" alt="w" title=""/> are updated in parallel. Since A+ w.gif" border="0" alt="w" title=""/> ≥ 0 and A w.gif" border="0" alt="w" title=""/> ≥ 0, the updated w.gif" border="0" alt="w" title=""/> in (10) is always nonnegative.
Theorem 2. The function of J(w.gif" border="0" alt="w" title=""/>) in Equation (4) decreases monotonically to the value of its global minimum under the multiplicative updates in Equation (10).
The proof is in Appendix B.
B. Determining each voxel’s statistic
For all the M voxels in one volume, we have M discriminative directions, each applied to a different neighborhood, as described in Section II-A. For a given voxel x, we obtain a list of (w.gif" border="0" alt="w" title=""/>)i values from different w.gif" border="0" alt="w" title=""/> since x may belong to a number of neighborhoods, as shown in Fig. 3. (Recall that for each neighborhood to which x belongs, the respective coefficient in w.gif" border="0" alt="w" title=""/> reflects the discrimination power of this voxel in terms of the pattern seen in that neighborhood.). To quantify the group difference measured at voxel x, we use the following function, termed discrimination degree, which relates to the effect size [8]:
equation M20
(11)
where, ϕ is a tuning parameter aiming to reduce potential outliers in the dataset. Let Δ = {N|x [set membership] N} denote the set of neighborhoods that the given voxel x belongs to, then we define the group difference on x by summing up contributions from all neighborhoods to which it participates:
equation M21
(12)
where, w.gif" border="0" alt="w" title=""/>N denotes the coefficients corresponding to voxels in N, (w.gif" border="0" alt="w" title=""/>N)i denotes that x is the ith element in N, and δN which acts as the weight for w.gif" border="0" alt="w" title=""/>N denotes the discrimination degree achieved in neighborhood N and is defined in (11). Sx will serve as the statistic reflecting group differences on the respective voxel x, and will be used next to determine statistical significance. Higher values of Sx reflect stronger group differences.
Fig. 3
Fig. 3
A given voxel belonging to a set of neighborhoods.
It is worth noting that, in ODVBA, one learning set is constructed according to one given neighborhood (subvolume vector) by different individuals. Actually, based on one learning set, the obtained coefficient w.gif" border="0" alt="w" title=""/> has only one direction (group1>group2 / group2>group1) to maximize the classification accuracy. When ODVBA handle data that have both possible positive and negative differences, it does not need to calculate the w.gif" border="0" alt="w" title=""/> twice using Nonnegative Discriminative Projection described in Section II-A, but it has to determine two different statistical values Sx for both positive and negative cases, before implementing permutation test respectively. In particular, only having the coefficient w.gif" border="0" alt="w" title=""/>, by which the projected class means mgroup1 > mgroup2 (referring to (2)), involved in (12), we can get the statistical value for positive case equation M22. Conversely, only use w.gif" border="0" alt="w" title=""/> by which the projected class means mgroup1 < mgroup2, we can get the equation M23 for negative case.
Moreover, for one given neighborhood, the optimal coefficients can only reflect the positive (assuming group1>group2) signals if the projected class means mgroup1 > mgroup2 in this subvolume. However, the negative signals will never be ignored if they are really significant, because they may be highlighted in other random neighborhood where they are stronger than the positive ones, so that mgroup1 < mgroup2.
C. Permutation tests
Assuming that the null hypothesis is that there is no difference between the two groups, the statistical significance can be assessed by comparison with the distribution of values obtained when the labels are randomly permuted [24][39]. In particular, we randomly assign the subjects into two groups, and then implement Section II-A and Section II-B to calculate the statistic for each voxel. The above relabeling is repeated Np times. For one given voxel, let S0 denote the statistic value obtained under the initial class labels, and Si, i = 1, (...), Np denote the ones obtained by relabeling. The p value for the given voxel is calculated according to:
equation M24
(13)
where,
equation M25
(14)
The original method described in Section II is based on voxel-wise computation, an extreme solution that uses all the neighborhoods corresponding to all the M voxels. Unfortunately, this approach would be complicated and computationally expensive. The simplest way to reduce the computational complexity is to randomly select a subset of the neighborhoods to represent all the neighborhoods, and then the number of the involved learning sets can be reduced. However, with this method, the selected neighborhoods may not cover all the voxels, and therefore, it is not guaranteed that statistics can be derived for each voxel. In this section, we use the Greedy Algorithm [10] to select the subset of the neighborhoods which include all the voxels and at the same time overlap each other under a certain rate. Moreover, to facilitate the greedy search, we introduce a new approach named Graph-assisted Greedy Neighborhoods Cover, which uses a graph to assist in seeking one neighborhood in each iteration of the greedy algorithm.
A. Minimum Subset of the Neighborhoods for ODVBA
We select the minimum subset of the neighborhoods B as the replacement of all the ones F, where B [subset, dbl equals] F, and then the learning sets are constructed corresponding to the selected neighborhoods respectively. To get a reasonable statistic value for each voxel after obtaining the discriminative directions using NDP, the selected neighborhoods are expected: i) to cover all the voxels in X, that is,
equation M26
(15)
ii) to overlap each other under at least an overlapping factor [51] a (0 < a < 1), that is,
equation M27
(16)
for each N in B.
B. The Traditional Greedy Neighborhoods Cover Algorithm
The problem introduced in Section III-A can be solved using the greedy algorithm [10][51], which works by picking N that covers the maximum number of remaining elements that are uncovered under certain constraint in each iteration. Let U contain all the remaining uncovered elements at each iteration. The algorithm chooses the neighborhood Ni that overlaps with the selected neighborhoods in B and covers as many uncovered elements in U as possible. Then, the selected Ni is put into B and the elements covered by Ni is removed from U. It finally terminates when U is empty. The algorithm is summarized in Table II.
TABLE II
TABLE II
Algorithm of Greedy Neighborhoods Cover
C. Graph-assisted Greedy Neighborhoods Cover algorithm
It is worth noting that Line 5 of the algorithm in Table II may be time-consuming because it picks Ni from all the M neighborhoods without using the neighbor information for searching in each iteration. We propose a Graph-assisted Greedy Algorithm which select Ni by implementing simple operations on the small sub-block of the graph in each iteration. The algorithm is described as follows.
A graph G is introduced to model the neighbor relationship between every two voxels in X. Each sample point in X is a vertex of the graph. An edge is put from xi to xj if xj is located in the neighborhood centered at xi. That is:
equation M28
(17)
Obviously, each row of G represents one neighborhood. Let B denote the set of the elements covered by the neighborhoods from B, that is B = [union or logical sum]N [set membership] B N, use IB to denote the index set of the selected neighborhoods, and we can get the sub-matrix from G as follows:
equation M29
(18)
where set L follows
  • L [subset or is implied by] X;
  • LIB = [empty];
  • Gsub (i, B) · e.gif" border="0" alt="e" title=""/>k, i [set membership] L;
  • equation M30
    (19)
Among the terms in (19), 1) means that the rows of Gsub come from G; 2) means that the neighborhoods to which the rows of Gsub correspond have never been selected to B; 3) means the sum of each row in Gsub are not equal to k, that is, the neighborhood which are fully covered by the B are removed; 4) means the neighborhoods having no intersection with B are removed.
Compute
equation M31
(20)
and we can know T(i) is the number of elements covered by B, in the L {i}th neighborhood. Let imin denote the index of element which is the minimum in T with constraint: T(imin) ≥ a · |N| = a · k. Then the selected N in this iteration should be NL{imin}. The proposed algorithm is summarized in Table III.
TABLE III
TABLE III
Algorithm of Graph-assistanted Greedy Neighborhoods Cover
In this section, we designed two different kinds of experiments to evaluate the performance of ODVBA compared with the original SPM [1][44] and nonparametric permutation based SPM (referred to SnPM) [24][39]. In the first experiment, we applied the three methods to the simulated atrophy data in which we know the ground truth. Here, it is easy for us to evaluate the accuracy and statistical power of ODVBA, in comparison with SPM and SnPM. The second experiment demonstrated the success of our method on the real data of AD patients from ADNI (www.loni.ucla.edu/ADNI).
A. Simulated Atrophy Data
1) “Blurring” case
The dataset consisted of real MRI scans of 60 normal controls, with a relatively small age range, obtained from ADNI. The description of the MR Image acquisition and pre-processing protocol can be found in Section IV-B. Finally, RAVENS maps [13] which quantify the regional distribution of GM, WM, and CSF, are formed for each tissue type. In this test, RAVENS of GM is employed. Next, 30 samples were randomly picked and manipulated to introduce 30% atrophy with “U”-like shape in the specific region (a gyrus). In other words, the corresponding RAVENS values in that region are decreased by 30%. Fig. 4. Shows one selected slice of RAVENS map and the location of the atrophy.
Fig. 4
Fig. 4
Location of simulated atrophy (red points in the dashed ellipse) in the “Blurring” case. Different colors in the background reflect different RAVENS values, that is, different tissue densities.
Regarding the simulated data as a group of patients and the remaining data as the control group, we conduct SPM, SnPM and ODVBA for the group analysis. The smoothing size (FWHM of the involved Gaussian filter) in SPM and SnPM is 7mm. The parameters in ODVBA are set as ξ = 15, ϕ = 1, μ = 1, γ = 10−5, τ2 = 10−5. The experimental results are shown in Fig. 5. The background is the RAVENS map of a sample, appearing as a gray image, and the overlapping areas are the red regions obtained by different p value thresholds (p <0.05, 0.02). We can see that the proposed ODVBA method precisely describes the “U”-like shape of the atrophy, while SPM and SnPM do not. In SPM and SnPM, the images are blurred so that the results are not accurate.
Fig. 5
Fig. 5
The maps of p values from the “Blurring” case. The background is the RAVENS map of a sample, appearing as a gray image, and the overlapping areas are the red regions obtained by different p value thresholds.
For the simulated data, since we know the ground truth, we employ several types of metrics to evaluate ODVBA, compared with SPM and SnPM. We denote the significant voxels obtained from the three different methods as VR, the ground truth voxels as VG, and all the voxels involved in analysis as VA. True Positive Rate (TPR) and False Positive Rate (FPR), two commonly used assessment metrics, are defined as follows:
equation M32
equation M33
(21)
where #(·) means the number of voxels.
We varied the significance level of the group analysis from [0, 1] providing a series of TPR/FPR to build the receiver-operating characteristics (ROC) curve [35], as demonstrated in Fig. 6. We also list the TPR and FPR values with different specific p value thresholds (p <0.05, 0.02) in Table IV. We can see that, for the “Blurring” case, SPM and SnPM not only produce worse accuracies but also suffer from high false positive errors. Otherwise, ODVBA offers a globally better result since for any FPR, the corresponding TPR is higher.
Fig. 6
Fig. 6
ROC curve of the “Bluring” case.
TABLE IV
TABLE IV
TPR and FPR values with the specific p value thresholds in the “Blurring” case.
2) “Different degrees” case
The data used in this experiment contain 60 normal controls which also come from ADNI. The purpose is to evaluate the capabilities of three methods for detecting the simulated atrophies with different degrees. As well as in the “Blurring” case, RAVENS maps are finally created to represent the tissue density (which is proportional to regional volume, at that location). We randomly selected 30 volumes of GM and introduced the atrophy of 10%, 15%, 20%, 25%, 30%, and 35% (RAVENS values are reduced by 10%–35%) in the region around Hippocampus, separately. Fig. 7 shows one selected slice and the location of the simulated atrophy. The remaining 30 samples were regarded as control. The smoothing size in SPM and SnPM is 8mm. The parameters in ODVBA are set as ξ = 15, ϕ = 1, μ = 1, γ = 10−5, τ2 = 10−5. Fig. 8 demonstrates the performances (TPR and FPR) of the three involved methods versus the different degrees of atrophy. The p value which is used to get the significant region is 0.05. We can see that ODVBA has higher TPR but lower FPR on each dataset with different degrees of atrophy than SPM and SnPM. In Fig. 9, we provide some representative slices on which we plot the ground truth and the detected significant regions obtained by the three methods. It is shown that ODVBA is much more powerful to detect the true atrophy hidden in the data.
Fig. 7
Fig. 7
Location of simulated atrophy (red points in the black dashed ellipse) in the “Different degrees” case. Different colors in the background reflect different RAVENS values, that is, different tissue densities.
Fig. 8
Fig. 8
Performance vs. the degree of atrophy in the “Different degrees” case. (a) TPR vs. the degree of atrophy; (b) FPR vs. the degree of atrophy.
Fig. 9
Fig. 9
Respective slices with ground truth and significant regions. The background is the MR image of the template, appearing as a gray image, the overlapping green color areas are the ground truth, and the overlapping red color areas are the detected significant (more ...)
3) Effect of kernel size
In this section, we study the effect of the kernel size on the performance of the three methods, using the simulated data with 30% atrophy in the “Different degrees” case. For SPM and SnPM, the kernel size means the FWHM of the involved Gaussian filter, which is changed from 5mm to 9mm. For ODVBA, the kernel size means the size of neighborhood with varying ξ from 9mm to 17mm, with an interval of 2mm. By changing the kernel size, we can get a series of different results for the three different methods. Since the ground truth is known, we plot the corresponding ROC curve in Fig. 10. As shown, the performance of ODVBA is stable. However, the sensitivity/specificity of SPM and SnPM vary a lot with different kernel sizes and are companied with high FPR increasing.
Fig. 10
Fig. 10
Performances of three methods with different kernel sizes.
4) Effect of discrimination degree
In this section, we look into the effect of the discrimination degree on the performance of ODVBA, using the simulated data with 30% atrophy in the “Different degrees” case. By varying the tuning parameter ϕ described in (11), we can get a series of different resulting significant maps of ODVBA. Note that larger ϕ means the coefficient (w.gif" border="0" alt="w" title=""/>)i will be combined with a stronger discrimination degree to determine the statistical value (referring to (12)), and ϕ = 0 means that only (w.gif" border="0" alt="w" title=""/>)i is used. Since the ground truth is known, we plot the corresponding TPR and FPR versus ϕ in Fig. 11. As shown, along with the increasing of ϕ, the power of detecting the true positives is gradually enhanced, accompanied by low level FPR.
Fig. 11
Fig. 11
Performance of ODVBA vs. ϕ. (a) TPR vs. ϕ ; (b) FPR vs. ϕ.
B. Real AD data
With the development of computer-aided diagnosis, Alzheimer’s disease has attracted a lot of attention [3][17][27][28][37][47] in the community of neuroimage. The data used in this experiment was obtained from the Alzheimer’s Disease Neuroimaging Initiative (ADNI) database (www.loni.ucla.edu/ADNI), which has recruited approximately 800 adults, ages 55 to 90, including 200 with Normal Control (NC), 400 with mild cognitive impairment (MCI) and 200 with AD. For up-to-date information, see www.adni-info.org.
The goal of our study is to conduct complete evaluations of the structural MR images, and identify potentially complex spatial patterns of brain atrophy in AD patients, compared with NC subjects. We randomly selected 100 subjects from the ADNI cohort. This included 50 NCs and 50 ADs, whose MRI scans were then analyzed. The datasets included standard T1-weighted images with varying resolutions. Adhering to volumetric 3D MPRAGE or equivalent protocols, only images obtained using 1.5 T scanners were used. The sagittal images were preprocessed according to a number of steps detailed on the ADNI website, which corrected for field inhomogeneities and image distortion, and were resliced to axial orientation. Images were preprocessed according to the following steps. 1) Alignment of the brain with the ACPC plane; 2) Removal of extra-cranial material (skull-stripping); 3) Tissue segmentation into grey matter (GM), white matter (WM), and cerebrospinal fluid (CSF), using [41]; 4) High-dimensional image warping [43] to a standardized coordinate system, a brain atlas (template) that was aligned with the MNI coordinate space [26]; 5) Formation of tissue density maps, i.e. RAVENS maps, typically used in the modulated SPM analysis [12] and in [2].
RAVENS maps quantify the regional distribution of GM, WM, and CSF, since one RAVENS map is formed for each tissue type. RAVENS values in the template’s (stereotaxic) space are directly proportional to the volume of the respective structures in the original brain scan [12]. Therefore, regional volumetric measurements and comparisons can be performed via measurements and comparisons of the RAVENS maps. In this experiment, we used GM for evaluation purposes.
1) Resulting Significant Maps
Based on the measurements of tissue density maps, we compared the performance of SPM, SnPM, and our proposed ODVBA method. For SPM and SnPM, smoothing is performed using 8 mm FWHM kernel. For ODVBA, the parameters are set as follows: ξ = 15, ϕ = 1, μ = 1, γ = 10−5, τ2 = 10−5. Both SnPM and ODVBA were implemented with 2000 permutations. Fig. 12 shows some selected sections from the results (with the p value < 0.001 threshold) of SPM, SnPM, and our ODVBA, respectively. We can see that the results of ODVBA not only reflect significant GM loss in AD compared with NC, but also identify many additional regions, e.g., Precuneus (Fig. 12A), Insula Cortex (Fig. 12B), Middle Temporal Gyrus (Fig. 12C), Lentiform (Fig. 12D), Occipital Lobe (Fig. 12E), Parahippocampal Gyrus (Fig. 12F), Superior Parietal Gyrus (Fig. 12G), Middle Frontal Gyrus (Fig. 12H), and Inferior Frontal Gyrus (Fig. 12I). As shown in the above mentioned sections in Fig. 12, the significant regions detected by ODVBA are either totally or partially missing from the results of SPM and SnPM. Fig. 13 shows the surface rendering of significant regions obtained from the three different methods. Moreover, these are all regions that are generally known from histopathology studies [4][5][15][21][22][34].
Fig. 12
Fig. 12
Representative sections with regions of relatively reduced GM in AD compared to NC. The scale indicates the −log(p) values.
Fig. 13
Fig. 13
Surface renderings of regions detected by the three methods.
Fig. 14 shows two representative magnified regions that were discovered by SPM, SnPM and ODVBA. Among them, Fig. 14A shows the region near the Hippocampus and Fig. 14B shows the region around the Temporal Lobe. For the Fig. 14A, we can see that SPM and SnPM blurred the regions of the Hippocampus and Fusiform Gyrus. In contrast, a clear division between the two regions can be found in the results of ODVBA. For the Fig. 14B, SPM and SnPM blurred the different gyri and sulci in the region of the Temporal Lobe, while failing to detect other significant areas altogether; however, ODVBA delineates a more precise area of significant atrophy in that region.
Fig. 14
Fig. 14
Two representative magnified regions. The scale indicates the −log(p) values.
We also employ the FDR procedure [25], a powerful approach commonly used in neuroimaging applications to adjust for statistical error. FDR aims to control the portion of false positive error, instead of excluding this error [38]. In addition, FDR is adaptable since its value is computed directly on the observed p value distribution. We partitioned the resulting significant maps with p values of 0.001 and 0.05 (FDR corrected) respectively, according to predefined anatomical regions from the Jacob Atlas; calculated the means of the tissue density maps per region for all the samples; and finally, computed the t value based on these means according to the class information.
Table V lists the sizes and t statistics of major anatomical regions. We can see that not only the sizes detected by ODVBA, but also the corresponding t values are generally greater than those detected by SPM and SnPM. This means that the regions found by ODVBA display a higher degree of differentiation between the two groups and that SPM and SnPM might have missed some significant information.
TABLE V
TABLE V
Statistics on Anatomical Regions.
2) Pattern Classification
In a separate experiment, we used the detected significant regions as the features input to a classifier of individual scans into NC or AD, and compared the three different methods in terms of the performance of classification with SVM. We randomly divided the original 50 (NC) +50 (AD) into 5 subsets (10+10 each), and then implemented 5-fold cross validation.
For each fold, 1) we used 40+40 for training and got the significant regions (with the p value < 0.001 threshold) by the three different group analysis methods; 2) we calculated the means of the tissue density values of significant voxels in each predefined anatomical region (from the Jacob Atlas) for all the samples, and using the means from different regions as the input features, we got the SVM classifier [9]; 3) finally, we used the left testing set with size of 10+10 for validation.
The radial basis function (RBF) kernel is used in this study. Based on the 5-fold cross-validation, the classification rate of ODVBA (90%) is superior to that of SPM (86%) and that of SnPM (87%).
In this paper, we have introduced a new framework, termed Optimally-Discriminative Voxel-Based Analysis (ODVBA), for group analysis of medical images. In the proposed framework, Nonnegative Discriminative Projection (NDP) is introduced to find the optimal discriminative direction in each learning set constructed by the neighborhood centered at the given voxel. Subsequently, each voxel’s statistic is determined by a composition of all the smoothing directions which are associated with the given voxel. Finally permutation tests are used to obtain the statistical significance of the resulting ODVBA maps. In addition, to reduce the cost of computation, we developed a new method termed Graph-assisted Greedy Neighborhoods Cover to select a minimum subset of the neighborhoods which are used to learn the discriminative directions. We compared ODVBA to the traditional SPM and the nonparametric SPM (SnPM) with both simulated data and real AD data from the ADNI cohort. The experimental results have shown tested ODVBA against the conventional smoothing methods.
The main premise of our approach is that it effectively applies a form of matched filtering, to optimally detect a group difference. Since the shape of the target region of group difference is not known, regional discriminative analyses are used to identify voxels displaying the most significant differences. In addition to potentially improving sensitivity of detection of a structural or functional signal, this approach was shown in several experiments to better delineate the region of abnormality, in contrast with conventional smoothing approaches that blur through boundaries and dilute the signal from regions of interest with signal from regions that do not display a group difference. We use the coarse grid search over tuning parameters, e.g., ξ, ϕ, μ, and γ. Section IV-A.3) Effect of kernel size and Section IV-A.4) Effect of discrimination degree are examples of the parameter selection that we studied the performance of ODVBA on different parameters. Generally, our method is not so sensitive to the tuning parameters once its performance is stable. We can determine the optimal parameters in the simulated data in which we know the ground truth. If the simulated data and the real data come from the same data source, have similar sample size, and have similar size and degree of atrophy, we assume that the optimal parameters determined by the simulated data are also suitable for the real data. That is, the simulated data would be an appropriate reference for the real data to select the optimal parameters.
As described in the Introduction, our method are related with methods which use LDA and its variants over the entire image to determine “canonical image” that best discriminate between two groups or conditions. However our approach has significant differences from these approaches. In particular, such global LDA-based methods tend to be very limited by the small sample size and high-dimensionality problem, and therefore are typically able to only detect some of the regions that display group differences or activations, but not all. They are most suitable for classification purposes, rather than for voxel-based analysis. Most importantly, these methods produce both positive and negative loadings, which reflect cancelations in the data and therefore render it very difficult to interpret the data. In general, they are not designed to construct a precise spatial map of a group difference, but rather to find the best global discriminant. The non-negativity constraints in our ODVBA are essential. Our results can be interpreted in terms of brain activity or atrophy, for example, and the regions found match exactly the underlying regions of interest. The non-negativity constraints are by far not a trivial issue to implement, since they influence the entire optimization process.
Acknowledgment
The authors would like to thank Alzheimer’s Disease Neuroimaging Initiative (ADNI) for the data collection and sharing (supported by NIH grant U01 AG024904). ADNI is funded by the National Institute on Aging, the National Institute of Biomedical Imaging and Bioengineering, and through generous contributions from the organizations as listed at www.adni-info.org. ADNI data are disseminated by the Laboratory for Neuro Imaging at the University of California, Los Angeles. This research was also supported by NIH grants P30 AG010129, K01 AG030514, and the Dana Foundation. They also thank Mr. Kayhan Batmanghelich for his contribution to the efficient software implementation of the algorithm.
The work was supported by the National Institutes of Health under Grant R01AG14971.
APPENDIX A
Proof of the Positive Definite Matrix
If λmin ≥ 0, the smallest eigenvalue of γSWSB + (|λmin|+τ2) I is 2λmin2 which is greater than 0. If λmin < 0, the smallest eigenvalue of γSWSB + (|λmin| + τ2)I is just τ2. In a word, all the eigenvalues of A is greater than 0. Since SW, SB, and I are all symmetric matrices, A = γSWSB + (|λmin| + τ2)I is a symmetric matrix. Thus, we complete the proof.
APPENDIX B
Proof of Convergence and Optimality
Auxiliary function [32] is used to derive the rule of multiplicative updates in (10).
Definition 1. G ([upsilon with right arrow above], w.gif" border="0" alt="w" title=""/>) is an auxiliary function for J (w.gif" border="0" alt="w" title=""/>) if the conditions
equation M34
(22)
are satisfied.
Lemma 1. if G ([upsilon with right arrow above], w.gif" border="0" alt="w" title=""/>) is an auxiliary function, then J (w.gif" border="0" alt="w" title=""/>) is nonincreasing under the update:
equation M35
(23)
One can refer to [32] for the proof of the above lemma. Following [42], we introduce such an auxiliary function for J (w.gif" border="0" alt="w" title=""/>):
equation M36
(24)
Note that equation M37 [42], so it is clear that the conditions G ([upsilon with right arrow above], w.gif" border="0" alt="w" title=""/>) ≥ J ([upsilon with right arrow above]) and G (w.gif" border="0" alt="w" title=""/>, w.gif" border="0" alt="w" title=""/>) = J (w.gif" border="0" alt="w" title=""/>) are met.
Rewrite the auxiliary function as the following form:
equation M38
(25)
where, Gi (([upsilon with right arrow above])i) means a function for each component in [upsilon with right arrow above]:
equation M39
(26)
Then, the minimization of G ([upsilon with right arrow above], w.gif" border="0" alt="w" title=""/>) can be achieved by minimizing each Gi (([upsilon with right arrow above])i), and according to (23) we have:
equation M40
(27)
Since the derivative of Gi (([upsilon with right arrow above])i) is:
equation M41
(28)
and considering (w.gif" border="0" alt="w" title=""/>′)i ≥ 0, we obtain the update rule as follows:
equation M42
(29)
Footnotes
Personal use of this material is permitted. However, permission to use this material for any other purposes must be obtained from the IEEE by sending a request to pubs-permissions/at/ieee.org.
The work used data from the Alzheimer𒀙s Disease Neuroimaging Initiative (ADNI) database.
Contributor Information
Tianhao Zhang, T. Zhang is with the Section of Biomedical Image Analysis, Department of Radiology, University of Pennsylvania, Philadelphia, PA 19104 USA (Tianhao.Zhang/at/uphs.upenn.edu)
Christos Davatzikos, C. Davatzikos is with the Section of Biomedical Image Analysis, Department of Radiology, University of Pennsylvania, Philadelphia, PA 19104 USA (Christos.Davatzikos/at/uphs.upenn.edu)
1. Ashburner J, Friston KJ. Voxel-Based morphometry-the methods. Neuroimage. 2000;vol. 11(no. 6):805–821. [PubMed]
2. Ashburner J, Csernansky JG, Davatzikos C, Fox NC, Frisoni GB, Thomson PM. Computer-assisted imaging to assess brain structure in healthy and diseased brains. The Lancet Neurology. 2003;vol. 2(no. 2):79–88. [PubMed]
3. Baron JC, Chetelat G, Desgranges B, Perchey G, Landeau B, de l SV, Eustache F. In vivo mapping of gray matter loss with voxel-based morphometry in mild Alzheimer’s disease. NeuroImage. 2001;vol. 14(no.2):298–309. [PubMed]
4. Braak H, Braak E. Neuropathological stageing of Alzheimer-related changes. Acta Neuropathologica. 1991;vol. 82(no. 4):239–259. [PubMed]
5. Brun A, Englund E. Regional pattern of degeneration in Alzheimer’s disease: neuronal loss and histopathological grading. Histopathology. 1981;vol. 5:548–564. [PubMed]
6. Carlson TA, Schrater P, He S. Patterns of activity in the categorical representations of objects. Journal of Cognitive Neuroscience. 2003;vol. 15(no.5):704–717. [PubMed]
7. Chen R, Herskovits E. Graphical-model-based morphometric analysis. IEEE Transactions on Medical Imaging. 2005;vol. 24(no. 10):1237–1248. [PubMed]
8. Cohen J. Statistical Power Analysis for the Behavioral Sciences. 2rd ed. Hillsdale, NJ: Lawrence Erlbaum Associates; 1988.
9. Collobert R, Bengio S. SVMTorch: Support Vector Machines for Large-Scale Regression Problems. Journal of Machine Learning Research. 2001;vol. 1:143–160.
10. Cormen TH, Leiserson CE, Rivest RL, Stein C. Introduction to Algorithms. 2rd ed. MIT Press; 2001.
11. Dale AM, Liu AK, Fischl BR, Buckner RL, Belliveau JW, Lewine JD, Halgren E. Dynamic statistical parametric mapping: combining fMRI and MEG for high-resolution imaging of cortical activity. Neuron. 2000;vol. 26(no. 1):55–67. [PubMed]
12. Davatzikos C, Genc A, Xu D, Resnick SM. Voxel-based morphometry using the RAVENS maps: Methods and validation using simulated longitudinal atrophy. NeuroImage. 2001;vol. 14(no. 6):1361–1369. [PubMed]
13. Davatzikos C, Li HH, Herskovits E, Resnick SM. Accuracy and sensitivity of detection of activation foci in the brain via statistical parametric mapping: a study using a PET simulator. NeuroImage. 2001;vol. 13(no. 1):176–184. [PubMed]
14. Duda R, Hart P, Stork D. Pattern Classification. 2rd ed. Wiley; 2000.
15. Engler H, Forsberg A, Almkvist O, Blomquist G, Larsson E, Savitcheva I, Wall A, Ringheim A, Langstrom B, Nordberg A. Two-year follow-up of amyloid deposition in patients with Alzheimer’s disease. Brain. 2006;vol. 129:2856–2866. [PubMed]
16. Fan Y, Shen D, Gur RC, Gur RE, Davatzikos C. COMPARE: Classification Of Morphological Patterns using Adaptive Regional Elements. IEEE Transaction on Medical Imaging. 2007;vol. 26(no. 1):93–105. [PubMed]
17. Frisoni GB, Testa C, Zorzan A, Sabattoli F, Beltramello A, Soininen H, Laakso MP. Detection of grey matter loss in mild alzheimer’s disease with voxel based morphometry. Journal of Neurology, Neurosurgery & Psychiatry. 2002;vol. 73(no.6):657–664. [PMC free article] [PubMed]
18. Friston KJ, Holmes AP, Worsley J-B, Poline KJ, Frith CD, Frackowiak RSJ. Statistical Parametric Maps in functional imaging: A general linear approach. Human Brain Mapping. 1995;vol. 2:189–210.
19. Goldszal AF, Davatzikos C, Pham D, Yan M, Bryan RN, Resnick SM. An image processing protocol for the analysis of MR images from an elderly population. Journal of Computer Assisted Tomography. 1998;vol. 22(no.5):827–837. [PubMed]
20. Haxby JV, Gobbini MI, Furey ML, Ishai A, Schouten JL, Pietrini P. Distributed and overlapping representations of faces and objects in ventral temporal cortex. Science. 2001;vol. 293:2425–2430. [PubMed]
21. Hardy GA, Higgins JA. Alzheimer’s disease: the amyloid cascade hypothesis. Science. 1992;vol. 256:184–185. [PubMed]
22. Hensley K, Hall NC, Subramaniam R, Cole P, Harris M, Aksenov M, Aksenova M, Gabbita SP, Wu JF, Carney JM, Lovell M, Markesbery WR, Buttereld DA. Brain regional correspondence between Alzheimer’s disease histopathology and biomarkers of protein oxidation. Journal of Neurochemistry. 1995;vol. 65:2146–2156. [PubMed]
23. Herskovits E, Peng H, Davatzikos C. A Bayesian morphometry algorithm. IEEE Transactions on Medical Imaging. 2004;vol. 23(no. 6):723–737. [PubMed]
24. Holmes AP, Blair RC, Watson JD, Ford I. Nonparametric analysis of statistic images from functional mapping experiments. Journal of Cerebral Blood Flow and Metabolism. 1996;vol. 16(no. 1):7–22. [PubMed]
25. Genovese CR, Lazar NA, Nichols T. Thresholding of statistical maps in functional neuroimaging using the false discovery rate. NeuroImage. 2002;no. 15:870–878. [PubMed]
26. Kabani N, MacDonald D, Holmes CJ, Evans A. A 3D atlas of the human brain. NeuroImage. 1998;vol. 7(no.4):S717.
27. Karas GB, Burton SA, Rombouts EJ, van Schijndel RA, O’Brien JT, Scheltens Ph, McKeith IG, Williams D, Ballard C, Barkhof F. A comprehensive study of gray matter loss in patients with Alzheimer’s disease using optimized voxel-based morphometry. Neuroimage. 2003;vol. 18(no. 4):895–907. [PubMed]
28. Kloppel S, Stonnington CM, Chu C, Draganski B, Scahill RI, Rohrer JD, Fox NC, Jack CR, Jr, Ashburner J, Frackowiak RSJ. Automatic classification of MR scans in Alzheimer’s disease. Brain. 2008;vol. 131(no.3):681–689. [PMC free article] [PubMed]
29. Kriegeskorte N, Goebel R, Bandettini P. Information-based functional brain mapping. Proceedings of the National Academy of Sciences. 2006;vol. 103:3863–3868. [PubMed]
30. Kriegeskorte N, Bandettini P. Analyzing for information, not activation, to exploit high-resolution fMRI. NeuroImage. 2007;vol. 38(no. 4):4649–4662. [PMC free article] [PubMed]
31. Kustra R, Strother SC. Penalized discriminant analysis of [15O] water PET brain images with prediction error selection of smoothing and regularization hyperparameters. IEEE Transaction on Medical Imaging. 2001;vol. 20:376–387. [PubMed]
32. Lee DD, Seung HS. Algorithms for non-negative matrix factorization. Advances in Neural Information Processing. 2001;vol. 13
33. Matthew T, Wells WM, III, Louis CD, Arbel T. Feature-Based Morphometry: Discovering Group-related Anatomical Patterns. NeuroImage. 2010;vol. 49(no. 3):2318–2327. [PubMed]
34. McKee AC, Kosik KS, Kowall NW. Neuritic pathology and dementia in Alzheimer’s disease. Annals of Neurology. 1991;vol. 30:156–165. [PubMed]
35. Metz CE. ROC methodology in radiologic imaging. Investigative Radiology. 1986;vol. 21:720–733. [PubMed]
36. Misra C, Fan Y, Davatzikos C. Baseline and longitudinal patterns of brain atrophy in MCI patients, and their use in prediction of short-term conversion to AD: results from ADNI. Neuroimage. 2009;vol. 44(no. 4):1415–1422. [PMC free article] [PubMed]
37. Morra JH, Tu Z, Apostolova LG, Green AE, Toga AW, Thompson PM. Comparison of adaBoost and support vector machines for detecting alzheimer’s disease through automated hippocampal segmentation. IEEE Transactions on Medical Imaging. 2010;vol. 29(no. 1):30–43. [PMC free article] [PubMed]
38. Nichols T, Hayasaka S. Controlling the familywise error rate in functional neuroimaging: a comparative review. Statistical methods in medical research. 2003;vol. 12(no. 5):419–446. [PubMed]
39. Nichols AP, Holmes TE. Nonparametric permutation tests for functional neuroimaging: a primer with examples. Human Brain Mapping. 2002;vol. 15(no. 1):1–25. [PubMed]
40. Norman KA, Polyn SM, Detre GJ, Haxby JV. Beyond mind-reading: Multi-voxel pattern analysis of fMRI data. Trends in Cognitive Sciences. 2006;vol. 10(no. 9):424–430. [PubMed]
41. Pham DL, Prince JL. Adaptive fuzzy segmentation of magnetic resonance images. IEEE Transactions on Medical Imaging. 1999;vol. 18(no. 9):737–752. [PubMed]
42. Sha F, Lin Y, Saul LK, Lee DD. Multiplicative updates for nonnegative quadratic programming. Neural Computation. 2007;vol. 19(no. 8):2004–2031. [PubMed]
43. Shen D, Davatzikos C. HAMMER: Hierarchical attribute matching mechanism for elastic registration. IEEE Transactions on Medical Imaging. 2002;vol. 21(no. 11):1421–1439. [PubMed]
44. SPM. Available: www.fil.ion.ucl.ac.uk/spm/
45. Thomaz CE, Boardman JP, Counsell S, Hill DLG, Hajnal JV, Edwards AD, Rutherford MA, Gillies DF, Rueckert D. A multivariate statistical analysis of the developing human brain in preterm infants. Image and Vision Computing. 2007;vol. 25(no.6):981–994.
46. Thomaz CE, Duran F, Busatto GF, Gillies DF, Rueckert D. Multivariate statistical differences of MRI samples of the human brain. Journal of Mathematical Imaging and Vision. 2007;vol. 29(no.2–3):95–106.
47. Thompson PM, Hayashi KM, de Zubicaray G, Janke AL, Rose SE, Semple J, Herman D, Hong MS, Dittmer SS, Doddrell AW, Toga DM. Dynamics of gray matter loss in Alzheimer’s disease. Journal of Neuroscience. 2003;vol. 23(no. 3):994–1005. [PubMed]
48. Van De Ville D, Seghier ML, Lazeyras F, Blu T, Unser M. WSPM: Wavelet-Based Statistical Parametric Mapping. Neuroimage. 2007;vol. 37(no. 4):1205–1217. [PubMed]
49. Wink JBTM, Roerdink AM. Denoising functional MR Images: a comparison of wavelet denoising and Gaussian smoothing. IEEE Transactions on Medical Imaging. 2004;vol. 23(no. 3):374–387. [PubMed]
50. Xu L, Pearlson G, Calhoun V. Source Based Morphometry: The Use of Independent Component Analysis to Identify Gray Matter Differences with Application to Schizophrenia. Human Brain Mapping. 2009;vol. 30:711–724. [PMC free article] [PubMed]
51. Yang L. Alignment of Overlapping Locally Scaled Patches for Multidimensional Scaling and Dimensionality Reduction. IEEE Transactions Pattern Analysis and Machine Intelligence. 2008;vol. 30(no. 3):438–450. [PubMed]