
Proc IEEE Int Symp Biomed Imaging. Author manuscript; available in PMC 2010 December 8.

Published in final edited form as:

Proc IEEE Int Symp Biomed Imaging. 2010 April 14; 2010: 1337–1340.

doi: 10.1109/ISBI.2010.5490244

PMCID: PMC2998914

NIHMSID: NIHMS212623

Department of CISE, University of Florida, Gainesville, FL 32611


**Abstract**

This paper presents a novel *c*l*a*ssification *vi*a *a*ggregated *r*egression algorithm – dubbed CAVIAR – and its application to the OASIS MRI brain image database. The CAVIAR algorithm simultaneously combines a set of weak learners based on the assumption that the weight combination for the final strong hypothesis in CAVIAR depends on *both* the weak learners and the training data. A regularization scheme using the nearest neighbor method is imposed in the *testing* stage to avoid overfitting. A closed form solution to the cost function is derived for this algorithm. We use a novel feature – the histogram of the deformation field between the MRI brain scan and the atlas which captures the structural changes in the scan with respect to the atlas brain – and this allows us to automatically discriminate between various classes within OASIS [1] using CAVIAR. We empirically show that CAVIAR significantly increases the performance of the weak classifiers by showcasing the performance of our technique on OASIS.

**1. INTRODUCTION**

Brain MRI analysis and its associated applications in the diagnosis and treatment of brain-based diseases have attracted immense attention in the past two decades. Dementia is an example of such a neurological disorder; it may occur at any stage of adulthood and lead to a long-term decline in cognitive function. Therefore, a technique that can detect the changes in brain structures due to the onset of dementia and use this information to classify subjects is of great value. One of the main challenges is that it is not easy to find a single classifier that achieves a low error rate. Due to the difficulty in obtaining good features from MRI, the classifiers we obtain are actually weak classifiers. However, can a set of weak classifiers create a single strong learner? Numerous variants of algorithms for classifier ensembles have been proposed in the literature, for instance, boosting and bagging. Generally speaking, the boosting method iteratively refines each weak learner and re-adjusts the weighted combination of the training data after each iteration. Another type of algorithm is bootstrap aggregation (bagging), proposed by Breiman [2]: bootstrap samples are trained using weak learners, and the outputs of the weak predictors are combined by averaging or voting. Adaboost [3] requires the performance of the weak learners to be slightly better than mere random assignment, so a "very weak" weak classifier is not acceptable; the algorithm also needs a large number of weak learners in order to converge. Meanwhile, due to its equally weighted voting scheme, bagging is limited in its ability to improve on linear models.

In this paper, we propose a novel ensemble classifier – called CAVIAR – which differs from the bagging and boosting methods mentioned above and offers significantly better performance. CAVIAR is a regression-based classification algorithm, which assumes that the weights for combining the weak learners depend not only on the weak learners but also on the training data. *It is analogous to a medical consultation carried out by a group of doctors on a number of patients: it is well justified to assume that each patient’s personal condition has a different effect on the experts’ consensus final decision*. Since in CAVIAR the weights vary over both the weak learners and the training set, it is prone to overfitting. We impose a regularization scheme wherein the weights corresponding to close training patterns are forced to be similar. Furthermore, in the testing phase, a strong classifier is constructed by choosing the weights corresponding to the nearest neighbors of the test pattern among the training data set. *Intuitively speaking, as a new patient comes in for medical consultation, we expect that a good strategy for the experts involves searching through similar case studies in order to arrive at a consensus diagnosis*. To the best of our knowledge, this kind of data-adaptive testing technique has never been reported in the literature. We demonstrate the stability and effectiveness of the classification model, along with its novel testing technique, via numerous experiments on the OASIS database.

**2. CLASSIFICATION VIA AGGREGATED REGRESSION (CAVIAR) ALGORITHM**

Formally speaking, let *X* be the set of training feature vectors and *Y* be the set of labels, where (*x*_{n}, *y*_{n}), *n* = 1, 2,…, *N*, denotes a training sample and its label, with *y*_{n} ∈ {−1, 1} in the 2-class case. Given *T* weak learners *h*_{t} : *X* → [−1, 1], CAVIAR assigns a separate weight *w*_{nt} to each pair of training sample *x*_{n} and weak learner *h*_{t}, so that the weighted combination **w**_{n} · **h**(*x*_{n}) approximates the label *y*_{n}.

So far, the curious reader may have two concerns. First, since the dimension of the weights to be optimized in CAVIAR is *N* × *T* which is huge for a large data set, the optimization might be difficult and time consuming. Second, this training technique is very likely to induce over-fitting. We now relegate the discussion of the optimization issue to the next section and address in detail our method for solving the over-fitting problem here.

It is reasonable to expect the weights **w**_{n} and **w**_{m} to be similar if the training samples **x**_{n} and **x**_{m} are close to each other in the feature space. We therefore adopt an aggregated regularization term in order to prevent over-fitting. A nearest neighbor graph *G* is pre-computed (once and for all for the training data) and stored, with *G*(*n*, *m*) = 1 if *x*_{m} is among the *K* nearest neighbors of *x*_{n}, and *G*(*n*, *m*) = 0 otherwise.
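The nearest neighbor graph described above can be pre-computed in a few lines. The following NumPy sketch (the function name and the brute-force distance computation are our own choices, not from the paper) builds the 0/1 graph and symmetrizes it so that neighboring pairs are penalized mutually:

```python
import numpy as np

def knn_graph(X, K):
    """Build the 0/1 K-nearest-neighbor graph G over the training features.

    G[n, m] = 1 if x_m is among the K nearest neighbors of x_n (0 otherwise);
    the graph is then symmetrized so neighboring pairs are penalized mutually.
    """
    N = len(X)
    D = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)  # pairwise distances
    np.fill_diagonal(D, np.inf)                                # a point is not its own neighbor
    idx = np.argsort(D, axis=1)[:, :K]                         # indices of the K closest points
    G = np.zeros((N, N), dtype=int)
    G[np.repeat(np.arange(N), K), idx.ravel()] = 1
    G = np.maximum(G, G.T)                                     # symmetrize
    return G
```

For large training sets a k-d tree or similar index would replace the brute-force distance matrix; the O(N²) version shown here is only meant to make the definition of *G* concrete.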

Assume we obtain a set of weights for combining the weak learners in the training stage. In the testing phase, a filtering procedure is imposed in order to construct a strong learner based on the test data and to overcome over-fitting. Instead of using all the training results, we only use the learned weights associated with the training patterns in the nearest neighborhood of the test pattern. Formally, let *x* be an incoming test instance. We take into account a set of nearest neighbors of *x* from the training data set *X*, and each neighbor is assigned a different weight according to its distance to the instance *x*. We choose the weight to be inversely proportional to the distance, which accords with the assumption that training data more similar to the test sample *x* should contribute more to the final strong classifier for that particular *x*.

The final hypothesis *H* of CAVIAR for a test sample *x* is the combination of the *T* weak hypotheses weighted by the *K* nearest neighbors’ contributions, that is, $H(x)=\text{sign}\left(\sum_{k=1}^{K}\alpha_{s_k}\sum_{t=1}^{T}w_{s_k t}^{*}h_{t}(x)\right)$. Here $w_{s_k t}^{*}$ is the optimal weight obtained from the training. The pseudo code of the 2-class CAVIAR algorithm is listed in Algorithm 1.

Note that any distance measure can be used to define the objective function in step 3 of the training stage. The most popular one is the L2 distance which leads to a closed form solution in the optimization.

**Algorithm 1** The 2-class CAVIAR algorithm

Training stage:

1: Input *N* labeled training samples (*x*_{1}, *y*_{1}),…, (*x*_{N}, *y*_{N}), where *y*_{i} ∈ {−1, 1}, and *T* weak learners *h*_{1},…, *h*_{T}, with *h*_{t} : *X* → [−1, 1]

2: Initialize the weight matrix *W* = {*w*_{nt}}, for *n* = 1, 2,…, *N* and *t* = 1, 2,…, *T*

3: Minimize the following objective function:
$$W^{*}=\text{arg}\,\min_{\mathbf{w}}\sum_{n=1}^{N}\text{Dist}(\mathbf{w}_{n}\cdot\mathbf{h}(x_{n}),y_{n})+\lambda\sum_{n,m=1}^{N}G(n,m)\,\text{Dist}(\mathbf{w}_{n},\mathbf{w}_{m})$$ (1)

Testing stage:

1: Input the test sample *x*

2: Compute the nearest neighbors of *x*: *x*_{s_1}, *x*_{s_2},…, *x*_{s_K} ∈ *X*, attained from *X* within the distance threshold ε

3: Assign weights to the chosen training samples using:
$$\alpha_{s_k}=\frac{\exp(-\beta d_{s_k})}{\sum_{k=1}^{K}\exp(-\beta d_{s_k})}$$ (2)
where *d*_{s_k} = Dist(*x*, *x*_{s_k})

4: Output the strong hypothesis
$$H(x)=\text{sign}\Big(\sum_{k=1}^{K}\alpha_{s_k}\sum_{t=1}^{T}w_{s_k t}^{*}h_{t}(x)\Big)$$ (3)
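The testing stage can be sketched directly from the description above. In this hedged NumPy version we use exp(−β*d*) so that closer neighbors receive larger weights, consistent with the stated inverse-distance weighting, and we fall back to the single nearest neighbor when no training sample lies within the threshold (a corner case the text does not specify):

```python
import numpy as np

def caviar_predict(x, X_train, W, weak_learners, beta, eps):
    """Testing stage of 2-class CAVIAR, a sketch.

    x: test feature vector
    X_train: (N, d) training features
    W: (N, T) learned weights w*_{nt} from the training stage
    weak_learners: list of T callables h_t(x) -> [-1, 1]
    beta: weighting parameter; eps: distance threshold
    """
    d = np.linalg.norm(X_train - x, axis=1)            # distances to all training samples
    s = np.flatnonzero(d <= eps)                       # neighbors within the threshold
    if s.size == 0:                                    # fallback: single nearest neighbor
        s = np.array([np.argmin(d)])
    alpha = np.exp(-beta * d[s])                       # Eqn. (2): closer neighbors weigh more
    alpha /= alpha.sum()
    h = np.array([h_t(x) for h_t in weak_learners])    # weak responses on the test sample
    score = alpha @ (W[s] @ h)                         # sum_k alpha_{s_k} sum_t w*_{s_k t} h_t(x)
    return np.sign(score)                              # Eqn. (3)
```

The inner product `W[s] @ h` evaluates the per-neighbor strong responses in one step; the outer product with `alpha` then aggregates them into the final hypothesis.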

Next, we show how to *generalize this algorithm to the multi-class classification case*, where each training sample corresponds to a particular class label from the set of integers *Y* = {1, 2,…, *C*}, and *C* is the number of classes. In our approach, we define a *C*-dimensional label vector **L**_{n} for each training sample *x*_{n}, with the component corresponding to the true class of *x*_{n} set to 1. The training objective then becomes

$$W^{*}=\text{arg}\,\min_{\mathbf{w}}\sum_{n=1}^{N}\text{Dist}(\mathbf{w}_{n}\cdot\mathbf{h}(x_{n}),\mathbf{L}_{n})+\lambda\sum_{n,m=1}^{N}G(n,m)\,\text{Dist}(\mathbf{w}_{n},\mathbf{w}_{m}).$$

(4)

We do not present the detailed pseudo code here since the whole structure of the algorithm is similar to the 2-class case.
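Constructing the label vectors **L**_{n} is the only new ingredient relative to the 2-class case. A small sketch follows; the 0 used for the off-class components is a one-hot assumption on our part, since the paper only states that the true-class component is set:

```python
import numpy as np

def class_label_vectors(y, C):
    """C-dimensional label vectors L_n for the multi-class extension.

    The component for the true class is set to 1; the other components are
    set to 0 (a one-hot assumption -- the paper does not state the off-class
    value). Class labels are taken from {1, 2, ..., C}.
    """
    y = np.asarray(y)
    L = np.zeros((len(y), C))
    L[np.arange(len(y)), y - 1] = 1.0   # classes are 1..C, columns are 0..C-1
    return L
```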

**3. OPTIMIZATION**

In this section, we briefly discuss the optimization technique and derive the closed-form solutions for both the 2-class and multi-class algorithms when using the L2 distance.

By using the L2 distance as the distance measure in the 2-class objective function, we obtain

$$E=\sum_{n=1}^{N}{\Vert {\mathbf{w}}_{n}\cdot\mathbf{h}({x}_{n})-{y}_{n}\Vert}^{2}+\lambda \sum_{n,m=1}^{N}G(n,m){\Vert {\mathbf{w}}_{n}-{\mathbf{w}}_{m}\Vert}^{2}.$$

Expanding the right-hand side of the equation, we adopt the following notation: let $H_{n}=\mathbf{h}(x_{n})\mathbf{h}(x_{n})^{t}$, let $B_{n}=H_{n}+2\lambda \sum_{m}G(n,m)I_{T\times T}$ denote the $n^{th}$ diagonal block, let $b=(y_{1}\mathbf{h}(x_{1})^{t},\ldots,y_{N}\mathbf{h}(x_{N})^{t})^{t}$, and let $W=(\mathbf{w}_{1}^{t},\ldots,\mathbf{w}_{N}^{t})^{t}$ stack all the weights.

The cost function can be re-arranged into a matrix form as follows:

$$E={W}^{t}\begin{pmatrix}{B}_{1} & \dots & -2\lambda G(1,N){I}_{T\times T}\\ -2\lambda G(2,1){I}_{T\times T} & \dots & -2\lambda G(2,N){I}_{T\times T}\\ \vdots & \ddots & \vdots \\ -2\lambda G(N,1){I}_{T\times T} & \dots & {B}_{N}\end{pmatrix}W-2{b}^{t}W+\sum_{n=1}^{N}{y}_{n}^{2}.$$

Taking the derivative of *E* w.r.t. *W* and setting it to 0, we have $\frac{\partial E}{\partial W}=({D}^{t}+D)W-2b=0$, with *D* being the matrix in the equation above that contains the *B*_{n} as its diagonal blocks. The problem is finally reduced to solving the linear system $({D}^{t}+D)W=2b$.
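Under the L2 distance the entire training stage thus reduces to one linear solve. A dense NumPy sketch follows (for clarity only; a sparse solver would be used at scale, and the block layout mirrors the matrix above):

```python
import numpy as np

def train_caviar_2class(h_out, y, G, lam):
    """Closed-form 2-class CAVIAR training with the L2 distance.

    h_out: (N, T) matrix with h_out[n, t] = h_t(x_n)
    y:     (N,) labels in {-1, +1}
    G:     (N, N) symmetric 0/1 nearest-neighbor graph
    lam:   regularization parameter lambda
    Returns W: (N, T) per-sample weights w*_{nt}.
    """
    N, T = h_out.shape
    D = np.zeros((N * T, N * T))
    b = np.zeros(N * T)
    I = np.eye(T)
    for n in range(N):
        hn = h_out[n]
        # Diagonal block B_n = h(x_n) h(x_n)^t + 2*lam*deg(n)*I
        D[n*T:(n+1)*T, n*T:(n+1)*T] = np.outer(hn, hn) + 2 * lam * G[n].sum() * I
        b[n*T:(n+1)*T] = y[n] * hn                          # linear term stacks y_n h(x_n)
        for m in range(N):
            if m != n and G[n, m]:
                D[n*T:(n+1)*T, m*T:(m+1)*T] = -2 * lam * I  # off-diagonal blocks
    # Solve (D^t + D) W = 2 b; lstsq guards against a rank-deficient system
    W = np.linalg.lstsq(D.T + D, 2 * b, rcond=None)[0]
    return W.reshape(N, T)
```

Since the objective is a convex quadratic, the stationary point returned here is the global minimum; the N×T unknowns are flattened so that rows n·T…(n+1)·T−1 hold **w**_{n}.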

For the multi-class case, assume there are *C* classes; for each weak learner *h*_{t}, the output is a *C*-dimensional vector. Expanding the L2 cost in Eqn. (4) componentwise then yields one linear-term vector *b*_{c} per class.

A closed-form solution similar to the 2-class case is obtained, one that requires us to solve the linear system $({D}^{t}+D)W=2({b}_{1}^{t}+{b}_{2}^{t}+\cdots +{b}_{C}^{t})$.

Note that in both cases, ${D}^{t}+D$ is a large, sparse, symmetric matrix, so the linear system can be solved efficiently with a sparse symmetric solver such as MA57 [4].

**4. EXPERIMENTS**

In this section, we empirically validate the proposed algorithm by classifying the OASIS MRI database into its constituent classes, namely, young (Y), middle aged (M), old (O), and, among the old, controls versus subjects with very mild to moderate Alzheimer’s disease (AD).

In addition to the behavioural assessments and cognitive tests, a morphological marker of Alzheimer’s disease is the enlargement of the ventricles and the shrinkage of the cortex and hippocampi. In [5], Sabuncu *et al.* revealed the structural changes in the brain across different age groups and between healthy subjects and patients with dementia. This led us to the hypothesis that a good feature might be one that captures the structural differences among the brains. Therefore, we constructed as our feature the 3D histogram of the deformation field (vectors in 3D space) required to co-register an emerging atlas to a sample MR brain scan. This was achieved by performing a groupwise registration [6] of the MR images within the OASIS data set. The number of bins was set to 6 in each direction (6 × 6 × 6 in total) for constructing the histograms of the vectors.
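Given a deformation field from the registration step, the histogram feature amounts to binning the per-voxel displacement vectors. A minimal sketch, assuming the field is available as an array of 3-vectors (the function name and normalization are our own choices):

```python
import numpy as np

def deformation_histogram(field, bins=6):
    """3D histogram of a deformation field, a sketch of the feature above.

    field: (..., 3) array of per-voxel displacement vectors, e.g. an
           (X, Y, Z, 3) volume produced by registering the atlas to a scan.
    Returns a flattened, normalized bins**3 feature vector.
    """
    v = field.reshape(-1, 3)                           # one row per voxel displacement
    hist, _ = np.histogramdd(v, bins=(bins, bins, bins))
    return (hist / hist.sum()).ravel()                 # normalize to a distribution
```

Normalizing makes the feature comparable across scans of different sizes; whether the paper normalized its histograms is not stated, so treat that step as optional.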

In order to demonstrate the power of CAVIAR, we choose the simplest weak learners, obtained by randomly selecting a component of the feature vector and picking a random threshold. The samples are assigned to particular classes by comparing their values in that chosen component of the feature vector to the threshold. For instance, assume the data *x* has a *d*-dimensional feature vector. The weak learner *h*(*x*) assigns the data to class 1 if the *k*^{th} dimension (randomly chosen) of the feature vector is larger than a random threshold value, and to class −1 otherwise.
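Such a random decision stump takes only a few lines. In this sketch the threshold range `[lo, hi]` is our assumption, since the text only says "a random value":

```python
import numpy as np

def make_random_stump(d, rng, lo=0.0, hi=1.0):
    """Weak learner as described above: compare a randomly chosen feature
    dimension k against a random threshold theta drawn from [lo, hi].
    (The range [lo, hi] is an assumption; the paper just says 'random'.)"""
    k = int(rng.integers(d))           # randomly chosen dimension
    theta = rng.uniform(lo, hi)        # random threshold
    return lambda x: 1.0 if x[k] > theta else -1.0
```

A pool of *T* such stumps, e.g. `[make_random_stump(d, rng) for _ in range(T)]`, supplies the weak hypotheses *h*_{1},…, *h*_{T} used throughout.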

The free parameters involved in our CAVIAR algorithm include the regularization parameter λ, the number of nearest neighbors *K* used in setting up the nearest neighbor graph *G*, the weighting parameter β, and the distance threshold ε. The experiments indicate that only ε is crucial for CAVIAR’s performance and requires tuning; the remaining parameters can be fixed *a priori* without sacrificing performance. The parameter λ tunes the degree of similarity of the weights **w**_{n} and **w**_{m} corresponding to the nearest neighbors **x**_{n} and **x**_{m}. CAVIAR works well in the region λ ∈ (0, 1), and we set it to 0.01 in the experiments. Empirically, *K* is chosen to be approximately 5% of the number of training samples. β adjusts the weights of the chosen training data as in Eqn. (2): β = 0 implies that the data are equally weighted, while β = ∞ corresponds to a single nearest-neighbor choice. Stable performance is achieved when β is inversely proportional to the average distance among the samples.

The *most crucial* parameter for this algorithm is the distance threshold ε. Note that the Euclidean distance is used as our distance measure here. A larger-than-necessary ε means more training data are involved in the testing, which results in over-fitting; a smaller-than-necessary ε means only a small portion of the training results is used, which pushes the algorithm closer to being a nearest-neighbor method. Therefore, a proper choice of ε is of great importance. To this end, we discretize the search space for ε into *l* bins; the bin number only affects the resolution of the search. By searching over the *l* different values and computing the error, we find the best ε for each data set. The optimization curves of the classification errors w.r.t. ε (indexed by the *l* bins) for both the validation and test data are shown in Fig. 1. We obtain the best ε according to the validation error curve and use it in the testing stage.
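The grid search for the threshold is straightforward to sketch. Here `val_error` is assumed to run the testing stage on the held-out validation set and return the misclassification rate for a candidate threshold (its name and signature are ours, not the paper's):

```python
import numpy as np

def select_threshold(d_lo, d_hi, l, val_error):
    """Discretize [d_lo, d_hi] into l candidate thresholds and keep the one
    with the lowest validation error, as in the search described above."""
    grid = np.linspace(d_lo, d_hi, l)                  # l candidate thresholds
    errors = [val_error(eps) for eps in grid]          # validation error per candidate
    return float(grid[int(np.argmin(errors))])
```

A natural choice for `[d_lo, d_hi]` is the range of pairwise distances observed in the training set, so every candidate threshold selects a non-empty neighborhood for most test points.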

The OASIS data set contains a cross-sectional collection of 416 subjects aged 18 to 96. We divided the subjects into three groups: those below 40 were designated as young, those above 60 as old, and the ones in between as middle aged. Among the old subjects, we took 70, of whom 35 were diagnosed with very mild to moderate AD while the rest were controls. For each experiment, we randomly divided the data set into 5 portions and took 4 of them for training and 1 for testing, with 1 of the 4 training portions selected as a validation set. For each number of weak learners, the experiments were repeated 20 times and the averages were taken as the reported results.

We first classified the data set between each pair of age groups using the 2-class version of CAVIAR and then demonstrated the performance of our multi-class classification algorithm by classifying the three age groups simultaneously. Finally, we presented a more serious challenge to the algorithm by classifying the healthy subjects and the very mild to moderate AD patients. For comparison, the results of the Adaboost algorithm with the same weak learner settings are also reported.

In Table 1, we show the average error of the weak learners in the second column and the test error of the final strong hypothesis in the third column, followed by the improvement in performance of our algorithm w.r.t. the weak learners; the test error for Adaboost appears in the last column. The error is measured by the misclassification rate.

To illustrate how the performance changes as the number of weak learners increases, we show the experimental results of AD vs. Control in Fig. 2.

Fig. 2. The classification errors of the weak learners and the final strong hypothesis w.r.t. different numbers of weak learners for AD vs. Control.

The experimental results indicate that the deformation field captures the structural changes across the brains very well, and that the CAVIAR algorithm significantly improves the performance of the weak classifiers when presented with a small number of simple weak learners, in both the 2-class and multi-class cases.

This research was in part funded by the NIH grant RO1 NS046812.

**REFERENCES**

1. Marcus DS, Wang TH, Parker J, Csernansky JG, Morris JC, Buckner RL. Open Access Series of Imaging Studies (OASIS): Cross-Sectional MRI Data in Young, Middle Aged, Nondemented, and Demented Older Adults. Journal of Cognitive Neuroscience. 2007;vol. 19(9):1498–1507.

2. Breiman L. Bagging Predictors. Machine Learning. 1996;vol. 24(2):123–140.

3. Freund Y, Schapire RE. A Decision-Theoretic Generalization of On-line Learning and an Application to Boosting. Journal of Computer and System Sciences. 1997;vol. 55:119–139.

4. Duff IS. MA57-A Code for the Solution of Sparse Symmetric Definite and Indefinite Systems. ACM Trans. Math. Software. 2004;vol. 30(2):118–144.

5. Sabuncu MR, Balci SK, Shenton ME, Golland P. Image-Driven Population Analysis Through Mixture Modeling. IEEE Transactions on Medical Imaging. 2009;vol. 28(9):1473–1487.

6. Joshi S, Davis B, Jomier M, Gerig G. Unbiased Diffeomorphic Atlas Construction for Computational Anatomy. NeuroImage. 2004;vol. 23:S151–S160.
