When imaging data from patient groups are compared with data from healthy control subjects to investigate disease-related changes, control subjects are usually selected to match the patient groups in possible confounding variables that are expected to have an impact on the imaging data. In the subsequent statistical evaluation, these possible confounding variables are then additionally included as covariates. This straightforward procedure, although sufficient to exclude major effects of possible confounding variables, is not always applicable in clinical studies for several reasons.
In this study, we propose a methodical approach to control for the effects of confounding variables such as age in imaging data prior to univariate or multivariate statistical evaluation by calculating linear regression models. This approach is similar to a method applied in some earlier studies. In these studies, a linear regression model was applied to regional volumes and to total brain volume to control for the effect of intracranial volume prior to statistical evaluation. In the approach proposed here, age-related effects on MRI data are first estimated voxel-wise, using only healthy control subjects to calculate the regression coefficients. In the second step, the amount of GM atrophy explained by the age factor is removed from all data at the single-subject level. To investigate the effect of the proposed age-correction method, we compared VBM and SVM results for the differentiation of AD patients and healthy control subjects, with and without age correction applied prior to statistical evaluation.
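The two-step correction described above can be sketched as follows. This is a minimal illustration on synthetic data, not the implementation used in the study: the function names `fit_age_slopes` and `remove_age_effect`, the subjects-by-voxels array layout, the use of the mean control age as reference, and all numbers are assumptions made for the example.

```python
import numpy as np


def fit_age_slopes(gm_controls, age_controls):
    """Estimate one age slope per voxel from healthy controls only.

    gm_controls: (n_controls, n_voxels) array of GM values.
    age_controls: (n_controls,) array of ages in years.
    """
    age_centered = age_controls - age_controls.mean()
    gm_centered = gm_controls - gm_controls.mean(axis=0)
    # Ordinary least-squares slope for every voxel at once.
    return age_centered @ gm_centered / (age_centered @ age_centered)


def remove_age_effect(gm, age, slopes, reference_age):
    """Subtract the age-explained part of GM from each subject."""
    return gm - np.outer(age - reference_age, slopes)


# Synthetic stand-in data (hypothetical, not from the study).
rng = np.random.default_rng(0)
n_voxels = 5
true_slopes = np.array([-0.004, -0.002, 0.0, -0.003, -0.001])

age_controls = rng.uniform(50, 85, size=200)
gm_controls = (0.6 + np.outer(age_controls - 65.0, true_slopes)
               + rng.normal(0, 0.01, size=(200, n_voxels)))

age_patients = rng.uniform(50, 85, size=100)
gm_patients = (0.5 + np.outer(age_patients - 65.0, true_slopes)
               + rng.normal(0, 0.01, size=(100, n_voxels)))

# Step 1: slopes come from controls only.
slopes = fit_age_slopes(gm_controls, age_controls)
reference_age = age_controls.mean()

# Step 2: the same slopes are removed from every subject.
gm_controls_corrected = remove_age_effect(
    gm_controls, age_controls, slopes, reference_age)
gm_patients_corrected = remove_age_effect(
    gm_patients, age_patients, slopes, reference_age)
```

Centering age at a reference value keeps the corrected GM values on their original scale. Choosing a different fixed reference age would shift all subjects by the same amount per voxel and leave group differences unchanged.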
SVM classification using GM values without correction resulted in very high accuracy for the differentiation of AD patients and healthy control subjects, consistent with the results of previous studies applying this method. However, applying SVM without prior age correction resulted in a significant misclassification of younger AD patients and older control subjects, indicating that age has a major impact on the differentiation accuracy of SVM. Applying age correction before SVM further increased the classification accuracy. In addition, the two groups of misclassified patients and control subjects no longer showed a difference in mean age. Although an increase of only about 2% might not appear noteworthy, at already very high accuracies the more informative quantity is the percentage of misclassified subjects, which in our case was 17% (100% minus 83%). For example, when an accuracy of 98% has already been obtained, an additional improvement of only 1% to 99% means that the number of misclassified subjects decreases by 50%, which is clinically important even though the accuracy increases by only 1%. In our data, the error rate decreased to 15% after age correction, a decrease of 12% in the number of misclassified subjects (taking the initial 17% as baseline), which makes the improvement highly relevant for clinical application.
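The arithmetic behind this argument can be made explicit with a short sketch. The helper name `relative_error_reduction` is hypothetical, and the inputs are the rounded accuracies quoted in the text (83% and 85%), so the result is approximate.

```python
def relative_error_reduction(accuracy_before, accuracy_after):
    """Fraction by which the misclassification rate shrinks."""
    error_before = 1.0 - accuracy_before
    error_after = 1.0 - accuracy_after
    return (error_before - error_after) / error_before


# Rounded accuracies from the text: 83% without, 85% with age correction.
study_reduction = relative_error_reduction(0.83, 0.85)    # about 0.12
# Illustrative case from the text: 98% improved to 99%.
example_reduction = relative_error_reduction(0.98, 0.99)  # about 0.50
```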
The improved classification accuracy after age correction and the absence of age differences between misclassified AD patients and control subjects after age correction indicate that some subjects were misclassified because of their large deviation from the mean age of the corresponding group. In younger AD patients, smaller age-related changes might have masked the disease-related effect, while in older healthy control subjects the normal age-related GM atrophy might have been mistaken for a disease-specific alteration.
The results of the VBM comparisons complement and provide further support for this interpretation. Here, three groups of AD patients differing in age were compared to the same group of control subjects. Generally, the results confirmed previously reported regional atrophy patterns for AD patients. When age was not included as a covariate, old AD subjects showed extensive atrophy in frontotemporal, cingulate, thalamic and hippocampal regions. In contrast, young AD patients showed substantially less extended disease-specific reductions. However, when age was correlated with GM changes in healthy control subjects, regions similar to those detected in old AD patients showed age-related GM atrophy. An overlay of the atrophy regions detected in old AD patients without age as a covariate and the regions showing a normal age-related decline revealed a large overlap between the two analyses. This result indicates that, as expected, changes detected in old AD patients without including age as a covariate substantially overestimate the real amount of atrophy in this patient group. The opposite effect was observed when age was included as a covariate: less disease-related GM atrophy was detected in old AD patients. Young AD patients now showed a strong decrease in GM densities in hippocampal and inferior and middle temporal regions. The amount of GM atrophy was comparable in young and mean AD patients.
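The overlap between two thresholded maps, such as the atrophy map and the age-related decline map mentioned above, could be quantified as in the sketch below. The study does not state an overlap metric, so the Dice coefficient is an assumption chosen for illustration, and the function name `dice_overlap` and the toy masks are hypothetical.

```python
import numpy as np


def dice_overlap(mask_a, mask_b):
    """Dice coefficient of two boolean voxel masks (1.0 = identical)."""
    mask_a = np.asarray(mask_a, dtype=bool)
    mask_b = np.asarray(mask_b, dtype=bool)
    total = mask_a.sum() + mask_b.sum()
    if total == 0:
        return 0.0  # no suprathreshold voxels in either map
    return 2.0 * np.logical_and(mask_a, mask_b).sum() / total


# Toy 1-D "maps" (hypothetical): True marks a suprathreshold voxel.
atrophy_old_ad = np.array([1, 1, 1, 1, 0, 0, 0, 0], dtype=bool)
age_decline_hc = np.array([0, 1, 1, 1, 1, 0, 0, 0], dtype=bool)

# Three shared voxels, four voxels in each mask: 2 * 3 / (4 + 4) = 0.75
overlap = dice_overlap(atrophy_old_ad, age_decline_hc)
```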
Applying the proposed age correction method prior to VBM analyses substantially improved the detection of GM atrophy in young AD patients. This group showed the most extensive age-corrected GM atrophy compared to mean AD and old AD patients. VBM performed after age correction also detected GM atrophy in old AD patients in hippocampal, inferior temporal, parietal and frontal regions, although these changes were less extensive than in the two other groups of AD patients. For the mean AD group, GM atrophy detected without age as a covariate, with age as a covariate and with age correction prior to statistical evaluation did not show any substantial qualitative or quantitative differences. Additionally, the quantitative and qualitative pattern detected using age-corrected data was highly similar to the differences detected in comparisons of age-matched AD patients and control subjects. This similarity indicates that the proposed age correction method provides sufficient control for the effect of possible covariates like age and therefore enables a direct comparison of clinical groups with substantial initial differences in potential confounding variables. The larger cluster extent in all comparisons using age-corrected data, relative to the age-matched comparisons, is rather a statistical artifact resulting from the different group sizes used for these comparisons. Further, the results obtained using age-corrected data are in line with previous studies indicating that less severe disease-related pathology is required with increased age to induce a similar decline in cognitive performance. Furthermore, as all three groups of AD patients showed a similar stage of cognitive impairment, the amounts of atrophy detected in the three groups suggest a negative linear relationship between age and the amount of GM atrophy that is sufficient to induce similar cognitive impairment.
However, some very important aspects have to be considered before the proposed method is applied to control for possible confounding variables in VBM or SVM studies. One of these major points is the group size used for the comparisons. On the one hand, the control group used for the calculation of the regression coefficients should be sufficiently large to provide a robust estimate of age-related changes in the total population. On the other hand, the pre-regression for age changes the degrees of freedom in the final statistical model for VBM studies, which might lead to differences in results when smaller sample sizes are used. Therefore, studies with smaller sample sizes should take care to account for these altered degrees of freedom when using the proposed method.
A further important point to consider before applying the proposed method is the potential mutual correlation between the variable used for pre-regression and other covariates used in the subsequent analysis. In our study, we only used sex as an additional covariate in the subsequent VBM analysis. Furthermore, it has been shown by Good et al. that the interaction between age and sex does not reach significance even in a substantially larger cohort. Therefore, we ignored this potential mutual effect in our study. Nonetheless, when the proposed method is applied with any other covariates in the subsequent analysis, the potential relationship between these covariates and age (or any other variable used for pre-regression) has to be carefully investigated. One possible way to account for mutual correlation between the covariates would be a more complex pre-regression model that includes more than one covariate in a multiple linear regression. However, this option first has to be carefully investigated.
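A pre-regression with more than one covariate might look like the sketch below. As the text notes, this option has not been validated, so the code only illustrates the mechanics: the function names `fit_confound_betas` and `remove_confounds`, the choice of a second covariate (a made-up intracranial volume measure that is correlated with age), and all synthetic numbers are assumptions.

```python
import numpy as np


def fit_confound_betas(data_controls, confounds_controls):
    """Fit a voxel-wise multiple regression on healthy controls.

    data_controls: (n_controls, n_voxels) imaging values.
    confounds_controls: (n_controls, n_confounds) covariates.
    Returns the confound means and a (n_confounds, n_voxels) beta array.
    """
    means = confounds_controls.mean(axis=0)
    design = np.column_stack([np.ones(len(confounds_controls)),
                              confounds_controls - means])
    betas, *_ = np.linalg.lstsq(design, data_controls, rcond=None)
    return means, betas[1:]  # drop the intercept row


def remove_confounds(data, confounds, means, betas):
    """Subtract the part of the data explained by all confounds jointly."""
    return data - (confounds - means) @ betas


# Synthetic controls (hypothetical): age plus a correlated second covariate.
rng = np.random.default_rng(1)
n = 300
age = rng.uniform(50, 85, size=n)
tiv = 1500.0 - 5.0 * (age - 65.0) + rng.normal(0, 80, size=n)
confounds = np.column_stack([age, tiv])

# Check how strongly the covariates are related before relying on them.
confound_corr = np.corrcoef(confounds, rowvar=False)[0, 1]

true_betas = np.array([[-0.004, -0.002],    # age effect per voxel
                       [0.0002, 0.0001]])   # second covariate per voxel
data = (0.6 + (confounds - confounds.mean(axis=0)) @ true_betas
        + rng.normal(0, 0.01, size=(n, 2)))

means, betas = fit_confound_betas(data, confounds)
data_corrected = remove_confounds(data, confounds, means, betas)
```

Fitting the covariates jointly, rather than one after the other, lets the regression attribute shared variance between correlated covariates consistently.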
Finally, the proposed method is not meant to replace classical VBM analyses, which simply include possible covariates in the design matrix. Rather, it is intended for studies in which matching is not possible for any reason, as might be the case, for example, when comparing patient groups that differ in more than one factor. Another application of the proposed method would be a pre-regression for possible confounding effects prior to SVM classification.
Conclusion and perspectives
In our study, we suggest an easily applicable approach that makes it possible to compare groups of subjects differing in specific confounding variables, or to control for the effect of confounding variables in different imaging modalities in a separate step before multivariate pattern classification algorithms are applied. Using age as an example of a confounding variable in comparisons of patients with AD and healthy control subjects, we showed that applying the proposed method improves the between-group classification using SVM and the detection of univariate differences using MRI data in groups of AD patients of differing age. However, the proposed approach is not limited to age or to between-group evaluation. It can easily be applied at a group or single-subject level to remove the effects of any other confounding variables that might affect the statistical evaluation. Nevertheless, the proposed method is not meant to replace the usual statistical approach of including possible confounding variables directly in the statistical analyses of VBM studies. If matching is easily possible, which is usually the case in studies investigating healthy volunteers, common statistical methods should be preferred.