In this study we have explored longitudinal structural changes as measured by VBM and some of the potential pitfalls in the analyses of these data. Our approach to this question is unique in that we have used the same pool of subjects for both the control and learning phase of the experiment. Contrary to previously published studies, we found no statistically significant grey matter changes associated with 2 weeks of training on a visuo-motor task even though significant changes were found in fMRI activation and behavioral performance. We have suggested modifications to the structural analysis stream used in previously published longitudinal VBM studies. We have also carried out the longitudinal analyses using three different software packages to evaluate the consistency of the structural results.
Initial analyses in both SPM2 and FSL revealed clusters in both learning and control conditions that were ultimately determined to be artifactual. These clusters arose from two factors. First, aligning all scan sessions to the initial baseline scan introduces a difference in interpolation that biases the comparison: scans that are interpolated are slightly smoothed before segmentation, whereas the baseline scan is not. This leads to artificial differences in apparent grey matter density. When scans were aligned to a halfway point between the first two scans, no significant clusters were found. Second, the non-stationarity correction method of statistical inference used in SPM2 is anticonservative for analyses in which the degrees of freedom are fewer than thirty (
Hayasaka et al., 2004). Performing the analysis using permutation-based methods in SPM2 resulted in no significant clusters of grey matter change. Also, version 2.0 of FSL's permutation tool, Randomise, used an anticonservative method of handling confounds (
Nichols et al., 2008). The corrected method used in Randomise 2.1 results in no significant clusters of grey matter change.
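The general idea behind permutation-based inference for a paired (pre versus post) design can be sketched as follows. This is a minimal illustration only, assuming an exact sign-flipping test with a maximum-statistic correction for multiple comparisons; it is not the implementation used in SPM2 or in Randomise, and the function names `t_stat` and `max_t_sign_flip` are hypothetical.

```python
from itertools import product
from math import copysign, inf, sqrt
from statistics import mean, stdev


def t_stat(values):
    """One-sample t statistic for a list of paired differences."""
    m = mean(values)
    sd = stdev(values)
    if sd == 0.0:
        # Degenerate case: all differences identical.
        return 0.0 if m == 0.0 else copysign(inf, m)
    return m / (sd / sqrt(len(values)))


def max_t_sign_flip(diffs):
    """Exact sign-flip test with max-statistic familywise error correction.

    diffs[s][v] is the post-minus-pre difference for subject s at voxel v.
    Returns (observed t per voxel, corrected one-sided p per voxel).
    Enumerates all 2**n sign patterns, so it is only practical for small n.
    """
    n_subj = len(diffs)
    n_vox = len(diffs[0])

    def t_map(signs):
        return [t_stat([signs[s] * diffs[s][v] for s in range(n_subj)])
                for v in range(n_vox)]

    observed = t_map([1] * n_subj)
    # Null distribution of the maximum t over voxels under random sign flips.
    max_null = [max(t_map(signs))
                for signs in product((1, -1), repeat=n_subj)]
    p_corrected = [sum(m >= t for m in max_null) / len(max_null)
                   for t in observed]
    return observed, p_corrected
```

Because each voxel is compared against the null distribution of the maximum statistic over the whole image, the test makes no assumption about the smoothness or stationarity of the data; with few subjects, however, the smallest attainable p-value is limited to 1 / 2**n.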
Analysis of the change in BOLD signal with training revealed several regions of decreased activity with learning including inferior parietal, precuneus, middle cingulate cortex, and bilateral middle frontal gyrus as well as several regions of increased activity including the medial frontal cortex. The large cluster spanning parietal and precuneus has been implicated in tasks involving spatial transformations while the middle cingulate cortex has been associated with response inhibition (
Cohen et al., 1996;
Ridderinkhof et al., 2004). Training on the mirror-tracking task enhances both of these skills. The changes in the activation of the middle frontal cortical areas are possibly related to the motor transformations required for inverting the movement of the joystick. Although our results on fMRI changes related to learning are statistically strong, they are challenging to interpret due to potential changes in the subjects' strategy or changes in the relative difficulty of the task. Most fMRI studies of learning suffer from similar difficulties (
Poldrack, 2000).
The central goal of this study was to determine whether functional and structural measures of plasticity overlapped. We also sought to determine whether structural measures of plasticity were consistent across VBM implementations. Ultimately, no significant structural changes were found; however, it is still important to note that there are significant differences among currently available longitudinal VBM implementations, which are illustrated in the processing flow charts in
Supplementary Figs. 1–3. Note that SPM2 performs the segmentation step after the brain has been spatially normalized, whereas SPM5 and FSL perform segmentation in the subject's native space. However, we believe the most important source of variance is the segmentation itself. The accompanying figure shows the segmented grey matter images and cumulative image histograms from a single subject using each of the three packages. The SPM2 segmented image differs from the others in that a much higher proportion of voxels are classified as 100% grey matter density. The FSL segmentation classifies voxels more continuously between 0 and 100% grey matter. In the SPM5 segmentation, the distribution of grey matter density appears to lie somewhere between the sharp distinction of SPM2 and the more gradual curve of FSL. Note that SPM5 uses the “grand unified segmentation” algorithm that does not rely on study-specific templates (
Ashburner and Friston, 2005).
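For readers who wish to compare segmentations in this way, a cumulative histogram of grey matter density values can be computed along the following lines. This is an illustrative sketch, not the code used to produce our figures; the function name `cumulative_histogram` and the toy input values are assumptions, and the input is taken to be a flat list of per-voxel grey matter probabilities between 0 and 1.

```python
def cumulative_histogram(values, n_bins=10):
    """Cumulative fraction of voxels in each bin and all bins below it.

    values: grey matter probabilities in [0, 1] for the voxels of interest.
    Returns a list of (upper_bin_edge, cumulative_fraction) pairs.
    """
    if not values:
        raise ValueError("values must be non-empty")
    counts = [0] * n_bins
    for v in values:
        # Clamp so that a value of exactly 1.0 falls in the last bin.
        idx = min(max(int(v * n_bins), 0), n_bins - 1)
        counts[idx] += 1
    total = len(values)
    result = []
    running = 0
    for i, count in enumerate(counts):
        running += count
        result.append(((i + 1) / n_bins, running / total))
    return result


# Toy inputs: a near-binary segmentation versus a more graded one.
sharp = [0.0, 0.0, 1.0, 1.0]
graded = [0.05, 0.35, 0.65, 0.95]
sharp_curve = cumulative_histogram(sharp, n_bins=4)
graded_curve = cumulative_histogram(graded, n_bins=4)
```

A segmentation that assigns most voxels to 0% or 100% grey matter yields a curve that is flat through the middle bins and jumps at the ends, whereas a more continuous segmentation yields a gradual rise.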
Our results also demonstrated that interpolation related to the rigid alignment step had significant effects on the final results. Interestingly, the histograms change very little when they are generated using volumes that were rigidly aligned and interpolated. This illustrates the point that the interpolation does not globally bias the grey matter density in one direction or the other. Rather, at different locations within the volume, artifactual focal changes in grey matter may be introduced in an unpredictable way. The changes balance each other out when averaged over the whole volume, but when an interpolated volume is compared against one that is not interpolated, false positives may be detected.
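The effect can be reproduced in a one-dimensional toy example. The sketch below is purely illustrative and rests on several assumptions: a periodic signal, linear interpolation, and a round trip of two half-voxel shifts standing in for the resampling of a realigned scan. The function names are hypothetical and the example makes no claim about the interpolation schemes used by any of the packages.

```python
def shift_linear(signal, frac, direction=1):
    """Shift a periodic 1-D signal by `frac` of a voxel (linear interpolation).

    direction=+1 samples towards the next voxel, -1 towards the previous one.
    """
    n = len(signal)
    return [(1.0 - frac) * signal[i] + frac * signal[(i + direction) % n]
            for i in range(n)]


def resample_round_trip(signal, frac=0.5):
    """Shift forward and back again: aligned with the input, but smoothed."""
    return shift_linear(shift_linear(signal, frac, 1), frac, -1)


# A block of "grey matter" with sharp edges, never interpolated.
baseline = [0.0, 0.0, 1.0, 1.0, 1.0, 0.0, 0.0, 0.0]
# The same anatomy after resampling.
resampled = resample_round_trip(baseline)

local_diff = [r - b for r, b in zip(resampled, baseline)]
global_diff = sum(resampled) - sum(baseline)
```

In this example the resampled signal differs from the baseline by +0.25 and -0.25 at the edges of the block, while the sum over the whole signal is unchanged: the local differences are real, but they cancel globally.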
Currently all published longitudinal VBM studies of which we are aware have used the SPM2 pipeline (
Draganski et al., 2004,
2006;
May et al., 2007;
Boyke et al., 2008;
Ilg et al., 2008;
Driemeyer et al., 2008). In the analyses we performed, we made several small but significant changes to the processing stream of these studies. First, rather than aligning each scan to the initial scan, we aligned to a halfway point between the two scans being compared. Note that this point is significant regardless of whether subjects are used as their own controls or a separate group of controls is used. Second, we used non-parametric methods for all statistical testing and multiple comparison correction, as the non-stationarity correction method has been shown to be anticonservative for analyses with relatively small degrees of freedom (
Hayasaka et al., 2004). We have demonstrated that these changes have a significant effect on the ultimate results across different software packages. This provides an important demonstration of the point made by
Ridgway et al. (2008) that very detailed explanations of VBM methods are required for experiments to be replicable.
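To make the notion of a halfway point concrete, the following sketch computes the "half" of a rigid transformation, that is, the transform which, applied twice, reproduces the full transformation between two scans. It is a simplified two-dimensional illustration under stated assumptions (one rotation angle and an in-plane translation); the actual registrations are three-dimensional, and the function names here are hypothetical rather than taken from any of the packages.

```python
import math


def apply_rigid_2d(params, point):
    """Apply a rotation by theta followed by a translation (tx, ty)."""
    theta, tx, ty = params
    x, y = point
    c, s = math.cos(theta), math.sin(theta)
    return (c * x - s * y + tx, s * x + c * y + ty)


def half_rigid_2d(theta, tx, ty):
    """Return (theta_h, hx, hy) such that applying it twice equals the input.

    If H(p) = R(theta/2) p + h, then H(H(p)) = R(theta) p + (R(theta/2) + I) h,
    so h is obtained by solving (R(theta/2) + I) h = t.
    """
    half = theta / 2.0
    c, s = math.cos(half), math.sin(half)
    det = 2.0 + 2.0 * c  # determinant of R(theta/2) + I
    if abs(det) < 1e-12:
        raise ValueError("half transform is undefined for this rotation")
    hx = ((1.0 + c) * tx + s * ty) / det
    hy = (-s * tx + (1.0 + c) * ty) / det
    return (half, hx, hy)


full = (0.2, 3.0, -1.5)          # hypothetical scan-to-scan transform
half = half_rigid_2d(*full)
point = (10.0, -4.0)
direct = apply_rigid_2d(full, point)
twice = apply_rigid_2d(half, apply_rigid_2d(half, point))
```

Resampling one scan with the half transform and the other with its inverse places both in a common intermediate space, so that both images undergo a comparable amount of interpolation.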
It should be emphasized that the animal literature leaves little doubt that it is possible for the mammalian brain to undergo large scale changes in its structure (cortical thickness, synaptic and capillary density, etc.) on a time scale of days to weeks (
Klintsova and Greenough, 1999). Most of these studies have used histological techniques to measure these changes, though some imaging work has been conducted measuring changes in cortical vasculature in rats in response to exercise (
Pereira et al., 2007;
Swain et al., 2003). Thus, despite the methodological concerns we have raised here, previously published studies may have provided more favorable conditions for detecting grey matter changes. Several of these studies were conducted on larger groups of subjects and employed longer periods of training that may have resulted in more dramatic grey matter changes. Nonetheless, the contrary findings reported here and the demonstrated susceptibility of longitudinal VBM to false positives warrant a more careful examination of these methods. The development of a standard and robust method of investigating within-subject structural brain change remains an important challenge.
In an effort to help produce such a standard, all of the raw data used in this experiment will be made available on the SUMS database at Washington University (
Dickson et al., 2001;
http://sumsdb.wustl.edu/sums/directory.do?id=6694686). We encourage other researchers to download these data and reanalyze them with novel methods. It is possible that multivariate analysis techniques may prove more sensitive than traditional univariate analysis. Some of these methods have already been employed on cross-sectional VBM data (
Kloppel et al., 2008;
Kawasaki et al., 2007). Future studies might employ focused, higher resolution scanning or more targeted pulse sequences to obtain a more detailed picture of underlying changes (
Swain et al., 2003;
van der Kouwe et al., 2008). We are confident that these techniques in combination with rigorous statistical controls will one day make
in vivo measurement of human brain structure possible, opening up an entirely new technique in the study of learning and memory.