In this report we outlined a revision of the activation likelihood estimation (ALE) algorithm for coordinate-based neuroimaging meta-analyses, addressing several shortcomings of the original implementation. By providing empirical estimates of between-subject and between-template variability, the subjective choice of a FWHM for the Gaussian probability distributions could be replaced by a quantitative uncertainty model. Inference on the ensuing ALE maps was constrained to grey matter voxels and modified to reflect a null-hypothesis of random spatial association between experiments (random effects) rather than between foci (fixed effects).

Spatial variability of neuroimaging results

In spite of the high number of functional neuroimaging experiments in recent years, surprisingly few studies have investigated between-subject variability using current imaging protocols (la-Justina et al., 2008; Otzenberger et al., 2005; Seghier et al., 2004), and none provided quantitative estimates of the spatial uncertainty associated with reported stereotaxic coordinates. Earlier work comparing fMRI and PET, however, found the spatial uncertainty to be comparable between both imaging techniques. In these reports, the average inter-subject distance of functional activations was generally estimated in the range of 10–20 mm (Bookheimer et al., 1997; Clark et al., 1996; Fox et al., 1999; Fox et al., 2001; Hasnain et al., 1998; Xiong et al., 2000). These studies hence suggested a somewhat higher between-subject variance than our current data, which may be attributable to the fact that more recent neuroimaging studies tend to employ smaller voxel sizes and that normalisation procedures have become more refined through continued development.

There have also been quantitative evaluations of inter-subject realignment using the dispersion of anatomical landmarks after spatial normalisation (Ardekani et al., 2005; Grachev et al., 1999; Hammers et al., 2002; Hellier et al., 2003). In these studies, the residual anatomical uncertainty differed between regions (lower for subcortical regions) but was generally estimated in the range of 6–9 mm average ED between corresponding landmarks. While we could confirm the generally lower variability in subcortical regions, the inter-subject variability of local BOLD maxima was clearly higher than that of anatomical landmarks. Our results hence imply that between-subject variability in functional neuroanatomy can only partially be explained by the inexactness of spatial normalisation. This argument is further supported by the observation that functional variability was similar for all normalisation approaches tested. It seems, therefore, that the observed dispersion of local maxima is a direct reflection of the microstructural variability of the cortex, rendering the location of cortical areas partially independent of cortical landmarks (Amunts et al., 2004; Eickhoff et al., 2006b; Grefkes et al., 2001; Malikovic et al., 2007; Rottschy et al., 2007).

Our analysis moreover showed that the between-subject variance was inhomogeneous across brain regions. The smallest variability was found for the caudate nucleus, while the PFC was particularly variable. It must be assumed that both biological and technical effects contribute to these differences. On one hand, the functional neuroanatomy of regions like the PFC is more variable due to pronounced inter-individual differences in the relative size and shape of the different areas jointly occupying this part of the brain. This increased variability of “higher” cortical regions, compared to primary areas, has been well documented in neuroimaging experiments and histological mapping studies (Caspers et al., 2008; Hasnain et al., 1998; Scheperjans et al., 2007; Walters et al., 2006; Watson et al., 1992; Xiong et al., 2000; Zilles et al., 2003). This less conserved cortical organisation may provide an important biological basis for the observation that some regions show a higher inter-individual variability in the location of functional activations. From this line of argument, the high variability of M1 activations seems surprising at first. It may, however, be explained by between-subject variability in the topological arrangement of different body parts in this somatotopically organised area. In summary, there is hence clear evidence for a biological underpinning of the inter-regional differences in variability. It should, however, also be considered that some particularly variable areas (like the PFC) are at the same time located in brain regions where normalisation into standard space is usually less reliable, owing to the absence of prominent anatomical landmarks and marked inter-individual differences in cortical folding patterns. In contrast, macroanatomically distinct and less variable structures like the caudate nucleus may be normalised more reliably by automated registration algorithms. This was shown by previous analyses of the registration accuracy for various cortical and subcortical landmarks, which found the best accuracy for subcortical structures and those located close to the major cortical landmarks (Grachev et al., 1999; Hellier et al., 2003). Some of the observed differences may hence not be biological in nature but rather result from local inhomogeneities in image registration precision.

Uncertainty modelling

In the original ALE approach, literature foci were modelled by Gaussian probability distributions of identical, user-specified width (Laird et al., 2005; Turkeltaub et al., 2002). This approach was now modified in favour of a more flexible and principled solution: here, the size of the modelled probability distribution that is to reflect the “true” location of a reported activation is based on the spatial uncertainty associated with each experiment. In order to model this uncertainty explicitly, empirical estimates of both between-subject and between-template (inter-laboratory) variability were provided in the present study. These were subsequently used to model the spatial uncertainty associated with each particular set of coordinates when performing the ALE computation. It should be noted that the current algorithm models the spatial uncertainty associated with the foci reported in a particular experiment using the same Gaussian distribution width across all brain regions. Theoretically, however, it would be straightforward to incorporate non-stationary variances into the proposed model in order to account for regionally specific uncertainties, by substituting the (grand mean) Euclidean distances in formula (1) with local estimates depending on the position of a particular focus. In practice, however, one major obstacle renders this approach unfeasible at present: the computation of regionally specific uncertainty models requires empirical data for each region or, ideally, every voxel of the reference space. In the present study, we demonstrated how estimates for the between-subject and between-template variances could be derived by investigating 14 cortical and 2 subcortical brain regions. To our knowledge, this analysis constitutes the most comprehensive assessment of the variance associated with functional imaging data to date. It is nevertheless clearly not sufficient to generate a whole-brain variance map. Such a map, however, would be a prerequisite for a more flexible model representing regionally specific uncertainties. Given an adequate amount of empirical data on the spatial variability of functional imaging results in various brain regions (which could be derived from a series of experiments employing a similar approach as described here), such a map could be constructed and readily integrated into the proposed framework.
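If such a whole-brain variance map became available, incorporating regionally specific uncertainties would amount to a simple per-focus lookup. The sketch below illustrates this idea only; the function name, the NaN-fallback convention, and all values are hypothetical, and a real implementation would use the map's actual voxel-to-world affine:

```python
import numpy as np

def local_uncertainty(focus_mm, variance_map, affine_inv, grand_mean_ed):
    # Hypothetical lookup: convert a coordinate in MNI millimetres to voxel
    # indices via the inverse affine, then read a regionally specific
    # Euclidean-distance estimate. Where no empirical estimate exists
    # (encoded here as NaN), fall back to the grand-mean ED used by the
    # stationary model.
    ijk = np.round(affine_inv @ np.append(focus_mm, 1.0))[:3].astype(int)
    ed = variance_map[tuple(ijk)]
    return grand_mean_ed if np.isnan(ed) else float(ed)
```

This would replace the grand-mean Euclidean distances in the uncertainty model on a per-focus basis while leaving the rest of the algorithm unchanged.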

While the motivation for modelling between-template variance is straightforward (coordinates from any of the normalisation approaches described here would be reported as “MNI space”), including the between-subject variance in meta-analyses of group results may seem counterintuitive. The main reason for this approach is the small sample size of typical neuroimaging studies and the resulting influence of unsystematic sampling errors on the localisation of group results. It should moreover be noted that in the proposed model the between-subject variance is inversely scaled by the (square root of the) sample size. This accounts for the notion that an activation reported in a study examining a small sample is potentially less reliable, as such results are more susceptible to individual outliers (in a single case, the added uncertainty equals the between-subject variance). Conversely, as the sample size increases, the sampling error and hence the uncertainty associated with a given focus will decrease.
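This scaling can be illustrated as follows. The variance components used here are placeholder values, not the empirical estimates reported in this study; only the structure of the combination (between-subject variance divided by N, between-template variance independent of N) follows the model described above:

```python
import math

def kernel_sigma(n_subjects, sigma_subject=5.0, sigma_template=2.5):
    # Placeholder variance components (in mm): the between-subject part
    # is divided by the sample size, so its contribution shrinks with
    # sqrt(N); the between-template part does not depend on N.
    return math.sqrt(sigma_template ** 2 + sigma_subject ** 2 / n_subjects)

def fwhm(sigma):
    # Full width at half maximum of a Gaussian: FWHM = 2 * sqrt(2 ln 2) * sigma
    return 2.0 * math.sqrt(2.0 * math.log(2.0)) * sigma
```

For a single subject the full between-subject variance is added; as N grows, the kernel width approaches the between-template floor, so larger studies are modelled with tighter distributions.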

Using the outlined model, foci derived from studies examining many subjects will hence be modelled by tighter distributions than foci reported in experiments investigating fewer subjects. Consequently, foci provided by the latter studies will be more blurred and have less localising impact on the ALE maps. In other words, studies that provide the most reliable information about the location of a particular process also receive the highest weight in the meta-analysis. Modelling the reduced spatial uncertainty in larger studies may therefore represent a well-motivated approach to weighting by sample size in coordinate-based meta-analyses. The comparative ALE meta-analyses of simulated datasets using both the original and the revised ALE approach clearly showed that the revised ALE model does indeed give a higher localising power to larger studies. Assuming that larger studies are less susceptible to sampling errors and hence report local maxima closer to their true location (as in our simulations), we suggest that this modification should result in a higher validity of coordinate-based meta-analysis results. In contrast to the simulated datasets, differences between both algorithms were inconspicuous in the analysis of the real finger-tapping data. This observation may predominantly be attributable to the rather small range of sample sizes among the analysed experiments. In particular, 28 of the 37 included experiments were based on the analysis of groups comprising between 8 and 13 subjects. In comparison to the more extreme situation in the simulated data, the influence of the specifically computed uncertainty was consequently much lower.
The second major advantage of the proposed uncertainty model, however, also pertains to the exemplary analysis presented here: Unlike previous algorithms using ALE or kernel density estimation (KDE), the revised meta-analysis approach does not require the kernel width to be subjectively specified by the user but rather makes use of an (empirical) model for spatial uncertainty.

Random vs. fixed-effects analyses

In the original ALE algorithm, permutation testing was performed by randomly relocating foci across the brain, resulting in a null-distribution for above-chance clustering of individual activations. The objective of meta-analyses, however, should pertain to above-chance clustering between experiments rather than convergence across individual foci. This difference becomes most evident when considering that some studies may report several different coordinates for local maxima within the same (larger) activation. In this case, an observed above-chance clustering of these coordinates may not indicate convergence between (independent) experiments, but merely a clustering of foci within a single one of the included experiments. To focus on the convergence of information across studies, the (non-informative) clustering between the individual foci reported for any given experiment should hence be treated as fixed. This approach has been implemented in the current version of the ALE algorithm by computing a “modelled activation” (MA) volume for each individual experiment as the sum of the Gaussian probability distributions for its foci. ALE scores are then obtained by the (voxel-wise) union of these MA maps across studies. To compute the appropriate null-distribution, one random voxel is drawn from each MA map (discarding its spatial location), and an ALE score is computed. By repeating this procedure, a null-distribution is constructed that reflects a random spatial association between different studies. Comparing the “true” ALE score to this distribution then allows inference to focus only on convergence between studies while preserving the relationship between individual foci within each study.
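The combination and permutation scheme just described can be sketched as follows, assuming each study's MA map is supplied as a flat vector of per-voxel probabilities. Function names and parameters are illustrative, and the probabilistic union is written in the standard form 1 − Π(1 − MAᵢ); this is a sketch of the described procedure, not the reference implementation:

```python
import numpy as np

rng = np.random.default_rng(0)

def ale_scores(ma_maps):
    # Voxel-wise union of modelled-activation (MA) maps across studies:
    # ALE = 1 - prod_i (1 - MA_i)
    ma = np.stack(ma_maps)                      # (n_studies, n_voxels)
    return 1.0 - np.prod(1.0 - ma, axis=0)

def null_distribution(ma_maps, n_perm=10000):
    # Random spatial association between studies: draw ONE random voxel
    # from each study's MA map (its location is discarded) and combine
    # the sampled values into an ALE score; repeat n_perm times.
    ma = np.stack(ma_maps)
    n_studies, n_voxels = ma.shape
    idx = rng.integers(0, n_voxels, size=(n_perm, n_studies))
    samples = ma[np.arange(n_studies), idx]     # (n_perm, n_studies)
    return 1.0 - np.prod(1.0 - samples, axis=1)

def p_values(ma_maps, n_perm=10000):
    # One-sided p-value per voxel: fraction of null ALE scores that
    # reach or exceed the observed score.
    obs = ale_scores(ma_maps)
    null = null_distribution(ma_maps, n_perm)
    return (null[None, :] >= obs[:, None]).mean(axis=1)
```

Because each permutation sample keeps one value per study, within-study clustering of foci is preserved in the null model, and only the between-study spatial association is randomised.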
Critically, this modification is conceptually equivalent to the distinction between a fixed-effects analysis, allowing generalisation only to the studies included in the analysis, and a random-effects model, allowing inference about the population of studies from which the analysed experiments were drawn. In the current paper, both approaches were compared to each other based on real (meta-analysis of finger-tapping experiments) and simulated datasets. Interestingly, the analysis of the finger-tapping data did not show pronounced differences between both algorithms. These congruent results indicate that the activations revealed by the classical ALE analysis of this dataset were predominantly driven by random effects (i.e., convergence between studies). In the simulation analysis, however, we also tested a case where the assumption that convergence between foci is equivalent to convergence between experiments was explicitly violated. In particular, we simulated a dataset which contained a region of strongly converging foci across different experiments as well as a second region which also showed a strong convergence between foci. Critically, however, all of the latter foci were derived from the same original experiment. That is, there was a dissociation between a fixed-effects convergence across foci (which was present) and a random-effects convergence across studies (which was absent). Comparative analysis then showed that the classical ALE approach indicated significance for both regions (as well as for other locations of accidental convergence between the randomly allocated foci). In contrast, the random-effects approach described here revealed the inferior frontal gyrus as the only region where a true convergence between foci reported in different experiments occurred.
This (simulated) example highlights the more conservative approach taken by random-effects analyses and provides a strong argument for the increased specificity (apparently without a loss of sensitivity) achieved by the revision of the classical ALE algorithm.

An alternative technique allowing random-effects inference in coordinate-based meta-analysis is kernel density estimation (KDE). Both KDE and ALE aim at identifying locations where reported coordinates show a higher convergence than expected by chance, but they do so using different approaches. ALE investigates how much the distributions of location probabilities modelled for each study overlap in different voxels. KDE, on the other hand, assesses how many foci are reported close to any individual voxel (Wager et al., 2007). The concept of random-effects (RDFX) analyses is nevertheless very similar between the algorithm described here and multi-level kernel density estimation (MKDE). In particular, in both approaches RDFX analyses are based on summarising all foci reported for any given study in a single image [the “modelled activation” (MA) map in ALE and “comparison indicator maps” (CIM) in MKDE]. These are then combined across studies. Inference is subsequently sought on those voxels where MA maps (ALE) or CIMs (MKDE) overlap more strongly than would be expected under a random spatial arrangement, i.e., no correspondence between studies. Both approaches also use a weighting for study size based on the square root of the number of subjects. While this factor is multiplicative in KDE, however, it influences the obtained uncertainty model in our approach (cf. formula 3). Other differences pertain to the permutation algorithm (randomly relocating cluster centres vs. combining randomly selected voxels) and the fact that MKDE uses a discount factor for fixed-effects studies, which is not the case in the approach described here.

Restriction of analysis space

Neuroimaging using fMRI and PET is based on haemodynamic changes initiated by vasodilatory mediators released by cortical and subcortical grey matter under increased computational and metabolic demand (Buxton et al., 2004; Fox and Raichle, 1986; Logothetis, 2003). Conversely, white matter, consisting only of fibre bundles, may not be expected to show task-evoked changes in blood flow. Activations should hence be confined to cortical and subcortical grey matter, even when considering the spatial dispersion of haemodynamic signals. This assumption was retrospectively confirmed by analysing the location of 35,196 activation foci included in the BrainMap database. After transformation into MNI space, 98.5% of these foci were located within the grey matter ROI used in our algorithm.

The fact that “true” activations occur almost exclusively in grey matter has important implications for the applied permutation test. In particular, if all intracranial voxels were included in this procedure, many of them would be drawn from regions where activation is known to be absent, such as the ventricles or the deep white matter. Evidently, these regions show values close to zero in their MA maps. The null-distribution would hence be skewed towards zero, and the significance of the experimental ALE scores overestimated. To correct for this bias and to provide a null-distribution closer to the experimental situation, the analysis space of the modified ALE algorithm was restricted to those voxels of MNI space where the probability for grey matter was >10%.
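The effect of restricting the sampling space can be demonstrated with a toy example. The data, threshold handling, and function below are purely illustrative; they only reproduce the qualitative argument that sampling over all voxels, including structural zeros, pulls the null distribution towards zero:

```python
import numpy as np

def null_samples(ma_map, mask, n=5000, rng=None):
    # Draw permutation samples only from voxels inside the analysis mask.
    # Without the mask, near-zero voxels (ventricles, deep white matter)
    # enter the null distribution and shift it towards zero, which would
    # make observed ALE scores appear more significant than they are.
    rng = rng or np.random.default_rng(0)
    return rng.choice(ma_map[mask], size=n)
```

Applying the same sampler with an all-true mask versus a grey-matter mask (probability > 10%) shows the grey-matter-restricted null sitting at higher values, i.e. a more conservative and more realistic reference distribution.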

Conclusions

The proposed revision of the activation likelihood estimation (ALE) algorithm overcomes several important drawbacks of the original implementation, namely the need for a manually defined width of the localisation probability distribution, the anatomically uninformed analysis space and the fixed-effects inference. To address the first shortcoming, we provided empirical estimates for the between-subject and between-template variance of neuroimaging foci. The subsequent analysis was then revised to test for convergence between studies (random effects) rather than between foci (fixed effects). This was achieved by a modification of the permutation procedure, which now reflects a null-distribution of a random spatial association between studies, not between foci. Importantly, this change to a random-effects approach allows generalisation of the results to the entire population of studies from which the analysed ones were drawn. Finally, rather than analysing each voxel in the reference space, including those in deep white matter or the ventricles, the revised ALE algorithm now works with an explicit grey matter mask, solving the problem of an anatomically uninformed analysis space.

Importantly, we could show that the results derived from this novel, theoretically motivated approach to ALE meta-analysis are comparable to those obtained from previous implementations when applied to experimental fMRI data. Simulation analyses confirmed this observation and demonstrated that the revised approach has a better specificity than the classical ALE analysis while retaining the high sensitivity of the previous approach. Incorporated into the BrainMap application GingerALE, the revised ALE algorithm will thus provide an improved tool for conducting coordinate-based meta-analyses of functional imaging data, which in turn should become of growing importance for summarising the multitude of results obtained by neuroimaging research.