The specific question at the center of this paper is whether one common neural substrate near the RTPJ is recruited during both the Theory of Mind and exogenous attention tasks. Ideally, any substantive claim for a common neural mechanism requires evidence of substantial overlap, and no reliable spatial separation, between the regions activated by the two tasks in individual subjects' brains. These standards of evidence are hard to achieve in the current case, because the region recruited by attentional reorienting is smaller and less reliable in individual subjects than the region recruited during Theory of Mind tasks
[2],
[14]. At any statistical threshold, partial overlap might be evidence that the two tasks recruit the same region to different degrees, or that the two tasks recruit neighboring but distinct neural populations whose peripheries are partly overlapping. To overcome these limitations, we used three analysis strategies: a direct comparison of the two regions in the eleven individuals in whom both regions could be identified, a cross-voxel correlation analysis within each region of interest, and group analyses using a bootstrap to estimate confidence intervals. All three analyses converged to suggest that Theory of Mind and attentional reorienting recruit distinct cortical regions near the RTPJ.
Mitchell
[14] recently reported that the activations associated with the two tasks, in group analyses, were substantially overlapping, and that ROIs defined by the Belief-Photo contrast significantly differentiated between Invalid and Valid trials of the attention task. We replicated this latter result in the ROIs, but we found much less evidence of overlap in direct tests. Specifically, cross-voxel correlation analyses suggested that distinct sub-populations within each region were driving the responses to the Belief-Photo and Invalid-Valid contrasts. The ROI average results may therefore reflect “bleed” of the functional response between two nearby regions or neural populations. In an ROI analysis, the functional responses of all voxels in the region are averaged together, so relatively few voxels with an overlapping response may be sufficient to generate a significantly different average response. This is also consistent with the observation that the magnitude of the attention effect was small in the belief region, and vice versa, and with strict conjunction analyses suggesting that the overlap between the two regions is relatively small and at the periphery of the two activations.
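The averaging argument can be illustrated with a small simulation. This is a minimal sketch on synthetic numbers, not the analysis reported here: the ROI size, the number of peripheral voxels, the effect sizes, and the noise level are all hypothetical values chosen for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical ROI: 200 voxels, of which 30 "peripheral" voxels
# respond to the attention contrast (all numbers are assumptions).
n_subjects, n_voxels, n_periphery = 20, 200, 30
periphery = np.arange(n_voxels) < n_periphery

# Assumed true effects per voxel: belief is strong in the core,
# attention is present only at the periphery.
belief_effect = np.where(periphery, 0.3, 1.0)
attention_effect = np.where(periphery, 1.0, 0.0)

# Subject-by-voxel contrast estimates with independent noise.
belief = belief_effect + rng.normal(0, 0.5, (n_subjects, n_voxels))
attention = attention_effect + rng.normal(0, 0.5, (n_subjects, n_voxels))

# ROI analysis: average over all voxels, then test across subjects.
roi_mean = attention.mean(axis=1)
t_stat = roi_mean.mean() / (roi_mean.std(ddof=1) / np.sqrt(n_subjects))

# Cross-voxel analysis: correlate the two group-mean voxel patterns.
voxel_corr = np.corrcoef(belief.mean(axis=0), attention.mean(axis=0))[0, 1]

print(f"ROI-average attention effect: {roi_mean.mean():.2f} (t = {t_stat:.1f})")
print(f"Cross-voxel correlation of the two contrasts: {voxel_corr:.2f}")
```

In this toy case the ROI average shows a reliable attention effect even though only a minority of voxels carry it, while the voxel-wise patterns of the two contrasts are negatively correlated, which is the kind of dissociation an ROI average alone would hide.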
Given these results, we suggest that the substantial overlap observed in the previous study may have been partly due to partial voluming effects in lower-resolution data. Nearby functional regions that are conflated in low-resolution data can often be distinguished at higher resolution (e.g.
[12],
[18]). For example, response inhibition tasks yield bilateral activation in the anterior cingulate at low resolution (2×2×4 mm) but strongly right-lateralized activation at high resolution (1.5 mm isotropic,
[19]). In the current study, the separation between the peaks of the two regions was estimated to be 6–10 mm, approximately two voxels at the resolution of the previous paper (3.75×3.75×6 mm,
[14]), making these regions nearly impossible to resolve. The higher resolution we used (1.6×1.6×2.4 mm) was probably a key factor in the difference between the current and previous conclusions in the overlap analyses.
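The voxel arithmetic behind this comparison can be made explicit. The sketch below uses only the separation and voxel sizes quoted above; because the orientation of the separation relative to the acquisition axes is not stated here, it reports a range (separation along the largest versus the smallest voxel dimension), which is an assumption of this illustration.

```python
# Separation between peaks (mm) and voxel sizes (mm) quoted in the text.
separation_mm = (6.0, 10.0)
resolutions = {
    "previous study": (3.75, 3.75, 6.0),
    "current study": (1.6, 1.6, 2.4),
}

def separation_in_voxels(separation_mm, voxel_mm):
    """Range of the peak separation in voxel units.

    Lower bound: smallest separation along the largest voxel dimension.
    Upper bound: largest separation along the smallest voxel dimension.
    """
    low = min(separation_mm) / max(voxel_mm)
    high = max(separation_mm) / min(voxel_mm)
    return low, high

voxel_ranges = {
    name: separation_in_voxels(separation_mm, voxel_mm)
    for name, voxel_mm in resolutions.items()
}

for name, (low, high) in voxel_ranges.items():
    print(f"{name}: {low:.1f} to {high:.1f} voxels")
```

Under these assumptions the peaks are roughly 1 to 3 voxels apart at the earlier resolution and roughly 2.5 to 6 voxels apart at the resolution used here.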
One challenge for overlap analyses, though, is that the two functional contrasts may not be matched in power. In fact, it is unclear how to compare the power of the two experiments. Some considerations favor the attention effect. We measured the response to 16–24 invalid attention trials per individual, and only 12 belief trials per individual, allowing for a more accurate estimate of the amplitude of the response to invalid vs. valid trials; and our temporal model for the invalid cue (the onset of the target) was more precise than for the onset of belief representation, allowing for better prediction of the hemodynamic response in the attention task. On the other hand, the sentences implying a character's beliefs were presented for 10 seconds, whereas the target in the attention task was presented for less than a second. Since the latency, reliability, and duration of the two cognitive processes (detecting that a cue was invalid, constructing a belief representation) are all unknown, a precise estimate of the relative power of the two experiments is hard to derive. In subsequent analyses, we therefore analyzed the position rather than the extent of these activations. Because this approach utilized only peak coordinates, it was relatively immune to differences in power across the experiments.
We used a non-parametric bootstrap procedure to estimate the relative spatial positions of the two regions of activation. The voxels showing the strongest response in the two tasks, in the group Random Effects analyses, were separated by 2×6×10 mm. However, traditional group analyses do not provide a way to estimate confidence intervals for this measurement. The bootstrap technique is particularly useful for estimating the distribution of a statistic in a situation like this one, in which a measurement can be obtained from a group average but not from individuals. Bootstrapping is a simple, widely used, but computationally intensive method for estimating uncertainty in statistics of interest
[20],
[21]. Although quite common in the broader scientific community, to our knowledge this is the first time the bootstrap has been used to estimate confidence in the location of peak activations in fMRI data (but see
[22]).
Non-parametric samples for the bootstrap are constructed by sampling randomly with replacement from the original sample of participants; the peak location of the group average (or other statistic) is then calculated for each task contrast. Since individuals from the original sample contribute differentially to each bootstrap sample (i.e. in any one bootstrap sample of size n, each of the original individuals is represented between zero and n times), the variability of the means across the bootstrap samples provides an estimate of the variability across individuals in the original sample. Efron and Tibshirani
[15] report that 25 to 200 bootstrap samples may be required to accurately compute the bootstrap estimate of the standard error of a parameter like the peak locations in the current study; we used 150 samples. This technique proved very useful. We were able to estimate that while the observed separation between the peaks in the anterior-posterior and medial-lateral axes was not reliably greater than zero, there was a highly reliable segregation of the peaks of the two regions in the inferior-superior axis. The same direction and magnitude of separation was apparent in the eleven individual subjects in whom activated clusters could be detected in both contrasts.
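The resampling logic described above can be sketched as follows. This is an illustration on synthetic data under stated assumptions, not the pipeline used in the study: the group peak is taken from a simple mean map rather than a Random Effects analysis, the function names, grid, peak centers, noise level, and subject count are hypothetical, and a percentile interval is used as one common way to summarize the bootstrap distribution. Only the choice of 150 bootstrap samples comes from the text.

```python
import numpy as np

def peak_coordinate(subject_maps, coords):
    """Coordinate (mm) of the voxel with the largest group-mean response."""
    return coords[np.argmax(subject_maps.mean(axis=0))]

def bootstrap_peak_separation(maps_a, maps_b, coords, n_boot=150, seed=0):
    """Bootstrap distribution of (peak of B minus peak of A), per axis."""
    rng = np.random.default_rng(seed)
    n_subjects = maps_a.shape[0]
    separations = np.empty((n_boot, 3))
    for b in range(n_boot):
        # Draw n subjects with replacement; use the same draw for both contrasts.
        idx = rng.integers(0, n_subjects, size=n_subjects)
        separations[b] = (peak_coordinate(maps_b[idx], coords)
                          - peak_coordinate(maps_a[idx], coords))
    return separations

def simulate(center, coords, n_subjects, rng, noise_sd=0.2, width_mm=4.0):
    """Synthetic subject maps: a Gaussian blob at `center` plus noise."""
    d2 = ((coords - np.asarray(center)) ** 2).sum(axis=1)
    signal = np.exp(-d2 / (2 * width_mm ** 2))
    return signal + rng.normal(0.0, noise_sd, size=(n_subjects, coords.shape[0]))

# Hypothetical 2 mm grid and peak centers, differing only along z.
rng = np.random.default_rng(1)
axes = [np.arange(50, 62, 2), np.arange(-60, -44, 2), np.arange(10, 36, 2)]
coords = np.stack(np.meshgrid(*axes, indexing="ij"), axis=-1).reshape(-1, 3).astype(float)
belief_maps = simulate((56, -54, 18), coords, 20, rng)
attention_maps = simulate((56, -54, 26), coords, 20, rng)

separations = bootstrap_peak_separation(belief_maps, attention_maps, coords)
ci_low, ci_high = np.percentile(separations, [2.5, 97.5], axis=0)
for axis_name, lo, hi in zip("xyz", ci_low, ci_high):
    print(f"{axis_name}: 95% interval [{lo:.1f}, {hi:.1f}] mm")
```

An axis whose interval excludes zero would be read as a reliable separation along that axis, which mirrors the inferior-superior result reported above.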
The inferior-superior segregation of the two activations in the current data is also consistent with the results of previous studies. First, Mitchell
[14] reported that the average peak of the Theory of Mind regions in three previous papers was [56 −54 19], and the average peak of the attention region in five previous papers was [55 −50 26]. The attention region was therefore on average 7 mm superior to the belief region, consistent with the distance estimated by our bootstrap. Second, Decety and Lamm
[16] recently conducted a meta-analysis of seventy previously published group analyses of attention and Theory of Mind. Again, the authors reported that attention tasks produced an average peak activation 10 mm superior to the average peak of Theory of Mind tasks. Decety and Lamm
[16] concluded that a spatial separation of 10 mm was consistent with a single underlying cortical region. However, the convergence of the meta-analysis with our current results is more consistent with a real dissociation between two distinct regions.
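The offset between the two average peaks reported by Mitchell can be checked directly from the coordinates quoted above. The variable names are illustrative only.

```python
import numpy as np

# Average peak coordinates (mm) quoted in the text.
theory_of_mind_peak = np.array([56, -54, 19])
attention_peak = np.array([55, -50, 26])

# Per-axis offset (attention minus Theory of Mind) and straight-line distance.
offset = attention_peak - theory_of_mind_peak
distance = float(np.linalg.norm(offset))

print("offset (x, y, z) in mm:", offset)         # [-1  4  7]
print(f"Euclidean distance: {distance:.2f} mm")  # 8.12 mm
```

The third component of the offset is the 7 mm inferior-superior difference; the other two axes contribute comparatively little to the overall distance of about 8 mm.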
In sum, Theory of Mind and exogenous attention appear to recruit neighboring but distinct regions of cortex. These results are consistent with the (intuitive) idea that these tasks do not share a common cognitive component process, although it is of course still possible that these two regions are neighboring because they are functionally or ontogenetically related to one another.
Note that prior data already made it very unlikely that the results of either paradigm included a confound of the other. The attention task does not covertly depend on Theory of Mind
[14]. RTPJ recruitment has been observed for exogenous attention tasks that don't involve any “false cueing” manipulation
[9]. Similarly, the results observed in Theory of Mind tasks are not confounded with shifts of exogenous attention. False photographs and false maps provide a well-matched control for false beliefs in terms of logical and inhibitory demands, as well as reading times and syntactic complexity, but do not recruit the RTPJ
[2],
[3],
[23]. The response of the RTPJ for Theory of Mind tasks generalizes from verbal to pictorial stimuli
[24], and from visual to aural presentation (Bedny et al., in preparation), each of which creates different attentional demands. The initial response of the RTPJ is specific in time to the moment when a belief is presented, but independent of both the truth-value and the emotional valence of the belief content
[5],
[25]; that is, the response is equally high for true and false beliefs, for negatively- or positively-valenced beliefs, and for beliefs shared or not shared by the participant. Finally, even when the stimuli and the subjects' responses are all physically identical, just changing the task instructions from an abstract rule to answering a question about a person's thoughts is sufficient to elicit enhanced recruitment of the RTPJ
[6]. Overall, this profile cannot be explained away in terms of attentional shifts, and instead suggests a neural mechanism involved in thinking about thoughts and beliefs.
The question for the current paper was therefore not whether the tasks used in prior studies were confounded. Instead, evidence of a common neural region would have suggested the presence of a component process, not evident from intuitive task analyses, but shared by both tasks. In principle, this kind of evidence could be an important contribution of fMRI to cognitive science. However, the current results illustrate some of the challenges for establishing that two dissimilar tasks share a common neural substrate based on overlapping activations in fMRI data. The low resolution of typical fMRI data relative to the true functional resolution of cortex, along with the combined effects of partial voluming, pooling due to shared vasculature
[12], distortions during normalization for group averaging, and spatial smoothing all conspire to bias fMRI analyses towards findings of spurious overlap. In the specific case under investigation here, the regions of RTPJ implicated in exogenous attention and Theory of Mind, our results suggest that the regions recruited for the two tasks are nearby but distinct, and consequently there is no need to posit a common psychological mechanism.
Table 1. Direct comparison of the activations for Theory of Mind and attentional reorienting.