Voluntary action, in particular the ability to produce desired effects in the environment, is fundamental to human existence. According to ideomotor theory, we achieve goals in the environment by anticipating the outcomes of our actions. We aimed to provide neurophysiological evidence for the assumption that performing an action activates the brain areas associated with the sensory effects that the action usually evokes. We conducted an fMRI study in which right and left button presses led to the presentation of face and house pictures. We compared a baseline phase with an identical test phase that followed after participants had experienced the association between button presses and pictures. We found an increase in activity in the parahippocampal place area (PPA) for the response that had been associated with house pictures and in the fusiform face area (FFA) for the response that had been coupled with face pictures. This observation supports ideomotor theory.
The ability to produce desired effects in the environment constitutes an essential aspect of human behavior (cf. Haggard, 2008). The concept of voluntary action is omnipresent in our interaction with the environment and in our social lives. It bears not only on philosophical issues (free will) but also on ethical ones (accountability). Moreover, disorders of voluntary action are characteristic of many psychiatric and neurological conditions (Jahanshahi and Frith, 1998).
Yet, it is only rather recently that research has begun to investigate voluntary action experimentally. One focus of this research has been the fact that to purposefully achieve an action goal presupposes knowledge about action–effect relationships. Without this knowledge any action effect would be accidental and goal-directed action impossible. The ideomotor theory accentuates the role of the anticipation of sensory effects in action control (cf., Lotze, 1852; Harless, 1861; James, 1890). In particular, it claims that performing an action results in a bidirectional association between the action's motor code and the sensory effects the action produces. Once acquired, these associations can be used to select an action by anticipating or internally activating their perceptual consequences (e.g., Greenwald, 1970; Prinz, 1997; Elsner and Hommel, 2001; Herwig et al., 2007; Kühn et al., 2010).
A number of studies have corroborated the ideomotor account of action control. For instance, Elsner and Hommel (2001) used a paradigm similar to the one reported below. They investigated the incidental learning of arbitrary action–effect associations by making their participants first undergo an acquisition phase, in which two key presses always produced two particular tones. In the subsequent test phase, the same tones were presented as target stimuli for a speeded-choice response. Elsner and Hommel report that the choice responses were faster in response to the tone that the action had previously produced than to the tone that had been produced by the alternative action. This demonstrates that, during the acquisition phase, participants acquire bidirectional associations between the motor code of the action and the perceptual code of the auditory effect. Neuroimaging studies, although scarce, confirm the conclusions drawn from the behavioral experiments. Elsner et al. (2002) showed that the mere perception of auditory stimuli that had previously been presented as effect tones of certain actions results in the activation of neural motor structures (see also Mutschler et al., 2007; Melcher et al., 2008). Thus, the perception of action effects directly activates associated motor codes. The aim of the experiment presented below is to investigate the reverse link, viz. whether acting activates the corresponding perceptual representation.
Related to the ideomotor approach to action control, and extending it, is the common coding theory. It goes so far as to claim that perception and action share a common representational code (e.g., Prinz, 1990; Hommel et al., 2001): actions are coded in terms of the distal perceptual effects they evoke in the environment. As a consequence, perceiving an action effect involves the same representation as performing the associated action and, conversely, performing an action involves the same representation as perceiving the associated effect.
The common coding principle has been corroborated by a number of studies (see Hommel et al., 2001, 2003; Schütz-Bosbach and Prinz, 2007; Hoffmann et al., 2009). For example, Kunde (2001, 2003) showed that compatibility between responses and their consistent sensorial effects influences performance in choice reaction tasks as if the forthcoming effects were already sensorially present. However, none of these studies showed neurophysiological evidence for their most straightforward prediction, viz. that performing an action involves the activation of the sensory effect it usually triggers.
The present study does so by harnessing the modularity of perceptual category representation in the human brain. In particular, it has been shown that faces and houses are represented in the fusiform face area (FFA) and in the parahippocampal place area (PPA), respectively (Kanwisher et al., 1997; Epstein and Kanwisher, 1998). We made subjects acquire an association between left and right key presses and face and house stimuli as action effects. Since activity in the PPA and the FFA can be easily dissociated with functional magnetic resonance imaging (fMRI; see Heekeren et al., 2004), we were subsequently able to test whether performing an action in the absence of face and house effect stimuli yields activity in the cortical areas involved in the perceptual representation of the sensory effects that the actions used to trigger.
We recruited nineteen healthy participants (three males; age: mean=21.0, ranging from 19 to 24) from whom we obtained written consent prior to the scanning session. All subjects had normal or corrected-to-normal vision. No subject had a history of neurological, major medical, or psychiatric disorder. All participants were right-handed as assessed by a handedness questionnaire (Oldfield, 1971). We had to exclude two participants, one because of severe movement artifacts and the other because of technical failure.
During the first part of the fMRI experiment, the baseline phase, participants lay in the scanner and had to respond as fast as possible with the index finger of either their right or their left hand whenever a white square was presented on the screen (Figure 1). Participants were asked to select their responses voluntarily but to aim at an equal distribution of right and left button presses without following a particular rule. The white squares remained on the screen until a response was given and were presented with a jittered inter-trial interval of 2000–4000ms (varied in steps of 500ms). After 100 trials participants were instructed via the intercom that their responses would now be followed by pictures to which they did not have to pay special attention. During this acquisition phase right and left hand responses were consistently followed by a certain stimulus category (e.g., right → face, left → house). The mapping was balanced between subjects. We used five different black and white pictures of faces and of houses (Heekeren et al., 2004) that were presented for 200ms in the square, starting 50ms after a button press occurred. After two runs consisting of 400 acquisition trials participants were exposed to the test phase, which was identical to the baseline phase. In total the experiment consisted of four runs and lasted approximately 45min.
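The trial timing described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the text does not say how the inter-trial intervals were sampled, so uniform sampling over the five step values is assumed, and the names `draw_itis` and `picture_event` are hypothetical helpers, not part of the original experiment software.

```python
import random

# Timing values taken from the procedure description (all in ms).
ITI_CHOICES_MS = [2000, 2500, 3000, 3500, 4000]  # 2000-4000 ms in 500 ms steps
PICTURE_DELAY_MS = 50       # picture appears 50 ms after the button press
PICTURE_DURATION_MS = 200   # picture stays on screen for 200 ms


def draw_itis(n_trials, seed=None):
    """Draw one jittered inter-trial interval per trial.

    Assumption: each step value is equally likely; the paper does not
    specify the sampling scheme.
    """
    rng = random.Random(seed)
    return [rng.choice(ITI_CHOICES_MS) for _ in range(n_trials)]


def picture_event(response, press_time_ms, mapping):
    """Return (category, onset_ms, offset_ms) of the effect picture
    that follows a button press during the acquisition phase."""
    category = mapping[response]
    onset = press_time_ms + PICTURE_DELAY_MS
    return category, onset, onset + PICTURE_DURATION_MS


# One of the two counterbalanced mappings (example from the text).
mapping = {"right": "face", "left": "house"}
itis = draw_itis(100, seed=1)
event = picture_event("right", 1000, mapping)
```

In the baseline and test phases no picture follows the response, so only `draw_itis` would apply there.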
After completion of the experiment subjects were asked whether they noticed an association between the button press and the pictures presented. If participants did notice, they were asked to specify the association. If they did not notice, they were specifically asked whether they noticed that a certain button press always elicited the same category of pictures.
Notice that our paradigm did not include a reaction time test phase assessing the degree to which participants had learned the action–effect associations [as in the study by Elsner and Hommel (2001)], for two reasons. First, behavioral effects of action–effect associations have been replicated in a number of experiments (e.g., Elsner and Hommel, 2004; Herwig et al., 2007; Herwig and Waszak, 2009). Second, after the fMRI test phase, the focus of the present study, the action–effect associations may have decayed anyway.
Images were collected with a 3T Magnetom Trio MRI scanner system (Siemens Medical Systems, Erlangen, Germany) using an 8-channel radiofrequency head coil. First, high-resolution anatomical images were acquired using a T1-weighted 3D MPRAGE sequence (TR=1550ms, TE=2.39ms, TI=900ms, acquisition matrix=256×256×176, sagittal FOV=220mm, flip angle=9°, voxel size=0.9×0.9×0.9mm³). Whole brain functional images were collected using a T2*-weighted EPI sequence sensitive to BOLD contrast (TR=2000ms, TE=35ms, image matrix=64×64, FOV=224mm, flip angle=80°, slice thickness=3.0mm, distance factor=17%, voxel size=3.5×3.5×3mm³, 30 axial slices). Per run, 390 image volumes aligned to AC-PC were acquired.
The fMRI data were analyzed with statistical parametric mapping using the SPM5 software (Wellcome Department of Cognitive Neurology, London, UK). The first four volumes of all EPI series were excluded from the analysis to allow the magnetization to approach a dynamic equilibrium. Data processing started with slice time correction and realignment of the EPI datasets. A mean image for all EPI volumes was created, to which individual volumes were spatially realigned by rigid body transformations. The high-resolution structural image was co-registered with the mean image of the EPI series. Then the structural image was normalized to the Montreal Neurological Institute (MNI) template, and the normalization parameters were applied to the EPI images to ensure an anatomically informed normalization. During normalization the anatomy image volumes were re-gridded to 1×1×1mm3. A commonly applied spatial filter of 8mm full-width at half maximum (FWHM) was used. Low-frequency drifts in the time domain were removed by modeling the time series for each voxel by a set of discrete cosine functions to which a cut-off of 128s was applied. The subject-level statistical analyses were performed using the general linear model (GLM). We modeled the onset of the squares during the baseline phase and the test phase separately for right and left hand responses. Moreover the model contained regressors of the onset of the face and the house picture in the acquisition phase. Vectors containing the event onsets were convolved with the canonical hemodynamic response function (HRF) to form the main regressors in the design matrix (the regression model). The vectors were also convolved with the temporal derivatives and the resulting vectors were entered into the model. The statistical parameter estimates were computed separately for each voxel for all columns in the design matrix. 
Contrast images were constructed from each individual to compare the relevant parameter estimates for the regressors containing the canonical HRF.
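To make the construction of the main regressors concrete, the following Python sketch convolves a vector of event onsets with a double-gamma response function sampled at the TR of 2 s. It is a simplified illustration, not the SPM5 implementation: the gamma parameters (response peak parameter 6, undershoot parameter 16, ratio 1/6) are SPM's documented defaults for the canonical HRF, but SPM additionally works at a finer time resolution and normalizes the function, and the temporal derivative regressors are omitted here. The function names and the example onsets are hypothetical.

```python
import math


def gamma_pdf(t, shape):
    """Gamma density with unit scale, evaluated at time t (seconds)."""
    if t <= 0:
        return 0.0
    return t ** (shape - 1) * math.exp(-t) / math.gamma(shape)


def canonical_hrf(tr, length_s=32.0):
    """Double-gamma HRF sampled every `tr` seconds (simplified)."""
    n = int(length_s / tr) + 1
    return [gamma_pdf(i * tr, 6.0) - gamma_pdf(i * tr, 16.0) / 6.0
            for i in range(n)]


def build_regressor(onsets_s, n_scans, tr):
    """Convolve a stick function of event onsets with the HRF."""
    sticks = [0.0] * n_scans
    for onset in onsets_s:
        idx = int(round(onset / tr))  # nearest scan; an approximation
        if 0 <= idx < n_scans:
            sticks[idx] += 1.0
    hrf = canonical_hrf(tr)
    regressor = [0.0] * n_scans
    for i, s in enumerate(sticks):
        if s == 0.0:
            continue
        for j, h in enumerate(hrf):
            if i + j < n_scans:
                regressor[i + j] += s * h
    return regressor


# 390 volumes per run minus the 4 discarded volumes leaves 386 scans.
# The onsets below are made-up values for illustration only.
left_test = build_regressor([0.0, 12.5, 30.0], n_scans=386, tr=2.0)
single = build_regressor([0.0], n_scans=386, tr=2.0)
```

With this sampling, the response to a single event peaks at the scan 6 s after onset, which fits the 4–6 s window used for the ROI analysis.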
To explore the neural correlates of the reactivation of a pictorial stimulus resulting from an acquired action–effect binding, we focused on percent signal changes in brain areas related to face and house processing, namely FFA and PPA. We defined the regions of interest (ROIs; spheres with a radius of 6mm around the peak coordinate) individually for each participant on the contrasts of the face and house presentation during the acquisition phase, thresholded at a significance level of p<0.001. Two ROIs were defined in bilateral FFA, resulting from the whole-brain contrast of face>house in the acquisition phase. Moreover, we defined two ROIs in bilateral PPA, resulting from the whole-brain contrast of house>face in the acquisition phase. The mean MNI coordinates were (−24, −54, −17) (mean SD=6.4) for the left PPA, (27, −48, −17) (5.3) for the right PPA, (−40, −60, −21) (4.6) for the left FFA, and (43, −57, −21) (4.9) for the right FFA (similar to, e.g., Haxby et al., 1999). We then used these ROIs to assess the difference between brain activity during the baseline phase and the test phase. For each subject, region, and condition separately, the mean percent signal change over a time window of 4–6s after stimulus onset was calculated and analyzed by means of a ROI (FFA vs. PPA) × hemisphere (left vs. right) × experimental phase (baseline vs. test) × predicted effect (face vs. house) repeated measures ANOVA. Significant effects are reported with a threshold of p<0.05.
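The two ingredients of the ROI analysis, a 6 mm sphere around a peak voxel and the mean percent signal change in the 4–6 s window, can be sketched as follows. This is a minimal illustration under assumptions that the text does not settle: the voxel size of the normalized functional images is not reported (3 mm isotropic is assumed here purely for the example), and the exact baseline used for the percent signal change is not specified (a single baseline value is assumed). The helper names are hypothetical.

```python
def sphere_offsets(radius_mm, voxel_mm):
    """Voxel offsets (i, j, k) whose centers lie within radius_mm of the
    peak voxel, for an isotropic grid with spacing voxel_mm."""
    n = int(radius_mm // voxel_mm)
    offsets = []
    for i in range(-n, n + 1):
        for j in range(-n, n + 1):
            for k in range(-n, n + 1):
                dist_sq = (i * voxel_mm) ** 2 + (j * voxel_mm) ** 2 + (k * voxel_mm) ** 2
                if dist_sq <= radius_mm ** 2:
                    offsets.append((i, j, k))
    return offsets


def mean_percent_signal_change(timecourse, baseline, tr, window_s=(4.0, 6.0)):
    """Mean percent signal change of an event-locked ROI time course.

    timecourse[0] is the sample at event onset; samples are tr seconds
    apart. Only samples inside window_s (inclusive) are averaged.
    """
    values = []
    for idx, signal in enumerate(timecourse):
        t = idx * tr
        if window_s[0] <= t <= window_s[1]:
            values.append((signal - baseline) / baseline * 100.0)
    return sum(values) / len(values)


# Assumed 3 mm isotropic grid; radius of 6 mm as in the text.
roi = sphere_offsets(6.0, 3.0)
# Made-up event-locked samples at 0, 2, 4, 6 and 8 s (TR = 2 s).
psc = mean_percent_signal_change([100, 100, 102, 104, 101], 100.0, 2.0)
```

With a TR of 2 s, the 4–6 s window covers exactly two samples per event, so each value entering the ANOVA would be an average over those two samples across all events of a condition.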
To rule out that participants used simple alternation strategies to achieve an equal distribution, we calculated the mean random number generation (RNG) measure (Evans, 1978). This measure captures the randomness of the sequence, namely the dependency between one choice and the next. Our data yielded an average RNG of 0.827. Compared with randomly produced sequences of two equally distributed response options, no significant difference was detected in the RNG measure (0.826, t(16)=0.47, p=0.65).
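As an illustration of the measure, the RNG index is commonly written as the ratio of the summed n·log(n) terms over response pairs (digrams) to the summed n·log(n) terms over single responses, with values near 1 indicating that each choice is predictable from the previous one. The Python sketch below follows that common formulation; it is an assumption that this matches the exact variant used in the study (for example, no wrap-around pair from the last to the first response is counted here), and the function name is hypothetical.

```python
import math
from collections import Counter


def rng_index(sequence):
    """RNG index of a response sequence (after Evans, 1978).

    Numerator: sum of n_ij * log(n_ij) over observed digram counts.
    Denominator: sum of n_i * log(n_i) over counts of the first
    element of each digram. No wrap-around digram is added.
    """
    firsts = sequence[:-1]
    digrams = Counter(zip(firsts, sequence[1:]))
    singles = Counter(firsts)
    numerator = sum(n * math.log(n) for n in digrams.values())
    denominator = sum(n * math.log(n) for n in singles.values())
    if denominator == 0:
        raise ValueError("sequence too short or too sparse for the RNG index")
    return numerator / denominator


# Strict alternation: every choice is determined by the previous one.
alternating = rng_index(list("LR" * 50))
# A balanced sequence in which all four digrams occur about equally often.
balanced = rng_index(list("LLRR" * 25))
```

For 100 two-choice responses with all four digrams about equally frequent, the index comes out near log(25)/log(50) ≈ 0.82, which is in the range of the values reported above, whereas strict alternation gives an index of 1.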
In a repeated measures ANOVA with the factors ROI, hemisphere, experimental phase, and predicted effect, the three-way interaction of ROI, experimental phase, and predicted effect was significant (F(1,16)=10.62, p=0.005; Figure 2). Post hoc paired t-tests revealed a significant difference between the face and house associated response in PPA (t(16)=−4.28, p<0.01) and a marginally significant difference in FFA (t(16)=2.06, p=0.056) during the test phase, but no significant differences during the baseline phase (PPA: t(16)=0.81, p=0.43; FFA: t(16)=−0.37, p=0.72). This indicates, in line with our prediction, that participants reactivate the consequence of their actions as they learned it during the acquisition phase. Furthermore, the interaction of ROI and predicted effect (F(1,16)=5.38, p<0.05) and the main effect of ROI (F(1,16)=9.49, p<0.01) were significant.
To test whether participants showed this selective PPA and FFA activity during the test phase even if they were not aware of the action–effect manipulation during the acquisition phase, we ran a separate ANOVA (ROI × experimental phase × predicted effect) including only those five subjects who, during the debriefing after the experiment, reported not having noticed the mapping between the two keys and the two classes of effect stimuli. The three-way interaction of this ANOVA was significant: F(1,4)=14.77, p=0.018. Post hoc paired t-tests revealed significantly higher activation for the house associated than for the face associated response during the test phase in PPA (t(4)=−5.60, p<0.01), but no significantly higher activation for the face associated than for the house associated response during the test phase in FFA (t(4)=0.844, p=0.45). We will come back to this result in the discussion.
Furthermore we computed whole-brain analyses comparing the test phase with the baseline phase, separately for responses associated with faces and responses associated with houses. The conjunction of those contrasts (thresholded at p<0.001, cluster>22, based on Monte Carlo simulation determined minimum cluster size above which the probability of type I error was below 0.05, Alphasim, Ward, 2000) revealed significant activation in bilateral superior parietal cortex (BA 7) for the contrast test phase>baseline phase. This whole-brain analysis is confounded by the fact that the test phase was always presented later in the experiment than the baseline phase and should therefore be treated cautiously. The ROI analysis on the other hand is based on interactions between conditions and therefore circumvents this problem.
To investigate whether voluntary actions involve the activation of perceptual representations of learned action effects, we assessed brain activity in FFA and PPA during self-selected actions that previously triggered face or house stimuli.
During the baseline phase activity in FFA and PPA was not different between the two actions we investigated. However, activity in the test phase was elevated in the FFA for actions that in the acquisition phase triggered presentation of face stimuli (compared to actions that previously triggered house stimuli). By contrast, the opposite was true for the PPA: here activity was higher for actions that previously triggered presentation of house stimuli than for actions that previously triggered face stimuli. Notice that we observed these differences in activity in the absence of any action effect. It is thus the action itself that induces activation in FFA and PPA.
These results present straightforward evidence for the ideomotor principle of action control and the common coding principle, which claim that motor plans represent the sensory events that the action habitually evokes in the environment (Prinz, 1990; Hommel et al., 2001; Kühn et al., 2010). Participants learn that a specific action (left/right key press) results in a certain effect in the surroundings (faces/houses). Once this knowledge is acquired, the activation of the perceptual representation of these effects becomes an integral part of the preparation and execution of the actions. The whole-brain contrast revealed stronger superior parietal activation after the association between a specific action and its effect had been established. In the literature on the “binding problem”, parietal cortex has been implicated in binding processes during visual feature perception (Shafritz et al., 2002) as well as in the context of visual short-term memory (Xu, 2007). Our data suggest that superior parietal cortex may also be involved in the binding of actions and their effects in the environment.
Recently, it has been suggested that the hemodynamic BOLD response measured in fMRI, not only implies a neural response, but can also reflect a preparatory mechanism that produces additional arterial blood in anticipation of an expected task or event (Sirotin and Das, 2009). While our current study cannot distinguish the contributions of these two components to the observed elevated sensory activation, the study of Waszak and Herwig (2007) demonstrates a direct neural contribution. Using event-related potentials (ERPs), Waszak and Herwig (2007) made subjects first work through an acquisition phase, in which a self-selected key press was always followed by a certain tone (e.g., left key press → high-pitch tone; right key press → low-pitch tone), establishing an association between the particular action and the perceptual code of the effect tone. In the test phase of the experiment subjects were required to perform random series of left and right key presses. The action triggered randomly one of the experimental stimuli of a typical oddball task (i.e., most of the time a standard tone and, rarely, a perceptually deviant tone was presented). Deviant and standard stimuli were the same tones used as effect tones in the first phase of the experiment. Deviant stimuli elicited a larger P3a when the action that triggered stimulus presentation was associated with the standard tone than when it was associated with the deviant tone, indicating a larger orienting response in the former case. Thus, the experiment of Waszak and Herwig shows, in accordance with the present study, that performing a voluntary action involves the anticipation of sensory events the movement usually brings about in the environment.
The results of the present study extend current knowledge on ideomotor action in that they demonstrate that the activation of sensory effect representations actually draws on processes that are tightly associated with the genuine perception of these sensory events. That PPA and FFA are associated with the perceptual representation of houses and faces has been shown by Tong et al. (1998). These authors assessed the BOLD response in PPA and FFA to monitor stimulus-selective responses during binocular rivalry. When a face and a house stimulus are presented to different eyes, subjects experience spontaneous alternations every few seconds between a face percept and a house percept. Tong et al. showed that these perceptual alternations are accompanied by time-locked BOLD responses in the FFA and PPA that correlate with the content of visual awareness, suggesting that activity in the FFA and PPA reflects the perceived stimulus.
However, selective activation within the FFA and the PPA has not only been demonstrated during the perception of faces and houses, respectively, but also when the internal representations of face and house stimuli have to be maintained over time or re-evoked in the absence of any external stimulation. O'Craven and Kanwisher (2000) found top–down effects in FFA and PPA during mental imagery of faces and scenes. Likewise, Johnson et al. (2007) made subjects think briefly of a just-seen, but no longer present, face/scene stimulus. This act of “refreshing” faces and scenes modulated activity in, among other regions, the FFA and PPA, respectively. Finally, a number of studies have demonstrated that these areas also exhibit delay-period activity during working memory maintenance of the appropriate category of stimuli (e.g., Druzgal and D'Esposito, 2003; Ranganath et al., 2004). All these observations are in line with the view that reflective thought using internal representations of stimuli entails top–down modulation of activity in perceptual cortical regions initially activated during the perception of these stimuli.
The present study complements these findings in that it shows that the tight coupling between visual perception of external stimuli and internal representation of stimuli in the absence of sensory input has to be extended to include the internal anticipation of sensory action effects. It is important to note that even those participants who were not aware of the action–effect relationship during the acquisition phase showed selective activity in PPA during the test phase. This finding suggests that our participants did not actively generate internal representations of houses and faces while pressing the keys, because this would presuppose knowledge about the association between the two keys and the two effect classes. However, we would like to point out that the debriefing procedure we used did not include a response-bias free measure of the awareness of the action–effect coupling. Moreover, the effects in question were significant only in the PPA but not in the FFA. Thus, the conclusion that the effects we reported above are independent of awareness of the association is only suggestive and has to be taken with caution.
Instead, we suggest that imagery and internal effect anticipation rely on shared mechanisms, with “perceptual” representations in posterior brain regions being modulated by top–down mechanisms originating in anterior brain regions. In the case of sensory effect anticipation, this top–down modulation may take the form of an “efference copy”, a concept that is intimately related to the common coding approach. In many computational models of action control a forward model predicts the future behavioral state of the system and the sensory consequences of that behavior (Wolpert et al., 1995). To do so, the model uses efference copies of the motor commands (Von Holst and Mittelstaedt, 1950). A large number of studies have investigated the notion of efference copies. For example, Haggard and Whitford (2004) showed that muscle twitches evoked by TMS over the primary motor cortex (M1) were perceived to be smaller when participants made intention-based actions than when they did not. This sensory suppression is considered to be a result of the correct prediction. They also showed that a conditioning TMS prepulse transiently disrupting the supplementary motor area (SMA) before the test pulse producing the muscle twitch almost abolished the sensory suppression effect. They concluded that the frontal cortex, specifically the SMA, sends an efferent signal of anticipated sensory effects to other brain areas. Evidently, a core aspect of the ideomotor theory is that intention-based action selection does not only involve the anticipation of proximal somatosensory consequences, but also of remote visual or auditory effects. One may argue that the results of the present study reflect one of the consequences of an efference copy generated in frontal motor areas.
However, computational models of action control do not only implement forward models to predict future states given a particular motor command and a particular current state. They also incorporate inverse models that invert the computation of the system in that they provide the motor command that will result in a desired end state (e.g., a particular sensory state) given a particular current state (Wolpert et al., 1995). In terms of inverse computation, our results might indicate that performing an action involves the activation of a desired end state before the selection and preparation of the movement. In our experiment this end state would include house/face representations, since houses and faces have been tightly linked to the two possible actions. The activation in FFA and PPA may be read out to determine a response. This interpretation would directly mirror the assumptions of the ideomotor theory: selecting an action begins with selecting a desired effect in the environment.
However, our data do not allow for conclusions about the order of “perceptual” and “motor” activity. The activity in the FFA and PPA reported above might reflect ideomotor effect anticipation (taking place rather early in the chain of events resulting in a voluntary action), or it might reflect effect representations triggered by response execution. This question is of theoretical importance: if effect anticipation takes place during early stages, it most probably has an important role in action decisions. If it takes place during the execution stage, it may serve action outcome evaluation only. More research is necessary to shed light on the complex neurophysiological and cognitive antecedents of voluntary action and to explore the cognitive processes underlying ideomotor learning. Future research might focus on the time course of ideomotor learning (similar to Van Opstal et al., 2008, 2009; Gheysen et al., 2010).
The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.
The work was supported by a research grant of the Agence Nationale de la Recherche (ANR-08-FASHS-13) awarded to Florian Waszak. This work was also supported by grant P6/29 from the Interuniversity Attraction Poles program of the Belgian federal government to Wim Fias. Simone Kühn and Ruth Seurinck are postdoctoral research fellows of the Fund for Scientific Research (Flanders, Belgium). Wim Fias and Florian Waszak contributed equally to the present research.