The sensory–motor account of conceptual processing suggests that modality-specific attributes play a central role in the organization of object and action knowledge in the brain. An opposing view emphasizes the abstract, amodal, and symbolic character of concepts, which are thought to be represented outside the brain's sensory–motor systems. We conducted a functional magnetic resonance imaging study in which participants listened to sentences describing hand/arm action events, visual events, or abstract behaviors. In comparison to visual and abstract sentences, sentences describing actions activated areas associated with planning and control of hand movements, motion perception, and vision. Sensory–motor areas were also activated to a greater extent for sentences describing actions that relied mostly on hands, as opposed to arms. Visual sentences activated a small area in the secondary visual cortex, whereas abstract sentences activated superior temporal and inferior frontal regions. The results support the view that linguistic understanding of actions partly involves imagery or simulation of actions, and relies on some of the same neural substrate used for planning, performing, and perceiving actions.
Recent research on the nature of conceptual representations has focused on the contrast between 2 general approaches. One view emphasizes that concepts are abstract, amodal, and symbolic, and represented independently of the brain's sensory–motor systems (see papers in Margolis and Laurence 1999, for a discussion of this view). More recent theories have emphasized the central role of sensory–motor information in the organization of conceptual knowledge (Allport 1985; Pulvermuller 1999; Barsalou et al. 2003; Gallese and Lakoff 2005). On this view, concepts are derived, wholly or in part, from sensory and motor experience. Understanding a word or a sentence involves re-instantiating or simulating this sensory–motor information. This process is thought to involve the same neural mechanisms as actual sensory–motor activity. This view is controversial because of a long-standing belief that conceptual knowledge cannot be reduced to such experiences (Fodor 1975, 1983).
Several behavioral studies suggest that understanding a sentence describing an action involves some degree of simulation of the action. Glenberg and Kaschak (2002) had participants judge sentence plausibility by making responses that required limb movement toward or away from the body. When the sentence implicated action in one direction (e.g., “open the drawer” suggests action toward the body), participants had relative difficulty indicating the response with a movement in the opposite direction (e.g., moving a lever away from the body). Similarly, Richardson et al. (2003) showed that processing sentences with a visual semantic component can selectively interfere with visual processing, and Zwaan and colleagues (Stanfield and Zwaan 2001; Zwaan et al. 2002) reported evidence of visual imagery in sentence processing using an object recognition task.
Neuroimaging evidence regarding activation of sensory–motor areas by linguistic stimuli is somewhat inconsistent (Pulvermuller 1999; Malach et al. 2002; Gainotti 2004). Most imaging studies have focused on nouns, but a few have also examined action verbs. Kable et al. (2002) used actions represented by either pictures or words in a conceptual matching task, and observed activation in the posterior middle temporal gyrus (MTG) for both in comparison to object pictures and words. Grossman et al. (2002) compared activation to “motion” (e.g., fall) and “cognition” (e.g., ponder) verbs. In contrast to the Kable et al. results, they found stronger activation of the posterior MTG by the cognition verbs. The motion verbs activated a ventral temporal–occipital region. Other recent studies suggest that action verbs may be represented somatotopically in motor and premotor cortex. Specifically, Hauk et al. (2004) used action words related to face (smile), arm (throw), or leg (kick) movements in an event-related design in which participants silently read each word. These words differentially activated areas that partially overlapped the areas activated by actual movements of the tongue, fingers, and feet, respectively. Pulvermüller, Shtyrov et al. (2005) used magnetoencephalography to identify activation elicited by the Finnish words for eat and kick. The face word activated inferior frontocentral areas more than the leg word, whereas the leg word activated a superior central site more than the face word. Tettamanti et al. (2005) had subjects listen to sentences describing actions performed with the mouth, hand, or leg. Compared with abstract sentences, greater activation of left dorsal and ventral premotor cortex, intraparietal sulcus (IPS), and posterior MTG was observed in association with the action sentences. The motor sentences, however, differed from the abstract sentences not only in the type of verb, but also in the type of noun (i.e., concrete vs. abstract). 
Possible contributions from other variables, such as processing difficulty or number of syllables in the sentences, were also not addressed. Finally, Kemmerer et al. (2008) compared running and hitting verbs to false fonts, and found activation in bilateral motor regions. The effects of phonology, however, were not controlled by the low-level baseline condition that used unpronounceable false fonts.
Other sensory–motor modalities have also been explored to some extent in the neuroimaging literature. Kiefer et al. (2008) found activation in the left superior temporal sulcus (STS) for words associated with sound (e.g., bell), but not for visual or action words. Simmons et al. (2007) found activation in the fusiform gyrus, overlapping with color perception areas, for color property verification relative to motor property verification.
The present research used functional magnetic resonance imaging (fMRI) to study neural activity associated with comprehension of sentences that contained hand/arm action, vision, and abstract verbs. Because the meaning of a verb is often closely related to its arguments (e.g., use the hammer implies an action, but use the opportunity does not), we used sentences instead of single verbs to clarify verb meaning and create more natural stimuli that describe entire events. The sensory–motor account predicts that for action and vision sentences, corresponding modality-specific areas (e.g., primary or secondary and association motor areas, motion perception and vision-related areas) should be activated. The predictions for abstract verb sentences are somewhat less clear. Under some theories abstract words are also perceptually grounded (e.g., Lakoff and Johnson 1980), whereas a simpler alternative is that abstract verbs are not associated with sensory–motor areas.
Participants were 33 healthy adults (18 men; mean age 30.2 ± 9.5; range 19 to 53; mean years of education 16.4 ± 3.0), with no history of neurological impairments and self-reported normal hearing. Participants were native speakers of English, and right-handed according to the Edinburgh Handedness Inventory (Oldfield 1971). The data from 2 other participants were excluded due to poor task performance (d′ ≤ 0.5), and one additional subject was removed due to excessive motion in the scanner. Informed consent was obtained from each subject prior to the experiment, in accordance with a protocol sanctioned by the Medical College of Wisconsin Institutional Review Board. Participants were compensated for participating in the study.
The stimuli were auditory sentences, spoken by a male native English speaker (J.R.B.) in a neutral voice, that were digitally recorded in a sound isolation booth at a 44.1 kHz sampling rate. The sentences were of the form “I/You/We/They <verb> the <noun>” (e.g., I throw the ball; You see the rope; They consider the risk). The sentences were 1.3–1.8 s long (mean 1.52 s ± 0.15 s).
Based on the type of verb, the sentences were divided into 3 main conditions: Motor (M), Visual (V), and Abstract (A). The M sentences used a hand/arm action verb (e.g., grab, punch), the V sentences used a verb primarily visual in nature (e.g., read, browse), and the A sentences used abstract verbs (e.g., allow, explain). There were 23 verbs in each condition. The M and V verbs were combined with a set of 12 concrete nouns (e.g., ball, book) to generate sentences. The A verbs were combined with 12 abstract nouns (e.g., method, risk). The complete list of verbs and nouns is given in Appendix I. Each verb was combined with 4 different pronouns (I, You, We, and They) to generate 92 sentences per condition. Using the same set of nouns for M and V sentences ensures that any differences in activation between these conditions are primarily due to differences in the verbs. Each of the 12 nouns was used approximately, but not exactly, the same number of times in M and V conditions.
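The sentence construction described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the verbs and nouns below are the few examples quoted in the text rather than the full sets in Appendix I, and the rule for assigning a noun to each verb (simple cycling through the list) is hypothetical, since the actual pairing is not specified here.

```python
from itertools import cycle

PRONOUNS = ["I", "You", "We", "They"]

def build_sentences(verbs, nouns):
    """Pair every verb with each pronoun, giving len(verbs) * 4 sentences.

    Assigning nouns by cycling through the list is an assumption made
    for illustration only; it is not the pairing used in the study.
    """
    noun_iter = cycle(nouns)
    sentences = []
    for verb in verbs:
        for pronoun in PRONOUNS:
            sentences.append(f"{pronoun} {verb} the {next(noun_iter)}")
    return sentences

# Example verbs and nouns quoted in the text (not the complete lists).
motor = build_sentences(["grab", "punch"], ["ball", "book"])
print(motor[0])
print(len(motor))

# With the full design: 23 verbs x 4 pronouns per condition.
sentences_per_condition = 23 * len(PRONOUNS)
print(sentences_per_condition)
```

The last line reproduces the count given above: 23 verbs combined with 4 pronouns yield 92 sentences per condition.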
Additionally, 36 nonsense sentences were generated by taking 12 verbs each from the M, V, and A sets and combining them with nouns to create sentences that are not easily interpretable (e.g., They browse the ball, We ban the value). To make the nonsense sentences sufficiently difficult to interpret, some new nouns that were not part of the M, V, and A conditions were used. To ensure that every noun used in the Nonsense condition was also used in a sensible sentence, these new nouns were combined with M, V, or A verbs, creating 17 Filler sentences. Only the M, V, and A conditions are of interest here, and the data from Nonsense and Filler conditions were excluded from the analyses.
The M, V, and A verb sets were matched on frequency (from the CELEX database; Baayen et al. 1995), number of phonemes, and phonological neighborhood density (unstressed log frequency weighted; from the Irvine Phonotactic Online Dictionary, www.iphod.com; six words used here were not in the online database, and values for these words were obtained from the database author). Similarly, there were no differences regarding these factors between the M/V nouns and the A nouns. The M/V nouns were rated as more imageable (P < 0.0001; 2-tailed t-test) than the A nouns, using the imageability ratings from the MRC Psycholinguistic Database (Coltheart 1981). Summary statistics for the sentences in each condition are given in Table 1. Although the verb and noun sets were matched on all relevant variables, several significant differences between conditions emerged when the full set of sentences was compared. These variables were therefore used as covariates in the multiple regression analysis (see Image Acquisition and Analysis).
The sentences were first normed in a preliminary study with 6 participants (mean age 34.0 ± 10.4, mean years of education 19.5 ± 2.9). Participants rated each sentence for meaningfulness on a scale of 1 (“does not make sense”) to 5 (“makes sense”). Reaction times (RTs) were also collected for each sentence, as a measure of processing difficulty. The results are summarized in Table 2. There were no differences in meaningfulness among M, V, and A sentences (all P > 0.19) whereas the Nonsense sentences were rated lower (all P < 0.001). There was no difference in RT between M and V sentences (P > 0.22). The A sentences produced longer RTs than M and V sentences (both P < 0.001), and the Nonsense sentences produced even longer RTs (P < 0.0001).
The verbs used in the M condition differ in the degree to which they refer to actions involving the hand versus the arm. Ratings of hand versus arm involvement were collected for each of the M verbs. Nine raters (each with at least high school education) who did not participate in the pilot or the imaging experiment provided the ratings. For each action described by an M verb, raters were asked to decide whether they could “do the action using only hands” by rating the verb on a scale of 1 (no) to 5 (yes). Verbs with low ratings indicate actions that require a significant contribution of arms (e.g., throw), whereas verbs with high ratings depict actions that can be performed mostly with the hands (e.g., crumple). The mean hand rating for M verbs was 3.01 ± 1.04, with a range from 1.33 to 4.55. These norms were used as a covariate in a secondary analysis.
Verbs highly associated with action and vision were selected for M and V conditions respectively. However, it is possible that some of the V and A verbs have associations with action, or that some of the M and A verbs are related to vision. Action and Vision ratings were collected to assess these associations for all verbs. Thirteen raters (mean age 25.9 ± 9.3, mean years of education 16.2 ± 2.5) rated each verb twice on a scale of 1 (not associated with action/vision at all) to 5 (very much associated with action/vision). M, V, and A verbs had an Action rating of 4.87 ± 0.2, 2.12 ± 0.6, and 2.09 ± 0.6, respectively. M verbs had a higher action rating than V or A verbs (P < 0.00001), with no reliable difference between V and A. Vision ratings for M, V, and A verbs were 1.66 ± 0.3, 4.13 ± 0.7, and 1.71 ± 0.4, respectively. V verbs had a higher vision rating than M or A verbs (P < 0.00001), with no reliable difference between M and A. This suggests a clear separation in action and vision attributes of the verbs consistent with their group assignment.
In the fMRI study, participants judged whether the sentences were sensible, pressing a response button with their left index finger only for sentences judged to be nonsense. The left hand response was used to minimize activation of the left hemisphere motor system. The Go/NoGo response procedure, which required motor responses only for the infrequent Nonsense trials, was designed to minimize activation of the motor response system during the M, V, and A conditions. Participants were familiarized with the task and response procedure before scanning by listening to a small set of practice sentences outside the scanner and pressing a button after each nonsense sentence. At the end of this practice session, feedback was given as to which sentences in the practice set were considered nonsense. Participants kept their eyes closed during scanning, and the room lights were dimmed to minimize extraneous visual stimulation. The stimuli were presented binaurally through electrostatic headphones at a level comfortable for the participant. Participants also wore mufflers that attenuated scanner noise. Any effects of scanner noise on activation were assumed to be similar across conditions.
The sentences were divided into 4 sets, and each set was used in a scanning run in an event-related design. The order of all stimuli was pseudorandomized. The interval between sentences was varied between 2 and 18 s to optimize the separation of the hemodynamic response to each condition, as determined by the Optseq program (Dale 1999). Two such pseudorandom orders of stimuli were used, each for approximately half of the participants.
These runs were followed by 2 runs aimed at localizing participants’ primary and association motor and visual cortices. In the Motor Localizer (ML) run, participants repeatedly opened and closed their left hand, right hand, or rested, according to instructions on the screen, over the course of eighteen 20-s blocks (6 per condition). The instructions (“left”, “right”, or “rest”) were displayed at the beginning of each block and then replaced by a fixation cross. In the Visual Localizer (VL) task, participants viewed a fixation cross or a circular checkerboard pattern that reversed in luminance at a frequency of 6 Hz in eighteen alternating 20-s blocks.
A 3T GE Excite scanner was used to acquire images. One volume of T2*-weighted, gradient echo, echo-planar images (echo time = 20 ms, flip angle = 80°, acquisition time = 2 s) was acquired every 2 s. The auditory sentence presentation was time-locked with the beginning of an acquisition. Volumes were composed of 36 axially oriented 3-mm slices with a 0.3-mm interslice gap, covering the whole brain, resulting in 3.25 × 3.25 × 3.30 mm voxel dimensions. Anatomical images of the entire brain were obtained using a 3-dimensional spoiled gradient echo sequence with 0.81 × 0.81 × 1.0-mm voxel dimensions.
The AFNI software package (Cox 1996) was used for image analysis. Within-subject analysis involved spatial coregistration (Cox and Jesmanowicz 1999) and registration of functional images to the anatomy using FLIRT (Jenkinson et al. 2002). Voxelwise multiple linear regression was performed with reference functions representing each condition. The reference functions representing the onset of each sentence were convolved with a standard hemodynamic response function, and both the convolved function and its temporal derivative were used as regressors. In addition, a correction for amplitude bias was applied using the method described by Calhoun et al. (2004). Sentences in M, V, A, or Filler conditions that were considered nonsense by the participant were not included in the respective condition and were coded with a separate regressor. Images during which the participant pressed a button were also coded with a separate regressor to capture activation due to the button press. To account for differences in RT (based on values collected in the rating study), sentence length, number of syllables, number of phonemes, and phonological neighborhood density between some of the conditions, de-meaned values for each of these variables were used as additional item-wise regressors. These additional regressors capture the modulation in activation solely due to differences in these variables, allowing more specific identification of activation changes due to the semantic factors of primary interest.
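The construction of a condition regressor and a de-meaned item-wise regressor can be illustrated with a minimal sketch. It is not the AFNI implementation used in the study: the gamma-variate response shape and its parameter values, the 20-s kernel length, and the onsets and RT values are all assumptions chosen for illustration, and the temporal derivative and amplitude-bias correction are omitted.

```python
import math

TR = 2.0  # seconds per volume, matching the acquisition described above

def gamma_hrf(t, b=8.6, c=0.547):
    """Gamma-variate response that peaks at 1 when t = b * c seconds.

    The shape parameters are assumed values, not taken from the paper.
    """
    if t <= 0:
        return 0.0
    return (t / (b * c)) ** b * math.exp(b - t / c)

def demean(values):
    """Subtract the mean so the covariate is centred on zero."""
    mean = sum(values) / len(values)
    return [v - mean for v in values]

def build_regressor(onsets_s, weights, n_vols, hrf_len_s=20.0):
    """Convolve weighted onset sticks with the HRF, sampled once per TR."""
    kernel = [gamma_hrf(k * TR) for k in range(int(hrf_len_s / TR) + 1)]
    reg = [0.0] * n_vols
    for onset, weight in zip(onsets_s, weights):
        start = int(round(onset / TR))  # onsets are locked to acquisitions
        for k, h in enumerate(kernel):
            if start + k < n_vols:
                reg[start + k] += weight * h
    return reg

onsets = [0.0, 8.0, 20.0]   # hypothetical sentence onsets (s)
rts = [1.10, 1.35, 1.25]    # hypothetical item-wise RTs (s)

condition_reg = build_regressor(onsets, [1.0] * len(onsets), n_vols=30)
rt_reg = build_regressor(onsets, demean(rts), n_vols=30)
```

Because the RT values are de-meaned before convolution, the item-wise regressor sums to approximately zero and captures only item-to-item modulation around the mean response modeled by the condition regressor.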
General linear tests were conducted to obtain the M–V, V–A, and M–A contrasts. The individual statistical maps and the anatomical scans were projected into standard stereotaxic space (Talairach and Tournoux 1988) and smoothed with a Gaussian filter of 5-mm full-width-half-maximum. In a random effects analysis, group maps were created by comparing activations against a constant value of 0. The group maps were thresholded at voxelwise P < 0.02 and corrected for multiple comparisons by removing clusters smaller than 22 voxels (766 μL) to achieve a mapwise corrected 2-tailed P < 0.05. Only the voxels within a mask that included the gray matter, but excluded areas outside the brain, deep white matter areas, and ventricles, were analyzed. The cluster threshold was determined through Monte Carlo simulations that estimate the chance probability of spatially contiguous voxels exceeding the voxelwise P threshold. The data from the 2 localizer scans were analyzed as block designs in a similar way.
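As a quick consistency check on the cluster threshold, the minimum cluster volume can be recomputed from the voxel dimensions given in the acquisition description. The sketch below assumes that the 22-voxel criterion refers to voxels of the acquired size (3.25 × 3.25 × 3.30 mm); the variable names are illustrative.

```python
# Voxel dimensions reported in the acquisition description (mm).
voxel_dims_mm = (3.25, 3.25, 3.30)
min_cluster_voxels = 22

# 1 cubic millimetre equals 1 microlitre.
voxel_volume_ul = voxel_dims_mm[0] * voxel_dims_mm[1] * voxel_dims_mm[2]
cluster_volume_ul = min_cluster_voxels * voxel_volume_ul

print(f"voxel volume: {voxel_volume_ul:.2f} uL")
print(f"minimum cluster volume: {cluster_volume_ul:.1f} uL")
```

Under that assumption each voxel is about 34.86 μL, so 22 voxels correspond to about 766.8 μL, in line with the 766 μL reported above.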
A secondary analysis was performed to assess the effects of the degree of dependence of actions on hands versus arms. An additional regressor containing the mean hand rating for each M verb was included, and a group map of areas whose activation was correlated with hand ratings was created.
In addition to these whole-brain analyses, ROI analyses were performed to increase sensitivity to smaller activation clusters. Two ROIs were defined as the areas activated in the motor and visual localizer scans, respectively, at voxelwise P < 0.02.
Attentiveness during the long fMRI scan is a potential concern, especially given the relatively infrequent occurrence of the button responses. We analyzed gaps between successive responses (in the form of number of intervening trials) of each participant to find any periods during which the participant may have been inattentive. If a gap was greater than or equal to Q3 + 4 × IQR (Q3 = gap at the 75th percentile; IQR = interquartile range), it was determined to be an extreme outlier, and that run was removed from the analysis. Nine runs (6.8% of the data) were removed in this way.
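The outlier criterion can be sketched as follows. The quartile definition is an assumption (Python's "inclusive" interpolation method), because the text does not state which one was used, and the trial indices are hypothetical.

```python
import statistics

def gaps_between_responses(response_trials):
    """Number of intervening trials between successive button presses."""
    return [b - a - 1 for a, b in zip(response_trials, response_trials[1:])]

def extreme_gaps(gaps, k=4.0):
    """Return the gaps that are >= Q3 + k * IQR.

    Quartiles use the 'inclusive' method, which is an assumption.
    """
    q1, _, q3 = statistics.quantiles(gaps, n=4, method="inclusive")
    threshold = q3 + k * (q3 - q1)
    return [g for g in gaps if g >= threshold]

# Hypothetical trial indices at which one participant pressed the button.
presses = [3, 5, 8, 11, 15, 19, 24, 55]
gaps = gaps_between_responses(presses)
flagged = extreme_gaps(gaps)
print(gaps, flagged)
```

In this example only the long gap before the final response meets the criterion, so the run containing it would be the one removed. Note that if all of a participant's gaps were identical, the IQR would be 0 and every gap would meet a "greater than or equal to" criterion, so a real implementation would need to handle that case explicitly.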
To assess subject comprehension performance and overall vigilance during scanning, d′ values were calculated for each subject. The mean d′ was 1.78 ± 0.62, with a range from 0.53 to 3.27.
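For reference, d′ is the difference between the z-transformed hit rate and the z-transformed false-alarm rate. The sketch below assumes that a hit is a button press to a Nonsense sentence and a false alarm is a button press to a sensible sentence; the log-linear correction and the example counts are assumptions, not details taken from the study.

```python
from statistics import NormalDist

def d_prime(hits, misses, false_alarms, correct_rejections):
    """Sensitivity index d' = z(hit rate) - z(false-alarm rate).

    Adding 0.5 to each count (log-linear correction) keeps the rates
    away from 0 and 1; this correction is an assumption.
    """
    hit_rate = (hits + 0.5) / (hits + misses + 1)
    fa_rate = (false_alarms + 0.5) / (false_alarms + correct_rejections + 1)
    z = NormalDist().inv_cdf
    return z(hit_rate) - z(fa_rate)

# Hypothetical counts: 36 Nonsense (Go) trials and 293 sensible (NoGo) trials.
example = d_prime(hits=28, misses=8, false_alarms=20, correct_rejections=273)
print(round(example, 2))
```

A participant who responds at the same rate to nonsense and sensible sentences obtains d′ = 0, which is why very low values were treated as poor task performance.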
The activation maps for various contrasts are shown in Figure 1. The maps are displayed using Caret (Van Essen et al. 2001) on an inflated cortical surface of a representative subject, created through FreeSurfer (Dale et al. 1999). Cluster information and coordinates of the peak activations are reported in Appendix II.
Differences in activation between M and V conditions primarily reflect differences in motor and visual verb semantics because the same concrete nouns were used in both conditions. For the M sentences, this contrast showed a focus in the inferior postcentral cortex, inferior to the anterior end of the IPS (Fig. 1a). This included the left inferior postcentral sulcus (PoCS) and anterior supramarginal gyrus (SMG; mostly in BA 40 and marginally extending into BA 2). A focus in the posterior inferior temporal gyrus (pITG) (BA 37, 19), extending into the MTG and lateral fusiform gyrus, was also activated.
Areas activated more for the V sentences included bilateral (left > right) STS and superior temporal gyrus (STG).
To assess whether any areas, such as those associated with visual imagery, were activated in both M and V conditions, we compared them to the A condition. The areas activated more for M sentences were very similar to those in the M > V comparison, including the left inferior postcentral region and pITG (Fig. 1b).
For A sentences, extensive activation of bilateral (left > right) STG and STS, inferior and middle frontal gyri (IFG, MFG) and precentral gyrus (PrCG) was observed. Bilateral supplementary motor area (SMA) and anterior cingulate gyrus (aCG), cuneus, lingual gyrus, and superior occipital gyrus (SOG) were also activated.
An area activated more by the V sentences compared with A sentences was observed in the left SOG (BA 19), adjacent to the angular gyrus (AG) (Fig. 1c). Areas showing more activation for A sentences were similar to those in the A > M comparison, and included bilateral (left > right) STS and STG, IFG, SMA, superior frontal gyrus (SFG), and lingual gyrus.
To account for activation associated with general task difficulty and time on task, item-wise RT (collected during the preliminary study) was included as a single regressor in the analysis. The areas correlated positively with RT included bilateral (left > right) IFG, MFG, PrCG, SMA, and aCG (Fig. 1d). Activation in the posterior left STS/MTG was also positively correlated with RT. Extensive negative correlation was also observed, including bilateral (right > left) AG and SMG, anterior MTG, posterior cingulate gyrus and precuneus, SFG, and ventromedial frontal cortex.
The inclusion of an RT regressor is useful in removing variability due to processing time and difficulty. However, it can also mask activation of interest (e.g., activation due to semantic factors, as in this study) if the activation of interest is strongly correlated with RT. Here, the A sentences had longer RTs than both M and V sentences. To examine the effects of the RT regressor on the activations, the analysis was repeated without the RT regressor. Because the results were largely similar, a complete listing is omitted here in the interest of space. The primary difference was that in the V > A comparison, an additional focus was observed in the right AG.
Compared with the resting condition, the ML task induced extensive activation in sensory–motor cortex, including the central sulcus, PrCG, and PoCG. Frontal and parietal operculum, SMA, basal ganglia, and posterior MTG and ITG were also activated, with stronger contralateral activation. Some deactivation was observed in the aCG, anterior insula, SFG, portions of cuneus, and superior and middle occipital gyri.
The activation from the VL task included large portions of the occipital lobe, extending into posterior ITG and IPS bilaterally. Scattered bilateral activation of anterior SFG, MFG, IFG and PrCG was also observed. Deactivation was seen in bilateral cuneus, precuneus, anterior lingual gyrus, and AG.
Maps in Figure 2 show the overlap between areas activated in the localizer tasks and those activated in the M > V, M > A, and V > A contrasts. The inferior postcentral focus in the M > V and M > A comparisons overlapped almost completely with the ML activation. The pITG focus in these 2 comparisons also overlapped largely with the VL activation, and partly with the ML activation (Fig. 2a,b). The SOG focus in the V > A comparison was close to, but did not overlap, the VL activation (Fig. 2c). In summary, the areas activated by the M condition were also activated by the ML task, whereas the area activated by the V condition was just outside the VL activation.
To examine whether there was a difference between areas activated by hand-oriented and arm-oriented actions, we used hand ratings as an item-wise regressor for the M sentences in the analysis. The results showed a number of areas that were positively correlated with the ratings, showing higher activation for hand-oriented actions (Fig. 3). These included areas in the left PoCG, SFG, and pre-SMA; bilateral SMA, putamen/lentiform nucleus, AG, cuneus, and lingual gyrus; and right STG and precuneus. According to an atlas of sensory–motor areas (Mayka et al. 2006), the left PoCG cluster was in S1, and the SFG cluster was in dorsal premotor (PMd) cortex and pre-SMA. Activations in the PoCG, SMA, and putamen overlapped with the ML activation, as did part of the activation in pre-SMA/PMd.
No additional activations were found in the ROI analyses, which were restricted to the areas activated by ML and VL tasks.
We presented participants with sentences containing motor, visual, and abstract verbs to examine the involvement of sensory–motor regions in comprehension. The majority of the studies examining sensory–motor basis of conceptual processing have used action words and visual presentation. We introduced visual verbs, and used auditory presentation.
In the M > V as well as M > A contrasts, left inferior postcentral cortex was activated for the M condition. Inferior postcentral cortex is involved in a wide variety of tasks associated with goal-oriented hand/arm actions, including coordinated application of force, motor planning, pantomiming tool use, listening to tool sounds, visually guided grasping, and imagined grasping of viewed objects. For example, Ehrsson et al. (2001) reported activation in the inferior postcentral cortex when participants applied force to a small object held between the index finger and the thumb, compared with a baseline of weakly holding the object. Rushworth et al. (2001) found inferior postcentral activation when comparing the pre-execution planning phase for a specific finger movement with executing the same finger movement. Johnson-Frey et al. (2005) reported activation in this area for planning movements for tool use compared with preparing random movements, whereas Frey et al. (2005) found inferior postcentral cortex activation for visually guided grasping compared with pointing. Hand-object illusion (an illusion that the wrist, along with an object in the hand, is moving), compared with hand-illusion (illusory movement of empty hand), also activated inferior postcentral cortex (Naito and Ehrsson 2006). Pantomiming tool use (Rumiati et al. 2004; Johnson-Frey et al. 2005; Lewis et al. 2006) and listening to sounds of tools (Lewis et al. 2006) have also been linked to activation of this region. In addition, activation in this region was reported for making action judgments about pictures of objects compared with making function judgments (Kellenbach et al. 2003).
Damage to the anterior/inferior parietal lobe is associated with ideomotor apraxia (Haaland et al. 2000; Jax et al. 2006). These patients are commonly impaired in imitating actions or gestures, pantomiming, recognizing object-related pantomimes performed by others, and planning object-related actions (Varney and Damasio 1987; Buxbaum, Johnson-Frey, et al. 2005; Buxbaum, Kyle, et al. 2005; Goldenberg and Karnath 2006). Tunik et al. (2008) reported that transcranial magnetic stimulation (TMS) of the left SMG caused a delay in planning goal-oriented actions, but not in responses to an arbitrary stimulus. In summary, inferior postcentral cortex appears to play an important role in planning and control of complex or object-related hand movements. Activation of this region by sentences with hand/arm action verbs suggests that comprehension of such sentences also involves circuits involved in planning and control of complex movements.
The inferior parietal lobule in monkeys is also involved in action performance and understanding. For example, Fogassi et al. (2005) reported that neurons in this region in monkeys fire differently when the same act (e.g., grasping) is embedded in different actions (e.g., eating or placing), for both action observation and performance.
A dorsal part of the postcentral focus overlapped both ML and VL activation (white region in Fig. 2). Activation in this more dorsal region may play a role in visuo-motor coordination (e.g., Shikata et al. 2001).
A left pITG region, extending into posterior MTG, was activated in the M > V and M > A contrasts. It largely overlapped the VL activation, and partly the ML activation. Bilateral posterior MTG activation is frequently associated with processing visually or auditorily presented or imagined tools and manipulable objects (e.g., Chao et al. 1999; Beauchamp et al. 2004; Johnson-Frey et al. 2005; Lewis et al. 2006). More left lateralized activity, extending ventrally, is often observed when linguistic knowledge about tools or actions is involved (Martin et al. 1995; Damasio et al. 1996; Davis et al. 2004; Emmorey et al. 2005; Kable et al. 2005; Kellenbach et al. 2003; Noppeney et al. 2003, 2005; Tranel et al. 2005; Tyler et al. 2003). The pITG activation we observed was immediately anterior to the human visual motion processing area MT/MST, suggesting a role in more abstract motion processing (Kable et al. 2005). The lack of activation of this region in the V > A contrast, and its partial overlap with the ML activation, suggests that it is not purely involved in processing concrete nouns, but may mediate visuo-semantic knowledge of both actions and objects, although it is not exclusively associated with action (e.g., Rodd et al. 2005 report activation in this region for high relative to low ambiguity sentences). It is possible that somewhat different information about the same nouns is activated in motoric versus visual contexts. Such an effect, however, would best be viewed as a combinatorial verb–noun effect rather than purely a difference in the nouns.
In the V–A contrast, a region in the left SOG was activated for the V sentences. This region corresponds approximately to BA 19/V3, and is probably involved in higher-level vision and object recognition (Kaas 1996). Activation in this region is modulated by noun imageability (Binder, Westbury, et al. 2005); thus the difference in noun imageability between V and A sentences may contribute to the activation of this area. The right AG was also activated for V > A after removal of the RT covariate. This region also responds to noun imageability (Binder, Medler, et al. 2005; Binder, Westbury, et al. 2005; Sabsevitz et al. 2005). The finding that the left SOG and the right AG were activated for V > A, but not for M > A, suggests that visual verbs also contribute to the activation of these regions. These regions did not overlap with, but did border, the activation from the VL task. The VL task used only a checkerboard pattern and therefore probably did not activate higher-level visual systems. SOG and AG may process this relatively more abstract visual knowledge.
Compared with M and V sentences, the A sentences activated large portions of the superior temporal lobe as well as inferior frontal regions. Activations were bilateral, but stronger in the left hemisphere. This result is in agreement with a number of previous studies (e.g., Wise et al. 2000; Noppeney and Price 2004; Sabsevitz et al. 2005), and is consistent with the suggestion that processing of abstract concepts relies more on verbal associations of those concepts and thus activates a verbal–lexical system that is left hemisphere dominant (Paivio 1971, 1986). Other areas, such as SMA, aCG, thalamus and basal ganglia, were also activated more by A sentences. These regions are often positively correlated with RT (Binder, Medler, et al. 2005; Desai et al. 2006). Although an RT covariate was included in the analysis, RTs were collected from a separate group of subjects in a preliminary experiment, and the same mean RT regressor was used for all subjects in the imaging experiment. Therefore, some of the between-subject variability in RTs may not be adequately captured by this regressor. Because A sentences took longer to process than both M and V sentences, some of the areas showing greater activation for the A condition may have been activated due to general task difficulty and time-on-task effects.
The V sentences activate the superior temporal region more than the M sentences, but less than the A sentences. If superior temporal activation is taken as an index of abstractness or the degree to which processing evokes verbal associations, then V verbs appear to be intermediate between M and A verbs in degree of abstractness.
An unexpected result was the activation of some occipital areas, especially the lingual gyrus and cuneus, for A sentences relative to M and, to a lesser extent, relative to V sentences. One possibility is that the greater activation of A sentences in these regions was driven by greater suppression of activity during processing M and V sentences, and not due to activation (relative to rest) of these areas during processing A sentences per se. There is evidence of deactivation of visual areas when performing tasks in other modalities. For example, deactivation of the visual cortex during tactile (Merabet et al. 2007) and auditory (Laurienti et al. 2002; Hairston et al. 2008) processing has been reported. We measured the activation of each condition relative to baseline in a spherical ROI placed on the left lingual gyrus activation in the A > M contrast. The mean and standard deviation of the beta coefficients for M, V, and A conditions were −17.8 (68.2), −5.4 (70.4), and 11.4 (73.3), respectively. Note that the baseline condition of rest involves task-unrelated thoughts and semantic processing (Binder et al. 1999; McKiernan et al. 2003), and hence the sign of activation compared with baseline is somewhat arbitrary. Due to intersubject variability, none of the activations were reliably different from resting baseline. We also looked for deactivation in occipital regions (defined by the VL activation) during the ML task. In fact, deactivation in visual areas was found (peak Talairach coordinates: −22 −94 −5 and 24 −84 34 for the right hand; 39 −89 −6, −37 −86 −3, and −14 −93 25 for the left hand) during the ML task. This lends some support to the possibility that the greater activation for A sentences in occipital regions may be driven by relative suppression of activity during processing of M and V sentences, resulting from partial activation of motor plans induced by motor verbs or manipulable object nouns in M and V conditions.
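As a rough check on the ROI statement above, the reported means and standard deviations can be converted into one-sample t statistics. The sketch below assumes that the parenthesized values are across-subject standard deviations and that a simple one-sample t-test against zero is appropriate; neither is stated in this section. The sample size `n_subjects` is a hypothetical placeholder, not the actual number of participants.

```python
import math

# Reported ROI beta coefficients: (mean, standard deviation) per condition.
roi_betas = {"M": (-17.8, 68.2), "V": (-5.4, 70.4), "A": (11.4, 73.3)}


def one_sample_t(mean, sd, n):
    """t statistic for testing a sample mean against zero."""
    return mean / (sd / math.sqrt(n))


# Hypothetical sample size (assumption); the real value is not given here.
n_subjects = 20

for condition, (mean, sd) in roi_betas.items():
    t = one_sample_t(mean, sd, n_subjects)
    print(f"{condition}: t({n_subjects - 1}) = {t:.2f}")
```

Under these assumptions, each |t| stays below 1.2, well under the two-tailed .05 critical value for 19 degrees of freedom (about 2.09), which is in line with the statement that no condition differed reliably from resting baseline.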
Areas correlated with RT in the present study were similar to those found in other studies (Binder, Medler, et al. 2005; Binder, Westbury, et al. 2005; Desai et al. 2006) in which trial-wise RT from the imaging experiment was used in the analysis. As noted above, it is unlikely that the average RT regressor used in this study was completely able to account for item variability in RT, as item variability was likely not uniform across subjects. Nonetheless, it appears to have captured many of the areas associated with general task difficulty and time-on-task. One exception is the IPS, which was correlated strongly with RT in both of the Binder et al. (2005) studies and the Desai et al. (2006) study. This is likely due to the fact that these prior studies used visual word presentation, and IPS is associated with visual attention. With the auditory presentation of stimuli in the present study, IPS was not correlated with processing time. Conversely, RT effects in the posterior STS, observed here with auditory stimuli, were absent or very weak in the prior visual word studies.
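The limitation of a shared mean-RT regressor can be illustrated with a toy regression. The sketch below uses made-up numbers throughout: `group_mean_rt` stands in for item-wise mean RTs from a separate norming group, and `subject_rt` for the (unobserved) RTs of one scanned subject on the same items. It is not the analysis used in this study, only an illustration of why a regressor built from group means leaves subject-specific RT variance unexplained.

```python
import random
from statistics import mean


def ols_fit(x, y):
    """One-predictor least squares: returns (intercept, slope)."""
    mx, my = mean(x), mean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return my - slope * mx, slope


def r_squared(x, y):
    """Proportion of variance in y explained by the fitted line on x."""
    intercept, slope = ols_fit(x, y)
    my = mean(y)
    ss_res = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
    ss_tot = sum((yi - my) ** 2 for yi in y)
    return 1.0 - ss_res / ss_tot


random.seed(0)  # fixed seed so the toy example is repeatable

# Hypothetical item-wise mean RTs (ms) from a separate norming group.
group_mean_rt = [random.gauss(1500, 150) for _ in range(60)]

# Hypothetical RTs of one scanned subject: the group mean per item plus a
# subject-specific deviation that the shared regressor cannot know about.
subject_rt = [rt + random.gauss(0, 120) for rt in group_mean_rt]

shared = r_squared(group_mean_rt, subject_rt)
own = r_squared(subject_rt, subject_rt)
print(f"variance explained by the shared regressor: {shared:.2f}")
print(f"variance explained by the subject's own RTs: {own:.2f}")
```

In this toy setting the shared regressor explains only part of the subject's RT variance, whereas trial-wise RTs from the same subject would explain all of it. Any brain activity that tracks the unexplained part would remain unmodeled and could surface in condition contrasts.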
The sensory–motor system is organized somatotopically (Penfield and Rasmussen 1950), and hands/fingers have a larger representation than arms. By collecting hand ratings for each M verb, we distinguished sentences that were weighted more toward hand-oriented actions from sentences weighted toward arm-oriented actions. Hand action verbs activated parts of primary sensory area S1, SMA, pre-SMA, PMd, and putamen, largely overlapping the ML activation. The PoCG activation (BA 2) was in an area activated during tactile interaction and hand stimulation (Boecker et al. 1995; Bodegard et al. 2001; Eickhoff et al. 2008). This suggests that activity associated with understanding sentences describing hand actions goes beyond planning for actions and involves primary sensory and premotor areas involved in action execution. For arm-oriented actions, it is possible that primary sensory–motor regions were not activated, or that due to intersubject variability in anatomy, the smaller cortical representation for arms could not be reliably detected in the group maps.
Several studies have emphasized the role of the precentral gyrus in action processing, as opposed to the postcentral gyrus/SMG activation found here. The lack of strong precentral activity here was perhaps partly due to better controls for phonological processing and the use of regressors representing the number of phonemes and syllables, which accounted for some of the activity in the precentral gyrus.
To summarize, comprehending sentences with hand/arm action verbs activated areas associated with action planning, visualization, and execution. Processing sentences with concrete nouns activated areas involved in visualizing objects. These results are consistent with the proposal that the sensory modalities through which such concepts are learned and experienced play a role in how they are represented in memory and how the words corresponding to these concepts are comprehended. Such findings suggest linkages among perception, action, and cognition, and contrast with theories that strongly differentiate between sensory–motor systems and cognitive systems. According to such theories, sensory and motor systems give rise to perceptual states, which are then abstracted into amodal symbols (Fodor 1975, 1983; Pylyshyn 2003). Computations within the cognitive system operate over these abstract symbols, the forms of which bear arbitrary relations to what they represent. They are amodal in the sense that their representation is outside the sensory–motor system and independent of the perceptual modality that gives rise to their meaning. Schwanenflugel (1991) and Schwanenflugel and Stowe (1989) suggested that both abstract and concrete concepts are stored in an amodal semantic system, and the behavioral advantage (e.g., in lexical decision) for concrete items arises from having a greater number of associations. Computational models such as Latent Semantic Analysis (Landauer and Dumais 1997) and Hyperspace Analogue to Language (Burgess and Lund 1997) propose that word meaning can be captured by higher-order statistical relations among words. Our results suggest that such effects may be mediated by the fact that many words are connected to the world via sensory–motor associations.
According to embodied theories of cognition, sensory–motor systems play an important role in the representation of concepts (Lakoff 1987; Glenberg 1997; Barsalou 1999; Lakoff and Johnson 1999; Feldman and Narayanan 2004; Gallese and Lakoff 2005). This approach seems to apply most easily to concepts (such as actions or objects) that have transparent sensory–motor correlates and are learned via perception and action. However, what about the many words that lack such transparent sensory–motor bases? Two broad versions of the embodiment idea can be imagined. Under the strong version, all concepts derive from sensory–motor experience, even seemingly abstract ones. Lakoff and Johnson (1980) emphasized the extent to which many abstract concepts can be seen as grounded in sensory–motor experience. For example, the abstract concept of “understanding” is grounded in a “conduit” metaphor (e.g., ideas are hard to get across; ideas are given or captured, and so on). On a weaker version of the theory, there is a gradation in the degree of association with sensory–motor systems. Some concepts, such as those associated with concrete actions and objects, rely strongly on sensory–motor systems, whereas others are more removed from them. The present results show that Abstract sentences were associated with a different pattern of activation than the Motor and Visual sentences, strongly activating the superior/anterior temporal and inferior frontal areas. This suggests that abstract concepts may be represented primarily through verbal associations with other concepts, and are further removed from sensory–motor experiences. This is consistent with the weak version of embodiment. However, a stronger test would be provided by examining a sample of abstract verbs that have the strongest links to sensory–motor information.
Glenberg et al. (2008) found that motor system activity is modulated even by abstract sentences depicting transfer of information towards or away from the reader (e.g., “Anna delegates the responsibility to you.”). Is it possible that abstract sentences in the current experiment also activated the motor system, and hence strong activation in primary motor regions was not found in the M–A comparison? This is unlikely, because the V–A contrast showed no activity in motor regions for A verbs, and one would not expect visual verbs to activate the motor system to the same degree, given their relative lack of motor associations as measured by the ratings. Furthermore, the most abstract verbs used in the study were nondirectional (e.g., assess, discuss, prove).
Even for words that have transparent sensory–motor associations, a distinction can be made within the embodiment theories in terms of the degree of involvement of sensory–motor systems. One can imagine a continuum where at one end, sensory–motor and conceptual systems are thought to be completely distinct, and at the other end, they are virtually identical. Although the former end of the spectrum is not supported by these results, the latter extreme is also not supported. The activation resulting from the comprehension of motor sentences is not close to the activation from the motor localizer task, either in extent or in intensity. Relative to A and V sentences, M sentences engage relatively small portions of the motor system, and with much less intensity. Thus, an intermediate view seems most plausible, where sensory–motor and conceptual systems are tightly linked, but not identical.
Another question regarding the sensory–motor activations found here is whether they are epiphenomenal, and not causally linked to comprehension. Such activations can potentially arise from postcomprehension imagery that is not necessary for the comprehension process. This possibility cannot be ruled out with the current study. Multiple lines of evidence suggest that this activation may be causally related to action knowledge and not due to epiphenomenal imagery, however. First, studies using techniques with higher time resolution suggest that activation of sensory areas occurs very quickly—as early as 200 ms—after action words are presented (Pulvermuller et al. 2001; Hauk and Pulvermuller 2004; Pulvermuller, Shtyrov, et al. 2005; Borreggine and Kaschak 2006; Boulenger et al. 2006; Zwaan and Taylor 2006; Scorolli and Borghi 2007). Second, TMS studies indicate a functional link between sensory–motor systems and action knowledge (Buccino et al. 2005; Pulvermuller, Hauk, et al. 2005; Glenberg et al. 2008). Lastly, some patient studies indicate a conceptual deficit in action knowledge with damage to sensory–motor systems (Bak and Hodges 2003, 2004; Kemmerer and Tranel 2003; Tranel et al. 2003; Bak et al. 2006; Silveri and Ciccarelli 2007; Grossman et al. 2008).
The comparisons of activations elicited by auditory sentences describing hand/arm action, visual, and abstract events show that understanding sentences describing actions involves some of the same areas engaged in planning, performing and perceiving those actions. Visual verbs were associated with a higher-order visual area in the left hemisphere. These results are compatible with theories that emphasize the role of embodiment, simulation, and imagery in language and cognition and present challenges for theories that emphasize abstract, amodal cognitive processing.
National Institutes of Health (NIH) grant (R01 NS33576) to J.R.B.; NIH grant (R03 DC008416) to R.H.D.; and NIH General Clinical Research Center grant (M01 RR00058) to Medical College of Wisconsin.
We thank Edward T. Possing for technical assistance, David Medler for providing the program used in the visual localizer, Dana Krauss for help with collecting behavioral ratings, and Kenny Vaden for help with the iPhod database. Conflict of Interest: None declared.
| M > V |
| V > M |
| M > A |
| A > M |
| 42 616 | −3.0 | −5.6 | −53 | −1 | −1 | L STG | 22 |
| 19 120 | −2.7 | −4.3 | −8 | 8 | 27 | L CG | 24 |
| 12 630 | −2.8 | −5.0 | 49 | 10 | −9 | R STG | 38/22 |
| 12 390 | −2.6 | −3.7 | 39 | −4 | 33 | R prCG | 6 |
| V > A |
| A > V |
| 20 525 | −2.9 | −4.7 | −51 | 19 | 13 | L IFG | 44/45 |
| (d) Areas correlated with hand ratings |
Note: ITS = inferior temporal sulcus, CG = cingulate gyrus, LG = lingual gyrus, PHG = parahippocampal gyrus.
Approximate BAs are also indicated when appropriate.