Analysis of error types provides useful information about the stages and processes involved in normal and aphasic word production. In picture naming, semantic errors (horse for goat) generally result from something having gone awry in lexical access such that the right concept was mapped to the wrong word. This study used the new lesion analysis technique known as voxel-based lesion-symptom mapping to investigate the locus of lesions that give rise to semantic naming errors. Semantic errors were obtained from 64 individuals with post-stroke aphasia, who also underwent high-resolution structural brain scans. Whole brain voxel-based lesion-symptom mapping was carried out to determine where lesion status predicted semantic error rate. The strongest associations were found in the left anterior to mid middle temporal gyrus. This area also showed strong and significant effects in further analyses that statistically controlled for deficits in pre-lexical, conceptualization processes that might have contributed to semantic error production. This study is the first to demonstrate a specific and necessary role for the left anterior temporal lobe in mapping concepts to words in production. We hypothesize that this role consists in the conveyance of fine-grained semantic distinctions to the lexical system. Our results line up with evidence from semantic dementia, the convergence zone framework and meta-analyses of neuroimaging studies on word production. At the same time, they cast doubt on the classical linkage of semantic error production to lesions in and around Wernicke's area.
A common symptom of spoken language impairment in aphasia is the loss of precision in the mapping from concepts to words in production, manifesting in semantic errors (e.g. misnaming ‘sofa’ as ‘chair’ or ‘elephant’ as ‘zebra’). These errors have garnered considerable attention from researchers in many disciplines, in part because healthy speakers also make them, albeit less frequently. Our study employed the new lesion analysis technique known as voxel-based lesion-symptom mapping (Bates et al., 2003) to investigate the locus of lesions that give rise to semantic errors.
Two common assumptions about aphasia motivated this work: (i) analysis of error types is a critical component of the characterization of language impairments (Blumstein, 1973; Butterworth, 1979; Buckingham, 1980; Saffran, 1982; Howard and Orchard-Lisle, 1984; Caramazza, 1986; Caramazza and Hillis, 1990; Plaut and Shallice, 1993; Schwartz et al., 1994; Nickels, 1995; Rapp and Goldrick, 2000; Wilshire, 2002) and (ii) heightened error tendencies in aphasia result from parametric change in the premorbid cognitive and neural architecture of language systems, rather than from a fundamental restructuring of those networks (Caramazza, 1986; Plaut et al., 1996; Dell et al., 1997; Laine et al., 1998; Foygel and Dell, 2000; Rapp and Goldrick, 2000; Schwartz et al., 2006). The ultimate goal, then, was to understand the neural architecture of semantic word processing better by investigating where in the brain lesions give rise to semantic production errors.
Interest in lesion localization has fluctuated in recent years. In the flush of enthusiasm that accompanied the first generation of functional neuroimaging research, the traditional methods of lesion mapping seemed too fraught with bias and imprecision to compete with the new method. The cognitive neuroscience of the 21st century was expected to be much more about functional neuroimaging than neuropsychology. Recently, however, more balanced appraisals have appeared that weigh the strengths and weaknesses of lesion methods against those of functional MRI and conclude that each has something unique to offer (Rorden and Karnath, 2004; Chatterjee, 2005; Fellows et al., 2005; Kimberg et al., 2007). These arguments, along with advances in spatial registration of lesioned brains (Brett et al., 2001; Stamatakis and Tyler, 2005) and statistical testing and thresholding of lesion effects (Kimberg et al., 2007; Rorden et al., 2007; Rudrauf et al., 2008; Gläscher et al., 2009), have revived interest in lesion-symptom mapping and its potential to reveal which brain areas must retain their integrity for a given task to be performed at normal levels of proficiency.
Traditional approaches to lesion mapping typically reduce lesion data by binary grouping of patients (with and without lesions in one or more areas of interest) and/or some form of aggregation (calculating percent damage in areas of interest). Often, the behavioural data are also reduced to represent only the presence or absence of some deficit of interest. While powerful, these methods are often limited by the use of arbitrary criteria to dichotomize data, and the need to pre-identify regions of interest. In the case of overlap mapping, the analysis generally results in a non-statistical comparison.
In voxel-based lesion-symptom mapping, the association between a particular behavioural score (e.g. proportion of verb-naming errors) and lesion status (± lesion in that voxel) is calculated across a group of patients at each voxel, and the effect at each voxel is evaluated for significance using a threshold corrected for multiple comparisons (e.g. Tsuchida and Fellows, 2009). This approach does not require reducing either the amount of lesion data or behavioural data and, in a large and diverse patient sample, can potentially detect effects across the whole brain. A variety of methods are available to narrow down the interpretation and rule out nuisance variables. For example, in a voxel-based lesion-symptom mapping study aimed at localizing verb-specific naming deficits, effect-size maps were generated for verb- and for noun-naming, and the latter was subtracted from the former (Piras and Marangolo, 2007). An alternative approach would have been to use a regression framework whereby at each voxel the noun-naming score is treated as a nuisance covariate in a model that predicts verb scores from lesion status. In the present study, we used the regression framework to isolate semantic errors that originate during production from those that arise during conceptualization.
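To make the regression framework concrete, the sketch below computes, for a single voxel, the t statistic for the lesion term in a model that predicts a behavioural score from lesion status (0/1) plus one nuisance covariate (e.g. a noun-naming score). It is a minimal plain-Python illustration under that assumption, not the implementation used by any of the cited studies, and the function names are hypothetical. It relies on the Frisch-Waugh-Lovell theorem: regressing the covariate-adjusted score on the covariate-adjusted lesion indicator yields the same lesion coefficient and residuals as the full multiple regression.

```python
import math


def residualize(y, x):
    """Residuals of y after least-squares regression (with intercept) on x."""
    n = len(y)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    slope = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / sxx
    return [yi - my - slope * (xi - mx) for xi, yi in zip(x, y)]


def lesion_t_with_covariate(lesion, score, covariate):
    """t statistic for the lesion term in: score ~ 1 + lesion + covariate.

    lesion: 0/1 lesion status of each patient at this voxel.
    score: behavioural score per patient (e.g. verb-naming errors).
    covariate: nuisance score per patient (e.g. noun-naming errors).
    """
    n = len(score)
    # Remove the covariate (and intercept) from both the outcome and the lesion indicator.
    ry = residualize(score, covariate)
    rx = residualize(lesion, covariate)
    sxx = sum(v * v for v in rx)
    beta = sum(a * b for a, b in zip(rx, ry)) / sxx
    sse = sum((b - beta * a) ** 2 for a, b in zip(rx, ry))
    df = n - 3  # parameters: intercept, covariate, lesion
    se = math.sqrt(sse / df / sxx)
    return beta / se
```

In a whole-brain analysis the function would be applied to each voxel's lesion column, and the resulting t-map thresholded with a correction for multiple comparisons.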
Psycholinguistic models of aphasia recognize that semantic errors are multi-determined. For example, one might say ‘truck’ in response to the picture of a bus because of confusion about what the picture depicts (visual system), or what it means (semantic system). In the latter case, the error could reflect deficiencies in how bus and truck are represented as concepts (semantic representational deficit) or deficiencies in how such conceptual representations are accessed by executive control processes (semantic access deficit). All the aforementioned cases bear on the conceptualization of the target of naming. Semantic errors arising during conceptualization are of secondary interest in this study. The primary focus is on semantic errors that arise during production, specifically in the mapping from semantics to lexical items. Semantic production errors arise when a correct semantic representation (e.g. of the bus concept) is mapped to the wrong word (e.g. truck) by virtue of the overlap in their semantic representations. Models of normal error performance have successfully simulated semantic error probability on the assumption that errors arise in the mapping to lexical representations (Dell et al., 1997). The fact that such errors also tend to match the target's grammatical category provides further evidence that they occur during lexical access (e.g. Garrett, 1975).
To clarify the conceptualization versus production distinction as used here, Fig. 1 shows two influential accounts of lexical access in production. In both, the first lexical representation that is accessed during word production is an abstract, pre-phonological word form, or ‘lemma’ (Kempen and Huijbers, 1983). Semantic errors in production happen when a wrong item is selected at this stage without anything having gone wrong at the prior stage. For example, in the interactive two-step model, with cat as the target and the semantic features for cat having been activated normally, the selection of the word node for dog would constitute a semantic production error. In the model of Levelt et al. (1999), this corresponds to the lexical concept for cat being erroneously mapped to the lemma for dog. In contrast, semantic errors in conceptualization happen when, for whatever reason, the wrong lexical concept/semantic features are selected, e.g. with cat as target, the lexical concept for dog is selected and mapped (correctly) to the lemma for dog.
Lesion studies of aphasia have long implicated the left posterior temporal lobe in the genesis of semantic errors. Wernicke's aphasia and transcortical sensory aphasia are characterized by fluent speech, poor naming with a predominance of semantic errors and poor comprehension. They are distinguished by repetition, which is spared in transcortical sensory aphasia. Both are considered left posterior syndromes. Patients with Wernicke's aphasia generally have lesions in posterior superior and middle temporal gyri (Wernicke, 1874; Luria, 1976; Damasio, 1981). Transcortical sensory aphasia has also been localized to posterior superior and middle temporal gyri (Boatman et al., 2000), but more commonly, lesions are found in surrounding sites in the temporal–parietal–occipital junction (portions of Brodmann areas 39 and 19) or the inferior-middle temporal region (Brodmann area 37) (Heilman et al., 1981; Kertesz et al., 1982; Alexander et al., 1989).
In an elegant series of neurolinguistic studies with acute ischaemic stroke patients, Hillis and colleagues (2001a; DeLeon et al., 2007) extended and refined the evidence for posterior temporal involvement in semantic word processing and semantic error production, in particular. Using region of interest analysis of MRI perfusion- and diffusion-weighted imaging, they showed that within 24 h of the neurological event, a Wernicke-like pattern of poor spoken word comprehension and poor oral naming correlated with hypoperfusion in Brodmann area 22, whereas poor naming with spared word comprehension correlated with hypoperfusion in Brodmann area 37. Pharmacologic reperfusion produced improvements in accordance with the observed correlations, i.e. improved comprehension and naming correlated with reperfusion of Brodmann area 22; improved naming alone correlated with reperfusion of Brodmann area 37 (Hillis et al., 2001b, 2006). The most recent paper in this series (Cloutman et al., 2009) identified acute patients whose errors in oral naming were predominantly semantic (let us designate this group S+) and further classified them as to whether their semantic comprehension was impaired (S+/+) or spared (S+/−). Membership in the S+/+ group was predicted by tissue dysfunction in Brodmann area 22; membership in S+/− was predicted by tissue dysfunction in Brodmann area 37. These acute stroke studies offer persuasive evidence that damage to posterior temporal regions disrupts the mapping between concepts and words in production, with and without accompanying deficits in verbal comprehension.
However, the story is apparently more complex. Semantic production errors occur in normal speakers, as we have said, and in all types of aphasia, including the anterior syndromes (Schwartz et al., 2006). When one looks beyond the aphasia literature to neuropsychological research with other patient populations and to neuroimaging studies of normal language, it appears that the mapping from semantics to words in production takes place over an extensive left-lateralized network that may include parts of the middle temporal lobe and anterior temporal lobe and inferior and dorsolateral prefrontal cortices, in addition to the posterior and inferior temporal lobe (e.g. Damasio and Tranel, 1993; Damasio et al., 1996, 2004; Indefrey and Levelt, 2000, 2004; Maess et al., 2002; Duffau et al., 2003, 2005; Schnur et al., 2009). Some of these cortical regions are rarely, if ever, affected by middle cerebral artery infarctions, which are the most frequent cause of aphasia, so it is understandable that they would have escaped detection in lesion mapping studies with this population. Indeed, the large neuroanatomical study of category-specific naming deficits that presented the earliest and perhaps most persuasive evidence for the necessary role of anterior inferotemporal regions (Damasio et al., 1996, 2004) included patients with herpes simplex encephalitis and temporal lobectomy in addition to infarction. On the other hand, some of these anterior cortical regions do fall within the territory of the middle cerebral artery and may have been missed in the past due to surmountable problems such as biased selection criteria or small sample size.
Motivated by the important role that semantic naming errors have assumed in psycholinguistic and neurolinguistic models of semantic processing and lexical access (e.g. Caramazza and Hillis, 1990; Rapp and Goldrick, 2000; Schnur et al., 2006; Schwartz et al., 2006; Cloutman et al., 2009) and by the unresolved questions and controversies surrounding the role of the left anterior temporal lobe in semantic-lexical mapping systems (Murtha et al., 1999; Wise, 2003; Hickok and Poeppel, 2004), we conducted a lesion study of semantic naming errors using voxel-based methods. By recruiting a large and diverse group of patients, we managed to obtain adequate coverage of middle and anterior temporal lobe structures, as will be shown. Our study further sought to determine whether anterior temporal lobe lesions would predict semantic error proportions after filtering out performance on semantic comprehension tests. Evidence that the anterior temporal lobe effect survives such filtering would bolster the notion that anterior temporal lobe plays an essential role in the genesis of semantic errors, in production, specifically.
Patients who had been diagnosed acutely with aphasia secondary to left hemisphere stroke were recruited from the Neuro-Cognitive Rehabilitation Research Patient Registry at the Moss Rehabilitation Research Institute (Schwartz et al., 2005) or the Centre for Cognitive Neuroscience Patient Database at the University of Pennsylvania. Eighty-two patients gave informed consent to take part in a multi-session language assessment under protocols approved by the Institutional Review Boards at Albert Einstein Medical Centre and the University of Pennsylvania School of Medicine. All but two of these also gave informed consent to undergo structural MRI or CT imaging of the brain in order to determine the precise localization of their lesion. For the remaining two, a recently performed clinical imaging study judged to be of high quality was used instead. Imaging studies were conducted at the University of Pennsylvania School of Medicine under that institution's protocol. Participants were paid for their participation and reimbursed for travel and related expenses.
To be accepted into the study, participants had to meet the following criteria: no major psychiatric or neurologic co-morbidities; pre-morbid right handedness; English as primary language; adequate vision and hearing without or with correction; some ability to name pictures; and CT or MRI confirmed left hemisphere cortical lesion. Of the 82 who consented, 18 were disqualified from participating: 9 because they failed to produce any correct naming responses and 9 because their scans revealed bilateral damage or damage restricted to sub-cortical areas. The 64 who were included in the study (42% females; 48% African American) had a mean age of 58 (range 26–78), mean years of education of 14 (10–21) and mean months post-onset of 68 (1–381). Ninety-two percent of participants were at least 6 months post-onset. All were living in the community at the time of testing.
The 175-item Philadelphia Naming Test (Roach et al., 1996) was used to measure semantic error production in picture naming. The black and white pictures represent non-unique entities from varied semantic categories, the largest being manipulable objects (41%) and animals (15%). Pictures have high familiarity, name agreement and image quality. Names range in length from 1 to 4 syllables and in noun frequency from 1 to 2110 tokens per million (Francis and Kucera, 1982).
Standard procedures were used to administer the Philadelphia Naming Test and classify errors (http://www.ncrrn.org/assessment/pnt; see also Dell et al., 1997; Schwartz et al., 2006). On each trial, the first complete (i.e. non-fragment) response produced within 20 s was scored. The current study focuses on errors classified as semantic; these are real word responses that constitute a synonym, category coordinate, superordinate, subordinate or strong associate of the target (e.g. vase for bowl; rose for flower). The number of semantic errors was divided by the number of trials (175) to generate the dependent variable ‘SemErr’.
Two tests of non-verbal semantic comprehension were administered. The Pyramids and Palm Trees Test (Howard and Patterson, 1992) and the Camel and Cactus Test (Bozeat et al., 2000; Lambon Ralph et al., 2001a) both involve picture–picture matching based on thematic relatedness (e.g. wine-grape). The 52-item Pyramids and Palm Trees Test requires a two-choice match; the 64-item Camel and Cactus Test requires a more demanding four-choice match. These two tests were scored for accuracy (percent correct), standardized by z-scores and averaged to create a non-verbal comprehension composite score, which we call ‘NVcomp’. NVcomp was used to control for semantic errors in naming that arise from selection of the wrong lexical concept/semantic features at the end of the conceptualization stage (Fig. 1).
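A minimal sketch of how such a composite can be formed, assuming that each test's z-scores are computed relative to the patient sample's own mean and standard deviation (the text does not state which reference group was used). The function names are hypothetical; the same helper would serve for the verbal composite described next.

```python
from statistics import mean, stdev


def zscores(values):
    """Standardize one test's scores against the sample mean and SD (assumption)."""
    m, s = mean(values), stdev(values)
    return [(v - m) / s for v in values]


def composite(*tests):
    """Average each patient's z-scores across tests.

    Each argument is one test's list of percent-correct scores,
    with patients in the same order in every list.
    """
    z = [zscores(t) for t in tests]
    return [mean(per_patient) for per_patient in zip(*z)]
```

For example, `composite(ppt_scores, cct_scores)` would give one NVcomp-style value per patient, where `ppt_scores` and `cct_scores` are hypothetical lists of Pyramids and Palm Trees and Camel and Cactus accuracies.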
We also administered two verbal comprehension tests. The Peabody Picture Vocabulary Test (Third edition-form A) (Dunn and Dunn, 1997) consists of 204 trials in which a spoken word must be matched to one of four pictures that best represents its meaning. The Synonym Judgement Test consists of 30 trials (half nouns, half verbs) on which three printed words are arrayed and read aloud and the subject must decide which two mean the same (Martin et al., 2005). These tests were scored for accuracy (percent correct), standardized by z-scores and averaged to create a verbal comprehension composite score, which we call ‘Vcomp’. Vcomp was used to control for any contribution to semantic errors in naming that arises from an impairment in the ability to access the meaning of a word from one of its input lexical representations (e.g. semantic errors may be more likely to be spoken aloud if impaired comprehension processes fail to detect them beforehand).
Various supplementary tests were also administered, including the Western Aphasia Battery (Kertesz, 1982) and tests for verbal apraxia, word and non-word repetition, auditory input processing, short-term memory strength and syntactic comprehension. Additional assessments were done to confirm right-hand dominance (Oldfield, 1971) and adequate hearing (Ventry and Weinstein, 1983).
Structural images were acquired using MRI (n = 34) or CT (n = 30). Thirty-two patients were scanned on a 3.0 T Siemens Trio scanner. High-resolution whole-brain T1-weighted images were acquired (repetition time = 1620 ms, echo time = 3.87 ms, field of view = 192 × 256 mm, 1 × 1 × 1 mm voxels) using a Siemens 8-channel head coil. In accordance with established safety guidelines (MRI safety; www.mrisafety.com), two patients were scanned on a 1.5 T Siemens Sonata because of an implant that had not been approved for a 3 T environment. Whole-brain T1-weighted images were acquired (repetition time = 3000 ms, echo time = 3.54 ms, field of view = 24 cm) with a slice thickness of 1 mm using a standard radio-frequency head coil. As MRI was contra-indicated for the remaining 30 patients, they underwent whole-brain CT scans without contrast (60 axial slices, 3 mm thick) on a 64-slice Siemens SOMATOM Sensation scanner.
For patients with high-resolution MRI scans available electronically (n = 34), lesions were segmented manually on a 1 × 1 × 1 mm T1-weighted structural image. The structural scans were registered to a common template using a symmetric diffeomorphic registration algorithm (Avants et al., 2006; see also http://www.picsl.upenn.edu/ANTS/). This same mapping was then applied to the lesion maps. To optimize the automated registration, volumes were first registered to an intermediate template constructed from images acquired on the same scanner. A single mapping from this intermediate template to the Montreal Neurological Institute (MNI) space ‘Colin27’ volume (Holmes et al., 1998) was used to complete the mapping from subject space to MNI space. The final lesion map was quantized to produce a 0/1 map, using 0.5 as the cut-off value. After being warped to MNI space, the manually drawn depictions of the lesions were inspected by H.B.C., an experienced neurologist who was naïve with respect to the behavioural data.
For patients with CT scans (n = 30), H.B.C., naïve to the behavioural data, drew lesion maps directly onto the Colin27 volume, after rotating (pitch only) the template to approximate the slice plane of the patient's scan. We have previously demonstrated excellent intra- and inter-rater reliability with this method (Schnur et al., 2009).
Voxels in which fewer than five patients were lesioned were excluded from the voxel-based lesion-symptom mapping analyses. A simple t-test comparing the scores between patients with and without lesions was performed at each voxel using the VoxBo brain imaging package (www.voxbo.org). The resulting t-map was thresholded to control the false discovery rate (Genovese et al., 2002) at q = 0.01, where q is the expected proportion of false positives among supra-threshold voxels. Imaging results reported below use this as the threshold for significance.
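The per-voxel test and the false discovery rate threshold can be sketched as follows. This is an illustration under stated assumptions, not VoxBo's code: it uses a pooled-variance (Student) t statistic signed as lesioned minus spared, skips voxels with fewer than five lesioned patients, and applies the Benjamini-Hochberg step-up rule on which Genovese et al. (2002) base their thresholding. Converting t values to p-values requires the t distribution with n1 + n0 − 2 degrees of freedom (for instance via `scipy.stats.t.sf`), which is omitted to keep the example dependency-free. Function names are hypothetical.

```python
import math
from statistics import mean

MIN_LESIONS = 5  # voxels lesioned in fewer patients are not tested


def voxel_t(lesion, score):
    """Pooled-variance t (lesioned minus spared), or None if the voxel is skipped."""
    hit = [s for l, s in zip(lesion, score) if l]
    spared = [s for l, s in zip(lesion, score) if not l]
    if len(hit) < MIN_LESIONS or not spared:
        return None
    n1, n0 = len(hit), len(spared)
    m1, m0 = mean(hit), mean(spared)
    ss = sum((s - m1) ** 2 for s in hit) + sum((s - m0) ** 2 for s in spared)
    sp2 = ss / (n1 + n0 - 2)  # pooled variance
    return (m1 - m0) / math.sqrt(sp2 * (1 / n1 + 1 / n0))


def bh_threshold(pvalues, q=0.01):
    """Largest p-value declared significant at FDR level q (None if none survive).

    Benjamini-Hochberg step-up: find the largest rank k with p_(k) <= q * k / m.
    """
    m = len(pvalues)
    cutoff = None
    for rank, p in enumerate(sorted(pvalues), start=1):
        if p <= q * rank / m:
            cutoff = p
    return cutoff
```

Voxels whose p-value is at or below the returned cutoff would be reported as significant; the corresponding t value plays the role of the critical t quoted in the Results.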
As stated earlier, our aim was to localize lesions that give rise to semantic errors in the mapping from concepts to words in production. To control for semantic errors that arise instead from faulty conceptualization, we performed an analysis in which NVcomp scores were factored out of the SemErr measure by regression. To control for semantic errors that may depend on verbal comprehension processes contributing to production, we performed a second analysis in which Vcomp scores were factored out of the SemErr measure. The removal of NVcomp and Vcomp was done within the statistical package R (www.r-project.org).
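A minimal sketch of the residualization step, written in Python rather than the R code the authors used (the function name is hypothetical; in R the equivalent would be along the lines of `resid(lm(SemErr ~ NVcomp))`). Note that this removes the covariate from the behavioural score only; entering the covariate into the model at every voxel, as in the regression framework described earlier, would also adjust lesion status for the covariate and can give somewhat different t values.

```python
from statistics import mean


def residualize(score, covariate):
    """Residuals of score after least-squares regression (with intercept) on covariate."""
    ms, mc = mean(score), mean(covariate)
    scc = sum((c - mc) ** 2 for c in covariate)
    slope = sum((c - mc) * (s - ms) for c, s in zip(covariate, score)) / scc
    # What remains is the part of the score not linearly predictable from the covariate.
    return [s - ms - slope * (c - mc) for c, s in zip(covariate, score)]
```

The residuals (e.g. SemErr with NVcomp removed) then replace SemErr as the score entered into the voxel-wise t-tests.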
Results for key language measures are shown in Table 1. The Western Aphasia Battery identified 6 of the 64 patients as ‘recovered’ (AQ > 93.8) (Kertesz, 1979), although all 6 scored outside the control range on other tests in the battery. The remaining 58 were classified as follows: anomic (n = 24), Broca's (n = 19), conduction (n = 10), Wernicke's (n = 4), transcortical motor (n = 1). The under-representation of Wernicke's aphasia (7%) and transcortical sensory aphasia (0%) is to be expected in chronic, unselected samples and reflects the tendency for these subtypes to evolve into anomic or conduction aphasia as the early symptoms—neologistic jargon and profound comprehension deficit—recover (Kertesz and Benson, 1970; Kertesz and McCabe, 1977; Goodglass and Kaplan, 1983).
On the Philadelphia Naming Test, there was a wide range in the proportion of items correct (0.02–0.97), with the median falling at 0.80, which is outside the normal range. The proportion of semantic errors (SemErr) reached a maximum of 0.12, roughly comparable with other large, unbiased aphasia samples (e.g. Dell et al., 1997; Schwartz et al., 2006). Not surprisingly, studies that select for semantic error production or employ criteria that bias towards semantic impairment tend to report higher semantic error frequencies (Ruml et al., 2005; DeLeon et al., 2007). Relative to total errors (instead of total trials), semantic error production ranged from 0.00 to 0.77 (SemErr/TotErr in Table 1). The purest semantic error patterns (0.30 or more SemErr/TotErr) occurred exclusively in the most accurate patients (0.70 or more correct).
The four comprehension tests all yielded scores below the mean for healthy elderly controls, as shown in Table 1. The two non-verbal tests correlated strongly with one another (r = 0.79, P < 0.001), as did the two verbal tests (r = 0.64, P < 0.001). The correlation between NVcomp and SemErr was −0.44 (P < 0.001) and that between Vcomp and SemErr, −0.27 (P = 0.03).
After excluding voxels with fewer than five lesions, the number of voxels that qualified for analysis was 404 565, or 55% of the 738 535 voxels in the left hemisphere (using counts from the electronic AAL atlas; Tzourio-Mazoyer et al., 2002). This included 83 096 distinct patient-lesion patterns, each defined by the subset of patients lesioned in a voxel. The number of distinct voxels is maximal for lesion counts around 16, a quarter of the total patients. The number of voxels with between 27 and 37 lesions (37 was the maximum) was 47 558, representing 11 482 patterns.
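Because voxels lesioned in exactly the same subset of patients yield identical test statistics, the number of distinct patterns, rather than the raw voxel count, indicates how many genuinely different tests are performed. A small sketch of the counting, assuming each voxel is represented as a 0/1 vector over patients; the function name is hypothetical.

```python
def lesion_patterns(voxel_columns, min_lesions=5):
    """Return (number of eligible voxels, number of distinct patient-lesion patterns).

    voxel_columns: one 0/1 sequence per voxel giving each patient's lesion status,
    with patients in the same order for every voxel.
    """
    eligible = [tuple(col) for col in voxel_columns if sum(col) >= min_lesions]
    return len(eligible), len(set(eligible))
```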
In voxel-based lesion-symptom mapping, differences in power between regions are due to differences in the frequency with which lesions impinge on the region. Maximal power is achieved in voxels lesioned in half the patients (32 in the present dataset). Figure 2 shows a colour map of the number of patients with lesions in each voxel and suggests the relative (not absolute) power of each voxel for detecting an association, if one exists, between lesioned status and the behavioural measures.
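Why power peaks at half the sample can be seen from the standard error of a two-sample mean difference. Assuming equal score variance σ² in lesioned (L) and spared (S) patients, with n of N patients lesioned at a voxel:

```latex
\mathrm{SE}(\bar{y}_L - \bar{y}_S)
  = \sigma\sqrt{\frac{1}{n} + \frac{1}{N-n}}
  = \sigma\sqrt{\frac{N}{n\,(N-n)}}

\text{minimized when } n(N-n) \text{ is largest, i.e. } n = N/2.

N = 64:\quad
n = 32 \Rightarrow \sigma\sqrt{1/16} = 0.25\,\sigma,\qquad
n = 16 \Rightarrow \sigma\sqrt{1/12} \approx 0.289\,\sigma,\qquad
n = 5 \Rightarrow \sigma\sqrt{64/295} \approx 0.466\,\sigma
```

For a fixed true difference in means, a smaller standard error gives a larger expected t, so voxels lesioned in close to half the patients are the most sensitive.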
There are obvious and predictable limitations in our coverage. As aphasia is typically associated with strokes in the middle cerebral artery territory, we are unable to explore the contribution of brain regions typically supplied by the posterior or anterior cerebral arteries. These regions include the mesial portions of the hemisphere as well as the occipital lobe, posterior inferior temporal lobe and mesial temporal lobe. On the other hand, the entire perisylvian region had good coverage. For example, in the left inferior frontal gyrus (Brodmann area 44/45) voxels with as many as 35 lesions were identified. The maximum number of lesions in the posterior superior temporal lobe and the superior portion of Brodmann area 37 were 28 and 24, respectively. Finally, it is important to note that voxels with substantial numbers of lesions were identified in the temporal pole (Brodmann area 38) and anterior middle temporal gyrus (Brodmann area 21), with maximal lesion counts of 19 and 16, respectively.
The map in Fig. 3 depicts t-values for the difference in total lesion volume between patients with and without damage at each voxel. It shows that local lesion status is highly predictive of overall lesion volume in most of the left hemisphere, but the relationship is no stronger in the anterior temporal lobe than elsewhere in the anterior half of the left hemisphere. In other words, although damage at most voxels was associated with larger lesions overall, no particular location was disproportionately associated with lesion size in this sample.
In the analysis of the unfiltered SemErr measure, lesion status was significantly associated with elevated SemErr in 35 466 voxels. As indicated in Fig. 4, the voxels with the highest t-values were located in the anterior temporal lobe. The highest concentration of significant voxels (19 411 voxels) was in the anterior half of the middle temporal gyrus and the temporal pole, corresponding to Brodmann areas 21 and 38, respectively. A second distinct cluster of significant voxels was located in the posterior portion of the middle temporal gyrus, corresponding to the lateral and superior portion of Brodmann area 37, approximately where the middle temporal gyrus terminates at its junction with the occipital lobe. There was also a smaller cluster of significant voxels in the left lateral prefrontal cortex, located primarily in the inferior and middle frontal gyri (Brodmann areas 45 and 46), and a small number of significant voxels in the deep white matter. As indicated in Fig. 4, there were no significant voxels in the posterior superior temporal gyrus, corresponding to Wernicke's area; indeed, the peak t-values observed in this region were around 1.8, far below the critical t (3.27).
Filtering out Vcomp, as described above, changed the strength of effects slightly but not the overall pattern (Fig. 5). The major change was a reduction in the number of significant voxels in the lateral prefrontal cortex (Brodmann area 45/46). Of the 26 771 voxels that exceeded the critical threshold after the filtering, the majority (16 427) were concentrated in the anterior temporal lobe and middle portion of the middle temporal gyrus, defined arbitrarily as that portion of the temporal lobe anterior to a y-value of −35.
Filtering out NVcomp had a more substantial impact on the strength and pattern of results (Fig. 6). No voxels exceeded threshold in either Brodmann area 45/46 or Brodmann area 37. Of the 6366 voxels exceeding threshold, the majority (4361) were in the anterior temporal lobe region described above.
We investigated the anterior temporal lobe effect further, first by inspecting all the raw (pre-registered) scans for evidence of lesions within the anterior temporal lobe region defined in the unfiltered analysis. Thirty-four patients, more than half the sample, had readily identifiable damage here (see Fig. 7 for examples).
Second, we pulled out the 20 patients with the most semantic errors and the 18 with the fewest (to avoid ties) and constructed an overlap map showing, in each voxel, the proportion of lesions in the high SemErr group minus the proportion of lesions in the low SemErr group. As Fig. 8 shows, the voxels with large differences occupy the previously identified anterior temporal lobe and lateral prefrontal areas. Although the threshold for the overlap map is arbitrary, these two regions show the most robust, coherent overlap differences. Some additional voxels in primary sensory-motor cortices are also apparent.
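The subtraction overlap map can be sketched as below, assuming patients are ranked by SemErr and each voxel is a 0/1 vector over patients. The simple sort does not reproduce the authors' tie handling (they chose group sizes of 20 and 18 to avoid ties), and the function names are hypothetical.

```python
def extreme_groups(sem_err, n_high, n_low):
    """Indices of the n_high patients with the most and n_low with the fewest semantic errors."""
    order = sorted(range(len(sem_err)), key=lambda i: sem_err[i])
    return order[-n_high:], order[:n_low]


def subtraction_map(voxel_columns, high_idx, low_idx):
    """Per voxel: proportion lesioned in the high group minus proportion in the low group."""
    out = []
    for col in voxel_columns:
        p_high = sum(col[i] for i in high_idx) / len(high_idx)
        p_low = sum(col[i] for i in low_idx) / len(low_idx)
        out.append(p_high - p_low)
    return out
```

Values near +1 mark voxels lesioned mostly in high-SemErr patients; any display threshold on this map remains arbitrary, as the text notes.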
In the final set of analyses, we further investigated the partial confound between anterior temporal lobe damage and lesion size. The strong association between total lesion volume and localized damage (in most of the covered voxels, including the anterior temporal lobe) means that partialing out volume would result in very low statistical power to detect independent effects at a voxel-wise level. However, as a first approximation, we carried out that analysis (regressing out total lesion volume from SemErr) and mapped the voxels that exceeded the critical threshold without correction (cf. Karnath et al., 2004). Once again, the anterior temporal lobe involvement was most prominent (Fig. 9), although the frontal involvement was somewhat reduced. Moreover, we statistically confirmed the independent contribution of anterior temporal lobe in a regional analysis, in which we calculated the percentage of damage within Brodmann areas 38 (temporal pole) and 21 (middle temporal gyrus) and computed partial correlations with SemErr, controlling for total lesion volume. Results were significant for both areas (Brodmann area 21: r = 0.34, P = 0.006; Brodmann area 38: r = 0.33, P = 0.008), indicating that damage here correlated with semantic error production above and beyond the contribution of lesion size.
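The regional partial correlation can be computed from the three pairwise Pearson correlations with the standard first-order formula. In the sketch below, x would be percentage damage in a Brodmann area, y SemErr and z total lesion volume; the function names are hypothetical, and significance testing (a t statistic with n − 3 degrees of freedom) is left out.

```python
import math
from statistics import mean


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def partial_corr(x, y, z):
    """First-order partial correlation of x and y, controlling for z."""
    rxy, rxz, ryz = pearson(x, y), pearson(x, z), pearson(y, z)
    return (rxy - rxz * ryz) / math.sqrt((1 - rxz ** 2) * (1 - ryz ** 2))
```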
While picture naming is considered among the simplest of language tasks, it is cognitively complex, as indexed by the types of memory representations accessed, the levels of processing involved and the number of processing parameters required to simulate normal performance (Dell et al., 1997; Levelt et al., 1999). Research in the neuroanatomy of naming has become increasingly sophisticated in relating regional activation or sites of damage to psycholinguistic models (Murtha et al., 1999; Indefrey and Levelt, 2000, 2004; Moss et al., 2005; Price et al., 2005; DeLeon et al., 2007; Graves et al., 2007; Postman-Caucheteux et al., 2009). Extending this model-based approach, our study sought voxel-wise evidence on the localization of lesions that produce semantic naming errors during the mapping of concepts to words.
The first (unfiltered) analysis identified significant voxels in three distinct areas, of which the strongest was the anterior temporal lobe. To control for deficits in pre-lexical conceptualization processes that might have contributed to semantic error production, we then regressed out scores on the composite comprehension measures Vcomp and NVcomp. Controlling for Vcomp in this manner made little difference, while controlling for NVcomp eliminated effects in the areas outside the anterior temporal lobe. Thus, the conclusion from the main analyses is that the symptom of interest, semantic errors generated at the word production stage, arises from damage to the anterior temporal lobe. For the other two areas, Brodmann area 37 and Brodmann area 45/46, an alternative basis for semantic errors is indicated. We return to this issue later in the Discussion section.
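The filtering logic can be illustrated with a minimal voxel-wise sketch in Python/NumPy, in the spirit of voxel-based lesion-symptom mapping (Bates et al., 2003): a comprehension composite is first regressed out of SemErr, and at each voxel the residual scores of patients with and without damage there are compared with a two-sample t statistic. Several choices here are assumptions for illustration only: the pooled-variance t statistic, the minimum of five patients per group at each voxel, the sign convention, and the names `vlsm_t_map` and `residualize`. No thresholding or correction for multiple comparisons is shown.

```python
import numpy as np


def residualize(y, covariates):
    """Residuals of y after least-squares regression on covariates plus an intercept."""
    y = np.asarray(y, dtype=float)
    X = np.column_stack([np.ones(len(y)), covariates])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return y - X @ beta


def vlsm_t_map(lesions, scores, min_patients=5):
    """Pooled-variance t statistic at each voxel: lesioned minus spared mean score.

    lesions: (n_patients, n_voxels) binary array, 1 = voxel damaged.
    scores:  (n_patients,) behavioural score, e.g. SemErr or a residualized SemErr.
    Positive t means higher scores (more errors) when the voxel is damaged.
    Voxels with fewer than min_patients in either group are left as NaN (not tested).
    """
    lesions = np.asarray(lesions, dtype=bool)
    scores = np.asarray(scores, dtype=float)
    t_map = np.full(lesions.shape[1], np.nan)
    for v in range(lesions.shape[1]):
        hit, spared = scores[lesions[:, v]], scores[~lesions[:, v]]
        n1, n2 = hit.size, spared.size
        if n1 < min_patients or n2 < min_patients:
            continue  # too few patients to test this voxel
        pooled_var = ((n1 - 1) * hit.var(ddof=1) + (n2 - 1) * spared.var(ddof=1)) / (n1 + n2 - 2)
        t_map[v] = (hit.mean() - spared.mean()) / np.sqrt(pooled_var * (1 / n1 + 1 / n2))
    return t_map
```

For example, `vlsm_t_map(lesions, residualize(semerr, nvcomp))` would produce a map controlling for NVcomp, where `lesions`, `semerr` and `nvcomp` are hypothetical arrays of patient data; comparing it with the unfiltered `vlsm_t_map(lesions, semerr)` shows which voxels lose their effect once the shared variance is removed.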
Could the anterior temporal lobe effect be spurious? A potential concern is that anterior temporal lobe damage in our sample might represent the peripheral extent of very large lesions, and what manifests as an anterior temporal lobe effect might instead be due simply to lesion size. This is unlikely for several reasons. First, far from being limited to a few patients, anterior temporal lobe lesions were evident in the raw scans of more than half the participants (34 of 64). While anterior temporal lobe damage may be rare in the stroke population at large (Wise, 2003; Noonan et al., in press), it is apparently quite common among chronic stroke patients with persisting aphasia. Second, the association between lesion status and lesion size was high in the anterior temporal lobe, but no more so than in other areas that were not associated with SemErr (Fig. 3). Finally, and most compellingly, in Brodmann areas 38 and 21, the areas where significant voxels were identified, partial correlation analysis revealed associations between the amount of damage and SemErr score that were statistically independent of the contribution of total lesion volume.
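One way to quantify the voxel-wise association between lesion status and lesion size is a point-biserial correlation (a Pearson correlation with a 0/1 variable) between each voxel's damage status and total lesion volume. The NumPy sketch below assumes that statistic and approximates total volume by the number of lesioned voxels per patient; both are illustrative assumptions, not a description of how the published map was computed, and `lesion_size_association` is a hypothetical name.

```python
import numpy as np


def lesion_size_association(lesions):
    """Point-biserial correlation, per voxel, between lesion status and total lesion volume.

    lesions: (n_patients, n_voxels) binary array, 1 = voxel damaged.
    Total volume is approximated as the count of damaged voxels per patient
    (an assumption; a real analysis would use volume in mm^3).
    Voxels with the same status in every patient get NaN.
    """
    L = np.asarray(lesions, dtype=float)
    volume = L.sum(axis=1)
    Lc = L - L.mean(axis=0)           # centre each voxel's 0/1 status
    vc = volume - volume.mean()       # centre total volume
    denom = np.sqrt((Lc**2).sum(axis=0) * (vc**2).sum())
    with np.errstate(invalid="ignore", divide="ignore"):
        r = (Lc * vc[:, None]).sum(axis=0) / denom
    return r
```

Comparing these values between regions that do and do not predict SemErr is the kind of check the argument relies on: a high correlation in the anterior temporal lobe is only worrying if it exceeds what is seen elsewhere.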
These data provide compelling evidence that lesions within anterior temporal lobe give rise to semantic errors in word production. What is the basis for this effect? The voxels with highest t-values were clustered in the mid-part of the middle temporal gyrus. This location agrees remarkably well with a meta-analysis of the imaging literature on word production (Indefrey and Levelt, 2000, 2004). Drawing on a model of lexical access in speech production (Roelofs, 1992; Levelt et al., 1999), Indefrey and Levelt (2000) identified the computational stages of word processing tapped in activation tasks from 58 experiments, and then determined the spatial overlap of activated regions corresponding to one or another processing stage. In the 2004 publication, newer experiments were added (bringing the total to 82) and a subset were additionally analysed for the time course of their activations, again in relation to the theory and supporting experiments. Relevant to present concerns is the finding regarding the stage that Indefrey and Levelt call ‘conceptually driven lemma access’. In both the 2000 and 2004 analyses, the only region to show the activation pattern consistent with conceptually driven lemma access (a pattern defined by activation in picture naming and word generation and not word reading) was the mid-part of the left middle temporal gyrus. Invoking evidence from the time course of this activation, the authors argued that this area was more likely to be involved with lemma selection than with the preceding stages of conceptual processing (Fig. 1) (Indefrey and Levelt, 2004).
As stated in the Introduction section, lemma selection is the stage at which semantic errors arise during production. According to the interactive two-step model—a model that was developed and extensively studied by members of our team and that is geared to explaining the genesis of production errors in unimpaired and impaired speakers (Dell and O'Seaghdha, 1991; Dell et al., 1997; Foygel and Dell, 2000; Schwartz et al., 2006)—semantic errors arise when the wrong lemma is selected and the corresponding phonological form is correctly retrieved and spoken (Fig. 1; right panel). The many-to-one mapping between semantic features and lemmas creates the preconditions for semantic errors; when the target's semantic features are activated, this activation passes to the target lemma and, to a lesser degree, lemmas that share its features. Because lemma selection is probabilistic, semantic competitors occasionally win out even in healthy speakers. A computational case-series analysis of 94 individuals with aphasia showed that the heightened probability of semantic and other whole-word substitution errors could be quantitatively modelled by reducing the weights on connections between semantic features and lemmas (Schwartz et al., 2006). In this theoretical context, our findings identify the mid-part of the middle temporal gyrus as an essential component of the neural instantiation of lemma selection.
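The link between weakened semantic-to-lemma connections and semantic errors can be illustrated with a deliberately simplified simulation. This is not the interactive two-step model itself: it has no feedback, decay or phonological step, and only lemma selection is modelled. It assumes a hypothetical lexicon in which the target lemma receives input from 10 of the target's semantic features, two semantic neighbours each receive input from 5, and five unrelated lemmas receive none; independent Gaussian noise is added to every lemma's activation and the most active lemma is selected. The parameter `s_weight` stands in for the s-weights, and all values are illustrative.

```python
import numpy as np


def simulate_lemma_selection(s_weight, n_trials=20000, noise_sd=1.0, seed=0):
    """Proportion of correct, semantic and unrelated lemma selections (toy model)."""
    rng = np.random.default_rng(seed)
    # Hypothetical lexicon: how many of the target's semantic features feed each lemma.
    shared = np.array([10, 5, 5, 0, 0, 0, 0, 0])  # target, 2 semantic neighbours, 5 unrelated
    kinds = np.array(["correct", "semantic", "semantic"] + ["unrelated"] * 5)
    # Activation = s_weight * shared features + independent Gaussian noise on every trial.
    act = s_weight * shared + rng.normal(0.0, noise_sd, size=(n_trials, shared.size))
    winners = kinds[act.argmax(axis=1)]  # select the most active lemma
    return {k: float(np.mean(winners == k)) for k in ("correct", "semantic", "unrelated")}


for s in (0.5, 0.2, 0.1):
    print(s, simulate_lemma_selection(s))
```

Lowering `s_weight` while holding noise constant shrinks the target's activation advantage over its neighbours, so semantic substitutions (and, at very low weights, unrelated ones) become more frequent. This is the qualitative pattern the weight-lesioning account predicts, not a fit to any patient data.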
This conclusion is compatible with other accounts of how the left anterior temporal lobe functions in semantic word production. For example, in the convergence zone framework developed by the Iowa group, higher order cortices, including temporal pole and inferotemporal regions, play an intermediate or mediational role in concept and word retrieval. These ‘convergence regions’ contain systems that promote the reinstatement of elemental representations of concepts or words in distributed brain regions, via recurrent connections (Damasio, 1989; Damasio and Damasio, 1994; Damasio et al., 2004). In applying this framework to their seminal PET/lesion findings on category-specific naming deficits, Damasio and colleagues argued that left temporal pole and inferotemporal cortices ‘hold knowledge about how to reconstruct a certain pattern (for example, the phonemic structure of a given word) within the appropriate sensorimotor structures’ (Damasio et al., 1996, p. 504). Here and elsewhere (e.g. Damasio and Damasio, 1992), they drew an explicit connection between the function ascribed to left temporal pole and inferotemporal cortices in convergence zone theory and the processing step in psycholinguistic theory that intervenes between concepts and word forms. That processing step is lemma selection, and it is noteworthy that the area we identified as critical for semantic error production is fully contained within their temporal pole and inferotemporal area (Damasio and Damasio, 1992; Damasio et al., 1996; Fig. 1A).
The left anterior temporal lobe appears to be particularly important for conceptualizing and/or naming famous people and other unique entities (Damasio et al., 1996; Tranel et al., 1997; Gorno Tempini et al., 1998; Gorno Tempini and Price, 2001; Tranel, 2006), perhaps because these rostral brain areas subserve a finer, more specific level of conceptualization of persons or things (Damasio and Damasio, 1994; Gorno Tempini and Price, 2001; Martin and Chao, 2001; Patterson et al., 2007). Might left anterior temporal lobe damage then predispose to semantic naming errors by blurring close conceptual distinctions? Our evidence suggests not. Each of the comprehension tests that entered into the composite scores (Vcomp and NVcomp) required conceptualization of close semantic distinctions, so in factoring out these scores we controlled for this possibility. Moreover, the lesson from semantic dementia is that it is only with bilateral anterior temporal lobe damage that one begins to see the conceptual over- and under-generalization indicative of conceptual blurring (Lambon Ralph et al., 2001b; Patterson et al., 2007; Lambon Ralph and Patterson, 2008). Given this, and based on the fact that our anterior temporal lobe effect survived the filtering of conceptualization measures, a more likely possibility is that the left anterior temporal lobe is specialized for conveying fine-grained distinctions to the lexical system. In the interactive two-step model, these distinctions are sent to potential lemma units by the semantic-to-lemma connections. The weights of these connections (s-weights in Fig. 1) are set by a learning algorithm that emphasizes or de-emphasizes the contribution of certain features to lemma selection (Gordon and Dell, 2003; Oppenheim et al., in press). We suggest that damage to the left anterior temporal lobe blunts this finer grain of differentiation, thereby raising the probability of semantic errors.
It must be emphasized that the region within the left anterior temporal lobe that our analysis picked out almost certainly underestimates the extent of the temporal lobe involvement. As noted earlier, there were too few patients with lesions in the inferior temporal or fusiform gyri to detect effects there. Given the many studies reporting activation in these inferotemporal areas during semantic and/or lexical processing (Damasio et al., 1996), it is not unlikely that an association with SemErr would have been found there, at least in the unfiltered analysis. The weak coverage in these areas is a consequence of vascular anatomy and a well-known limitation of lesion mapping in post-stroke aphasia. A few studies have circumvented this problem by enrolling patients with other focal aetiologies (Damasio et al., 1996) or using functional neuroimaging to identify effects remote from the lesion (Sharp et al., 2004; Crinion and Price, 2005).
Except for the inferior temporal/fusiform gyri, coverage in our study was good for left peri- and extra-sylvian regions previously implicated in semantic word processing, and we did identify areas outside of the anterior temporal lobe that correlated with semantic error rates. One was a posterior temporal region located in the lateral superior sector of Brodmann area 37; the other was in lateral prefrontal cortex encompassing parts of the inferior and middle frontal gyri (Brodmann area 45/46). In both areas, voxels surpassed the critical threshold in the unfiltered analysis and in the analysis that filtered out Vcomp. However, filtering out NVcomp weakened effects here to the degree that no voxels in either area reached significance. NVcomp is a stringent measure of conceptual processing. The tests that comprise this measure—Pyramids and Palms and Camel and Cactus—require concept identification (what target and foils depict), extraction of relevant features (how target and foil are related) and comparison in working memory (which thematic relations are stronger). Accordingly, performance on these tests has proven sensitive to semantic control deficits as well as semantic representational deficits (Jefferies and Lambon Ralph, 2006; Noonan et al., in press).
The tests of the Vcomp composite involve a simple match on the basis of common reference or synonymy and so are less demanding of semantic control. Our interest in Vcomp rested on the possibility of identifying areas in which regressing out Vcomp, and not NVcomp, eliminated the association with SemErr. This would have indicated that the effect in these areas was due to variance shared between accessing words from concepts (in production) and accessing concepts from words (in comprehension). We did not identify any voxels that fitted this pattern. Given this, we interpret the results of the filtering analyses as evidence that a sizeable portion of the semantic naming errors generated from lateral superior Brodmann areas 37 and 45/46 originated in processes shared between picture naming and the non-verbal comprehension tests, namely retrieving and/or controlling semantic information during selection of the lexical concept.
The area we identified in lateral superior Brodmann area 37 may be part of a posterior temporal network serving word-level semantic processing (Hart and Gordon, 1990) and/or auditory sentence comprehension (Dronkers et al., 2004). Lack of coverage inferior to this area may explain why we did not confirm prior evidence that Brodmann area 37 plays a necessary role in concept-word mapping in production specifically (Raymer et al., 1997; Foundas et al., 1998; Hillis et al., 2001a; DeLeon et al., 2007; Cloutman et al., 2009). That finding could hinge on involvement of the basal temporal language area (Lüders et al., 1986). Among other things, the basal temporal language area seems to be important for converting visual-semantic information to phonological representations (Lüders et al., 1991; Usui et al., 2003; Trébuchon-Da Fonseca et al., 2009). Blocked access to target phonology has been identified as a possible basis for semantic error production in naming (Caramazza and Hillis, 1990; Hillis et al., 2001a).
There is also precedent in the literature for the association we found between semantic error production and damage in lateral prefrontal cortex. In their study with acute stroke patients, Cloutman et al. (2009) reported a voxel-based analysis of damage associated with the S+/− pattern (frequent naming errors with spared comprehension), which showed effects in voxels in Brodmann areas 44 and 46. In the primary analysis, in which multiple regions of interest were entered as predictor variables in a regression model, only Brodmann area 37 was significant; Brodmann areas 44 and 45 were entered but did not contribute.
It has also been shown that lesions in dorsal Brodmann area 44 create vulnerability to experimentally generated semantic interference in lexical access, manifesting in semantic errors (Schnur et al., 2006, 2009). That finding adds to the evidence that inferior prefrontal cortex is important for controlled semantic retrieval and/or competitive selection during word production (Bookheimer, 2002; Kan and Thompson-Schill, 2004; Moss et al., 2005; Thompson-Schill et al., 2005; Badre and Wagner, 2007).
Prefrontal areas outside the well-studied inferior complex (Brodmann area 44/45/47) may also play an important role in semantic word production. Recent evidence to this effect comes from an electrostimulation study that mapped the anatomy of semantic naming errors in patients undergoing surgical resection for low-grade dominant hemisphere glioma (Duffau et al., 2005). Seventeen (of 150) surgical patients produced such errors during stimulation mapping, most of whom had tumours located in the dominant temporal or frontal lobes. ‘Semantic sites’ (i.e. sites that yielded semantic naming errors during the stimulation) were mapped in cortex that bordered the tumour and in the fibre tracts exposed by the resections. Within the temporal lobe, semantic sites were found in the posterior part of the cortex surrounding the superior temporal sulcus and in white matter tracts deep to the sulcus, extending anteriorly. Within the frontal lobe, semantic sites were identified in the lateral orbito-frontal region and in the part of the middle frontal gyrus anterior to the dorsal pre-motor language area previously identified by Duffau and colleagues (Duffau et al., 2003).
The prefrontal voxel cluster we identified occupied part of the middle frontal gyrus corresponding to Brodmann area 46, along with Brodmann area 45, which has a well-known association with semantic processing and competitive selection (for reviews see Bookheimer, 2002 and Devlin and Watkins, 2007). The middle frontal gyrus is considered part of the dorso-lateral prefrontal cortex, which is important for a variety of executive functions. Functional neuroimaging evidence implicates left middle frontal gyrus in verbal working memory (Smith et al., 1996) and mental search during lexical retrieval (Grabowski et al., 1998). This may explain why the association between middle frontal gyrus and semantic naming errors that we saw in the unfiltered analysis was statistically eliminated by the filtering of NVcomp. That is, it could be that damage to middle frontal gyrus generates semantic errors in naming by compromising mental search for the precise lexical concept, a process that is shared with NVcomp. This would bring our finding in line with the thesis that multi-modal semantic deficits in aphasia, unlike those in semantic dementia, have their basis in regulatory/executive processes supported by the frontal lobes (Jefferies and Lambon Ralph, 2006; Jefferies et al., 2007; Noonan et al., in press).
There is substantial evidence in both the neuroimaging and aphasia literature that cortices surrounding the posterior third of the left superior temporal sulcus, occupying posterior superior and middle temporal gyri (posterior superior temporal gyrus/middle temporal gyrus), participate in the brain system for multimodality comprehension (e.g. Hart and Gordon, 1990; Vandenberghe et al., 1996; Booth et al., 2002; Saygin et al., 2003). In agreement with this, studies by Hillis’ group and by Duffau et al. (2005) identified semantic sites in this posterior superior temporal gyrus/middle temporal gyrus area. We did not. While there were significant and near-significant voxels throughout middle temporal gyrus, the clusters were located anterior to posterior superior temporal gyrus/middle temporal gyrus (in mid to anterior middle temporal gyrus) or posterior to it (in lateral superior Brodmann area 37). The superior temporal gyrus, including Wernicke's area in the posterior third, yielded low t-values in both unfiltered and filtered analyses. It is evident from the coverage map (Fig. 2) that many patients had lesions in this area, so there was adequate power to detect effects here.
Doubts about the localization of semantic processing to posterior superior temporal gyrus (Wernicke's area) have been raised before, in the neuroimaging literature (Binder et al., 2009) and in lesion mapping studies in aphasia. For example, Dronkers et al. (2004) mapped lesions associated with auditory sentence comprehension scores in 64 chronic aphasic patients on a voxel-wise basis. Significant effects were found in five left hemisphere regions: posterior middle temporal gyrus (Brodmann area 21/37), anterior Brodmann area 22, superior temporal sulcus/Brodmann area 39, Brodmann area 46 and Brodmann area 47. Notably, posterior Brodmann area 22 did not contribute (see also Bates et al., 2003). In the words of Dronkers et al. (2004), the classical association of Wernicke's area with language comprehension may be ‘epiphenomenal’ and due rather to the involvement of distinct surrounding areas.
It is conceivable that the strong and specific associations that the Hopkins group (Hillis et al., 2001a) found between comprehension errors and the Brodmann area 22 region of interest reflected the contribution of anterior Brodmann area 22 more than Wernicke's area in the posterior sector. Alternatively, their Brodmann area 22 effect could be revealing something specific to acute aphasia. It is well-known that patients who present acutely with Wernicke's aphasia sometimes show rapid resolution of the neologistic jargon and profound comprehension deficit that define this syndrome (Kertesz and Benson, 1970; Goodglass and Kaplan, 1983; Benson and Ardila, 1996). A functional neuroimaging study of speech comprehension in patients with chronic, left temporal lobe damage found right temporal lobe activation in these patients within the normal range; moreover, this activation was correlated with auditory comprehension scores outside the magnet (Crinion and Price, 2005; and for related evidence, see also Sharp et al., 2004 and Crinion et al., 2006). The implication is that in chronic patients, variation in auditory comprehension scores can be explained at least in part by the right hemisphere's capacity to sustain performance. In acute patients, tissue damage in posterior superior temporal gyrus could temporarily impede or suppress this residual capacity (e.g. through diaschisis), thereby yielding the correlation with performance that Hillis and colleagues observed.
This study presents the first definitive evidence for a causal relation between left anterior temporal lobe lesions and semantic errors in lexical access. Our results line up nicely with evidence from semantic dementia, convergence zone theory and meta-analyses of neuroimaging studies of word production. Drawing on the interactive two-step model of lexical access in naming, we suggest that the left anterior temporal lobe is specialized for conveying fine-grained semantic distinctions to the lexical system, at the level of abstract, pre-phonological word forms (lemmas). Focal damage to left anterior temporal lobe blunts this finer grain of differentiation, increasing the competition between the target word and its semantic neighbours and raising the probability that the competitor will be erroneously selected and realized in output.
National Institutes of Health (R01 DC000191 to M.F.S., R01 MH073529 to D.Y.K.).
We are very grateful to the research participants and caregivers who made this study possible. We also wish to acknowledge the many research assistants and associates who helped with recruitment, testing and scoring, including Laura Barde, Laurel Brehm, Jacqueline Cairone, Krista Cullen, Jennifer Gallagher, Marisa Gauger, A. Cris Hamilton, Jesse Hochstadt, Rachel Jacobson, Laura MacMullen, Michelle Rapp and Paula Sobel. Finally, we wish to thank Argye Hillis, Daniel Tranel and an anonymous reviewer for their careful reading and helpful suggestions on an earlier draft.