This paper presents a new model of how lexical knowledge is represented and utilized and where it is stored in the human brain. Building on the dual pathway model of speech processing proposed by Hickok and Poeppel (2000
), its central claim is that representations of the forms of spoken words are stored in two parallel lexica. One lexicon, localized in the posterior temporal lobe and forming part of the ventral speech stream, mediates the mapping from sound to meaning. A second lexicon, localized in the inferior parietal lobe and forming part of the dorsal speech stream, mediates the mapping between sound and articulation.
Lexical knowledge is an essential component of virtually every aspect of language processing. Language learners leverage the words they know to infer the meanings of new words based on the assumption of mutual exclusivity (Merriman & Bowman, 1989
). Listeners use stored lexical knowledge to inform phonetic categorization (Ganong, 1980
) and to guide processes including lexical segmentation (Gow & Gordon, 1995
), perceptual learning (Norris et al., 2003
) and the acquisition of novel wordforms (Gaskell & Dumay, 2003
). Lexically indexed syntactic information also guides the assembly and parsing of syntactic structures (Bresnan, 2001
; Lewis et al., 2006
). By some estimates, a typical literate adult English speaker may command a vocabulary of 50,000 to 100,000 words (Miller, 1991
) in order to achieve these goals. Given this background, it is important to understand where and how words are represented in the brain.
Studies of this question date to the first scientific papers on the neural basis of language. In 1874, Carl Wernicke described a link between damage to the left posterior superior temporal gyrus (pSTG) and impaired auditory speech comprehension. He hypothesized that the root of the impairment was damage to a putative permanent store of word knowledge that he termed the Wortschatz
or “treasury of words”. In his model, this treasury consisted of sensory representations of words that interfaced with both a frontal articulatory center and a widely distributed set of conceptual representations in motor, association and sensory cortices. In this model, Wernicke was careful to distinguish between permanent “memory images” of the sounds of words, and the effects of “sensory stimulation”, a notion akin to activation associated with sensory processing or short-term buffers (Wernicke, 1874/1969
). The broad dual pathway organization of Wernicke’s model has been supported by modern research (Hickok & Poeppel, 2000
; Scott & Wise, 2004
; Scott, 2005
), but his interpretation of the left STG as the location of a permanent store of auditory representations of words is open to debate.
The strongest support for the classical interpretation of the pSTG as a permanent store of lexical representations comes from BOLD imaging studies that show that activation of the left pSTG and adjacent superior temporal sulcus (STS) is sensitive to lexical properties including word frequency and neighborhood size (Okada & Hickok, 2006; Graves et al., 2007
). Neighborhood size is a measure of the number of words that differ from the phonological form of a given word by a single phoneme. This result is balanced in part by evidence that a number of regions outside of the pSTG/STS are also sensitive to these factors (cf. Prabhakaran et al., 2006
; Goldrick & Rapp, 2006; Graves et al., 2007
) and directly modulate pSTG/STS activation during speech perception (Gow et al., 2008
; Gow & Segawa, 2009
). This raises the possibility that sensitivity to lexical properties is referred from other areas, and that the STG/STS acts as a sensory buffer where multiple information types converge to refine and perhaps normalize transient representations of wordform.
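The standard operationalization of phonological neighborhood size counts the lexical entries that differ from a target by a single phoneme substitution, insertion, or deletion. A minimal sketch in Python, using a hypothetical toy lexicon coded as tuples of phoneme symbols:

```python
def edit_distance_one(a, b):
    """True if b differs from a by exactly one phoneme substitution,
    insertion, or deletion (the one-phoneme neighbor rule)."""
    la, lb = len(a), len(b)
    if abs(la - lb) > 1 or a == b:
        return False
    if la == lb:  # same length: exactly one substitution
        return sum(x != y for x, y in zip(a, b)) == 1
    if la > lb:   # make a the shorter string for the deletion test
        a, b = b, a
    i = 0
    while i < len(a) and a[i] == b[i]:
        i += 1
    return a[i:] == b[i + 1:]  # rest matches after skipping one phoneme

def neighborhood_size(word, lexicon):
    """Count lexicon entries one phoneme away from `word`."""
    return sum(edit_distance_one(word, entry) for entry in lexicon)

# Hypothetical toy lexicon; each entry is a tuple of phoneme symbols.
lexicon = [("k", "ae", "t"), ("b", "ae", "t"), ("k", "ah", "t"),
           ("k", "ae", "p"), ("k", "ae"), ("s", "k", "ae", "t")]
print(neighborhood_size(("k", "ae", "t"), lexicon))  # → 5
```

On this scheme, short forms in dense regions of the lexicon yield high counts, while long or phonotactically unusual words tend toward sparse neighborhoods.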
This view of the STG/STS is consistent with both neuropsychological and neuroimaging evidence. In the 1970s and 1980s, aphasiologists noted that damage to the left STG does not lead to impaired word comprehension (Basso et al., 1977
; Blumstein et al., 1977a
; Miceli et al., 1980
; Damasio & Damasio, 1980
). A review of BOLD imaging studies by Hickok and Poeppel (2007)
showed consistent bilateral activity in the mostly posterior STG in speech versus resting-state contrasts, and in the adjacent STS when participants listened to speech as compared to tones or less speech-like complex auditory stimuli. They interpreted this pattern as evidence that the bilateral superior temporal cortex is involved in high-level spectrotemporal auditory analyses, including the acoustic-phonetic processing of speech. This spectrotemporal analysis could in turn be informed by top-down influences on the STG from permanent wordform representations stored in other parts of the brain, producing evolving transient representations of phonological form that are consistent with higher level linguistic constraints and representations. This hypothesis is discussed in section 7.
At the same time that aphasiologists and neurolinguists were recharacterizing the function of the STG, psycholinguists were developing a more nuanced understanding of lexical processing. A distinction emerged between spoken word recognition
, the mapping of sound onto stored phonological representations of words, and lexical access,
the activation of representations of word meaning and syntactic properties. This distinction was reinforced by studies of patients who showed a double dissociation between the ability to recognize words and the ability to understand them. Some patients had preserved lexical decision but impaired word comprehension (Franklin et al., 1994
; Hall & Riddoch, 1997
), while others showed relatively preserved word comprehension with deficient lexical decision or phonological processing (Blumstein et al., 1977b
; Caplan & Utman, 1994
). At a higher level, some patients showed more circumscribed deficits in word comprehension coupled with specific deficits in the naming of items in certain categories including colors and body parts (Damasio, McKee & Damasio, 1979
; Dennis, 1976
). This fractionation of lexical knowledge was accompanied by a widening list of brain structures associated with lexical processing. Disturbances in various aspects of spoken word recognition, comprehension and production were associated with damage to regions in the temporal, parietal and frontal lobes (cf. Damasio & Damasio, 1980
; Gainotti et al., 1986
; Coltheart, 2004
; Patterson et al., 2007).
The advent of functional neuroimaging techniques introduced invaluable new data that underscore the conceptual challenges of localizing wordform representations. Three types of studies have dominated this work: (1) word-pseudoword contrasts, (2) repetition suppression/enhancement designs, and (3) designs employing parametric manipulation of lexical properties. Many studies have contrasted activation associated with listening to words versus pseudowords (Binder et al., 2000
; Newman & Twieg, 2001
; Kotz et al., 2002
; Majerus et al., 2002; 2005
; Bellgowan et al., 2003
; Rissman et al., 2003
; Vigneau et al., 2005
; Xiao et al., 2005
; Prabhakaran et al., 2006
; Orfanidou et al., 2006; Valdois et al., 2006; Raettig & Kotz, 2008
; Sabri et al., 2008
; Gagnepain et al., 2008; Davis et al., 2009
). These studies differ by task and in the specific wordform properties of their word and pseudoword stimuli. Nevertheless, several reviews and meta-analyses have identified systematic trends in these data (Raettig & Kotz, 2008
; Davis & Gaskell, 2009
). A meta-analysis of 11 studies by Davis and Gaskell (2009)
found 68 peak voxels that show more activation for words than pseudowords at a corrected level of significance. These included left hemisphere voxels in the anterior and posterior middle and superior temporal gyri, the inferior temporal and fusiform gyri, the inferior and superior parietal lobules, supramarginal gyrus, and the inferior and middle frontal gyri. In the right hemisphere, words produced more activation than pseudowords in the middle and superior temporal gyri, supramarginal gyrus, and precentral gyrus. The same study also found significantly more activation for pseudowords than words in 29 regions, including voxels in left mid-posterior and mid-anterior superior temporal gyrus, left posterior middle temporal gyrus, portions of the left inferior frontal gyrus, and the right superior and middle temporal gyri.
While these studies would appear to bear on the localization of the lexicon, it is important to note that the lexicon is rarely invoked in this work. This subtraction is generally associated with the broader identification of brain regions supporting “lexico-semantic processing” (cf. Raettig & Kotz, 2008
) or “word recognition” (cf. Davis & Gaskell, 2009
). There are several reasons to suspect that a narrower reading of these subtractions that directly and uniquely ties them to wordform localization is unviable. Recognizable words trigger a cascade of representations and processes related to their semantic and syntactic properties that pseudowords either do not trigger, or trigger to a different extent1
. As a result, many of the regions activated in word-pseudoword subtractions may be associated with the representation of information linked to wordforms, and not just the wordforms themselves.
Behavioral and neuroimaging results provide converging evidence that suggests another limitation of the word-pseudoword subtraction as a tool for localizing wordform representations. One can imagine a system in which words activated stored representations of form, but nonwords did not. Given such a system, a word-pseudoword subtraction could be used to localize the lexicon. However, evidence from behavioral and neuroimaging studies suggests that pseudowords are represented using the same resources that are used to represent words. A number of behavioral results in tasks including lexical decision, naming, and repetition show that the processing of nonwords is influenced by the degree to which they resemble real words (cf. Gathercole et al., 1991
; Gathercole & Martin, 1996
; Vitevitch & Luce, 1998
; Frisch et al., 2000
; Luce & Large, 2001
; Saito et al., 2003
). The overlap in operations is masked by word-pseudoword subtractions, but is apparent in BOLD results that employ resting state subtractions. Binder et al. (2000)
and Xiao et al. (2005)
showed almost identical patterns of activation in word-resting state and pseudoword-resting state subtractions. The only differences they reported were a tendency for more bilateral activation for words in the ventral precentral sulcus and pars opercularis in the Binder et al. study and less activation in the parahippocampal region in the Xiao et al. study. Moreover, several studies have shown that pseudoword BOLD activation is influenced by the degree to which pseudowords resemble known words, with word-like pseudowords producing activation patterns that were more similar to those produced by familiar words than those produced by less-wordlike tokens (Majerus et al., 2005
; Raettig & Kotz, 2008
). Evidence for a shared neural substrate for the representation of words and pseudowords has implications for the nature of wordform representations (discussed in section 2). Moreover, it suggests that differential activation produced by listening to words and pseudowords relates to form properties of pseudowords that are not generally controlled for in this research.
Repetition suppression and enhancement designs offer a more targeted tool for localizing wordform representations. In word recognition tasks, repeated presentation of the same items leads to a reduction in response latency and an increase in accuracy. This type of repetition priming is mirrored at a physiological level by repetition suppression and enhancement, in which repetition of a stimulus leads to changes in localized BOLD responses (see review by Henson, 2003). Several studies using passive listening to meaningful words have demonstrated repetition suppression effects in left mid-anterior STS (Cohen et al., 2004
; Dehaene-Lambertz et al., 2006
). This finding was replicated by Buchsbaum and D’Esposito (2009), who used an explicit “new/old” recognition judgment. They also found repetition enhancement or reactivation at the boundary of bilateral pSTG, anterior insula and inferior parietal cortex including the SMG.
The fact that words were used in these studies does not necessarily indicate that repetition effects reflect lexical activation. Activation changes could reflect representation or processing at any level (e.g. auditory, acoustic-phonetic, phonemic, lexical). In order to directly tie these effects to lexical representation it is necessary to control for the contribution of non-lexical repetition. Orfanidou et al. (2006)
addressed this issue by using different speakers for the first and second presentations of words to minimize the influence of auditory representation, and by contrasting repetition effects associated with phonotactically matched word and pseudoword stimuli to target specifically lexical properties. They found no evidence of an interaction between lexicality and repetition in any voxel in whole brain comparisons. This result is again consistent with the notion that word and pseudoword representation share a common neural substrate. Analyses collapsing across lexicality showed significant repetition suppression in the supplementary motor area (SMA) and bilateral inferior frontal and posterior inferior temporal regions, as well as repetition enhancement in bilateral parietal, orbitofrontal and dorsal frontal regions, in the right posterior inferior temporal gyrus, and in a region including the right precuneus and adjacent parietal lobe. The lack of anterior STS suppression in these results may reflect the diminished role of auditory effects due to the speaker manipulation. However, the lack of an orthogonal manipulation of phoneme, syllable or diphone repetition makes it unclear whether these effects are directly attributable to lexical representation.
The other primary BOLD imaging strategy for localizing lexical representation involves contrasts that rely on parametric manipulation of specifically lexical properties including word frequency, phonological neighborhood size and lexical competitor environment. This strategy (which is discussed again in section 3) is less widely used than word-pseudoword contrasts or repetition suppression/enhancement techniques, but has been explored by several groups. In an auditory lexical decision task, Prabhakaran et al. (2006)
found differential activation based on word frequency in left pMTG extending into STG and left aMTG. In contrast, Graves et al. (2007)
found frequency sensitivity in left hemisphere SMG, pSTG, and posterior occipitotemporal cortex and bilateral inferior frontal gyrus in a picture naming task. These results differ, but do show some overlapping STG activation and adjacent activations in the left posterior temporal lobe associated with word frequency. Differences in frequency sensitivity in the two studies in other areas may be related to differences in the task demands imposed by lexical decision versus overt naming.
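The parametric strategy can be illustrated with a toy general linear model in which a mean-centered word-frequency modulator is fit alongside a main word-onset regressor. All onsets, frequencies, and the simplified HRF kernel below are hypothetical, not values from the cited studies:

```python
import numpy as np

# Hypothetical trial onsets (in scan units) and per-trial log word frequencies.
onsets = np.array([2, 10, 18, 26])
log_freq = np.array([1.2, 3.5, 2.0, 4.1])
n_scans = 40

main_reg = np.zeros(n_scans)   # models a response to every word
param_reg = np.zeros(n_scans)  # models modulation by word frequency
main_reg[onsets] = 1.0
param_reg[onsets] = log_freq - log_freq.mean()  # mean-center the modulator

# A crude HRF-like kernel (illustrative shape, not a fitted canonical HRF).
hrf = np.array([0.0, 0.3, 0.8, 1.0, 0.7, 0.3, 0.1])
X = np.column_stack([np.convolve(main_reg, hrf)[:n_scans],
                     np.convolve(param_reg, hrf)[:n_scans],
                     np.ones(n_scans)])  # design matrix with intercept

# Simulate a voxel whose response scales with frequency, then recover the
# effect with ordinary least squares; beta[1] indexes frequency sensitivity.
rng = np.random.default_rng(0)
bold = X @ np.array([2.0, 1.5, 0.1]) + 0.05 * rng.standard_normal(n_scans)
beta, *_ = np.linalg.lstsq(X, bold, rcond=None)
```

Mean-centering the modulator keeps the frequency regressor separable from the overall response to words, so the second beta reflects frequency sensitivity specifically rather than mere responsiveness to speech.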
Manipulations of neighborhood size have also produced different patterns of activation in different studies. Okada and Hickok (2006) found sensitivity to neighborhood size limited to bilateral pSTS in a passive listening task, while Prabhakaran et al. (2006)
found neighborhood effects in the left SMG, caudate and parahippocampal region in their auditory lexical decision task. In this case, the differences may be related to the differing attentional demands of passive listening versus lexical decision. In a study employing a selective attention manipulation during bimodal language processing, Sabri et al. (2008)
found that while superior temporal regions were activated in all speech conditions, differential activation associated with lexical manipulations (word-pseudoword subtraction) was only found when subjects attended to speech. This suggests that tasks such as passive listening that require only shallow processing may fail to produce robust activation outside of superior temporal cortex.
To summarize, the complex and often contradictory results seen in the BOLD imaging literature do not provide a simple resolution to the localization problem, but they do delineate a number of issues that any satisfying resolution must address. Claims about the localization of the lexicon must be framed in relation to a general understanding of the nature of lexical representation that specifically addresses the relationship between the representation of words, pseudowords and sublexical representations, and the causes of task effects.
Recent behavioral results and advances in the characterization of neural processing streams associated with spoken language processing suggest that some task effects may be attributable to a fundamental distinction between semantic and articulatory phonological processes. In one line of experimentation, researchers have found that listeners show different patterns of behavioral effects when presented with the same set of spoken word stimuli in similar tasks that tap phonological versus semantic aspects of word knowledge. Gaskell and Marslen-Wilson (2002)
showed that gated primes (e.g. captain
presented as /kæpt/ or /kæptI/) produce significant phonological priming for complete words (CAPTAIN), but no priming and no effect of degree of overlap for strong semantic associates (e.g. COMMANDER). Norris et al. (2006)
found several similar differences between phonological and semantic cross-modal priming. They found both associative (date
– TIME) and identity (date
– DATE) priming when spoken primes were presented in isolation, but only identity priming when they were presented in sentences. In instances in which a short wordform is embedded in a longer wordform (e.g. date in sedate), no associative priming was found for embedded words (sedate
-TIME), but negative form priming (sedate
-DATE) was found in sentential contexts. Together, these results demonstrate the dissociability of semantic and phonological modes of lexical processing in the perception of spoken words.
Gaskell and Marslen-Wilson (1997) explored the idea that semantic and phonological aspects of spoken word processing may be independent of each other in their distributed cohort model. Unlike earlier models (cf. McClelland & Elman, 1986
) that assumed that lexical access is the result of an ordered mapping from acoustic-phonetic representation to phonological and then semantic representation, their model employed direct simultaneous parallel mapping processes between low-level sensory representations and distributed semantic and phonological representations.2
In their work, the decision to represent lexical semantics and phonology as separate outputs was motivated in part by computational considerations. A parallel architecture offers potentially faster access to semantic representations. This general organization also allows for the development of intermediate representations that are optimally suited to mapping between a common input representation and different output representations.
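The architectural idea, a single input representation feeding two parallel output mappings through a shared intermediate layer, can be sketched as follows. Layer sizes and weights here are purely illustrative and are not taken from Gaskell and Marslen-Wilson's published simulations:

```python
import numpy as np

rng = np.random.default_rng(1)

# Illustrative layer sizes (hypothetical, not from the original model).
n_input, n_hidden, n_sem, n_phon = 11, 8, 6, 5

# One shared hidden layer feeds two parallel output mappings, echoing the
# distributed cohort model's common intermediate representation with
# separate semantic and phonological outputs.
W_in = 0.1 * rng.standard_normal((n_hidden, n_input))
W_sem = 0.1 * rng.standard_normal((n_sem, n_hidden))
W_phon = 0.1 * rng.standard_normal((n_phon, n_hidden))

def forward(acoustic_phonetic):
    hidden = np.tanh(W_in @ acoustic_phonetic)  # shared intermediate code
    semantics = np.tanh(W_sem @ hidden)         # sound-to-meaning mapping
    phonology = np.tanh(W_phon @ hidden)        # sound-to-form mapping
    return semantics, phonology

# Both outputs are computed simultaneously from one acoustic-phonetic input.
sem, phon = forward(rng.standard_normal(n_input))
```

Because both output heads read the same hidden code, jointly training such a network pressures the intermediate layer toward representations that serve both the sound-to-meaning and sound-to-form mappings, which is the design consideration noted above.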
The parallel mapping between low-level phonetic representations of speech and semantic versus phonological representation proposed by Gaskell and Marslen-Wilson is similar to the form of modern dual-pathway models of spoken language processing that draw on the pathology, functional imaging and psychological literatures and postulate separate routes from auditory processing to semantics and speech production (Hickok & Poeppel, 2000
; Wise, 2003
; Scott & Wise, 2004
; Scott, 2005
; Warren et al., 2005
; Rauschecker & Scott, 2009
). In these models auditory input representations are initially processed in primary auditory cortex, with higher-level auditory and acoustic-phonetic processing taking place in adjacent superior temporal structures. As in Gaskell and Marslen-Wilson’s model, subsequent mappings are carried out in simultaneous parallel processing streams. In the neural models these include a dorsal pathway that provides a mapping between sound and articulation, and a ventral pathway that maps from sound to meaning.
In the model developed by Scott and colleagues (Scott & Wise, 2004
; Scott, 2005
; Rauschecker & Scott, 2009
), the left ventral pathway links primary auditory cortex to the lateral STG and then the anterior STS (aSTS). No ventral lexicon is proposed in these models. In the Hickok and Poeppel model (2000
), the mapping between sound and meaning is mediated by a lexical interface located in the posterior middle temporal gyrus (pMTG) and adjacent cortices. This interface is the most explicit description of a lexicon in any of the dual stream models.
Parallels between the distributed model’s phonological output and the articulatory dorsal processing stream in dual stream models are less clear. One critical question is whether articulatory and phonological representations are the same thing. While phonological representation is historically rooted in articulatory description (Chomsky & Halle, 1968
), current theories of featural representation include both explicitly articulatory (cf. Browman & Goldstein, 1992
) and purely abstract systems (cf. Hale & Reiss, 2008
). The lexical representations used in Gaskell and Marslen-Wilson’s model do not make a clear commitment to articulatory or non-articulatory representation.
In summary, despite widespread evidence that words play a central role in language processing, over a century of research has produced no clear consensus on where or how words are represented in the brain. This may be attributed to a number of factors including the methodological challenges inherent in discriminating between lexical activation, processes that follow on lexical activation, and the application of lexical processes to pseudoword stimuli. During the same period, evidence from dissociations in unimpaired and aphasic behavioral processing measures has pointed towards a potential dissociation between semantic and phonological or articulatory aspects of lexical processing that roughly parallels distinctions made in recent dual stream models of spoken language processing in the human brain. In the sections that follow, I will develop a framework for understanding the organization and function of lexical representations and review evidence from a variety of disciplines that suggests the existence of parallel lexica in the ventral and dorsal language processing streams.