Words, grammar, and phonology are linguistically distinct, yet their neural substrates are difficult to distinguish in macroscopic brain regions. We investigated whether they can be separated in time and space at the circuit level using intra-cranial electrophysiology (ICE), namely by recording local field potentials (LFP) from populations of neurons using depth electrodes implanted in language-related brain regions while people read words verbatim or grammatically inflected them (present/past, singular/plural). Neighboring probes within Broca’s area revealed distinct neuronal activity for lexical (~200 ms), grammatical (~320 ms), and phonological (~450 ms) processing, identically for nouns and verbs, in a region activated in the same patients and task in functional magnetic resonance imaging (fMRI). This suggests that a linguistic processing sequence predicted on computational grounds is implemented in the brain in fine-grained spatiotemporally patterned activity.
Within cognitive neuroscience, language is understood far less well than sensation, memory, or motor control, because language has no animal homologues, and methods appropriate to humans (functional magnetic resonance imaging (fMRI), studies of brain-damaged patients, and scalp-recorded potentials) are far coarser in space or time than the underlying causal events in neural circuitry. Moreover, language involves several kinds of abstract information (lexical, grammatical, phonological) that are difficult to manipulate independently. This has left a gap in understanding between the computational structure of language suggested by linguistics and the neural circuitry that implements language processing. We narrow this gap using a technique with high spatial, temporal, and physiological resolution, and a task that distinguishes three components of linguistic computation.
According to linguistic analyses, the ability to identify words, combine them grammatically, and articulate their sounds involves several kinds of representations, with logical dependencies among them (1, 2). For example, to pronounce a verb in a sentence, one must determine the appropriate tense given the intended meaning and syntactic context (e.g., “walk”, “walks”, “walked”, “walking”). One must identify the particular verb, which specifies whether to use a regular (e.g., “walked”) or irregular (e.g., “went”) form. In addition, one must unpack the phonological content of the verb and suffix to implement three additional computations: phonological adjustments in the sequence of phonemes (e.g., inserting a vowel between verb and suffix in “patted,” but not in “walked”), phonetic adjustments in the pronunciation of the phonemes (such as the difference between the “d” in “walked” and “jogged”), and conversion of the phoneme sequence into articulatory motor commands.
This logical decomposition does not entail that each kind of representation corresponds to a distinct stage or circuit in the brain. In many neural-network models, the selection of tense, discrimination of regular from irregular inflection, and formulation of the phonetic output are computed in parallel and in one time-step within a single distributed network (3, 4). Others contain loops and feedback connections, propagate probabilistic constraints, and iteratively settle into a globally stable state, with no fixed sequence of operations (5). Even stage models may incorporate cascades where partial information from one stage begins to feed the next before its computation is complete (6). Nonetheless the most comprehensive model of speech production, developed by Levelt, Roelofs, & Meyer (LRM), maximizes parsimony and falsifiability by implementing linguistic operations as discrete ordered stages, eschewing feedback, loops, parallelism, or cascades (7). They posit stages for lexical retrieval (which they associate with the left middle temporal gyrus at 150–225 ms after stimulus presentation), grammatical encoding (locus and duration unknown), phonological retrieval (posterior temporal lobe, 200–400 ms), phonological and phonetic processing (Broca’s area, 400–600 ms), self-monitoring (superior temporal lobe, beginning at 275–400 ms but highly variable in duration), and articulation (motor cortex) (8, 9).
Current evidence, however, leaves considerable uncertainty about the localization and timing of these components, especially grammatical processing. Although clinical studies report double dissociations in which a patient is more impaired in grammar than phonology or vice versa (10), in most studies both abilities are linked to similar regions in the left inferior prefrontal cortex, particularly Broca’s area (11). Though Broca’s area itself has been identified as the seat of phonology, grammar, and even specific grammatical operations (12, 13, 14), lesion and neuroimaging studies have tied it to a broad variety of linguistic and nonlinguistic processes (15). This uncertainty may be a consequence of the coarseness of current measurements. It remains possible that grammatical and other linguistic operations are carried out distinctly, even sequentially, in the microcircuitry of the brain, but techniques that sum over seconds and centimeters necessarily blur them.
In a rare procedure, electrodes are implanted in the brains of patients with epilepsy for clinical evaluation. Recordings of intra-cranial electrophysiology (ICE) from unaffected brain tissue during periods of normal activity can provide millisecond resolution in time with millimeter resolution in space. We recorded local field potentials (LFP) from multi-contact depth electrodes in three right-handed patients (age 38–51; above-average language and cognitive skills) whose electrodes were located in and around Broca’s area while they read words verbatim or converted them to an inflected form (past/present, singular/plural) (Figs. 1 and 2) (16). The task engages inflectional morphology, which is like syntax in combining meaningful elements according to grammatical rules, but the units are shorter and semantically simpler, making fewer demands on working memory and conceptual integration, thus allowing greater experimental control. We applied the high resolution of ICE to a task that distinguishes three linguistic processes to investigate the spatiotemporal patterning of word production in the brain.
In each trial, participants saw either the instruction “Repeat word” (the “Read” condition), or a cue that dictated an inflected form (“Every day they ____”; “Yesterday they ____”; “That is a ____”; “Those are the ____”). Next they saw a target word and produced the appropriate form silently (Fig. 1A) (16). The 240 target words were presented in uninflected form in the phrase “a [noun]” or “to [verb]” (17) (Fig. 1B). Half the targets were regular (e.g. “link”/“linked”) and half irregular (e.g. “think”/“thought”), to ensure that participants had to access the word rather than automatically appending the regular suffix (18).
The Null-Inflect (N) condition requires an inflected form of the verb (present tense) or noun (singular), yet these forms are not overtly marked and thus require the same output to be pronounced as in the Read (R) condition. The difference between these conditions thus implicates the process of inflection. In contrast, the Overt-Inflect (O) condition (past-tense verb or plural noun) requires that a suffix be added (regular) or the form changed (irregular). It thus differs from the Null-Inflect condition in requiring computation of a different phonological output (Fig. 1B; the label ‘phonological’ subsumes phonological, phonetic, and articulatory processes). The design was fully crossed, with trials presented in pseudorandom order.
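The subtraction logic of the three conditions can be sketched in a few lines of code. This is only an illustration of the design as described above; the class and field names are hypothetical labels chosen for this sketch, not terms or software used in the study.

```python
from dataclasses import dataclass

# Hypothetical encoding of which processes each condition is assumed to engage,
# following the description of the Read, Null-Inflect, and Overt-Inflect conditions.
@dataclass(frozen=True)
class Condition:
    name: str
    needs_lexical: bool        # the word must be identified
    needs_inflection: bool     # a grammatical form must be selected
    needs_new_phonology: bool  # the output differs from the stimulus word

READ = Condition("Read", True, False, False)
NULL = Condition("Null-Inflect", True, True, False)
OVERT = Condition("Overt-Inflect", True, True, True)

FIELDS = ("needs_lexical", "needs_inflection", "needs_new_phonology")

def differing_processes(a: Condition, b: Condition) -> list:
    """Return the processes that distinguish condition a from condition b."""
    return [f for f in FIELDS if getattr(a, f) != getattr(b, f)]

print(differing_processes(NULL, READ))   # ['needs_inflection']
print(differing_processes(OVERT, NULL))  # ['needs_new_phonology']
print(differing_processes(OVERT, READ))  # ['needs_inflection', 'needs_new_phonology']
```

Under these assumptions, the Null-Inflect versus Read contrast isolates inflection, and the Overt-Inflect versus Null-Inflect contrast isolates the computation of a new phonological output; lexical identification is common to all three conditions.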
To assess whether these patients’ language systems were organized normally, and to correlate LFP with fMRI, we performed fMRI in two of the patients before their electrodes were placed. Their activation patterns were indeed similar to those of 18 healthy controls (Fig. 2A–C) (for other fMRI results see 19). Most of the 168 bipolar channels from which we recorded (across patients) were in fMRI-active regions (Fig. 2A–G). LFP that was significantly correlated with the task (p<.001, corrected; see 16) was recorded in about half (86/168) of the channels (19 channels in Patient A, 37 in B, and 30 in C). Of these channels, 49 (57%) were within Broca’s area or the anterior temporal lobes (16 in A, 19 in B, 14 in C). Of the 49 channels, 26 were within Broca’s area, and the majority (20/26) yielded a strong triphasic (3-component) LFP waveform (9 in Patient A, 8 in B, 3 in C). The mean peaks occurred ~200, ~320, and ~450 ms after the target word onset (Fig. 2A), and this timing was consistent across patients (Fig. 4A and B; fig. S1, fig. S4, fig. S5).
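The channel counts in this paragraph can be cross-checked with simple arithmetic. The sketch below uses only the per-patient numbers quoted above; the variable names are illustrative.

```python
# Per-patient channel counts as quoted in the text (Patients A, B, C).
total_channels = 168
task_correlated = {"A": 19, "B": 37, "C": 30}
broca_or_anterior_temporal = {"A": 16, "B": 19, "C": 14}
broca_channels = 26
triphasic_in_broca = {"A": 9, "B": 8, "C": 3}

n_task = sum(task_correlated.values())
n_region = sum(broca_or_anterior_temporal.values())
n_triphasic = sum(triphasic_in_broca.values())

print(n_task, round(100 * n_task / total_channels))            # 86 51
print(n_region, round(100 * n_region / n_task))                # 49 57
print(n_triphasic, round(100 * n_triphasic / broca_channels))  # 20 77
```

The per-patient figures sum to the totals reported (86, 49, and 20), and 49/86 rounds to the stated 57%.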
The three LFP components showed signatures of distinct linguistic processing stages (Fig. 2A–C). The ~200 ms component appears to reflect lexical identification. The timing converges with when word-specific activity has previously been recorded in the visual word form area (VWFA) (20, 21, but see 22), and when the VWFA has been shown to become phase-locked with Broca’s area (23). Furthermore, the magnitude of the component varied with word frequency, which indexes lexical access (24). Specifically, rare words (frequency 1–4) yielded a significantly higher amplitude (t(204)=3.32, p < .001) than common words (frequency 9–12) (Fig. 2A bottom; 25). Word frequency is inversely correlated with word length, but the present effect is not a consequence of length: we found no difference at ~200 ms between short (2–4 character) and long (6–11 character) words (Fig. 2A), nor a difference between one-morpheme and two-morpheme responses (26). Later components were not affected by frequency. Finally, consistent with the fact that lexical identification is required by all three inflectional conditions, the ~200 ms component did not vary across them. Primary lexical access is generally associated with temporal cortex rather than Broca’s area (8), so this component may index delivery of word identity information into Broca’s area for subsequent processing, consistent with anatomic and physiological evidence that the two areas are integrated (23, 27). Although word-evoked activity in this latency range has previously been localized to Broca’s area with LFP (28) and MEG (29), it has not been demonstrated to be modulated by lexical frequency.
The subsequent two LFP components showed activity patterns predicted for grammatical and phonological processing, respectively (Fig. 2B and C). In the ~320 ms component (Fig. 2B) the Overt-Inflect and Null-Inflect conditions significantly differed from the Read condition, but not from each other. Thus, the ~320 ms component is modulated by the demands of inflection (required by Overt-Inflect and Null-Inflect but not Read), but not by the demands of phonological programming (required in Overt-Inflect but not in Null-Inflect or Read; recall Fig. 1C). In contrast, in a component appearing at ~450 ms, Overt-Inflect did differ from the Null-Inflect and Read conditions, which did not differ from each other (Fig. 2C). This contrasting pattern indicates that the ~450 ms component reflects phonological, phonetic, and articulatory programming, independently confirmed by its sensitivity to the number of syllables (Fig. 4C). Both components were recorded from Broca’s area in all patients (fig. S1), and specifically in Patient A (Fig. 1) from the inferior frontal gyrus pars triangularis deep in the inferior frontal sulcus. The ~320 ms component was recorded near the fundus; the ~450 ms component 5 mm more lateral along the sulcus within a sub-gyral fold that faced the fundus (Fig. 3I, fig. S1a). This region is often considered part of area 45 (but see 30).
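The inference from contrast patterns to processing stages can be made explicit as a small decision rule. This is a simplified sketch of the reasoning in the text, not the statistical procedure used in the study; the function name and the Boolean inputs (whether each pairwise contrast is significant) are assumptions made for illustration.

```python
def classify_component(overt_vs_read: bool, null_vs_read: bool, overt_vs_null: bool) -> str:
    """Map a pattern of significant pairwise contrasts onto the stage it implies.

    Each argument is True if the two named conditions differ significantly
    at the latency of the component in question.
    """
    if not (overt_vs_read or null_vs_read or overt_vs_null):
        # Required equally by all conditions (the pattern at ~200 ms).
        return "common to all conditions"
    if overt_vs_read and null_vs_read and not overt_vs_null:
        # Both inflecting conditions differ from Read but not from each other.
        return "grammatical"
    if overt_vs_read and overt_vs_null and not null_vs_read:
        # Only the condition requiring a changed output stands apart.
        return "phonological"
    return "unclassified"

print(classify_component(False, False, False))  # common to all conditions
print(classify_component(True, True, False))    # grammatical
print(classify_component(True, False, True))    # phonological
```

The second and third calls correspond to the patterns reported for the ~320 ms and ~450 ms components, respectively.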
The pattern of sign inversions across neighboring bipolar channels in space (Fig. 2A top) indicates that the generators of the LFP components were local (fig. S3), and the differences in inversions across components in time indicate that their generators were not identical (Fig. 3I and J). Thus the overall LFP pattern suggests a fine-grained spatiotemporal progression of lexical, grammatical, and phonological processing within Broca’s area during word production.
The triphasic pattern in all patients was found exclusively in Broca’s area (Fig. 4A and B). Outside Broca’s area other patterns prevailed: for example, temporal lobe sites showed a slow and late monophasic component at 500–600 ms (Fig. 4A bottom; fig. S4f and g) (31), possibly reflecting self-monitoring (7, 8). The condition differences for each component were also consistent across patients, replicating the temporal isolation of grammatical (~320 ms) from phonological (~450 ms) processing (fig. S1). The word-frequency effect on the ~200 ms component was significant in Patients A and B and marginal (p=0.06) in Patient C (fig. S2). The ~200, ~320, and ~450 ms components were consistent in their timing across patients, though the keypress reaction times, which require the self-monitoring process, varied among patients and conditions (fig. S6).
Although nouns and verbs differ linguistically and neurobiologically (32, 33), the neuronal activity they evoked was similar (Fig. 4B). Furthermore, the patterning across inflectional conditions was the same for nouns and verbs (34). These parallels suggest that words from different lexical classes feed a common process for inflection.
Further evidence that the LFP patterns reflect inflectional computation is that they are triggered by presentation of the target word, not the cue, even though the cues contain more visual and linguistic elements (Fig. 4D) (35). Furthermore, activity evoked by the cue showed little sensitivity to the inflectional conditions.
The LFP patterns are consistent with the computational nature of the task, and with independent estimates of the timing of its subprocesses. Inflectional processing cannot occur before the word is identified (especially as to whether it is regular or irregular), and phonological, phonetic, and articulatory processing cannot be computed before the phonemes of the inflected form have been determined. Word identification has been shown to occur at 170–250 ms (8, 29, 36), consistent with the ~200 ms component, and syllabification and other phonological processes at 400–600 ms, consistent with the phonological component at 400–500 ms (8). In naming tasks, speech onset occurs at around 600 ms (8), which is consistent with the self-monitoring behavioral responses we recorded (fig. S6). Self-monitoring has been localized to the temporal lobe (8), where we recorded LFPs in the post-response latency range that may correspond to previously described scalp ERPs (37). Working backwards from 600 ms, we note that motor neuron commands occur 50–100 ms prior to speech, placing them just after the phonological component we found to peak at 400–500 ms (38). In sum, the location, behavioral correlates, and timing of the components of neuronal activity in Broca’s area suggest that they embody, respectively, lexical identification (~200 ms), grammatical inflection (~320 ms), and phonological processing (~450 ms), in the production of nouns and verbs alike.
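The timing argument above amounts to a short piece of arithmetic, sketched here with the latencies quoted in the text. The variable names are illustrative, and the values are the approximate figures given above rather than new measurements.

```python
# Approximate latencies (ms after target onset) quoted in the text.
component_peaks_ms = {"lexical": 200, "grammatical": 320, "phonological": 450}
phonological_window_ms = (400, 500)
speech_onset_ms = 600
motor_lead_ms = (50, 100)  # motor commands precede speech by 50-100 ms

# Working backwards from speech onset gives the window for motor commands.
motor_window_ms = (speech_onset_ms - motor_lead_ms[1],
                   speech_onset_ms - motor_lead_ms[0])
print(motor_window_ms)  # (500, 550)

# Motor commands begin no earlier than the end of the phonological window.
motor_follows_phonology = motor_window_ms[0] >= phonological_window_ms[1]
print(motor_follows_phonology)  # True

# The three component peaks occur in the order the task logic requires.
ordered = (component_peaks_ms["lexical"]
           < component_peaks_ms["grammatical"]
           < component_peaks_ms["phonological"])
print(ordered)  # True
```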
Although the language processing stream as a whole surely exhibits parallelism, feedback, and interactivity, the current results support parsimony-based models such as LRM (7) in which one portion of this stream consists of spatiotemporally distinct processes corresponding to levels of linguistic computation. Among the processes identified by these higher-resolution data is grammatical computation, which has been elusive in previous, coarser-grained investigations. As such the results are also consistent with recent proposals that Broca’s area is not dedicated to a single kind of linguistic representation but is differentiated into adjacent but distinct circuits that process phonological, grammatical, and lexical information (37, 39, 40, 41).
Supported by NIH grants NS18741 (E.H.), NS44623 (E.H.), HD18381 (S.P.), T32-MH070328 (N.T.S.), NCRR P41-RR14075; and the Mental Illness and Neuroscience Discovery (MIND) Institute (N.T.S.), Sackler Scholars Programme in Psychobiology (N.T.S.) and Harvard Mind/Brain/Behavior Initiative (N.T.S.). We heartily thank the patients. We also thank Efstathios Papavassiliou and Julian Wu for access to their patients; Suresh Narayanan, Nima Dehghani, Matthew T. Wheeler, Frank Kampmann and Larry Gruber for assistance with intracranial electrophysiological data; Rajeev Raizada for manuscript suggestions; Nicole M. Sahin; and two anonymous reviewers whose suggestions and encouragement greatly improved this paper.