In this work, we apply Granger causality analysis to high spatiotemporal resolution intracranial EEG (iEEG) data to examine how different components of the left perisylvian language network interact during spoken language perception. The specific focus is on the characterization of serial versus parallel processing dependencies in the dominant hemisphere dorsal and ventral speech processing streams. Analysis of iEEG data from a large, 64-electrode grid implanted over the left perisylvian region in a single right-handed patient showed a consistent pattern of direct posterior superior temporal gyrus influence over sites distributed over the entire ventral pathway for words, nonwords, and phonetically ambiguous items that could be interpreted either as words or nonwords. For the phonetically ambiguous items, this pattern was overlaid by additional dependencies involving the inferior frontal gyrus, which influenced activation measured at electrodes located in both ventral and dorsal stream speech structures. Implications of these results for understanding the functional architecture of spoken language processing and interpreting the role of the posterior superior temporal gyrus in speech perception are discussed.
BOLD imaging results point to the existence of a large, functionally decomposable perisylvian processing network that supports speech perception in human listeners (see reviews by Demonet et al., 1992; Binder, 2000; Scott & Wise, 2004; Hickok & Poeppel, 2001; 2004; 2007). Electrophysiological results provide insight into the temporal organization of this process, showing evidence of a sequence of discrete temporal epochs reflecting early perceptually bound sensitivities followed by later sensitivities to properties of the word (cf. Näätänen, 2001; Benton et al., 1985; Pylkkänen et al., 2003). In this paper we apply Granger causation analysis to high spatiotemporal resolution intracranial electroencephalography (iEEG) data to determine how activity in this network is coordinated during speech perception.
The broad pattern of activation within this system is well-known. High spatiotemporal resolution imaging techniques, including anatomically constrained MEG/EEG imaging and intracranial EEG recordings, show a systematic pattern of activation involving early bilateral activation of the posterior superior temporal gyrus (pSTG) peaking within 150 ms of stimulus onset for speech stimuli, followed by spreading activation showing stimulus- and task-dependent involvement of largely perisylvian portions of all four lobes, often peaking at roughly 400 ms for words in nonsentential contexts (Crone et al., 2001; Marinkovic et al., 2003; Canolty et al., 2007; Gow et al., 2008). Early pSTG activation is consistent in timing and localization with evoked components including the mismatch negativity and N1 that show sensitivity to early perceptual properties of both nonsense syllables and spoken words (see reviews by Näätänen, 2001; Hillyard & Kutas, 1983). Similarly, activation of the larger perisylvian network is cotemporaneous with components including the N400 and M350 that are influenced by properties of phonological wordform, lexical semantics, and sentential predictability (Benton et al., 1985; Pylkkänen et al., 2002).
This sequence is broadly consistent with psycholinguistic models that describe spoken language processing in terms of a sequence of increasingly abstract representations, typically ranging from representations of phonetic feature cues to representations of wordform and ultimately meaning (cf. Marslen-Wilson, 1987; McClelland & Elman, 1986). However, several lines of work suggest that parallel processes play a significant role in speech processing. Dual pathway models of general auditory processing are well established on the basis of a combination of primate studies, human functional imaging, and pathology (cf. Romanski et al., 1999; Arnott et al., 2004; Clarke & Thiran, 2004). The focus of the current work is on multiple pathway models of spoken language perception. Dual pathway models concerned specifically with spoken language processing date back to Wernicke’s (1874) model, which was based on evidence from aphasic patients. More recently, Scott and Wise (2004) and Hickok and Poeppel (2004; 2007) have proposed alternative dual-stream models of speech perception based on evidence from human pathology and neuroimaging. Like the general auditory models, both groups hypothesize the existence of ventral and dorsal processing streams. The speech models differ from the general auditory models primarily in the role of the dorsal pathway, which is associated with sound localization in general auditory models, and with speech articulation and verbal working memory in the speech models. Both Scott and Wise and Hickok and Poeppel suggest that there are two streams originating in the superior temporal gyrus (STG): a ventral stream concerned with the mapping from sound to meaning, and a dorsal stream concerned with the mapping between sound and articulation. The two dual-route models make differing claims about the organization of processing within each pathway.
Scott and Wise (2004) propose a serial constructivist processing framework, with successive regions within the “what” pathway showing sensitivity to increasingly complex levels of auditory and phonetic processing, and successive regions in the “how” pathway first providing an interface between auditory representation and articulatory representation and ultimately projecting to premotor cortex. This general organization is similar to that found in early visual pathways leading to striate cortex (Hubel, 1995).
In contrast, Poeppel and Hickok’s model (2004; 2007) suggests that within each stream all processing dependencies are bidirectional, and parallel pathways exist within each processing stream. In the dorsal pathway, all structures appear to have parallel links with the pSTG. In the ventral pathway there are parallel left and right hemisphere pathways, but processing within each hemisphere is roughly serial, with a single chain of connections linking the dorsal superior temporal gyrus to the mid-posterior superior temporal sulcus and then to the posterior middle temporal gyrus and posterior inferior temporal sulcus, which project to the anterior middle temporal gyrus and inferior temporal sulcus. While this locally serial organization of the ventral pathways within each hemisphere is consistent with Scott and Wise’s serial model (2004), the global organization of both the ventral and dorsal pathways of the Poeppel and Hickok model is parallel. This account is generally consistent with current descriptions of higher (post-striate) cortical visual processing (van Essen, 2004). Both models are consistent with evidence from diffusion tensor imaging and intraoperative subcortical stimulation that supports the existence of parallel white matter tracts extending from posterior superior temporal cortex along ventral (inferior occipito-frontal fasciculus) and dorsal (superior longitudinal fasciculus) pathways with broad perisylvian distribution (Duffau et al., 2008).
Notably, direct neural evidence for serial or parallel processing within these streams has not yet been reported. Such data are scarce, in large part because of the difficulty of acquiring high spatiotemporal resolution measures of neural activity over the entire perisylvian network, which are needed to determine the fine temporal structure and causal organization of localized perisylvian activation during speech processing.
Intracranial electrophysiological recording (iEEG) offers one potential solution to this problem. It provides better temporal resolution than can be obtained with BOLD imaging, and better spatial resolution than scalp EEG and MEG. Unlike iEEG, these noninvasive measures are also subject to eye-movement and muscle artifacts, as well as to distortion caused by the physical distance between sources and detectors and by impedance from intervening tissue (Arroyo & Lesser, 1999). In iEEG, electrode position can be determined with great precision and accuracy through intraoperative photography and co-registration of CT scans showing electrode grid placement with presurgical MRIs. Moreover, analyses relating the decay of coherence between activity detected at paired electrodes to the distance between them suggest that iEEG activity at a single electrode reflects local field potentials generated within a small (~1 cm) radius (Menon et al., 1996; Lachaux et al., 2003).
In this paper we report the timing and distribution of left perisylvian electrophysiological activity in a single patient during a speech processing task. The spatiotemporal resolution offered by iEEG recordings provides a precise measure of the sequence of activation that may be used to distinguish between serial and parallel patterns of activation. We further explore the processing dependencies that produced these patterns using Granger causality analyses of sensor space data. Granger causation analysis is a statistical tool that is widely used in many disciplines to identify and quantify the strength of directional causal relations between multiple time-varying signals (Granger, 1969; Seth, 2005). In this case, we apply it to electrophysiological activity timeseries collected simultaneously at individual intracranial electrodes. The parallel-serial distinction may sometimes be a matter of scope: systems may show globally parallel structure with locally serial organization within a processing stream. Given this caveat, there are clear predictions about how serial or parallel processing can be identified within a Granger analysis. If processing were purely serial, Granger causation analyses should reveal a single chain of cascading causal interactions emerging from a single origin. To the extent that processing is parallel, there should be evidence of bifurcating pathways, with individual brain regions directly influencing activation in multiple regions.
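The serial and parallel predictions above can be expressed as simple checks on a directed graph whose edges are significant Granger links. The sketch below is illustrative only and is not the authors' analysis code; the function names and the criteria used (a node that directly drives two or more targets counts as a bifurcating hub, and a chain A → B → C counts as a cascade) are assumptions chosen to mirror the wording of the text.

```python
from collections import defaultdict


def summarize_causal_graph(edges):
    """Summarize a directed graph of significant Granger links (hypothetical helper).

    edges: iterable of (source, target) pairs, e.g. electrode numbers.
    Returns a dict with
      hubs     - nodes that directly drive two or more targets (bifurcation)
      converge - nodes that receive input from two or more sources
      cascades - chains (a, b, c) with a -> b and b -> c, excluding a -> b -> a loops
    """
    out_edges = defaultdict(set)
    in_edges = defaultdict(set)
    for src, dst in edges:
        if src != dst:  # ignore self-links
            out_edges[src].add(dst)
            in_edges[dst].add(src)
    hubs = sorted(n for n, targets in out_edges.items() if len(targets) >= 2)
    converge = sorted(n for n, sources in in_edges.items() if len(sources) >= 2)
    cascades = sorted(
        (a, b, c)
        for a, mids in out_edges.items()
        for b in mids
        for c in out_edges.get(b, ())  # .get does not insert into the defaultdict
        if c != a
    )
    return {"hubs": hubs, "converge": converge, "cascades": cascades}


def classify(summary):
    """Label a summary using the serial/parallel predictions stated in the text."""
    if summary["cascades"] and not summary["hubs"]:
        return "serial"      # chained dependencies, no bifurcation
    if summary["hubs"] and not summary["cascades"]:
        return "parallel"    # one region directly drives several regions
    if summary["hubs"] and summary["cascades"]:
        return "mixed"       # e.g. globally parallel, locally serial
    return "indeterminate"
```

A star-shaped graph, such as one electrode driving several temporal sites, yields a hub and no cascades, the pattern the text associates with parallel processing; a simple chain yields cascades and no hubs. A graph containing both corresponds to the globally parallel, locally serial case.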
Analysis of the behavioral results showed that the patient was able to reliably discriminate between clear tokens of [s] and [∫] in both word and non-word contexts (p < .001). The endpoints of the continuum were categorized with greater than 95% accuracy. However, unlike normal subjects tested with the same stimuli and task in a related study (Gow et al., 2008), the patient’s behavioral data showed no evidence of a lexical bias in the interpretation of ambiguous tokens (p > .5). This may be related to the finding that the portion of the SMG that was responsible for producing a lexical bias in the Gow et al. study was the primary source of pathological activity in the patient.
Activation patterns were examined to determine when different components of the perisylvian language network become active during processing. The temporally evolving pattern of evoked broadband activation over the first 350 ms following stimulus onset in each of the three conditions is shown in Figure 1. In the following discussion, the results of sensor space analyses are referenced to the locations of individual sensors over perisylvian cortex. In the word condition, activation developed first in the pSTG and neighboring anterior supramarginal gyrus (aSMG) during the first 50 ms following stimulus onset. This was followed by the concurrent activation of the inferior frontal gyrus (IFG) and middle temporal gyrus (MTG) emerging between 100 and 150 ms after stimulus onset. The non-word condition produced a somewhat different pattern, with broader STG activation extending into the IFG in the first 50 ms, a later appearance of consistent MTG activation between 200 and 250 ms, and a significant increase in pSTG activation beginning 150 ms after onset. There was also activation of the dorsal premotor area (DPM) peaking in the interval between 200 and 250 ms. Phonetically ambiguous tokens produced a distinctive pattern. Early activation was restricted to the anterior STG and IFG, and became strongest in the middle STG. DPM activation occurred earlier, and was stronger than it was in the non-word condition. Across conditions, all perisylvian areas that show strong activation at any point show significant activation by the end of the 200-250 ms interval. This general timeframe and pattern is consistent with results from other studies using comparable stimuli (Marinkovic et al., 2003; Canolty et al., 2007; Gow et al., 2008).
The results of Granger causation analyses of broadband activity for the word, non-word, and phonetically ambiguous conditions are shown in Figure 2. All connections are significant at the p < .01 level (Bonferroni-corrected). While all analyses are conducted in sensor space, the precise localization of electrode sites, coupled with the narrow (~ 1 cm radius) spatial sensitivity of individual electrodes to local field potentials supports a simple linking hypothesis that licenses the discussion of these results in terms of localized source space generators.
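Bonferroni correction divides the nominal alpha by the number of tests in the family. As an illustration, the sketch below assumes that the family is every directed electrode pair tested within a condition; the text does not state the exact family size, so the figures are hypothetical. For a full 64-electrode grid that would be 64 × 63 = 4032 directed pairs, giving a per-test threshold of 0.01 / 4032 ≈ 2.5 × 10⁻⁶.

```python
def bonferroni_significant(p_values, alpha=0.01):
    """Return the directed links whose p-value survives Bonferroni correction.

    p_values: dict mapping (source, target) electrode pairs to uncorrected
    p-values. ASSUMPTION: the family size is the number of entries in the dict.
    """
    m = len(p_values)
    if m == 0:
        return {}
    threshold = alpha / m
    return {pair: p for pair, p in p_values.items() if p < threshold}


# Hypothetical family: every directed pair on a 64-electrode grid.
n_electrodes = 64
n_directed_pairs = n_electrodes * (n_electrodes - 1)  # 4032
per_test_alpha = 0.01 / n_directed_pairs              # about 2.5e-6
```

Because the correction depends only on the number of tests, excluding electrodes with artifacts or restricting the analysis to a subset of the grid would relax the per-test threshold accordingly.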
All three conditions show several common properties. In each case, broadband activity at a single electrode (electrode 46), located in the lateral pSTG, Granger caused most of the activity observed at other locations. In the word condition it is the only such driver. In that condition the pSTG Granger caused activity changes in the inferior parietal lobe and at 14 sites in the temporal lobe distributed over the superior, middle and inferior temporal gyri and sulci.
The non-word condition shows a very similar pattern of Granger causation with activity measured at electrode 46 influencing activity at electrodes in the inferior parietal lobe and 13 sites in the temporal lobe. Unlike the word condition, activity at one anterior STG (aSTG) electrode, electrode 34, was driven by both electrode 46 and electrode 35, which is also located in the aSTG.
The phonetically ambiguous tokens evoked a much larger pattern of Granger causation. As in the word and non-word conditions, electrode 46 in the pSTG drove activation over a network that included 11 temporal sites. In this condition, two electrodes in the IFG (electrodes 27 and 28) emerged as causal hubs, influencing activation in adjacent IFG electrodes, one electrode in the central sulcus, and multiple electrodes in the MTG and STG. Three other electrode sites in the temporal and parietal lobes influenced single electrodes. In total, four electrodes were influenced by two inputs. There was no evidence of cascading activation of the sort predicted by strictly serial processing models.
The pSTG Granger caused changes in activation over all of the temporal structures associated with the ventral speech processing pathway in each of the three experimental conditions. Activation was found at several dorsal pathway structures in every condition, but these structures showed widespread patterns of Granger interaction only in the phonetically ambiguous condition. The finding that, even in this condition, dorsal structures beyond the inferior parietal region were not influenced by the pSTG is at odds with previous results from studies that applied Granger analysis to combined MEG/EEG data (Gow et al., 2008; Gow & Segawa, 2009). This difference might be attributed to several factors. The patient’s seizure focus in the left SMG may be related to abnormal function in the dorsal pathway, or abnormal interaction between the pSTG and dorsal structures. It is also possible that the long temporal analysis window necessitated by the relatively small number of experimental trials available for analysis reduced the sensitivity of the analysis and missed weak interactions involving the posterior dorsal system.
The question, then, is whether overall processing within the ventral stream is primarily parallel or serial. Both the accounts of Scott and Wise (2004) and of Hickok and Poeppel (2004; 2007) predict a serial pattern of cascading activation within the left ventral stream, with activation at different points in the stream appearing in an identifiable sequence. While the timecourse of ventral activation across conditions could be consistent with either serial or parallel processing, the results of the Granger analyses clearly disconfirm the claims of both models that ventral processing is serial within each hemisphere. Consistent with parallel processing, there is widespread perisylvian activation within the first 100 ms following stimulus onset in all conditions. However, several areas, most notably portions of the MTG in the word and nonword conditions and the middle STG in the ambiguous condition, become active after other portions of the perisylvian network. These instances of late activation could be interpreted in several ways. One interpretation is that activation in these areas depends on earlier activation in one or several areas that show prior activation. Another possibility is that these areas are involved in processes that simply require larger input units than regions that show earlier activation. Evidence from functional imaging and pathology suggests that the MTG provides an interface between phonological and semantic representation (Miglioreti & Boatman, 2003; Rissman et al., 2003; Rodd et al., 2005; Okada & Hickok, 2006; Prabhakaran et al., 2006). If this is the case, MTG activation may depend on the accumulation of enough input to support word recognition. In the current experiment, words may be distinguishable by 250 ms.
Granger causation analyses provide more definitive evidence. If activation were serial, one would expect to find a pattern of cascading dependencies (e.g., activation at A causes activation at B, which in turn causes activation at C). This is not the pattern we observed. While causal pathways converge at several points in the phonetically ambiguous condition, we found no instances of cascading dependencies. These results should not be taken as evidence that cascading never occurs; Gow et al. (2008) found several instances of cascading organization in their study of unimpaired listeners. However, the current results suggest that there can be direct interaction between the pSTG and widespread components of the ventral pathway.
This result directly disconfirms predictions of both the Scott and Wise (2004) and the Hickok and Poeppel (2004; 2007) models of ventral pathway organization. Both models make the intuitive claim that intrahemispheric processing involves the serial activation of increasingly abstract representations, turning critically on lexical mediation. One might expect that abstract representations of the phonological wordform would be needed to mediate between phonetic representations and the semantic, syntactic or verbal working memory representations that are generally associated with different portions of the perisylvian network. The pSTG plays a clear role in the processing and representation of complex auditory stimuli including speech (see Hickok & Poeppel, 2007 for review). However, it has not shown differential activation based on wordform properties such as neighborhood size or word frequency that influence activation in other parts of the perisylvian network (Prabhakaran et al., 2006; Okada & Hickok, 2006). This suggests that it is not the site of abstract phonological representations of wordform. One possibility is that representations of wordform in these other regions typically play a role in normalizing pSTG phonetic representations. Gow et al. (2008) found evidence for a feedback relationship in normal subjects between SMG and pSTG activation that could provide a basis for this type of normalization. In the current data, the patient’s pathology may have interfered with this process. If this is the case, normalized phonetic representations in the pSTG may provide a reliable basis for directly accessing higher order representations.
The subject was a 19-year-old, right-handed native speaker of American English with normal vision and hearing and intelligence in the normal range. The patient experienced his first seizure at the age of 14 and, by the time of testing, had been diagnosed with subjectively disabling, medically intractable epilepsy. Intracranial EEG recordings were made to identify the focus or foci of predominant epileptiform activity, with a view to surgical intervention. They revealed a pattern of complex seizures arising from the anterior supramarginal gyrus (SMG). These experiments were undertaken under the auspices of the local IRB in accordance with the Declaration of Helsinki and other guidelines governing the involvement of human subjects in research. All clinical decisions related to the recording procedure were made on purely medical grounds with no reference to this or any experiment. The patient was informed that participating or declining to participate in the experiment would not alter his clinical treatment in any way, and that he could withdraw at any time without jeopardizing his medical care.
Recordings were made using a surgically implanted 8 × 8 grid of 3 mm platinum disc macro-electrodes spaced 10 mm center-to-center (Ad-tech Medical Inc, Racine, Wisconsin). The grid was placed over the left perisylvian region, and the exact location of each electrode was determined using intraoperative photographs and registration of CT scans with pre-operative MRIs. Recordings were made using a 128-channel clinical telemetry system (XLTEK, Ontario, Canada) with a sampling rate of 500 Hz. Recordings were referenced to an intradural electrode remote from the recording electrodes and facing away from the cortex. Recordings were stored with timing markers indicating the onset and offset of experimental events.
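The electrode numbers cited in the results (for example 46, 34 and 35, or 27 and 28) can be related to grid geometry once a numbering convention is fixed. The text does not say how the grid is numbered, so the sketch below ASSUMES row-major numbering from 1 starting in one corner; only the 8 × 8 layout and the 10 mm center-to-center spacing come from the text, and the function names are hypothetical.

```python
import math

GRID_SIZE = 8       # 8 x 8 grid (from the text)
SPACING_MM = 10.0   # center-to-center spacing (from the text)


def electrode_position(number, grid_size=GRID_SIZE, spacing_mm=SPACING_MM):
    """Return (row, col, x_mm, y_mm) for a 1-based electrode number.

    ASSUMPTION: row-major numbering starting at 1 in one corner of the grid;
    the actual numbering convention is not given in the text.
    """
    if not 1 <= number <= grid_size * grid_size:
        raise ValueError(f"electrode number must be in 1..{grid_size * grid_size}")
    row, col = divmod(number - 1, grid_size)
    return row, col, col * spacing_mm, row * spacing_mm


def electrode_distance_mm(a, b):
    """Center-to-center distance in millimetres between electrodes a and b."""
    _, _, xa, ya = electrode_position(a)
    _, _, xb, yb = electrode_position(b)
    return math.hypot(xa - xb, ya - yb)
```

Under this assumed convention, consecutive numbers within a row (such as 34 and 35, or 27 and 28) are 10 mm apart, comparable to the ~1 cm radius over which a single electrode is thought to sample local field potentials. If the hardware numbering differs (for example, column-major or serpentine), only `electrode_position` needs to change.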
Intracranial EEG recordings were made during two testing sessions in which the subject performed a phoneme categorization task used in a previous study by Gow et al. (2008). The two testing sessions were conducted 6 and 7 days, respectively, after surgical implantation of the iEEG grid. The stimuli (described in more detail in Gow et al., 2008) consisted of a five-step [s]-[∫] continuum spliced to replace the onsets of the words sandal and shampoo. The continua were created through the weighted averaging of spectra from clear tokens of [s] and [∫] originally spoken in isolation and digitally manipulated to show the same duration (120 ms) and amplitude profile as naturally produced onset segments from the carrier words. These continua were originally created to examine the Ganong effect (Ganong, 1980), a widely documented perceptual phenomenon in which lexical context influences the categorization of phonetically ambiguous segments. The stimuli included clear tokens of unambiguous words (shampoo, sandal), unambiguous non-words (sampoo, shandal) and phonetically ambiguous items ([s∫]ampoo, [s∫]andal). Individual trials consisted of the presentation of a visual fixation stimulus (500 ms), the presentation of the auditory stimulus (577 ms), a 260 ms silence, and then the presentation of a visual prompt for a behavioral response (lateralized “S” and “SH” for 1153 ms). Auditory stimuli were presented over loudspeakers at a level the subject described as comfortable. The patient was instructed to indicate whether the item he heard began with an “S” or “SH” by pressing one of two buttons with his left hand. A randomly varying 100-300 ms silent interval was inserted between trials to minimize anticipatory activation. The design included 143 tokens each of unambiguous words, unambiguous non-words and maximally ambiguous items formed by assigning equal weights to [s] and [∫] during spectral averaging.
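The continuum construction can be sketched as a per-frequency weighted average of the two endpoint spectra. This is a simplified illustration under stated assumptions: the text specifies five steps, clear endpoints, and equal [s] and [∫] weights at the maximally ambiguous step, but the weights of the two intermediate steps (assumed here to be 0.75 and 0.25) and the resynthesis back to a 120 ms waveform are not described, so resynthesis is omitted and the function names are hypothetical.

```python
def continuum_weights(n_steps=5):
    """[s] weights from 1.0 (clear [s]) down to 0.0 (clear [∫]).

    ASSUMPTION: evenly spaced steps; only the endpoints and the 0.5
    midpoint are implied by the text.
    """
    if n_steps < 2:
        raise ValueError("need at least two steps")
    return [1.0 - i / (n_steps - 1) for i in range(n_steps)]


def mix_spectra(spec_s, spec_sh, w_s):
    """Per-frequency weighted average of two equal-length magnitude spectra."""
    if len(spec_s) != len(spec_sh):
        raise ValueError("spectra must have the same length")
    return [w_s * a + (1.0 - w_s) * b for a, b in zip(spec_s, spec_sh)]


def build_continuum(spec_s, spec_sh, n_steps=5):
    """One mixed spectrum per continuum step; resynthesis is not modelled."""
    return [mix_spectra(spec_s, spec_sh, w) for w in continuum_weights(n_steps)]
```

With a two-bin toy spectrum, the middle step of `build_continuum([4.0, 0.0], [0.0, 4.0])` is `[2.0, 2.0]`, the equal-weight item the text describes as maximally ambiguous.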
An additional 95 tokens of the other two items from the continuum were included as distracter trials. Trial order was fully randomized.
Data from 160 trials were excluded due to the presence of epileptic activity. Two types of time-domain analyses were done on the remaining data. Broadband activity (0.1-200 Hz) was calculated on a trial-by-trial basis for each electrode over the interval from −500 ms to 1000 ms relative to the onset of the auditory stimulus. These values were then averaged within each experimental condition and baseline-normalized by subtracting the mean activity in the 500 ms preceding stimulus onset from the corresponding post-onset measures. We created spatially smoothed 2D plots of ERP-based activation from these baseline-normalized average electrode data by arranging the 64 channels as 8 × 8 grid maps, which were then upsampled using cubic spline interpolation to produce two-dimensional intensity plots.
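The averaging, baseline normalization and grid mapping described above can be sketched with NumPy. Only the 500 Hz sampling rate, the −500 to 1000 ms epoch, the 500 ms pre-stimulus baseline and the 64-electrode grid come from the text; the array layout (trials × electrodes × samples), the row-major electrode-to-grid mapping and the function names are assumptions for illustration.

```python
import numpy as np

FS_HZ = 500             # sampling rate (from the text)
EPOCH_START_MS = -500   # epoch start relative to stimulus onset (from the text)


def sample_index(t_ms, fs_hz=FS_HZ, epoch_start_ms=EPOCH_START_MS):
    """Sample index of time t_ms (relative to stimulus onset) within an epoch."""
    return int(round((t_ms - epoch_start_ms) * fs_hz / 1000))


def baseline_normalize(trials, fs_hz=FS_HZ, epoch_start_ms=EPOCH_START_MS):
    """Average trials within a condition, then subtract the pre-stimulus mean.

    trials: array of shape (n_trials, n_electrodes, n_samples); sample 0 is
    epoch_start_ms. Returns an array of shape (n_electrodes, n_samples).
    """
    trials = np.asarray(trials, dtype=float)
    onset = sample_index(0, fs_hz, epoch_start_ms)       # first post-onset sample
    condition_mean = trials.mean(axis=0)                  # average over trials
    baseline = condition_mean[:, :onset].mean(axis=1, keepdims=True)
    return condition_mean - baseline


def to_grid(values, grid_size=8):
    """Arrange one value per electrode on the grid (ASSUMES row-major numbering)."""
    return np.asarray(values, dtype=float).reshape(grid_size, grid_size)
```

For display, the resulting 8 × 8 map could be upsampled with, for example, `scipy.ndimage.zoom(grid, 4, order=3)`, which applies cubic spline interpolation; the text does not specify the upsampling factor or the implementation actually used.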
Granger causation analyses were conducted on sensor space broadband activation timeseries for the first 1000 ms following stimulus onset for each of the three experimental conditions. Granger causation analysis (Granger, 1969) is a statistical procedure that relies on lagged timeseries regression analyses to determine whether, and how strongly, the past behavior of one variable predicts the future behavior of another beyond what is already predicted by the past behavior of the second variable. To the extent that each timeseries is unique, and all potentially causal variables are represented, Granger analyses support inferences about directional causal interactions between variables. Granger analysis has been applied to neurophysiological data (cf. Bernasconi et al., 1999; Hesse et al., 2003; Brovelli et al., 2004; Supp et al., 2007), including iEEG data (cf. Freiwald et al., 1999; Chávez et al., 2003; Oya et al., 2007). In closely related work, Gow et al. (2008) applied Granger analysis to MRI-constrained MEG and EEG data to explore the influence of lexical representation on the perception of ambiguous speech sounds. Granger causality analysis of iEEG data involves a tradeoff. While iEEG provides higher spatiotemporal resolution than other human imaging techniques, including multimodal imaging, it tends to provide more limited cortical coverage. It may therefore lead to the misinterpretation of some connections when causal variables outside of electrode coverage influence activation within the grid.
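The core idea, that the past of x helps predict y beyond what the past of y already provides, can be illustrated with two least-squares fits. The sketch below computes the log ratio of restricted to full residual variance (a Geweke-style magnitude); this summary is swapped in for illustration, whereas the study itself assessed significance with F-tests on a multivariate model (Seth, 2005). The lag order p = 2 and the function names are assumptions.

```python
import numpy as np


def lagged_design(series_list, p):
    """Intercept plus lags 1..p of each 1-D series; row i corresponds to time p + i."""
    n = len(series_list[0])
    cols = [np.ones(n - p)]
    for s in series_list:
        s = np.asarray(s, dtype=float)
        for lag in range(1, p + 1):
            cols.append(s[p - lag : n - lag])
    return np.column_stack(cols)


def residual_variance(y, X):
    """Mean squared residual of the least-squares fit of y on X."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    return float(resid @ resid / len(y))


def granger_magnitude(x, y, p=2):
    """log(restricted / full residual variance) for the direction x -> y.

    Near 0: the past of x adds nothing to predicting y from its own past.
    Larger values: stronger directed (Granger) influence of x on y.
    """
    y = np.asarray(y, dtype=float)
    target = y[p:]
    restricted = residual_variance(target, lagged_design([y], p))
    full = residual_variance(target, lagged_design([y, x], p))
    return float(np.log(restricted / full))
```

With simulated data in which x drives y at a one-sample lag, the x → y magnitude is clearly positive while the y → x magnitude stays near zero. A pairwise fit like this cannot rule out a third, unrecorded driver, which is the coverage limitation noted in the text.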
Granger causation analysis critically assumes that all timecourses are covariance stationary (Granger, 1969). We tested this assumption by applying the Dickey-Fuller unit-root test (α = 0.05) to each timeseries. When a unit root was found, all timeseries were submitted to a backwards differencing function (Seth, 2005) and retested until the data conformed to the stationarity assumption.
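The stationarity check and differencing loop can be sketched as follows. The sketch uses the plain (non-augmented) Dickey-Fuller regression with a constant and compares the t-statistic with the asymptotic 5% critical value of −2.86; whether the original analysis used this exact variant, augmented lags or a trend term is not stated, so treat those choices, the `max_diff` limit and the function names as assumptions. As in the text, all channels are differenced together whenever any channel retains a unit root.

```python
import numpy as np

DF_CRITICAL_5PCT = -2.86  # asymptotic 5% critical value, DF regression with constant


def dickey_fuller_t(y):
    """t-statistic of gamma in the regression dy_t = a + gamma * y_{t-1} + e_t."""
    y = np.asarray(y, dtype=float)
    dy = np.diff(y)
    X = np.column_stack([np.ones(len(dy)), y[:-1]])
    beta, *_ = np.linalg.lstsq(X, dy, rcond=None)
    resid = dy - X @ beta
    sigma2 = resid @ resid / (len(dy) - X.shape[1])
    cov = sigma2 * np.linalg.inv(X.T @ X)
    return beta[1] / np.sqrt(cov[1, 1])


def make_stationary(data, max_diff=3, critical=DF_CRITICAL_5PCT):
    """Difference ALL channels together until every channel rejects a unit root.

    data: array of shape (n_channels, n_samples).
    Returns (differenced_data, number_of_differences_applied).
    """
    data = np.asarray(data, dtype=float)
    for n_diff in range(max_diff + 1):
        # A t-statistic below the critical value rejects the unit-root null.
        if all(dickey_fuller_t(ch) < critical for ch in data):
            return data, n_diff
        data = np.diff(data, axis=1)  # backwards difference: x_t - x_{t-1}
    raise RuntimeError("series still non-stationary after max_diff differences")
```

Each round of differencing shortens every channel by one sample, so the Granger models downstream must be fit on the differenced series rather than on the original timecourses.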
We relied on a statistical implementation of Granger causality analysis developed by Seth (2005) that employs lagged multivariate vector autoregression. To avoid overfitting in the regression analysis while still identifying all potentially causal connections, we used the Akaike Information Criterion (AIC; Akaike, 1974) to identify the optimal model order. This analysis showed that a model order of two (a maximum lag of two samples) produced the lowest AIC value. With a sampling rate of 500 Hz (2 ms per sample), this corresponds to a lag of 4 ms. Model orders from 2 to 6 were also examined, and all yielded comparable results. This range of model orders is consistent with those used in similar studies (cf. Kaminski et al., 2001; Gow et al., 2008; Gow & Segawa, 2009). Multivariate vector autoregression modeling was conducted on a trial-by-trial basis. Following Seth (2005), the Bonferroni-corrected statistical significance of Granger causation between any two nodes was determined by an F-test comparing the error terms of the complete model with those of a restricted model from which the potentially causal signal was removed.
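AIC-based order selection and the complete-versus-restricted F-test can be sketched with ordinary least squares. This is a minimal illustration, not Seth's (2005) toolbox: the function names, the VAR form of the AIC (log determinant of the residual covariance plus 2pK²/T for K channels and T fitted samples), and fitting every candidate order on the same T samples are assumptions.

```python
import numpy as np


def var_design(data, p):
    """Intercept plus lags 1..p of every channel; data has shape (n_channels, n_samples).

    Columns are ordered [1, ch0 lag1..lagp, ch1 lag1..lagp, ...].
    """
    n_ch, n = data.shape
    cols = [np.ones(n - p)]
    for ch in range(n_ch):
        for lag in range(1, p + 1):
            cols.append(data[ch, p - lag : n - lag])
    return np.column_stack(cols)


def rss(y, X):
    """Residual sum of squares of the least-squares fit of y on X."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    r = y - X @ beta
    return float(r @ r)


def var_aic(data, p):
    """AIC of a VAR(p): log det(residual covariance) + 2 * p * K**2 / T."""
    n_ch = data.shape[0]
    X = var_design(data, p)
    Y = data[:, p:].T                         # (T, K) targets
    B, *_ = np.linalg.lstsq(X, Y, rcond=None)
    E = Y - X @ B
    T = Y.shape[0]
    _, logdet = np.linalg.slogdet(E.T @ E / T)
    return float(logdet + 2 * p * n_ch**2 / T)


def select_order(data, max_p=6):
    """Pick the order with the lowest AIC, fitting every order on the same T samples."""
    data = np.asarray(data, dtype=float)
    scores = {p: var_aic(data[:, max_p - p:], p) for p in range(1, max_p + 1)}
    return min(scores, key=scores.get), scores


def granger_f(data, source, target, p):
    """Conditional Granger F-test for source -> target given all other channels.

    Returns (F, df_num, df_den), comparing the complete model with a restricted
    model from which the p lags of the source channel are removed.
    """
    data = np.asarray(data, dtype=float)
    y = data[target, p:]
    X_full = var_design(data, p)
    source_cols = list(range(1 + source * p, 1 + (source + 1) * p))
    X_restricted = np.delete(X_full, source_cols, axis=1)
    rss_full, rss_restricted = rss(y, X_full), rss(y, X_restricted)
    df_num = p
    df_den = len(y) - X_full.shape[1]
    F = ((rss_restricted - rss_full) / df_num) / (rss_full / df_den)
    return F, df_num, df_den
```

A p-value can be obtained from the returned statistic and degrees of freedom with `scipy.stats.f.sf(F, df_num, df_den)` and then compared with a Bonferroni-corrected threshold. Because the complete model conditions on all other channels, each link is tested only for influence not already explained by the remaining electrodes.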
Publisher's Disclaimer: This is a PDF file of an unedited manuscript that has been accepted for publication. As a service to our customers we are providing this early version of the manuscript. The manuscript will undergo copyediting, typesetting, and review of the resulting proof before it is published in its final citable form. Please note that during the production process errors may be discovered which could affect the content, and all legal disclaimers that apply to the journal pertain.