Debate surrounds the precise cortical location and timing of access to phonological information during visual word recognition. Therefore, using whole head magnetoencephalography (MEG), we investigated the spatiotemporal pattern of brain responses induced by a masked pseudohomophone priming task. Twenty healthy adults read target words that were preceded by one of three kinds of nonword prime: pseudohomophones (e.g., brein-BRAIN), where 4 of 5 letters are shared between prime and target, and the pronunciation is the same; matched orthographic controls (e.g., broin-BRAIN), where the same 4 of 5 letters are shared between prime and target but pronunciation differs; and unrelated controls (e.g., lopus-BRAIN), where neither letters nor pronunciation are shared between prime and target. All three priming conditions induced activation in the pars opercularis of the left inferior frontal gyrus (IFGpo) and the left precentral gyrus (PCG) within 100ms of target word onset. However, for the critical comparison which reveals a processing difference specific to phonology, we found that the induced pseudohomophone priming response was significantly stronger than the orthographic priming response in left IFG/PCG at ~100ms. This spatio-temporal concurrence demonstrates early phonological influences during visual word recognition and is consistent with phonological access being mediated by a speech production code.
Extensive research has shown that phonological processing skill is a critical predictor of reading acquisition (Bradley & Bryant, 1983) and has been identified as a source of difficulty in dyslexia (Goswami, 2000 for review). A common technique used to probe the earliest stages of processing in visual word identification is the masked priming paradigm. Such studies demonstrate reaction time advantages when pseudohomophones (e.g., brein) prime target words like ‘BRAIN’ as compared to orthographic control primes (e.g., broin) (Lukatela & Turvey, 1994; Perfetti & Bell, 1991). This pseudohomophone priming effect is typically interpreted as indicating that the initial access code for word recognition is phonological in nature.
Although behavioral masked priming studies have suggested that phonological access occurs as quickly as 50–100ms after words are presented (Ferrand & Grainger, 1993), such studies cannot determine precisely the time course of events that comprise visual word recognition. In part this is because outcome measures like reaction time represent the output of the system as a whole. But more importantly, experimental manipulations such as varying prime duration do not necessarily provide direct information about the time course of processing. For example, Rayner et al. (2003) demonstrated that exposure to text as brief as 60ms is sufficient for lexical information to be extracted, but this was indexed by changes in eye-fixation duration ~250ms post-stimulus. Thus, observing an experimental effect with a 60ms prime does not necessarily mean that a particular processing step happens within 60ms. Rather, 60ms worth of input provides sufficient information to permit that process to occur, at whatever time point thereafter.
To elucidate when and where phonological access occurs during visual word recognition, time sensitive neurophysiological measurements are ideal. Typically, the earliest EEG correlates of phonological priming have been found around 200–300ms following word presentation (Grainger et al., 2006; Sereno et al., 1998). An exception to this was reported by Ashby et al. (2009) who recorded EEG as participants read targets with voiced and unvoiced final consonants (e.g. fad, fat), preceded by pseudoword primes that were incongruent or congruent in voicing and vowel duration (e.g. fap, faz). Phonological feature congruency modulated ERPs by 80ms, indicating that sub-phonemic features can be activated rapidly during word recognition. This latter finding is consistent with recent MEG studies showing early responses to printed words ~100ms after stimulus onset in the left inferior frontal gyrus, pars opercularis (IFGpo) and the precentral gyrus (PCG) (Pammer et al., 2004; Cornelissen et al., 2009). Put together, these neurophysiological data imply that phonological activation may indeed occur around ~100ms, and may be mediated by the IFGpo/PCG. However, such a conclusion is premature because neither Pammer et al. (2004) nor Cornelissen et al. (2009) specifically manipulated phonology in relation to IFGpo/PCG activity, and Ashby et al. (2009) did not localize their ERP data. Therefore, to test this idea, we used MEG to measure brain responses during a masked pseudohomophone priming task, and analyzed the data with cortical source reconstruction methods that provide high temporal resolution (milliseconds) and good spatial resolution (estimated to be 5mm for 85% of voxels; Barnes et al., 2004).
Twenty native English-speaking, strongly right-handed adults (mean age 23.2 years, SD 5.97 months; 12 female) gave informed consent to participate in the study. None had been diagnosed reading disabled and all read normally based on WRAT-III performance. Handedness was defined by the Annett Hand Preference Questionnaire (Annett, 1967). The study conformed with The Code of Ethics of the World Medical Association (Declaration of Helsinki).
The target words were 111 English 5-letter nouns and verbs, with a mean word frequency count of 19.7 (CELEX). These were primed by pseudohomophones of the target word (PSEUD), matched orthographic nonwords (ORTH) and unrelated nonwords (UNREL). Pseudohomophone primes shared four out of five letters with their target word (brein-BRAIN) and were pronounced identically. Orthographic control primes shared the same four out of five letters with the target word but were pronounced differently (broin-BRAIN; see footnote 1). Unrelated primes were pseudowords that shared no letters (in any position) with the target word (lopus-BRAIN). All three prime types were matched on bigram frequency using a positional bigram frequency count derived from the 5-letter words in CELEX. The mean log10 frequencies were: pseudohomophones 5.639 (SEM 0.033); orthographic primes 5.687 (SEM 0.027) and unrelated primes 5.635 (SEM 0.034). A one-way ANOVA for positional bigram frequency score was not significant, F(2,330)=0.06, p=0.94, indicating no condition contained primes made up of more frequently occurring letter pairs than any other condition. Catch trials were randomly interspersed with experimental trials. Target catch trials had an animal name as the target, ensuring the participant had a purpose for attending to the stimuli (i.e. to spot the animal names). Prime catch trials had an animal name as the prime with the purpose of monitoring the visibility of the primes.
Participants were asked to rapidly and silently read target words and to press a button only if they spotted an animal name (see footnote 2). The experiment consisted of 373 trials (including 40 catch trials) of 1890ms separated by a fixation cross with duration randomly jittered between 1200–2200ms. Each trial comprised: 300ms blank screen, 500ms forward mask ‘#####’, 66.7ms lowercase prime, 16.7ms backward mask ‘#### #’, 300ms uppercase target word and 500ms blank screen.
Stimuli were back-projected (60Hz vertical refresh) as light grey words and symbols (Arial Monospace 24pt) on a dark grey background using Presentation v12.0 (Neurobehavioural Systems, Inc.). At a viewing distance of ~75cm stimuli subtended ~1° vertically and ~5° horizontally. Each participant saw each of the 111 target words three times, once for each priming condition (PSEUD, ORTH and UNREL), making a total of 333 trials. A pseudorandom blocked design ensured each participant saw a unique overall target word presentation order, and across six participants, prime-target relationships were counterbalanced.
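The prime and backward-mask durations follow from the 60Hz refresh rate: the display can only change on frame boundaries, so every stimulus duration is a whole number of frames of ~16.7ms each. A minimal sketch of that arithmetic is given below. The frame counts (4 for the prime, 1 for the backward mask) are inferred from the reported durations rather than stated in the text, and the function name is hypothetical.

```python
REFRESH_HZ = 60  # vertical refresh rate stated in the text

def frames_to_ms(n_frames, refresh_hz=REFRESH_HZ):
    """Duration in milliseconds of n_frames at the given refresh rate."""
    return n_frames * 1000.0 / refresh_hz

# Frame counts are assumptions inferred from the reported durations.
prime_ms = frames_to_ms(4)          # lowercase prime
backward_mask_ms = frames_to_ms(1)  # backward mask
target_ms = frames_to_ms(18)        # uppercase target word

print(round(prime_ms, 1), round(backward_mask_ms, 1), target_ms)
```

Under these assumptions the prime and mask together occupy five frames, i.e. a prime-to-target onset asynchrony of ~83.3ms.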
MEG data were collected continuously using a 4D Neuroimaging Magnes 3600 Whole Head, 248 channel system, with the magnetometers arranged in a helmet shaped array. Data were sampled at a rate of 678.17Hz (200Hz anti-alias filter). Head shape and head coil position were recorded with a 3-D digitizer (Polhemus Fastrak), and used for co-registration (Kozinska et al., 2001) with a high resolution T1 weighted anatomical volume reconstructed to 1 mm isotropic resolution, acquired using GE 3.0T Signa Excite HDx.
Neural sources of activity were reconstructed with an in-house modified type I vectorized linearly-constrained minimum-variance beamformer (Van Veen et al., 1997; Huang et al., 2004). In a beamforming analysis, the neuronal signal at a location of interest in the brain is constructed as the weighted sum of the signals recorded by the MEG sensors, the sensor weights computed for each location forming 3 spatial filters, one for each orthogonal current direction. The beamformer weights are determined by an optimization algorithm so that the signal from a location of interest contributes maximally to the beamformer output whereas the signal from other locations is suppressed. For a whole brain analysis, a cubic lattice of spatial filters is defined within the brain (here 5-mm spacing), and an independent set of weights is computed for each of them. The outputs of the 3 spatial filters at each location in the brain are then summed to generate the total power at each so-called ‘virtual electrode’ (VE) over a given temporal window and within a given frequency band.
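For reference, the weighted-sum description above corresponds to the standard vector LCMV solution of Van Veen et al. (1997), written out below. This is the textbook unit-gain form and not the in-house modification used here, whose details are not given. The symbols are notational assumptions: L(r) is the N×3 lead field at location r for N sensors, C is the N×N sensor covariance matrix for the chosen time window and frequency band, and W(r) is the 3×N matrix whose rows are the three spatial filters.

```latex
\begin{aligned}
W(r) &= \left[ L(r)^{\mathsf T} C^{-1} L(r) \right]^{-1} L(r)^{\mathsf T} C^{-1},
\qquad W(r)\,L(r) = I_{3}, \\
\hat{P}(r) &= \operatorname{tr}\!\left[ W(r)\, C\, W(r)^{\mathsf T} \right]
            = \operatorname{tr}\!\left\{ \left[ L(r)^{\mathsf T} C^{-1} L(r) \right]^{-1} \right\}.
\end{aligned}
```

The constraint W(r)L(r) = I passes activity from r with unit gain, minimising the output variance suppresses contributions from elsewhere, and the trace sums power over the three orthogonal current directions.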
The localisation accuracy of spatial filtering approaches to source analysis has been found to be superior to that of alternative MEG analysis techniques such as minimum norm (Sekihara et al., 2005). However, the accuracy of spatial filtering approaches can be affected by several factors, including the length of the analysis window, signal-to-noise level, and the signal bandwidth (Brookes et al., 2008). Simulation studies have suggested that type I spatial filters maintain localisation accuracy at adverse SNRs and are not prone to produce 'phantom' sources of activity (Huang et al., 2004).
The main limitation of MEG is the difficulty in detecting and localizing deep sources. However, Hillebrand and Barnes (2002) have demonstrated ~90% detection rate for MEG signals in IFGpo/PCG, middle occipital gyrus (MOG), and indeed most of the cortical network involved in reading which is the concern of the current study. An exception to this is the medial portion of the middle and anterior fusiform gyrus, where detection probability reduces to ~50%. In addition there is a theoretical restriction in resolving perfectly temporally correlated sources (Van Veen et al., 1997). However, perfect correlation between distinct sources is unlikely and beamforming has been shown to resolve even highly temporally correlated sources (Huang et al., 2004).
A major advantage of beamformer analysis relative to alternative source localisation techniques, such as equivalent current dipole modelling or minimum norm estimation, is the ability to image changes in cortical oscillatory power that do not give rise to a strong signal in the evoked-average response. Evoked signal components tend to have a stereotypical wave shape that is phase-locked to the onset of the stimulus in such a way that it can be revealed both by the evoked average in the time domain and by frequency domain analyses. In contrast, induced components are those changes in oscillatory activity which, though they may occur within a predictable time-window following stimulus onset, lack sufficient phase locking to be revealed by averaging in the time domain. They are, however, revealed by changes in power in the frequency domain.
After acquisition, the MEG data were segmented into epochs running from 900ms before target onset to 800ms after. Epochs containing artifacts, such as blinks, articulatory movements, swallows and other movements, were rejected.
Previous MEG studies of visual word recognition have revealed a complex spread of activation across the cortex with time (Tarkiainen et al., 1999; Pammer et al., 2004; Cornelissen et al., 2009). The earliest components of this pattern occur in occipital, occipito-temporal, and prefrontal cortex ~100–150ms post stimulus. Therefore, as a compromise between being able to reveal this temporal pattern across the whole brain and being able to resolve oscillatory activity as low as 5–10Hz, we conducted beamforming analyses for 200ms long windows.
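The lower frequency limit follows from the window length: a window of duration T holds one full cycle of an oscillation at frequency 1/T. As a short worked calculation (the symbols T and f_min are introduced here for illustration only):

```latex
f_{\min} = \frac{1}{T} = \frac{1}{0.2\,\mathrm{s}} = 5\,\mathrm{Hz}
```

A 200ms window therefore contains one cycle at 5Hz and two at 10Hz. A shorter window would track the spread of activation more finely in time but would contain less than one cycle of the slowest oscillations of interest.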
At the first, within subject level of statistical analysis, we computed a paired sample t-statistic for each point in the VE grid. To do this, we compared the mean difference in oscillatory power (averaged across epochs) in four frequency bands: 5–15Hz, 15–25Hz, 25–35Hz & 35–50Hz between a 200ms passive window (i.e. −790 to −590ms before target onset), which was shared between all conditions, and two active time windows (0–200ms and 200–400ms following target onset). This procedure generates separate t-maps for each participant, for each contrast, at each of the frequency-band/time-window combinations. Individual participants’ t-maps were then transformed into the standardized space defined by the Montreal Neurological Institute (MNI).
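The within-subject statistic at a single virtual electrode can be sketched as follows. This is an illustrative reading of the description above, assuming one band-limited power estimate per epoch for the active window and one for the passive window. The function name and the power values are made up, and the actual beamformer-based computation may differ in detail.

```python
from math import sqrt
from statistics import mean, stdev

def paired_t(active, passive):
    """Paired-sample t-statistic for per-epoch power at one virtual electrode."""
    if len(active) != len(passive):
        raise ValueError("need one active and one passive value per epoch")
    # Per-epoch difference in power between the two windows.
    diffs = [a - p for a, p in zip(active, passive)]
    n = len(diffs)
    # Mean difference divided by its standard error.
    return mean(diffs) / (stdev(diffs) / sqrt(n))

# Made-up power values for five epochs (arbitrary units), one frequency band.
passive = [1.0, 1.2, 0.9, 1.1, 1.0]
active = [1.5, 1.9, 1.2, 1.8, 1.3]
print(round(paired_t(active, passive), 2))
```

Repeating this for every point in the VE grid, each frequency band and each active window would give one t-map per participant per combination.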
At the second, group level of statistical analysis, we used a multi-step procedure (Holmes et al., 1996) to compute the permutation distribution of the maximal statistic (by re-labelling experimental conditions), in our case the largest mean t-value (averaging across participants) from the population of VEs in standard MNI space (Nichols & Holmes, 2004). For a single VE, the null hypothesis asserts that the t-distribution would have been the same whatever the labelling of experimental conditions. At the group level, for whole brain images, we rejected the omnibus hypothesis (that all the VE hypotheses are true) at level α=0.05 if the maximal statistic for the actual labelling of the experiment was in the top 100α% of the permutation distribution for the maximal statistic. This critical value is the (c+1)th largest member of the permutation distribution, where c=[αN], αN rounded down. This test has been formally shown to have strong control over experiment-wise Type I error (Holmes et al., 1996).
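The logic of the maximal-statistic test can be sketched on toy data. The sketch below is a simplification, not the procedure used in the study: it is single-step rather than multi-step, it enumerates every relabelling exhaustively, and it assumes that swapping the two condition labels for a participant simply negates that participant's t-map. All function names and data values are hypothetical.

```python
from itertools import product
from math import floor

def max_mean(t_maps, signs):
    """Largest across-participant mean t-value over all virtual electrodes."""
    n_subj = len(t_maps)
    n_ve = len(t_maps[0])
    return max(
        sum(s * t_map[v] for s, t_map in zip(signs, t_maps)) / n_subj
        for v in range(n_ve)
    )

def critical_value(perm_dist, alpha=0.05):
    """The (c+1)th largest member of the distribution, c = floor(alpha * N)."""
    c = floor(alpha * len(perm_dist))
    return sorted(perm_dist, reverse=True)[c]

def max_stat_test(t_maps, alpha=0.05):
    """Exhaustive sign-flip permutation test on the maximal statistic."""
    observed = max_mean(t_maps, [1] * len(t_maps))
    # Assumption: relabelling one participant's conditions negates their t-map.
    perm_dist = [max_mean(t_maps, signs)
                 for signs in product([1, -1], repeat=len(t_maps))]
    threshold = critical_value(perm_dist, alpha)
    return observed, threshold, observed > threshold

# Toy data: 6 participants x 3 virtual electrodes, with an effect at the first VE.
t_maps = [
    [3.1, 0.2, -0.4],
    [2.8, -0.3, 0.5],
    [3.5, 0.1, 0.2],
    [2.9, 0.4, -0.1],
    [3.3, -0.2, 0.3],
    [3.0, 0.3, -0.2],
]
observed, threshold, reject = max_stat_test(t_maps)
print(observed, threshold, reject)
```

Because the threshold comes from the distribution of the maximum over all VEs, any single VE whose mean t-value exceeds it can be declared significant while the experiment-wise error rate stays controlled. With twenty participants, exhaustive enumeration (2^20 relabellings) is costly, so a random subset of relabellings would typically be drawn instead.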
At specified ROIs, we wanted to compare the evoked and induced frequency components between experimental conditions, retaining millisecond temporal resolution. We selected ROIs based on peaks in the group level analyses, and used separate beamformers to reconstruct the time series at these sites. We used Stockwell transforms (Stockwell et al., 1996) to compute time-frequency plots for each participant for each condition, and used generalized linear mixed models (GLMM) to compare these at the group level. The GLMMs included repeated measures factors to account for the fact that each participant’s time-frequency plot is made up of multiple time-frequency tiles. Time-frequency (spatial) variability was integrated into the models by specifying a spatial correlation model for the model residuals (Littell et al., 2006).
To verify our task design and stimuli, 18 participants (none of whom subsequently participated in the MEG) read aloud target words primed by the PSEUD (mean vocal reaction time [VRT] 419.1ms, SEM 21.9ms), ORTH (mean VRT 439.8ms, SEM 21.1ms), and UNREL (mean VRT 467.5ms, SEM 19.3ms) conditions, thus confirming a 21ms pseudohomophone effect. A repeated measures ANOVA with post-hoc comparisons revealed significant differences between PSEUD & ORTH (t(1,34)=6.13, p<0.0001), PSEUD & UNREL (t(1,34)=14.35, p<0.0001) and ORTH & UNREL (t(1,34)=8.23, p<0.0001). In MEG, participants were very poor at correctly identifying animal words in the prime position (mean d’=0.40, SD 0.77), indicating appropriately low awareness of primes. In comparison, participants correctly identified animal words in the target position with a mean d’=3.55 (SD 0.54), indicating participants were successfully attending to the task.
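For readers unfamiliar with d’, the conventional signal-detection definition is the difference between the z-transformed hit rate and false-alarm rate. The sketch below uses that textbook definition; the text does not report the underlying hit and false-alarm rates or how rates of exactly 0 or 1 were handled, so the function name and the example rates are hypothetical.

```python
from statistics import NormalDist

def d_prime(hit_rate, false_alarm_rate):
    """Sensitivity index: z(hit rate) minus z(false-alarm rate)."""
    z = NormalDist().inv_cdf  # inverse of the standard normal CDF
    return z(hit_rate) - z(false_alarm_rate)

# Hypothetical rates, not taken from the study.
print(round(d_prime(0.60, 0.45), 2))
```

A d’ near zero means button presses were almost unrelated to whether an animal name was actually present, which is the pattern expected for masked primes. Note that `inv_cdf` raises an error for rates of exactly 0 or 1, so such rates would need a correction before use.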
Figure 1a illustrates 3D rendered images for a representative condition (PSEUD), thresholded at p<0.05 (corrected). During the first 200ms following target onset, in all three conditions cortical activity was centred on left IFGpo, PCG, and left and right middle occipital gyri (MOG). However, the inherent uncertainty in spatial localisation of MEG beamforming analysis prevented us from clearly distinguishing the extent to which activity was localized in either IFGpo or PCG alone or whether there was functionally distinct activity in both areas. Therefore, henceforth we label this cluster of activation as IFGpo/PCG. Activation during this time window also extended inferiorly towards left and right mid fusiform gyri, and superiorly towards right superior parietal lobule.
Figure 1b shows substantial overlap in IFGpo/PCG activity for all three conditions in the first 200ms. During the 200–400ms following target onset, all conditions activated additional reading-related regions, including anterior middle temporal gyrus, left posterior middle temporal gyrus, angular and supramarginal gyri, and left superior temporal gyrus.
We performed region of interest analyses on the IFGpo/PCG (centred on MNI co-ordinate: −56, 4, 18) and the left and right MOG (centred on MNI co-ordinates: −26, −96, 8 and 24, −98, 10 respectively) sites visible in the 0–200ms window (see Figure 1a) to compare the strength of responses between conditions. Figure 1c shows the results of group level comparisons of time-frequency plots for the critical comparison between PSEUD and ORTH (see footnote 3). It demonstrates that shared phonology between prime and target, over and above shared orthography, results in significantly greater induced 30–40Hz activity (blue-aqua scale) at IFGpo/PCG ~100ms after stimulus presentation. No such differences were found in MOG.
Within 100ms of target word onset, we observed stronger responses to pseudohomophone priming than to orthographic priming of visually presented words in a cluster that includes pars opercularis of the left inferior frontal (IFGpo) and/or precentral gyri (PCG). These findings therefore demonstrate an early neurobiological response to phonological priming during visual word recognition within a time frame that is consistent with behavioral studies and Ashby et al.’s (2009) ERP result. Furthermore, these data provide additional confirmation of the early activation of IFGpo/PCG in response to visually presented words as reported by Pammer et al. (2004) and Cornelissen et al. (2009).
Involvement of left posterior IFG is not unique to language and visual word recognition tasks. For example, there is fMRI evidence for a role during motor imitation (Buccino et al., 2004) and in cognitive control (Snyder et al., 2007). However, the early difference we observed between the PSEUD and ORTH conditions was obtained from an event related paradigm in which the task, silent reading, was identical for all trials, and where participants could not predict the nature of the up-coming prime, or even detect it reliably. Therefore, we argue it is most parsimonious to attribute our findings to a stimulus driven differential effect in phonological priming, rather than top-down alternatives such as cognitive control. Indeed, we interpret the early engagement of MOG and IFGpo/PCG as reflecting prelexical orthographic-phonological mapping between these regions. Several lines of evidence are consistent with this idea. First, abstract representations of letters/letter-clusters are available in MOG as early as ~100ms after a printed word is presented (Tarkiainen et al., 1999; Hauk et al., 2006) thus providing the necessary orthographic component of the mapping within an appropriate timeframe. Second, white matter fibre tract connections between the inferior and middle temporal cortices, MOG and IFGpo/PCG may be carried by the superior longitudinal fasciculus (Wakana et al., 2004; Bernal and Altman, 2009), and these could provide the necessary anatomical connectivity. Third, MEG evidence of functional connectivity between MOG and IFGpo/PCG from reading tasks indicates that nodes in the left occipito-temporal cortex can cause the activity observed in prefrontal nodes of the reading network (Kujala et al., 2007). Fourth, early activation of IFGpo/PCG should be observed for pronounceable letter strings not only for silent reading tasks as demonstrated in the current study (see Fig. 2b) but also for visual lexical decision and for passive viewing of words, and this has been shown by Pammer et al. (2004) and Cornelissen et al. (2009) respectively.
The difference in induced oscillatory responses between pseudohomophone priming and orthographic priming at left IFGpo/PCG occurred within 100ms of target onset, whereas differences in the evoked activity were not apparent in IFGpo until ~150–200ms. This may provide an explanation for the failure of most EEG studies to identify such an early neurophysiological signature for fast phonological priming. Because analyses of EEG data are often restricted to the evoked average signal, only the brain responses that are phase-locked to target onset are routinely observed. Ashby et al. (2009), however, used short 3-letter stimuli, which may have aligned the phases of the cortical responses to individual trials sufficiently to reveal a significant effect of phonological consistency in their evoked averaged analysis, analogous to the recognition point for spoken stimuli.
Although the inferior frontal gyrus is implicated in many functions, direct recording in surgical patients (Greenlee et al., 2004) and fMRI studies (Brown et al., 2008) indicate that IFGpo/PCG in particular is strongly associated with motor control of speech articulators. Further evidence that this region is associated with speech production codes comes from Pulvermüller et al. (2006) who found that when individuals listened to speech sounds, somatotopic representations of articulatory features were activated in PCG which were spatially consistent with the motor representations required for generating those same speech sounds. Finally, activation of this speech-motor region is consistent with findings from behavioral studies suggesting that the phonology accessed in visual word recognition is sensitive to articulatory characteristics of the words (Abramson & Goldinger, 1997; Lukatela et al., 2004). In conclusion therefore, the early involvement of IFGpo/PCG in pseudohomophone priming supports a role for these sites in prelexical access to phonological information during visual word recognition. Moreover, these findings suggest that early word recognition may be achieved by a direct print-to-speech mapping mediated by a speech production code.
We are grateful to Andy Ellis of York University and Michael Simpson of York Neuroimaging Centre, who provided advice and guidance at various stages of this project regarding the development of the experimental protocols, MEG data acquisition and analysis. We thank Vesa Kiviniemi, Department of Statistics, University of Kuopio, Finland, for advice on the statistical analyses of the time-frequency plots, and Jane Ashby of Central Michigan University for comments on drafts of the manuscript.
1 Five observers judged whether pseudohomophones sounded like target words and whether orthographic controls sounded different from target words. Winer’s inter-rater reliability for these decisions was 0.97.
2 Participants could be heard over an intercom at all times, ensuring they were not reading words aloud.
3 These statistical contours are based on the estimated marginal means derived from the model parameters; the predicted population margins were compared using tests for simple effects by partitioning the interaction effects.