This study investigated the neural plasticity associated with perceptual learning of a cochlear implant (CI) simulation. Normal-hearing listeners were trained with vocoded and spectrally-shifted speech simulating a CI while cortical responses were measured with fMRI. A condition in which the vocoded speech was spectrally inverted provided a control for learnability and adaptation. Behavioral measures showed considerable individual variability both in the ability to learn to understand the degraded speech, and in phonological working memory capacity. Neurally, left-lateralized regions in superior temporal sulcus and inferior frontal gyrus (IFG) were sensitive to the learnability of the simulations, but only the activity in prefrontal cortex correlated with inter-individual variation in intelligibility scores and phonological working memory. A region in left angular gyrus (AG) showed an activation pattern that reflected learning over the course of the experiment, and co-variation of activity in AG and IFG was modulated by the learnability of the stimuli. These results suggest that variation in listeners' ability to adjust to vocoded and spectrally-shifted speech is partly reflected in differences in the recruitment of higher-level language processes in prefrontal cortex, and that this variability may further depend on functional links between the left inferior frontal gyrus and angular gyrus. Differences in the engagement of left inferior prefrontal cortex, and its co-variation with posterior parietal areas, may thus underlie some of the variation in speech perception skills that have been observed in clinical populations of CI users.
Cochlear implants (CIs) can restore hearing after sensorineural hearing loss, or provide auditory input to children born deaf. These prostheses deliver tonotopically distributed electrical stimulation to the auditory nerve via an electrode array that is inserted into the cochlea. CIs provide a limited degree of spectral resolution, sufficient for good speech intelligibility in quiet (Shannon et al., 1995), but lose much detail of the original signal. Acoustic cues that are important for decoding verbal and nonverbal information may thus be weakened or lost. Post-lingually deafened, adult CI users commonly report that speech and voices sound very different to their memory of speech, and it can take some time for users to adapt to the new input (Tyler et al., 1997; Reiss et al., 2007; Moore and Shannon, 2009). In addition to the limited availability of acoustic cues, the placement of the electrode array can further affect the intelligibility of the speech signal (Skinner et al., 2002; Finley et al., 2008). If the array has a relatively shallow insertion into the cochlea, the spectral features of speech may be signaled at more basal places than in normal hearing. Thereby, the speech signal is in effect shifted up in frequency (Dorman et al., 1997; Shannon et al., 1998; Rosen et al., 1999).
Some CI users learn to use their device exceedingly well, and are even able to use the telephone with ease. There is, however, considerable inter-individual variability in outcome, for which the main predicting factors include age of implantation, duration of deafness, and residual speech perception levels before implantation (UKCISG, 2004). Cognitive factors, such as verbal learning and phonological working memory, have also been implicated in implantation outcomes, both in adults (e.g., Heydebrand et al., 2007) and children (e.g., Fagan et al., 2007). However, no currently known set of factors can account for all of the inter-individual variability that is observed clinically.
The adaptation to the novel stimulation from a CI is mediated by plasticity in the ascending auditory pathway (Fallon et al., 2008) and the cortex. Functional imaging of CI users using positron emission tomography (PET) has identified neural correlates of post-implant sound and speech perception in primary and secondary auditory cortex, prefrontal and parietal cortex, and visual cortex (Wong et al., 1999; Giraud et al., 2000; Giraud et al., 2001; Giraud and Truy, 2002; Green et al., 2005). Variation in speech perception is associated with activity in temporal and pre-frontal cortex (Mortensen et al., 2006; Lee et al., 2007). Technical limitations of fMRI with implant devices, and radiation exposure limits with PET, have prevented the imaging of neural changes associated with initial perceptual and linguistic processing of the novel-sounding CI input. The current study used fMRI and a simulation of the spectral resolution and spectral shifting of a cochlear implant. Its aim was to identify cortical changes in naïve, hearing listeners as they learnt to understand this novel input over the course of a training session, and to relate these changes to their capacity in phonological working memory.
Noise-vocoding was used to simulate the spectral resolution and spectral shifting of a cochlear implant (Fig. 1). The number of spectral channels and degree of shift of the simulated CI stimulation was set to a level that was sufficiently difficult to understand initially, yet allowed learning to occur on a time scale that can be tracked in a functional imaging study (i.e., minutes, rather than seconds or hours). A condition in which the spectral information in the speech signal was inverted such that low frequencies in the speech input became high and vice versa, served as a control for learnability as well as for the overall acoustic properties of the stimuli. In order to examine the relationship between intelligibility-related neural activity and memory systems, a battery of phonological working memory and vocabulary tests was administered prior to scanning (see Supplementary materials).
Twenty right-handed native speakers of English (10 male, mean age 25 years, range 19-31 years) participated in the experiment, and a further five (3 male, mean age 24, range 22-25) took part in a pretest. None reported having a history of hearing disorder or neurological illness, taking medication, or having prior experience with CI simulations. All volunteers gave informed written consent and were paid for their participation. The study was approved by the University College London Department of Psychology Ethics Committee.
Stimulus materials were created from recordings of sentence lists (Bench et al., 1979) which comprise 336 syntactically and semantically simple sentences. Recordings were made in a sound-damped room by a male native English speaker, recorded to MiniDV tape (Brüel & Kjaer 4165 microphone, digitized at a 48 kHz sampling rate with 16-bit quantization) and edited using Final Cut Pro (Apple Inc., Cupertino, CA) and Matlab (Mathworks Inc., Natick, MA) software.
The sentences for the learnable condition were individually manipulated using noise-vocoding (Shannon et al., 1995) and spectral shifting (Dorman et al., 1997; Shannon et al., 1998; Rosen et al., 1999). These techniques simulate two critical aspects of how stimulation produced by a cochlear implant may differ from that of normal hearing: respectively, a coarse spectral resolution resulting from the limited number of effective frequency channels, and a misalignment of the frequencies delivered to the implant's electrode array with the tonotopy of the basilar membrane. Noise-vocoding involves dividing the frequency spectrum into analysis bands, extracting the amplitude envelope from each band, and multiplying the envelope with a noise-excited carrier band whose center frequency and cut-offs are matched to its respective analysis band. The amplitude-modulated carrier bands are then added together. Spectral shifting additionally alters the cut-off frequencies of the carrier bands by a factor that reflects a given misalignment of cochlear place according to Greenwood's frequency–position function (Greenwood, 1990). Our manipulations simulated eight effective frequency channels, and a basalward shift of 4.8 mm from the apex of the basilar membrane. The filter cut-off frequencies that were used for the analysis- and noise-bands are shown in Table 1; otherwise the signal processing procedure followed Rosen et al. (1999). Stimuli for the ‘inverted’ control condition were processed identically, except that the mapping from analysis bands to carrier bands was inverted in the frequency domain; that is, the amplitude envelope from the lowest analysis band was mapped to the highest carrier band, the second lowest analysis band mapped to the second highest carrier band, etc. This produced stimuli with acoustic characteristics that were highly matched to the ‘learnable’ condition, but were unintelligible and could not be understood even with training.
Examples of the learnable and inverted stimuli are available as supplementary materials.
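The relationship between analysis bands, shifted carrier bands, and the inverted mapping can be sketched in a few lines of Python. This is only an illustration, not the processing code used in the study. The constants are the commonly cited human parameters of Greenwood's function; the 100-5000 Hz analysis range and the equal spacing of bands in cochlear distance are assumptions made for the example (the actual cut-offs are those in Table 1), and all function names are hypothetical.

```python
import math

# Greenwood (1990) frequency-position function for the human cochlea:
# F = A * (10 ** (a * x) - k), with x the distance from the apex in mm.
A, a, K = 165.4, 0.06, 0.88

def place_to_freq(x_mm):
    """Characteristic frequency (Hz) at x_mm from the apex."""
    return A * (10 ** (a * x_mm) - K)

def freq_to_place(f_hz):
    """Inverse mapping: cochlear place (mm from the apex) for a frequency."""
    return math.log10(f_hz / A + K) / a

def band_edges(f_low, f_high, n_bands, shift_mm=0.0):
    """Cut-offs (Hz) of n_bands equally spaced in cochlear distance.

    A positive shift_mm moves every edge toward the base (higher frequency).
    """
    x_low, x_high = freq_to_place(f_low), freq_to_place(f_high)
    step = (x_high - x_low) / n_bands
    return [place_to_freq(x_low + i * step + shift_mm)
            for i in range(n_bands + 1)]

# ASSUMPTION: the 100-5000 Hz range is illustrative, not taken from Table 1.
analysis = band_edges(100.0, 5000.0, 8)               # analysis-band cut-offs
carrier = band_edges(100.0, 5000.0, 8, shift_mm=4.8)  # shifted carrier cut-offs

# Envelope-to-carrier mapping: identity for 'learnable', reversed for 'inverted'.
learnable_map = list(range(8))
inverted_map = learnable_map[::-1]  # lowest analysis band -> highest carrier band
```

In a full vocoder, the envelope extracted from analysis band `i` would modulate band-limited noise in carrier band `learnable_map[i]` or `inverted_map[i]` before the bands are summed.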
A behavioral pilot study was conducted to ensure that the spectrally inverted stimuli were indeed unintelligible and could not be understood after training. The design and procedure of the pretest were identical to those of the fMRI experiment, except that instead of collecting verbal responses during test phases, listeners were asked to type on a computer keyboard what they heard after the first presentation of each stimulus; after this they saw the sentence presented on the screen. The responses to ‘vocoded’ and ‘vocoded-inverted’ stimuli were binned in four blocks of 25 and scored in terms of the percentage of correctly reported key words. In the ‘vocoded’ condition, the average scores from blocks one to four were 39.3, 47.0, 62.8, and 65.2, representing an average improvement of about 26 percentage points. The corresponding scores in the ‘vocoded-inverted’ condition were 1.0, 0.3, 1.3, and 1.0, indicating that, as expected, listeners were unable to adjust to the spectral inversion manipulation within the context and time frame of this experiment.
The training materials comprised 100 vocoded-shifted and 100 vocoded-inverted sentences. On each trial, subjects would first hear a sentence while the instruction “Listen” was displayed on the screen. This was then followed by a second presentation at which, simultaneously, a written version of the sentence was shown on-screen. The mean inter-stimulus interval between these ‘listening’ and ‘listen + feedback’ trials was 5 sec (jittered up to ±1 sec). The order of learnable and inverted trials was pseudo-randomized such that not more than three trials of one type could occur in a row. The training session was broken up into four blocks of 50 sentences (25 vocoded and 25 vocoded-inverted in each, see Fig. 2). In between the training blocks, and at the beginning and end of the experiment, there were five test phases which consisted of ten sentences each. During a test phase, subjects were asked to repeat aloud what they could understand after having heard a vocoded sentence once, while the instruction “Repeat” was displayed on-screen. These verbal responses were recorded and scored off-line as the percentage of key words that were repeated correctly in each test phase. For each individual subject, the items that made up the ‘learnable’, ‘inverted’, and ‘test’ conditions were drawn at random from the corpus of 336 sentences, such that a particular sentence would not occur more than once in the experiment. A repeated-measures analysis of variance on the percentage of correct keywords, with the factor ‘test phase’ and a planned t-contrast comparing test phase 1 with test phase 5, was used to test for a change in intelligibility over the course of the experiment.
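The pseudo-randomization constraint and the sampling of sentences without replacement can be illustrated with a short sketch. It assumes simple rejection sampling (reshuffle until no run exceeds three trials). The source does not describe how the constraint was actually implemented, and the function names and random seed are hypothetical.

```python
import random

def max_run(seq):
    """Length of the longest run of identical consecutive items."""
    longest = run = 1
    for prev, cur in zip(seq, seq[1:]):
        run = run + 1 if cur == prev else 1
        longest = max(longest, run)
    return longest

def make_block(n_per_type=25, limit=3, rng=random):
    """Shuffle one training block until no trial type repeats more than `limit` times."""
    trials = ["learnable"] * n_per_type + ["inverted"] * n_per_type
    while True:
        rng.shuffle(trials)
        if max_run(trials) <= limit:
            return list(trials)

rng = random.Random(0)  # fixed seed only so the example is reproducible

# 200 training + 50 test sentences drawn without replacement from the 336-item corpus.
sentence_ids = rng.sample(range(336), 250)
test_items, training_items = sentence_ids[:50], sentence_ids[50:]

# Four training blocks of 50 trials each (25 learnable, 25 inverted).
blocks = [make_block(rng=rng) for _ in range(4)]
```

Rejection sampling keeps every admissible ordering equally likely, at the cost of a few dozen reshuffles per block on average.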
Stimuli were delivered using Matlab with the Psychophysics Toolbox extension (Brainard, 1997) and responses recorded in Audacity (audacity.sourceforge.net). Subjects wore a headset with electrodynamic headphones and an optical microphone (MR Confon GmbH, Magdeburg, Germany). Whole-brain functional and structural MRI data were acquired on a Siemens Avanto 1.5 Tesla scanner (Siemens AG, Erlangen, Germany) with a 12-channel birdcage head coil. A gradient-echo echo-planar imaging sequence was used for the functional scans (TR = 3 s, TE = 50 ms, flip angle = 90°, isotropic voxel size = 3 × 3 × 3 mm, 35 axial slices, 805 volumes; total duration = 40.3 min). A T1-weighted anatomical scan was acquired after the functional run (HiRes MP-RAGE, voxel size = 1 × 1 × 1 mm, 160 sagittal slices).
MRI data were analyzed using SPM5 (Wellcome Department of Imaging Neuroscience, London, UK) with the MarsBaR extension for region-of-interest analyses (Brett et al., 2002). Functional MRI volumes that had been acquired during the test phases were discarded. The remaining 668 fMRI volumes were re-aligned, slice-timing corrected, co-registered with the structural scan, segmented, normalized to a standard stereotactic space (Montreal Neurological Institute, MNI) on the basis of the segmentation parameters, and smoothed with an isotropic Gaussian kernel of 6 mm full-width at half-maximum. Statistical whole-brain analyses were conducted in the context of the general linear model, and included four effects of interest at the single-subject level (learnable vs inverted, each under listening and listening + feedback conditions). Event-related hemodynamic responses for each event type were modeled as a canonical hemodynamic (gamma type) response function.
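To make the event-related modeling step concrete, the sketch below builds one regressor by convolving event onsets with a double-gamma response function and sampling the result once per volume (TR = 3 s). It is a simplified illustration and not SPM5's implementation: the shape parameters (6 for the peak, 16 for the undershoot, ratio 1/6) are commonly used defaults assumed here, and the function names and onsets are hypothetical. The last lines convert the 6 mm FWHM smoothing kernel to a Gaussian standard deviation.

```python
import math

TR = 3.0  # repetition time in seconds, from the acquisition parameters

def gamma_pdf(t, shape, scale=1.0):
    """Gamma probability density; zero for t <= 0."""
    if t <= 0:
        return 0.0
    return (t ** (shape - 1) * math.exp(-t / scale)) / (math.gamma(shape) * scale ** shape)

def double_gamma_hrf(t):
    """ASSUMPTION: peak gamma (shape 6) minus a scaled undershoot gamma (shape 16)."""
    return gamma_pdf(t, 6.0) - gamma_pdf(t, 16.0) / 6.0

def build_regressor(onsets_s, n_volumes, dt=0.1, hrf_length_s=32.0):
    """Convolve event onsets with the HRF on a fine grid, then sample at each volume."""
    n_fine = int(round(n_volumes * TR / dt))
    kernel = [double_gamma_hrf(i * dt) for i in range(int(hrf_length_s / dt))]
    signal = [0.0] * n_fine
    for onset in onsets_s:
        start = int(round(onset / dt))
        for j, value in enumerate(kernel):
            if 0 <= start + j < n_fine:
                signal[start + j] += value
    return [signal[int(round(v * TR / dt))] for v in range(n_volumes)]

# Hypothetical onsets (seconds) for one event type over a short run of 20 volumes.
regressor = build_regressor([0.0, 30.0], n_volumes=20)

# Gaussian smoothing: sigma = FWHM / (2 * sqrt(2 * ln 2)).
fwhm_mm = 6.0
sigma_mm = fwhm_mm / (2.0 * math.sqrt(2.0 * math.log(2.0)))  # roughly 2.55 mm
```

One such regressor per event type (learnable or inverted, under listening or listening + feedback) would form the columns of interest in a single-subject design matrix.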
We first compared the hemodynamic response elicited by listening to learnable cochlear-implant simulations to the response elicited by listening to spectrally-inverted control stimuli. A random-effects group analysis was used to compare the learnable and inverted conditions on ‘listening’ trials. This included a covariate for the mean repetition test scores from each subject, averaged across the session. The statistical threshold was set at p < 0.001 at the voxel level (uncorrected for multiple comparisons) with a cluster extent threshold k = 30 voxels. This analysis revealed greater activation for learnable stimuli in the left superior temporal sulcus (STS) and the left inferior frontal gyrus (IFG; Table 2). The average percentage of signal change across the session was obtained for each condition and for each subject in the two clusters of significantly activated voxels. These regionally-averaged percentages of signal change confirmed that the overall effect of learnability was present at both sites, both in the trials where participants initially listened to a stimulus and in the trials where they received simultaneous written feedback during stimulus presentation (Fig. 3).
Behavioral data collected over five test blocks indicated that participants improved significantly in the identification of words within sentences in the learnable condition, albeit with considerable variability of scores over time (Fig. 4). While nearly all participants repeated fewer than ten percent of keywords correctly in the first test phase, some improved considerably (by more than 60%) over the course of the experiment, while others barely improved at all (by less than 5%). The neural correlates of this variability were investigated as follows: Voxels showing inter-individual variation in the effect of learnability were defined by masking inclusively (at a height threshold of p < 0.001) the effect of overall learnability and the effect of a covariate coding the improved intelligibility over the course of the experiment for each subject (calculated as the difference between the first and last test phase). This revealed an area on the inferior frontal gyrus in which the neural learnability effect, that is, the difference between the responses to the learnable and inverted conditions, correlated significantly with individual participants' behavioral learning scores. From these voxels, regionally-averaged percentages of signal change were obtained as an index of effect size in each condition for each subject (Fig. 5).
The analysis was repeated with a predictor that, instead of intelligibility, coded individual composite working memory scores, measured using a battery of phonological working memory tests (see Supplementary Materials). This, again, revealed an area of the inferior frontal gyrus in which the overall neural effect of learnability correlated positively with individual working memory scores (Fig. 5). This area partially overlapped with the one identified on the basis of intelligibility. In contrast, neither the intelligibility scores nor the phonological working memory scores correlated significantly with activation in the superior temporal lobe.
In order to identify brain areas in which activity changed over time as a function of learning, a covariate of interest representing the individual learning curve was included at the single-subject level, in addition to the four experimental conditions. The learning curve was derived by cubic interpolation of the five behavioral test scores over the course of the experiment. A random-effects group analysis (p < 0.001 [unc.]; k = 30) of this covariate revealed significant activations in two regions in the inferior parietal lobe, specifically the left supramarginal gyrus (SMG) and the left angular gyrus (AG), in which the fMRI signal intensity correlated with the participants' intelligibility scores over the course of the experiment (Fig. 6). For these two regions we again obtained the percentages of signal change in each condition and for each subject (Fig. 6). A repeated-measures ANOVA with the factors area, trial type, and learnability showed that there was no significant difference in these regions between the learnable and inverted stimuli, during either the listening or listening + feedback trials, although there was a trend for the learnable stimuli to activate the angular gyrus more, over time, during the listening + feedback trials (p = 0.07).
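The construction of a learning-curve covariate from five test scores can be sketched as follows. The source does not specify which cubic scheme was used, so this example assumes a piecewise cubic Hermite interpolant with finite-difference slopes. The volume indices at which the test phases fall and the scores themselves are hypothetical values chosen for illustration.

```python
def hermite_interp(xs, ys, x):
    """Piecewise cubic Hermite interpolation through (xs, ys), evaluated at x.

    Slopes at the knots are finite differences (one-sided at the two ends).
    """
    n = len(xs)

    def slope(i):
        lo, hi = max(i - 1, 0), min(i + 1, n - 1)
        return (ys[hi] - ys[lo]) / (xs[hi] - xs[lo])

    # Find the interval [xs[i], xs[i + 1]] that contains x.
    i = 0
    while i < n - 2 and x > xs[i + 1]:
        i += 1
    h = xs[i + 1] - xs[i]
    t = (x - xs[i]) / h
    # Cubic Hermite basis functions.
    h00 = 2 * t**3 - 3 * t**2 + 1
    h10 = t**3 - 2 * t**2 + t
    h01 = -2 * t**3 + 3 * t**2
    h11 = t**3 - t**2
    return (h00 * ys[i] + h10 * h * slope(i)
            + h01 * ys[i + 1] + h11 * h * slope(i + 1))

# HYPOTHETICAL data: positions of the five test phases on the axis of the
# 668 retained volumes, and one subject's percent-correct keyword scores.
test_volumes = [0, 167, 334, 501, 667]
test_scores = [5.0, 20.0, 32.0, 40.0, 45.0]

# One covariate value per retained volume.
learning_curve = [hermite_interp(test_volumes, test_scores, v) for v in range(668)]
```

The resulting vector has one entry per volume and passes exactly through the five measured scores, so it can be entered alongside the condition regressors as a parametric covariate.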
Patterns of functional connectivity, as indexed by correlations of the average signal strength over time in a particular region, were investigated between the two parietal regions and the regions in left IFG and STS that had been identified on the basis of the effect of overall learnability. This was done by calculating the correlations in regional signal change between each pair of the four regions, separately for the learnable and inverted conditions (two-tailed Pearson's product-moment correlations; alpha level set to 0.004 to correct for multiple comparisons). This analysis showed significant correlations of responses between STS and IFG, and between AG and SMG, for both learnable and inverted conditions. There was also a significant correlation between responses in IFG and AG in the learnable condition only. No other connections between the four regions were significantly correlated (Fig. 6B).
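The corrected alpha level is consistent with a Bonferroni adjustment over all region pairs and both conditions, although the source does not name the correction method. The sketch below works through that arithmetic under this assumption and includes a plain Pearson correlation; the signal-change values are hypothetical.

```python
from itertools import combinations
import math

regions = ["IFG", "STS", "AG", "SMG"]
conditions = ["learnable", "inverted"]

# Six unordered region pairs, each tested in two conditions: 12 comparisons.
pairs = list(combinations(regions, 2))
n_tests = len(pairs) * len(conditions)

# ASSUMPTION: Bonferroni correction of a nominal alpha of 0.05.
alpha_corrected = 0.05 / n_tests  # about 0.0042, i.e. 0.004 when rounded

def pearson_r(x, y):
    """Pearson product-moment correlation of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

# HYPOTHETICAL percent-signal-change values for two regions in one condition.
ifg = [0.1, 0.3, 0.2, 0.5, 0.4]
ag = [0.2, 0.4, 0.3, 0.6, 0.5]
r_ifg_ag = pearson_r(ifg, ag)
```

Each of the 12 correlations would then be judged significant only if its two-tailed p-value fell below `alpha_corrected`.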
A left-lateralized system in IFG and STS was sensitive to the learnability of the cochlear implant simulations, showing greater activity when listening to learnable sentences than when listening to spectrally inverted sentences. The STS has consistently been implicated in the processing and representation of intelligible speech, including in studies that used noise-vocoding as a way of manipulating the intelligibility of speech (Scott et al., 2000; Davis and Johnsrude, 2003; Scott et al., 2006; Obleser et al., 2007a; Obleser et al., 2008), consequent to acoustic-phonetic processing in the superior temporal gyrus (STG) (Jacquemot et al., 2003; Obleser et al., 2007b; Obleser and Eisner, 2009). Many of these studies have revealed a sensitivity in the left and right STS to the number of channels in the vocoded speech signal (Davis and Johnsrude, 2003; Scott et al., 2006; Obleser et al., 2008). Consistent with this work, we propose that in the present study, the left STS is responding to the speech characteristics that are preserved in the cochlear implant simulation.
In contrast to STS, the left IFG has typically been implicated in higher-order language processes, and activity in this area has been described in a number of previous functional imaging studies of passive speech perception (Davis and Johnsrude, 2003; Friederici et al., 2003; Rodd et al., 2005; Obleser et al., 2007a). For example, activation has been observed in the left IFG when participants heard sentences containing ambiguous words (Hoenig and Scheef, 2009), during aspects of syntax processing (Tettamanti et al., 2009), and when the semantic predictability of a sentence supported comprehension (Obleser et al., 2007a). Left IFG has been proposed to act as a unification space, integrating linguistic information across the phonological, semantic, and syntactic levels (Hagoort, 2005). Prefrontal regions receive projections from both the anterior and posterior auditory streams of processing (Romanski et al., 1999; Scott and Johnsrude, 2003; Rauschecker and Scott, 2009). In the present study, the functionality of the left IFG could thus include the specific use of the simultaneous written feedback to enhance comprehension of the speech, or more generally in the use of linguistic knowledge —lexical, syntactic, or contextual— to support comprehension.
The training paradigm of this study aimed to elicit relatively rapid perceptual learning of the type that may be targeted in computer-based rehabilitation programs (Fu and Galvin, 2007; Stacey and Summerfield, 2007). The profile of behaviorally-measured intelligibility effects (Fig. 4) shows that the biggest differences in intelligibility occurred over the first trials, and performance continued to improve more slowly across the rest of the session. This pattern is consistent with other auditory learning paradigms, which have found that the biggest improvements in performance often occur at the start of training (Wright and Fitzgerald, 2001; Hawkey et al., 2004). The improvement in the current study was far from reaching ceiling, and previous behavioral training studies suggest that learning would likely continue with further training sessions (Rosen et al., 1999; Fu and Galvin, 2007; Stacey and Summerfield, 2007). Fitting the individual learning profiles to the neural activity revealed two regions in left SMG and AG in which intelligibility-related change was correlated with change in activation over the course of the training session. This suggests that, alongside the activation in left STS and IFG —which was associated with increases in the intelligibility of spectrally degraded speech— neural activity in the left inferior parietal lobe underlies the behavioral adaptation to the stimuli. In AG, the time-dependent effect was broadly modulated by intelligibility during the perception trials in which written feedback was provided. In contrast, activity in the SMG did not differ between the learnable and inverted trials, nor between trials in which the subjects listened to the stimuli and those in which written feedback was presented simultaneously with the CI simulation. 
This may imply that the AG is implicated in the specific use of other linguistic information to support learning, while the SMG may be more generally sensitive to time-dependent exposure to the stimuli, rather than to the linguistic content.
The increase in speech intelligibility over time suggests that the effects of learnability and adaptation should be related. There was indeed a significant correlation in activation between the AG and the IFG for only the learnable trials, which may be subserved by a bidirectional anatomical connection between these regions via the superior longitudinal fasciculus (Frey et al., 2008). Increased functional connectivity between the left IFG and angular gyrus has been demonstrated when context helps the comprehension of noise vocoded speech (Obleser et al., 2007a), and when subjects are making overt semantic decisions about noise vocoded speech (Sharp et al., in press). Both regions have been implicated in other learning processes in speech perception, such as learning to perceive a non-native phonemic contrast (Golestani and Zatorre, 2004), which supports the possibility that they are part of a more generalized learning mechanism which not only applies to spectrally-degraded speech, but also to other listening situations where perceptual processing is effortful. We suggest that in the present study, the functional connectivity between the left IFG and AG may have a specific role in the task of mapping between the written sentence information and the heard CI simulation when the simulation is learnable.
An extensive network of brain regions thus underlies adaptation to a CI simulation. Among recipients of CIs there is considerable variation in outcome, and this variation can only in part be explained by known predictive factors such as the pre-implantation level of residual hearing. CI users' neural capacity for adaptation may be facilitated by cognitive functions which are not primarily part of the central auditory system (Moore and Shannon, 2009). In our study, we observed a wide range of scores in participants' comprehension of CI simulations, which were correlated with activity in the left IFG. In contrast to the IFG, the response in STS did not vary with individual differences in intelligibility. This pattern of results is consistent with claims that the basic speech perception system, as represented by the activity in the STS, works to its fullest extent with what it can process from the incoming signal, and is not modulated by feedback projections from high-order language areas (Norris et al., 2000). The lack of association between activity in STS and individual differences in learning, in contrast with the strong association between comprehension and left IFG, suggests that variation in successful processing of CI simulations can depend on high-level, linguistic and cognitive factors that go beyond relatively early, acoustic-phonetic processes.
One candidate higher-order cognitive factor that has been implicated in the successful use of CIs is phonological working memory (pWM). Several behavioral studies have found that phonological working memory scores are positively correlated with successful perception of speech following cochlear implantation in children (Pisoni and Geers, 2000; Cleary et al., 2001; Pisoni and Cleary, 2003; Dillon et al., 2004). More generally, developmental language disorders such as specific language impairment or dyslexia often involve a deficit in working memory (Bishop, 2006; Ramus and Szenkovits, 2008). For the participants in the current study, a composite measure of phonological working memory correlated with activity in the left IFG. Functional imaging studies have outlined a network of brain areas underlying phonological working memory, including the inferior parietal lobe and the inferior frontal gyrus (Buchsbaum and D'Esposito, 2008), and one study specifically linked IFG activation in a pWM task with the encoding, maintenance, and response elements of pWM (Strand et al., 2008). The current study shows a neural link between variation in pWM capacity and variation in speech intelligibility, which thus represents a potential functional anatomical basis for the variation that is observed in the responsiveness to post-implant rehabilitation.
It is possible that the use of sentences as stimuli has emphasized individual differences in higher-order language processing (Hervais-Adelman et al., 2008), and this may interact with pWM processes. Neuropsychological studies have reported patients who had deficits in pWM and who made errors processing phonemes in sentences, but not in isolated words (Jacquemot et al., 2006). This should be tested with further studies employing adaptation to single-word or sub-lexical stimuli.
Our results suggest that individual variation in the comprehension of a cochlear implant simulation is at least in part determined by differences in the employment of higher-order language processes to help decode the speech sequences, and functional connectivity between the frontal and parietal lobes, rather than differences in the quality of acoustic-phonetic processing or representations in the dorso-lateral temporal lobes. Furthermore, the results show that variation in phonological working memory scores shares an anatomical location in left IFG with individual variability in the learning of the CI simulations. Problems with speech and language processing commonly co-occur with problems in phonological working memory tasks. We suggest that one of the linguistic properties of the left IFG is to act as an interface for the interaction of speech perception and phonological working memory when processing spoken sentences, and that activation differences in this area across individuals are associated with differences in the successful adaptation to a CI simulation.
FE, CM, and SKS were funded by Wellcome Trust grant WT074414MA. The authors thank Joe Devlin, Rob Leech, Fred Dick, and Marty Sereno at the Birkbeck-UCL Centre for Neuroimaging for technical advice, and gratefully acknowledge helpful comments on the manuscript by D. Sauter and two anonymous reviewers.