Cochlear implants are a standard therapy for deafness, yet the ability of implanted patients to understand speech varies widely. To better understand this variability in outcomes, we used functional near-infrared spectroscopy (fNIRS) to image activity within regions of the auditory cortex and compare the results to behavioral measures of speech perception.
We studied 32 deaf adults hearing through cochlear implants and 35 normal-hearing controls. We used fNIRS to measure responses within the lateral temporal lobe and the superior temporal gyrus to speech stimuli of varying intelligibility. The speech stimuli included normal speech, channelized speech (vocoded into 20 frequency bands), and scrambled speech (the 20 frequency bands were shuffled in random order). We also used environmental sounds as a control stimulus. Behavioral measures consisted of the Speech Reception Threshold, CNC words, and AzBio Sentence tests measured in quiet.
Both control and implanted participants with good speech perception exhibited greater cortical activations to natural speech than to unintelligible speech. In contrast, implanted participants with poor speech perception had large, indistinguishable cortical activations to all stimuli. The ratio of cortical activation to normal speech versus scrambled speech correlated directly with the CNC Words and AzBio Sentences scores. This pattern of cortical activation was not correlated with auditory threshold, age, side of implantation, or time after implantation. Turning off the implant reduced cortical activations in all implanted participants.
Together, these data indicate that the responses we measured within the lateral temporal lobe and the superior temporal gyrus correlate with behavioral measures of speech perception, demonstrating a neural basis for the variability in speech understanding outcomes after cochlear implantation.
Cochlear implants (CIs) are commonly used to treat severe-to-profound sensorineural hearing loss when hearing aids are ineffective in helping a patient understand speech. Modern CIs work by splitting the sound frequency spectrum into 12-22 discrete channels, each electrically stimulating a different region of the cochlea (Clark 2015). Since auditory neurons are distributed along the tonotopic gradient of the cochlea, sounds of different frequencies thus stimulate different auditory nerve fibers. This is in contrast to normal hearing, in which ~4000 inner hair cells convert sound pressure waves into electrical signals. Nevertheless, most patients demonstrate large improvements in speech understanding after cochlear implantation (Gfeller et al. 2002; Lu et al. 2014). While speech understanding increases with time, it tends to stabilize by 6-12 months after implantation (Holden et al. 2013), with the ultimate level of benefit remaining highly variable and difficult to predict (Gates et al. 1995; Perez and Macias 2004). In most cases, the reason a given patient has a poor speech perception outcome after cochlear implantation is unknown. The presumption is that the implant is not able to effectively convey the temporal and frequency characteristics of speech to the auditory nerve.
Speech perception occurs within and beyond the auditory cortex, and thus neuroimaging could provide a way to supplement behavioral measures in assessing the ability of the speech information provided by the CI to correctly reach and stimulate the language centers of the brain (Pasley et al. 2012; Steinschneider et al. 2014). In addition, neuroimaging may provide a more immediate measure than behavioral testing can. The lateral temporal lobe and the superior temporal gyrus (LTL/STG) are probably the most clinically relevant regions of the cortex to image in order to study speech perception in individuals hearing through CIs. These core areas process acoustic parameters such as pitch, tone, and spatiotemporal fluctuation (Hall and Plack 2009; Humphries et al. 2010; Lakatos et al. 2013). More recently, these areas have been viewed as a more general-purpose acoustic problem solver, with neural tuning to complex sound patterns such as speech (Belin et al. 2002; Belin et al. 2004; Fecteau et al. 2004; Mesgarani et al. 2014). The LTL/STG show selective responses to species-specific vocalizations in humans and other mammals, such as the marmoset (Wang 2000; Belin et al. 2002). Recent studies utilizing fMRI and implanted recording electrodes suggest that phonemes, words, and phrases elicit localized responses within the human LTL/STG (DeWitt and Rauschecker 2012; Grodzinsky and Nelken 2014; Mesgarani et al. 2014). Similarly, fMRI has demonstrated rapid neural adaptations in normal-hearing participants exposed to degraded sound similar to what a cochlear implant user experiences (Smalt et al. 2013).
However, commonly-used neuroimaging techniques do have downsides when used in implanted individuals. While fMRI is the most common technique to measure human brain function, the ferromagnetic components in a CI are not compatible with the strong magnetic fields present in an MRI scanner. PET imaging uses radioactive compounds that are not well-suited for routine clinical use that might require regular repeated testing as the CI program parameters are adjusted. EEG is a feasible and frequently used technique for measuring cortical activations to speech in cochlear implant recipients (Senkowski et al. 2014). Key EEG studies have argued that late auditory evoked potentials provide a useful objective metric of performance in participants hearing through a cochlear implant (Firszt et al. 2002; Zhang et al. 2010; Zhang et al. 2011). However, EEG has the disadvantage that CIs produce electrical noise that can interfere with recordings when long-duration speech stimuli are used, although artifact removal techniques can be used to minimize this issue (Viola et al. 2012; Mc Laughlin et al. 2013; Miller and Zhang 2014).
In contrast, functional near-infrared spectroscopy (fNIRS) is amenable to neuroimaging in this patient population because it is CI-compatible, safe for repeated testing sessions, and small enough to use in a standard clinic. The technique is based upon the differential absorption of near-infrared light by oxyhemoglobin (HbO) and deoxyhemoglobin (HbR) to detect hemodynamic changes in biological tissues. In the brain, this corresponds with neural activity (Scholkmann et al. 2014). fNIRS has a long history of use in basic science research but is not widely used clinically (Villringer et al. 1993; Zaramella et al. 2001; Bortfeld et al. 2007; Mahmoudzadeh et al. 2013; Lloyd-Fox et al. 2014). Previously, our group demonstrated that fNIRS can detect speech-evoked activation of the LTL/STG, a core region of the auditory cortex in normal hearing and implanted participants. In addition, we found that adults with normal hearing exhibit greater cortical activation to speech than non-speech sounds (Sevy et al. 2010; Pollonini et al. 2014). Here, we expand this approach in participants with CIs to explore the use of fNIRS as an objective measure of speech perception.
The study protocol was approved by the Stanford University IRB; all participants signed an informed consent form prior to participation. Criteria for inclusion and exclusion of participants were established prospectively; criteria for inclusion were age over 18, English fluency, and a functioning CI post-activation. Participants were excluded from the study if they were not fluent in English or had a non-functional CI. Normal-hearing controls were healthy adult volunteers who passed a 30 dB hearing level screening test at 500, 1000, and 2000 Hz. In our block design experiment, the participant was exposed to five trials of each sound stimulus. Prior to beginning the experiment, it was hypothesized that cortical hemodynamic response areas among adults with CIs would correlate with speech perception scores.
The study participants consisted of two cohorts (Table 1). One cohort consisted of 32 post-lingually deafened adults with CIs, ranging in age from 23 to 86 years, with time from CI activation ranging between 1 day and 12 years. The second cohort consisted of 35 adults with normal hearing, ranging in age from 24 to 65 years. CIs from all three FDA-approved brands (Cochlear, Advanced Bionics, and Med-El) were represented in the participants tested. Most adults had only one implant (30/32; evenly split between right and left ears), whereas 2 adults had bilateral implants. Participants who were unilaterally implanted all had pure tone averages and speech reception thresholds in the contralateral ear that were >60 dB HL prior to their implantation surgery. No participant had been implanted for unilateral sensorineural hearing loss.
Each participant hearing through a cochlear implant underwent behavioral speech perception testing (Table 2). All behavioral measurements used in this study were obtained on or near the day of fNIRS testing (within three months) by a licensed audiologist at the participant's usual clinic appointment. Stimuli were presented in a sound field and participants with hearing aids in the unimplanted ear removed their hearing aid prior to the experiment. Participants with CIs were first exposed to speech stimuli with the device on and then the experiment was repeated with the device off (i.e. removed from the side of the head).
The speech recognition threshold (SRT) was measured to assess hearing level (American Speech-Language-Hearing Association 1988). Post-implantation SRTs ranged from 10-35 dB HL, with all but one of the participants being in the range of <30 dB HL, corresponding to mild or no hearing loss (American Speech-Language-Hearing Association 1988). Speech perception was measured using monosyllabic consonant-nucleus-consonant word (CNC Words) scores and AzBio sentence recognition scores (Peterson and Lehiste 1962; Spahr et al. 2012a). These stimuli came from audio recordings and were presented in silence at 60 dB SPL. Many participants had both the CNC Words and AzBio tests performed, whereas some had only one or the other performed.
Four types of speech stimuli were presented to all participants. Stimuli were generated in MATLAB (R2013A, the Mathworks) and the volume of each stimulus was set to a comfortable listening level of 60 dB SPL (Fig. 1). Normal speech consisted of digital recordings of a male voice narrating an audiobook version of The War of the Worlds (by H.G. Wells) in English, which were digitally edited into 20 s sequential segments (Audio File S1). The second stimulus was channelized speech, in which the frequency spectrum was broken into 20 logarithmically spaced channels between 250-8,000 Hz (the typical range processed by a CI), creating speech that was intelligible but lacked the fine detail of natural speech (Audio File S2), i.e. vocoded speech (Shannon et al. 1995). The envelope was extracted for each frequency band using the short time FFT ‘spectrogram’ function in MATLAB, with a Hamming window of 31 ms and a 45% overlap between windows. White noise that was band-pass filtered to the frequency range of each channel was then modulated by the envelope signal. The third stimulus was scrambled speech, which was the same as channelized speech except that the spectral envelope for each channel was randomly assigned to a different channel. While the frequency range, total energy content, and timing of the speech remained unchanged from channelized speech, the stimulus was not intelligible to normal-hearing participants (Audio File S3). The fourth stimulus was environmental sounds and was designed to serve as a non-speech control. It consisted of 10 two-second clips of animal vocalizations and construction sounds that were spliced together to make a 20 second stimulus (Audio File S4). Each of the five trials consisted of random permutations of these environmental sounds. The environmental sounds naturally varied in frequency spectrum from speech, but were energy-matched to the speech stimuli.
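The channelized and scrambled stimuli described above are variants of a noise vocoder. The sketch below (Python with NumPy/SciPy rather than the paper's MATLAB pipeline) illustrates the idea; it substitutes a Hilbert envelope for the paper's 31 ms short-time FFT envelope extraction, and the function name and filter orders are illustrative, not the authors' code.

```python
import numpy as np
from scipy.signal import butter, sosfiltfilt, hilbert

def vocode(speech, fs, n_channels=20, f_lo=250.0, f_hi=8000.0,
           scramble=False, rng=None):
    """Noise-vocode `speech` into logarithmically spaced channels.

    Channelized speech keeps each band's envelope in its own band;
    scrambled speech reassigns the envelopes to randomly chosen bands.
    """
    edges = np.logspace(np.log10(f_lo), np.log10(f_hi), n_channels + 1)
    bands = list(zip(edges[:-1], edges[1:]))
    # Extract the temporal envelope of each frequency band.
    envelopes = []
    for lo, hi in bands:
        sos = butter(4, [lo, hi], btype="bandpass", fs=fs, output="sos")
        envelopes.append(np.abs(hilbert(sosfiltfilt(sos, speech))))
    # Optionally shuffle which band each envelope is assigned to.
    order = np.arange(n_channels)
    if scramble:
        rng = np.random.default_rng() if rng is None else rng
        order = rng.permutation(n_channels)
    # Modulate band-limited white noise with each envelope.
    noise = np.random.default_rng(0).standard_normal(len(speech))
    out = np.zeros(len(speech))
    for env, band_idx in zip(envelopes, order):
        lo, hi = bands[band_idx]
        sos = butter(4, [lo, hi], btype="bandpass", fs=fs, output="sos")
        out += env * sosfiltfilt(sos, noise)
    return out
```

Because scrambling only permutes the envelope-to-band assignment, the frequency range, total energy, and timing of the stimulus are preserved, consistent with the stimulus design above.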
fNIRS testing was performed in a quiet, darkened room equipped with a computer running the data-acquisition and data-processing software. For the duration of the experiment, participants were seated in a chair in front of the computer screen displaying a silent visual stimulus consisting of moving geometric shapes intended to maintain their attention and minimize head movement. Participants were exposed to speech stimuli from speakers placed directly in front of the listener. Five trials of each sound stimulus were presented in a pseudorandom order, alternating with 20-second blocks of silence. Each session took 20-30 minutes to complete. As in the behavioral testing procedures, participants with hearing aids in the unimplanted ear removed their hearing aid prior to the experiment. Participants with CIs were first exposed to speech stimuli with the device on and then the experiment was repeated with the device off (removed from the side of the head). During data collection, the start and end of each auditory stimulus were synchronized with the incoming fNIRS data and recorded in an event file.
We used a NIRScout 1624 (NIRx Medical Technologies, LLC, Glen Head, NY) instrument containing 16 dual-wavelength infrared light sources and 24 detectors. Each illumination optode contained two LEDs emitting near-infrared light at 760 and 850 nm. Backscattered light returning to the scalp's surface was collected and carried back to the instrument by fiber-optic cables and transduced into electrical signals by the photodetectors. Signals at the two wavelengths were separated by sequentially activating the sources. The software that came with the device was used to collect the data from all channels at a rate of 6.25 Hz.
A custom headpiece was created to hold the light sources and detectors (optodes) in place against the participant's scalp. Sources and detectors were alternated to create a checkerboard pattern with 15 mm center-to-center distance between adjacent optodes (Fig. 2). The optodes for each hemisphere were secured in a scaffold of flexible black polypropylene film (0.30 inches thick; part #1451T21 [3-288-1143-20T21], McMaster-Carr, Santa Fe Springs, CA) by rubber O-rings supplied by the instrument's manufacturer. This arrangement yielded two symmetrical optode holders, one for each hemisphere. The optode holders were connected to one another with Velcro straps, making the headpiece adjustable to the participant's head size and shape.
The optode holders were placed against the scalp centered at the T7/T8 position (based on the modified International 10-20 system suggested by the American Electroencephalographic Society), with the second column of optodes being situated directly superior to the tragus and the bottom of the headset sitting in the sulcus between the pinna and the skull (Klem and Lüders 1999). The light from each source was received by neighboring detectors in the array, giving a potential of 96 source-detector pairings (or channels) on each side of the head. In order to optimally probe the cortex, we only analyzed the 31 channels in which the source-detector distance was 3.0-3.3 cm (Sevy et al. 2010; Beauchamp et al. 2011; Pollonini et al. 2014).
With this type of headset localization, our fNIRS setup probes regions within the auditory cortex, specifically the lateral temporal lobe and the superior temporal gyrus (LTL/STG). Previously, we used a mathematical model (Dehghani et al. 2003; Huppert et al. 2006) to estimate the coverage area for the simple optode probe set configuration we were using at that time (Sevy et al. 2010). This supported our prediction that the headset provided excellent coverage of the lateral temporal lobe that extended into the superior temporal gyrus. This prediction was confirmed after fNIRS and fMRI demonstrated similar response patterns within the LTL/STG to auditory stimuli. We then validated the expanded fNIRS headset we use here by comparing measurements made using fNIRS and fMRI, and finding a similar pattern of responses within the LTL/STG to the various stimuli (Pollonini et al. 2014).
The analysis of the fNIRS data was performed in a similar fashion to our previous publication (Pollonini et al. 2014). In essence, this involved removing channels from the analysis that did not make good contact with the scalp, removing motion artifacts, bandpass filtering to the time course of the stimuli, and the calculation of changes in oxygenated and deoxygenated hemoglobin using the modified Beer-Lambert Law. We then performed statistical analysis to compare the change in hemoglobin over time to the predicted response. Statistically significant responses were then interpolated over a 2D grid to quantify the area of activation to each stimulus. These steps are detailed below.
All fNIRS data were processed identically using custom software created in MATLAB (R2013A, The Mathworks). First, optodes with satisfactory scalp contact were identified, so channels with favorable signal to noise ratios could be analyzed further, excluding those channels resulting from optodes with poor scalp contact (Fig. 3A). Adequate scalp contact is indicated by a synchronous cardiac pulse signal detected by both wavelengths of light directed from a single probe. Raw data were filtered between 0.5 and 2.5 Hz to isolate the portion of the signal attributable to the cardiac pulse (for heart rates between 30-150 beats per minute) and normalized to account for differences in amplitude between the signals from the two wavelengths. The two wavelengths' signals were then used to generate a correlation coefficient between 0 and 1.0, the scalp contact index (SCI). Channels with an SCI below 0.70 were discarded.
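The SCI computation above can be sketched in a few lines. The sketch below is a Python approximation (function name and filter order are our own choices, not the authors'): both wavelengths are band-pass filtered to the cardiac band, amplitude-normalized, and correlated.

```python
import numpy as np
from scipy.signal import butter, sosfiltfilt

def scalp_contact_index(sig_760, sig_850, fs=6.25, band=(0.5, 2.5),
                        threshold=0.70):
    """Return (SCI, keep?) for one channel's two wavelength signals.

    A strong correlation in the cardiac band implies both wavelengths are
    seeing the same cardiac pulse, i.e. the optodes have good scalp contact.
    """
    sos = butter(2, band, btype="bandpass", fs=fs, output="sos")
    a = sosfiltfilt(sos, sig_760)
    b = sosfiltfilt(sos, sig_850)
    a /= np.max(np.abs(a)) + 1e-12   # amplitude normalization
    b /= np.max(np.abs(b)) + 1e-12
    sci = float(np.corrcoef(a, b)[0, 1])
    return sci, sci >= threshold
```

With the instrument's 6.25 Hz sampling rate, the 2.5 Hz upper band edge sits just below the Nyquist frequency, which is why the cardiac band is capped at 150 beats per minute.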
Next, signal artifacts due to motion were removed (Fig. 3B). Each channel was bandpass filtered between 0.1-3.0 Hz to remove slow signal drift, and the channel's voltage was normalized to the intensity of the highest peak. Peaks in the signal exceeding 20% of the maximum peak intensity were identified as motion artifact. We went back to the raw data tracing and removed these artifacts by linearly interpolating between the start and stop time points. To prevent flattening a clean signal (without motion artifact), only those channels where <2% of the data samples were occupied by peaks were processed in this manner.
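A minimal sketch of this motion-artifact step follows (Python; the helper name is hypothetical, and a simple moving-average drift removal stands in for the paper's 0.1-3.0 Hz bandpass filter). Peaks exceeding 20% of the channel's maximum are bridged by linear interpolation of the raw trace, and heavily contaminated channels are left untouched.

```python
import numpy as np

def remove_motion_artifacts(raw, peak_frac=0.20, max_occupancy=0.02, win=25):
    """Return (cleaned, processed?) for one channel's raw trace."""
    raw = np.asarray(raw, float)
    # Crude slow-drift removal (stand-in for the paper's bandpass filter).
    drift = np.convolve(raw, np.ones(win) / win, mode="same")
    mag = np.abs(raw - drift)
    # Samples exceeding 20% of the maximum peak are flagged as artifact.
    mask = mag > peak_frac * mag.max()
    if mask.mean() > max_occupancy:
        # >2% of samples flagged: skip, to avoid flattening a clean signal.
        return raw.copy(), False
    idx = np.arange(len(raw))
    cleaned = raw.copy()
    cleaned[mask] = np.interp(idx[mask], idx[~mask], raw[~mask])
    return cleaned, True
```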
To remove cardiac and respiratory contributions to the hemodynamic signal, the raw data from each channel were bandpass filtered between 0.016 and 0.25 Hz (Fig. 3C), and the modified Beer-Lambert law was used to calculate the relative concentrations of oxygenated hemoglobin (HbO) and deoxygenated hemoglobin (HbR) (Fig. 3D) (Sassaroli and Fantini 2004). For each channel, all five trials of each stimulus type were block-averaged, and linear regression was carried out between the block-averaged hemodynamic response and a predicted hemodynamic response. The predicted hemodynamic response was created in three steps. First, 50 responses from 26 normal-hearing individuals listening to normal speech were selected by eye as being obvious stimulus-evoked responses. Second, these responses were then normalized and averaged. Finally, the average response was smoothed by fitting it with a 5th-order polynomial that provided the best least-squares fit. We chose this approach to defining a predicted hemodynamic response rather than using a previously-published fMRI BOLD response model because the popular fMRI models we reviewed did not accurately predict obvious fNIRS responses noted by eye. We presume this is related to the fact that fNIRS analysis does not have the spatial sensitivity of fMRI and so the responses we measure are spatio-temporally filtered along the optical path through the biological tissues. Finally, the regressions between the predicted response and the measured response resulted in the calculation of a T-statistic for each channel, quantifying the quality of fit between the predicted and actual hemodynamic response curves (Fig. 3E) (Friston et al. 1995; Monti 2011). The magnitudes of the hemodynamic responses (e.g. the beta or parametric values) were not analyzed because many unmeasurable factors that affect these parameters (such as skull thickness and scalp contact) vary between participants.
We therefore used the T-statistic: the relative changes in the hemodynamic signal are meaningful, but their absolute magnitudes are not (Sato et al. 2005).
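The per-channel fit statistic can be sketched as a simple regression of the block-averaged response on the predicted response, keeping only the slope's T-statistic. This is our own minimal rendering of the cited GLM approach (Friston et al. 1995), not the authors' code; the HRF-like curve in the example is illustrative.

```python
import numpy as np

def channel_t_statistic(measured, predicted):
    """T-statistic for the slope of `measured` regressed on `predicted`.

    A large T means the block-averaged response closely follows the
    predicted hemodynamic response; the fit magnitude (beta) is ignored,
    as absolute magnitudes are not comparable across participants.
    """
    measured = np.asarray(measured, float)
    X = np.column_stack([np.ones(len(predicted)), predicted])
    beta, *_ = np.linalg.lstsq(X, measured, rcond=None)
    resid = measured - X @ beta
    dof = len(measured) - X.shape[1]
    sigma2 = resid @ resid / dof                       # residual variance
    se = np.sqrt(sigma2 * np.linalg.inv(X.T @ X)[1, 1])  # slope standard error
    return beta[1] / se
```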
At this point in the analysis process, each emitter-detector combination (channel) could be represented by a single number, which was the T-statistic representing the goodness of fit between the measured response and the predicted response. We then selected only those channels with a 3.0-3.3 cm interoptode distance to calculate the overall cortical activation in that participant for that stimulus. Since many of these channels crisscross each other, they provide spatial oversampling. We averaged the T-statistics from overlapping channels to reduce noise, and then performed spline interpolation with a step size of 0.5 mm to translate the relatively sparse T-statistic distribution to a dense mesh of values suitable for visualization (Fig. 3F). A threshold for statistical significance was set and the interpolated area with T-statistics exceeding the threshold was recorded as the activation area for each stimulus condition. We selected the threshold T-statistic to be 8 because we found that it effectively eliminated most activations in participants who had their cochlear implant turned off, which presumably were artifacts. In addition, it enhanced the differences in brain activation to the different speech stimuli in normal-hearing participants.
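The interpolate-then-threshold step can be sketched as follows (Python, using SciPy's cubic 2D interpolation in place of the paper's MATLAB spline routine; the function name, coordinate units, and grid step are illustrative).

```python
import numpy as np
from scipy.interpolate import griddata

def activation_area(channel_xy, t_stats, threshold=8.0, step=0.05):
    """Interpolate sparse channel T-statistics onto a dense grid and return
    the area above `threshold`, in the (squared) units of `channel_xy`.
    """
    xy = np.asarray(channel_xy, float)
    x = np.arange(xy[:, 0].min(), xy[:, 0].max() + step, step)
    y = np.arange(xy[:, 1].min(), xy[:, 1].max() + step, step)
    gx, gy = np.meshgrid(x, y)
    grid = griddata(xy, np.asarray(t_stats, float), (gx, gy), method="cubic")
    # Points outside the channel layout's convex hull come back NaN.
    above = np.nan_to_num(grid, nan=0.0) > threshold
    return float(above.sum()) * step * step
```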
As opposed to simply plotting a 2D square of the fNIRS data (i.e. a topographic map), we chose to plot the representative responses we present in this manuscript on a standard brain image to make them easier to visualize. A cortical surface was first generated in MATLAB from a GifTI formatted surface obtained from a representative adult's anatomical MRI data. The cortical surface was then smoothed using a weighted spherical harmonic representation (Chung et al. 2008). The two-dimensional interpolated T-statistic map was projected to the cortical surface and the cortex surface was colored based on the projected T-statistic values. This strategy is reasonable given our previously published work demonstrating that the responses measured with fNIRS derive from this region of the brain (Sevy et al. 2010; Pollonini et al. 2014). However, it is important to note that this plotting scheme has no impact on the individual or group statistical analyses performed in this manuscript.
Group average activation maps (e.g. Fig 5) were generated by comparing the list of T-statistics for all participants on a channel-by-channel basis versus no activation (a flat line with no activation during the stimulus presentation, i.e. the null hypothesis) using paired t-tests. Only channels passing the SCI threshold were included in this analysis, and each stimulus was analyzed independently. Each analysis thus resulted in the generation of a two-dimensional map of p-values that indicated the statistically significant channels over the cohort of participants for that stimulus. Each map was then interpolated and projected to the cortical surface after a threshold for significance (p < 0.05) was set. In a similar fashion, cohort difference maps (e.g. Fig 6) were generated by comparing T-statistics on a channel-by-channel basis for each stimulus. Since in these analyses two different cohorts of participants were being compared, the unpaired t-test was used rather than the paired t-test. Otherwise, the analyses and plotting were performed in an identical fashion.
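The channel-by-channel group test can be sketched as below (Python; the helper name is hypothetical). Testing each channel's per-participant T-statistics against no activation is implemented here as the equivalent one-sample t-test against zero, with NaN marking channels that failed a participant's SCI screen.

```python
import numpy as np
from scipy import stats

def group_activation_map(channel_t, alpha=0.05):
    """Per-channel significance across participants.

    `channel_t` has shape (participants, channels); NaN marks a channel
    excluded for a given participant by the SCI threshold.
    """
    channel_t = np.asarray(channel_t, float)
    n_channels = channel_t.shape[1]
    p_values = np.full(n_channels, np.nan)
    significant = np.zeros(n_channels, dtype=bool)
    for ch in range(n_channels):
        vals = channel_t[:, ch]
        vals = vals[~np.isnan(vals)]
        if len(vals) < 2:
            continue                     # too few participants to test
        _, p = stats.ttest_1samp(vals, 0.0)
        p_values[ch] = p
        significant[ch] = p < alpha
    return p_values, significant
```

The cohort difference maps would follow the same per-channel pattern with `scipy.stats.ttest_ind` between the two cohorts' values.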
All representative examples were followed up with statistical analyses of averaged data. Unless otherwise stated, categorical variables are presented as frequencies and percentages, and continuous variables as means and standard errors after ensuring that such variables had normal distributions. For comparisons with more than two groups, ANOVA was performed first, followed by the Student's t-test with Bonferroni correction. The Mann-Whitney U-test was used to compare data with skewed distributions when indicated. Paired t-tests were used to assess differences in cortical activation areas within groups, including the difference in activation area with CI on and off, and differences in cortical activation area for normal, channelized, and environmental sounds compared to scrambled speech for left and right HbO and HbR. Between groups, unpaired t-tests were used. All reported p-values are two-tailed, with a p<0.05 indicating statistical significance when Bonferroni correction was not used (i.e. only one comparison was performed).
Multiple linear regression was used to characterize the association between word and speech perception scores and the normal:scrambled activation area ratio for HbO and HbR in both hemispheres. Age, side of CI implantation, duration of deafness, and time from implantation were included as covariates based on their relation to post-implantation performance in the literature (van Dijk et al. 1999; Gifford et al. 2008; Roditi et al. 2009; Budenz et al. 2011; Roberts et al. 2013). These variables were only included after univariate analysis of each variable yielded a p-value of 0.25 or less (Bendel and Afifi 1977). Missing data were excluded from the analysis in a pairwise fashion, and independent variables were assessed for co-linearity before inclusion in the model, with a variance inflation factor (VIF) of 10 being the threshold for unacceptable co-linearity. Beta coefficients are reported for a 1-SD increase for independent variables. All statistical analyses were performed in SPSS (Version 21, IBM, Armonk, NY).
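The co-linearity screen rests on the standard VIF definition, which can be sketched as follows (Python; the analyses themselves were run in SPSS, so this helper is purely illustrative): each covariate is regressed on the remaining covariates, and VIF_j = 1 / (1 - R_j^2).

```python
import numpy as np

def variance_inflation_factors(X):
    """VIF_j = 1 / (1 - R_j^2), with R_j^2 from regressing covariate j on
    the remaining covariates (plus an intercept)."""
    X = np.asarray(X, float)
    n, k = X.shape
    vifs = np.empty(k)
    for j in range(k):
        y = X[:, j]
        others = np.column_stack([np.ones(n), np.delete(X, j, axis=1)])
        beta, *_ = np.linalg.lstsq(others, y, rcond=None)
        resid = y - others @ beta
        r2 = 1.0 - resid.var() / y.var()
        vifs[j] = 1.0 / max(1.0 - r2, 1e-12)   # guard against r2 == 1
    return vifs
```

Independent covariates yield VIFs near 1; a covariate nearly determined by the others (as with duration of deafness and age at implantation, for instance) pushes its VIF toward the exclusion threshold of 10.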
Our fNIRS imaging setup consisted of a headset holding an array of infrared light sources and detectors against the scalp overlying the LTL/STG (Fig. 4A, left) (Klem and Lüders 1999; Sevy et al. 2010). During a listening task, source-detector pairs (channels) measured increases in oxyhemoglobin concentration ([HbO]) and decreases in deoxyhemoglobin ([HbR]) that synchronize with the time course of the auditory stimulus (Fig. 4A, right). We compared the HbO and HbR curves for each channel to the predicted response using linear regression; only channels with a resulting T-statistic > 8 were used to define the cortical activation area. This threshold was chosen after studying, by eye, how well the raw data tracings fit the predicted curves: it excluded obviously noisy data without excluding data that fit the predicted response reasonably well. These channels were spatially interpolated according to the layout of the headset to construct a color-coded activation map, which was then projected onto a standardized 3D cortical model to visualize the responses.
We performed behavioral measures of speech perception and fNIRS imaging in 35 normal-hearing adult controls and 32 adults with CIs. These tests included the monosyllabic consonant-nucleus-consonant word (CNC Words) (Peterson and Lehiste 1962) and the AzBio sentence recognition (Spahr et al. 2012b) protocols. All normal-hearing control participants scored 95-100% on these tests. However, the deaf participants hearing through a CI demonstrated wide variability on the behavioral tests (Fig. 4B; Table 3). CNC Word scores ranged from 20-94% (mean 58%, SD 21%), which is consistent with larger populations with mean CNC Words scores in the range of 54-61% (Bassim et al. 2005; Gifford et al. 2008). AzBio sentence recognition scores ranged from 28-97% (mean 60%, SD 25%), which is also consistent with larger study populations with mean AzBio scores ranging from 58-72% (Gifford et al. 2008; Spahr et al. 2012a; Massa and Ruckenstein 2014). We considered participants with CNC Words scores >=70% (the top 10 in our cohort) to have good speech perception and those with scores <=40% (the bottom 10) to have poor speech perception.
Given the known responsiveness of the superior temporal gyrus to syllables and natural speech in normal-hearing participants (Belin et al. 2000; Belin et al. 2002; Fecteau et al. 2004; Grodzinsky and Nelken 2014; Mesgarani et al. 2014), we expected that intelligible speech would evoke larger cortical responses than unintelligible speech in implanted adults. To test this, we used four previously-validated acoustic stimuli that demonstrate this activation pattern within the LTL/STG of normal-hearing controls when measured by fMRI and fNIRS (Pollonini et al. 2014). In order of decreasing intelligibility, these stimuli were normal speech, channelized speech, scrambled speech, and environmental sounds. Normal-hearing controls had strong oxyhemoglobin and deoxyhemoglobin cortical responses to normal speech (Fig. 4C; Movie S1). The response areas were about the same with channelized speech, but smaller responses were found with scrambled speech and environmental sounds. Cortical activation maps averaged over the normal-hearing controls confirmed this qualitative finding (Fig. 5).
As expected, implanted adults with good speech perception showed a cortical activation pattern similar to that of controls; as the speech stimuli became less intelligible, there was less cortical activation (Fig. 4D). However, participants with poor speech perception had similarly large areas of cortical activation for all four stimuli (Fig. 4E). Average difference maps were then generated between normal-hearing participants, adults with CIs with good speech perception, and adults with CIs with poor speech perception (Fig. 6). There were strong similarities between deaf adults hearing through a CI with good speech perception and normal-hearing controls. However, in deaf adults hearing through a CI, there was a fundamental difference between good and poor speech perceivers in that scrambled speech and environmental sounds created overly-large responses within the LTL/STG of poor speech perceivers. This finding links the quality of speech perception to the amount of LTL/STG activation.
To further explore and quantify this association, we divided the activation area for each auditory stimulus by the activation area to scrambled speech, henceforth referred to as the “activation area ratio.” This normalization reduced variability across participants caused by differences in scalp-brain distance and differences in the number of channels with adequate scalp contact (Sato et al. 2005; Cui et al. 2011). The activation area ratios were determined for each participant, for each hemisphere, and for both HbO and HbR. Some participants demonstrated no activation to scrambled speech. In these situations where calculating the activation area ratio would lead to a division by zero error, we used 1 cm2 (the minimum detectable area of our experimental setup) as the area of scrambled speech. If a participant had no activation for both normal and scrambled speech, that participant was deemed a non-responder for that hemoglobin species in that hemisphere (for instance, left HbO) and was not included in that particular statistical analysis.
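The ratio computation, including the division-by-zero guard described above, can be sketched in a few lines (Python; the function name is our own).

```python
def activation_area_ratio(area_normal, area_scrambled, floor=1.0):
    """N:S activation area ratio (areas in cm^2).

    `floor` is the minimum detectable area of the setup (1 cm^2),
    substituted when there was no activation to scrambled speech;
    no activation to either stimulus marks a non-responder (None).
    """
    if area_normal == 0 and area_scrambled == 0:
        return None                      # non-responder for this measure
    return area_normal / max(area_scrambled, floor)
```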
In normal-hearing controls, the activation area ratio increased with increasing intelligibility of the speech stimulus (Fig. 7A). This was found for both HbO and HbR in both hemispheres. Implanted adults with good speech perception demonstrated a similar pattern, with increasing activation area ratios corresponding to increasing intelligibility of the speech stimulus. In contrast, implanted adults with poor speech perception had similar activation area ratios across all four stimuli. The activation area ratios for normal to scrambled speech (N:S) versus the behavioral measures of speech perception were plotted for each implanted participant that had these tests performed (Fig. 7B,C). Participants with higher CNC words and AzBio sentence test scores tended to have larger N:S activation areas.
There are many factors that are known to affect performance when hearing through a cochlear implant; for example, age correlates both with acoustical hearing abilities and with central processing abilities (Gates and Mills 2005; Humes et al. 2012; Jin et al. 2014; Park et al. 2015). Although there was significant overlap between the ages of the participants in the implanted and normal-hearing cohorts (Table 1), their mean ages were different (Mann-Whitney U-test, p<0.01). To account for age and other factors as potential covariates underlying the differences in cortical activation patterns, we performed multiple linear regression analyses using the N:S activation area ratio as the dependent variable. The independent factors included age, side of implantation, duration of deafness, time since implantation, and either CNC words scores or AzBio sentence scores. Both the CNC words and AzBio scores independently correlated with the N:S activation area ratio for HbO and HbR in each hemisphere. The fact that significance was found across multiple independent comparisons strongly supports the concept that cortical activation correlates with speech perception. The duration of deafness was also inversely related to the N:S activation area ratio for HbO and HbR in both hemispheres when AzBio scores were analyzed, but not when CNC scores were analyzed (Tables 4, 5). This finding is difficult to interpret, but may represent an effect of the cortical reorganization that can occur in association with hearing loss (Campbell and Sharma 2014). Most importantly, however, none of the other factors (age in particular) were related to the differences in cortical activation patterns.
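The structure of this regression analysis can be sketched as follows. This is a minimal ordinary-least-squares illustration on invented data, not our statistical pipeline; the variable names, sample size, and the assumed relationship between AzBio score and the N:S ratio are all hypothetical, and the standardized betas and p-values reported in the actual analysis are not computed here.

```python
import numpy as np

# Hypothetical cohort: the N:S activation area ratio (dependent variable)
# and candidate predictors. All values are invented for illustration.
rng = np.random.default_rng(0)
n = 20
age = rng.uniform(30, 80, n)                 # years
duration_deafness = rng.uniform(1, 30, n)    # years
azbio = rng.uniform(0, 100, n)               # percent correct
# Invented relationship: the ratio rises with AzBio score, plus noise.
ns_ratio = 1.0 + 0.02 * azbio + rng.normal(0, 0.1, n)

# Design matrix with an intercept column; fit by ordinary least squares.
X = np.column_stack([np.ones(n), age, duration_deafness, azbio])
beta, *_ = np.linalg.lstsq(X, ns_ratio, rcond=None)
print(dict(zip(["intercept", "age", "dur_deafness", "azbio"], beta.round(3))))
```

With data generated this way, only the AzBio coefficient should come out substantially different from zero, mirroring the finding that speech perception scores, and not the other factors, predicted the N:S ratio.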
To exclude the possibility that differences in the activation area ratio were linked to differences in auditory thresholds, the activation area ratio was plotted against the Speech Recognition Threshold (SRT) (Fig. 8). Significant correlations were not observed, indicating that the N:S activation area ratio was not simply providing an assessment of a participant's perception of sound intensity. Lastly, we also performed three more multiple linear regression analyses using the SRT, CNC words, and AzBio scores as dependent variables, and age, side of implantation, duration of deafness, and time since implantation as independent variables. Both age and time since implantation correlated with SRT (standardized beta = 0.802 and −0.228, respectively; p < 0.001 for both). This makes sense because our cochlear implant programming strategy is to gradually increase the power (and thus lower the auditory threshold) over time as patients acclimate to using the device. However, there was no significant relationship between CNC words or AzBio scores and any of the independent variables. Thus, these data rule out the most likely alternative causes of the cortical responses we measured. Taken together, these data indicate that the LTL/STG activation pattern provides an objective measure of speech perception by a deaf participant listening through a CI.
To determine the impact of reduced auditory stimulation, we repeated the measures after turning off the CI. Most of the implanted participants had a unilateral CI and the other ear was left unaided for fNIRS testing. This meant that some degree of residual acoustical hearing persisted in the unimplanted ears and perhaps even at a minimal level in the implanted ears. Behavioral measures of speech perception revealed speech understanding in many participants even with the CI turned off, although the results significantly improved when the CI was turned on (Fig. 9A,B; Table 3). Deaf adults demonstrated strong cortical responses when their CI was on, but only minimal responses with the CI turned off (Fig. 9C). On average, the cortical activation area in response to speech was greatest for controls, followed by implanted adults with their CI on; implanted adults with their CI off had the smallest areas of cortical activation (Fig. 9D). These responses were bilateral for all conditions. As a reference, the cortical surface area of the superior temporal gyrus has been measured to be just slightly larger than what we found in normal-hearing adults (Left: 19.83 ± 3.52 cm2; Right: 17.28 ± 2.62 cm2; Meyer et al. 2013). In general, patients with residual hearing maintained small cortical responses to speech when the implant was turned off, whereas those without residual hearing did not. The N:S activation area ratio dropped when the CI was turned off, consistent with the drop in speech perception (Fig. 9E). Together, these data confirm that the CI produced the responses measured by fNIRS, further supporting the notion that the cortical activation pattern correlates with speech perception.
The artificial electrical stimulation produced by a modern CI provides less information about the auditory environment than that provided by a normal-functioning cochlea. The fact that many deaf patients hearing through a CI can decipher speech is a testament to the ability of the human brain to adapt to and process poor quality sensory information (Clark 2015). However, this process does not work for every implanted patient, and we currently have no clinical test to explain the pathophysiology underlying the difficulty that some patients have with speech understanding. Here we show that activation patterns within the LTL/STG in the implanted population correlate with behavioral measures of speech perception. Furthermore, we have found that the neural basis of poor speech perception in implanted participants is not that sound information does not reach the LTL/STG, but that these participants are unable to discriminate speech from the signal that does reach it.
While many neuroimaging methods exist, they have been limited to a research setting in the implanted population because they are incompatible with CIs or unsuitable for routine use in the outpatient setting. For our purposes, some of these issues can be overcome using fNIRS. For example, fNIRS has several advantages over fMRI: it is not affected by the magnet within the CI, and it is easy to use in a routine clinical environment because it is small and portable. However, it has a lower spatial resolution than fMRI (~5-10 mm vs. ~1 mm) (Cui et al. 2011). A benefit over PET studies is that fNIRS does not use ionizing radiation, and thus can be used safely for repeated testing over multiple clinic visits. A benefit over EEG is that long speech stimuli can be used, as opposed to the brief stimulus pulses designed to isolate the cortical response in time from the electrical artifact of the CI. Thus, the stimuli reflect normal listening situations. Lastly, fNIRS does not require a lengthy setup or experimental protocol; the total time a participant spent in the testing room was 20-30 minutes for these experiments.
Despite these positive attributes, fNIRS is not without its drawbacks. fNIRS is limited to imaging superficial regions of cortex, as the light cannot penetrate the tissue to measure deeper structures. In addition, scalp thickness has a significant impact on the ability of fNIRS to image cortical activity (Cui et al. 2011; Beauchamp et al. 2011). There is no doubt that fNIRS has poor spatial localization compared with fMRI. In addition, the exact position of the optodes likely varied slightly (<1 cm) between participants. Because of these limitations, we have been extremely conservative in analyzing the responses we measured. First, we have been careful not to make any claims regarding the spatial pattern of cortical activations. Second, we calculated the total area of activation to several different stimuli, and then normalized the responses of each participant individually. In this way, each participant could serve as his/her own control. In future studies, the spatial patterns of activation could be explored by localizing optode positions using the pre-implant MRI most patients receive as part of their medical evaluation of hearing loss and modeling the light travel paths (Cristobal and Oghalai 2008; Lin et al. 2010; Jerry and Oghalai 2011; Lin et al. 2011; Oghalai et al. 2012). Despite these limitations, fNIRS is a powerful technique that is well-suited for use in an outpatient clinical setting and could be easily implemented in clinical studies of CI function.
Our finding of larger cortical responses to speech than environmental sounds in controls and implanted adults with good speech perception aligns well with previously-published fNIRS and fMRI data (Belin et al. 2002; Fecteau et al. 2004; Pollonini et al. 2014). Vocoded speech is almost entirely intelligible to adults with normal hearing, which explains the similar areas of cortical response to normal and channelized stimuli we found in the control cohort (Shannon et al. 1995). Our finding that implanted adults with high speech perception scores exhibit a pattern of cortical activation similar to controls is consistent with PET data that the degree of cortical response to speech correlates with the quality of speech perception among adults with CIs, and that there is a stronger cortical response to speech than to white noise and environmental sounds in adults with CIs and in normal-hearing adults (Naito et al. 2000; Green et al. 2005).
Implanted adults with low speech perception scores demonstrated large cortical activation areas to all stimuli, without preferential responses to speech stimuli. Our studies are unable to differentiate where in the auditory pathway this abnormal pattern of responses is initiated. It could begin as early as the auditory nerve, due to less-than-optimal stimulation of the auditory nerve fibers by the electrical contacts along the CI electrode. Alternatively, it could occur at a later stage in the auditory processing hierarchy, with neurons failing to process the key characteristics of speech (frequency, intensity, and timing) necessary to render speech intelligible. Consistent with our results, abnormal electrically-evoked cortical potentials have been found in implanted children with poor speech perception despite the ability to hear sound (Gordon et al. 2005). Our findings suggest that humans may attempt to compensate for poor quality sensory information by increasing activity within the LTL/STG.
This may be due to increased listening effort. For example, EEG studies have demonstrated that differences in task difficulty, such as with sound localization, can alter the amount of effort that CI users invest in performing the tasks (Senkowski et al. 2014). Thus, the attention of the subject to the stimulus alters his/her cortical response. In addition, hearing-impaired individuals exert more listening effort than normal-hearing controls when presented with noise containing no meaningful content (Larsby et al. 2005; Zekveld et al. 2010). Correspondingly, the listening effort for CI recipients and normal-hearing controls is similar if the two groups have similar word recognition scores (Hughes and Galvin 2013). Our findings of larger areas of brain activation to scrambled speech in CI users with poor word recognition scores strongly support these studies. It is even possible that the increased brain activity we measured in poor CI performers is the neural correlate of this increased listening effort.
One caveat that should be considered, however, is that the attention of the participant was not quantified or controlled for in our experiments. It is possible that participant attention modulates the response patterns within the LTL/STG; perhaps those participants who pay more attention to sound have larger differential responses and were also able to learn to use their cochlear implants more effectively. Thus, this study only demonstrates a correlation between cortical responses and behavioral responses, not a causal relationship. Further studies are needed to determine whether imaging with fNIRS can be used as a prognostic indicator after implantation and possibly even to aid CI programming efforts. In addition, the role of contralateral and ipsilateral cortical activation needs to be studied, particularly when considering the risks and benefits of unilateral versus bilateral implantation in young children, given our increasing understanding of the consequences of asymmetric hearing during development (Gordon et al. 2015).
Support was provided by NIH R56 DC010164 and R01 DC010075, The Dana Foundation, the Howard Hughes Medical Institute, and the Stanford Society for Physician Scholars.
Financial Disclosures: This research was funded by NIDCD, The Dana Foundation, the Howard Hughes Medical Institute, and the Stanford Society for Physician Scholars (to CO, LP, HA, HB, MSB, and JSO).
Conflicts of Interest:
There are no conflicts of interest.
Author Contributions: CEO, LP, HB, MB, and JSO contributed to the conception and design of this study. Data was gathered by CEO, HA, JL, and ML. Data analysis was performed by CEO, LP, and JSO. Data were interpreted by CEO, LP, HA, JL, ML, MB, and JSO. CEO and JSO wrote the manuscript, and LP, MB, and HB provided critical revision for important intellectual content.