The present study investigated whether self-vocalization enhances auditory neural responsiveness to voice pitch feedback perturbation and how this vocalization-induced neural modulation can be affected by the extent of the feedback deviation.
Event-related potentials (ERPs) were recorded in 15 subjects in response to +100, +200 and +500 cents pitch-shifted voice auditory feedback during active vocalization and during passive listening to the playback of the self-produced vocalizations.
The amplitudes of the evoked P1 (latency: 73.51 ms) and P2 (latency: 199.55 ms) ERP components in response to feedback perturbation were significantly larger during vocalization than during listening. The difference between P2 peak amplitudes during vocalization vs. listening was significantly larger for the +100 cents stimulus than for the +500 cents stimulus.
Results indicate that the human auditory cortex is more responsive to voice F0 feedback perturbations during vocalization than passive listening. Greater vocalization-induced enhancement of the auditory responsiveness to smaller feedback perturbations may imply that the audio-vocal system detects and corrects for errors in vocal production that closely match the expected vocal output.
Findings of this study support previous suggestions regarding the enhanced auditory sensitivity to feedback alterations during self-vocalization, which may serve the purpose of feedback-based monitoring of one’s voice.
The control of voice fundamental frequency (F0) plays an important role in speech production and contributes to efficient transmission of linguistic and non-linguistic cues for human communication. Two major control mechanisms have been proposed to be involved in voice F0 regulation (Guenther, 2006, Guenther et al., 2006). First, the feed-forward motor system adjusts the biomechanical parameters of the laryngeal muscles through previously learned motor commands. Second, sensory feedback (kinesthetic and auditory) arising from vocalization is used to minimize the mismatch (error) between the intended and actual vocal output. When the two mechanisms are integrated, voice F0 is regulated through error-induced corrective commands that update the internal models of the vocal motor system.
Numerous studies have used a pitch perturbation technique to investigate the role of auditory feedback in voice F0 control (Elman, 1981, Burnett et al., 1997, Burnett et al., 1998, Larson, 1998, Hain et al., 2000, Natke and Kalveram, 2001, Burnett and Larson, 2002, Donath et al., 2002, Jones and Munhall, 2002, Bauer and Larson, 2003, Xu et al., 2004, Sivasankar et al., 2005, Chen et al., 2007). Results of these studies have shown that applying real-time pitch shifts to the voice auditory feedback evokes compensatory vocal responses that oppose the direction of the pitch-shifted stimulus (PSS) to maintain F0 at a desired level. Moreover, other studies have shown that the role of auditory feedback in voice regulation is not limited to pitch compensation. In a series of sensorimotor adaptation experiments it was shown that subjects compensated for shifts in formant frequency of the auditory feedback during vowel production (Houde and Jordan, 1998, Houde and Jordan, 2002, Purcell and Munhall, 2006b, 2006a, Villacorta et al., 2007). It has also been shown that applying loudness shifts to voice auditory feedback leads to vocal responses that compensate for alteration in voice intensity feedback (Heinks-Maldonado and Houde, 2005, Bauer et al., 2006, Liu et al., 2007). Evidence from all these studies suggests that auditory feedback plays an important role in online regulation of voice and for updating an internal representation of the mapping between auditory feedback and the motor control system.
In efforts to understand the neural mechanisms underlying voice control in humans, magneto-encephalographic (MEG) (Numminen and Curio, 1999, Numminen et al., 1999, Houde et al., 2002, Heinks-Maldonado et al., 2006) and electro-encephalographic (EEG) (Heinks-Maldonado et al., 2005) recordings of the brain activity have been obtained during vocalization. Results of these studies showed that the auditory cortex responses to normal and pitch-shifted voice feedback were attenuated during speaking compared to listening, a phenomenon often referred to as motor-induced suppression (MIS). It was suggested that MIS dampens sensory input from self-initiated actions in order to help distinguish between self and externally generated sounds. The “re-afference hypothesis” has been proposed to explain this phenomenon through integration of the internal feed-forward speech production mechanism (Wolpert, 1997, Blakemore et al., 1998) and auditory feedback system (Houde et al., 2002, Heinks-Maldonado et al., 2005, Guenther et al., 2006). This theory suggests that, during speaking, re-afference projections from motor cortex transmit the neural representation of the intended vocal output to the auditory cortical areas. The comparison between the efference copy (corollary discharge) and the sensory consequences of a self-produced vocalization in auditory error cells has been proposed to provide a mechanism to detect and correct for vocal errors in order to stabilize voice F0. Therefore, it was thought that MIS during speaking might result from attenuated auditory responsiveness to the incoming sensory feedback that is represented by the re-afference projections (Heinks-Maldonado et al., 2005). 
The notion of suppression due to subtractive comparison between the incoming auditory feedback information and the efference copies of the intended vocal motor commands is further supported by showing that the cortical auditory neural responses to self-vocalizations are most suppressed during normal feedback compared to conditions where the feedback was either pitch-shifted or modified with an alien’s voice (Heinks-Maldonado et al., 2005, Heinks-Maldonado et al., 2006). This maximum suppression is thought to occur because the auditory feedback information is maximally represented by the efference projections when the vocal output closely matches its normal auditory feedback.
With relevance to this discussion, projections from the cingular vocalization area were shown to attenuate neural responses to acoustical stimulation in the superior temporal gyrus (STG) in squirrel monkeys (Müller-Preuss et al., 1980, Müller-Preuss and Ploog, 1981). Vocalization-induced inhibition in primate auditory cortex was further reported in later studies in marmoset monkeys (Eliades and Wang, 2003, 2005). However, results of a recent study showed that during normal feedback of self-vocalization, some neurons increased and others decreased their discharge rate (Eliades and Wang, 2008). Those auditory neurons that were suppressed during normal voice feedback showed a larger increase in their firing rate in response to the PSS than those neurons that were excited. Based on this observation, it was suggested that the vocalization-induced suppression enhanced neural sensitivity to feedback perturbation. The absence of neural sensitivity enhancement during passive listening indicated that the vocalization-induced increase of the neural firing rates during feedback perturbation could not be simply explained by the frequency-specificity (tonotopic map) of the cortical auditory neurons. This evidence suggested that the vocal motor system might be involved in modulation of neural sensitivity by fine-tuning the auditory neurons (e.g. increasing their dynamic range) that respond more vigorously to the alteration in feedback.
It can be inferred from the results of the latter experiment (Eliades and Wang, 2008) that the neural response to the pitch-shifted voice feedback might be affected by two factors during active vocalization: 1) vocalization-induced suppression and 2) suppression-induced enhancement of neural responses to feedback perturbation. Therefore, for the feedback perturbation that co-occurs with the onset of the vocalization and persists during vocal production, the scalp-recorded brain potentials may not accurately reflect the enhanced neural responses to the feedback perturbation stimuli because these responses may be masked by the suppression due to vocalization onset. This masking effect might be a possible explanation for the previously reported decrease in the auditory responses to voice pitch feedback perturbation during vocalization compared to listening (Numminen and Curio, 1999, Numminen et al., 1999, Houde et al., 2002, Heinks-Maldonado et al., 2006). In the present study, we tested the hypothesis that self-vocalization enhances the auditory cortex responsiveness to unexpected, brief (200 ms duration) voice pitch feedback perturbation by presenting PSS to the subjects 500 ms following vocalization onset. This method temporally separates the effects of pitch-shifted voice feedback and vocalization onset and allows the recording of the neural responses to pitch perturbations independent of vocalization onset.
We further asked the question of whether vocalization-induced modulation of the neural responsiveness can be affected by the extent of the feedback deviation. Presenting pitch-shift stimuli at different magnitudes allowed us to investigate whether the auditory cortex is most sensitive to perturbations in feedback that closely matched the intended vocal output. It was previously shown that large magnitude pitch-shift stimuli produce more responses that follow the direction of the stimulus rather than opposing it (Burnett et al., 1998). One interpretation of this finding is that large magnitude stimuli are perceived to come from another speaker rather than self-produced vocalization, and would therefore not require a corrective response. More recently, it was shown that smaller voice F0 shifts in feedback would evoke proportionally greater compensatory vocal responses when the vocal response magnitudes are expressed as the percentage of the feedback perturbation stimulus magnitude (Liu and Larson, 2007). This observation suggests that the audio-vocal system might be more sensitive to smaller feedback deviations during voice F0 control. Greater sensitivity to smaller feedback deviations may allow the audio-vocal system to apply robust control and error correction over self-produced vocalization compared to the disruptive effect of external sounds during vocal production.
Fifteen right-handed native speakers of American English (10 females and 5 males, 18–27 years of age; mean age = 20.73 years, SD = 2.55) with no history of speech, hearing or neurological disorders and no voice training participated in the study. All study procedures, including recruitment, data acquisition and informed consent, were approved by the Northwestern University institutional review boards, and subjects were monetarily compensated for their participation.
In this study, the neural mechanisms involved in auditory feedback-based control of voice F0 were investigated using ERPs. The ERPs were recorded in response to the pitch-shifted auditory feedback stimulus during active production of the vowel sound /a/ and passive listening to the playback of the self-generated vocalization.
The experiment comprised a total of 6 conditions (3 vocalization and 3 listening) in which the PSS magnitude was set to +100 cents, +200 cents or +500 cents. For each subject, the order of the stimulus magnitudes was randomly chosen, and each vocalization condition was followed by passive listening to the playback of the same utterances. During each vocalization condition, subjects were asked to vocalize the vowel sound 25 times while maintaining a steady pitch and voice intensity. Each vocalization lasted at most approximately 6–7 seconds, and 8 PSS were presented during each vocalization. Thus, a total of 200 ERP trials (25 vocalizations × 8 PSS) were collected and analyzed for each condition. The duration of the PSS was set to 200 ms, and the inter-stimulus interval (ISI) was randomized between 500 and 900 ms so that subjects could not anticipate the onset of the auditory feedback perturbation.
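As an illustrative aid, not part of the original analysis, the three PSS magnitudes can be expressed as frequency ratios: 1200 cents correspond to one octave, that is, a doubling of frequency. The Python sketch below assumes an arbitrary 200 Hz voice F0 purely for the example. The function names are hypothetical.

```python
def cents_to_ratio(cents: float) -> float:
    """Frequency ratio for a pitch shift in cents (1200 cents = 1 octave = ratio 2)."""
    return 2.0 ** (cents / 1200.0)


def shifted_f0(f0_hz: float, cents: float) -> float:
    """F0 heard in the feedback when a voice at f0_hz is shifted by `cents`."""
    return f0_hz * cents_to_ratio(cents)


if __name__ == "__main__":
    example_f0_hz = 200.0  # assumed example voice F0, not a value from the study
    for c in (100, 200, 500):
        print(f"+{c} cents: ratio {cents_to_ratio(c):.4f}, "
              f"{example_f0_hz:.0f} Hz heard as {shifted_f0(example_f0_hz, c):.1f} Hz")
```

Under this conversion, +100 cents (one semitone) raises the feedback F0 by about 5.9%, and +500 cents raises it by about 33.5%.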
Subjects were seated in a sound-treated room in which their voices were recorded with an AKG boomset microphone (model C420), amplified with a Mackie mixer (model 1202), and pitch-shifted through an Eventide Eclipse Harmonizer. The pitch-shifted auditory feedback was delivered through Etymotic earphones (model ER1-14A) inserted into the subject's ear canals. The gain between the subject's voice and the feedback was further adjusted with a Crown amplifier (D75) and HP350 dB-attenuators to +10 dB, calibrated with a Zwislocki coupler and a Brüel & Kjær sound level meter (model 2250). This 10 dB gain of the feedback channel over the voice masked the normal voice feedback arriving through bone and air conduction, which allowed ERPs to be recorded in response to the pitch-shifted auditory feedback. Subjects were instructed to sustain each vocalization for as long as they heard perturbations in their auditory feedback (about 6–8 seconds, depending on the randomized ISI) and to take short breaks (2–3 seconds) between successive vocalizations. During each vocalization condition, the pitch-shifted feedback channel was recorded and converted to a sound file that was played back during the corresponding passive listening condition. The playback level during passive listening was carefully calibrated to approximately the same level (80–85 dB) at which subjects had heard the feedback of their own voice during active vocalization.
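The link between the stimulus parameters and the approximate vocalization length can be checked with a short calculation. The Python sketch below places the first PSS 500 ms after vocalization onset, as described in the Introduction. It also makes two assumptions that the text does not state: the ISI runs from the offset of one PSS to the onset of the next, and the ISI is drawn uniformly in whole milliseconds. All names are hypothetical.

```python
import random

PSS_DURATION_MS = 200
FIRST_ONSET_MS = 500           # first PSS onset relative to vocalization onset
ISI_RANGE_MS = (500, 900)      # randomized inter-stimulus interval
N_PSS_PER_VOCALIZATION = 8
N_VOCALIZATIONS = 25


def pss_onsets(rng: random.Random) -> list[int]:
    """PSS onset times (ms after vocalization onset) within one vocalization.

    Assumes the ISI is the gap between the offset of one PSS and the onset of the next.
    """
    onsets = [FIRST_ONSET_MS]
    for _ in range(N_PSS_PER_VOCALIZATION - 1):
        isi = rng.randint(*ISI_RANGE_MS)  # inclusive integer range in ms
        onsets.append(onsets[-1] + PSS_DURATION_MS + isi)
    return onsets


def min_max_duration_ms() -> tuple[int, int]:
    """Shortest and longest time from vocalization onset to the offset of the last PSS."""
    lo, hi = ISI_RANGE_MS
    base = FIRST_ONSET_MS + N_PSS_PER_VOCALIZATION * PSS_DURATION_MS
    gaps = N_PSS_PER_VOCALIZATION - 1
    return base + gaps * lo, base + gaps * hi


if __name__ == "__main__":
    print(pss_onsets(random.Random(0)))
    print(min_max_duration_ms())                      # (5600, 8400)
    print(N_VOCALIZATIONS * N_PSS_PER_VOCALIZATION)   # 200 trials per condition
```

Under these assumptions, each vocalization must last about 5.6–8.4 s, with a mean near 7 s. This is broadly consistent with the 6–8 seconds stated above.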
All parameters of the pitch-shift stimulus, such as duration, magnitude and ISI, were controlled by MIDI software (Max/MSP v.4.1 by Cycling '74). The software also generated a TTL pulse marking the onset of each stimulus for synchronized averaging of the recorded ERPs. Voice, feedback and TTL pulses were sampled at 10 kHz using a PowerLab A/D converter (model ML880, AD Instruments) and recorded on a laboratory computer with Chart software (AD Instruments).
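Stimulus onsets for synchronized averaging can be recovered from the digitized TTL channel by locating its rising edges. The following Python/NumPy sketch assumes a 0–5 V TTL signal and a 2.5 V detection threshold; the text gives neither value. This is a generic illustration, not the procedure used by the Chart software.

```python
import numpy as np

FS_HZ = 10_000  # sampling rate of the voice, feedback and TTL channels


def ttl_onsets(ttl: np.ndarray, threshold: float = 2.5) -> np.ndarray:
    """Sample indices where the TTL channel crosses `threshold` from below.

    The 2.5 V default assumes a 0-5 V TTL pulse (an assumption, not from the text).
    A pulse that is already high at the first sample is not counted.
    """
    high = np.asarray(ttl) > threshold
    # Rising edge: sample i is low and sample i + 1 is high, so the onset is at i + 1.
    return np.flatnonzero(~high[:-1] & high[1:]) + 1


if __name__ == "__main__":
    ttl = np.zeros(FS_HZ)          # 1 s of synthetic TTL signal
    ttl[2_000:2_050] = 5.0         # pulse starting at 200 ms
    ttl[7_500:7_550] = 5.0         # pulse starting at 750 ms
    onsets = ttl_onsets(ttl)
    print(onsets, onsets / FS_HZ * 1000)  # sample indices and onset times in ms
```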
The electroencephalogram (EEG) signals were recorded from 13 scalp sites (CZ, C3, C4, T3, T4, FZ, F3, F4, F7, F8, PZ, P3, P4) using an Ag/AgCl electrode EEG cap (10–20 system). Scalp-recorded brain potentials were amplified with 13 Grass amplifiers (Grass P511 AC amplifier), sampled at 10 kHz (PowerLab A/D converter) and recorded with Chart software. All amplifiers were calibrated according to the manufacturer's instructions. The gain of the EEG amplifiers was set to 10k, and the cut-off frequencies of their online high-pass and low-pass filters were set to 0.1 Hz and 10 kHz, respectively. All EEG channels were referenced to linked earlobes, and electrode impedances were measured with a Grass impedance meter (model EZM-5AB) and kept below 5 kΩ. Visual and muscle artifacts were reduced by instructing subjects to close their eyes and relax their muscles throughout the experiment.
The evoked ERPs were obtained by averaging the recorded EEG signals time-locked to the onset of the pitch-shift stimulus. Prior to averaging, the EEG signals from all channels were band-pass filtered offline with cut-off frequencies of 1 and 30 Hz. The filtered EEGs were then segmented into epochs from −100 ms to 500 ms relative to PSS onset. Epochs with amplitudes exceeding ±50 μV were excluded from the analysis. The baseline (pre-stimulus mean amplitude) was subtracted from each remaining epoch, and ERPs were computed only for conditions with at least 100 remaining epochs.
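The epoching, rejection, baseline-correction and averaging steps described above can be sketched in Python/NumPy. The sketch makes three assumptions: one EEG channel has already been band-pass filtered (1–30 Hz), it is expressed in microvolts, and stimulus onsets are given as sample indices (for example, from the TTL channel). The function name is hypothetical, and this is not the authors' analysis code.

```python
from typing import Optional

import numpy as np

FS_HZ = 10_000               # EEG sampling rate
PRE_MS, POST_MS = 100, 500   # epoch window: -100 ms to +500 ms around PSS onset
REJECT_UV = 50.0             # reject epochs whose absolute amplitude exceeds 50 uV
MIN_EPOCHS = 100             # minimum number of accepted epochs needed to form an ERP


def average_erp(eeg_uv: np.ndarray, onsets: np.ndarray, fs: int = FS_HZ) -> Optional[np.ndarray]:
    """Stimulus-locked average of one band-passed EEG channel (microvolts).

    Returns the averaged epoch ((PRE_MS + POST_MS) * fs / 1000 samples), or None
    if fewer than MIN_EPOCHS epochs survive artifact rejection.
    """
    pre = int(round(PRE_MS * fs / 1000))
    post = int(round(POST_MS * fs / 1000))
    accepted = []
    for s in onsets:
        s = int(s)
        if s - pre < 0 or s + post > eeg_uv.size:
            continue  # epoch would extend beyond the recording
        epoch = eeg_uv[s - pre : s + post]
        if np.max(np.abs(epoch)) > REJECT_UV:
            continue  # amplitude-based artifact rejection (+/-50 uV)
        accepted.append(epoch - epoch[:pre].mean())  # subtract pre-stimulus baseline
    if len(accepted) < MIN_EPOCHS:
        return None
    return np.mean(accepted, axis=0)
```

Rejection is applied before baseline removal, following the order of the steps in the text.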
The latency and amplitude of the P1-N1-P2 complex were extracted from the averaged neural responses by finding the most prominent peaks in 50 ms-long time windows centered at 50 ms, 100 ms and 200 ms. In addition to the extracted peak amplitudes and latencies, the normalized difference index (NDI) was calculated as the difference between response peak amplitudes during active vocalization vs. passive listening, normalized to the sum of peak amplitudes for each individual neural component. The NDI is a measure that reflects the changes in the stimulus-evoked auditory neural responses as a result of vocal-motor system activity. In other words, NDI is an indication of vocalization-induced changes in the neural responsiveness to feedback alteration and can be calculated using the following formula:
$$\mathrm{NDI} = \frac{M_{voc} - M_{lis}}{M_{voc} + M_{lis}}$$

where M_voc and M_lis are the amplitudes of the averaged evoked neural component during the active vocalization and passive listening conditions, respectively.
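The peak extraction and NDI computation can be sketched in Python/NumPy. The sketch assumes that the "most prominent peak" is the largest positive value (P1, P2) or the most negative value (N1) within the 50 ms window of the averaged ERP. The function names and the synthetic example waveforms are hypothetical.

```python
import numpy as np

# Window centers (ms) and expected polarity (+1 positive, -1 negative) of each component.
COMPONENTS = {"P1": (50.0, +1), "N1": (100.0, -1), "P2": (200.0, +1)}


def peak_in_window(erp, times_ms, center_ms, polarity, half_width_ms=25.0):
    """Latency (ms) and amplitude of the largest peak of the given polarity
    within +/- half_width_ms of center_ms (a 50 ms window by default)."""
    in_window = np.flatnonzero(np.abs(times_ms - center_ms) <= half_width_ms)
    best = in_window[np.argmax(polarity * erp[in_window])]
    return float(times_ms[best]), float(erp[best])


def ndi(m_voc, m_lis):
    """Normalized difference index (Mvoc - Mlis) / (Mvoc + Mlis)."""
    return (m_voc - m_lis) / (m_voc + m_lis)


if __name__ == "__main__":
    t = np.arange(-100.0, 500.0)  # 1 ms resolution for the example
    # Synthetic ERPs (not study data): P2-like bump at 200 ms, larger during vocalization.
    erp_voc = 6.0 * np.exp(-((t - 200.0) / 20.0) ** 2)
    erp_lis = 4.0 * np.exp(-((t - 200.0) / 20.0) ** 2)
    center, pol = COMPONENTS["P2"]
    _, a_voc = peak_in_window(erp_voc, t, center, pol)
    _, a_lis = peak_in_window(erp_lis, t, center, pol)
    print(round(ndi(a_voc, a_lis), 3))  # 0.2
```

Because the NDI divides by the sum of the two amplitudes, it is only meaningful when both amplitudes have the same sign, as for the positive P1 and P2 peaks analyzed here. A positive NDI indicates a larger response during vocalization than during listening.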
Scalp distribution maps of brain activity in response to voice pitch feedback perturbation were created from the peak amplitudes of the neural responses at the 13 electrode sites (CZ, C3, C4, T3, T4, FZ, F3, F4, F7, F8, PZ, P3, P4). These topographical maps were generated by color-coding the ERP component amplitudes and interpolating between adjacent electrodes to obtain a fine-grained map of the distribution of electrical activity.
The extracted latencies and amplitudes of the P1, N1 and P2 peaks in the averaged neural responses were separately analyzed for 10 recording sites using a four-way repeated-measures analysis of variance (ANOVA). The analyzed electrode sites included C3, C4, T3, T4, F3, F4, F7, F8, P3 and P4. Repeated-measures ANOVAs were conducted to examine effects of condition (active vocalization, passive listening), stimulus magnitude (100 cents, 200 cents, 500 cents), laterality (Left: C3, T3, F3, F7, P3 vs. Right: C4, T4, F4, F8, P4) and electrode site (Centro-Medial: C3, C4 – Temporal: T3, T4 – Fronto-Medial: F3, F4 – Fronto-Lateral: F7, F8 – Parieto-Medial: P3, P4), and their interactions on the latencies and amplitudes of the P1-N1-P2 cortical neural components.
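The mapping of the 10 analyzed electrodes onto the laterality and electrode-position factors can be made explicit with a small data structure. The Python sketch below does two things. It rearranges per-electrode values into a laterality × position layout, and it counts the within-subject cells of the 2 (condition) × 3 (magnitude) × 2 (laterality) × 5 (position) design. All names are hypothetical, and the sketch does not perform the ANOVA itself.

```python
LATERALITY = ("Left", "Right")
POSITIONS = ("Centro-Medial", "Temporal", "Fronto-Medial", "Fronto-Lateral", "Parieto-Medial")

# Electrode at each (laterality, position) cell, in the order of POSITIONS.
ELECTRODE_GRID = {
    "Left": ("C3", "T3", "F3", "F7", "P3"),
    "Right": ("C4", "T4", "F4", "F8", "P4"),
}

CONDITIONS = ("vocalization", "listening")
MAGNITUDES_CENTS = (100, 200, 500)
N_CELLS = len(CONDITIONS) * len(MAGNITUDES_CENTS) * len(LATERALITY) * len(POSITIONS)


def to_factor_layout(values_by_electrode: dict[str, float]) -> list[list[float]]:
    """Rearrange {electrode: value} into a nested list indexed [laterality][position]."""
    return [[values_by_electrode[e] for e in ELECTRODE_GRID[side]] for side in LATERALITY]


if __name__ == "__main__":
    print(N_CELLS)  # 60 within-subject cells per dependent measure
```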
Figure 1 shows the grand average (all subjects) ERP responses as a function of the stimulus magnitude during active vocalization and passive listening conditions. Figure 2 shows the overlaid ERP responses during active vocalization and passive listening conditions across the three PSS magnitudes. The repeated-measures ANOVA on the P1 amplitude revealed significant main effects of condition, F(1,14)=6.39, p=0.024, stimulus magnitude, F(2,28)=5.74, p=0.008, and electrode position, F(4,56)=9.51, p<0.001, as well as a significant laterality × position interaction, F(4,56)=2.55, p=0.049. The significant main effect of condition on the P1 amplitude indicated that the PSS evoked larger P1 responses during active vocalization than during passive listening.
Post-hoc tests using a Bonferroni adjustment for multiple comparisons of P1 mean amplitudes across the three PSS magnitudes revealed that P1 amplitudes for the 500 cents stimulus differed significantly from those for the 100 cents (p=0.037) and 200 cents (p=0.017) stimuli, indicating that 500 cents pitch shifts elicited larger P1 responses than the two smaller stimuli. No significant difference in P1 amplitude was found between the 100 cents and 200 cents stimuli. Post-hoc tests for the electrode position main effect revealed significantly greater positivity of P1 amplitudes in the frontal and fronto-medial regions (p=0.003) compared with other electrode sites. Figure 3a illustrates the topographical distribution of the P1 peak amplitudes over the surface of the scalp, grand averaged over three PSS magnitudes during both vocalization and passive listening conditions. Figure 4 (top row) also shows the P1 amplitude distributions in more detail for different stimulus magnitudes and conditions separately. For the P1 amplitudes grand averaged over the three PSS magnitudes and both conditions, the significant laterality × position interaction indicated that the pattern of P1 amplitude distribution differed between the left and right hemispheres. On both sides, P1 amplitudes differed significantly between centro-medial and parieto-medial (pLeft=0.002, pRight=0.025) and between fronto-medial and parieto-medial (pLeft=0.008, pRight=0.003) electrode sites. However, the hemispheric difference in distribution arose because P1 amplitudes differed significantly between fronto-medial and temporal sites on the left (p=0.003), but between fronto-lateral and parieto-medial sites on the right (p=0.043).
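For reference, the Bonferroni adjustment multiplies each raw p-value by the number of comparisons and caps the result at 1. With three PSS magnitudes, there are three pairwise comparisons. The Python sketch below is a generic illustration. The raw p-values in the example are hypothetical and are not taken from this study.

```python
from itertools import combinations

MAGNITUDES_CENTS = (100, 200, 500)
PAIRS = list(combinations(MAGNITUDES_CENTS, 2))  # [(100, 200), (100, 500), (200, 500)]


def bonferroni(p_values: list[float]) -> list[float]:
    """Bonferroni-adjusted p-values: raw p times the number of comparisons, capped at 1."""
    m = len(p_values)
    return [min(1.0, p * m) for p in p_values]


if __name__ == "__main__":
    raw = [0.004, 0.02, 0.4]  # hypothetical raw p-values, one per pair in PAIRS
    for pair, p_adj in zip(PAIRS, bonferroni(raw)):
        print(pair, round(p_adj, 3))
```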
The statistical test on the N1 peak amplitude revealed only a significant main effect of electrode position, F(4,56)=13.5, p<0.001. Post-hoc tests for this effect on the N1 amplitudes, grand averaged over all three PSS magnitudes during both vocalization and passive listening conditions, revealed significantly greater negativity in the frontal and fronto-medial regions (p=0.001) compared with other electrode sites. Figure 3b illustrates the topographical distribution of the N1 peak amplitudes over the surface of the scalp, grand averaged over three PSS magnitudes during both vocalization and passive listening conditions. Figure 4 (middle row) also shows the N1 amplitude distributions in more detail for different stimulus magnitudes and conditions separately.
For the P2 amplitude, the repeated-measures ANOVA revealed significant main effects of condition, F(1,14)=5.99, p=0.028, stimulus magnitude, F(2,28)=18.04, p<0.001, and electrode position, F(4,56)=12.86, p<0.001, as well as significant magnitude × position, F(8,112)=5.78, p<0.001, and laterality × position interactions, F(4,56)=3.83, p=0.008. The significant main effect of condition indicated that the PSS elicited larger P2 responses during active vocalization than during passive listening. Post-hoc tests using a Bonferroni adjustment revealed that P2 amplitudes differed significantly for 100 cents vs. 200 cents (p=0.036), 200 cents vs. 500 cents (p=0.035) and 100 cents vs. 500 cents (p<0.001). The significant differences between all pairs of stimulus magnitudes indicate that shifts in the auditory feedback of voice F0 elicited graded P2 responses that became larger as the pitch-shift magnitude increased. Post-hoc tests for the electrode position main effect revealed significantly greater positivity of P2 amplitudes at the central and centro-medial regions (p=0.001) compared with other electrode sites. Figure 3c illustrates the topographical distribution of the P2 peak amplitudes over the surface of the scalp, grand averaged over three PSS magnitudes during both vocalization and passive listening conditions. Figure 4 (bottom row) also shows the P2 amplitude distributions in more detail for different stimulus magnitudes and conditions separately. For the P2 amplitudes grand averaged over the three PSS magnitudes and both conditions, the significant laterality × position interaction indicated that the pattern of P2 amplitude distribution differed between the left and right hemispheres. On both sides, P2 amplitudes differed significantly between centro-medial and parieto-medial (pLeft=0.007, pRight=0.013) and temporal (pLeft<0.001, pRight=0.002) sites, and between fronto-medial and fronto-lateral (pLeft<0.001, pRight<0.001) and temporal (pLeft=0.001, pRight=0.019) sites. However, P2 amplitudes differed significantly between centro-medial and fronto-lateral sites (p=0.003) only on the left side.
The significant magnitude × position interaction for the P2 peak amplitudes indicated that the potential distribution pattern differed across PSS magnitudes. Separate ANOVAs for each stimulus magnitude, with condition, laterality and electrode position as factors, revealed a significant main effect of electrode position for the 100 cents (p=0.012), 200 cents (p<0.001) and 500 cents (p<0.001) stimuli, with the largest positivity at the centro-frontal area in each case. For the 200 cents stimulus, there was also a significant laterality × position interaction (p=0.015).
Analysis of the P1, N1 and P2 latencies revealed no significant effects. The overall mean latencies of the P1-N1-P2 complex were 73.51 ms (std=11.38 ms), 117.41 ms (std=13.43 ms) and 199.55 ms (std=23.49 ms), respectively.
Because condition had a significant main effect on P1 and P2 amplitudes, the NDI was calculated and analyzed for these two components. A two-way repeated-measures ANOVA was conducted to investigate the main effects of stimulus magnitude and laterality, and their interaction, on the P1 and P2 NDIs. The statistical tests revealed no significant effects on the P1 NDIs, and separate ANOVAs for the right and left sides confirmed that there was no significant main effect of stimulus magnitude on the P1 NDIs (Figure 5a). The ANOVA for the P2 NDIs revealed only a significant main effect of stimulus magnitude, F(2,28)=5.39, p=0.01. Pairwise comparisons using a Bonferroni adjustment showed that this effect was significant only between 100 cents and 500 cents (p=0.018). Separate ANOVAs also showed that P2 NDIs differed significantly only between 100 cents and 500 cents on both the left (F(2,28)=4.2, p=0.025) and right (F(2,28)=5.07, p=0.013) sides (Figure 5b). The significant main effect of stimulus magnitude on the P2 NDIs indicated that, although shifts in voice F0 during active vocalization elicited larger P2 responses, the difference between P2 peak amplitudes during active vocalization and passive listening was smaller for larger pitch shifts.
A recent study in monkeys suggested that the previously reported auditory suppression (Numminen and Curio, 1999, Numminen et al., 1999, Houde et al., 2002, Heinks-Maldonado et al., 2006) can lead to increased neural sensitivity to feedback perturbation during vocalization (Eliades and Wang, 2008). The evidence for such an effect came from the results of single neuron recordings showing that the suppressed auditory neurons during normal voice feedback exhibited a larger increase in their firing rates in response to PSS than neurons that were excited during normal voice feedback. The present study investigated this effect by recording ERPs in humans to test the hypothesis that self-vocalization enhances auditory responsiveness to voice F0 feedback perturbation.
The suppression-induced enhancement of auditory sensitivity to alteration in voice F0 feedback during vocalization (Eliades and Wang, 2008) implies a complex neural mechanism that involves internal modulation (re-afference projections) as well as responses to changes in feedback. Therefore, the auditory responses to feedback perturbation can be studied accurately only if the vocalization-induced modulation (suppression) does not occur at the same time as the perturbation. In the current study we proposed that presenting the PSS after the onset of vocalization can reveal neural responsiveness to perturbations in self-vocalization without the confounding influence of the simultaneous suppression at the vocalization onset. Thus, the recorded ERPs in response to PSS that occur after a time interval with respect to the vocalization onset might reduce or eliminate the previously reported suppression effect (Heinks-Maldonado et al., 2005) from the neural responses to auditory feedback pitch perturbations.
Results of the analysis revealed that P1 and P2 peak amplitudes were larger during vocalization than during listening to the same vocalizations, indicating that the auditory cortex is more responsive to feedback perturbations during active vocalization than during passive listening (Figure 2). The increase in P1 and P2 peak amplitudes during vocalization suggests that the proposed feed-forward model (Wolpert, 1997, Blakemore et al., 1998) can enhance auditory responsiveness to feedback perturbations during speaking. This effect might be an important characteristic of the speech motor control system that allows for accurate detection and correction of unintended changes in the vocal output during speech production. In fact, the observed vocalization-induced enhancement of neural responsiveness suggests that the hypothesized re-afference projection might be involved in fine-tuning the auditory cortex to improve its ability to detect feedback deviations during online voice regulation. Eliades and Wang (2008) attributed this effect to changes in the tuning properties of auditory neurons (e.g. increasing their dynamic range) resulting from vocal motor system activity. Although the neural mechanisms of such a phenomenon are poorly understood, the vocalization-induced increase in auditory sensitivity (Eliades and Wang, 2008) and responsiveness (current study) might be an indication of enhanced neural processing of sensory information to mediate feedback-based monitoring and control of voice F0.
Results of the current experiment did not show any significant difference between N1 peak amplitudes across vocalization and passive listening conditions. We suggest that this inconsistency with previous findings (Numminen and Curio, 1999, Numminen et al., 1999, Houde et al., 2002, Heinks-Maldonado et al., 2006) may be due to differences between the experimental procedures of these studies. The temporal separation of vocalization and PSS onsets in the current experiment allows independent recording of the neural responses to the onset of the feedback perturbation, whereas in the previous experiments the neural responses to feedback perturbation may have been masked by the suppression of the auditory cortex at the onset of vocalization. In the previous experiments, because vocalization and feedback perturbation began together, the subject's auditory feedback was altered as soon as vocalization was initiated. Therefore, the post-stimulus neural activity during vocalization reflected both the suppressive effect of the vocal motor system on the auditory neurons and the changes in auditory cortex activity in response to the feedback alteration. In other words, the post-stimulus responses in those cases reflected the sum of the vocalization-induced inhibition (suppression) and the stimulus-evoked excitation reported by Eliades and Wang (2008).
The absence of the vocalization-induced suppression in the current experiment does not challenge or reject the previous findings regarding the suppression of auditory cortex during vocalization. Although the recorded ERPs in the current experiment might not reflect the suppression effect during vocalization onset, we propose that the previously reported auditory suppression (Numminen et al., 1999, Houde et al., 2002, Heinks-Maldonado et al., 2006) causes the enhancement of neural responsiveness observed in the current study. This proposal is based on the recent finding in primate auditory cortex that during normal feedback only the suppressed neurons exhibited increased neural activity to feedback perturbation (Eliades and Wang, 2008). Vocalization-induced suppression was hypothesized to be responsible for adjusting tuning properties of the auditory neurons in order to increase their sensitivity to unexpected feedback alteration during vocalization (Eliades and Wang, 2008).
The effect of the PSS magnitude on auditory neural responses was also investigated during active vocalization and passive listening to the playback. The results of the analysis showed that manipulation of the PSS magnitude modulates the amplitudes of the P1 and P2 neural components during both conditions. The increase in the amplitudes of the P1 and P2 peaks for larger pitch shifts indicates that larger voice F0 feedback deviations increase the auditory cortex responsiveness during both vocalization and passive listening. This modulation suggests that the underlying neural substrates giving rise to these two peaks might be involved in detecting and processing the perturbation magnitude in voice F0 feedback. The effect of the stimulus magnitude on neural responses was more pronounced in P2 for which the response amplitudes were graded according to the magnitudes of the PSS (Figure 1). P1 response magnitudes were significantly different between 500 cents stimuli and the two other stimuli, but no significant difference between the P1 responses to 100 cents and 200 cents stimuli was found. These observations suggest that larger feedback deviations might be needed to modulate the responsiveness of the neural substrates giving rise to P1.
The order of conditions (vocalization before listening) could also explain the apparent vocalization-induced enhancement of neural responsiveness to feedback perturbation. For each PSS magnitude, subjects experienced the vocalization condition before passive listening, so the stimulus may have been novel during vocalization but more familiar during listening. However, this factor is unlikely to fully account for the observed effect, because the stimulus would have been novel during vocalization only for the first stimulus magnitude presented, not for all three. Moreover, enhanced neural responsiveness was observed for all three PSS magnitudes even though the order in which stimulus magnitudes were presented was randomized across subjects.
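The design argument in the preceding paragraph can be sketched as a per-subject session plan. In this hypothetical Python sketch, the names `session_plan` and `PSS_CENTS`, the seeding scheme, and the assumption that each vocalization block is immediately followed by listening to its own playback are illustrative choices, not details taken from the study's methods. The PSS order is shuffled per subject while vocalization always precedes listening within each magnitude.

```python
import random

PSS_CENTS = (100, 200, 500)  # PSS magnitudes used in the study


def session_plan(subject_id: int, seed: int = 0) -> list[tuple[int, str]]:
    """Return an ordered list of (pss_cents, condition) blocks for one subject.

    Hypothetical layout: PSS order is randomized per subject, and within each
    magnitude the vocalization block precedes the listening (playback) block.
    """
    rng = random.Random(seed * 1000 + subject_id)  # reproducible per-subject order
    order = list(PSS_CENTS)
    rng.shuffle(order)
    plan = []
    for cents in order:
        plan.append((cents, "vocalization"))
        plan.append((cents, "listening"))  # playback of the block just recorded
    return plan


for subject in range(3):
    print(subject, session_plan(subject))
```

Only the first block in each plan presents a pitch-shift stimulus the subject has not experienced before, which is why a pure novelty account would predict enhancement for one magnitude per subject rather than for all three.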
In addition to the significant main effects of PSS magnitude and condition on P1 and P2 peak amplitudes, results revealed a significant main effect of electrode position on the peak amplitudes of the P1-N1-P2 complex. This effect appeared as a significant positivity over the frontal and fronto-medial region for P1, followed by a significant negativity over roughly the same region for N1, which eventually shifted posteriorly toward the centro-frontal region as a significant positivity for P2 (Figures 3 and 4). This dynamic flow of the potential distribution over the scalp suggests several stages of feedback processing in different auditory-related areas of the brain. Previous studies have suggested that several neural generators might contribute to the auditory P1-N1-P2 components, reflecting auditory processing at multiple stages. The P1 component has been proposed to have dominant generators in the primary auditory cortex in the superior temporal gyrus (STG), where the early cortical detection of any change in the auditory input occurs (Burkard et al., 2006). Numerous studies have attributed the N1 component to generators in higher (e.g. secondary) auditory cortical areas and the upper bank of the Sylvian fissure in the temporal lobe (Hari et al., 1980), as well as in frontal cortical areas (Naatanen and Picton, 1987). The N1 is considered an index of pre-attentive auditory processing, reflecting automatic change detection based on comparing incoming auditory feedback with the memory trace of previous sensory input to the auditory system (Naatanen, 1991).
A recent study of feedback-based error monitoring during musical performance suggested that the P2 component might reflect auditory mismatch detection during a motor task (Katahira et al., 2008); it is assumed to arise from the anterior cingulate cortex (ACC), triggered by the basal ganglia when subjects notice their own motor errors. This assumption is supported by an fMRI study that investigated the neural substrates of vocal pitch regulation during singing (Zarate and Zatorre, 2008). In the passive listening condition of the present study, however, the P2 component can be interpreted as the cognitive control-related auditory component that is thought to be generated in the ACC by template mismatches and that therefore has a scalp distribution similar to that of the P2 during motor tasks (Folstein and Van Petten, 2008). The higher responsiveness to larger feedback deviations, together with the modulation of P1 and P2 peak amplitudes in the current study, suggests that the neural substrates giving rise to these potentials are involved in detecting the magnitude of auditory perturbations, and that their sensitivity to changes in auditory feedback can be modulated by the vocal motor system during voice F0 control.
Although P1 and P2 peak amplitudes were modulated by PSS magnitude in both the vocalization and listening conditions, the significant main effect of condition revealed that neural responses were larger during vocalization than listening. The NDI was then calculated as the difference between response peak amplitudes for vocalization and listening, to measure the vocalization-induced increase in auditory responsiveness to feedback perturbation. NDIs were also compared across voice F0 feedback perturbation levels to investigate how the extent of feedback deviation affects this vocalization-induced enhancement. Results revealed a significant main effect of PSS magnitude on P2 NDIs, with significantly larger NDIs for the 100 cents than for the 500 cents stimulus, indicating that the vocalization-induced increase in neural responsiveness is greater for smaller feedback perturbations (Figure 5). This finding suggests that self-vocalization has a greater effect on the human auditory system when the deviated feedback signal is closer to one’s own voice F0. In other words, even very small deviations around a target pitch level might be significant from the system’s standpoint, whereas a larger deviation might be negligible because it is recognized as not belonging to the speaker. This characteristic may enable the system to apply corrective motor commands only when the feedback signal is identified as self-generated. Increasing feedback deviation might therefore systematically reduce neural responsiveness to perturbations, because the feedback no longer closely matches the intended output (being either a highly deviated self-generated signal or an externally generated one). However, although suppression is suggested to play a major role in fine-tuning the auditory cortex, how motor-induced suppression and increased sensitivity to perturbations relate to each other is not clearly understood.
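The NDI, as described in this paragraph, is a difference of peak amplitudes. The Python sketch below computes it per subject and averages it within each PSS magnitude; the function names, the input layout, and the use of a simple mean are assumptions for illustration, and any additional normalization applied in the study's methods is not reproduced here.

```python
from statistics import mean


def ndi(vocal_peak_uv: float, listen_peak_uv: float) -> float:
    """Vocalization-minus-listening peak amplitude (µV) for one subject and PSS."""
    return vocal_peak_uv - listen_peak_uv


def mean_ndi_by_pss(peaks: dict[int, list[tuple[float, float]]]) -> dict[int, float]:
    """Average the difference index within each PSS magnitude.

    `peaks` maps a PSS magnitude in cents to a list of
    (vocalization_peak, listening_peak) pairs, one pair per subject.
    """
    return {
        cents: mean(ndi(voc, lis) for voc, lis in pairs)
        for cents, pairs in peaks.items()
    }
```

A positive value indicates a larger response during vocalization than during listening. The statistical comparison across PSS magnitudes reported in the text would be run on the per-subject values, not only on these group means.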
Larger increases in neural responsiveness to smaller feedback alterations, measured using the normalized difference between neural peak amplitudes during active vocalization and passive listening, indicate that the integration of audio-vocal modalities in humans might provide a mechanism for stabilizing the voice against small perturbations. Accordingly, higher neural sensitivity to feedback perturbation during active vocalization might indicate that the audio-vocal system exerts robust control over the structure of the self-produced voice against the disruptive effects of external sounds. This suggestion is supported by a recent finding that human subjects showed proportionally greater compensation for smaller changes in voice F0 feedback when the magnitude of the vocal responses was expressed as a percentage of the PSS magnitude (Liu and Larson, 2007). Moreover, greater relative responsiveness to smaller deviations might indicate that the speech motor control system largely corrects for the disparity (error) between one’s own voice and its feedback as long as the feedback signal is recognized as self-produced (i.e. no or only a small unintended deviation). It is therefore possible that distinguishing between self- and externally generated sounds is a critical issue for the audio-vocal system during speaking.
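To see why expressing responses as a percentage of the PSS can reverse the apparent ordering of response sizes, consider the small Python sketch below. The response magnitudes in the usage lines are made-up numbers chosen only to illustrate the arithmetic; they are not data from Liu and Larson (2007) or from the present study, and the function name is hypothetical.

```python
def relative_compensation_percent(response_cents: float, pss_cents: float) -> float:
    """Express a vocal compensation response as a percentage of the PSS magnitude.

    Absolute values are taken so that only the size of the response is
    compared, whatever its direction relative to the shift.
    """
    if pss_cents == 0:
        raise ValueError("PSS magnitude must be non-zero")
    return 100.0 * abs(response_cents) / abs(pss_cents)


# Hypothetical responses: larger in absolute cents for the +500 cents shift,
# yet smaller relative to the stimulus.
print(relative_compensation_percent(-15.0, 100.0))  # 15.0
print(relative_compensation_percent(-30.0, 500.0))  # 6.0
```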
In the present study, the effect of voice F0 feedback perturbation on cortical neural responses was investigated during active vocalization and passive listening. Results showed that the auditory cortex is more responsive to changes in auditory feedback during vocalization than when subjects passively listen to the playback of their own voice. This phenomenon was suggested to result from the internal modulation of auditory neurons (re-afference theory), which provides a mechanism that increases neural sensitivity for detecting deviations in self-generated voice feedback and helps the vocal motor system correct for them. The vocalization-induced enhancement of neural responsiveness was also greater for smaller F0 perturbations in voice feedback. This finding suggests that the auditory cortex is more sensitive to smaller feedback alterations, such as those resulting from natural fluctuations in self-produced vocalizations. The audio-vocal system may carry information about the predicted vocal output that enables it to distinguish between self- and externally generated sounds. This effect might be an important characteristic of the vocal production system for monitoring self-vocalizations, helping to detect and correct vocal errors during speech production.
This research was supported by a grant from NIH, Grant No. 1R01DC006243. The authors wish to thank Chun Liang Chan for his help with the computer programming.