Sensory responses to stimuli that are triggered by a self-initiated motor act are suppressed when compared with the response to the same stimuli triggered externally, a phenomenon referred to as motor-induced suppression (MIS) of sensory cortical feedback. Studies in the somatosensory system suggest that such suppression might be sensitive to delays between the motor act and the stimulus-onset, and a recent study in the auditory system suggests that such MIS develops rapidly. In three MEG experiments, we characterize the properties of MIS, by examining the M100 response from the auditory cortex to a simple tone triggered by a button press. In Experiment 1, we found that MIS develops for zero-delays but does not generalize to non-zero delays. In Experiment 2, we found that MIS developed for 100 ms delays within 300 trials and occurs in excess of auditory habituation. In Experiment 3, we found that unlike MIS for zero-delays, MIS for non-zero delays does not exhibit sensitivity to sensory, delay or motor-command changes. These results are discussed in relation to suppression to self-produced speech and a general model of sensory motor control.
A key goal in neuroscience is understanding the complex interplay between the brain’s sensory and motor systems, and a phenomenon where this interplay is readily observed is the suppressed sensory response to self-produced sensations. In human auditory cortex, this suppression is observed during speech (speaking-induced suppression, or SIS), where it manifests properties that elucidate how auditory feedback is processed: a speaker’s auditory cortex responds to the sound of his own speech with an activation that is suppressed compared with a greater response during passive listening to playback of the speech (Eliades & Wang, 2003; Houde, Nagarajan, Sekihara, & Merzenich, 2002). Such suppression is highly specific to the auditory speech feedback: responses to additional tone pips occurring during speech are not suppressed beyond that expected for acoustic masking, and if the subject’s auditory feedback is artificially altered, the response to speech is restored to the same levels observed during passive listening (Houde et al., 2002). This suppression profile suggests that the auditory cortex compares incoming auditory feedback to a prediction of expected feedback. Such a comparison is crucial as the brain is continuously assailed with sensory stimuli originating both externally and internally (self-produced) and it is necessary to accurately and continuously distinguish self-produced stimuli, which can generally be discarded, from external stimuli, which might be necessary for proper interaction with the environment.
It has been postulated that this distinction is guided by a central monitor (Frith, 1992) or an internal forward model (Sarah-Jayne Blakemore, Rees, & Frith, 1998; Wolpert, 1997; Wolpert, Ghahramani, & Jordan, 1995) which learns and predicts sensory consequences of self-produced actions by using a copy of the neural signals (variously referred to as “efference copy” (von Holst, 1954) or “corollary discharge” (Sperry, 1950)) controlling the vocal tract muscles that produce speech. During speech, the efference copy enables the forward model to produce an accurate prediction of auditory feedback, resulting in a small prediction error, which translates to a minimal response in the auditory cortex. During passive listening, where the efference copy is unavailable, the forward model is unable to generate an accurate prediction of auditory feedback, resulting in a larger prediction error, which translates to a larger response in the auditory cortex. Under this hypothesis, a larger prediction error can be artificially created during speech by distorting the auditory feedback perceived by the subject, which again translates to a larger response in the auditory cortex.
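The prediction-error account above can be illustrated with a toy sketch. The function names, the scalar "sound level" representation, and the convention that a missing efference copy yields a prediction of 0.0 are all assumptions made for illustration; this is not a model taken from the cited work.

```python
def predict_feedback(efference_copy, learned_mapping):
    """Hypothetical forward model: map a motor command to expected feedback.

    When no efference copy is available (passive listening), this sketch
    assumes that no prediction can be formed and returns 0.0.
    """
    if efference_copy is None:
        return 0.0
    return learned_mapping[efference_copy]


def auditory_response(feedback, prediction, gain=1.0):
    """Toy cortical response, proportional to the absolute prediction error."""
    return gain * abs(feedback - prediction)


# Hypothetical learned mapping from a motor command to its usual sound level
# (arbitrary units).
mapping = {"say_ah": 1.0}

# Speaking: efference copy available, feedback matches the prediction.
speaking = auditory_response(1.0, predict_feedback("say_ah", mapping))
# Passive listening: same sound, but no efference copy.
listening = auditory_response(1.0, predict_feedback(None, mapping))
# Speaking with artificially altered feedback: prediction no longer matches.
altered = auditory_response(1.6, predict_feedback("say_ah", mapping))
```

In this sketch the response during speaking is smallest, and altering the feedback raises it again. The magnitudes are arbitrary and carry no quantitative meaning.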
A similar suppression phenomenon has been observed in the somatosensory system where responses to self-produced tactile stimuli are weaker relative to externally generated tactile stimuli (Sarah-J. Blakemore, Wolpert, & Frith, 1998, 2000), and such suppression is sensitive to delays in stimulus delivery. These observations suggest that, akin to the auditory cortex, the somatosensory cortex processes sensory feedback by comparing it against an efference copy based prediction of said sensory feedback. The similarities in suppression phenomena observed in the auditory and somatosensory cortex suggest that suppression observed in the auditory cortex during speech is not unique to the act of speaking, but instead a special case of a more general property of the auditory cortex: that it processes auditory feedback from any motor act by comparing incoming feedback against a prediction of that feedback derived from an efference copy of the motor command that produced the feedback, where this comparison results in motor-induced suppression, or MIS.
Indeed, the auditory cortex does exhibit suppression for a more arbitrary pairing of a motor act and auditory stimulus: electroencephalography (EEG) and magnetoencephalography (MEG) experiments have demonstrated that the auditory response to self-triggered tones is suppressed relative to the response while passively listening to the same tones. Schafer and Marcus demonstrated that the vertex EEG response was attenuated for self-generated auditory stimuli when compared to machine generated stimuli (Schafer & Marcus, 1973). More recently, Martikainen et al. reported similar findings, where the MEG responses arising from the auditory cortex were attenuated for self-triggered tones, and that such an attenuation develops rapidly within a block of 60 trials (Martikainen, Ken-ichi, & Hari, 2005).
If MIS in the auditory cortex arises in the same way that SIS is hypothesized to arise – i.e., from a comparison with an efference copy derived prediction – then it should have characteristics resulting from the properties of predictions. First, MIS in the auditory cortex should be a learned phenomenon – i.e., it should not be immediately present, but require practice trials to develop. This follows from the hypothesis that MIS arises from comparison with a prediction, and that the prediction comes from a forward model that must be learned. Second, a full sensory prediction should have two dimensions: it should specify not only what the predicted sensation will be (e.g., a tone), but also when the sensation will arrive (e.g., the tone is heard X ms after the button press). Arrival time should be an important part of learning a prediction since there are intrinsic time delays in the processes between the efferent motor command going out (e.g., to the finger muscles, which have a response latency), and the transduced sensory feedback coming in (e.g., via the auditory pathway to the auditory cortex, which includes transmission and synapse delays). Two predictions about MIS in the auditory cortex follow from this. First, MIS in the auditory cortex should be sensitive to feedback delays, as indeed appears to be true for MIS in the somatosensory cortex (Sarah-J. Blakemore, Frith, & Wolpert, 1999; Sarah-J. Blakemore et al., 2000). Second, MIS should develop for different, artificially produced, feedback delays – a prediction that is easier to test with button-generated tones than with speaking. This follows from the assertion that arrival time is an essential component of a prediction, and that due to intrinsic neural processing delays, a non-zero arrival time must be learned even without additional artificial feedback delay.
We therefore hypothesized that, with training, an internal forward model can be recruited within an experimental session, and that the sensory responses to the sensory consequences of a self-triggered action (a tone resulting from a button press) could be suppressed – at least in part – over time. Thus, we predicted that auditory responses to a tone triggered by a button press would become weaker in intensity relative to an external (untriggered) tone as the experiment progressed – i.e., that MIS would develop, provided the correspondence between button presses and the subsequent tone is learned. In the present study, we characterized the properties of MIS in three MEG experiments. In Experiment 1, we examined whether MIS in response to tones triggered by a button press is a learned phenomenon, and we further explored the specificity of this button-tone MIS for time-delays between the motor-act (the button press) and the sensory-stimulus (the tone feedback). We confirmed in Experiment 1 that MIS in response to tones triggered by a button press is a learned phenomenon. We also found that MIS learned at zero-delay does not generalize to non-zero delays between the motor-act and sensory stimuli, nor does it extend to the right hemisphere, prompting speculation about MIS learned at zero-delay. In Experiment 2, we investigated the hypothesis that non-zero delays between the motor act and the resulting sensory stimuli can induce MIS, and we showed that the observed suppression is independent of adaptation or habituation processes. Finally, since Experiment 1 showed that MIS learned for a zero tone delay does not generalize to other delays and does not develop in the right hemisphere, the question arises as to whether MIS learned for other delays possesses similar characteristics. Thus, in Experiment 3, we investigated the hypothesis that MIS learned with non-zero tone delays exhibits specificity to the learned sensory stimulus, motor act and delay.
We found that MIS learned with non-zero delay is not specific to sensory stimuli or motor-act and does generalize to other delay conditions.
The MEG M100 response was measured in thirteen healthy right-handed subjects (8 males, 5 females; aged 20–40 years). All subjects provided informed consent as approved by the Committee on Human Research at our institution. The experiment consisted of 6 blocks: a training block, 4 test blocks and a control block. In the training block, subjects pressed a plastic button with their right thumb at a self-paced rate of 0.5 Hz and heard a simple tone (1 kHz, 100 ms long, 5 ms rise/fall ramp, 80 dB SPL, binaural), immediately after the button press at 0 ms delay. Subsequent to this training block, in 4 test blocks subjects pressed a button and heard a simple tone at delays of either 0 (delay0s), 100 (delay0.1s), 300 (delay0.3s) or 500 (delay0.5s) ms. Note that the 0 ms block was identical to the training block. Finally in a control block, subjects passively listened to a simple tone, identical to the tone in the training and test blocks, presented once every 2 seconds. The training block was presented first and the control block last. The order of the intervening four test blocks was randomized across subjects. Each block consisted of 100 trials and a short break of 1–2 minutes was provided between blocks.
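As a minimal sketch of the block ordering just described (training first, control last, the four test blocks shuffled per subject), the following uses only Python's standard library. The function name and the seeding scheme are hypothetical and are not taken from the study's presentation software.

```python
import random


def experiment1_block_order(rng):
    """Return one subject's block order for Experiment 1.

    The training block comes first, the passive-listening control block
    comes last, and the four delay test blocks are shuffled in between.
    """
    test_blocks = ["delay0s", "delay0.1s", "delay0.3s", "delay0.5s"]
    rng.shuffle(test_blocks)  # randomized across subjects
    return ["training"] + test_blocks + ["control"]


# One hypothetical subject; a per-subject seed keeps the order reproducible.
order = experiment1_block_order(random.Random(0))
```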
Seven healthy right-handed subjects (4 males, 3 females; aged 22–40) participated in this experiment with informed consent. Five of these subjects also participated in Experiment 1. Experiment 2 also consisted of 6 blocks: 2 control blocks and 4 training blocks. The first and sixth blocks were controls, in which subjects passively listened to a simple tone presented once every 2 seconds, as in the control block of Experiment 1. Blocks 2–5 were identical training blocks in which subjects pressed a button and heard a tone after a constant 100 ms delay (delay0.1s). Each block consisted of 100 trials.
Thirteen healthy right-handed subjects (8 males, 5 females; aged 20–40) participated in this experiment with informed consent, 5 of whom also participated in Experiments 1 and 2. The experiment consisted of 4 training blocks followed by 4 test blocks and a control block. The four training blocks were identical to the training blocks of Experiment 2, thus also serving as a replication: subjects pressed a button with their right thumb once every 2 seconds and heard a simple tone after a 100 ms delay (delay0.1s). Subsequent to these four training blocks, four test blocks were conducted. In one test block (motor), subjects pressed the button with their left thumb and heard the 1 kHz tone after a 100 ms delay; the hand switch constitutes an alteration in the motor act. In a second test block (sensory), subjects pressed the button with their right thumb and heard a tone at a 100 ms delay but with a different frequency (0.5 kHz); the lower-frequency tone constitutes an alteration in the sensory stimulus. In the two other test blocks, subjects pressed the button with their right thumb and the delay between the button press and the tone was changed to either 0 ms or 200 ms (delay0s and delay0.2s), while the frequency of the tone was fixed at 1 kHz. In a final control block, subjects passively listened to simple tones with frequencies of 0.5 kHz or 1 kHz, presented in random order once every 2 seconds. The order of presentation of the four test blocks was randomized across subjects. Each training and test block consisted of 100 trials. The control block consisted of 200 trials, with 100 randomly distributed trials for each tone frequency.
Magnetoencephalographic recordings (band-pass filtered from 0 to 300 Hz, sampling rate 1200 Hz) were obtained from the whole head in a magnetically shielded room using an Omega 275 biomagnetometer (VSM MedTech Inc., Port Coquitlam, Canada). We were particularly interested in the evoked M100 response, which typically occurs ~100 ms post stimulus (Farrell, Tripp, Norgren, & Teyler, 1980; Hari, Aittoniemi, Järvinen, Katila, & Varpula, 1980). Therefore we created epochs time-locked to the auditory stimulus (−300 ms to 500 ms). For each block, 100 responses were epoched offline using CTF MEG software (VSM MedTech Inc., Port Coquitlam, Canada). During MEG recordings, subjects were fitted with position indicator sensors at anatomic landmarks (nasion, right and left auricle). These sensors were used to quantify motion and to align the MEG and magnetic resonance imaging (MRI) coordinate systems. Structural MR images were generally acquired on a 1.5T GE Signa scanner (GE Healthcare, Milwaukee, WI) using a T1-weighted three-dimensional gradient-echo sequence (3D-SPGR, 3D-spoiled gradient recalled acquisition in a steady state): flip angle = 40°, TR/TE = 27/6 ms, matrix = 256×256, slice thickness = 1.5 mm.
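The epoching step can be sketched as follows, using the sampling rate (1200 Hz) and window (−300 ms to 500 ms) given above. The study used CTF MEG software for this step; the function names here are hypothetical, and the rule of dropping epochs that run past the recording edges is an assumption made for illustration.

```python
def epoch_sample_range(stim_sample, fs_hz=1200, tmin_s=-0.3, tmax_s=0.5):
    """Convert a stimulus-onset sample index into (start, stop) sample indices.

    `stop` is exclusive, so the epoch holds (tmax_s - tmin_s) * fs_hz samples.
    """
    start = stim_sample + round(tmin_s * fs_hz)
    stop = stim_sample + round(tmax_s * fs_hz)
    return start, stop


def extract_epochs(signal, stim_samples, fs_hz=1200, tmin_s=-0.3, tmax_s=0.5):
    """Cut one single-channel recording into stimulus-locked epochs.

    Epochs that would extend beyond the recording are skipped (an assumption
    of this sketch).
    """
    epochs = []
    for stim in stim_samples:
        start, stop = epoch_sample_range(stim, fs_hz, tmin_s, tmax_s)
        if start >= 0 and stop <= len(signal):
            epochs.append(signal[start:stop])
    return epochs
```

With these defaults each epoch spans 960 samples, of which the first 360 precede tone onset.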
M100 responses were evaluated separately for the left and right hemisphere in each subject for each block. Root mean square (RMS) M100 amplitudes and latencies were derived from waveforms of sensors located in the left and right temporal regions. Sources of the M100 (Q) were estimated as equivalent current dipoles (ECDs) using CTF MEG software (VSM MedTech Inc., Port Coquitlam, Canada). A spherical head model was used and optimized based on MR images. ECDs explaining the most dominant signals from left and right temporal regions were determined for the block with the best signal-to-noise ratio, which was usually the control block, with goodness-of-fit over 80%. Once found, these dipoles were used to model the responses of the other blocks, keeping the location and orientation of the dipoles fixed while permitting the dipole moment strengths to vary temporally and across conditions (Hämäläinen, Hari, Ilmoniemi, Knuutila, & Lounasmaa, 1993). The M100-RMS values and the M100-Q, i.e. the dipole moment amplitude corresponding to the M100 response, were then subjected to statistical analysis across experimental conditions. Subjects for whom reliable sources could not be estimated were eliminated from statistical analysis. We present both the M100-RMS and M100-Q results for each experiment. The former is an assumption-free, model-independent measure of activity in the sensor array that, although dominated by auditory cortical sources, could also contain contributions from non-auditory sources. The M100-Q is a model-dependent measure of response strength and is dominated by auditory cortical activity restricted to a single dipole for each hemisphere’s auditory cortex.
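A minimal sketch of the sensor-level RMS measure follows: the RMS is taken across a set of temporal sensors at each time sample, and the peak is then read out within a latency window around 100 ms. The 60 to 140 ms search window and the function names are assumptions for illustration; the text does not state the window actually used.

```python
import math


def rms_across_sensors(sensor_waveforms):
    """Return the RMS across sensors at each time sample.

    `sensor_waveforms` is a list of equal-length per-sensor averaged waveforms.
    """
    n_sensors = len(sensor_waveforms)
    n_samples = len(sensor_waveforms[0])
    return [
        math.sqrt(sum(w[t] ** 2 for w in sensor_waveforms) / n_sensors)
        for t in range(n_samples)
    ]


def peak_in_window(rms, times_ms, lo_ms=60.0, hi_ms=140.0):
    """Return (amplitude, latency_ms) of the largest RMS value in the window.

    The window limits are hypothetical defaults. An empty window raises
    ValueError from max().
    """
    candidates = [(v, t) for v, t in zip(rms, times_ms) if lo_ms <= t <= hi_ms]
    amplitude, latency_ms = max(candidates)
    return amplitude, latency_ms
```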
Statistical analysis was performed using SPSS version 10.0 (SPSS Inc., Chicago IL). Whenever possible, repeated measures Analysis of Variance (ANOVA) was performed on RMS amplitudes, latencies and dipole source strengths in both hemispheres with block as the repeated measure. Post hoc tests and t-tests were performed for specific contrasts. Statistical significance was set at p < .05, except where corrections for multiple comparisons were made.
Figures 1a and 1b illustrate representative dipole localizations during the control (Tone Alone) block in Experiment 1. Temporal region sensors were used in estimating current dipoles. Current dipoles generally localized to the supratemporal auditory cortex as expected (Picton et al., 1999; Reite et al., 1994). Figures 2a and 2b display representative left hemisphere sensor waveforms during the control and zero-delay test blocks respectively, in Experiment 1. The figures show the development of MIS: sensor waveform amplitudes are reduced during the zero-delay test block (where subjects push a button and then hear a tone) relative to the control block. This effect was not observed in the right hemisphere, as can be seen in Figures 2c and 2d, which display right hemisphere sensor waveforms during the control and zero-delay test blocks respectively. The RMS amplitude and dipole source strength timecourses also document MIS development. Figures 2e and 2f show left hemisphere RMS amplitude and source strength timecourses for the control and zero-delay test block in a representative subject during Experiment 1. The figures show that the RMS amplitude and source strength timecourses corresponding to the zero-delay block are reduced relative to those for the control block. As with the sensor waveforms, this effect was not observed in the right hemisphere (Figures 2g and 2h).
A repeated measures ANOVA revealed significant differences between test blocks for the source strength (M100-Q) (F(3, 27) = 5.049, p < .007) and sensor waveform magnitude (M100-RMS) (F(3, 33) = 7.122, p < .001) of the M100 in the left hemisphere. Post hoc testing with Bonferroni adjustment for multiple comparisons showed the zero-delay block to be significantly suppressed relative to the control block for M100-Q (M = 18.56 %, SE = 6.15, p < .05). Note: M designates mean and SE designates standard error. Suppression (normalized difference from the control block) for M100-Q and M100-RMS responses in the training and test blocks is displayed in Figures 3a and 3b. M100-Q results (Figure 3a) demonstrate a delay-tuning pattern of decreasing suppression with increasing delay and document MIS development during the zero-delay block in the left hemisphere. M100-RMS results (Figure 3b) also reveal pronounced suppression at zero-delay in the left hemisphere. No significant differences were observed between the training and control blocks for either M100-Q or M100-RMS, suggesting that at least one training block is necessary to develop MIS.
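Suppression is described above as the normalized difference from the control block. One plausible reading, assumed here rather than confirmed by the text, is the control-minus-test difference expressed as a percentage of the control response. The amplitudes in the example are made up.

```python
def percent_suppression(control, test):
    """Suppression as a percentage of the control-block response.

    Positive values mean the test response is smaller than the control
    response; negative values mean it is larger.
    """
    if control == 0:
        raise ValueError("control response must be non-zero")
    return 100.0 * (control - test) / control


# Hypothetical dipole strengths in arbitrary units (not data from the study).
example = percent_suppression(control=50.0, test=40.72)
```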
Although latencies were different between right and left hemispheres (F(1, 11) = 10.18, p < .002), no latency differences were observed across blocks.
It is unclear from the results of Experiment 1 whether MIS can be developed for non-zero delays. Here, we investigated the timecourse of MIS development for a 100 ms button-to-tone delay. We also sought to ascertain that MIS is “true” suppression, distinct from adaptation or habituation processes. Since control blocks were conducted preceding and succeeding training blocks, any differences between the control blocks would be considered generalized adaptation. Repeated measures ANOVA on M100-Q data revealed significant differences between the training and control blocks in the left (F(4, 20) = 5.503, p < .004) and right (F(4, 24) = 6.855, p < .001) hemispheres. For M100-RMS data, the assumption of sphericity was not met, so Huynh-Feldt corrections were applied. Results show significant differences between training and control blocks in the left (F(2.910, 17.461) = 7.110, p < .003, partial η² = .542) and right (F(1.7934, 10.759) = 5.953, p < .02, partial η² = .498) hemispheres.
To assay “true” suppression, we corrected for adaptation by subtracting the difference between the first and last control blocks from the observed suppression to each block. Upon correcting for adaptation, training and control blocks no longer differed significantly in the right hemisphere, but still differed significantly in the left hemisphere both for M100-Q (F(5, 30) = 3.043, p < .03) and for M100-RMS (F(5, 30) = 4.611, p < .003). Post hoc testing with modified Bonferroni correction (Hochberg, 1988; Holland & Copenhaver, 1998; Holm, 1979; Hommel, 1988; Jaccard & Wan, 1996; Seaman, Levin, & Serlin, 1991) confirmed training blocks to be significantly suppressed relative to the control block, as detailed in Figures 4a and 4b for M100-Q and M100-RMS, respectively. Results show that within four blocks of training with non-zero delays, 34% MIS is observed in M100-Q and 31% MIS is observed in M100-RMS. We contrasted the difference between the two control blocks with the difference between the first control block and the last training block (where MIS is maximally developed) and found this difference to be significant in the left hemisphere both for M100-Q (t(6) = 4.54, p < .002) and M100-RMS (t(6) = 2.25, p < .04). We observed only 10% adaptation in M100-Q and 14% adaptation in M100-RMS, suggesting that adaptation can account for only a portion (less than 50%) of MIS. Interestingly, neither M100-Q nor M100-RMS results showed a statistical difference between the control blocks and the first training block. This finding agrees with the finding of Experiment 1 that MIS requires at least one block of training to develop.
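The adaptation correction amounts to a subtraction, which can be checked against the rounded percentages reported above (34% and 31% MIS, 10% and 14% adaptation). The function and variable names are hypothetical.

```python
def adaptation_corrected(suppression_pct, adaptation_pct):
    """Subtract the control-to-control change from the observed suppression."""
    return suppression_pct - adaptation_pct


# Rounded left-hemisphere values from the text: (observed MIS %, adaptation %).
reported = {"M100-Q": (34.0, 10.0), "M100-RMS": (31.0, 14.0)}

# Suppression remaining once adaptation is removed.
corrected = {m: adaptation_corrected(s, a) for m, (s, a) in reported.items()}

# Fraction of the observed suppression that adaptation could explain.
adaptation_share = {m: a / s for m, (s, a) in reported.items()}
```

On these numbers the corrected suppression is 24 percentage points for M100-Q and 17 for M100-RMS. Adaptation's share is about 0.29 and 0.45 respectively, consistent with the statement that it accounts for less than half of the effect.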
Having established that MIS develops with a non-zero delay between button and tone, in Experiment 3 we examined the specificity of non-zero delay MIS. The beginning of Experiment 3 was like Experiment 2: four training blocks with a 1 kHz tone coming 100 ms after each button press. Then, in four test blocks, we varied the motor act (left instead of right thumb pushing the button in the motor block), sensory stimulus (a 0.5 kHz instead of 1 kHz tone in the sensory block) and tone delays (delay0s, delay0.2s), examining MIS in each of these blocks. Due to insufficient degrees of freedom (missing and unreliable data for individual blocks in three subjects), we were unable to test MIS specificity using an ANOVA model. We opted to test MIS (normalized suppression) for each block against a null hypothesis that suppression is zero, using one-tailed t-tests with modified Bonferroni corrections (Hochberg, 1988; Holland & Copenhaver, 1998; Holm, 1979; Hommel, 1988; Jaccard & Wan, 1996). Figures 5a and 5b show suppression in the training (averaged) and test blocks in Experiment 3. We compared the average of the training blocks with the control block and found the training blocks to be suppressed for M100-Q (t(8) = 3.44, p < .005) and M100-RMS (t(7) = 2.23, p < .04) in the left hemisphere, replicating our finding in Experiment 2 that MIS extends to non-zero delays. In four subjects, MIS was not observed after four blocks of training and data from these subjects were excluded from further analysis of MIS specificity – effectively bringing the number of subjects in Experiment 3 to nine.
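The "modified Bonferroni" corrections cited above cover a family of sequential procedures. As one illustration, the sketch below implements the Holm (1979) step-down procedure. The text does not say which variant was applied, so this choice is an assumption, and the p-values in the usage note are made up.

```python
def holm_reject(p_values, alpha=0.05):
    """Holm step-down procedure.

    Returns a list of booleans, in the original order, marking which null
    hypotheses are rejected at family-wise error rate `alpha`.
    """
    m = len(p_values)
    # Visit hypotheses from smallest to largest p-value.
    order = sorted(range(m), key=lambda i: p_values[i])
    reject = [False] * m
    for rank, idx in enumerate(order):
        # The threshold relaxes as hypotheses are rejected:
        # alpha/m, alpha/(m-1), ..., alpha/1.
        if p_values[idx] <= alpha / (m - rank):
            reject[idx] = True
        else:
            break  # stop at the first failure; all larger p-values are retained
    return reject
```

For the hypothetical p-values `[0.005, 0.015, 0.03, 0.04]`, `holm_reject` rejects the first two hypotheses. A plain Bonferroni threshold of 0.05/4 = 0.0125 would reject only the first.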
M100-Q results (Figure 5a) detail MIS development for all test blocks (motor, sensory and delays) in the left and right hemisphere, suggesting that MIS generalizes with motor act across hemispheres, across sensory stimuli induced by the motor-act, and lacks delay specificity. RMS results (Figure 5b) support this finding in the left hemisphere: substantial MIS is observed for all test blocks.
The neural mechanisms underlying MIS were examined in three MEG experiments. In Experiment 1, we found that MIS develops for zero-delays but does not generalize to non-zero delays. In Experiment 2, we found that MIS developed for 100 ms delays within 300 trials and occurs in excess of auditory habituation. In Experiment 3, we found that unlike MIS for zero-delays, MIS for non-zero delays does not exhibit sensitivity to sensory, delay or motor-command changes. Results for each of these experiments are first discussed separately in relation to suppression to self-produced speech. Subsequently we discuss these results in the context of a general model of sensory motor control.
In Experiment 1, we found MIS development in the left hemisphere for zero-delay between button and tone (10.0% as measured by RMS and 18.6% as measured by dipole fit Q), which did not generalize to non-zero delays (Figure 3a and Figure 3b) and did not extend to the right hemisphere.
Our results in Experiment 1 are consistent with the Blakemore et al. (1999, 2000) somatosensory study, which reported minimum ticklishness at zero-delay. The Blakemore et al. (1999, 2000) study did not examine the laterality of this effect – they only looked at stimulation of the right hand – so, although our results showing MIS in the left hemisphere are consistent with their results of right hand self-stimulation, we are unsure whether our lack of MIS in the right hemisphere would correspond with any larger ticklishness seen in left hand stimulation. Without an analog of ticklishness in the auditory domain, it is challenging to fully compare our results with the Blakemore et al. (1999, 2000) study. Nevertheless, it is interesting to note that unlike our binary results in Experiment 1 – MIS for 0 ms, no MIS for other delays – the Blakemore et al. study reported a graded increase in ticklishness with increasing delay. Another distinction between the Blakemore et al. study and ours concerns the nature of the action-consequence pairing: results of the Blakemore et al. study support a forward model that learns the correspondence between the subject-triggered tickle-act (action) and tickle-sensation (consequence). This action-consequence pairing is more direct and natural than the pairing in our study – button press (action) and tone feedback (consequence) – which is more indirect and unnatural.
In a related functional imaging study, Blakemore et al. (1998) investigated what brain areas were activated by self- or experimenter-produced tactile stimulation of the left hand. They found that self-produced stimulation creates bilateral suppression of activity in secondary somatosensory cortex, relative to the experimenter-produced stimulation. This bilateral suppression to left hand stimulation differs from the left-lateralized MIS we have observed in our own experiment.
A recent study by Martikainen et al. is more comparable to our study. Like our study, Martikainen et al. examined the auditory response to a tone generated by self-produced key presses. They report significant MIS in both the left and right hemispheres of 24 ± 7% and 18 ± 4% respectively. While their MIS in Q for the left hemisphere is comparable to ours, their MIS in Q for the right hemisphere is noticeably greater than our measurement (9.5%). As in our study, subjects pressed the button with their right hand, and the authors did find greater MIS in the left hemisphere than the right, but they state that this difference is not significant. However, our findings still cannot be compared directly with this study for several reasons. They report their findings exclusively in terms of Q values of the M100, whereas we have also reported RMS values of the M100. Q values depend on congruence between source models and true underlying current sources, and source models depend on arbitrary rules such as establishing thresholds to restrict included dipoles. Thus, we also report our results in terms of RMS – a non-model-dependent value derived directly from measurements.
Procedural differences further distinguish our study from the Martikainen et al. study. In the training block of our study, subjects pressed a button 100 times at an approximate rate of one press every two seconds, yielding about 200 seconds (3 min, 20 sec) of button-tone-association exposure time. Furthermore, we measured MIS not in the training block, but in a subsequent test block also consisting of 100 button presses at the same rate of once every two seconds, adding another 200 seconds of exposure time. In contrast, Martikainen et al. took their measurements in two sessions where subjects pressed the button 60 times at an approximate rate of once every five seconds, yielding about 720 seconds of exposure time. Thus, our experiment and Martikainen et al.’s differ in press frequency (~0.5 Hz for our study, ~0.2 Hz for Martikainen et al.), number of button presses (200 for our study, 120 for Martikainen et al.), and total exposure time (400 seconds for our study, 720 seconds for Martikainen et al.). It is possible that total exposure time is more critical for MIS development than the frequency or total number of button-tone associations experienced. This extra total time might also explain MIS development in the right hemisphere for Martikainen et al.’s study but not for ours.
A key element of our hypotheses is a learned internal model that accommodates sensory delays (Whitney & Murakami, 1998; Whitney, Murakami, & Cavanagh, 2000). We tested this in Experiment 2, where subject button presses did not immediately produce a tone – tones were delayed by 100 ms. We predicted that MIS would develop for this delayed tone as subjects practiced the button presses, and indeed this was the case: over successive 100-trial blocks, MIS developed significantly in both the left and right hemispheres. This finding, in contrast to Experiment 1 – where we only noted significant MIS development in the left hemisphere – raises questions about the role of delay in measured MIS. One possible explanation for this difference is that introducing a delay creates procedural differences between the experiments: during the zero-delay condition, MIS measurements might be contaminated by active current sources in the left motor cortex (and possibly left somatosensory cortex) related to the button press action; whereas during the 100 ms delay condition, these left hemisphere motor current sources might have dissipated.
Nevertheless, there are apparent hemispheric differences present in Experiment 2. MIS is consistently larger in the left hemisphere across training blocks. Furthermore, by the fourth block of training, the magnitude of MIS in the right hemisphere approaches an asymptote, while the left hemisphere appears to continue increasing, suggesting that MIS in the left hemisphere could continue to develop with more blocks of training. In fact, upon adjusting for general adaptation effects (by comparing the initial and final control sessions), we find that MIS is only significant in the left hemisphere, with adaptation accounting for a 14% change in RMS and a 10% change in Q during the experiment. Thus, MIS in the left hemisphere is 30.7% – 14% = 16.7% in terms of RMS, and 34.1% – 10% = 24.1% in terms of Q, which is still nearly double the MIS observed in the left hemisphere for the zero-delay condition in Experiment 1. Would the zero-delay MIS have been greater than 30% if training had continued for four blocks? This is possible, but it should also be noted that even by the second training block (i.e., the total exposure in Experiment 1), left hemisphere MIS for the 100 ms delay condition amounted to 21.6% – still more than double the MIS observed in Experiment 1. Further experimentation will be required to investigate this issue.
Also interesting is a comparison of the results in Experiment 2 with an earlier study of speaking-induced suppression of auditory cortex (SIS). In that study (Houde et al., 2002), the authors reported an SIS value similar to the left hemisphere MIS of Experiment 2: M100 response to self-produced speech was 30% less than the response to tape-playback of that speech in the left hemisphere and 15% less in the right hemisphere.
In Experiment 1 we demonstrated MIS at zero-delays, and in Experiment 2 we confirmed MIS at non-zero delays. In Experiment 3, we found that MIS for non-zero delays does not exhibit sensitivity to sensory, delay or motor-command changes. For both RMS and Q, the trained MIS generalized to all other conditions: left hand, 500 Hz tone, zero-delay, and 200 ms delay. There was also no discernible pattern of generalization: the pattern of MIS across conditions for RMS differs from that for Q.
Our results in Experiment 3 are not in complete agreement with our results in Experiment 1. In Experiment 1 we found that zero-delay MIS neither generalized to non-zero delays nor extended to the right hemisphere, whereas in Experiment 3 we found that non-zero delay MIS is non-specific and extends to the right hemisphere. This finding raises an intriguing question: Is the profile for MIS trained at zero-delay different from that of MIS trained at non-zero delay? In addressing this question, it is beneficial to recall that our hypothesis for MIS development relies on an internal forward model that is trained (at zero-delay in the case of Experiment 1) to make predictions about sensory feedback. Our finding in all three experiments that the first training block(s) do not differ statistically from the control block lends credence to the notion that training plays a critical role in MIS development. A natural follow-up question is how much training is required. Unlike Experiments 2 and 3, where subjects were exposed to multiple blocks of training, subjects in Experiment 1 were exposed to a single training block of 100 trials. A lack of adequate training in Experiment 1 might therefore account for why MIS neither generalizes to non-zero delays nor extends to the right hemisphere. In fact, because zero-delay sensory feedback is encountered so frequently in everyday use, there may be an inherently higher threshold to be surpassed with training at zero-delay, and our results may reflect a residual global habituation to sensory feedback at zero-delay. Future experiments would be required to examine this possibility.
Schafer and Marcus originally reported delay specificity in the suppression of auditory vertex responses evoked by self-stimulation (Schafer & Marcus, 1973). In contrast to our findings, they reported a linear decrease in MIS as a function of delay, with some residual suppression at 4-second delays, but their findings were based on a single subject. As such, it is challenging to directly contrast their study with ours and identify sources of discrepancy.
Overall, the results of this study confirm the basic results of Schafer and Marcus’s original study and Martikainen et al.’s follow-up study: that it is possible to observe suppression in the response of auditory cortex to tones triggered by a subject’s own button presses. However, this study also extends this basic result in several ways that advance our understanding of the relationship between sensory prediction and motor output.
First, a key difference between our study and that of Martikainen et al. is that responses over the entire exposure time contribute towards MIS in their study, while responses in our study are separated into a training block and a test block. This allowed us to examine the effect of learning, which the Martikainen et al. study did not address. Owing to this difference, we have a different interpretation of why MIS arises. Martikainen et al. report that their results “support the existence of a forward model that predicts the auditory consequences of the subject’s own motor acts on the environment – even with a tool – and thereby enables discrimination between self-produced and external sounds”. However, they do not address the development of this forward model. Since we do not see an immediate MIS effect (no significant MIS in the training block), but do see MIS in the subsequent test block, our results suggest a different hypothesis: that MIS is not an intrinsic property of motor-generated sensations, but instead develops when an internal model is trained to predict those sensations.
Second, our study also examined the effects of introducing delay between motor output (the button press) and sensory consequences (the tone). In Experiment 2, we showed that MIS will still develop if there is a 100 ms delay between button press and tone. In Experiment 3, we examined how this 100 ms delayed MIS generalizes to other tones, hands, and delays, and here we found interesting differences between 100 ms delayed MIS and the zero-delay MIS examined in Experiment 1. On one hand, in Experiment 1 we did not investigate whether zero-delay MIS generalizes to different tones and hands; hence it is possible that the generalization of MIS trained on the 100 ms delayed tone to different tones and hands is in fact characteristic of MIS in general. Further experimentation is necessary to investigate this possibility. On the other hand, the generalization of 100 ms delay trained MIS to other delays does conflict with the lack of generalization to other delays observed in Experiment 1 for MIS trained on the zero-delay tone. So, how do we explain the difference in generalization pattern between zero and non-zero delays? One account for this difference starts from considering that the zero-delay MIS case may be special, in that adaptation to zero-delay sensory feedback (i.e., only internal sensory delays) is over-learned since this is continually encountered in everyday life. It is also reasonable to assume that, taken to extremes, there must be some specificity of MIS in the learned 100 ms delay case. For instance, if we test for MIS with delay = 2 sec, we would presumably not have MIS (since this is effectively equivalent to the Tone Alone condition). Thus, it may be that the learned non-zero delay model simply has low accuracy, and that this accuracy (i.e., the timing specificity of the MIS) improves with extensive timing training, as is seen in the zero-delay (everyday life) case.
More generally, it may be that the sensory timing model is learned separately from the sensory type/quality model, and that the timing model takes longer to learn. That is to say, perhaps the expectation of when something (anything) will happen as a result of your action is learned separately from the expectation of what will happen because of your action. Alternatively, it is possible that MIS that develops for non-zero delays merely reflects a generalized sensory expectation effect due to foreknowledge of the occurrence of the incoming sensory stimulus. Although foreknowledge has been shown to reduce sensory responses in a non-specific manner (Begleiter, Porjesz, Yerre, & Kissin, 1973; Ritter, Vaughan, & Costa, 1968; Sutton, Braren, Zubin, & John, 1965; Sutton, Tueting, Zubin, & John, 1967), few studies have examined the brain areas subserving generalized expectation-induced suppression. Further experimentation is necessary to investigate the relationship between MIS and generalized expectation-induced suppression, together with the associated neural substrates.
On a final note, we believe our study also has potential implications for cross-modal interactions between brain systems and plasticity of these interactions. Cumulatively, our experiments suggest that through a coupling between motor and auditory systems, an internal forward model can be recruited within a reasonable timespan and can adapt to systemic perturbations (delays, frequency-shifts, alteration in motor-act). While these results were derived from motor-sensory interactions, it is reasonable to suppose that these results generalize to other brain systems. An area of interest and potential impact is in persons who have lost the use of some sensory modality. Our results hold promise for recruiting compensatory interactions between other brain systems and plasticity of such interactions. Future experiments will be needed to explore these possibilities.