In our daily lives, auditory stream segregation allows us to differentiate concurrent sound sources and to make sense of the scene we are experiencing. However, a combination of segregation and the concurrent integration of auditory streams is necessary in order to analyze the relationship between streams and thus perceive a coherent auditory scene. The present functional magnetic resonance imaging study investigates the relative roles and neural underpinnings of these listening strategies in multi-part musical stimuli. We compare a real human performance of a piano duet and a synthetic stimulus of the same duet in a prioritized integrative attention paradigm that required the simultaneous segregation and integration of auditory streams. In so doing, we manipulate the degree to which the attended part of the duet led either structurally (attend melody vs. attend accompaniment) or temporally (asynchronies vs. no asynchronies between parts), and thus the relative contributions of integration and segregation to the assessment of the leader-follower relationship. We show that, perceptually, the relationship between parts is biased towards the conventional structural hierarchy of Western music, in which the melody generally dominates (leads) the accompaniment. Moreover, the assessment varies as a function of cognitive load, as shown by difficulty ratings and by the interaction between the temporal and structural relationship factors. Neurally, the temporal relationship between parts, an important cue for stream segregation, elicited distinct activity in the planum temporale. By contrast, the integration required when listening both to the temporally separated performance stimulus and to the temporally fused synthetic stimulus activated the intraparietal sulcus (IPS). These results support the hypothesis that the planum temporale and the IPS are key structures underlying the mechanisms of segregation and integration of auditory streams, respectively.
Perception of our environment is a multisensory experience; information from different sensory systems, such as audition, vision and touch, is constantly integrated. Complex tasks that require high temporal and spatial precision of multisensory integration place strong demands on the underlying networks, but it is largely unknown how task experience shapes multisensory processing. Long-term musical training is an excellent model for brain plasticity because it shapes the human brain at functional and structural levels, affecting a network of brain areas. In the present study we used magnetoencephalography (MEG) to investigate how audio-tactile information is integrated in the human brain and whether musicians show enhancement of the corresponding activation compared to non-musicians. Using a paradigm that allowed the investigation of combined and separate auditory and tactile processing, we found a multisensory incongruency response generated in frontal, cingulate and cerebellar regions, an auditory mismatch response generated mainly in the auditory cortex, and a tactile mismatch response generated in frontal and cerebellar regions. The influence of musical training was seen in the audio-tactile as well as in the auditory condition, indicating enhanced higher-order processing in musicians, while the sources of the tactile mismatch response were not influenced by long-term musical training. Consistent with the predictive coding model, more basic, bottom-up sensory processing was relatively stable and less affected by expertise, whereas areas supporting top-down models of multisensory expectancies were modulated by training.
Aging is often accompanied by hearing loss, which impacts how sounds are processed and represented along the ascending auditory pathways and within the auditory cortices. Here, we assess the impact of mild binaural hearing loss on older adults' ability both to process complex sounds embedded in noise and to segregate a mistuned harmonic in an otherwise periodic stimulus. We measured auditory evoked fields (AEFs) using magnetoencephalography while participants were presented with complex tones that had either all harmonics in tune or the third harmonic mistuned by 4% or 16% of its original value. The tones (75 dB sound pressure level, SPL) were presented without noise, with low-level (45 dBA SPL) noise, or with moderate-level (65 dBA SPL) Gaussian noise. For each participant, we modeled the AEFs with a pair of dipoles in the superior temporal plane. We then examined the effects of hearing loss and noise on the amplitude and latency of the resulting source waveforms. Results revealed that similar noise-induced increases in the N1m were present in older adults with and without hearing loss. Our results also showed that the P1m amplitude was larger in the hearing-impaired than in the normal-hearing adults. In addition, the object-related negativity (ORN) elicited by the mistuned harmonic was larger in hearing-impaired listeners. The enhanced P1m and ORN amplitudes in the hearing-impaired older adults suggest that hearing loss increased neural excitability in the auditory cortices, which could be related to deficits in inhibitory control.
Keywords: aging; MEG; hearing loss; auditory cortex; inhibition (psychology)
In adults and older children, evidence consistent with a relative separation between selective and sustained attention, superimposed upon generally positive inter-test correlations, has been reported. Here we examine whether this pattern is detectable in 5-year-old children from the healthy population. A new test battery (TEA-ChJ) was adapted from measures previously used with adults and older children and administered to 172 5-year-olds. Test-retest reliability was assessed in 60 children. Ninety-eight percent of the children managed to complete all measures, and discrimination of the visual and auditory stimuli was good. In a factor analysis, the two TEA-ChJ selective attention tasks (one visual, one auditory) loaded onto a common factor and diverged from the two sustained attention tasks (one auditory, one motor), which shared a common loading on a second factor. This pattern, which suggests that the tests are indeed sensitive to underlying attentional capacities, was supported by the relationships between the TEA-ChJ factors and Test of Everyday Attention for Children subtests in the older children in the sample. It is thus possible to gain convincing performance-based estimates of attention at the age of 5, with the results reflecting a factor structure similar to that obtained in older children and adults. The results are discussed in light of contemporary models of attention function. Given the potential advantages of early intervention for attention difficulties, the findings are of clinical as well as theoretical interest.
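As a rough illustration of the kind of two-factor solution described above (this is not the authors' analysis; the task ordering, the varimax rotation and the function name `attention_factors` are our assumptions), a minimal sketch in Python:

```python
import numpy as np
from sklearn.decomposition import FactorAnalysis

def attention_factors(scores):
    """Two-factor solution with varimax rotation.

    scores: array of shape (n_children, 4), columns assumed to be
    [selective_visual, selective_auditory, sustained_auditory, sustained_motor]
    (placeholder ordering, not the published dataset).
    Returns the loading of each task on each rotated factor, shape (4, 2).
    """
    scores = (scores - scores.mean(axis=0)) / scores.std(axis=0)  # standardise tasks
    fa = FactorAnalysis(n_components=2, rotation="varimax").fit(scores)
    return fa.components_.T
```

The pattern reported above would correspond to the two selective tasks loading highly on one rotated factor and the two sustained tasks on the other.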
Speech processing inherently relies on the perception of specific, rapidly changing spectral and temporal acoustic features. Advanced acoustic perception is also integral to musical expertise, and accordingly several studies have demonstrated a significant relationship between musical training and superior processing of various aspects of speech. Speech and music appear to overlap in spectral and temporal features; however, it remains unclear which of the acoustic features crucial for speech processing are most closely associated with musical training. The present study examined the perceptual acuity of musicians to the acoustic components of speech necessary for intra-phonemic discrimination of synthetic syllables. We compared musicians and non-musicians on discrimination thresholds for three synthetic speech syllable continua that varied in their spectral and temporal discrimination demands, specifically voice onset time (VOT) and amplitude envelope cues in the temporal domain. Musicians demonstrated superior discrimination only for syllables that required resolution of temporal cues. Furthermore, performance on the temporal syllable continua correlated positively with the length and intensity of musical training. These findings support one potential mechanism by which musical training may selectively enhance speech perception, namely by reinforcing temporal acuity and/or the perception of amplitude rise time, and have implications for the translation of musical training into long-term linguistic abilities.
When stimuli are presented over headphones, they are typically perceived as internalized; that is, they appear to emanate from inside the head. Sounds presented in the free field, by contrast, tend to be externalized, i.e., perceived as emanating from a source in the world. This phenomenon is frequently attributed to reverberation and to the spectral characteristics of the sounds: sounds whose spectrum and reverberation match those of free-field signals arriving at the ear canal are more frequently externalized. Another factor, however, is that the virtual location of signals presented over headphones moves in perfect concert with any movements of the head, whereas the location of free-field signals moves in opposition to head movements. Because the effects of head movement have not been systematically disentangled from reverberation and spectral cues, we measured the degree to which head movements contribute to externalization.
We performed two experiments: 1) using motion tracking and free-field loudspeaker presentation, we presented signals that moved in space to match listeners' head movements; 2) using motion tracking and binaural room impulse responses, we presented filtered signals over headphones that appeared to remain static relative to the world. The results of experiment 1 showed that free-field signals from the front that moved with the head were less likely to be externalized (23%) than those that remained fixed (63%). Experiment 2 showed that virtual signals whose position was fixed relative to the world were more likely to be externalized (65%) than those fixed relative to the head (20%), regardless of the fidelity of the individual impulse responses.
Head movements play a significant role in the externalization of sound sources. These findings imply tight integration between binaural cues and self-motion cues and underscore the importance of self-motion for spatial auditory perception.
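The second experiment above hinges on keeping a headphone-presented source fixed in the world while the head moves. A minimal sketch of how such a renderer could counter-rotate the source by the tracked head yaw before binaural filtering; the lookup `brir_for_azimuth` and the block-wise processing are our assumptions, not the authors' implementation:

```python
import numpy as np
from scipy.signal import fftconvolve

def render_world_fixed(signal, head_yaw_deg, source_azimuth_deg, brir_for_azimuth):
    """Render a source that stays fixed in the world while the head turns.

    brir_for_azimuth(azimuth_deg) -> (left_ir, right_ir) is a hypothetical lookup
    into a set of binaural room impulse responses measured on an azimuth grid.
    """
    # The source azimuth relative to the head is the world azimuth minus the
    # current head yaw: turning the head right shifts the source to the left.
    relative_az = (source_azimuth_deg - head_yaw_deg + 180.0) % 360.0 - 180.0
    left_ir, right_ir = brir_for_azimuth(relative_az)
    left = fftconvolve(signal, left_ir)[: len(signal)]
    right = fftconvolve(signal, right_ir)[: len(signal)]
    return np.stack([left, right])
```

A real-time renderer would update the filters every few milliseconds and crossfade between impulse responses as the head turns; rendering a head-fixed source simply skips the yaw subtraction.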
(1) To report the speech perception and intelligibility results of Mandarin-speaking patients with large vestibular aqueduct syndrome (LVAS) after cochlear implantation (CI); (2) to compare their performance with that of a group of CI users without LVAS; (3) to understand the effects of age at implantation and duration of implant use on CI outcomes. The obtained data may be used to guide decisions about CI candidacy and surgical timing.
Forty-two patients with LVAS participating in this study were divided into two groups: the early group received CI before 5 years of age and the late group after 5. Open-set speech perception tests (on Mandarin tones, words and sentences) were administered one year after implantation and at the most recent follow-up visit. Categories of auditory perception (CAP) and Speech Intelligibility Rating (SIR) scale scores were also obtained.
The patients with LVAS with more than 5 years of implant use (18 cases) achieved a mean score higher than 80% on the most recent speech perception tests and reached the highest level on the CAP/SIR scales. The early group developed speech perception and intelligibility steadily over time, while the late group had a rapid improvement during the first year after implantation. The two groups, regardless of their age at implantation, reached a similar performance level at the most recent follow-up visit.
High levels of speech performance are reached after 5 years of implant use in patients with LVAS. These patients do not necessarily need to wait until their hearing thresholds exceed 90 dB HL or their PB word scores fall below 40% to receive CI; implantation can be considered earlier, when their speech perception and/or speech intelligibility fall short of the performance levels suggested in this study.
To study how auditory cortical processing is affected by the anticipation and hearing of long emotional sounds, we recorded auditory evoked magnetic fields with a whole-scalp MEG device from 15 healthy adults who were listening to emotional or neutral sounds. Pleasant, unpleasant, or neutral sounds, each lasting 6 s, were played in random order, each preceded by a 100-ms cue tone (0.5, 1, or 2 kHz) presented 2 s before the onset of the sound. The cue tones, which indicated the valence of the upcoming emotional sounds, evoked typical transient N100m responses in the auditory cortex. During the rest of the anticipation period (until the beginning of the emotional sound), the auditory cortices of both hemispheres generated slow shifts of the same polarity as the N100m. During anticipation, the relative strengths of the auditory-cortex signals depended on the upcoming sound: towards the end of the anticipation period the activity became stronger when the subject was anticipating emotional rather than neutral sounds. During the actual emotional and neutral sounds, sustained fields were predominant in the left hemisphere for all sounds. The DC MEG signals measured during both anticipation and hearing of emotional sounds thus indicate that, once a cue has signalled the valence of the upcoming sound, auditory-cortex activity is modulated by the anticipated sound category throughout the anticipation period.
We re-examined a modified emotional Stroop task that included an additional colour-word alongside the emotional word, providing the response conflict of the traditional Stroop task. Negative emotionally salient (i.e. 'unpleasant') words are claimed to capture attention, producing a smaller Stroop effect for negative words than for neutral words; this phenomenon is called the emotional dilution of the Stroop effect. To address previous limitations, this study compared negative words with lexically matched neutral words in an adequately powered sample of 45 participants. Results demonstrated an emotional Stroop effect (slower colour-naming responses for negative words) and a traditional Stroop effect, but no emotional dilution of the Stroop effect. This finding is at odds with claims that other processing resources are diminished through a failure to disengage attention from emotional information. No matter how attention towards emotional information builds up over time, our findings indicate that attentional resources are not fully captured by negative words.
The processing of notes and chords that are harmonically incongruous with their context has been shown to elicit two distinct late ERP effects. These effects strongly resemble two effects associated with the processing of linguistic incongruities: a P600, resembling the typical response to syntactic incongruities in language, and an N500, evocative of the N400 that is typically elicited in response to semantic incongruities in language. Despite the robustness of these two patterns in the musical incongruity literature, no consensus has yet been reached on why two distinct responses to harmonic incongruities exist. This study was the first to use behavioural and ERP data to test two possible explanations for the existence of these two patterns: the musicianship of listeners, and the resolved or unresolved nature of the harmonic incongruities. Results showed that harmonically incongruous notes and chords elicited a late positivity similar to the P600 when they were embedded within sequences that started and ended in the same key (harmonically resolved). Notes and chords which indicated that there would be no return to the original key (leaving the piece harmonically unresolved) were associated with a further P600 in musicians, but with a negativity resembling the N500 in non-musicians. We suggest that the late positivity reflects the conscious perception of a specific element as incongruous with its context, and musicians' efforts to integrate the harmonic incongruity into its local context as a result of their analytic listening style, whereas the late negativity reflects non-musicians' detection of the absence of resolution as a result of their holistic listening style.
It sometimes happens that when someone asks a question, the addressee does not give an adequate answer, for instance by leaving out part of the required information. The person who posed the question may wonder why the information was omitted, and engage in extensive processing to find out what the partial answer actually means. The present study looks at the neural correlates of the pragmatic processes invoked by partial answers to questions. Two experiments are presented in which participants read mini-dialogues while their event-related brain potentials (ERPs) were measured. In both experiments, violating the dependency between questions and answers led to an increase in the amplitude of the P600 component. We interpret these P600 effects as reflecting the increased effort involved in creating a coherent representation of what is communicated. This effortful processing might include the computation of what the dialogue participant meant to communicate by withholding information. Our study is one of the few to investigate language processing in conversation, albeit with participants who were 'eavesdroppers' rather than actual interactants. Our results add to the as yet small range of pragmatic phenomena known to modulate the processes underlying the P600 component, and suggest that people immediately attempt to regain cohesion when a question-answer dependency is violated in an ongoing conversation.
Learning the functional properties of objects is a core mechanism in the development of conceptual, cognitive and linguistic knowledge in children. The cerebral processes underlying these learning mechanisms remain unclear in adults and unexplored in children. Here, we investigated the neurophysiological patterns underpinning the learning of functions for novel objects in 10-year-old healthy children. Event-related fields (ERFs) were recorded using magnetoencephalography (MEG) during a picture-definition task. Two MEG sessions were administered, separated by a behavioral verbal learning session during which children learned short definitions of the "magical" function of 50 unknown non-objects. Additionally, 50 familiar real objects and 50 other unknown non-objects for which no functions were taught were presented at both MEG sessions. Children learned at least 75% of the 50 proposed definitions in less than one hour, illustrating children's powerful ability to rapidly map new functional meanings onto novel objects. Pre- and post-learning ERF differences were analyzed first in sensor and then in source space. Sensor-space results disclosed a learning-dependent modulation of ERFs for newly learned non-objects, developing 500–800 msec after stimulus onset. Source-space analyses windowed over this late temporal component of interest disclosed underlying activity in right parietal, bilateral orbito-frontal and right temporal regions. Altogether, our results suggest that learning-related changes in late ERF components over these regions may support the challenging task of rapidly creating new semantic representations for processing the meaning and functions of novel objects in children.
Much of what we know about the effect of stimulus repetition on neuroelectric adaptation comes from studies using artificially produced pure tones or harmonic complex sounds. Little is known about the neural processes associated with the representation of everyday sounds and how these may be affected by aging. In this study, we used real-life, meaningful sounds presented at various azimuth positions and found that auditory evoked responses peaking at about 100 and 180 ms after sound onset decreased in amplitude with stimulus repetition. This neural adaptation was greater in young than in older adults and was more pronounced when the same sound was repeated at the same location. Moreover, the P2 waves showed differential patterns of domain-specific adaptation when location and identity were repeated among young adults. Background noise decreased ERP amplitudes and modulated the magnitude of repetition effects on both the N1 and P2 amplitudes, and these effects were comparable in young and older adults. These findings reveal an age-related difference in the neural processes associated with adaptation to meaningful sounds, which may relate to older adults' difficulty in ignoring task-irrelevant stimuli.
The audibility of a target tone in a multitone background masker is enhanced by the presentation of a precursor sound consisting of the masker alone. There is evidence that precursor-induced neural adaptation plays a role in this perceptual enhancement. However, the precursor may also be strategically used by listeners as a spectral template of the following masker to better segregate it from the target. In the present study, we tested this hypothesis by measuring the audibility of a target tone in a multitone masker after the presentation of precursors which, in some conditions, were made dissimilar to the masker by gating their components asynchronously. The precursor and the following sound were presented either to the same ear or to opposite ears. In either case, we found no significant difference in the amount of enhancement produced by synchronous and asynchronous precursors. In a second experiment, listeners had to judge whether a synchronous multitone complex contained exactly the same tones as a preceding precursor complex or had one tone less. In this experiment, listeners performed significantly better with synchronous than with asynchronous precursors, showing that asynchronous precursors were poorer perceptual templates of the synchronous multitone complexes. Overall, our findings indicate that precursor-induced auditory enhancement cannot be fully explained by the strategic use of the precursor as a template of the following masker. Our results are consistent with an explanation of enhancement based on selective neural adaptation taking place at a central locus of the auditory system.
Humans can anticipate and prepare for uncertainties in order to achieve a goal. However, it is difficult to maintain this effort over a prolonged period of time: inappropriate behavior can be impulsively (or mindlessly) triggered by an external event, which can result in serious consequences such as traffic crashes. We therefore examined the neural mechanisms underlying such impulsive responding using functional magnetic resonance imaging (fMRI). Twenty-two participants performed a block-designed sustained attention to response task (SART), in which each task block was composed of consecutive Go trials followed by a NoGo trial at the end. This configuration enabled us to use reduced Go reaction times as a measure of compromised preparation for NoGo trials. Accordingly, a parametric modulation analysis was conducted on the fMRI data using block-wise mean Go reaction times as an online marker of impulsive responding in the SART. We found that activity in the right dorsolateral prefrontal cortex (DLPFC) and the bilateral intraparietal sulcus (IPS) was positively modulated by mean Go reaction times. In addition, activity in the medial prefrontal cortex (MPFC) and the posterior cingulate cortex (PCC) was negatively modulated by mean Go reaction times, albeit statistically weakly. Taken together, spontaneously reduced activity in the right DLPFC and the IPS and spontaneously elevated activity in the MPFC and the PCC were associated with impulsive responding in the SART. These results suggest that such spontaneous transitions in brain activity patterns result in impulsive responding in monotonous situations, which in turn might cause human errors in real work environments.
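As a sketch of the kind of parametric modulation regressor described above (assuming a standard double-gamma HRF and mean-centred block-wise Go reaction times; the function names are ours and not taken from the study's pipeline):

```python
import numpy as np
from scipy.stats import gamma

def canonical_hrf(tr, duration=32.0):
    """Double-gamma HRF approximation sampled at the scan TR (standard SPM-style parameters)."""
    t = np.arange(0, duration, tr)
    hrf = gamma.pdf(t, 6) - gamma.pdf(t, 16) / 6.0
    return hrf / hrf.sum()

def rt_modulated_regressor(block_onsets, block_durations, block_mean_rt, n_scans, tr):
    """Boxcar per task block, scaled by mean-centred Go RT, convolved with the HRF."""
    frame_times = np.arange(n_scans) * tr
    boxcar = np.zeros(n_scans)
    rt = np.asarray(block_mean_rt, dtype=float)
    rt -= rt.mean()  # mean-centre so the modulator is separable from the main task effect
    for onset, dur, weight in zip(block_onsets, block_durations, rt):
        boxcar[(frame_times >= onset) & (frame_times < onset + dur)] = weight
    return np.convolve(boxcar, canonical_hrf(tr))[:n_scans]
```

In a full design this regressor would sit alongside the unmodulated task regressor and the usual nuisance terms, and its fitted weight indicates where activity covaries with block-wise Go RT.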
Research on the neural basis of speech-reading implicates a network of auditory language regions involving inferior frontal cortex, premotor cortex and sites along superior temporal cortex. In audiovisual speech studies, neural activity is consistently reported in the posterior superior temporal sulcus (pSTS), a site implicated in multimodal integration. Traditionally, multisensory interactions have been considered high-level processes that engage heteromodal association cortices (such as the STS). Recent work, however, challenges this notion and suggests that multisensory interactions may occur in low-level unimodal sensory cortices. While previous audiovisual speech studies demonstrate that high-level multisensory interactions occur in pSTS, it remains unclear how early in the processing hierarchy these multisensory interactions may occur. The goal of the present fMRI experiment was to investigate how visual speech influences activity in auditory cortex above and beyond its response to auditory speech. Subjects were presented with auditory speech with and without congruent visual input. Holding the auditory stimulus constant across the experiment, we investigated how the addition of visual speech influences activity in auditory cortex, and we demonstrate that congruent visual speech increases this activity.
Earlier studies have shown considerable intersubject synchronization of brain activity when subjects watch the same movie or listen to the same story. Here we investigated the across-subjects similarity of brain responses to speech and non-speech sounds in a continuous audio drama designed for blind people. Thirteen healthy adults listened to the ∼19-min audio drama while their brain activity was measured with 3 T functional magnetic resonance imaging (fMRI). An intersubject correlation (ISC) map, computed across the whole experiment to assess the stimulus-driven extrinsic brain network, indicated statistically significant ISC in temporal, frontal and parietal cortices, the cingulate cortex, and the amygdala. Group-level independent component (IC) analysis was used to parcel the brain signals into functionally coupled networks, and the dependence of the ICs on the external stimulus was tested by comparing them with the ISC map. This procedure revealed four extrinsic ICs, two of which, covering non-overlapping areas of the auditory cortex, were modulated by both speech and non-speech sounds. The two other extrinsic ICs, one left-hemisphere-lateralized and the other right-hemisphere-lateralized, were speech-related and comprised the superior and middle temporal gyri, the temporal poles, and the left angular and inferior orbital gyri. In areas of low ISC, four ICs defined as intrinsic fluctuated similarly to the time courses of either the speech-sound-related or the all-sounds-related extrinsic ICs; these ICs included the superior temporal gyrus, the anterior insula, and the frontal, parietal and midline occipital cortices. Taken together, substantial intersubject synchronization of cortical activity was observed in subjects listening to an audio drama, and the results suggest that speech is processed in two separate networks, one dedicated to speech sounds and the other responsive to both speech and non-speech sounds.
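For orientation, intersubject correlation can be computed voxelwise by correlating each subject's time course with the average time course of the remaining subjects. The leave-one-out form below is a common variant and only a sketch of the idea, not the statistical procedure used in the study:

```python
import numpy as np

def intersubject_correlation(data):
    """Leave-one-out ISC.

    data: array of shape (n_subjects, n_timepoints, n_voxels); each time course is
    assumed to be preprocessed (detrended, nonzero variance).
    Returns, per voxel, the mean correlation of each subject's time course with the
    average time course of the remaining subjects.
    """
    n_subj = data.shape[0]
    z = (data - data.mean(axis=1, keepdims=True)) / data.std(axis=1, keepdims=True)
    isc = np.zeros((n_subj, data.shape[2]))
    for s in range(n_subj):
        others = z[np.arange(n_subj) != s].mean(axis=0)          # average of the rest
        others = (others - others.mean(axis=0)) / others.std(axis=0)
        isc[s] = (z[s] * others).mean(axis=0)                     # Pearson r per voxel
    return isc.mean(axis=0)
```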
Voice, as a secondary sexual characteristic, is known to affect the perceived attractiveness of human individuals. But the underlying mechanism of vocal attractiveness has remained unclear. Here, we presented human listeners with acoustically altered natural sentences and fully synthetic sentences with systematically manipulated pitch, formants and voice quality based on a principle of body size projection reported for animal calls and emotional human vocal expressions. The results show that male listeners preferred a female voice that signals a small body size, with relatively high pitch, wide formant dispersion and breathy voice, while female listeners preferred a male voice that signals a large body size with low pitch and narrow formant dispersion. Interestingly, however, male vocal attractiveness was also enhanced by breathiness, which presumably softened the aggressiveness associated with a large body size. These results, together with the additional finding that the same vocal dimensions also affect emotion judgment, indicate that humans still employ a vocal interaction strategy used in animal calls despite the development of complex language.
Behavioral studies of spoken word memory have shown that context congruency facilitates both word and source recognition, though the level at which context exerts its influence remains equivocal. We measured event-related potentials (ERPs) while participants performed both types of recognition task with words spoken in four voices. Two voice parameters (gender and accent) varied between speakers, such that none, one or both of these parameters could be congruent between study and test. Results indicated that reinstating the study voice at test facilitated both word and source recognition, compared to partial or no context congruency at test. These behavioral effects were paralleled by two ERP modulations. First, in the word recognition test, the left parietal old/new effect showed a positive deflection that reflected the degree of context congruency between study and test: the same-speaker condition produced the most positive deflection for correctly identified old words. Second, in the source recognition test, a right frontal positivity was found for the same-speaker condition compared to the different-speaker conditions, regardless of response success. Taken together, the results of this study suggest that the benefit of context congruency is reflected both behaviorally and in ERP modulations traditionally associated with recognition memory.
In Japanese, vowel duration can distinguish the meaning of words. In order for infants to learn this phonemic contrast using simple distributional analyses, there should be reliable differences in the duration of short and long vowels, and the frequency distribution of vowels must make these differences sufficiently salient in the input. In this study, we evaluate these requirements of phonemic learning by analyzing the duration of vowels in over 11 hours of Japanese infant-directed speech. We found that long vowels are substantially longer than short vowels in the input directed to infants, for each of the five oral vowels. However, we also found that learning phonemic length from the overall distribution of vowel duration is unlikely to be easy for a simple distributional learner, because of the large base-rate effect (94% of vowels are short) and because of the many factors that influence vowel duration (e.g., intonational phrase boundaries, word boundaries, and vowel height). A successful learner would therefore need to take into account additional factors, such as prosodic and lexical cues, in order to discover that duration can contrast the meaning of words in Japanese. These findings highlight the importance of taking into account the naturalistic distributions of lexicons and acoustic cues when modeling early phonemic learning.
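The base-rate problem can be made concrete with a toy "simple distributional learner": fit a two-component Gaussian mixture to (log) vowel durations in which only 6% of tokens are long. The durations below are invented for illustration; only the 94%/6% split mirrors the base rate reported above:

```python
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(0)

# Illustrative durations in ms (log-normal-ish); means and spreads are made up,
# only the short/long proportions follow the reported 94% / 6% base rate.
short = rng.lognormal(mean=np.log(70), sigma=0.45, size=9400)
long_ = rng.lognormal(mean=np.log(140), sigma=0.45, size=600)
durations = np.concatenate([short, long_])

# The "learner": fit two Gaussians to log duration and inspect whether the
# recovered components resemble the true short/long categories.
gmm = GaussianMixture(n_components=2, random_state=0)
gmm.fit(np.log(durations).reshape(-1, 1))
print("component means (ms):", np.exp(gmm.means_.ravel()))
print("component weights:   ", gmm.weights_)
```

With such skewed weights and overlapping, multiply-determined distributions, the recovered components often fail to line up with the true short/long categories, which is exactly the difficulty the abstract points to.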
The present study evaluated the relation between speech perception in the presence of background noise and temporal processing ability in listeners with Auditory Neuropathy (AN).
The study included two experiments. In the first experiment, the temporal resolution of listeners with normal hearing and listeners with AN was evaluated using measures of the temporal modulation transfer function and of frequency modulation detection at modulation rates of 2 and 10 Hz. In the second experiment, speech perception was evaluated in quiet and in noise at three signal-to-noise ratios (SNRs: 0, 5, and 10 dB).
Results demonstrated that listeners with AN performed significantly more poorly than normal-hearing listeners in both amplitude modulation and frequency modulation detection, indicating significant impairment in extracting envelope as well as fine-structure cues from the signal. Furthermore, there was a significant correlation between measures of temporal resolution and speech perception in noise.
Results suggested that an impaired ability to efficiently process the envelope and fine-structure cues of the speech signal may underlie the extreme difficulty that listeners with AN face when perceiving speech in noise.
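For readers unfamiliar with temporal modulation transfer function (TMTF) measurement, the stimulus is typically sinusoidally amplitude-modulated noise, and the threshold tracked is the smallest detectable modulation depth at each rate. A minimal stimulus-generation sketch (parameter values are illustrative, not those used in the study):

```python
import numpy as np

def sam_noise(duration_s, mod_rate_hz, mod_depth, fs=44100, seed=0):
    """Sinusoidally amplitude-modulated Gaussian noise.

    mod_depth is the modulation index m in (1 + m*sin(2*pi*f_m*t)); a TMTF is
    obtained by finding the smallest detectable m at each modulation rate.
    """
    rng = np.random.default_rng(seed)
    t = np.arange(int(duration_s * fs)) / fs
    carrier = rng.standard_normal(t.size)
    envelope = 1.0 + mod_depth * np.sin(2 * np.pi * mod_rate_hz * t)
    stim = envelope * carrier
    return stim / np.max(np.abs(stim))  # normalise to avoid clipping

# Example: a 500-ms probe modulated at 10 Hz with 30% depth.
probe = sam_noise(duration_s=0.5, mod_rate_hz=10, mod_depth=0.3)
```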
The organization of sound into meaningful units is fundamental to the processing of auditory information such as speech and music. In expressive music performance, structural units or phrases may become particularly distinguishable through subtle timing variations that highlight musical phrase boundaries. As such, expressive timing may support the successful parsing of otherwise continuous musical material. Using the event-related potential (ERP) technique, we investigated whether expressive timing modulates the neural processing of musical phrases. Musicians and laymen listened to short atonal scale-like melodies that were presented either isochronously (deadpan) or with expressive timing cues emphasizing the melodies' two-phrase structure. Melodies were presented in an active and a passive condition. Expressive timing facilitated the processing of phrase boundaries, as indicated by decreased N2b amplitude and enhanced P3a amplitude for target phrase boundaries and larger P2 amplitude for non-target boundaries. When timing cues were lacking, task demands increased, especially for laymen, as reflected by reduced P3a amplitude. In line with this, the N2b occurred earlier for musicians in both conditions, indicating generally faster target detection compared to laymen. Importantly, the elicitation of a P3a-like response to phrase boundaries marked by a pitch leap during passive exposure suggests that expressive timing information is automatically encoded and may lead to an involuntary allocation of attention towards significant events within a melody. We conclude that subtle timing variations in music performance prepare the listener for musical key events by directing and guiding attention towards their occurrence. That is, expressive timing facilitates the structuring and parsing of continuous musical material even when the auditory input is unattended.
This study investigated a theoretically challenging dissociation between good production and poor perception of tones among neurologically unimpaired native speakers of Cantonese. The dissociation is referred to as the near-merger phenomenon in sociolinguistic studies of sound change. In a passive oddball paradigm, lexical and nonlexical syllables of the T1/T6 and T4/T6 contrasts were presented to elicit the mismatch negativity (MMN) and P3a from two groups of participants: those who could produce and distinguish all tones in the language (Control) and those who could produce all tones but specifically failed to distinguish between T4 and T6 in perception (Dissociation). The presence of an MMN to T1/T6 and a null response to T4/T6 for lexical syllables in the dissociation group confirmed the near-merger phenomenon. The observation that the control participants exhibited a statistically reliable MMN to lexical syllables of T1/T6, weaker responses to nonlexical syllables of T1/T6 and lexical syllables of T4/T6, and finally a null response to nonlexical syllables of T4/T6 suggests the involvement of top-down processing in speech perception. Furthermore, the stronger P3a response of the control group compared with the dissociation group in the same experimental conditions may indicate higher cognitive capability in attention switching, auditory attention or memory in the control participants. This cognitive difference, together with our speculation that constant top-down predictions without complete bottom-up analysis of acoustic signals in speech recognition may reduce one's sensitivity to small acoustic contrasts, accounts for the occurrence of the dissociation in some individuals but not others.
Computational and experimental research has revealed that auditory sensory predictions are derived from regularities of the current environment by means of internal generative models. So far, however, it has not been addressed how the auditory system handles situations that give rise to redundant or even contradictory predictions derived from different sources of information. To this end, we measured error signals in event-related brain potentials (ERPs) in response to violations of auditory predictions. Sounds could be predicted on the basis of overall probability: one sound was presented frequently and another sound rarely. In addition, each sound was predicted by an informative visual cue. Participants' task was to use the cue and to discriminate the two sounds as fast as possible. Violations of the probability-based prediction (i.e., a rare sound) as well as violations of the visual-auditory prediction (i.e., an incongruent sound) elicited error signals in the ERPs (mismatch negativity [MMN] and incongruency response [IR]). The respective error signals were observed even when the overall probability and the visual cue predicted different sounds; that is, the auditory system concurrently maintains and tests contradictory predictions. Moreover, when both sources predicted the same sound, we observed an additive error signal (in scalp potential and primary current density) equaling the sum of the specific error signals. Thus, the auditory system maintains and tolerates redundant and contradictory predictions that are represented functionally independently. We argue that the auditory system exploits all currently active regularities in order to optimally prepare for future events.
Songbirds are one of the few groups of animals that learn the sounds used for vocal communication during development. Like humans, songbirds memorize vocal sounds based on auditory experience with the vocalizations of adult “tutors”, and then use auditory feedback of self-produced vocalizations to gradually match their motor output to the memory of tutor sounds. In humans, investigations of early vocal learning have focused mainly on the perceptual skills of infants, whereas studies of songbirds have focused on measures of vocal production. In order to fully exploit songbirds as a model for human speech, understand the neural basis of learned vocal behavior, and investigate links between vocal perception and production, studies of songbirds must examine both behavioral measures of perception and neural measures of discrimination during development. Here we used behavioral and electrophysiological assays of the ability of songbirds to distinguish vocal calls of varying frequencies at different stages of vocal learning. The results show that neural tuning in auditory cortex mirrors behavioral improvements in the ability to make perceptual distinctions among vocal calls as birds engage in vocal learning. Thus, separate measures of neural discrimination and behavioral perception yielded highly similar trends over the course of vocal development. The timing of this improvement in the ability to distinguish vocal sounds parallels our previous finding of substantial refinement of axonal connectivity in cortico-basal ganglia pathways necessary for vocal learning.