In Japanese, vowel duration can distinguish the meaning of words. In order for infants to learn this phonemic contrast through simple distributional analyses, there must be reliable differences in the duration of short and long vowels, and the frequency distribution of vowels must make these differences sufficiently salient in the input. In this study, we evaluated these requirements of phonemic learning by analyzing the duration of vowels in over 11 hours of Japanese infant-directed speech. We found that, for each of the five oral vowels, long vowels are substantially longer than short vowels in the input directed to infants. However, we also found that learning phonemic length from the overall distribution of vowel duration would be difficult for a simple distributional learner, because of the large base-rate effect (i.e., 94% of vowels are short) and because of the many factors that influence vowel duration (e.g., intonational phrase boundaries, word boundaries, and vowel height). A successful learner would therefore need to take additional factors such as prosodic and lexical cues into account in order to discover that duration can contrast word meanings in Japanese. These findings highlight the importance of considering the naturalistic distributions of lexicons and acoustic cues when modeling early phonemic learning.
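To make the base-rate problem concrete, the following sketch fits a two-component Gaussian mixture, a common stand-in for a simple distributional learner, to simulated durations with a 94/6 short/long split. All duration parameters are hypothetical illustrations, not our corpus measurements.

```python
# Illustrative sketch (not the paper's model): why a large base rate makes
# two-category distributional learning of vowel duration hard.
# All duration parameters below are hypothetical, not measured values.
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(0)

n = 10_000
n_long = int(n * 0.06)  # ~6% long vowels (the base-rate effect)
short = rng.lognormal(mean=np.log(70), sigma=0.35, size=n - n_long)   # ms
long_ = rng.lognormal(mean=np.log(140), sigma=0.35, size=n_long)      # ms
durations = np.concatenate([short, long_]).reshape(-1, 1)

# A "simple distributional learner": fit a two-component Gaussian mixture
# to the pooled durations and see whether the components recover the
# short/long categories.
gmm = GaussianMixture(n_components=2, n_init=5, random_state=0).fit(durations)
print("component means (ms):", gmm.means_.ravel())
print("component weights:  ", gmm.weights_)
# With heavy overlap and a 94/6 split, the mixture often carves up the
# majority (short) category instead of isolating the rare long vowels.
```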
The present study evaluated the relation between speech perception in the presence of background noise and temporal processing ability in listeners with Auditory Neuropathy (AN).
The study comprised two experiments. In the first experiment, the temporal resolution of listeners with normal hearing and of listeners with AN was evaluated using measures of the temporal modulation transfer function and of frequency modulation detection at modulation rates of 2 and 10 Hz. In the second experiment, speech perception was evaluated in quiet and in noise at three signal-to-noise ratios (SNRs: 0, 5, and 10 dB).
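For readers unfamiliar with the construction of speech-in-noise conditions, the following sketch shows the standard scaling rule for mixing speech and noise at a target SNR; it is illustrative and does not reproduce the study's actual stimuli.

```python
# Hedged sketch of how speech-in-noise stimuli at a target SNR are
# typically constructed; the scaling rule is standard, but the specific
# stimuli of this study are not reproduced here.
import numpy as np

def mix_at_snr(speech: np.ndarray, noise: np.ndarray, snr_db: float) -> np.ndarray:
    """Scale `noise` so the speech-to-noise power ratio equals `snr_db`.

    Assumes `noise` is at least as long as `speech`.
    """
    noise = noise[: len(speech)]
    p_speech = np.mean(speech ** 2)
    p_noise = np.mean(noise ** 2)
    # Required noise gain g: SNR_dB = 10 * log10(p_speech / (g**2 * p_noise))
    g = np.sqrt(p_speech / (p_noise * 10 ** (snr_db / 10)))
    return speech + g * noise

# e.g., the three noise conditions of Experiment 2:
# mixed_0  = mix_at_snr(speech, noise, 0.0)
# mixed_5  = mix_at_snr(speech, noise, 5.0)
# mixed_10 = mix_at_snr(speech, noise, 10.0)
```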
Results demonstrated that listeners with AN performed significantly worse than normal-hearing listeners in both amplitude modulation and frequency modulation detection, indicating significant impairment in extracting envelope as well as fine structure cues from the signal. Furthermore, there was a significant correlation between measures of temporal resolution and speech perception in noise.
These results suggest that an impaired ability to efficiently process the envelope and fine structure cues of the speech signal may underlie the extreme difficulty that listeners with AN experience when perceiving speech in noise.
The organization of sound into meaningful units is fundamental to the processing of auditory information such as speech and music. In expressive music performance, structural units or phrases may become particularly distinguishable through subtle timing variations highlighting musical phrase boundaries. As such, expressive timing may support the successful parsing of otherwise continuous musical material. By means of the event-related potential (ERP) technique, we investigated whether expressive timing modulates the neural processing of musical phrases. Musicians and laymen listened to short atonal scale-like melodies that were presented either isochronously (deadpan) or with expressive timing cues emphasizing the melodies’ two-phrase structure. Melodies were presented in an active and a passive condition. Expressive timing facilitated the processing of phrase boundaries, as indicated by decreased N2b amplitude and enhanced P3a amplitude for target phrase boundaries and larger P2 amplitude for non-target boundaries. When timing cues were lacking, task demands increased, especially for laymen, as reflected in reduced P3a amplitude. In line with this, the N2b occurred earlier for musicians than for laymen in both conditions, indicating generally faster target detection. Importantly, the elicitation of a P3a-like response to phrase boundaries marked by a pitch leap during passive exposure suggests that expressive timing information is automatically encoded and may lead to an involuntary allocation of attention towards significant events within a melody. We conclude that subtle timing variations in music performance prepare the listener for musical key events by directing and guiding attention towards their occurrences. That is, expressive timing facilitates the structuring and parsing of continuous musical material even when the auditory input is unattended.
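As an illustration of the timing manipulation, phrase-final lengthening can be added to otherwise isochronous onset times as sketched below. All parameter values (note count, base inter-onset interval, lengthening factor) are invented for illustration, not the actual stimulus timings.

```python
# Illustrative sketch (parameters invented): onset times for a two-phrase,
# scale-like melody, either isochronous ("deadpan") or with expressive
# phrase-final lengthening marking the phrase boundary.
import numpy as np

def onset_times(n_notes=12, ioi_ms=300.0, expressive=False,
                boundary_after=6, lengthening=1.5):
    iois = np.full(n_notes - 1, ioi_ms)
    if expressive:
        # Lengthen the inter-onset interval at the phrase boundary,
        # the timing cue hypothesized to support parsing.
        iois[boundary_after - 1] *= lengthening
    return np.concatenate([[0.0], np.cumsum(iois)])

print(onset_times(expressive=False)[:8])  # deadpan version
print(onset_times(expressive=True)[:8])   # expressively timed version
```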
This study investigated a theoretically challenging dissociation between good production and poor perception of tones among neurologically unimpaired native speakers of Cantonese. The dissociation is referred to as the near-merger phenomenon in sociolinguistic studies of sound change. In a passive oddball paradigm, lexical and nonlexical syllables of the T1/T6 and T4/T6 contrasts were presented to elicit the mismatch negativity (MMN) and P3a from two groups of participants: those who could produce and distinguish all tones in the language (Control) and those who could produce all tones but specifically failed to distinguish between T4 and T6 in perception (Dissociation). The presence of an MMN to T1/T6 and a null response to T4/T6 for lexical syllables in the Dissociation group confirmed the near-merger phenomenon. The observation that the Control participants exhibited a statistically reliable MMN to lexical syllables of T1/T6, weaker responses to nonlexical syllables of T1/T6 and lexical syllables of T4/T6, and a null response to nonlexical syllables of T4/T6 suggests the involvement of top-down processing in speech perception. Furthermore, the stronger P3a response of the Control group, compared with the Dissociation group in the same experimental conditions, may indicate higher cognitive capability in attention switching, auditory attention, or memory in the Control participants. This cognitive difference, together with our speculation that constant top-down prediction without complete bottom-up analysis of acoustic signals in speech recognition may reduce one’s sensitivity to small acoustic contrasts, accounts for the occurrence of the dissociation in some individuals but not others.
Computational and experimental research has revealed that auditory sensory predictions are derived from regularities of the current environment by means of internal generative models. So far, however, it has not been addressed how the auditory system handles situations that give rise to redundant or even contradictory predictions derived from different sources of information. To this end, we measured error signals in event-related brain potentials (ERPs) in response to violations of auditory predictions. Sounds could be predicted on the basis of overall probability, i.e., one sound was presented frequently and another sound rarely. Furthermore, each sound was predicted by an informative visual cue. Participants’ task was to use the cue and to discriminate the two sounds as fast as possible. Violations of the probability-based prediction (i.e., a rare sound) as well as violations of the visual-auditory prediction (i.e., an incongruent sound) elicited error signals in the ERPs (Mismatch Negativity [MMN] and Incongruency Response [IR]). The respective error signals were observed even when the overall probability and the visual cue predicted different sounds. That is, the auditory system concurrently maintains and tests contradictory predictions. Moreover, when both sources predicted the same sound and this prediction was violated, we observed an additive error signal (in scalp potential and primary current density) equaling the sum of the two specific error signals. Thus, the auditory system maintains, in functionally independent representations, both redundant and contradictory predictions. We argue that the auditory system exploits all currently active regularities in order to optimally prepare for future events.
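Schematically, with Δ denoting a deviant-minus-standard ERP difference amplitude, the reported additivity amounts to the following (a sketch of the stated result, not a quantitative model):

```latex
% Both predictions violated at once -> the two error signals superimpose:
\[
\Delta_{\text{double violation}} \;\approx\; \Delta_{\mathrm{MMN}} + \Delta_{\mathrm{IR}}
\]
```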
Songbirds are one of the few groups of animals that learn the sounds used for vocal communication during development. Like humans, songbirds memorize vocal sounds based on auditory experience with vocalizations of adult “tutors”, and then use auditory feedback of self-produced vocalizations to gradually match their motor output to the memory of tutor sounds. In humans, investigations of early vocal learning have focused mainly on perceptual skills of infants, whereas studies of songbirds have focused on measures of vocal production. In order to fully exploit songbirds as a model for human speech, understand the neural basis of learned vocal behavior, and investigate links between vocal perception and production, studies of songbirds must examine both behavioral measures of perception and neural measures of discrimination during development. Here we used behavioral and electrophysiological assays of the ability of songbirds to distinguish vocal calls of varying frequencies at different stages of vocal learning. The results show that neural tuning in auditory cortex mirrors behavioral improvements in the ability to make perceptual distinctions of vocal calls as birds are engaged in vocal learning. Thus, separate measures of neural discrimination and behavioral perception yielded highly similar trends during the course of vocal development. The timing of this improvement in the ability to distinguish vocal sounds parallels the substantial refinement of axonal connectivity in cortico-basal ganglia pathways necessary for vocal learning that we have reported previously.
Top-down attention to spatial and temporal cues has been thoroughly studied in the visual domain. However, because the neural systems that are important for auditory top-down temporal attention (i.e., attention based on time-interval cues) remain undefined, the differences in brain activity between attention directed to auditory spatial locations and attention directed to time intervals are unclear. Using functional magnetic resonance imaging (fMRI), we measured activation in a cue-target paradigm in which visual cues directed attention to an auditory target in either the spatial or the temporal domain. Imaging results showed that the dorsal frontoparietal network (dFPN), which consists of the bilateral intraparietal sulcus and the frontal eye field (FEF), responded to spatial orienting of attention, but activity was absent in the bilateral FEF during temporal orienting of attention. Furthermore, the fMRI results indicated that activity in the right ventrolateral prefrontal cortex (VLPFC) was significantly stronger during spatial than during temporal orienting of attention, while the dorsolateral prefrontal cortex (DLPFC) showed no significant differences between the two processes. We conclude that the bilateral dFPN and the right VLPFC contribute to auditory spatial orienting of attention. Furthermore, specific activations related to temporal cognition were confirmed within the superior occipital gyrus, tegmentum, motor area, thalamus, and putamen.
Little is known about the timing of activating memory for objects and their associated perceptual properties, such as colour, and yet this is important for theories of human cognition. We investigated the time course of early cognitive processes related to the activation of object shape and object shape+colour representations, respectively, during memory retrieval, as assessed by repetition priming in an event-related potential (ERP) study. The main findings were as follows: (1) we identified a unique early modulation of mean ERP amplitude during the N1 that was associated with the activation of object shape independently of colour; (2) we found a subsequent early P2 modulation of mean amplitude over the same electrode clusters associated with the activation of object shape+colour representations; (3) these findings were apparent for both familiar (i.e., correctly coloured, such as a yellow banana) and novel (i.e., incorrectly coloured, such as a blue strawberry) objects; and (4) neither modulation of mean ERP amplitude was evident during the P3. Together the findings delineate the timing of object shape and colour memory systems and support the notion that perceptual representations of object shape mediate the retrieval of temporary shape+colour representations for familiar and novel objects.
Facial emotions and emotional body postures can easily grab attention in social communication. In the context of faces, gaze has been shown to be an important cue for orienting attention, but less is known about other important body parts such as hands. In the present study we investigated whether hands may orient attention through the emotional features they convey. By implying motion in static photographs of hands, we aimed to furnish observers with information about the intention to act and to test whether this information interacted with the automatic coding of the hand. We compared neutral, frontal hands with emotionally threatening hands, rotated along their radial-ulnar axes, in a Sidedness task (a Simon-like task based on automatic access to body representation). Results showed a Sidedness effect for both the palm and the back views with both neutral and emotional hands. More importantly, no difference was found between the two views for neutral hands, but a difference emerged for emotional hands: reaction times were faster for the palm than for the back view. This difference was ascribed to the palm view's “offensive” pose: a source of threat that might have raised participants' arousal. This hypothesis was also supported by conscious evaluations along the dimensions of valence (pleasant-unpleasant) and arousal. Results are discussed in light of emotional feature coding.
The presence of non-simultaneous maskers can result in strong impairment of auditory intensity resolution relative to a condition without maskers, and causes a complex pattern of effects that is difficult to explain on the basis of peripheral processing. We suggest that the failure of selective attention to the target tones is a useful framework for understanding these effects. Two experiments tested the hypothesis that the sequential grouping of the targets and the maskers into separate auditory objects facilitates selective attention and therefore reduces the masker-induced impairment in intensity resolution. In Experiment 1, a condition favoring the processing of the maskers and the targets as two separate auditory objects due to grouping by temporal proximity was contrasted with the usual forward-masking setting, in which the masker and the target presented within each observation interval of the two-interval task can be expected to be grouped together. As expected, the former condition resulted in a significantly smaller masker-induced elevation of the intensity difference limens (DLs). In Experiment 2, embedding the targets in an isochronous sequence of maskers led to a significantly smaller DL elevation than in control conditions not favoring the perception of the maskers as a separate auditory stream. The observed effects of grouping are compatible with the assumption that a precise representation of target intensity is available at the decision stage, but that this information is used only in a suboptimal fashion due to limitations of selective attention. The data can be explained within a framework of object-based attention. The results impose constraints on physiological models of intensity discrimination. We discuss candidate structures for physiological correlates of the psychophysical data.
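As background on the dependent measure, intensity DLs are often estimated with an adaptive procedure such as the 2-down-1-up staircase sketched below (converging on roughly 70.7% correct). This is a generic illustration with an invented simulated observer; the study's exact psychophysical procedure may differ.

```python
# Hedged sketch of adaptive intensity-DL estimation (2-down-1-up);
# a generic illustration, not the study's actual procedure.
import numpy as np

def staircase_dl(p_correct, start_db=6.0, step_db=1.0, n_reversals=8):
    """Track the level difference (dB) needed for criterion performance.

    `p_correct(delta_db)` simulates one two-interval trial and returns
    True on a correct response.
    """
    delta, down_count, direction = start_db, 0, 0
    reversals = []
    while len(reversals) < n_reversals:
        if p_correct(delta):
            down_count += 1
            if down_count == 2:                    # 2 correct -> make harder
                down_count, new_dir = 0, -1
                delta = max(delta - step_db, 0.1)
                if direction == +1:                # direction change = reversal
                    reversals.append(delta)
                direction = new_dir
        else:                                      # 1 wrong -> make easier
            down_count, new_dir = 0, +1
            delta += step_db
            if direction == -1:
                reversals.append(delta)
            direction = new_dir
    return float(np.mean(reversals[-6:]))          # DL estimate in dB

# Toy observer: more likely correct for larger level differences.
rng = np.random.default_rng(1)
dl = staircase_dl(lambda d: rng.random() < 1 - 0.5 * np.exp(-d / 2.0))
print(f"estimated DL: {dl:.1f} dB")
```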
The ability to detect sudden changes in the environment is critical for survival. Hearing is hypothesized to play a major role in this process by serving as an “early warning device,” rapidly directing attention to new events. Here, we investigate listeners' sensitivity to changes in complex acoustic scenes—what makes certain events “pop-out” and grab attention while others remain unnoticed? We use artificial “scenes” populated by multiple pure-tone components, each with a unique frequency and amplitude modulation rate. Importantly, these scenes lack semantic attributes, which may have confounded previous studies, thus allowing us to probe low-level processes involved in auditory change perception. Our results reveal a striking difference between “appear” and “disappear” events. Listeners are remarkably tuned to object appearance: change detection and identification performance are at ceiling; response times are short, with little effect of scene-size, suggesting a pop-out process. In contrast, listeners have difficulty detecting disappearing objects, even in small scenes: performance rapidly deteriorates with growing scene-size; response times are slow, and even when change is detected, the changed component is rarely successfully identified. We also measured change detection performance when a noise or silent gap was inserted at the time of change or when the scene was interrupted by a distractor that occurred at the time of change but did not mask any scene elements. Gaps adversely affected the processing of item appearance but not disappearance. However, distractors reduced both appearance and disappearance detection. Together, our results suggest a role for neural adaptation and sensitivity to transients in the process of auditory change detection, similar to what has been demonstrated for visual change detection. Importantly, listeners consistently performed better for item addition (relative to deletion) across all scene interruptions used, suggesting a robust perceptual representation of item appearance.
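The scene construction can be sketched as follows: each component is a pure-tone carrier with its own sinusoidal amplitude-modulation rate, and an appear or disappear event changes one component partway through. All parameter values (frequencies, AM rates, durations) are illustrative, not the study's actual stimulus set.

```python
# Minimal sketch of the kind of artificial "scene" described above:
# several pure tones, each with a unique carrier frequency and a unique
# sinusoidal amplitude-modulation rate. Parameter values are invented.
import numpy as np

def make_scene(freqs_hz, am_rates_hz, dur_s=2.0, fs=44_100):
    t = np.arange(int(dur_s * fs)) / fs
    scene = np.zeros_like(t)
    for f, am in zip(freqs_hz, am_rates_hz):
        carrier = np.sin(2 * np.pi * f * t)
        envelope = 0.5 * (1 + np.sin(2 * np.pi * am * t))  # 100% sinusoidal AM
        scene += envelope * carrier
    return scene / len(freqs_hz)  # crude normalization

# An "appear" event can be simulated by adding one more component halfway
# through; a "disappear" event by zeroing one component's envelope from
# the change point onward.
scene = make_scene(freqs_hz=[440, 740, 1250, 2100], am_rates_hz=[3, 7, 11, 17])
```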
In everyday life, we need a capacity to flexibly shift attention between alternative sound sources. However, relatively little work has been done to elucidate the mechanisms of attention shifting in the auditory domain. Here, we used a mixed event-related/sparse-sampling fMRI approach to investigate this essential cognitive function. In each 10-sec trial, subjects were instructed to wait for an auditory “cue” signaling the location where a subsequent “target” sound was likely to be presented. The target was occasionally replaced by an unexpected “novel” sound in the uncued ear, to trigger involuntary attention shifting. To maximize the attention effects, cues, targets, and novels were embedded within dichotic 800-Hz vs. 1500-Hz pure-tone “standard” trains. The sound of clustered fMRI acquisition (starting at t = 7.82 sec) served as a controlled trial-end signal. Our approach revealed notable activation differences between the conditions. Cued voluntary attention shifting activated the superior intraparietal sulcus (IPS), whereas novelty-triggered involuntary orienting activated the inferior IPS and certain subareas of the precuneus. Clearly more widespread activations were observed during voluntary than involuntary orienting in the premotor cortex, including the frontal eye fields. Moreover, we found evidence for a frontoinsular-cingular attentional control network, consisting of the anterior insula, inferior frontal cortex, and medial frontal cortices, which were activated during both target discrimination and voluntary attention shifting. Finally, novels and targets activated much wider areas of superior temporal auditory cortices than shifting cues.
In sentence comprehension research, the case system, one of the subsystems of the language processing system, has been assumed to play a crucial role in signifying relationships in sentences between noun phrases (NPs) and other elements, such as verbs, prepositions, nouns, and tense. However, little attention has so far been paid to the question of how cases are processed in the brain. To address this question, the current study used fMRI to scan the brain activity of 15 native English speakers during an English case-processing task. The results showed that, while the processing of all cases activates the left inferior frontal gyrus and the posterior part of the middle temporal gyrus, genitive case processing activates these two regions more than nominative and accusative case processing. Because the effect of differences in behavioral performance among the three cases was excluded from the brain activation data, the observed differences in activation are likely due to different processing patterns among the cases, indicating that cases are processed differently in the brain. The different brain activations for genitive case processing versus nominative/accusative case processing may be due to the difference in structural complexity between them.
In this study we sought to elucidate the mechanisms that underlie the effects of trial history on information processing. We focused explicitly on the contributions of conflict control and stimulus-response (S-R) binding to sequential trial effects. Performance and brain activity were measured during two hours of continuous Stroop task performance. Mental fatigue, known to influence top-down processing, was used to tease apart effects mediated by top-down and bottom-up mechanisms. We confirm that performance in the Stroop task is indeed strongly modulated by stimulus history. Performance was affected by the kind of advance information available; depending on this information, adjustments were made, resulting in differential effects of cognitive conflict and S-R binding on subsequent performance. The influence of mental fatigue on information processing was mainly related to general effects on attention.
Given that both the auditory and visual systems have anatomically separate object identification (“what”) and spatial (“where”) pathways, it is of interest whether attention-driven cross-sensory modulations occur separately within these feature domains. Here, we investigated how auditory “what” vs. “where” attention tasks modulate activity in visual pathways using cortically constrained source estimates of magnetoencephalographic (MEG) oscillatory activity. In the absence of visual stimuli or tasks, subjects were presented with a sequence of auditory-stimulus pairs and instructed to selectively attend to phonetic (“what”) vs. spatial (“where”) aspects of these sounds, or to listen passively. To investigate sustained modulatory effects, oscillatory power was estimated from the time periods between sound-pair presentations. In comparison to attention to sound locations, phonetic auditory attention was associated with stronger alpha (7–13 Hz) power in several visual areas (primary visual cortex; lingual, fusiform, and inferior temporal gyri; lateral occipital cortex), as well as in higher-order visual/multisensory areas including lateral/medial parietal and retrosplenial cortices. Region-of-interest (ROI) analyses of dynamic changes, from which the sustained effects had been removed, suggested further alpha-range power increases during Attend Phoneme vs. Attend Location centered 400–600 ms after the onset of the second sound of each stimulus pair. These results suggest distinct modulations of visual system oscillatory activity during auditory attention to sound object identity (“what”) vs. sound location (“where”). The alpha modulations can be interpreted as reflecting enhanced crossmodal inhibition of feature-specific visual pathways and adjacent audiovisual association areas during “what” vs. “where” auditory attention.
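As a simplified illustration of the dependent measure (not the cortically constrained MEG source pipeline actually used), band-limited alpha power for a single epoch can be estimated via bandpass filtering and the Hilbert envelope. The sampling rate and variable names below are assumptions for illustration.

```python
# Hedged sketch of one standard way to estimate band-limited (e.g., alpha,
# 7-13 Hz) power from single-trial time courses; the study's actual MEG
# source-estimation pipeline is more involved and is not reproduced here.
import numpy as np
from scipy.signal import butter, filtfilt, hilbert

def alpha_power(x: np.ndarray, fs: float, band=(7.0, 13.0)) -> float:
    """Mean squared analytic envelope in `band` for one epoch `x`."""
    b, a = butter(4, [band[0] / (fs / 2), band[1] / (fs / 2)], btype="band")
    filtered = filtfilt(b, a, x)          # zero-phase bandpass
    envelope = np.abs(hilbert(filtered))  # instantaneous amplitude
    return float(np.mean(envelope ** 2))

# e.g., compare epochs between sound-pair presentations across conditions
# (epoch arrays and fs=600.0 are hypothetical):
# power_what  = [alpha_power(ep, fs=600.0) for ep in epochs_attend_phoneme]
# power_where = [alpha_power(ep, fs=600.0) for ep in epochs_attend_location]
```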
Theories of visual perception agree that visual recognition begins with global analysis and ends with detailed analysis. Converging results from neurophysiological, computational, and behavioral studies indicate that the totality of visual information is not conveyed at once; rather, information analysis follows a predominantly coarse-to-fine processing sequence (low spatial frequencies are extracted first, followed by high spatial frequencies). We tested whether such processing persists in normal aging. Young and aged participants performed a categorization task (indoor vs. outdoor scenes) using dynamic natural scene stimuli that followed either a coarse-to-fine (CtF) sequence or the reverse fine-to-coarse (FtC) sequence. The results show that young participants categorized CtF sequences more quickly than FtC sequences. However, sequence processing interacted with semantic category only for aged participants. The present data support the notion that CtF categorization remains effective in aged participants but is constrained by the spatial features of the scenes, thus highlighting new perspectives for visual models.
Consonants, unlike vowels, are thought to be speech-specific, and therefore no interactions would be expected between consonants and pitch, a basic element of musical tones. The present study used an electrophysiological approach to investigate whether, contrary to this view, consonants and pitch are processed integratively, by measuring the additivity of changes in the mismatch negativity (MMN) of evoked potentials. The MMN is elicited by discriminable variations occurring in a sequence of repetitive, homogeneous sounds. In the experiment, event-related potentials (ERPs) were recorded while participants heard frequent sung consonant-vowel syllables and rare stimuli deviating in consonant identity only, in pitch only, or in both dimensions. Every type of deviation elicited a reliable MMN. As expected, the two single-deviant MMNs had similar amplitudes, but the double-deviant MMN was also not significantly different from them. This absence of additivity in the double-deviant MMN suggests that consonant and pitch variations are processed, at least at a pre-attentive level, in an integrated rather than independent way. Domain-specificity of consonants may depend on higher-level processes in the hierarchy of speech perception.
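The logic of the additivity test can be stated schematically (amplitudes at peak latency; a sketch of the inference, not a quantitative model):

```latex
% If consonant identity and pitch were processed independently, the
% double-deviant MMN should approximate the sum of the single-deviant MMNs:
\[
\text{independent: } \mathrm{MMN}_{\mathrm{cons+pitch}}
  \approx \mathrm{MMN}_{\mathrm{cons}} + \mathrm{MMN}_{\mathrm{pitch}}
\]
% What was observed instead (no additivity), pointing to integrated processing:
\[
\text{observed: } \mathrm{MMN}_{\mathrm{cons+pitch}}
  \approx \mathrm{MMN}_{\mathrm{cons}} \approx \mathrm{MMN}_{\mathrm{pitch}}
\]
```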
The physiological roots of music perception are a matter of long-standing debate. Recently, light has been shed on this problem by the study of otoacoustic emissions (OAEs), weak sounds generated by the inner ear following acoustic stimulation and, sometimes, even spontaneously. In the present study, a high-resolution time-frequency method called matching pursuit was applied to OAEs recorded from the ears of 45 normal volunteers, so that the component frequencies, amplitudes, latencies, and time-spans could be accurately determined. The method allowed us to find that, for each ear, the OAEs consisted of characteristic frequency patterns that we call resonant modes. Here we demonstrate that, on average, the frequencies of the resonant modes across all the cochleas studied stood in small-integer ratios. These are the same ratios that Pythagoras identified as the most musically pleasant and that form the basis of the Just tuning system. The statistical significance of the results was verified against a random distribution of ratios. As an explanatory model, there are attractive features in a recent theory that represents the cochlea as a surface acoustic wave resonator; in this account, the spacing between the rows of hearing receptors can create resonant cavities of defined lengths. By adjusting the geometry and the lengths of the resonant cavities, it is possible to generate the preferred frequency ratios we have found here. We conclude that musical perception might be related to specific geometrical and physiological properties of the cochlea.
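The ratio analysis can be illustrated as follows: pairwise frequency ratios of an ear's resonant modes are folded into one octave and matched against small-integer Just-tuning ratios within a tolerance. The mode frequencies and the tolerance below are invented for illustration; the paper's recorded data are not reproduced.

```python
# Hedged sketch of the ratio analysis described above; mode frequencies
# and the matching tolerance are hypothetical.
import math
from itertools import combinations

JUST_RATIOS = {
    "unison 1:1": 1/1, "minor third 6:5": 6/5, "major third 5:4": 5/4,
    "fourth 4:3": 4/3, "fifth 3:2": 3/2, "major sixth 5:3": 5/3,
    "octave 2:1": 2/1,
}

def nearest_just(ratio: float, tol_cents: float = 20.0):
    """Return the Just ratio within `tol_cents` of `ratio`, if any."""
    while ratio > 2.0:  # fold into a single octave
        ratio /= 2.0
    for name, r in JUST_RATIOS.items():
        cents = 1200 * math.log2(ratio / r)
        if abs(cents) <= tol_cents:
            return name, round(cents, 1)
    return None

modes_hz = [1180, 1475, 1770, 2360]  # hypothetical resonant modes of one ear
for f1, f2 in combinations(sorted(modes_hz), 2):
    print(f"{f2}/{f1} = {f2/f1:.3f} ->", nearest_just(f2 / f1))
```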
The attentional blink (AB) is a phenomenon whereby correct identification of a first target impairs the processing of a second target (the probe) presented close in time. Evidence suggests that explicit attention orienting in the time domain can attenuate the AB. Here, we used scalp-recorded event-related potentials to examine whether the auditory AB is also sensitive to implicit temporal attention orienting. Expectations were set up implicitly by varying the probability (80% or 20%) that the probe would occur at the +2 or +8 position following target presentation. Participants showed a significant AB, which was reduced when the probability of the probe occurring at the +2 position was increased. The probe probability effect was paralleled by an increase in the P3b amplitude elicited by the probe. The results suggest that implicit temporal attention orienting can facilitate short-term consolidation of the probe and attenuate the auditory AB.
Although many types of learning require associations to be formed, little is known about the brain mechanisms engaged in association formation. In the present study, we measured event-related potentials (ERPs) while participants studied pairs of semantically related words, with each word of a pair presented sequentially. To home in on the associative component of the signal, the ERP difference between the first and second words of a pair (Word2-Word1) was derived separately for subsequently recalled and subsequently not-recalled pairs. When the resulting difference waveforms were contrasted, a parietal positivity was observed for subsequently recalled pairs around 460 ms after word onset, followed by a positive slow wave that lasted until around 845 ms. Together these results suggest that the formation of associations between semantically related words is correlated with a specific neural signature that is reflected in scalp recordings over the parietal region.
Inhibition of Return (IOR) is one of the most consistent and widely studied effects in experimental psychology. The effect refers to a delayed response to visual stimuli in a cued location after initial priming at that location. This article presents a dynamic field model for IOR. The model describes the evolution of three coupled activation fields. The decision field, inspired by the intermediate layer of the superior colliculus, receives endogenous input and input from a sensory field. The sensory field, inspired by earlier sensory processing, receives exogenous input. Habituation of the sensory field is implemented by a reciprocal coupling with a third field, the habituation field. The model generates IOR because, due to the habituation of the sensory field, the decision field receives a reduced target-induced input in cue-target-compatible situations. The model is consistent with single-unit recordings of neurons of monkeys that perform IOR tasks. Such recordings have revealed that IOR phenomena parallel the activity of neurons in the intermediate layer of the superior colliculus and that neurons in this layer receive reduced input in cue-target-compatible situations. The model is also consistent with behavioral data concerning temporal expectancy effects. In the discussion, the multi-layer dynamic field account of IOR is used to illustrate the broader view that behavior consists of a tuning of the organism to the environment that continuously and concurrently takes place at different spatiotemporal scales.
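A drastically simplified sketch of this architecture, with one unit per field at each of two locations and invented time constants, weights, and thresholds (the published model uses continuous dynamic fields with lateral interactions), reproduces the qualitative IOR pattern:

```python
# Minimal zero-dimensional reduction of the three-field model described
# above (sensory, habituation, decision); all parameters are invented.
import numpy as np

def simulate(target_cued: bool, dt=1.0, t_max=1200.0):
    relu = lambda u: np.maximum(u, 0.0)                 # rectified field output
    s = np.zeros(2); h = np.zeros(2); d = np.zeros(2)   # [cued, uncued]
    tau_s, tau_h, tau_d = 20.0, 300.0, 40.0             # time constants (ms)
    t = 0.0
    while t < t_max:
        exo = np.zeros(2)
        if 0.0 <= t < 100.0:                            # cue flashed at location 0
            exo[0] = 2.0
        if 600.0 <= t < 700.0:                          # target after a 500-ms SOA
            exo[0 if target_cued else 1] = 2.0
        # Sensory field: driven by exogenous input, suppressed by habituation.
        s += dt / tau_s * (-s + exo - 3.0 * h)
        # Habituation field: reciprocally coupled to the sensory field.
        h += dt / tau_h * (-h + relu(s))
        # Decision field: receives the (possibly reduced) sensory input.
        d += dt / tau_d * (-d + 1.5 * relu(s))
        if t >= 600.0 and d.max() > 0.5:                # response threshold
            return t - 600.0                            # "RT" after target onset
        t += dt
    return None

# Residual habituation at the cued location reduces target-induced input
# there, so the cued RT comes out longer: the IOR effect.
print("RT, cued   target:", simulate(True))
print("RT, uncued target:", simulate(False))
```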
It has traditionally been assumed that cochlear implant users necessarily perform atypically in audiovisual tasks. However, a recent study that combined an auditory task with visual distractors suggests that only those cochlear implant users who are not proficient at recognizing speech sounds might show abnormal audiovisual interactions. The present study aims to reinforce this notion by investigating the audiovisual segregation abilities of cochlear implant users in a visual task with auditory distractors. Speechreading was assessed in two groups of cochlear implant users (proficient and non-proficient at sound recognition), as well as in normal controls. A visual speech recognition task (i.e., speechreading) was administered either in silence or in combination with three types of auditory distractors: (i) noise, (ii) reversed speech, and (iii) unaltered speech. Cochlear implant users proficient at speech recognition performed like normal controls in all conditions, whereas non-proficient users showed significantly different audiovisual segregation patterns in both speech conditions. These results confirm that normal-like audiovisual segregation is possible in highly skilled cochlear implant users and, consequently, that proficient and non-proficient CI users cannot be lumped into a single group. This important feature must be taken into account in further studies of audiovisual interactions in cochlear implant users.
Constant sound sequencing, operationalized here as repeated stimulation with tones of the same frequency, has multiple effects. On the one hand, it activates mechanisms of habituation and refractoriness, which are reflected in a decrease in the amplitude of evoked responses. On the other hand, constant sequencing acts as spectral cueing, resulting in tones being detected faster and more accurately. In the present study, by means of magnetoencephalography, we investigated the impact of repeated tone stimulation on the N1m auditory evoked field while listeners were distracted from the test sounds. We stimulated subjects with trains of four tones that either had the same frequency or had randomly assigned frequencies. The trains were presented in either a silent or a noisy background. In silence, the patterns of source-strength decline resulting from repeated stimulation suggested both refractoriness and habituation as underlying mechanisms. In noise, in contrast, there was no indication of source-strength decline. Furthermore, we found facilitating effects of constant sequencing on the detection of the single tones, as indexed by a shortening of N1m latency. We interpret our findings as a correlate of a bottom-up mechanism that constantly monitors the incoming auditory information, even when voluntary attention is directed to a different modality.
Hemodynamic mismatch responses can be elicited by deviant stimuli in a sequence of standard stimuli even during cognitively demanding tasks. Emotional context is known to modulate lateralized processing. Right-hemispheric processing of negative emotion may bias attention to the right and enhance the processing of right-ear stimuli. The present study examined the influence of induced mood on lateralized pre-attentive auditory processing of dichotic stimuli using functional magnetic resonance imaging (fMRI). Faces expressing emotions (sad/happy/neutral) were presented in a blocked design while a dichotic oddball sequence of consonant-vowel (CV) syllables was simultaneously administered in an event-related design. Twenty healthy participants were instructed to feel the emotion perceived in the images and to ignore the syllables. Deviant sounds reliably activated the bilateral auditory cortices, and attention effects were confirmed by the modulation of visual activity. Sad mood induction activated visual, limbic, and right prefrontal areas. A lateralization effect of the emotion-attention interaction was reflected in a stronger response to right-ear deviants in the right auditory cortex during sad mood. This imbalance of resources may be a neurophysiological correlate of laterality in sad mood and depression. Conceivably, the compensatory right-hemispheric enhancement of resources elicits increased ipsilateral processing.
Auditory perception and cognition entail both low-level and high-level processes, which are likely to interact with each other to create our rich conscious experience of soundscapes. The recent research we review here has revealed numerous influences of high-level factors, such as attention, intention, and prior experience, on conscious auditory perception. Recently, studies have also shown that auditory scene analysis tasks can exhibit multistability in a manner very similar to ambiguous visual stimuli, presenting a unique opportunity to study the neural correlates of auditory awareness and the extent to which mechanisms of perception are shared across sensory modalities. Research has also yielded a growing number of techniques through which auditory perception can be manipulated and even completely suppressed. Such findings have important consequences for our understanding of the mechanisms of perception and should also allow scientists to precisely distinguish the contributions of the various higher-level factors involved.
Keywords: auditory scene analysis; multistability; change deafness; informational masking; priming; attentional blink