In a natural environment, objects that we look for often make characteristic sounds. A hiding cat may meow, or the keys in the cluttered drawer may jingle when moved. Using a visual search paradigm, we demonstrated that characteristic sounds facilitated visual localization of objects, even when the sounds carried no location information. For example, finding a cat was faster when participants heard a meow sound. In contrast, sounds had no effect when participants searched for names rather than pictures of objects. For example, hearing “meow” did not facilitate localization of the word cat. These results suggest that characteristic sounds cross-modally enhance visual (rather than conceptual) processing of the corresponding objects. Our behavioral demonstration of object-based cross-modal enhancement complements the extensive literature on space-based cross-modal interactions. When looking for your keys next time, you might want to play jingling sounds.
When you are looking for an object, does hearing its characteristic sound make you find it more quickly? Our recent results supported this possibility by demonstrating that when a cat target, for example, was presented among other objects, a simultaneously presented “meow” sound (containing no spatial information) reduced the manual response time for visual localization of the target. To extend these results, we determined how rapidly an object-specific auditory signal can facilitate target detection in visual search. On each trial, participants fixated a specified target object as quickly as possible. The target’s characteristic sound speeded the saccadic search time within 215–220 ms and also guided the initial saccade toward the target, compared to presentation of a distractor’s sound or to no sound. These results suggest that object-based auditory-visual interactions rapidly increase the target object’s salience in visual search.
In this study we investigate previous claims that a region in the left posterior superior temporal sulcus (pSTS) is more activated by audiovisual than unimodal processing. First, we compare audiovisual to visual–visual and auditory–auditory conceptual matching using auditory or visual object names that are paired with pictures of objects or their environmental sounds. Second, we compare congruent and incongruent audiovisual trials when presentation is simultaneous or sequential. Third, we compare audiovisual stimuli that are either verbal (auditory and visual words) or nonverbal (pictures of objects and their associated sounds). The results demonstrate that, when task, attention, and stimuli are controlled, pSTS activation for audiovisual conceptual matching is 1) identical to that observed for intramodal conceptual matching, 2) greater for incongruent than congruent trials when auditory and visual stimuli are simultaneously presented, and 3) identical for verbal and nonverbal stimuli. These results are not consistent with previous claims that pSTS activation reflects the active formation of an integrated audiovisual representation. After a discussion of the stimulus and task factors that modulate activation, we conclude that, when stimulus input, task, and attention are controlled, pSTS is part of a distributed set of regions involved in conceptual matching, irrespective of whether the stimuli are audiovisual, auditory–auditory or visual–visual.
amodal; audiovisual binding; conceptual integration; congruency; crossmodal
Seeing the image of a newscaster on a television set causes us to think that the sound coming from the loudspeaker is actually coming from the screen. How images capture sounds is mysterious because the brain uses different methods for determining the locations of visual vs. auditory stimuli. The retina senses the locations of visual objects with respect to the eyes, whereas differences in sound characteristics across the ears indicate the locations of sound sources referenced to the head. Here, we tested which reference frame (RF) is used when vision recalibrates perceived sound locations.
Visually guided biases in sound localization were induced in seven humans and two monkeys who made eye movements to auditory or audio-visual stimuli. On audio-visual (training) trials, the visual component of the targets was displaced laterally by ~5°. Interleaved auditory-only (probe) trials served to evaluate the effect of experience with mismatched visual stimuli on auditory localization. We found that the displaced visual stimuli induced a ventriloquism aftereffect in both humans (~50% of the displacement size) and monkeys (~25%), but only for locations around the trained spatial region, showing that audio-visual recalibration can be spatially specific.
We tested the reference frame in which the recalibration occurs. On probe trials, we varied eye position relative to the head to dissociate head- from eye-centered RFs. Results indicate that both humans and monkeys use a mixture of the two RFs, suggesting that the neural mechanisms involved in ventriloquism occur in brain region(s) employing a hybrid RF for encoding spatial information.
visual calibration of auditory space; humans; monkeys; reference frame of auditory space representation; ventriloquism; cross-modal adaptation
Can hearing a word change what one sees? Although visual sensitivity is known to be enhanced by attending to the location of the target, perceptual enhancements following cues to the identity of an object have been difficult to find. Here, we show that perceptual sensitivity is enhanced by verbal, but not visual, cues.
Participants completed an object detection task in which they made an object-presence or -absence decision to briefly-presented letters. Hearing the letter name prior to the detection task increased perceptual sensitivity (d′). A visual cue in the form of a preview of the to-be-detected letter did not. Follow-up experiments found that the auditory cuing effect was specific to validly cued stimuli. The magnitude of the cuing effect positively correlated with an individual measure of vividness of mental imagery; introducing uncertainty into the position of the stimulus did not reduce the magnitude of the cuing effect, but eliminated the correlation with mental imagery.
Hearing a word made otherwise invisible objects visible. Interestingly, seeing a preview of the target stimulus did not similarly enhance detection of the target. These results are compatible with an account in which auditory verbal labels modulate lower-level visual processing. The findings show that a verbal cue in the form of hearing a word can influence even the most elementary visual processing and inform our understanding of how language affects perception.
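The sensitivity measure d′ reported above is conventionally computed from hit and false-alarm rates as d′ = z(H) − z(FA). The following is a minimal sketch of that standard signal-detection formula, not the authors' analysis code. The function name, the trial counts, and the log-linear correction (adding 0.5 to each count so that rates of exactly 0 or 1 stay finite) are assumptions made for illustration.

```python
from statistics import NormalDist


def d_prime_from_counts(hits, misses, false_alarms, correct_rejections):
    """Return d' = z(hit rate) - z(false-alarm rate).

    A log-linear correction (an assumption here, not taken from the
    abstract) keeps both rates strictly between 0 and 1 so that the
    inverse normal CDF is defined.
    """
    n_present = hits + misses                      # object-present trials
    n_absent = false_alarms + correct_rejections   # object-absent trials
    hit_rate = (hits + 0.5) / (n_present + 1)
    fa_rate = (false_alarms + 0.5) / (n_absent + 1)
    z = NormalDist().inv_cdf                       # standard normal quantile
    return z(hit_rate) - z(fa_rate)


# Hypothetical counts for one participant in cued and uncued blocks.
cued = d_prime_from_counts(hits=40, misses=10, false_alarms=10, correct_rejections=40)
uncued = d_prime_from_counts(hits=30, misses=20, false_alarms=10, correct_rejections=40)
print(f"cued d' = {cued:.2f}, uncued d' = {uncued:.2f}")
```

Because d′ separates sensitivity from response bias, a cue that merely made participants more willing to say "present" would raise hits and false alarms together and leave d′ roughly unchanged.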
While perceiving speech, people see mouth shapes that are systematically associated with sounds. In particular, a vertically stretched mouth produces a /woo/ sound, whereas a horizontally stretched mouth produces a /wee/ sound. We demonstrate that hearing these speech sounds alters how we see aspect ratio, a basic visual feature that contributes to perception of 3D space, objects and faces. Hearing a /woo/ sound increases the apparent vertical elongation of a shape, whereas hearing a /wee/ sound increases the apparent horizontal elongation. We further demonstrate that these sounds influence aspect ratio coding. Viewing and adapting to a tall (or flat) shape makes a subsequently presented symmetric shape appear flat (or tall). These aspect ratio aftereffects are enhanced when associated speech sounds are presented during the adaptation period, suggesting that the sounds influence visual population coding of aspect ratio. Taken together, these results extend previous demonstrations that visual information constrains auditory perception by showing the converse – speech sounds influence visual perception of a basic geometric feature.
Auditory–visual; Aspect ratio; Crossmodal; Shape perception; Speech perception
For humans and animals, the ability to discriminate speech and conspecific vocalizations is an important physiological task of the auditory system. To reveal the underlying neural mechanism, many electrophysiological studies have investigated the neural responses of the auditory cortex to conspecific vocalizations in monkeys. The data suggest that vocalizations may be hierarchically processed along an anterior/ventral stream from the primary auditory cortex (A1) to the ventral prefrontal cortex. To date, the organization of vocalization processing has not been well investigated in the auditory cortex of other mammals. In this study, we examined the spike activities of single neurons in two early auditory cortical regions with different anteroposterior locations: anterior auditory field (AAF) and posterior auditory field (PAF) in awake cats, as the animals were passively listening to forward and backward conspecific calls (meows) and human vowels. We found that the neural response patterns in PAF were more complex and had longer latency than those in AAF. The selectivity for different vocalizations based on the mean firing rate was low in both AAF and PAF, and not significantly different between them; however, more vocalization information was transmitted when the temporal response profiles were considered, and the maximum transmitted information by PAF neurons was higher than that by AAF neurons. Discrimination accuracy based on the activities of an ensemble of PAF neurons was also better than that of AAF neurons. Our results suggest that AAF and PAF are similar with regard to which vocalizations they represent but differ in the way they represent these vocalizations, and there may be a complex processing stream between them.
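The abstract above does not say how transmitted information was estimated. One common approach, sketched here purely as an assumption, is the mutual information between the presented vocalization and the vocalization decoded from the neural response, computed from a confusion matrix. The function name and the example counts are hypothetical.

```python
from math import log2


def transmitted_information(confusion):
    """Mutual information (bits) between stimulus (rows) and decoded
    response (columns), estimated from a matrix of trial counts."""
    total = sum(sum(row) for row in confusion)
    if total == 0:
        raise ValueError("confusion matrix contains no trials")
    row_totals = [sum(row) for row in confusion]        # trials per stimulus
    col_totals = [sum(col) for col in zip(*confusion)]  # trials per decoded label
    info = 0.0
    for i, row in enumerate(confusion):
        for j, count in enumerate(row):
            if count == 0:
                continue  # empty cells contribute nothing to the sum
            p_joint = count / total
            p_stim = row_totals[i] / total
            p_resp = col_totals[j] / total
            info += p_joint * log2(p_joint / (p_stim * p_resp))
    return info


# Hypothetical counts: two vocalizations, ten trials each.
rate_based = [[6, 4], [5, 5]]      # decoding from mean firing rate
temporal_based = [[9, 1], [2, 8]]  # decoding from temporal response profile
print(transmitted_information(rate_based), transmitted_information(temporal_based))
```

With two equally likely stimuli the measure is bounded by 1 bit, which is reached only when decoding is perfect.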
More comprehensive, and efficient, mapping strategies are needed to avoid post-operative language impairments in patients undergoing epilepsy surgery. Conservative resection of dominant anterior frontal or temporal cortex frequently results in post-operative naming deficits despite standard pre-operative electrocortical stimulation mapping of visual object (picture) naming. Naming to auditory description may better simulate word retrieval in human conversation but is not typically tested, in part due to the time demands of electrocortical stimulation mapping. Electrocorticographic high gamma (60-150 Hertz) activity, recorded simultaneously through the same electrodes used for stimulation mapping, has recently been used to map brain function more efficiently, and has at times predicted deficits not anticipated based on stimulation mapping alone. The present study investigated electrocorticographic mapping of visual object naming and auditory descriptive naming within conservative dominant temporal or frontal lobe resection boundaries in 16 patients with 933 subdural electrodes implanted for epilepsy surgery planning. A logistic regression model showed that electrodes within traditional conservative dominant frontal or temporal lobe resection boundaries were significantly more likely to record high gamma activity during auditory descriptive naming than during visual object naming. Eleven patients ultimately underwent resection and 7 demonstrated post-operative language deficits not anticipated based on electrocortical stimulation mapping alone. Four of these patients underwent a resection that included sites where high gamma activity was observed during auditory naming. These findings indicate that electrocorticographic mapping of auditory descriptive naming may reduce the risk of permanent post-operative language deficits following dominant temporal or frontal resection.
language mapping; epilepsy surgery; high gamma; electrocorticography; electrocortical stimulation; surgical outcome
Seeing the articulatory gestures of the speaker significantly enhances speech perception. Findings from recent neuroimaging studies suggest that activation of the speech motor system during lipreading enhances speech perception by tuning, in a top-down fashion, speech-sound processing in the superior aspects of the posterior temporal lobe. Anatomically, the superior-posterior temporal lobe areas receive connections from the auditory, visual, and speech motor cortical areas. Thus, it is possible that neuronal receptive fields are shaped during development to respond to speech-sound features that coincide with visual and motor speech cues, in contrast with the anterior/lateral temporal lobe areas that might process speech sounds predominantly based on acoustic cues. The superior-posterior temporal lobe areas have also been consistently associated with auditory spatial processing. Thus, the involvement of these areas in audiovisual speech perception might partly be explained by the spatial processing requirements when associating sounds, seen articulations, and one’s own motor movements. Tentatively, it is possible that the anterior “what” and posterior “where / how” auditory cortical processing pathways are parts of an interacting network, the instantaneous state of which determines what one ultimately perceives, as potentially reflected in the dynamics of oscillatory activity.
Audiovisual speech perception; speech motor theory; functional MRI; magnetoencephalography; electroencephalography.
Families of infants who are congenitally deaf now have the option of cochlear implantation at a very young age. In order to assess the effectiveness of early cochlear implantation, however, new behavioral procedures are needed to measure speech perception and language skills during infancy. One important component of language development is word learning, a complex skill that involves learning arbitrary relations between words and their referents. A precursor to word learning is the ability to perceive and encode intersensory relations between co-occurring auditory and visual events. Recent studies in infants with normal hearing have shown that intersensory redundancies, such as temporal synchrony, can facilitate the ability to learn arbitrary pairings between speech sounds and objects (Gogate & Bahrick, 1998). To investigate the early stages of learning arbitrary pairings of sounds and objects after cochlear implantation, we used the Preferential Looking Paradigm (PLP) to assess infants’ ability to associate speech sounds with objects that moved in temporal synchrony with the onsets and offsets of the signals. Children with normal hearing aged 6, 9, 18, and 30 months served as controls and demonstrated the ability to learn arbitrary pairings between temporally synchronous speech sounds and dynamic visual events. Infants who received their cochlear implants (CIs) at earlier ages (7–15 months of age) performed similarly to the infants with normal hearing after about 2–6 months of CI experience. In contrast, infants who received their implants at later ages (16–25 months of age) did not demonstrate learning of the associations within the context of this experiment. Possible implications of these findings are discussed.
This fMRI study investigates how audiovisual integration differs for verbal stimuli that can be matched at a phonological level and nonverbal stimuli that can be matched at a semantic level. Subjects were presented simultaneously with one visual and one auditory stimulus and were instructed to decide whether these stimuli referred to the same object or not. Verbal stimuli were simultaneously presented spoken and written object names, and nonverbal stimuli were photographs of objects simultaneously presented with naturally occurring object sounds. Stimulus differences were controlled by including two further conditions that paired photographs of objects with spoken words and object sounds with written words. Verbal matching, relative to all other conditions, increased activation in a region of the left superior temporal sulcus that has previously been associated with phonological processing. Nonverbal matching, relative to all other conditions, increased activation in a right fusiform region that has previously been associated with structural and conceptual object processing. Thus, we demonstrate how brain activation for audiovisual integration depends on the verbal content of the stimuli, even when stimulus and task processing differences are controlled.
Audiovisual; Integration; Verbal; Nonverbal; Semantic; Conceptual; Phonological; Amodal
In our natural environment, emotional information is conveyed by converging visual and auditory information; multimodal integration is of utmost importance. In the laboratory, however, emotion researchers have mostly focused on the examination of unimodal stimuli. Few existing studies on multimodal emotion processing have focused on human communication such as the integration of facial and vocal expressions. Extending the concept of multimodality, the current study examines how the neural processing of emotional pictures is influenced by simultaneously presented sounds. Twenty pleasant, unpleasant, and neutral pictures of complex scenes were presented to 22 healthy participants. On the critical trials these pictures were paired with pleasant, unpleasant, and neutral sounds. Sound presentation started 500 ms before picture onset and each stimulus presentation lasted for 2 s. EEG was recorded from 64 channels and ERP analyses focused on the picture onset. In addition, valence and arousal ratings were obtained. Previous findings for the neural processing of emotional pictures were replicated. Specifically, unpleasant compared to neutral pictures were associated with an increased parietal P200 and a more pronounced centroparietal late positive potential (LPP), independent of the accompanying sound valence. For audiovisual stimulation, increased parietal P100 and P200 were found in response to all pictures which were accompanied by unpleasant or pleasant sounds compared to pictures with neutral sounds. Most importantly, incongruent audiovisual pairs of unpleasant pictures and pleasant sounds enhanced parietal P100 and P200 compared to pairings with congruent sounds. Taken together, the present findings indicate that emotional sounds modulate early stages of visual processing and, therefore, provide an avenue by which multimodal experience may enhance perception.
emotional pictures; emotional sounds; audiovisual stimuli; ERPs; P100; P200; LPP
Coloured-hearing (CH) synesthesia is a perceptual phenomenon in which an acoustic stimulus (the inducer) initiates a concurrent colour perception (the concurrent). Individuals with CH synesthesia "see" colours when hearing tones, words, or music, a phenomenon that suggests a close relationship between auditory and visual representations. To date, it is still unknown whether the perception of colours is associated with a modulation of brain functions in the inducing brain area, namely in the auditory-related cortex and associated brain areas. In addition, there is an on-going debate as to whether attention to the inducer is necessarily required for eliciting a visual concurrent, or whether the latter can emerge in a pre-attentive fashion.
By using the EEG technique in the context of a pre-attentive mismatch negativity (MMN) paradigm, we show that the binding of tones and colours in CH synesthetes is associated with increased MMN amplitudes in response to deviant tones supposed to induce novel concurrent colour perceptions. Most notably, the increased MMN amplitudes we revealed in the CH synesthetes were associated with stronger intracerebral current densities originating from the auditory cortex, parietal cortex, and ventral visual areas.
The automatic binding of tones and colours in CH synesthetes is accompanied by an early pre-attentive process recruiting the auditory cortex, inferior and superior parietal lobules, as well as ventral occipital areas.
Coloured-hearing synesthesia; Crossmodal integration; EEG; Mismatch negativity; Auditory cortex
A number of studies have investigated changes in the perception of visual motion as a result of altered sensory experiences. An animal study has shown that auditory-deprived cats exhibit enhanced performance in a visual movement detection task compared to hearing cats (Lomber, Meredith, & Kral, 2010). In humans, the behavioural evidence regarding the perception of motion is less clear. The present study investigated deaf and hearing adult participants using a movement localization task and a direction of motion task employing coherently-moving and static visual dot patterns. Overall, deaf and hearing participants did not differ in their movement localization performance, although within the deaf group, a left visual field advantage was found. When discriminating the direction of motion, however, deaf participants responded faster and tended to be more accurate when detecting small differences in direction compared with the hearing controls. These results conform to the view that visual abilities are enhanced after auditory deprivation and extend previous findings regarding visual motion processing in
deafness; cross-modal plasticity; localization of motion; direction of motion
Object detection and identification are fundamental to human vision, and there is mounting evidence that objects guide the allocation of visual attention. However, the role of objects in tasks involving multiple modalities is less clear. To address this question, we investigate object naming, a task in which participants have to verbally identify objects they see in photorealistic scenes. We report an eye-tracking study that investigates which features (attentional, visual, and linguistic) influence object naming. We find that the amount of visual attention directed toward an object, its position and saliency, along with linguistic factors such as word frequency, animacy, and semantic proximity, significantly influence whether the object will be named or not. We then ask how features from different modalities are combined during naming, and find significant interactions between saliency and position, saliency and linguistic features, and attention and position. We conclude that when the cognitive system performs tasks such as object naming, it uses input from one modality to constrain or enhance the processing of other modalities, rather than processing each input modality independently.
scene perception; visual saliency; eye movements; naming; overt attention; object perception
The human neocortex appears to contain a dedicated visual word form area (VWFA) and an adjacent multimodal (visual/auditory) area. However, these conclusions are based on functional magnetic resonance imaging (fMRI) of alphabetic language processing, languages that have clear grapheme-to-phoneme correspondence (GPC) rules that make it difficult to dissociate visual-specific processing from form-to-sound mapping. In contrast, the Chinese language has no clear GPC rules. Therefore, the current study examined whether native Chinese readers also have the same VWFA and multimodal area. Two cross-modal tasks, phonological retrieval of visual words and orthographic retrieval of auditory words, were adopted. Different task requirements were also applied to explore how different levels of cognitive processing modulate activation of putative VWFA-like and multimodal-like regions. Results showed that the left occipitotemporal sulcus (LOTS) responded exclusively to visual inputs and an adjacent region, the left inferior temporal gyrus (LITG), showed comparable activation for both visual and auditory inputs. Surprisingly, processing levels did not significantly alter activation of these two regions. These findings indicated that there are both unimodal and multimodal word areas for non-alphabetic language reading, and that activity in these two word-specific regions is independent of task demands at the linguistic level.
fMRI; visual word form area; Chinese; multimodal; task modulation
Speech production involves the generation of an auditory signal from the articulators and vocal tract. When the intended auditory signal does not match the produced sounds, subsequent articulatory commands can be adjusted to reduce the difference between the intended and produced sounds. This requires an internal model of the intended speech output that can be compared to the produced speech. The aim of this functional imaging study was to identify brain activation related to the internal model of speech production after activation related to vocalization, auditory feedback, and movement in the articulators had been controlled. There were four conditions: silent articulation of speech, non-speech mouth movements, finger tapping, and visual fixation. In the speech conditions, participants produced the mouth movements associated with the words “one” and “three.” We eliminated auditory feedback from the spoken output by instructing participants to articulate these words without producing any sound. The non-speech mouth movement conditions involved lip pursing and tongue protrusions to control for movement in the articulators. The main difference between our speech and non-speech mouth movement conditions is that prior experience producing speech sounds leads to the automatic and covert generation of auditory and phonological associations that may play a role in predicting auditory feedback. We found that, relative to non-speech mouth movements, silent speech activated Broca’s area in the left dorsal pars opercularis and Wernicke’s area in the left posterior superior temporal sulcus. We discuss these results in the context of a generative model of speech production and propose that Broca’s and Wernicke’s areas may be involved in predicting the speech output that follows articulation. These predictions could provide a mechanism by which rapid movement of the articulators is precisely matched to the intended speech outputs during future articulations.
speech production; auditory feedback; PET; fMRI; forward model
The aim of this study was to investigate the hypothesis that semantic information facilitates auditory and visual spatial learning and memory. An auditory spatial task was administered, whereby healthy participants were placed in the center of a semi-circle that contained an array of speakers where the locations of nameable and non-nameable sounds were learned. In the visual spatial task, locations of pictures of abstract art intermixed with nameable objects were learned by presenting these items in specific locations on a computer screen. Participants took part in both the auditory and visual spatial tasks, which were counterbalanced for order and were learned at the same rate. Results showed that learning and memory for the spatial locations of nameable sounds and pictures was significantly better than for non-nameable stimuli. Interestingly, there was a cross-modal learning effect such that the auditory task facilitated learning of the visual task and vice versa. In conclusion, our results support the hypotheses that the semantic representation of items, as well as the presentation of items in different modalities, facilitate spatial learning and memory.
audition; vision; hippocampus; spatial memory; cognitive map
The simultaneous presentation of a stimulus in one sensory modality often enhances target detection in another sensory modality, but the neural mechanisms that govern these effects are still under investigation. Here we test a hypothesis proposed in the neurophysiologic literature: that auditory facilitation of visual-target detection operates through cross-sensory phase reset of ongoing neural oscillations (see Lakatos et al., 2009). To date, measurement limitations have prevented this potentially powerful neural mechanism from being directly linked with its predicted behavioral consequences. The present experiment uses a psychophysical approach in humans to demonstrate, for the first time, stimulus-locked periodicity in visual-target detection, following a temporally informative sound. Our data further demonstrate that periodicity in behavioral performance is strongly influenced by the probability of audiovisual co-occurrence. We argue that fluctuations in visual-target detection result from cross-sensory phase reset, both at the moment it occurs and persisting for seconds thereafter. The precise frequency at which this periodicity operates remains to be determined through a method that allows for a higher sampling rate.
Naming is a fundamental aspect of language and is virtually always assessed with visual confrontation tests. Tests of the ability to name objects by their characteristic sounds would be particularly useful in the assessment of visually impaired patients, and may be particularly sensitive in Alzheimer’s disease (AD). We developed an Auditory Naming Task, requiring the identification of the source of environmental sounds (i.e., animal calls, musical instruments, vehicles) and multiple-choice recognition of those not identified. In two separate studies, mild-to-moderate AD patients performed more poorly than cognitively normal elderly on the Auditory Naming Task. This task was also more difficult than two versions of a comparable Visual Naming Task, and correlated more highly with Mini-Mental State Exam score. Internal consistency reliability was acceptable, although ROC analysis revealed auditory naming to be slightly less successful than visual confrontation naming in discriminating AD patients from normal subjects. Nonetheless, our Auditory Naming Test may prove useful in research and clinical practice, especially with visually-impaired patients.
In natural environments, sensory information is embedded in temporally contiguous streams of events. This is typically the case when seeing and listening to a speaker or when engaged in scene analysis. In such contexts, two mechanisms are needed to single out and build a reliable representation of an event (or object): the temporal parsing of information and the selection of relevant information in the stream. It has previously been shown that rhythmic events naturally build temporal expectations that improve sensory processing at predictable points in time. Here, we asked to what extent temporal regularities can improve the detection and identification of events across sensory modalities. To do so, we used a dynamic visual conjunction search task accompanied by auditory cues synchronized or not with the color change of the target (horizontal or vertical bar). Sounds synchronized with the visual target improved search efficiency for temporal rates below 1.4 Hz but did not affect efficiency above that stimulation rate. Desynchronized auditory cues consistently impaired visual search below 3.3 Hz. Our results are interpreted in the context of the Dynamic Attending Theory: specifically, we suggest that a cognitive operation structures events in time irrespective of the sensory modality of input. Our results further support and specify recent neurophysiological findings by showing strong temporal selectivity for audiovisual integration in the auditory-driven improvement of visual search efficiency.
A general problem in learning is how the brain determines what lesson to learn (and what lessons not to learn). For example, sound localization is a behavior that is partially learned with the aid of vision. This process requires correctly matching a visual location to that of a sound. This is an intrinsically circular problem when sound location is itself uncertain and the visual scene is rife with possible visual matches. Here, we develop a simple paradigm using visual guidance of sound localization to gain insight into how the brain confronts this type of circularity. We tested two competing hypotheses. 1: The brain guides sound location learning based on the synchrony or simultaneity of auditory-visual stimuli, potentially involving a Hebbian associative mechanism. 2: The brain uses a ‘guess and check’ heuristic in which visual feedback that is obtained after an eye movement to a sound alters future performance, perhaps by recruiting the brain’s reward-related circuitry. We assessed the effects of exposure to visual stimuli spatially mismatched from sounds on performance of an interleaved auditory-only saccade task. We found that when humans and monkeys were provided the visual stimulus asynchronously with the sound but as feedback to an auditory-guided saccade, they shifted their subsequent auditory-only performance toward the direction of the visual cue by 1.3–1.7 degrees, or 22–28% of the original 6 degree visual-auditory mismatch. In contrast, when the visual stimulus was presented synchronously with the sound but extinguished too quickly to provide this feedback, there was little change in subsequent auditory-only performance. Our results suggest that the outcome of our own actions is vital to localizing sounds correctly. Contrary to previous expectations, visual calibration of auditory space does not appear to require visual-auditory associations based on synchrony/simultaneity.
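The percentages quoted above follow directly from dividing the observed shift by the imposed mismatch. A short sketch of that arithmetic, using only the values stated in the abstract (the function name is an illustrative assumption):

```python
def shift_as_percent(shift_deg, mismatch_deg):
    """Express a localization shift as a percentage of the imposed
    visual-auditory mismatch."""
    if mismatch_deg == 0:
        raise ValueError("mismatch must be non-zero")
    return 100.0 * shift_deg / mismatch_deg


MISMATCH_DEG = 6.0  # visual-auditory mismatch reported in the abstract

for shift_deg in (1.3, 1.7):  # range of shifts reported in the abstract
    pct = shift_as_percent(shift_deg, MISMATCH_DEG)
    print(f"{shift_deg} deg shift = {pct:.0f}% of the {MISMATCH_DEG:.0f} deg mismatch")
```

Normalizing by the mismatch in this way is what allows recalibration magnitudes to be compared across studies that use different displacement sizes.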
The mechanisms and functional anatomy underlying the early stages of speech perception are still not well understood. Auditory agnosia is a deficit of auditory object processing, defined as an inability to recognize spoken language and/or nonverbal environmental sounds and music despite adequate hearing, while spontaneous speech, reading, and writing are preserved. Usually, bilateral or unilateral temporal lobe lesions, especially lesions of the transverse gyri, are responsible for auditory agnosia. Subcortical lesions without cortical damage rarely cause auditory agnosia. We present a 73-year-old right-handed man with generalized auditory agnosia caused by a unilateral subcortical lesion. He was unable to repeat or to write to dictation, but his spontaneous speech was fluent and comprehensible. He could read and understand written words and phrases. His brainstem auditory evoked potentials and audiometry were normal. This case suggests that a subcortical lesion involving the unilateral acoustic radiation can cause generalized auditory agnosia.
Auditory agnosia; Unilateral subcortical lesion
Previous picture-word interference (PWI) fMRI paradigms revealed ambiguous mechanisms underlying facilitation and inhibition in healthy subjects. Lexical distractors revealed increased (enhancement) or decreased (suppression) activation in language and monitoring/control areas. In a secondary examination and data analysis, we aimed to illuminate the relation between behavioral and neural interference effects by comparing target-related distractors (REL) with unrelated distractors (UNREL). We hypothesized that (A) interference involves both suppression due to priming and (B) enhancement due to simultaneous distractor and target processing; comparisons to UNREL should remain distractor unspecific even at a low threshold. (C) Distractor types with common characteristics should reveal overlapping brain areas. In a 3T MRI scanner, participants were asked to name pictures while auditory words were presented (stimulus onset asynchrony [SOA] = –200 msec). Associatively and phonologically related distractors speeded responses (facilitation), while categorically related distractors slowed them down (inhibition) compared to UNREL. As a result, (A) reduced brain activations indeed resembled previously reported patterns of neural priming. Each target-related distractor yielded suppressions at least in areas associated with vision and conflict/competition monitoring (anterior cingulate cortex [ACC]), with the least priming for inhibitors. (B) Enhancements concerned language-related but distractor-unspecific regions. (C) Some wider brain regions were commonly suppressed for combinations of distractor types. Overlapping areas associated with conceptual priming were found for facilitatory distractors (inferior frontal gyri), and areas related to phonetic/articulatory processing (precentral gyri and left parietal operculum/insula) for distractors sharing feature overlap.
Each distractor with semantic relatedness revealed nonoverlapping suppressions in lexical-phonological areas (superior temporal regions). To conclude, interference combines suppression of areas well known from neural priming and enhancement of language-related areas caused by dual activation from target and distractor. Differences between interference and priming need to be taken into account. The present interference paradigm has the potential to reveal the functioning of word-processing stages, cognitive control, and responsiveness to priming at the same time.
Facilitation; fMRI; inhibition; naming; picture-word interference task; semantic priming; visual object priming; word processing
A common complaint amongst listeners with hearing loss (HL) is that they have difficulty communicating in common social settings. This paper reviews how normal-hearing listeners cope in such settings, especially how they focus attention on a source of interest. Results of experiments with normal-hearing listeners suggest that the ability to selectively attend depends on the ability to analyze the acoustic scene and to form perceptual auditory objects properly. Unfortunately, sound features important for auditory object formation may not be robustly encoded in the auditory periphery of HL listeners. In turn, impaired auditory object formation may interfere with the ability to filter out competing sound sources. Peripheral degradations are also likely to reduce the salience of higher-order auditory cues such as location, pitch, and timbre, which enable normal-hearing listeners to select a desired sound source out of a sound mixture. Degraded peripheral processing is also likely to increase the time required to form auditory objects and focus selective attention, so that listeners with hearing loss lose the ability to switch attention rapidly (a skill that is particularly important when trying to participate in a lively conversation). Finally, peripheral deficits may interfere with strategies that normal-hearing listeners employ in complex acoustic settings, including the use of memory to fill in bits of the conversation that are missed. Thus, peripheral hearing deficits are likely to cause a number of inter-related problems that challenge the ability of HL listeners to communicate in social settings requiring selective attention.
attention; segregation; auditory object; auditory scene analysis