Related Articles
In a natural environment, objects that we look for often make characteristic sounds. A hiding cat may meow, or the keys in the cluttered drawer may jingle when moved. Using a visual search paradigm, we demonstrated that characteristic sounds facilitated visual localization of objects, even when the sounds carried no location information. For example, finding a cat was faster when participants heard a meow sound. In contrast, sounds had no effect when participants searched for names rather than pictures of objects. For example, hearing “meow” did not facilitate localization of the word cat. These results suggest that characteristic sounds cross-modally enhance visual (rather than conceptual) processing of the corresponding objects. Our behavioral demonstration of object-based cross-modal enhancement complements the extensive literature on space-based cross-modal interactions. When looking for your keys next time, you might want to play jingling sounds.
PMCID: PMC2647585
PMID: 18567253
When you are looking for an object, does hearing its characteristic sound make you find it more quickly? Our recent results supported this possibility by demonstrating that when a cat target, for example, was presented among other objects, a simultaneously presented “meow” sound (containing no spatial information) reduced the manual response time for visual localization of the target. To extend these results, we determined how rapidly an object-specific auditory signal can facilitate target detection in visual search. On each trial, participants fixated a specified target object as quickly as possible. The target’s characteristic sound speeded the saccadic search time within 215–220 ms and also guided the initial saccade toward the target, compared to presentation of a distractor’s sound or to no sound. These results suggest that object-based auditory-visual interactions rapidly increase the target object’s salience in visual search.
doi:10.3758/APP.72.7.1736
PMCID: PMC3261720
PMID: 20952773
In this study we investigate previous claims that a region in the left posterior superior temporal sulcus (pSTS) is more activated by audiovisual than unimodal processing. First, we compare audiovisual to visual–visual and auditory–auditory conceptual matching using auditory or visual object names that are paired with pictures of objects or their environmental sounds. Second, we compare congruent and incongruent audiovisual trials when presentation is simultaneous or sequential. Third, we compare audiovisual stimuli that are either verbal (auditory and visual words) or nonverbal (pictures of objects and their associated sounds). The results demonstrate that, when task, attention, and stimuli are controlled, pSTS activation for audiovisual conceptual matching is 1) identical to that observed for intramodal conceptual matching, 2) greater for incongruent than congruent trials when auditory and visual stimuli are simultaneously presented, and 3) identical for verbal and nonverbal stimuli. These results are not consistent with previous claims that pSTS activation reflects the active formation of an integrated audiovisual representation. After a discussion of the stimulus and task factors that modulate activation, we conclude that, when stimulus input, task, and attention are controlled, pSTS is part of a distributed set of regions involved in conceptual matching, irrespective of whether the stimuli are audiovisual, auditory–auditory or visual–visual.
doi:10.1093/cercor/bhn007
PMCID: PMC2536697
PMID: 18281303
amodal; audiovisual binding; conceptual integration; congruency; crossmodal
Seeing the image of a newscaster on a television set causes us to think that the sound coming from the loudspeaker is actually coming from the screen. How images capture sounds is mysterious because the brain uses different methods for determining the locations of visual vs. auditory stimuli. The retina senses the locations of visual objects with respect to the eyes, whereas differences in sound characteristics across the ears indicate the locations of sound sources referenced to the head. Here, we tested which reference frame (RF) is used when vision recalibrates perceived sound locations.
Visually guided biases in sound localization were induced in seven humans and two monkeys who made eye movements to auditory or audio-visual stimuli. On audio-visual (training) trials, the visual component of the targets was displaced laterally by ~5°. Interleaved auditory-only (probe) trials served to evaluate the effect of experience with mismatched visual stimuli on auditory localization. We found that the displaced visual stimuli induced a ventriloquism aftereffect in both humans (~50% of the displacement size) and monkeys (~25%), but only for locations around the trained spatial region, showing that audio-visual recalibration can be spatially specific.
We tested the reference frame in which the recalibration occurs. On probe trials, we varied eye position relative to the head to dissociate head- from eye-centered RFs. Results indicate that both humans and monkeys use a mixture of the two RFs, suggesting that the neural mechanisms involved in ventriloquism occur in brain region(s) employing a hybrid RF for encoding spatial information.
doi:10.1523/JNEUROSCI.2783-09.2009
PMCID: PMC2804958
PMID: 19889992
visual calibration of auditory space; humans; monkeys; reference frame of auditory space representation; ventriloquism; cross-modal adaptation
Background
Can hearing a word change what one sees? Although visual sensitivity is known to be enhanced by attending to the location of the target, perceptual enhancements following cues to the identity of an object have been difficult to find. Here, we show that perceptual sensitivity is enhanced by verbal, but not visual, cues.
Methodology/Principal Findings
Participants completed an object detection task in which they made an object-presence or -absence decision to briefly-presented letters. Hearing the letter name prior to the detection task increased perceptual sensitivity (d′). A visual cue in the form of a preview of the to-be-detected letter did not. Follow-up experiments found that the auditory cuing effect was specific to validly cued stimuli. The magnitude of the cuing effect positively correlated with an individual measure of vividness of mental imagery; introducing uncertainty into the position of the stimulus did not reduce the magnitude of the cuing effect, but eliminated the correlation with mental imagery.
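The sensitivity index d′ mentioned above is conventionally computed from signal detection theory as the difference between the z-transformed hit rate and false-alarm rate. The following is a minimal sketch of that standard formula, not the authors' analysis code. The function name `d_prime`, its count-based arguments, and the log-linear correction are assumptions made for illustration.

```python
from statistics import NormalDist


def d_prime(hits, misses, false_alarms, correct_rejections):
    """Return d' = z(hit rate) - z(false-alarm rate) from trial counts."""
    # Assumption of this sketch: a log-linear correction (add 0.5 to each
    # count, 1 to each total) keeps both rates strictly between 0 and 1,
    # where the inverse normal CDF is defined.
    hit_rate = (hits + 0.5) / (hits + misses + 1)
    fa_rate = (false_alarms + 0.5) / (false_alarms + correct_rejections + 1)
    z = NormalDist().inv_cdf  # inverse of the standard normal CDF
    return z(hit_rate) - z(fa_rate)
```

A larger d′ means target-present and target-absent trials are easier to tell apart, independently of any overall bias toward responding "present".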
Conclusions/Significance
Hearing a word made otherwise invisible objects visible. Interestingly, seeing a preview of the target stimulus did not similarly enhance detection of the target. These results are compatible with an account in which auditory verbal labels modulate lower-level visual processing. The findings show that a verbal cue in the form of hearing a word can influence even the most elementary visual processing and inform our understanding of how language affects perception.
doi:10.1371/journal.pone.0011452
PMCID: PMC2898810
PMID: 20628646
For humans and animals, the ability to discriminate speech and conspecific vocalizations is an important function of the auditory system. To reveal the underlying neural mechanism, many electrophysiological studies have investigated the neural responses of the auditory cortex to conspecific vocalizations in monkeys. The data suggest that vocalizations may be hierarchically processed along an anterior/ventral stream from the primary auditory cortex (A1) to the ventral prefrontal cortex. To date, the organization of vocalization processing has not been well investigated in the auditory cortex of other mammals. In this study, we examined the spike activities of single neurons in two early auditory cortical regions with different anteroposterior locations: anterior auditory field (AAF) and posterior auditory field (PAF) in awake cats, as the animals were passively listening to forward and backward conspecific calls (meows) and human vowels. We found that the neural response patterns in PAF were more complex and had longer latency than those in AAF. The selectivity for different vocalizations based on the mean firing rate was low in both AAF and PAF, and not significantly different between them; however, more vocalization information was transmitted when the temporal response profiles were considered, and the maximum transmitted information by PAF neurons was higher than that by AAF neurons. Discrimination accuracy based on the activities of an ensemble of PAF neurons was also better than that of AAF neurons. Our results suggest that AAF and PAF are similar with regard to which vocalizations they represent but differ in the way they represent these vocalizations, and there may be a complex processing stream between them.
doi:10.1371/journal.pone.0052942
PMCID: PMC3534661
PMID: 23301004
Seeing the articulatory gestures of the speaker significantly enhances speech perception. Findings from recent neuroimaging studies suggest that activation of the speech motor system during lipreading enhances speech perception by tuning, in a top-down fashion, speech-sound processing in the superior aspects of the posterior temporal lobe. Anatomically, the superior-posterior temporal lobe areas receive connections from the auditory, visual, and speech motor cortical areas. Thus, it is possible that neuronal receptive fields are shaped during development to respond to speech-sound features that coincide with visual and motor speech cues, in contrast with the anterior/lateral temporal lobe areas that might process speech sounds predominantly based on acoustic cues. The superior-posterior temporal lobe areas have also been consistently associated with auditory spatial processing. Thus, the involvement of these areas in audiovisual speech perception might partly be explained by the spatial processing requirements when associating sounds, seen articulations, and one’s own motor movements. Tentatively, it is possible that the anterior “what” and posterior “where / how” auditory cortical processing pathways are parts of an interacting network, the instantaneous state of which determines what one ultimately perceives, as potentially reflected in the dynamics of oscillatory activity.
doi:10.2174/1874440001004020030
PMCID: PMC2948144
PMID: 20922046
Audiovisual speech perception; speech motor theory; functional MRI; magnetoencephalography; electroencephalography.
Families of infants who are congenitally deaf now have the option of cochlear implantation at a very young age. In order to assess the effectiveness of early cochlear implantation, however, new behavioral procedures are needed to measure speech perception and language skills during infancy. One important component of language development is word learning—a complex skill that involves learning arbitrary relations between words and their referents. A precursor to word learning is the ability to perceive and encode intersensory relations between co-occurring auditory and visual events. Recent studies in infants with normal hearing have shown that intersensory redundancies, such as temporal synchrony, can facilitate the ability to learn arbitrary pairings between speech sounds and objects (Gogate & Bahrick, 1998). To investigate the early stages of learning arbitrary pairings of sounds and objects after cochlear implantation, we used the Preferential Looking Paradigm (PLP) to assess infants’ ability to associate speech sounds with objects that moved in temporal synchrony with the onsets and offsets of the signals. Children with normal hearing aged 6, 9, 18, and 30 months served as controls and demonstrated the ability to learn arbitrary pairings between temporally synchronous speech sounds and dynamic visual events. Infants who received their cochlear implants (CIs) at earlier ages (7–15 months of age) performed similarly to the infants with normal hearing after about 2–6 months of CI experience. In contrast, infants who received their implants at later ages (16–25 months of age) did not demonstrate learning of the associations within the context of this experiment. Possible implications of these findings are discussed.
PMCID: PMC3114639
PMID: 21643556
This fMRI study investigates how audiovisual integration differs for verbal stimuli that can be matched at a phonological level and nonverbal stimuli that can be matched at a semantic level. Subjects were presented simultaneously with one visual and one auditory stimulus and were instructed to decide whether these stimuli referred to the same object or not. Verbal stimuli were simultaneously presented spoken and written object names, and nonverbal stimuli were photographs of objects simultaneously presented with naturally occurring object sounds. Stimulus differences were controlled by including two further conditions that paired photographs of objects with spoken words and object sounds with written words. Verbal matching, relative to all other conditions, increased activation in a region of the left superior temporal sulcus that has previously been associated with phonological processing. Nonverbal matching, relative to all other conditions, increased activation in a right fusiform region that has previously been associated with structural and conceptual object processing. Thus, we demonstrate how brain activation for audiovisual integration depends on the verbal content of the stimuli, even when stimulus and task processing differences are controlled.
doi:10.1016/j.bandl.2008.10.005
PMCID: PMC2693664
PMID: 19101025
Audiovisual; Integration; Verbal; Nonverbal; Semantic; Conceptual; Phonological; Amodal
Background
Coloured-hearing (CH) synesthesia is a perceptual phenomenon in which an acoustic stimulus (the inducer) initiates a concurrent colour perception (the concurrent). Individuals with CH synesthesia "see" colours when hearing tones, words, or music; this phenomenon suggests a close relationship between auditory and visual representations. To date, it is still unknown whether the perception of colours is associated with a modulation of brain functions in the inducing brain area, namely in the auditory-related cortex and associated brain areas. In addition, there is an on-going debate as to whether attention to the inducer is necessarily required for eliciting a visual concurrent, or whether the latter can emerge in a pre-attentive fashion.
Results
By using the EEG technique in the context of a pre-attentive mismatch negativity (MMN) paradigm, we show that the binding of tones and colours in CH synesthetes is associated with increased MMN amplitudes in response to deviant tones supposed to induce novel concurrent colour perceptions. Most notably, the increased MMN amplitudes we revealed in the CH synesthetes were associated with stronger intracerebral current densities originating from the auditory cortex, parietal cortex, and ventral visual areas.
Conclusions
The automatic binding of tones and colours in CH synesthetes is accompanied by an early pre-attentive process recruiting the auditory cortex, inferior and superior parietal lobules, as well as ventral occipital areas.
doi:10.1186/1471-2202-13-151
PMCID: PMC3547775
PMID: 23241212
Coloured-hearing synesthesia; Crossmodal integration; EEG; Mismatch negativity; Auditory cortex
Speech production involves the generation of an auditory signal from the articulators and vocal tract. When the intended auditory signal does not match the produced sounds, subsequent articulatory commands can be adjusted to reduce the difference between the intended and produced sounds. This requires an internal model of the intended speech output that can be compared to the produced speech. The aim of this functional imaging study was to identify brain activation related to the internal model of speech production after activation related to vocalization, auditory feedback, and movement in the articulators had been controlled. There were four conditions: silent articulation of speech, non-speech mouth movements, finger tapping, and visual fixation. In the speech conditions, participants produced the mouth movements associated with the words “one” and “three.” We eliminated auditory feedback from the spoken output by instructing participants to articulate these words without producing any sound. The non-speech mouth movement conditions involved lip pursing and tongue protrusions to control for movement in the articulators. The main difference between our speech and non-speech mouth movement conditions is that prior experience producing speech sounds leads to the automatic and covert generation of auditory and phonological associations that may play a role in predicting auditory feedback. We found that, relative to non-speech mouth movements, silent speech activated Broca’s area in the left dorsal pars opercularis and Wernicke’s area in the left posterior superior temporal sulcus. We discuss these results in the context of a generative model of speech production and propose that Broca’s and Wernicke’s areas may be involved in predicting the speech output that follows articulation. These predictions could provide a mechanism by which rapid movement of the articulators is precisely matched to the intended speech outputs during future articulations.
doi:10.3389/fpsyg.2011.00237
PMCID: PMC3174393
PMID: 21954392
speech production; auditory feedback; PET; fMRI; forward model
The aim of this study was to investigate the hypothesis that semantic information facilitates auditory and visual spatial learning and memory. An auditory spatial task was administered, whereby healthy participants were placed in the center of a semi-circle that contained an array of speakers where the locations of nameable and non-nameable sounds were learned. In the visual spatial task, locations of pictures of abstract art intermixed with nameable objects were learned by presenting these items in specific locations on a computer screen. Participants took part in both the auditory and visual spatial tasks, which were counterbalanced for order and were learned at the same rate. Results showed that learning and memory for the spatial locations of nameable sounds and pictures was significantly better than for non-nameable stimuli. Interestingly, there was a cross-modal learning effect such that the auditory task facilitated learning of the visual task and vice versa. In conclusion, our results support the hypotheses that the semantic representation of items, as well as the presentation of items in different modalities, facilitate spatial learning and memory.
doi:10.3389/fpsyg.2010.00228
PMCID: PMC3153833
PMID: 21833283
audition; vision; hippocampus; spatial memory; cognitive map
The simultaneous presentation of a stimulus in one sensory modality often enhances target detection in another sensory modality, but the neural mechanisms that govern these effects are still under investigation. Here we test a hypothesis proposed in the neurophysiologic literature: that auditory facilitation of visual-target detection operates through cross-sensory phase reset of ongoing neural oscillations (see Lakatos et al., 2009). To date, measurement limitations have prevented this potentially powerful neural mechanism from being directly linked with its predicted behavioral consequences. The present experiment uses a psychophysical approach in humans to demonstrate, for the first time, stimulus-locked periodicity in visual-target detection, following a temporally informative sound. Our data further demonstrate that periodicity in behavioral performance is strongly influenced by the probability of audiovisual co-occurrence. We argue that fluctuations in visual-target detection result from cross-sensory phase reset, both at the moment it occurs and persisting for seconds thereafter. The precise frequency at which this periodicity operates remains to be determined through a method that allows for a higher sampling rate.
doi:10.1523/JNEUROSCI.1338-11.2011
PMCID: PMC3343369
PMID: 21734288
Naming is a fundamental aspect of language and is virtually always assessed with visual confrontation tests. Tests of the ability to name objects by their characteristic sounds would be particularly useful in the assessment of visually impaired patients, and may be particularly sensitive in Alzheimer’s disease (AD). We developed an Auditory Naming Task, requiring the identification of the source of environmental sounds (i.e., animal calls, musical instruments, vehicles) and multiple-choice recognition of those not identified. In two separate studies, mild-to-moderate AD patients performed more poorly than cognitively normal elderly on the Auditory Naming Task. This task was also more difficult than two versions of a comparable Visual Naming Task, and correlated more highly with Mini-Mental State Exam score. Internal consistency reliability was acceptable, although ROC analysis revealed auditory naming to be slightly less successful than visual confrontation naming in discriminating AD patients from normal subjects. Nonetheless, our Auditory Naming Test may prove useful in research and clinical practice, especially with visually-impaired patients.
doi:10.1080/13854046.2010.518977
PMCID: PMC2992092
PMID: 20981630
In natural environments, sensory information is embedded in temporally contiguous streams of events. This is typically the case when seeing and listening to a speaker or when engaged in scene analysis. In such contexts, two mechanisms are needed to single out and build a reliable representation of an event (or object): the temporal parsing of information and the selection of relevant information in the stream. It has previously been shown that rhythmic events naturally build temporal expectations that improve sensory processing at predictable points in time. Here, we asked to what extent temporal regularities can improve the detection and identification of events across sensory modalities. To do so, we used a dynamic visual conjunction search task accompanied by auditory cues synchronized or not with the color change of the target (horizontal or vertical bar). Sounds synchronized with the visual target improved search efficiency for temporal rates below 1.4 Hz but did not affect efficiency above that stimulation rate. Desynchronized auditory cues consistently impaired visual search below 3.3 Hz. Our results are interpreted in the context of the Dynamic Attending Theory: specifically, we suggest that a cognitive operation structures events in time irrespective of the sensory modality of input. Our results further support and specify recent neurophysiological findings by showing strong temporal selectivity for audiovisual integration in the auditory-driven improvement of visual search efficiency.
doi:10.1371/journal.pone.0040936
PMCID: PMC3400621
PMID: 22829899
A common complaint amongst listeners with hearing loss (HL) is that they have difficulty communicating in common social settings. This paper reviews how normal-hearing listeners cope in such settings, especially how they focus attention on a source of interest. Results of experiments with normal-hearing listeners suggest that the ability to selectively attend depends on the ability to analyze the acoustic scene and to form perceptual auditory objects properly. Unfortunately, sound features important for auditory object formation may not be robustly encoded in the auditory periphery of HL listeners. In turn, impaired auditory object formation may interfere with the ability to filter out competing sound sources. Peripheral degradations are also likely to reduce the salience of higher-order auditory cues such as location, pitch, and timbre, which enable normal-hearing listeners to select a desired sound source out of a sound mixture. Degraded peripheral processing is also likely to increase the time required to form auditory objects and focus selective attention, so that listeners with hearing loss lose the ability to switch attention rapidly (a skill that is particularly important when trying to participate in a lively conversation). Finally, peripheral deficits may interfere with strategies that normal-hearing listeners employ in complex acoustic settings, including the use of memory to fill in bits of the conversation that are missed. Thus, peripheral hearing deficits are likely to cause a number of inter-related problems that challenge the ability of HL listeners to communicate in social settings requiring selective attention.
doi:10.1177/1084713808325306
PMCID: PMC2700845
PMID: 18974202
attention; segregation; auditory object; auditory scene analysis
The mechanisms and functional anatomy underlying the early stages of speech perception are still not well understood. Auditory agnosia is a deficit of auditory object processing defined as an inability to recognize spoken language and/or nonverbal environmental sounds and music despite adequate hearing, while spontaneous speech, reading and writing are preserved. Usually, bilateral or unilateral temporal lobe lesions, especially of the transverse gyri, are responsible for auditory agnosia. Subcortical lesions without cortical damage rarely cause auditory agnosia. We present a 73-year-old right-handed male with generalized auditory agnosia caused by a unilateral subcortical lesion. He was unable to repeat or write to dictation but could produce fluent and comprehensible speech. He could understand and read written words and phrases. His auditory brainstem evoked potential and audiometry were intact. This case suggests that a subcortical lesion involving the unilateral acoustic radiation can cause generalized auditory agnosia.
doi:10.5535/arm.2012.36.6.866
PMCID: PMC3546192
PMID: 23342322
Auditory agnosia; Unilateral subcortical lesion
Previous picture-word interference (PWI) fMRI-paradigms revealed ambiguous mechanisms underlying facilitation and inhibition in healthy subjects. Lexical distractors revealed increased (enhancement) or decreased (suppression) activation in language and monitoring/control areas. Performing a secondary examination and data analysis, we aimed to illuminate the relation between behavioral and neural interference effects comparing target-related distractors (REL) with unrelated distractors (UNREL). We hypothesized that interference involves both (A) suppression due to priming and (B) enhancement due to simultaneous distractor and target processing. Comparisons to UNREL should remain distractor unspecific even at a low threshold. (C) Distractor types with common characteristics should reveal overlapping brain areas. In a 3T MRI scanner, participants were asked to name pictures while auditory words were presented (stimulus onset asynchrony [SOA] = –200 msec). Associatively and phonologically related distractors speeded responses (facilitation), while categorically related distractors slowed them down (inhibition) compared to UNREL. As a result, (A) reduced brain activations indeed resembled previously reported patterns of neural priming. Each target-related distractor yielded suppressions at least in areas associated with vision and conflict/competition monitoring (anterior cingulate cortex [ACC]), revealing least priming for inhibitors. (B) Enhancements concerned language-related but distractor-unspecific regions. (C) Some wider brain regions were commonly suppressed for combinations of distractor types. Overlapping areas associated with conceptual priming were found for facilitatory distractors (inferior frontal gyri), and areas related to phonetic/articulatory processing (precentral gyri and left parietal operculum/insula) for distractors sharing feature overlap. 
Each distractor with semantic relatedness revealed nonoverlapping suppressions in lexical-phonological areas (superior temporal regions). To conclude, interference combines suppression of areas well known from neural priming and enhancement of language-related areas caused by dual activation from target and distractor. Differences between interference and priming need to be taken into account. The present interference paradigm has the potential to reveal the functioning of word-processing stages, cognitive control, and responsiveness to priming at the same time.
doi:10.1002/brb3.31
PMCID: PMC3345356
PMID: 22574280
Facilitation; fMRI; inhibition; naming; picture-word interference task; semantic priming; visual object priming; word processing
Brain
2009;132(7):1928-1940.
Hearing developmental dyslexics and profoundly deaf individuals both have difficulties processing the internal structure of words (phonological processing) and learning to read. In hearing non-impaired readers, the development of phonological representations depends on audition. In hearing dyslexics, many argue, auditory processes may be impaired. In congenitally profoundly deaf individuals, auditory speech processing is essentially absent. Two separate literatures have previously reported enhanced activation in the left inferior frontal gyrus in both deaf and dyslexic adults when contrasted with hearing non-dyslexics during reading or phonological tasks. Here, we used a rhyme judgement task to compare adults from these two special populations to a hearing non-dyslexic control group. All groups were matched on non-verbal intelligence quotient, reading age and rhyme performance. Picture stimuli were used since this requires participants to generate their own phonological representations, rather than have them partially provided via text. By testing well-matched groups of participants on the same task, we aimed to establish whether previous literatures reporting differences between individuals with and without phonological processing difficulties have identified the same regions of differential activation in these two distinct populations. The data indicate greater activation in the deaf and dyslexic groups than in the hearing non-dyslexic group across a large portion of the left inferior frontal gyrus. This includes the pars triangularis, extending superiorly into the middle frontal gyrus and posteriorly to include the pars opercularis, and the junction with the ventral precentral gyrus. Within the left inferior frontal gyrus, there was variability between the two groups with phonological processing difficulties. 
The superior posterior tip of the left pars opercularis, extending into the precentral gyrus, was activated to a greater extent by deaf than dyslexic participants, whereas the superior posterior portion of the pars triangularis extending into the ventral pars opercularis, was activated to a greater extent by dyslexic than deaf participants. Whether these regions play differing roles in compensating for poor phonological processing is not clear. However, we argue that our main finding of greater inferior frontal gyrus activation in both groups with phonological processing difficulties in contrast to controls suggests greater reliance on the articulatory component of speech during phonological processing when auditory processes are absent (deaf group) or impaired (dyslexic group). Thus, the brain appears to develop a similar solution to a processing problem that has different antecedents in these two populations.
doi:10.1093/brain/awp129
PMCID: PMC2702837
PMID: 19467990
inferior frontal gyrus; deaf; dyslexia; rhyming; phonology
The visual and auditory systems frequently work together to facilitate the identification and localization of objects and events in the external world. Experience plays a critical role in establishing and maintaining congruent visual–auditory associations, so that the different sensory cues associated with targets that can be both seen and heard are synthesized appropriately. For stimulus location, visual information is normally more accurate and reliable and provides a reference for calibrating the perception of auditory space. During development, vision plays a key role in aligning neural representations of space in the brain, as revealed by the dramatic changes produced in auditory responses when visual inputs are altered, and is used throughout life to resolve short-term spatial conflicts between these modalities. However, accurate, and even supra-normal, auditory localization abilities can be achieved in the absence of vision, and the capacity of the mature brain to relearn to localize sound in the presence of substantially altered auditory spatial cues does not require visuomotor feedback. Thus, while vision is normally used to coordinate information across the senses, the neural circuits responsible for spatial hearing can be recalibrated in a vision-independent fashion. Nevertheless, early multisensory experience appears to be crucial for the emergence of an ability to match signals from different sensory modalities and therefore for the outcome of audiovisual-based rehabilitation of deaf patients in whom hearing has been restored by cochlear implantation.
doi:10.1098/rstb.2008.0230
PMCID: PMC2674475
PMID: 18986967
sound localization; spatial hearing; multisensory integration; auditory plasticity; behavioural training; vision
The role of attention in speech comprehension is not well understood. We used fMRI to study the neural correlates of auditory word, pseudoword, and nonspeech (spectrally-rotated speech) perception during a bimodal (auditory, visual) selective attention task. In three conditions, Attend Auditory (ignore visual), Ignore Auditory (attend visual), and Visual (no auditory stimulation), 28 subjects performed a one-back matching task in the assigned attended modality. The visual task, attending to rapidly presented Japanese characters, was designed to be highly demanding in order to prevent attention to the simultaneously presented auditory stimuli. Regardless of stimulus type, attention to the auditory channel enhanced activation by the auditory stimuli (Attend Auditory > Ignore Auditory) in bilateral posterior superior temporal regions and left inferior frontal cortex. Across attentional conditions, there were main effects of speech processing (word + pseudoword > rotated speech) in left orbitofrontal cortex and several posterior right hemisphere regions, though these areas also showed strong interactions with attention (larger speech effects in the Attend Auditory than in the Ignore Auditory condition) and no significant speech effects in the Ignore Auditory condition. Several other regions, including the postcentral gyri, left supramarginal gyrus, and temporal lobes bilaterally, showed similar interactions due to the presence of speech effects only in the Attend Auditory condition. Main effects of lexicality (word > pseudoword) were isolated to a small region of the left lateral prefrontal cortex. Examination of this region showed significant word > pseudoword activation only in the Attend Auditory condition. Several other brain regions, including left ventromedial frontal lobe, left dorsal prefrontal cortex, and left middle temporal gyrus, showed attention × lexicality interactions due to the presence of lexical activation only in the Attend Auditory condition. 
These results support a model in which neutral speech presented in an unattended sensory channel undergoes relatively little processing beyond the early perceptual level. Specifically, processing of phonetic and lexical-semantic information appears to be very limited in such circumstances, consistent with prior behavioral studies.
doi:10.1016/j.neuroimage.2007.09.052
PMCID: PMC2268216
PMID: 17996463
This study compares homonym learning to novel word learning by three- to four-year-old children to determine whether homonyms are learned more rapidly or more slowly than novel words. In addition, the role of form characteristics in homonym learning is examined by manipulating phonotactic probability and word frequency. Thirty-two children were exposed to homonyms and novel words in a story with visual support, and learning was measured in two tasks: referent identification and picture naming. Results showed that responses to homonyms were as accurate as responses to novel words in the referent identification task. In contrast, responses to homonyms were more accurate than responses to novel words in the picture-naming task. Furthermore, homonyms composed of common sound sequences were named more accurately than those composed of rare sound sequences. The influence of word frequency was less straightforward. These results may be inconsistent with a one-to-one form-referent bias in word learning.
PMCID: PMC1389618
PMID: 16429713
Background
It is well known that human beings are able to associate stimuli (novel or not) perceived in their environment. For example, children use this ability in reading acquisition, when arbitrary associations between visual and auditory stimuli must be learned. Studies tend to treat this ability as an “implicit” process triggered by the learning of letter/sound correspondences. The study described in this paper examined whether adding visuo-haptic exploration would help adults learn arbitrary associations between novel visual and auditory stimuli more effectively.
Methodology/Principal Findings
Adults were asked to learn 15 new arbitrary associations between visual stimuli and their corresponding sounds using two learning methods, which differed in the perceptual modalities involved in exploring the visual stimuli. Adults used their visual modality in the “classic” learning method and both their visual and haptic modalities in the “multisensory” learning method. After both learning methods, participants showed a similar above-chance ability to recognize the visual and auditory stimuli and the audio-visual associations. However, the ability to recognize the visual-auditory associations was better after the multisensory method than after the classic one.
Conclusion/Significance
This study revealed that adults learned the arbitrary association between novel visual and auditory stimuli more efficiently when the visual stimuli were explored with both vision and touch. The results are discussed in terms of the functional differences of the manual haptic modality and the hypothesis of a “haptic bond” between visual and auditory stimuli.
doi:10.1371/journal.pone.0004844
PMCID: PMC2653648
PMID: 19287486
Summary
Different pictures of Marilyn Monroe can evoke the same percept, even if greatly modified as in Andy Warhol’s famous portraits. But how does the brain recognize highly variable pictures as the same percept? Various studies have provided insights into how visual information is processed along the “ventral pathway,” via both single-cell recordings in monkeys [1, 2] and functional imaging in humans [3, 4]. Interestingly, in humans, the same “concept” of Marilyn Monroe can be evoked with other stimulus modalities, for instance by hearing or reading her name. Brain imaging studies have identified cortical areas selective to voices [5, 6] and visual word forms [7, 8]. However, how visual, text, and sound information can elicit a unique percept is still largely unknown. By using presentations of pictures and of spoken and written names, we show that (1) single neurons in the human medial temporal lobe (MTL) respond selectively to representations of the same individual across different sensory modalities; (2) the degree of multimodal invariance increases along the hierarchical structure within the MTL; and (3) such neuronal representations can be generated within less than a day or two. These results demonstrate that single neurons can encode percepts in an explicit, selective, and invariant manner, even if evoked by different sensory modalities.
doi:10.1016/j.cub.2009.06.060
PMCID: PMC3032396
PMID: 19631538
The purpose of this study was to investigate the influence of conceptual and perceptual properties of words on the speed and accuracy of lexical retrieval of children who do (CWS) and do not stutter (CWNS) during a picture-naming task. Participants consisted of 13 CWS aged 3 to 5 years and the same number of CWNS. All participants had speech, language, and hearing development within normal limits, with the exception of stuttering for CWS. Both talker groups participated in a picture-naming task where they named, one at a time, computer-presented, black-on-white drawings of common age-appropriate objects. These pictures were named during four auditory priming conditions: (a) a neutral prime consisting of a tone, (b) a word prime physically related to the target word, (c) a word prime functionally related to the target word, and (d) a word prime categorically related to the target word. Speech reaction time (SRT) was measured from the offset of presentation of the picture target to the onset of the participant’s verbal speech response. Results indicated that CWS were slower than CWNS across priming conditions (i.e., neutral, physical, functional, categorical) and that the speed of lexical retrieval of CWS was more influenced by functional than perceptual aspects of target pictures named. Findings were taken to suggest that CWS tend to organize lexical information functionally more so than physically and that this tendency may relate to difficulties establishing normally fluent speech and language.
doi:10.1016/j.jfludis.2006.08.002
PMCID: PMC1831874
PMID: 17010422
STUTTERING; SEMANTIC PROCESSING; LEXICAL RETRIEVAL; SEMANTIC PRIMING; LEXICAL PRIMING; CHILDREN; SPEECH REACTION TIME