In a natural environment, objects that we look for often make characteristic sounds. A hiding cat may meow, or the keys in the cluttered drawer may jingle when moved. Using a visual search paradigm, we demonstrated that characteristic sounds facilitated visual localization of objects, even when the sounds carried no location information. For example, finding a cat was faster when participants heard a meow sound. In contrast, sounds had no effect when participants searched for names rather than pictures of objects. For example, hearing “meow” did not facilitate localization of the word cat. These results suggest that characteristic sounds cross-modally enhance visual (rather than conceptual) processing of the corresponding objects. Our behavioral demonstration of object-based cross-modal enhancement complements the extensive literature on space-based cross-modal interactions. When looking for your keys next time, you might want to play jingling sounds.
When you are looking for an object, does hearing its characteristic sound make you find it more quickly? Our recent results supported this possibility by demonstrating that when a cat target, for example, was presented among other objects, a simultaneously presented “meow” sound (containing no spatial information) reduced the manual response time for visual localization of the target. To extend these results, we determined how rapidly an object-specific auditory signal can facilitate target detection in visual search. On each trial, participants fixated a specified target object as quickly as possible. The target’s characteristic sound speeded the saccadic search time within 215–220 ms and also guided the initial saccade toward the target, compared to presentation of a distractor’s sound or to no sound. These results suggest that object-based auditory-visual interactions rapidly increase the target object’s salience in visual search.
The processing characteristics of neurons in the central auditory system are directly shaped by and reflect the statistics of natural acoustic environments, but the principles that govern the relationship between natural sound ensembles and observed responses in neurophysiological studies remain unclear. In particular, accumulating evidence suggests the presence of a code based on sustained neural firing rates, where central auditory neurons exhibit strong, persistent responses to their preferred stimuli. Such a strategy can indicate the presence of ongoing sounds, is involved in parsing complex auditory scenes, and may play a role in matching neural dynamics to varying time scales in acoustic signals. In this paper, we describe a computational framework for exploring the influence of a code based on sustained firing rates on the shape of the spectro-temporal receptive field (STRF), a linear kernel that maps a spectro-temporal acoustic stimulus to the instantaneous firing rate of a central auditory neuron. We demonstrate the emergence of richly structured STRFs that capture the structure of natural sounds over a wide range of timescales, and show how the emergent ensembles resemble those commonly reported in physiological studies. Furthermore, we compare ensembles that optimize a sustained firing code with one that optimizes a sparse code, another widely considered coding strategy, and suggest how the resulting population responses are not mutually exclusive. Finally, we demonstrate how the emergent ensembles contour the high-energy spectro-temporal modulations of natural sounds, forming a discriminative representation that captures the full range of modulation statistics that characterize natural sound ensembles. These findings have direct implications for our understanding of how sensory systems encode the informative components of natural stimuli and potentially facilitate multi-sensory integration.
We explore a fundamental question with regard to the representation of sound in the auditory system, namely: what are the coding strategies that underlie observed neurophysiological responses in central auditory areas? There has been debate in recent years as to whether neural ensembles explicitly minimize their propensity to fire (the so-called sparse coding hypothesis) or whether neurons exhibit strong, sustained firing rates when processing their preferred stimuli. Using computational modeling, we directly confront issues raised in this debate, and our results suggest that not only does a sustained firing strategy yield a sparse representation of sound, but the principle yields emergent neural ensembles that capture the rich structural variations present in natural stimuli. In particular, spectro-temporal receptive fields (STRFs) have been widely used to characterize the processing mechanisms of central auditory neurons and have revealed much about the nature of sound processing in central auditory areas. In our paper, we demonstrate how neurons that maximize a sustained firing objective yield STRFs akin to those commonly measured in physiological studies, capturing a wide range of aspects of natural sounds over a variety of timescales, suggesting that such a coding strategy underlies observed neural responses.
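As a minimal illustration of what it means for an STRF to act as a linear kernel, the sketch below maps a spectro-temporal stimulus to a predicted rate by summing kernel weights over frequency channels and time lags. The function name `strf_response`, the list-of-lists layout, and the toy numbers are assumptions made for illustration only; this is not the optimization framework described above, and it omits any output nonlinearity.

```python
def strf_response(strf, spectrogram):
    """Predict a rate r[t] = sum_f sum_k strf[f][k] * spectrogram[f][t - k].

    strf:        list of rows, one per frequency channel, each a list of lag weights
    spectrogram: list of rows, one per frequency channel, each a list of time samples
    Samples before the start of the stimulus are treated as zero.
    """
    n_freq = len(strf)
    n_lags = len(strf[0])
    n_time = len(spectrogram[0])
    rate = []
    for t in range(n_time):
        total = 0.0
        for f in range(n_freq):
            for k in range(n_lags):
                if t - k >= 0:  # skip lags that reach before stimulus onset
                    total += strf[f][k] * spectrogram[f][t - k]
        rate.append(total)
    return rate


# Toy example: two frequency channels, two lags, three time samples.
toy_strf = [[1.0, 0.5], [0.0, -1.0]]
toy_stimulus = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
toy_rate = strf_response(toy_strf, toy_stimulus)
```

A purely linear prediction can be negative, so in practice a rectifying stage is often applied before interpreting the output as a firing rate; that stage is left out here to keep the kernel operation visible.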
Seeing the image of a newscaster on a television set causes us to think that the sound coming from the loudspeaker is actually coming from the screen. How images capture sounds is mysterious because the brain uses different methods for determining the locations of visual vs. auditory stimuli. The retina senses the locations of visual objects with respect to the eyes, whereas differences in sound characteristics across the ears indicate the locations of sound sources referenced to the head. Here, we tested which reference frame (RF) is used when vision recalibrates perceived sound locations.
Visually guided biases in sound localization were induced in seven humans and two monkeys who made eye movements to auditory or audio-visual stimuli. On audio-visual (training) trials, the visual component of the targets was displaced laterally by ~5°. Interleaved auditory-only (probe) trials served to evaluate the effect of experience with mismatched visual stimuli on auditory localization. We found that the displaced visual stimuli induced ventriloquism aftereffect in both humans (~50% of the displacement size) and monkeys (~25%), but only for locations around the trained spatial region, showing that audio-visual recalibration can be spatially specific.
We tested the reference frame in which the recalibration occurs. On probe trials, we varied eye position relative to the head to dissociate head- from eye-centered RFs. Results indicate that both humans and monkeys use a mixture of the two RFs, suggesting that the neural mechanisms involved in ventriloquism occur in brain region(s) employing a hybrid RF for encoding spatial information.
visual calibration of auditory space; humans; monkeys; reference frame of auditory space representation; ventriloquism; cross-modal adaptation
In this study we investigate previous claims that a region in the left posterior superior temporal sulcus (pSTS) is more activated by audiovisual than unimodal processing. First, we compare audiovisual to visual–visual and auditory–auditory conceptual matching using auditory or visual object names that are paired with pictures of objects or their environmental sounds. Second, we compare congruent and incongruent audiovisual trials when presentation is simultaneous or sequential. Third, we compare audiovisual stimuli that are either verbal (auditory and visual words) or nonverbal (pictures of objects and their associated sounds). The results demonstrate that, when task, attention, and stimuli are controlled, pSTS activation for audiovisual conceptual matching is 1) identical to that observed for intramodal conceptual matching, 2) greater for incongruent than congruent trials when auditory and visual stimuli are simultaneously presented, and 3) identical for verbal and nonverbal stimuli. These results are not consistent with previous claims that pSTS activation reflects the active formation of an integrated audiovisual representation. After a discussion of the stimulus and task factors that modulate activation, we conclude that, when stimulus input, task, and attention are controlled, pSTS is part of a distributed set of regions involved in conceptual matching, irrespective of whether the stimuli are audiovisual, auditory–auditory or visual–visual.
amodal; audiovisual binding; conceptual integration; congruency; crossmodal
The visual and auditory systems frequently work together to facilitate the identification and localization of objects and events in the external world. Experience plays a critical role in establishing and maintaining congruent visual–auditory associations, so that the different sensory cues associated with targets that can be both seen and heard are synthesized appropriately. For stimulus location, visual information is normally more accurate and reliable and provides a reference for calibrating the perception of auditory space. During development, vision plays a key role in aligning neural representations of space in the brain, as revealed by the dramatic changes produced in auditory responses when visual inputs are altered, and is used throughout life to resolve short-term spatial conflicts between these modalities. However, accurate, and even supra-normal, auditory localization abilities can be achieved in the absence of vision, and the capacity of the mature brain to relearn to localize sound in the presence of substantially altered auditory spatial cues does not require visuomotor feedback. Thus, while vision is normally used to coordinate information across the senses, the neural circuits responsible for spatial hearing can be recalibrated in a vision-independent fashion. Nevertheless, early multisensory experience appears to be crucial for the emergence of an ability to match signals from different sensory modalities and therefore for the outcome of audiovisual-based rehabilitation of deaf patients in whom hearing has been restored by cochlear implantation.
sound localization; spatial hearing; multisensory integration; auditory plasticity; behavioural training; vision
While perceiving speech, people see mouth shapes that are systematically associated with sounds. In particular, a vertically stretched mouth produces a /woo/ sound, whereas a horizontally stretched mouth produces a /wee/ sound. We demonstrate that hearing these speech sounds alters how we see aspect ratio, a basic visual feature that contributes to perception of 3D space, objects and faces. Hearing a /woo/ sound increases the apparent vertical elongation of a shape, whereas hearing a /wee/ sound increases the apparent horizontal elongation. We further demonstrate that these sounds influence aspect ratio coding. Viewing and adapting to a tall (or flat) shape makes a subsequently presented symmetric shape appear flat (or tall). These aspect ratio aftereffects are enhanced when associated speech sounds are presented during the adaptation period, suggesting that the sounds influence visual population coding of aspect ratio. Taken together, these results extend previous demonstrations that visual information constrains auditory perception by showing the converse – speech sounds influence visual perception of a basic geometric feature.
Auditory–visual; Aspect ratio; Crossmodal; Shape perception; Speech perception
The introduction of anthropogenic sounds into the marine environment can impact some marine mammals. Impacts can be greatly reduced if appropriate mitigation measures and monitoring are implemented. This paper concerns such measures undertaken by Exxon Neftegas Limited, as operator of the Sakhalin-1 Consortium, during the Odoptu 3-D seismic survey conducted during 17 August’ September 2001. The key environmental issue was protection of the critically endangered western gray whale (Eschrichtius robustus), which feeds in summer and fall primarily in the Piltun feeding area off northeast Sakhalin Island. Existing mitigation and monitoring practices for seismic surveys in other jurisdictions were evaluated to identify best practices for reducing impacts on feeding activity by western gray whales. Two buffer zones were established to protect whales from physical injury or undue disturbance during feeding. A 1 km buffer protected all whales from exposure to levels of sound energy potentially capable of producing physical injury. A 4’ km buffer was established to avoid displacing western gray whales from feeding areas. Trained Marine Mammal Observers (MMOs) on the seismic ship Nordic Explorer had the authority to shut down the air guns if whales were sighted within these buffers.
Additional mitigation measures were also incorporated. Temporal mitigation was provided by rescheduling the program from June–August to August–September to avoid interference with spring arrival of migrating gray whales. The survey area was reduced by 19% to avoid certain waters <20 m deep where feeding whales concentrated and where seismic acquisition was a lower priority. The number of air guns and total volume of the air guns were reduced by about half (from 28 to 14 air guns and from 3,390 in³ to 1,640 in³) relative to initial plans. ‘Ramp-up’ (= ‘soft-start’) procedures were implemented.
Monitoring activities were conducted as needed to implement some mitigation measures, and to assess residual impacts. Aerial and vessel-based surveys determined the distribution of whales before, during and after the seismic survey. Daily aerial reconnaissance helped verify whale-free areas and select the sequence of seismic lines to be surveyed. A scout vessel with MMOs aboard was positioned 4 km shoreward of the active seismic vessel to provide better visual coverage of the 4’ km buffer and to help define the inshore edge of the 4’ km buffer. A second scout vessel remained near the seismic vessel. Shore-based observers determined whale numbers, distribution, and behavior during and after the seismic survey. Acoustic monitoring documented received sound levels near and in the main whale feeding area.
Statistical analyses of aerial survey data indicated that about 5’0 gray whales moved away from waters near (inshore of) the seismic survey during seismic operations. They shifted into the core gray whale feeding area farther south, and the proportion of gray whales observed feeding did not change over the study period.
Five shutdowns of the air guns were invoked for gray whales seen within or near the buffer. A previously unknown gray whale feeding area (the Offshore feeding area) was discovered south and offshore from the nearshore Piltun feeding area. The Offshore area has subsequently been shown to be used by feeding gray whales during several years when no anthropogenic activity occurred near the Piltun feeding area.
Shore-based counts indicated that whales continued to feed inshore of the Odoptu block throughout the seismic survey, with no significant correlation between gray whale abundance and seismic activity. Average values of most behavioral parameters were similar to those without seismic surveys. Univariate analysis showed no correlation between seismic sound levels and any behavioral parameter. Multiple regression analyses indicated that, after allowance for environmental covariates, 5 of 11 behavioral parameters were statistically correlated with estimated seismic survey-related variables; 6 of 11 behavioral parameters were not statistically correlated with seismic survey-related variables. Behavioral parameters that were correlated with seismic variables were transient and within the range of variation attributable to environmental effects.
Acoustic monitoring determined that the 4’ km buffer zone, in conjunction with reduction of the air gun array to 14 guns and 1,640 in³, was effective in limiting sound exposure. Within the Piltun feeding area, these mitigation measures were designed to ensure that western gray whales were not exposed to received levels exceeding the 163 dB re 1 μPa (rms) threshold.
This was among the most complex and intensive mitigation programs ever conducted for any marine mammal. It provided valuable new information about underwater sounds and gray whale responses during a nearshore seismic program that will be useful in planning future work. Overall, the efforts in 2001 were successful in reducing impacts to levels tolerable by western gray whales. Research in 2002–2005 suggested no biologically significant or population-level impacts of the 2001 seismic survey.
Seismic survey; Mitigation; Monitoring; Western gray whale; Eschrichtius robustus; Sakhalin Island; Okhotsk Sea; Russia
Language and communicative impairments are among the primary characteristics of autism spectrum disorders (ASD). Previous studies have examined auditory language processing in ASD. However, during face-to-face conversation, auditory and visual speech inputs provide complementary information, and little is known about audiovisual (AV) speech processing in ASD. It is possible to elucidate the neural correlates of AV integration by examining the effects of seeing the lip movements accompanying the speech (visual speech) on electrophysiological event-related potentials (ERP) to spoken words. Moreover, electrophysiological techniques have a high temporal resolution and thus enable us to track the time-course of spoken word processing in ASD and typical development (TD). The present study examined the ERP correlates of AV effects in three time windows that are indicative of hierarchical stages of word processing. We studied a group of TD adolescent boys (n=14) and a group of high-functioning boys with ASD (n=14). Significant group differences were found in AV integration of spoken words in the 200–300ms time window when spoken words start to be processed for meaning. These results suggest that the neural facilitation by visual speech of spoken word processing is reduced in individuals with ASD.
In typically developing (TD) individuals, behavioural and event-related potential (ERP) studies suggest that audiovisual (AV) integration enables faster and more efficient processing of speech. However, little is known about AV speech processing in individuals with autism spectrum disorder (ASD). The present study examined ERP responses to spoken words to elucidate the effects of visual speech (the lip movements accompanying a spoken word) on the range of auditory speech processing stages from sound onset detection to semantic integration. The study also included an AV condition which paired spoken words with a dynamic scrambled face in order to highlight AV effects specific to visual speech. Fourteen adolescent boys with ASD (15–17 years old) and 14 age- and verbal IQ-matched TD boys participated. The ERP of the TD group showed a pattern and topography of AV interaction effects consistent with activity within the superior temporal plane, with two dissociable effects over fronto-central and centro-parietal regions. The posterior effect (200–300ms interval) was specifically sensitive to lip movements in TD boys, and no AV modulation was observed in this region for the ASD group. Moreover, the magnitude of the posterior AV effect to visual speech correlated inversely with ASD symptomatology. In addition, the ASD boys showed an unexpected effect (P2 time window) over the frontal-central region (pooled electrodes F3, Fz, F4, FC1, FC2, FC3, FC4) which was sensitive to scrambled face stimuli. These results suggest that the neural networks facilitating processing of spoken words by visual speech are altered in individuals with ASD.
Auditory; ASD; ERP; Language; Multisensory; Visual
We adapted a behavioral procedure that has been used extensively with normal-hearing (NH) infants, the visual habituation (VH) procedure, to assess deaf infants’ discrimination and attention to speech.
Twenty-four NH 6-month-olds, 24 NH 9-month-olds, and 16 deaf infants at various ages before and following cochlear implantation (CI) were tested in a sound booth on their caregiver’s lap in front of a TV monitor. During the habituation phase, each infant was presented with a repeating speech sound (e.g. ‘hop hop hop’) paired with a visual display of a checkerboard pattern on half of the trials (‘sound trials’) and only the visual display on the other half (‘silent trials’). When the infant’s looking time decreased and reached a habituation criterion, a test phase began. This consisted of two trials: an ‘old trial’ that was identical to the ‘sound trials’ and a ‘novel trial’ that consisted of a different repeating speech sound (e.g. ‘ahhh’) paired with the same checkerboard pattern.
During the habituation phase, NH infants looked significantly longer during the sound trials than during the silent trials. However, deaf infants who had received cochlear implants (CIs) displayed a much weaker preference for the sound trials. On the other hand, both NH infants and deaf infants with CIs attended significantly longer to the visual display during the novel trial than during the old trial, suggesting that they were able to discriminate the speech patterns. Before receiving CIs, deaf infants did not show any preferences.
Taken together, the findings suggest that deaf infants who receive CIs are able to detect and discriminate some speech patterns. However, their overall attention to speech sounds may be less than NH infants’. Attention to speech may impact other aspects of speech perception and spoken language development, such as segmenting words from fluent speech and learning novel words. Implications of the effects of early auditory deprivation and age at CI on speech perception and language development are discussed.
Speech perception; Deaf infants; Cochlear implantation
Connections unifying hemispheric sensory representations of vision and touch occur in cortex, but for hearing, commissural connections earlier in the pathway may be important. The brainstem auditory pathways course bilaterally to the inferior colliculi (ICs). Each IC represents one side of auditory space but they are interconnected by a commissure. By deactivating one IC in guinea pig with cooling or microdialysis of procaine, and recording neural activity to sound in the other, we found that commissural input influences fundamental aspects of auditory processing. The areas of non-V-shaped frequency response areas (FRAs) were modulated, but the areas of almost all V-shaped FRAs were not. The supra-threshold sensitivity of rate level functions decreased during deactivation and the ability to signal changes in sound level was reduced. This commissural enhancement suggests the ICs should be viewed as a single entity in which the representation of sound in each is governed by the other.
The bilateral arrangement of our eyes and ears enables us to receive information from both sides of our body. This information is conveyed via various sensory pathways that take different routes through the brain to culminate in the cerebral hemispheres. The information is then processed in the brain's outer layer, which is called the cortex.
In the visual system, information from both eyes is kept separate until it reaches the cortex. A similar arrangement exists for touch. However, hearing is unusual among our senses in that sounds undergo much more processing in the brainstem, which is located at the base of the brain, than other types of stimuli. Orton and Rees now show that, in contrast to vision and touch, information about sounds occurring to our left or right is refined by interactions between the two sides of the midbrain.
To test for sideward interactions between the two limbs of the auditory pathway, electrodes were lowered into the brains of anesthetized guinea pigs so that neuronal responses to tones could be recorded. The electrodes were placed in the region of the midbrain that contains two structures called the inferior colliculi (meaning ‘lower hills’ in Latin). Each inferior colliculus predominantly receives inputs from the opposite ear. However, recordings made in one colliculus when the other was deactivated revealed that one colliculus normally alters the response of the other. This shows that there is an important sideward interaction between the two halves of the auditory pathway in the midbrain that refines how fundamental aspects of sound, such as its frequency and intensity, are processed.
This represents a marked departure from our previous understanding of auditory processing in the mammalian brain, and opens up new lines of investigation into the functioning of the auditory system in health and disease.
guinea pig; auditory; deactivation; receptive field; commissure; sensory; other
Watching a speaker's facial movements can dramatically enhance our ability to comprehend words, especially in noisy environments. From a general doctrine of combining information from different sensory modalities (the principle of inverse effectiveness), one would expect that the visual signals would be most effective at the highest levels of auditory noise. In contrast, we find, in accord with a recent paper, that visual information improves performance more at intermediate levels of auditory noise than at the highest levels, and we show that a novel visual stimulus containing only temporal information does the same. We present a Bayesian model of optimal cue integration that can explain these conflicts. In this model, words are regarded as points in a multidimensional space and word recognition is a probabilistic inference process. When the dimensionality of the feature space is low, the Bayesian model predicts inverse effectiveness; when the dimensionality is high, the enhancement is maximal at intermediate auditory noise levels. When the auditory and visual stimuli differ slightly in high noise, the model makes a counterintuitive prediction: as sound quality increases, the proportion of reported words corresponding to the visual stimulus should first increase and then decrease. We confirm this prediction in a behavioral experiment. We conclude that auditory-visual speech perception obeys the same notion of optimality previously observed only for simple multisensory stimuli.
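The idea of optimal cue integration can be illustrated with the textbook rule for combining two independent Gaussian cues, in which each cue is weighted by its reliability (inverse variance). This is only a one-dimensional sketch under that assumption; it is not the multidimensional word-recognition model described above, and the function name `combine_gaussian_cues` and its parameters are hypothetical.

```python
def combine_gaussian_cues(mu_a, var_a, mu_v, var_v):
    """Reliability-weighted fusion of an auditory and a visual estimate.

    Each cue is assumed to be an independent Gaussian estimate of the same
    quantity. Returns the mean and variance of the combined estimate.
    """
    w_a = 1.0 / var_a  # reliability of the auditory cue
    w_v = 1.0 / var_v  # reliability of the visual cue
    combined_var = 1.0 / (w_a + w_v)
    combined_mu = combined_var * (w_a * mu_a + w_v * mu_v)
    return combined_mu, combined_var


# As auditory noise grows, the combined estimate leans on the visual cue.
low_noise = combine_gaussian_cues(mu_a=0.0, var_a=1.0, mu_v=2.0, var_v=1.0)
high_noise = combine_gaussian_cues(mu_a=0.0, var_a=4.0, mu_v=1.0, var_v=1.0)
```

Under this rule the combined variance is never larger than that of the more reliable cue, which is the sense in which the fused estimate is more reliable than either input alone.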
The purpose of this study was to examine working memory for sequences of auditory and visual stimuli in prelingually deafened pediatric cochlear implant users with at least 4 yr of device experience.
Two groups of 8- and 9-yr-old children, 45 normal-hearing and 45 hearing-impaired users of cochlear implants, completed a novel working memory task requiring memory for sequences of either visual-spatial cues or visual-spatial cues paired with auditory signals. In each sequence, colored response buttons were illuminated either with or without simultaneous auditory presentation of verbal labels (color-names or digit-names). The child was required to reproduce each sequence by pressing the appropriate buttons on the response box. Sequence length was varied and a measure of memory span corresponding to the longest list length correctly reproduced under each set of presentation conditions was recorded. Additional children completed a modified task that eliminated the visual-spatial light cues but that still required reproduction of auditory color-name sequences using the same response box. Data from 37 pediatric cochlear implant users were collected using this modified task.
The cochlear implant group obtained shorter span scores on average than the normal-hearing group, regardless of presentation format. The normal-hearing children also demonstrated a larger “redundancy gain” than children in the cochlear implant group—that is, the normal-hearing group displayed better memory for auditory-plus-lights sequences than for the lights-only sequences. Although the children with cochlear implants did not use the auditory signals as effectively as normal-hearing children when visual-spatial cues were also available, their performance on the modified memory task using only auditory cues showed that some of the children were capable of encoding auditory-only sequences at a level comparable with normal-hearing children.
The finding of smaller redundancy gains from the addition of auditory cues to visual-spatial sequences in the cochlear implant group as compared with the normal-hearing group demonstrates differences in encoding or rehearsal strategies between these two groups of children. Differences in memory span between the two groups even on a visual-spatial memory task suggest that atypical working memory development irrespective of input modality may be present in this clinical population.
Spatial attention to a visual stimulus that occurs synchronously with a task-irrelevant sound from a different location can lead to increased activity not only in visual cortex, but also auditory cortex, apparently reflecting the object-related spreading of attention across both space and modality (Busse et al., 2005). The processing of stimulus conflict, including multisensory stimulus conflict, is known to activate the anterior cingulate cortex (ACC), but the interactive influence on the sensory cortices remains relatively unexamined. Here we used fMRI to examine whether the multisensory spread of visual attention across the sensory cortices previously observed will be modulated by whether there is conceptual or object-related conflict between the relevant visual and irrelevant auditory inputs. Subjects visually attended to one of two lateralized visual letter streams while synchronously occurring, task-irrelevant, letter sounds were presented centrally, which could be either congruent or incongruent with the visual letters. We observed significant enhancements for incongruent versus congruent letter-sound combinations in the ACC and in the contralateral visual cortex when the visual component was attended, presumably reflecting the conflict detection and the need for boosted attention to the visual stimulus during incongruent trials. In the auditory cortices, activity increased bilaterally if the spatially discordant auditory stimulation was incongruent, but only in the left, language-dominant side when congruent. We conclude that a conflicting incongruent sound, even when task-irrelevant, distracts more strongly than a congruent one, leading to greater capture of attention. This greater capture of attention in turn results in increased activity in the auditory cortex.
When different perceptual signals arising from the same physical entity are integrated, they form a more reliable sensory estimate. When such repetitive sensory signals are pitted against other competing stimuli, such as in a Stroop Task, this redundancy may lead to stronger processing that biases behavior toward reporting the redundant stimuli. This bias would therefore be expected to evoke greater incongruency effects than if these stimuli did not contain redundant sensory features. In the present paper we report that this is not the case for a set of three crossmodal, auditory-visual Stroop tasks. In these tasks participants attended to, and reported, either the visual or the auditory stimulus (in separate blocks) while ignoring the other, unattended modality. The visual component of these stimuli could be purely semantic (words), purely perceptual (colors), or the combination of both. Based on previous work showing enhanced crossmodal integration and visual search gains for redundantly coded stimuli, we had expected that, relative to the single features, redundant visual features would induce greater visual distracter incongruency effects for attended auditory targets and be less influenced by auditory distracters for attended visual targets. Overall, reaction times were faster for visual targets and were dominated by behavioral facilitation for the cross-modal interactions (relative to interference), but showed surprisingly little influence of visual feature redundancy. Post-hoc analyses revealed modest and trending evidence for possible increases in behavioral interference for redundant visual distracters on auditory targets; however, these effects were substantially smaller than anticipated and were not accompanied by a redundancy effect for behavioral facilitation or for attended visual targets.
multisensory conflict; stroop task; redundancy gains; stimulus onset asynchrony (SOA)
Can hearing a word change what one sees? Although visual sensitivity is known to be enhanced by attending to the location of the target, perceptual enhancements following cues to the identity of an object have been difficult to find. Here, we show that perceptual sensitivity is enhanced by verbal, but not visual, cues.
Participants completed an object detection task in which they made an object-presence or -absence decision to briefly-presented letters. Hearing the letter name prior to the detection task increased perceptual sensitivity (d′). A visual cue in the form of a preview of the to-be-detected letter did not. Follow-up experiments found that the auditory cuing effect was specific to validly cued stimuli. The magnitude of the cuing effect positively correlated with an individual measure of vividness of mental imagery; introducing uncertainty into the position of the stimulus did not reduce the magnitude of the cuing effect, but eliminated the correlation with mental imagery.
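The sensitivity measure d′ referred to above is conventionally computed from hit and false-alarm rates as d′ = z(H) − z(FA). The following is a minimal sketch of that calculation, not the authors' analysis code: the function name `d_prime`, the log-linear correction (adding 0.5 to each count), and the trial counts are all assumptions introduced for illustration.

```python
from statistics import NormalDist


def d_prime(hits, misses, false_alarms, correct_rejections):
    """Return the sensitivity index d' = z(hit rate) - z(false-alarm rate)."""
    # Assumed log-linear correction: add 0.5 to each count so that rates of
    # exactly 0 or 1 (where the inverse normal CDF is undefined) cannot occur.
    hit_rate = (hits + 0.5) / (hits + misses + 1)
    fa_rate = (false_alarms + 0.5) / (false_alarms + correct_rejections + 1)
    z = NormalDist().inv_cdf  # inverse of the standard normal CDF
    return z(hit_rate) - z(fa_rate)


# Hypothetical counts for one participant (50 target-present and
# 50 target-absent trials per condition); not data from the study.
uncued = d_prime(hits=30, misses=20, false_alarms=15, correct_rejections=35)
cued = d_prime(hits=38, misses=12, false_alarms=15, correct_rejections=35)
print(f"d' uncued = {uncued:.2f}, d' cued = {cued:.2f}")
```

Because d′ separates sensitivity from response bias, a cue that only made participants more willing to respond "present" would raise hits and false alarms together and leave d′ largely unchanged.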
Hearing a word made otherwise invisible objects visible. Interestingly, seeing a preview of the target stimulus did not similarly enhance detection of the target. These results are compatible with an account in which auditory verbal labels modulate lower-level visual processing. The findings show that a verbal cue in the form of hearing a word can influence even the most elementary visual processing and inform our understanding of how language affects perception.
A major part of learning a language is learning to map spoken words onto objects in the environment. An open question is what consequences this learning has for cognition and perception. Here, we present a series of experiments that examine effects of verbal labels on the activation of conceptual information as measured through picture verification tasks. We find that verbal cues, such as the word “cat,” lead to faster and more accurate verification of congruent objects and rejection of incongruent objects than do either nonverbal cues, such as the sound of a cat meowing, or words that do not directly refer to the object, such as the word “meowing.” This label advantage does not arise from verbal labels being more familiar or easier to process than other cues, and it extends to newly learned labels and sounds. Despite having equivalent facility in learning associations between novel objects and labels or sounds, conceptual information is activated more effectively through verbal means than through non-verbal means. Thus, rather than simply accessing nonverbal concepts, language activates aspects of a conceptual representation in a particularly effective way. We offer preliminary support that representations activated via verbal means are more categorical and show greater consistency between subjects. These results inform the understanding of how human cognition is shaped by language and hint at effects that different patterns of naming can have on conceptual structure.
concepts; labels; words; representations; language and thought
Objective: To develop and evaluate a pilot program to reduce unauthorized access to firearms by youth by distributing gun safes and trigger locks to households.
Design: Pilot intervention with pre/post-evaluation design.
Setting: Two Alaska Native villages in the Bristol Bay Health Corporation region of southwest Alaska.
Subjects: Forty randomly selected households with two or more guns in the home.
Intervention: Initially, a focus group of community members who owned guns was convened to receive input regarding the acceptability of the distribution procedure for the gun storage devices. One gun safe and one trigger lock were distributed to each of the selected households during December 2000. Village public safety officers assisted with the distribution of the safes and provided gun storage education to participants.
Main outcome measures: Baseline data were collected regarding household gun storage conditions at the time of device distribution. Three months after distribution, unannounced onsite home visits were conducted to identify if residents were using the gun safes and/or trigger locks.
Results: All selected households had at least two guns and 28 (70%) of the 40 households owned more than two guns. At baseline, 85% of homes were found to have unlocked guns, which were most often found in the breezeway, bedroom, storage room, or throughout the residence. During the follow up visits, 32 (86%) of the 37 gun safes were found locked with guns inside. In contrast, only 11 (30%) of the 37 trigger locks were found to be in use.
Conclusions: This community based program demonstrated that Alaska Native gun owners accepted and used gun safes when they were installed in their homes, leading to substantial improvements in gun storage practices. Trigger locks were much less likely to be used.
In nature, communication sounds among animal species including humans are typically complex sounds that occur in sequence and vary with time in several parameters including amplitude, frequency, duration as well as separation, and order of individual sounds. Among these multiple parameters, sound duration is a simple but important one that contributes to the distinct spectral and temporal attributes of individual biological sounds. Likewise, the separation of individual sounds is an important temporal attribute that determines an animal's ability to distinguish individual sounds. Whereas duration selectivity of auditory neurons underlies an animal's ability to recognize sound duration, the recovery cycle of auditory neurons determines a neuron's ability to respond to closely spaced sound pulses and therefore underlies the animal's ability to analyze the order of individual sounds. Since the multiple parameters of naturally occurring communication sounds vary with time, the analysis of a specific sound parameter by an animal would be inevitably affected by other co-varying sound parameters. This is particularly obvious in insectivorous bats, which rely on analysis of returning echoes for prey capture when they systematically vary the multiple pulse parameters throughout a target approach sequence. In this review article, we present our studies of dynamic variation of duration selectivity and recovery cycle of neurons in the central nucleus of the inferior colliculus of frequency-modulated bats to highlight the dynamic temporal signal processing of central auditory neurons. These studies use single pulses and three biologically relevant pulse-echo (P-E) pairs with varied duration, gap, and amplitude difference similar to those occurring during search, approach, and terminal phases of hunting by bats. These studies show that most collicular neurons respond maximally to a best tuned sound duration (BD).
The sound duration to which these neurons are tuned corresponds closely to the behaviorally relevant sounds occurring at different phases of hunting. The duration selectivity of these collicular neurons progressively increases with decrease in the duration of pulse and echo, P-E gap, and P-E amplitude difference. GABAergic inhibition plays an important role in shaping the duration selectivity of these collicular neurons. The duration selectivity of these neurons is systematically organized along the tonotopic axis of the inferior colliculus and is closely correlated with the graded spatial distribution of GABAA receptors. Duration-selective collicular neurons have a wide range of recovery cycles covering the P-E intervals occurring throughout the entire target approach sequence. Collicular neurons with low best frequency and short BD recover rapidly when stimulated with P-E pairs with short duration and small P-E amplitude difference, whereas neurons with high best frequency and long BD recover rapidly when stimulated with P-E pairs with long duration and large P-E amplitude difference. This dynamic variation of echo duration selectivity and recovery cycle of collicular neurons may serve as the neural basis underlying successful hunting by bats. Conceivably, high best frequency neurons with long BD would be suitable for echo recognition during search and approach phases of hunting when the returning echoes are high in frequency, large in P-E amplitude difference, long in duration but low in repetition rate. Conversely, low best frequency neurons with shorter BD and sharper duration selectivity would be suitable for echo recognition during the terminal phase of hunting when the highly repetitive echoes are low in frequency, small in P-E amplitude difference, and short in duration.
Furthermore, the tonotopically organized duration selectivity would make it possible to facilitate the recruitment of different groups of collicular neurons along the tonotopic axis for effective processing of the returning echoes throughout the entire course of hunting.
duration selectivity; echolocation; inferior colliculus; pulse-echo pairs; recovery cycle; temporal signal processing
To assess the effects of undercover police stings and lawsuits against gun dealers suspected of facilitating illegal gun sales in three US cities (Chicago, Detroit, Gary) on the flow of new firearms to criminals.
An interrupted time series design and negative binomial regression analyses were used to test for temporal change in the recovery of guns used in crimes within one year of retail sale in both intervention and comparison cities.
The stings were associated with an abrupt 46.4% reduction in the flow of new guns to criminals in Chicago (95% confidence interval, −58.6% to −30.5%), and with a gradual reduction in new crime guns recovered in Detroit. There was no significant change associated with the stings in Gary, and no change in comparison cities that was coincident with the stings in Chicago and Detroit.
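Percentage changes such as those reported above are typically derived from count-model coefficients. The following is a minimal sketch of that conversion, assuming (the abstract does not state this) that the negative binomial model used the usual log link, so that an intervention coefficient β corresponds to a percentage change of 100 × (exp(β) − 1). The function names are hypothetical, and the coefficient is back-calculated from the reported estimate rather than taken from the study.

```python
import math


def percent_change(beta):
    """Percentage change in the expected count implied by a log-link coefficient."""
    return 100.0 * (math.exp(beta) - 1.0)


def coefficient_for(percent):
    """Log-link coefficient that would imply the given percentage change."""
    return math.log(1.0 + percent / 100.0)


# Back-calculate the coefficient and interval bounds on the log scale from
# the reported Chicago estimate (illustrative only; assumes a log link).
beta = coefficient_for(-46.4)
lower, upper = coefficient_for(-58.6), coefficient_for(-30.5)
print(f"beta = {beta:.3f} (log-scale interval {lower:.3f} to {upper:.3f})")
print(f"implied change = {percent_change(beta):.1f}%")
```

Working on the log scale explains why the reported interval is not symmetric around the point estimate: a symmetric interval for β becomes asymmetric once exponentiated.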
The announcement of police stings and lawsuits against suspect gun dealers appeared to have reduced the supply of new guns to criminals in Chicago significantly, and may have contributed to beneficial effects in Detroit. Given the important role that gun stores play in supplying guns to criminals in the US, further efforts of this type are warranted and should be evaluated.
firearms; gun policy; firearms trafficking
Recognizing an object requires binding together several cues, which may be distributed across different sensory modalities, and ignoring competing information originating from other objects. In addition, knowledge of the semantic category of an object is fundamental to determine how we should react to it. Here we investigate the role of semantic categories in the processing of auditory-visual objects.
We used an auditory-visual object-recognition task (go/no-go paradigm). We compared recognition times for two categories: a biologically relevant one (animals) and a non-biologically relevant one (means of transport). Participants were asked to react as fast as possible to target objects, presented in the visual and/or the auditory modality, and to withhold their response for distractor objects. A first main finding was that, when participants were presented with unimodal or bimodal congruent stimuli (an image and a sound from the same object), similar reaction times were observed for all object categories. Thus, there was no advantage in the speed of recognition for biologically relevant compared to non-biologically relevant objects. A second finding was that, in the presence of a biologically relevant auditory distractor, the processing of a target object was slowed down, whether or not it was itself biologically relevant. It seems impossible to effectively ignore an animal sound, even when it is irrelevant to the task.
These results suggest a specific and mandatory processing of animal sounds, possibly due to phylogenetic memory and consistent with the idea that hearing is particularly efficient as an alerting sense. They also highlight the importance of taking into account the auditory modality when investigating the way object concepts of biologically relevant categories are stored and retrieved.
This research studied whether the mode of input (auditory vs audiovisual) influenced semantic access by speech in children with sensorineural hearing impairment (HI).
Participants, 31 children with HI and 62 children with normal hearing (NH), were tested with our new multi-modal picture word task. Children were instructed to name pictures displayed on a monitor and ignore auditory or audiovisual speech distractors. The semantic content of the distractors was varied to be related vs unrelated to the pictures (e.g., picture-distractor pairs of dog-bear vs dog-cheese, respectively). In children with NH, picture naming times were slower in the presence of semantically-related distractors. This slowing, called semantic interference, is attributed to the meaning-related picture-distractor entries competing for selection and control of the response [the lexical selection by competition (LSbyC) hypothesis]. Recently, a modification of the LSbyC hypothesis, called the competition threshold (CT) hypothesis, proposed that 1) the competition between the picture-distractor entries is determined by a threshold, and 2) distractors with experimentally reduced fidelity cannot reach the competition threshold. Thus, semantically-related distractors with reduced fidelity do not produce the normal interference effect, but instead no effect or semantic facilitation (faster picture naming times for semantically-related vs -unrelated distractors). Facilitation occurs because the activation level of the semantically-related distractor with reduced fidelity 1) is not sufficient to exceed the competition threshold and produce interference but 2) is sufficient to activate its concept which then strengthens the activation of the picture and facilitates naming. This research investigated whether the proposals of the CT hypothesis generalize to the auditory domain, to the natural degradation of speech due to HI, and to participants who are children.
Our multi-modal picture word task allowed us to 1) quantify picture naming results in the presence of auditory speech distractors and 2) probe whether the addition of visual speech enriched the fidelity of the auditory input sufficiently to influence results.
In the HI group, the auditory distractors produced no effect or a facilitative effect, in agreement with proposals of the CT hypothesis. In contrast, the audiovisual distractors produced the normal semantic interference effect. Results in the HI vs NH groups differed significantly for the auditory mode, but not for the audiovisual mode.
This research indicates that the lower fidelity auditory speech associated with HI affects the normalcy of semantic access by children. Further, adding visual speech enriches the lower fidelity auditory input sufficiently to produce the semantic interference effect typical of children with NH.
Dyslexic and control first-grade school children were compared in a Symbol-to-Sound matching test based on a non-linguistic audiovisual training which is known to have a remediating effect on dyslexia. Visual symbol patterns had to be matched with predicted sound patterns. Sounds incongruent with the corresponding visual symbol (thus not matching the prediction) elicited the N2b and P3a event-related potential (ERP) components relative to congruent sounds in control children. Their ERPs resembled the ERP effects previously reported for healthy adults with this paradigm. In dyslexic children, N2b onset latency was delayed and its amplitude significantly reduced over the left hemisphere, whereas P3a was absent. Moreover, N2b amplitudes significantly correlated with the reading skills. ERPs to sound changes in a control condition were unaffected. In addition, correctly predicted sounds, that is, sounds that are congruent with the visual symbol, elicited an early induced auditory gamma band response (GBR) reflecting synchronization of brain activity in normal-reading children as previously observed in healthy adults. However, dyslexic children showed no GBR. This indicates that visual symbolic and auditory sensory information are not integrated into a unitary audiovisual object representation in them. Finally, incongruent sounds were followed by a later desynchronization of brain activity in the gamma band in both groups. This desynchronization was significantly larger in dyslexic children. Although both groups accomplished the task successfully, remarkable group differences in brain responses suggest that normal-reading children and dyslexic children recruit (partly) different brain mechanisms when solving the task. We propose that abnormal ERPs and GBRs in dyslexic readers indicate a deficit resulting in a widespread impairment in processing and integrating auditory and visual information and contributing to the reading impairment in dyslexia.
dyslexia; audiovisual; integration; mismatch; reading; gamma band; oscillatory activity; ERPs
Speech reading enhances auditory perception in noise. One means by which this perceptual facilitation comes about is through information from visual networks reinforcing the encoding of the congruent speech signal by ignoring interfering acoustic signals. We tested this hypothesis neurophysiologically by acquiring EEG while individuals listened to words with a fixed portion of each word replaced by white noise. Congruent (meaningful) or incongruent (reversed frames) mouth movements accompanied the words. Individuals judged whether they heard the words as continuous (illusion) or interrupted (illusion failure) through the noise. We hypothesized that congruent, as opposed to incongruent, mouth movements should further enhance illusory perception by suppressing the auditory cortex's response to interruption onsets and offsets. Indeed, we found that the N1 auditory evoked potential (AEP) to noise onsets and offsets was reduced when individuals experienced the illusion during congruent, but not incongruent, audiovisual streams. This N1 inhibitory effect was most prominent at noise offsets, suggesting that visual influences on auditory perception are instigated to a greater extent during noisy periods. These findings suggest that visual context due to speech-reading disengages (inhibits) neural processes associated with interfering sounds (e.g., noisy interruptions) during speech perception.
audiovisual integration; auditory evoked potentials; degraded speech; illusory filling-in; phonemic restoration
In acquiring language, babies learn not only that people can communicate about objects and events, but also that they typically use a particular kind of act as the communicative signal. The current studies asked whether 1-year-olds’ learning of names during joint attention is guided by the expectation that names will be in the form of spoken words. In the first study, 13-month-olds were introduced to either a novel word or a novel sound-producing action (using a small noisemaker). Both the word and the sound were produced by a researcher as she showed the baby a new toy during a joint attention episode. The baby’s memory for the link between the word or sound and the object was tested in a multiple choice procedure. Thirteen-month-olds learned both the word–object and sound–object correspondences, as evidenced by their choosing the target reliably in response to hearing the word or sound on test trials, but not on control trials when no word or sound was present. In the second study, 13-month-olds, but not 20-month-olds, learned a new sound–object correspondence. These results indicate that infants initially accept a broad range of signals in communicative contexts and narrow the range with development.