Do characteristic sounds facilitate search even for rare targets? In this experiment, participants repeatedly looked for the same category of rare targets. For example, participants looked for a gun on every trial, with the gun presented on only 10% of the trials. If a gun-shot sound automatically increased the visual salience of a gun image, presenting a gun-shot sound should facilitate gun detection on the rare target-present trials even when the gun-shot sound was presented on every trial, including the target-absent trials, which constituted 90% of all trials.
To determine object-based auditory-visual effects on vigilance over and above the effects of the specific sounds per se (e.g., their arousing or calming qualities), we employed two types of sounds: gun-shot sounds and cat sounds. In the relevant-sound condition, participants searched for a cat while hearing a meow sound on every trial, or searched for a gun while hearing a gun-shot sound on every trial. In the irrelevant-sound condition, participants searched for a cat while hearing a gun-shot sound on every trial, or searched for a gun while hearing a meow sound on every trial. The overall effects of target type (cat or gun picture) and sound type (meow or gun-shot sound) would be attributable to differential visual salience of the cat and gun pictures and to differential arousal levels induced by the meow and gun-shot sounds. Importantly, any remaining effect of sound-picture compatibility (i.e., relevant-sound condition vs. irrelevant-sound condition) would demonstrate object-specific facilitative effects of characteristic sounds on visual search with rare targets.
Thirty-six undergraduate students at Northwestern University gave informed consent to participate for partial course credit. They all had normal or corrected-to-normal visual acuity, and were tested individually in a normally lit room.
The number of pictures per search display was increased to eight (compared to four in Experiments 1 and 2) to simulate a more realistic situation for a typical vigilance task (e.g., baggage screening). The centers of the eight pictures were evenly placed along an approximate iso-acuity ellipse (21° horizontal by 16° vertical, the aspect ratio based on Rovamo & Virsu, 1979); see the figure below for an example. The target (a gun or a cat) was presented on only 10% of the trials. The distractors were randomly selected from a set of 50 commonly encountered and portable objects: apples, a basketball, a belt, a white book, a brown book, a brush-scissors pair, a camera, a CD case, a portable MP3 player, a cell phone, a can of cleanser, a soda can, a dollar bill, a brown hat, a black hat, a pair of sunglasses, a pair of gloves, a beer bottle, a pack of chewing gum, a hair clipper, headphones, a lipstick, a long-sleeve shirt, markers, a mug, a necklace, a newspaper, a perfume bottle, a watch, a pen, a tennis racket, a sandal, a scarf, a shampoo bottle, running shoes, shorts, socks, a spray bottle, a brown sweater, a striped sweater, a wallet, tea bags, toilet paper, a toothbrush, a tube of toothpaste, towels, keys, a lighter, a stapler, and a bird. Each target was randomly selected from 15 examples of cats or 15 examples of guns. Each picture was confined within a 4.33° by 4.35° rectangular region.
An example of target-present trials for (A) the gun search task and (B) the cat search task (Experiment 3).
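To make the display geometry concrete, the short Python sketch below (our illustration, not the authors' experimental code) computes eight evenly spaced positions along the 21° × 16° iso-acuity ellipse; we assume "evenly placed" means equal steps of the parametric angle, which is the simplest reading of the text.

```python
import numpy as np

# Illustrative sketch (not the authors' code): centers of eight pictures
# on an iso-acuity ellipse, 21 deg horizontal x 16 deg vertical.
# Assumption: "evenly placed" = equal steps of the parametric angle.
N_ITEMS = 8
a, b = 21.0 / 2, 16.0 / 2                  # semi-axes in degrees of visual angle

theta = np.arange(N_ITEMS) * 2 * np.pi / N_ITEMS
x = a * np.cos(theta)                      # horizontal eccentricity (deg)
y = b * np.sin(theta)                      # vertical eccentricity (deg)

for i, (xi, yi) in enumerate(zip(x, y)):
    print(f"item {i}: ({xi:+6.2f}, {yi:+6.2f}) deg")
```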
In the meow-sound block of trials, each meow sound was randomly selected from 15 different meow sounds. Similarly, in the gun-shot-sound block of trials, each gun-shot sound was randomly selected from 15 different gun-shot sounds. All sounds were clearly audible (~70 dB SPL), were presented via loudspeakers placed on either side of the display monitor, and carried no target-location information (as in Experiments 1 and 2).
Half of the participants always searched for cats and the remaining participants always searched for guns. Each participant was tested in three blocks of 150 trials with 10 practice trials given at the beginning of each block. In the relevant-sound block, participants heard sounds characteristic of the target category on every trial. In the irrelevant-sound block, participants heard sounds characteristic of the other category on every trial. In the no-sound block, participants heard no sounds. The block order was counterbalanced across participants using all six permutations.
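As a sketch of this counterbalancing scheme (our illustration; names are hypothetical, not from the paper), the six possible orders of the three block types can be enumerated and assigned in rotation, so that with 36 participants each order is used by exactly six:

```python
from itertools import permutations

# All 3! = 6 orders of the three block types.
BLOCKS = ("relevant-sound", "irrelevant-sound", "no-sound")
ORDERS = list(permutations(BLOCKS))

def assigned_order(participant_index: int) -> tuple:
    """Rotate through the six orders as participants are enrolled."""
    return ORDERS[participant_index % len(ORDERS)]

for p in range(6):
    print(p, assigned_order(p))
```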
The stimuli were displayed on a color CRT monitor (1024 × 768 resolution) with a 75 Hz refresh rate, and the experiment was controlled by a Macintosh PowerPC 8600 running Vision Shell software (micro ML, Inc.). A chin rest was used to stabilize the viewing distance at 80 cm.
At the beginning of the experiment, participants were informed of the target category to search for (cat or gun). The experimenter pressed the space bar to start each block of trials. After 2008 ms, the search display appeared synchronously with a sound of the search category (in the relevant-sound block), a sound of the other category (in the irrelevant-sound block), or no sound (in the no-sound block). The display remained for 2008 ms or until participants responded. Participants indicated whether a target was present or absent by pressing a corresponding response button as quickly and accurately as possible. The next search display appeared 1500 ms following the response.
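The trial structure can be summarized in a small self-contained simulation (ours, not the authors' software; the constant and function names are illustrative):

```python
import random

# Toy simulation of one block's trial list, following the procedure above:
# 10% target-present trials, the block's sound (or none) on every trial,
# and the timings described in the text.
TARGET_RATE = 0.10
PRE_DISPLAY_MS = 2008   # blank interval before the search display
DISPLAY_MAX_MS = 2008   # display remains this long or until a response
ITI_MS = 1500           # next display appears this long after the response

def make_block(block_type: str, n_trials: int = 150, seed: int = 0):
    rng = random.Random(seed)
    sound = {"relevant": "target-category sound",
             "irrelevant": "other-category sound",
             "no-sound": None}[block_type]
    return [{"target_present": rng.random() < TARGET_RATE, "sound": sound}
            for _ in range(n_trials)]

block = make_block("relevant")
print(sum(t["target_present"] for t in block), "target-present trials of", len(block))
```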
We performed ANOVAs on the response-time data from the target-present and target-absent trials, with target type (cat or gun) and block sequence (the six counterbalancing sequences of the three blocks) as between-participant factors, and sound relevance (relevant sound, irrelevant sound, or no sound) and block order (1st, 2nd, or 3rd) as within-participant factors. For both target-present and target-absent trials, there was a significant order effect in that response times became faster in later blocks, probably reflecting practice effects (F[2, 48] = 29.017, p < 0.0001, for target-present trials, and F[2, 48] = 79.317, p < 0.0001, for target-absent trials). There was also a significant effect of target type in that participants responded to the cat targets faster than to the gun targets (F[1, 24] = 23.973, p < 0.0001, for target-present trials, and F[1, 24] = 15.812, p < 0.0006, for target-absent trials), suggesting that our cat pictures were generally more distinctive than our gun pictures.
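For concreteness, this design implies a long-format data table like the sketch below (our illustration; the column names and RT values are made up), from which the cell means for the four factors can be inspected before fitting the mixed-design ANOVA in standard statistics software:

```python
import pandas as pd

# Hypothetical long-format layout of the design (values are illustrative):
# target type and block sequence vary between participants; sound
# relevance and block order vary within participants.
df = pd.DataFrame({
    "participant":    [1, 1, 1, 2, 2, 2],
    "target_type":    ["cat", "cat", "cat", "gun", "gun", "gun"],
    "block_sequence": [1, 1, 1, 4, 4, 4],                 # 1..6 counterbalancing
    "sound":          ["relevant", "irrelevant", "none"] * 2,
    "block_order":    [1, 2, 3, 2, 3, 1],                 # 1st, 2nd, 3rd
    "mean_rt_ms":     [812.0, 861.0, 934.0, 901.0, 915.0, 1003.0],
})

# Cell means for the key within-participant factor:
print(df.groupby(["target_type", "sound"])["mean_rt_ms"].mean())
```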
Importantly, sound relevance produced significant effects on both target-present and target-absent trials (F[2, 48] = 11.386, p < 0.0001, for target-present trials, and F[2, 48] = 6.174, p < 0.005, for target-absent trials). For target-present trials, the target-relevant sounds significantly speeded visual detection of rare targets compared to the target-irrelevant sounds (t = 2.166, p < 0.037), demonstrating an object-specific auditory-visual facilitation of visual search for rare targets (Figure 3, black bars). The fact that target-absent trials were no slower (if anything, they were faster) with the target-relevant sounds than with the target-irrelevant sounds (t = 0.710, n.s.; Figure 3, black bars) indicates that the facilitative effect of the target-relevant sounds on target-present trials is not due to response bias. If the target-relevant sounds had simply biased participants to make a target-present response, they should have speeded target-present responses but slowed target-absent responses.
Figure 3. The effects of sounds consistent with the target category (Relevant sound), sounds consistent with another object category (Irrelevant sound), and no sounds on picture search (Experiment 3). (A) Response times (upper panel) and miss rates (lower panel).
Furthermore, presenting a sound per se (whether target-relevant or target-irrelevant) had a large impact. Compared to no sounds, both the target-relevant and target-irrelevant sounds speeded responses on both target-present and target-absent trials (Figure 3; for target-present trials, t = 4.973, p < 0.0001, for the target-relevant sounds vs. no sounds, and t = 2.630, p < 0.012, for the target-irrelevant sounds vs. no sounds; for target-absent trials, t = 3.731, p < 0.0007, for the target-relevant sounds vs. no sounds, and t = 2.734, p < 0.009, for the target-irrelevant sounds vs. no sounds). Thus, any coincident sound (irrespective of target relevance) facilitated visual search for rare targets, likely by increasing arousal and/or providing a temporal cue for the onset of the search array.
The overall error rates were low (3.3% misses and 0.2% false positives for the cat search, and 5.2% misses and 1.1% false positives for the gun search), and there were no significant effects involving errors.
Visual experience in the real world is often accompanied by closely associated auditory experience. Our prior research suggests that cross-modal interactions develop through consistent and repeated multisensory experience (Smith, Grabowecky, & Suzuki, 2007). Here we investigated the possibility that auditory and visual processing of objects and their names are associated based on their frequent co-occurrence in the real world, and that these experience-based auditory-visual associations can be used to facilitate visual search. We tested this hypothesis by comparing the effects of characteristic object sounds and spoken object names on searches for pictures and names. People tend to see an object (e.g., a key chain) and concurrently hear its characteristic sound (e.g., a jingling sound), hear an object name while looking at the corresponding object (e.g., point at a dog and say “Look at the dog!”), or see an object name and overtly or covertly pronounce it. In contrast, people usually do not read an object name and simultaneously hear a characteristic sound of the named object. We thus predicted that a characteristic object sound should facilitate search for a picture of the corresponding object (Iordanescu et al., 2008, 2010), that a spoken object name should facilitate search for both a picture and a name of the corresponding object, but that a characteristic object sound should not affect search for a name of the corresponding object (Iordanescu et al., 2008). We confirmed these predictions in Experiments 1 and 2. Our results thus suggest that auditory-visual interactions arising from experiential associations facilitate visual search for common objects and their names.
As in our prior studies (Iordanescu et al., 2008, 2010), target-consistent sounds facilitated target localization, but distractor-consistent sounds did not significantly slow search. However, whereas distractor-consistent sounds showed little evidence of slowing target localization compared to other objects’ sounds (sounds of objects not in the current search display) or to no sounds in our prior studies (Iordanescu et al., 2008, 2010), the distractor-consistent sounds in the current study modestly (though not significantly) slowed search compared to the beep sound. It is possible that the beep sound was not an optimal control because it could have speeded search by increasing arousal and/or by providing an especially effective temporal cue for the onset of the search array due to its sharp auditory onset; had we used no sound as the control, the search times with the distractor-consistent sounds might have been virtually equivalent to those with no sound (as in Iordanescu et al., 2008, 2010). Nevertheless, we replicated our prior results in that distractor-consistent sounds did not significantly slow search even compared to the beep sound. This is consistent with the idea that top-down goal-directed signals (e.g., Reynolds & Chelazzi, 2004) might play an important role in mediating object-based auditory-visual interactions (e.g., Molholm et al., 2004; also see Iordanescu et al., 2008, 2010, for discussion of neural mechanisms that potentially mediate the goal-directed nature of the object-based auditory-visual facilitation in visual search). It is interesting to note that spoken names of distractors trended toward slowing picture search (compared to the beep sound). It might be the case that spoken names have privileged influences on the salience of visual objects because when people call out an object’s name (e.g., “Snake!”), it is generally beneficial to direct attention to the corresponding object.
If auditory-visual interactions generally develop through experiential associations, characteristic sounds of materials (e.g., glass, wood, plastic) may also direct attention to the corresponding materials irrespective of object information. In fact, some of the stimulus pairings we used, for example, a wine glass and a clinking sound, a key chain and a jingling sound, and a door and a squeaking-hinge sound, could be mediated by material-based rather than object-based auditory-visual associations. We are currently investigating how material-consistent sounds generated in various ways (e.g., by tapping on or breaking materials) facilitate visual search for specific materials, such as localizing a metal-textured patch presented among distractor patches showing other material textures.
In our prior studies (Iordanescu et al., 2008, 2010) and in Experiments 1 and 2, we presented a different search target on each trial and asked participants to localize the target. In Experiment 3, we extended our results to the case in which participants persistently looked for a single category of target objects, the target object was rarely presented, and participants reported whether a target object was present or absent. This is an important extension because a challenging case of visual search involves vigilance, repeatedly looking for a rare target (e.g., a gun), as in airport baggage screening. The results from Experiment 3 provide some useful insights.
First, presenting either a meow or a gun-shot sound simultaneously with a visual search display substantially speeded responses on both target-present and target-absent trials, compared to presenting no sound. It is possible that presenting any sound, even a beep, might increase arousal and/or provide a temporal cue that speeds visual search with rare targets. Alternatively, meaningful and affectively charged sounds such as a meow or a gun-shot sound may be particularly effective. Second, in addition to the general benefit of presenting a sound with the search display (compared to presenting no sound), there was an advantage of presenting characteristic sounds of the target category: the target-relevant sounds facilitated search compared to the target-irrelevant sounds. The fact that the target-relevant sounds speeded target-present responses without slowing target-absent responses suggests that the target-relevant sounds increased the salience of target objects rather than biasing target-present responses.
It is remarkable that the target-relevant sounds speeded search despite being completely uninformative as to the presence of a target (in fact, they were misinformative because targets were absent on most trials), so that participants should have ignored them. Our participants performed 450 search trials in about half an hour. It would be interesting to extend the duration of the experiment. If target-relevant sounds persistently facilitated visual search for rare targets over a period of hours, the technique might provide a means to improve performance in baggage screening and other situations that require persistent search for rare targets. For example, repeatedly presenting appropriate cracking sounds might facilitate detection of cracks during inspections of buildings or machines.
In summary, visual search in the real world occurs in a multisensory environment. Visual objects are often experienced along with their characteristic sounds and spoken names. Consequently, both characteristic sounds and spoken names of objects facilitated localization of objects in visual search. Written names of objects are often experienced along with their spoken versions, but not along with the named objects’ characteristic sounds. Consequently, spoken names, but not characteristic sounds, of objects facilitated visual localization of object names. Our results thus suggest that coincident experience of object-related visual and auditory signals leads to object-specific auditory-visual associations through which auditory signals can facilitate visual search. This object-based auditory-visual facilitation is persistent in that characteristic sounds speeded visual search even when targets were rare and the sounds of a single target category were presented on every trial (i.e., primarily on target-absent trials). These results are consistent with our recent results demonstrating that correlated multisensory experience leads to facilitative cross-modal sensory interactions (Smith et al., 2007).