Change blindness is a counterintuitive phenomenon (Levin, Momen, Drivdahl, & Simons, 2000; Simons & Levin, 1998) in which observers in a variety of paradigms (e.g., Henderson & Hollingworth, 1999; Levin & Simons, 1997; Rensink, O’Regan, & Clark, 1997) fail to detect what may be described as obvious changes in a visual scene. For example, Grimes (1996)
found that participants noticed only 30% of the changes in photographs that occurred during an eye movement—even changes as obvious as two heads switching bodies. Simons and Levin (1998)
dramatically demonstrated that only 33% of the participants in a real-life interaction noticed that the person asking them for directions was exchanged when a door being carried by confederates momentarily interrupted the discussion. Rensink et al. (1997)
hypothesized that the inability to detect changes in a visual scene was related to the allocation of attention during visual processing; attention must be focused on an object for a change in that object to be detected.
Rensink et al. (1997
; see also Klein, Kingstone, & Pontefract, 1992
; Posner, 1980
) further suggested that attention may be “pushed” by high-level influences (i.e., volitional control) or “pulled” by low-level changes toward relevant objects or features in a visual scene, thereby facilitating the detection of a change in that scene. Indeed, Werner and Thies (2000)
found that the high-level knowledge of experts in American football facilitated their detection of changes in football images compared with participants with less expertise in American football, suggesting that domain-specific expertise may push attention toward relevant objects or features in a visual scene (see also Hollingworth & Henderson, 2000
). Alternatively, low-level changes such as those that produce transient motion may pull attention to that area in the scene, unless a mask of some sort is used to eradicate this cue (e.g., the flicker technique of Rensink et al., 1997).
The work of Cherry (1953)
provides some evidence from the auditory domain to support the hypothesis of Rensink et al. (1997)
. Among the studies described in Cherry (1953)
is an experiment in which participants had to shadow, or repeat out loud, passages of continuous speech presented to them over one channel of a set of headphones; the concurrent message in the other channel of the headphones was to be ignored. Cherry found that listeners were quite accurate in their repetition of the attended passages. Changes to the message presented in the unattended ear—such as switching from English to German or switching from one male speaker to another—went relatively undetected, as predicted by the hypothesis of Rensink et al. (1997)
. That is, participants appeared to be deaf to changes in the environment.
Although the hypothesis regarding the role of attention in detecting changes in the environment is supported by (the rather vague descriptions provided in) the work of Cherry (1953)
, the hypothesis appears to suffer from circular reasoning. A change to an object in a scene is not detected because attention is not directed at that object; the proof that attention is not directed at the object is the failure to detect the change. To break the circularity, one must use alternative means to assess the allocation of attention and the detection of a change in the environment. The use of an integral stimulus, such as spoken language, affords researchers the unique opportunity to examine the hypothesis that attention must be allocated to the relevant dimension to detect a change in a complex stimulus (Rensink et al., 1997
) without circularity encroaching into the argument.
An integral stimulus comprises two stimulus dimensions. In the case of spoken language, these are the linguistic and the indexical dimensions. The linguistic dimension of the speech signal conveys propositional information about objects in the world. For example, the linguistic message of a spoken utterance may be a request to close a window. Indexical
information refers to acoustic correlates in the speech signal that provide information on various characteristics of the talker, including identity, emotional state, age, dialect, and gender (Pisoni, 1997
). If one hears a request to close the window, the indexical information conveyed concurrently with that linguistic message would allow the listener to identify whether the speaker was a shivering old woman unknown to the listener or a more familiar interlocutor, like “Uncle Joe from western Pennsylvania.”
Although different aspects of the acoustic signal are correlated with linguistic (Zue & Schwartz, 1980
) and indexical attributes (Bricker & Pruzansky, 1976
; Hecker, 1971
), evidence from several studies using speeded classification tasks (Garner, 1974
) suggests that spoken language is an integral stimulus with these two dimensions (Mullennix & Pisoni, 1990
). For example, Jerger and colleagues (Jerger, Martin, Pearson, & Dinh, 1995
; Jerger, Pirozzolo, et al., 1993
; Jerger, Stout, et al., 1993
) had participants selectively attend to indexical information, in this case the gender of the talker, while ignoring the linguistic dimension, in this case the word being spoken, and vice versa. In both cases, the classification performance of listeners (with normal hearing) for the relevant dimension was affected by variation in the irrelevant dimension, suggesting that spoken language is an integral stimulus.
If one now considers the interference across dimensions that occurs in an integral stimulus (such as spoken language) and the attention allocation hypothesis of Rensink et al. (1997)
, a unique opportunity arises to evaluate more precisely the role of attention in the detection of changes. That is, if attention is allocated to dimension X in an integral stimulus, then a change in dimension X should be detected. More important, if attention is allocated to dimension X, then a processing cost should be observed in dimension Y, thereby providing a means of evaluating the allocation of attention to dimension X other than the subjective detection of the change.
The results from several studies investigating talker variability in spoken word recognition suggest that this method of independently evaluating the allocation of attention is valid (e.g., Goldinger, 1998
; Mullennix, Pisoni, & Martin, 1989
; Nygaard, Sommers, & Pisoni, 1994
). In such experiments, a change on each trial was made in the indexical dimension (by using a different voice) while processing of the linguistic information (response times to words differing in various lexical characteristics) was assessed. Changing the voice from trial to trial resulted in less accurate identification of words presented in noise (Mullennix et al., 1989
; Nygaard et al., 1994
; for effects on recall and recognition, see also Goldinger, Pisoni, & Logan, 1991
; Martin, Mullennix, Pisoni, & Summers, 1989
; Palmeri, Goldinger, & Pisoni, 1993
). Thus, changes in the indexical dimension adversely affected processing of the linguistic dimension. Although the participants in the talker-variability studies were never explicitly asked whether they detected the change in the speakers, it is most likely that the continuous low-level changes in the stimuli (such as the fundamental frequency of the different voices presented on each trial) pulled attention toward the indexical dimension of the signal to the detriment of the linguistic dimension.
To better examine the attention-directed detection hypothesis of Rensink et al. (1997)
, I used the integral stimulus of spoken language in a situation that minimized the pull of attention to the indexical dimension that occurred in the talker-variability studies. This was accomplished by changing the voice of the talker only once, halfway through the list of words, rather than on every trial, as in the talker-variability studies. Furthermore, in contrast to the continuous, meaningful passages used by Cherry (1953)
, isolated words were presented to participants to shadow as quickly and as accurately as possible. The connected speech passages that Cherry used as stimuli may have engaged higher level processes that pushed attention toward the meaning of the passages to the detriment of other characteristics of the auditory input, such as the gender of the talker or the language of the message in the other channel.
If attention is directed toward the indexical dimension of the spoken stimulus, then the change in voices should be detected. However, processing of the linguistic dimension should suffer, as demonstrated in the studies of talker variability (e.g., Mullennix et al., 1989
; Nygaard et al., 1994
). In contrast, if attention is directed toward the linguistic dimension of the spoken stimulus, then a detriment should be observed in the processing of the indexical dimension; namely, the change in voices should not be detected. Also, the detriment to processing in the linguistic dimension should be attenuated in this case. Listeners in the present experiment, like observers in change blindness studies, were directly questioned at the end of the experiment as to whether they had detected the change in voice. In this way, I could assess subjectively as well as objectively how attention was allocated when listeners were deaf to the change in voices.
To assess how attention to the indexical dimension affected processing of the linguistic dimension, I designed the experiments such that the words to be repeated varied in lexical difficulty (with regard to ease in recognizing the words). Lexically easy words have a high word frequency, and few words sound similar to them. In addition, the similar-sounding words have a low frequency of occurrence. In contrast, lexically hard words have a low word frequency, many words sound similar to them, and the similar-sounding words have a high frequency of occurrence (Torretta, 1995
). Lexically easy words are generally recognized more quickly and more accurately than lexically hard words (e.g., Kirk, Pisoni, & Miyamoto, 1997
; Luce & Pisoni, 1998
; Sommers, 1996
). By examining changes to the overall and differential response times to easy and hard words, I could better determine how attention to the indexical dimension affected processing of the linguistic dimension. Measuring latencies to the easy and hard words is also an important extension over the pioneering work of Cherry (1953)
, in which only accuracy rate (not latency) was measured in the shadowing task performed by participants. (Note that there were no actual data reported in Cherry, 1953
, making it difficult to determine exactly what was found.) By measuring response latencies to the linguistic dimension of an integral stimulus, subtle processing difficulties that participants have (which may have escaped detection in Cherry, 1953
) may be observed.
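The easy/hard distinction and the differential latency measure described above can be made concrete with a small sketch. It is an illustration only: the median-split rule, the `Word` fields, the function names, and all numeric values are assumptions introduced here, not the criteria of Torretta (1995) or data from any study.

```python
from dataclasses import dataclass
from statistics import mean, median


@dataclass(frozen=True)
class Word:
    form: str
    frequency: float           # word frequency (hypothetical units)
    neighbor_count: int        # number of similar-sounding words
    neighbor_frequency: float  # mean frequency of those similar-sounding words


def classify(words):
    """Label each word 'easy', 'hard', or 'other' using median splits.

    The median split is an assumed criterion chosen for illustration.
    """
    freq_cut = median(w.frequency for w in words)
    count_cut = median(w.neighbor_count for w in words)
    nfreq_cut = median(w.neighbor_frequency for w in words)
    labels = {}
    for w in words:
        high_freq = w.frequency > freq_cut
        low_freq = w.frequency < freq_cut
        few_neighbors = w.neighbor_count < count_cut
        many_neighbors = w.neighbor_count > count_cut
        rare_neighbors = w.neighbor_frequency < nfreq_cut
        common_neighbors = w.neighbor_frequency > nfreq_cut
        if high_freq and few_neighbors and rare_neighbors:
            labels[w.form] = "easy"
        elif low_freq and many_neighbors and common_neighbors:
            labels[w.form] = "hard"
        else:
            labels[w.form] = "other"
    return labels


def difficulty_effect(easy_latencies, hard_latencies):
    """Differential latency: mean hard-word latency minus mean easy-word latency."""
    return mean(hard_latencies) - mean(easy_latencies)


# Made-up items and latencies (ms), purely for demonstration.
sample = [
    Word("w1", 100.0, 2, 5.0),
    Word("w2", 80.0, 3, 6.0),
    Word("w3", 5.0, 20, 90.0),
    Word("w4", 3.0, 25, 95.0),
]
labels = classify(sample)
effect = difficulty_effect([500, 520], [600, 640])
```

Under this sketch, comparing `difficulty_effect` before and after the voice change would be one way to express a change in the differential response times to easy and hard words; the actual analysis may differ.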