We report a new phenomenon in cross-modal cuing of visual spatial attention - simultaneous auditory peripheral cues are “hyper-effective” - more effective than auditory peripheral (AP), visual central (VC), or visual peripheral (VP) cues that preceded the target with ample time for preparatory orienting. The time courses and mechanisms of visual spatial attention were measured in a four-location Gabor orientation identification task for targets embedded in systematically varying amounts of external noise [Lu and Dosher, Vision Research, 1998] and for cue-target asynchronies (CTOAs) between 0 and 240 ms. Large CTOA pre-cuing improvements in contrast thresholds occurred in high external noise conditions for AP, VC, and VP cues. In low external noise conditions, pre-cuing advantages occurred for visual peripheral (VP) cues, but not for visual central cues (VC), replicating Lu and Dosher [Journal of Experimental Psychology: Human Perception and Performance, 2000]. Auditory peripheral pre-cues (AP) were similar to VP cues at large cue-target delays, but demonstrated a large cross-modal cuing advantage for simultaneous auditory peripheral cues (AP) in both high and low external noise. We conclude that endogenous attention (visual central pre-cuing) excludes external noise, while exogenous attention (both visual and auditory peripheral pre-cuing) enhances the stimulus but also excludes external noise if informative peripheral cues are used.
Although it has been well established that sensory stimuli in the environment are perceived through modality-specific peripheral receptors, the existence of poly-sensory neurons and brain areas that respond to stimulation of multiple types of peripheral receptors suggests that the different sensory modalities are not completely independent (Calvert, Brammer & Iversen, 1998, Stein & Meredith, 1993, Wallace, Meredith & Stein, 1996). Because most real-world perceptual events consist of simultaneous stimulation of multiple sensory modalities, inter-modal interaction must occur at some level of the perceptual system for higher-level animals and humans to achieve coherent perception of the environment. One of the best demonstrations of the importance of multi-modal interaction is the perceptual gains achieved with audiovisual speech information, such as enhanced word identification in acoustic noise (Sumby & Pollack, 1954) and under conditions of hearing loss (Lamore, Huiskamp, van Son, Bosman & Smoorenburg, 1998, Moody-Antonio, Takayanagi, Masuda, Auer, Fisher, Bernstein, 2005; Welch & Warren, 1986). Cross-modal integration is also thought to improve detection (Frassinetti, Bolognini & Ladavas, 2002, Teder-Salejarvi, Di Russo, McDonald, Hillyard, 2005, Lippert, Logothetis & Kayser, 2007) or discrimination (Vroomen & De Gelder, 2000) of low-level sensory stimuli. Some authors believe that isolated studies of individual sensory modalities can at best yield partial knowledge; studies of inter-modal interaction are essential to a complete understanding of the perceptual system (Welch & Warren, 1986). In this study, we investigate the time-course and mechanisms of concurrent and nearly concurrent cross-modal auditory cues on visual identification, with comparisons to the effects of intra-modal visual cues. We first review the relevant literature.
Orienting spatial attention to the spatial location of the target stimulus prior to its onset often generates “pre-cuing advantages” in detection or identification in brief, multi-element displays, either in terms of substantially improved accuracy (Bashinski & Bacharach, 1980, Carrasco, Penpeci-Talgar & Eckstein, 2000, Cheal & Lyon, 1991a, Dosher & Lu, 2000b, Dosher & Lu, 2000c, Downing, 1988, Enns & Di Lollo, 1997, Eriksen & Hoffman, 1973, Grindley & Townsend, 1968, Henderson, 1991, Liu, Pestilli & Carrasco, 2005, Lu & Dosher, 2000, Lyon, 1990, Mueller & Findlay, 1988, Mueller & Rabbitt, 1989, Pestilli & Carrasco, 2005, Shiu & Pashler, 1994, Smith, 2000, Van der Heijden, Schreuder & Wolters, 1985, Yeshurun & Carrasco, 1999) or response time (Egly & Homa, 1991, Eriksen & Hoffman, 1972, Henderson & Macquistan, 1993, Posner, 1980, Posner, Snyder & Davidson, 1980).
It has been proposed that, depending on the location of the cue in relation to that of the target stimulus, two different attention systems are engaged in orienting spatial attention (Briand & Klein, 1987, Ladavas, 1993, Posner, 1980, Posner & Cohen, 1984, Rafal, Calabresi, Brennan & Sciolto, 1989, Rafal, Henik & Smith, 1991). Central or symbolic cues, which contain information about the target location but are themselves neutrally positioned in space, e.g., in the geometric center of all the possible target locations, are said to activate the endogenous attention system. Peripheral cues, independent of whether they are informative about the target location but are themselves positioned near the target(s), are said to activate the exogenous attention system. Several functional differences between the two attention systems have been observed: reflexive versus voluntary (Jonides & Yantis, 1988, Nakayama & Mackeben, 1989), large versus small cuing effects (Henderson, 1991, Jonides, 1981), faster versus slower action (Cheal & Lyon, 1991a, Hopfinger & Mangun, 2001, Mueller & Rabbitt, 1989), different inhibition of return (Posner & Cohen, 1984, Rafal et al., 1989), different mechanisms (Lu & Dosher, 2000), as well as possibly different neural substrates (Ladavas, 1993, Rafal et al., 1991).
Whereas the classical demonstrations of pre-cuing advantage in spatial attention used visual cues and visual tasks (Pashler, 1998, Posner, 1978), there have been an increasing number of investigations on cross-modal cuing of spatial attention using visual tasks with auditory (Buchtel & Butter, 1988, Driver & Spence, 1998, Driver & Spence, 1999, Dufour, 1999, Eimer & Schroeger, 1998, Farah, Wong, Monheit & Morrow, 1989, Hikosaka, Miyauchi, Takeichi & Shimojo, 1996, Luo & Wei, 1999, McDonald, Teder-Saelejaervi & Hillyard, 2000, Mondor & Amirault, 1998, Quinlan & Hill, 1999, Schmitt, Postma & De Haan, 2000, Teder-Salejarvi, Munte, Sperlich & Hillyard, 1999, Ward, 1994, Ward, McDonald & Golestani, 1998) and somatosensory cues (Butter, Buchtel & Santucci, 1989, Posner, Nissen & Ogden, 1978, Spence, Pavani & Driver, 2000). The interest in studying cross-modal orienting of visual spatial attention arises from the fact that both the natural environment and the human perceptual system are intrinsically multi-modal — therefore, in real life, targets and cues could arise from different modalities.
Most studies on auditory cuing of visual spatial attention used peripheral auditory cues. There have not been many studies using central auditory cues because, independent of being intra- or cross-modal, central/symbolic cues engage the intrinsically supra-modal endogenous attention system, which derives the target location from the information content of the cue regardless of its modality. The more interesting question is whether the exogenous attention system is also supra-modal --- whether auditory peripheral cues function in the same way as visual peripheral cues. The alternative is modality-specific exogenous attention, in which only visual peripheral cues produce pre-cuing advantages in visual spatial attention; auditory peripheral cues either do not produce any pre-cuing advantage or function in different ways in producing pre-cuing advantages in visual spatial attention. The empirical results in the literature are mixed.
Peripheral auditory cues that predict the likely side of a visual target have been shown to decrease simple response time (Buchtel & Butter, 1988). But because the cues were informative, they could have functioned like central/symbolic cues, engaging the voluntary endogenous attention system and inducing criterion shifts in the decision process (Spence & Driver, 1994). Uninformative auditory peripheral pre-cues that occur randomly with equal probability at all the potential target locations and therefore do not predict the location of the visual target have also been shown to reduce simple response times to visual targets near the cues (Farah et al., 1989). But, simple response time measures are subject to cue-induced criterion shifts and therefore may not necessarily reflect increased sensitivity to the attended targets (Eckstein, Shimozaki & Abbey, 2002, Shaw, 1980, Spence & Driver, 1994, Sperling & Dosher, 1986).
To demonstrate improved perceptual sensitivity, paradigms that measure choice response time and/or discrimination accuracy are preferred (Spence & Driver, 1994, Spence & Driver, 1997). In the context of cross-modal cuing, uninformative peripheral cues are normally used in combination with choice response times and/or discrimination accuracies. Using uninformative auditory peripheral cuing, Klein, Brennan & Gilani (1987) found that the choice response time for discriminating a short-length from a long-length visual target was greatly reduced on the cued side. Unfortunately, the accuracy was worse on the cued side, making the result hard to interpret. Ward (1994) found that, while visual pre-cues facilitated both visual and auditory localization, auditory pre-cues failed to influence visual localization. He concluded that the exogenous attention system is modality-specific. In another study, Spence and Driver (1997) required their observers to perform a localization task, in which auditory and visual targets could occur either on the left or right side of fixation, either above or below the horizontal meridian, and visual and auditory cues were located at intermediate elevations relative to the targets. They demonstrated that an uninformative auditory pre-cue improved visual localization, but not vice versa, a conclusion completely opposite to that of Ward (1994).
Spence and Driver (1997) suggested that the attention effects in Ward (1994) may be consistent with a response priming interpretation rather than improved sensitivity: a lateral cue may bias a left-right localization response, speeding the response to the target that is ipsilateral to it without increasing target detectability. To address this criticism, Ward, McDonald & Golestani (1998) conducted new studies using both simple detection and choice discrimination tasks in place of the localization task in Ward (1994). Their results still revealed no cross-modal cuing effects when the cue was auditory and the target was visual.
The cause of the controversial findings from these uninformative peripheral cuing studies (Spence & Driver, 1997, Ward, 1994, Ward, McDonald & Lin, 2000) has been attributed to strategic effects in different experimental settings (Ward et al., 2000) and task dependency (Schmitt et al., 2000). Another possibility could be recruitment of the endogenous process into these exogenous cuing situations because even though, from the experimenters' point of view, the uninformative cues should not have called upon the endogenous attention, the observers may nevertheless use the “uninformative” cues to engage their endogenous attention system (Luck & Thomas, 1999).
The time-course of intra-modal cuing of visual attention has been extensively investigated, typically by measuring the effect of cuing, e.g., the performance difference between valid and invalid cuing conditions or the performance difference between pre-cuing and simultaneous cuing, as a function of cue-target onset asynchrony (CTOA). The conventional view is that visual peripheral cues produce faster reflexive orienting (Posner, 1980), reaching their asymptotic effectiveness when the CTOA is about 150 ms and declining in effectiveness when the CTOA is greater than 175 ms (“inhibition of return”; Cheal & Lyon, 1991a, Cheal & Lyon, 1991b, Kroese & Julesz, 1989, Mueller & Findlay, 1987, Mueller & Findlay, 1988, Mueller & Rabbitt, 1989, Murphy & Eriksen, 1987, Nakayama & Mackeben, 1989, Posner & Cohen, 1984, Shulman, Remington & McLean, 1979, Tsal, 1983). Visual central cues, on the other hand, produce slower, voluntary orienting, reaching their asymptotic effectiveness when the CTOA is about 300 ms and remaining effective with even longer CTOA's (Cheal & Lyon, 1991a, Cheal & Lyon, 1991b, Mueller & Findlay, 1988, Mueller & Rabbitt, 1989).
However, some reports are at variance with the view that visual central and peripheral cues produce pre-cuing advantages with different time courses. Both Eriksen & Collins (1969) and Warner, Juola & Koshino (1990) concluded that similar or identical attention processes were activated by central and peripheral cues. Luck, Hillyard, Mouloua & Hawkins (1996b) failed to observe any peripheral cuing effect with a CTOA of 100 ms. Using a completely different paradigm, the attention reaction time paradigm (Reeves & Sperling, 1986), Lesmes, Lu & Dosher (2002) also found identical time courses for visual central and peripheral cues.
Several factors have been found to affect the time course of spatial cuing, including type (Cheal & Lyon, 1992) and duration (Lyon, 1987) of the target, amount of post-stimulus masking (Cheal & Lyon, 1991a), and observer practice level (Lyon, 1987). For example, Lyon (1987) found that for the same visual task, the time course of cuing depended on the duration of the target stimulus. Moreover, the difference between visual central and peripheral cuing disappeared after substantial practice. A detailed re-examination of the literature suggests that different target stimulus durations (used to achieve similar asymptotic performance levels) were used in many of the studies that compared the time courses of central and peripheral cuing (Cheal & Lyon, 1991a, Cheal & Lyon, 1991b, Mueller & Rabbitt, 1989), and observers were typically not well practiced. In addition, Chastain (1992) and Luck et al. (1996) suggested that peripheral pre-cues might mask target detection/identification more at very short, close to zero CTOA's and produce the apparent rapid increasing cuing effects at short CTOA's.
The time course of cross-modal cuing of visual spatial attention has not been investigated in detail, at least in the CTOA range of 0 to 300 ms, perhaps because cross-modal cuing effects have not been consistently observed. Using auditory peripheral cues, Spence & Driver (1997) sampled CTOA at 100, 200, and 700 ms. They found that the maximum cuing effect on response times occurred at a CTOA of 100 ms. Ward et al. (1998) evaluated the time courses of both visual and auditory peripheral cuing of spatial attention in a response time paradigm at CTOA's of 100, 200, 400, and 1000 ms, and found that cuing advantage reached asymptotic levels at 200 ms for both types of cues. Similarly, Schmitt et al. (2000) sampled CTOA's at 125, 175, 225, and 575 ms and found that both visual peripheral and auditory peripheral cuing of visual spatial attention were maximally effective at a CTOA of 125 ms. In addition, they found that intra- but not cross-modal cuing of visual spatial attention generated inhibition of return.
Several studies have found that auditory stimuli presented in close spatial and temporal proximity to a visual target enhance the brightness (Stein, London, Wilkinson & Price, 1996) and detectability of the target (Frassinetti, Bolognini & Ladavas, 2002, Teder-Salejarvi, Di Russo, McDonald, Hillyard, 2005, Lippert, Logothetis & Kayser, 2007, Vroomen & De Gelder, 2000). There have also been reports that tactile cues presented simultaneously with visual stimuli improved visual discrimination (Butter et al., 1989, Spence, Nicholls, Gillespie & Driver, 1998). Occurring only when the stimuli in multiple sensory modalities are in close spatial and temporal proximity, these “multi-modal integration” phenomena are believed to reflect activities of polysensory neurons and brain areas (Calvert et al., 1998, Welch & Warren, 1986). In a separate line of research, studies on cross-modal cuing of spatial attention found that auditory cues presented in close spatial proximity to, and briefly preceding, visual targets also improved visual detection, discrimination, and localization (Buchtel & Butter, 1988, Driver & Spence, 1999, Dufour, 1999, McDonald et al., 2000, Mondor & Amirault, 1998, Posner, 1980, Schmitt et al., 2000, Spence & Driver, 1997, Ward et al., 1998). These “cross-modal cuing” effects are thought to reflect orienting of exogenous attention (McDonald et al., 2000, Posner, 1980, Spence & Driver, 1997).
Given the apparently similar conditions for multi-modal integration and cross-modal cuing, some authors recently suggested that the two phenomena could reflect the very same process, i.e., poly-sensory neurons, widely believed to be responsible for multi-modal integration, could also be the neural basis of cross-modal attention (Macaluso, Frith & Driver, 2000, Macaluso, Frith & Driver, 2001). Others argued that the two phenomena reflect separate processes -- one cannot attribute “cuing” effects with simultaneous cues to orienting of attention; multi-sensory integration provides an alternative interpretation (McDonald, Teder-Saelejaearvi & Ward, 2001, McDonald et al., 2000). These authors suggested that multi-modal integration and cross-modal cuing might function with different time courses: Multi-modal integration is most effective when the relevant modalities are stimulated simultaneously; in contrast, it takes a few hundred milliseconds for cross-modal cuing to reach its asymptotic effect level. However, earlier studies of auditory peripheral cuing used relatively coarse sampling of the cuing function, and either did not test simultaneous cues¹, or did not include effective comparisons to a baseline (Frassinetti et al., 2002, Spence & Driver, 1997, Ward et al., 1998).
Various mechanisms have been proposed to account for pre-cuing advantages in visual spatial attention. Some suggested that cuing facilitates sensory processing in the cued location (Bashinski & Bacharach, 1980, Cheal, Lyon & Gottlob, 1994, Corbetta, Miezin, Shulman & Petersen, 1993, Mangun & Hillyard, 1987, Posner et al., 1978) and/or allocates limited capacity to the cued location (Broadbent, 1957, Henderson, 1996). Some suggested that cuing reduces stimulus uncertainty (Eckstein et al., 2002, Gould, Wolfgang & Smith, 2007, Palmer, 1994, Palmer, Ames & Lindsey, 1993, Shaw, 1984, Sperling & Dosher, 1986). Others suggested that cuing reduces interference of masks or distractors at the uncued locations (Shiu & Pashler, 1994) or suppresses masking at the cued location (Enns & Di Lollo, 1997). According to Cheal & Gregory (1997), “attention can both facilitate responses to the attended objects and inhibit responses to other objects (p.69)”.
To identify the mechanisms of attention, the many experimental results and proposed mechanisms in the literature need to be studied systematically within a coherent theoretical framework, and with more powerful empirical methods. The external noise plus attention paradigm and the associated observer model, the Perceptual Template Model (PTM), have been developed to quantitatively analyze and distinguish various mechanisms of attention (Dosher & Lu, 2000b, 2000c; Lu & Dosher, 1998, 1999, 2000, 2008; Lu, Liu & Dosher, 2000). Within this framework, effects of spatial attention are attributed to three mechanisms²: stimulus enhancement, external noise exclusion, and (multiplicative) internal noise reduction. Although these three mechanisms do not map exactly onto the proposed mechanisms in the literature, they are closely related. Stimulus enhancement is related to the verbal notion of facilitation of perceptual processing; external noise exclusion is related to elimination of interference from masks, and suppression of masking.
The PTM framework has been developed to investigate mechanisms of attention in the target region. Lu, Lesmes and Dosher (2002) compared central spatial precuing effects in 16 experimental conditions that varied the amount of external noise, the number of signal stimuli, the number of locations masked by external noise, and the number and style of frames surrounding potential target locations. They found that the magnitude of attention effects was more or less independent of the actual number of target/external noise locations. In addition, there was little or no evidence for “cross-talk” between the non-target regions and the response in either the precued or the simultaneously cued conditions. They concluded that the simultaneous cue is sufficient to exclude both external noise and signal at the non-target locations; and precuing advantage in visual central cuing must reflect the additional benefits of external noise exclusion in the target region. On the other hand, the magnitude of the cuing advantage depends on the number of potential target locations (Dosher & Lu, 2000b). The presence of other potential locations for the target is critical because if target location is known in advance, attention can be focused on that location with or without a cue. In this study, we used four potential target locations (one target + three distractors).
Appendix I contains an abbreviated description of the PTM model and the related equations. The formal development, quantitative tests, and initial applications of the PTM model and the associated theoretical framework for identifying attention mechanisms have been detailed in several of our previous publications (Dosher & Lu, 2000c, Lu & Dosher, 1998, see Lu & Dosher (2008) for a recent review). Here, we briefly describe the paradigm and the signature performance patterns of the three attention mechanisms (Figure 1).
In the external noise plus attention paradigm, thresholds are measured as functions of the contrast of the external noise (the so-called “TvC functions”) under different observer attention states (e.g., precuing versus simultaneous cuing, valid versus invalid cuing). Stimulus enhancement (or equivalently, additive internal noise reduction) is characterized by divergence between attention conditions at low, but not high levels of external noise in TvC functions; external noise exclusion is characterized by divergence between attention conditions at high, but not low levels of external noise; and multiplicative internal noise reduction is characterized by divergence between attention conditions at both low and high levels of external noise. Moreover, mixtures of mechanisms can be distinguished by measuring TvC functions at more than two criterion performance levels (see Dosher & Lu, 1999, 2000b; Lu & Dosher, 1999).
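The three signature patterns can be illustrated numerically. The sketch below assumes the commonly cited PTM threshold expression, c = (1/β) [((1 + N_mul²) N_ext^(2γ) + N_add²) / (1/d'² − N_mul²)]^(1/(2γ)), with attention modeled as a multiplier below 1 on the additive internal noise, the external noise, or the multiplicative internal noise. Appendix I gives the authors' own statement of the model; the function names and all parameter values here are illustrative assumptions, not fitted values from this study.

```python
def ptm_threshold(n_ext, d_prime=1.24, beta=1.5, gamma=2.0,
                  n_add=0.01, n_mul=0.5,
                  a_add=1.0, a_ext=1.0, a_mul=1.0):
    """Contrast threshold under the assumed PTM expression.

    a_add, a_ext and a_mul are hypothetical attention multipliers
    (1.0 means no attention effect) on additive internal noise,
    external noise and multiplicative internal noise.
    """
    mul2 = (a_mul * n_mul) ** 2
    numerator = ((1.0 + mul2) * (a_ext * n_ext) ** (2.0 * gamma)
                 + (a_add * n_add) ** 2)
    denominator = 1.0 / d_prime ** 2 - mul2
    return (numerator / denominator) ** (1.0 / (2.0 * gamma)) / beta


def threshold_ratio(n_ext, **attention):
    """Attended threshold divided by unattended threshold."""
    return ptm_threshold(n_ext, **attention) / ptm_threshold(n_ext)


LOW_NOISE, HIGH_NOISE = 0.0, 0.32  # external noise contrasts (sd)

signatures = {
    "stimulus enhancement": {"a_add": 0.7},
    "external noise exclusion": {"a_ext": 0.7},
    "multiplicative noise reduction": {"a_mul": 0.7},
}

for name, attention in signatures.items():
    low = threshold_ratio(LOW_NOISE, **attention)
    high = threshold_ratio(HIGH_NOISE, **attention)
    print(f"{name}: ratio at low noise {low:.3f}, at high noise {high:.3f}")
```

A ratio below 1 marks a threshold reduction: under these assumptions it appears only at low noise for stimulus enhancement, only at high noise for external noise exclusion, and at both noise levels for multiplicative noise reduction.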
In Experiment 1, we measured the detailed time-courses of the intra- and cross-modal cuing on visual target identification over a range of target-cue delays, including concurrent presentation of auditory cues and visual targets. We compared the time-course of AP cuing to those of visual central (VC) and visual peripheral (VP) cuing, measured with the identical procedure and the same observers. In Experiment 2, we extended the external noise method and the related theoretical framework to investigate and compare the mechanisms underlying intra- and cross-modal pre-cuing advantages. We did not include an auditory central cue condition because it is expected to engage the supra-modal endogenous attention system.
In this experiment, we measured contrast thresholds for Gabor orientation identification in a range of CTOA's and in both zero and high external noise for three types of cues: visual central (VC), visual peripheral (VP), and auditory peripheral (AP). Many factors must be controlled in comparing the time courses of the effect of different types of cues, e.g., the stimulus and cuing parameters, the task and the overall difficulty level of the task in different cuing conditions, and the practice level of the observers. We have not found a study in the literature that controlled all these factors when comparing the time courses of intra- and cross-modal precuing advantages. Here, identical stimulus parameters, tasks, and observers were used across different types of cues. Task difficulty was also kept constant by measuring contrast thresholds.
The method of constant stimuli (Woodworth & Schlosberg, 1954) was used to estimate contrast thresholds in visual identification. In each trial, four Gabors, all embedded in the same level (either zero or high) of external noise, were presented briefly in four widely separated spatial regions. Observers were cued to identify the orientation of the Gabor from four potential orientations in the target region and therefore performed a four alternative forced identification task with a chance performance level of 25% correct. Threshold contrasts at 50%, 62.5% and 75% correct were estimated for each condition. The time courses (threshold versus CTOA functions) of the three types of cues were fit with descriptive mathematical functions and compared statistically.
Three observers with normal hearing and normal or corrected-to-normal vision participated in the experiment.
All stimuli were projected on a white screen using a Proxima 9400 DLP projector in its achromatic mode with a refresh rate of 67 frames/second and background luminance of 50 cd/m², driven by a Macintosh computer running PsychToolBox (Brainard, 1997). The projector had a linear gamma and provided 768 equally spaced luminance levels. With the background luminance in the middle of the dynamic range of the projector system, the range of achievable contrast is defined as -1.0 to +1.0. Observers were seated behind the projector at a viewing distance of approximately 220 cm in a dimly lit room.
The signal stimuli were Gaussian windowed sinusoidal gratings (“Gabors”) with spatial frequency f = 2.0 c/d, window width σ = 0.39 deg, and tilted 22.5, 67.5, 102.5, or 157.5 deg relative to vertical (Figure 1a). The contrasts of the Gabors were sampled from seven values determined for each observer and each experimental condition based on data from the practice sessions.
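As a minimal sketch of such a stimulus, the function below evaluates the contrast of a Gabor at a given position in degrees of visual angle. The sine phase, the sign convention for tilt, and the function name are assumptions made for illustration; the text specifies only the spatial frequency, the window width, and the orientations.

```python
import math


def gabor_contrast(x_deg, y_deg, contrast, tilt_deg,
                   freq_cpd=2.0, sigma_deg=0.39):
    """Contrast of a Gabor patch at (x_deg, y_deg) from its center.

    tilt_deg is the tilt of the grating bars from vertical, so the
    luminance modulation runs perpendicular to the bars (along x when
    tilt_deg is 0). Sine phase is assumed.
    """
    theta = math.radians(tilt_deg)
    # Position along the direction of luminance modulation.
    u = x_deg * math.cos(theta) + y_deg * math.sin(theta)
    carrier = math.sin(2.0 * math.pi * freq_cpd * u)
    envelope = math.exp(-(x_deg ** 2 + y_deg ** 2) / (2.0 * sigma_deg ** 2))
    return contrast * carrier * envelope


# Sample one row of a patch through its center, 0.03 deg per pixel
# (one third of the 0.09 deg noise element size).
row = [gabor_contrast(i * 0.03, 0.0, contrast=0.5, tilt_deg=22.5)
       for i in range(-43, 44)]
print(min(row), max(row))
```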
External noise images (2.6×2.6 deg) were made of 3×3 pixel patches (0.09 × 0.09 deg). The contrasts of the pixel patches in each noise image were drawn randomly and independently from a single Gaussian distribution with mean 0 and standard deviation 0 in the zero noise condition, and mean 0 and standard deviation 0.32 in the high external noise condition. The standard deviation of the contrast of the external noise in the high noise condition was set at only 0.32 of the achievable contrast range of ±1.0 so that the sampled contrasts conformed reasonably well to a Gaussian distribution.
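The noise construction can be sketched as follows. The count of 29 patches per side (2.6 deg divided by 0.09 deg, rounded) and the clipping of rare extreme samples to the ±1.0 contrast range are assumptions, as is the function name; the text does not say how out-of-range samples were handled.

```python
import random


def external_noise_image(n_patches=29, patch_px=3, sd=0.32, rng=None):
    """Return a square noise image as a list of pixel rows.

    Each patch_px x patch_px patch shares one contrast drawn from a
    Gaussian with mean 0 and standard deviation sd; sd=0.0 gives the
    zero noise condition.
    """
    if rng is None:
        rng = random.Random()
    image = []
    for _ in range(n_patches):
        patch_row = [max(-1.0, min(1.0, rng.gauss(0.0, sd)))
                     for _ in range(n_patches)]
        # Expand each patch contrast horizontally, then repeat the row.
        pixel_row = [value for value in patch_row for _ in range(patch_px)]
        for _ in range(patch_px):
            image.append(list(pixel_row))
    return image


high_noise = external_noise_image(rng=random.Random(0))
zero_noise = external_noise_image(sd=0.0, rng=random.Random(0))
print(len(high_noise), len(high_noise[0]))
```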
Three types of cues were used. The visual central cues were made of white arrows, 1.1 deg in length and appearing in the center of the display. The visual peripheral cues were identical to the visual central cues except that they appeared near the inner corner of one of the four boxes outlining the spatial locations of the Gabors. The auditory peripheral cues consisted of 45 ms, 1000 Hz tone bursts (~85 dB), generated from one of the four speakers hidden behind the display screen. To ensure that observers were able to use the auditory peripheral cues to effectively locate the visual targets, the four speakers were situated ±10 deg vertically and ±10 deg horizontally from the center of the screen (Figure 2a). Subjectively, the auditory cues were perceived from the corresponding target regions.
Each trial began with the presentation of a white fixation cross at the center of the screen and four 2.6 × 2.6 deg white square boxes outlining potential target locations. The centers of the boxes were displaced from the center of the display by ±5.65 deg both vertically and horizontally. A sequence of external noise - Gabor - external noise frames, each lasting 15 ms, occurred in each of the four marked spatial regions 1005 ms after the onset of the fixation cross. The four Gabors were of equal contrast but independently chosen random orientations. External noise with the same contrast (zero or high) was added to the Gabors in each trial. Before or simultaneously with the Gabor onset, a cue indicated the relevant target location.
Observers were instructed to maintain fixation throughout the trial. Because the maximum CTOA was 240 ms, cued saccadic eye movements to the target locations were impossible (Hallett, 1986). Observers were instructed to make two responses: first to identify the orientation of the Gabor at the target location, and then the target location. Only trials with correct target location identification were included in subsequent data analyses --- observers were rewarded with a “beep” at the end of a trial if both responses were correct.
There were four within-subject factors: cue type (visual central, visual peripheral, and auditory peripheral), cue-target onset asynchrony (CTOA=0, 60, 120, 180, and 240 ms), external noise level (zero and high), and seven Gabor contrast levels.
Each observer participated in five practice sessions (4020 trials), followed by ten experimental sessions (8040 trials). Each session consisted of three blocks, one for each cue type. The blocks were arranged in a Latin Square design across sessions to equate practice levels and minimize order effects for the three cue types. There were 280 trials in each block: four trials in each of the seventy mixed conditions ([two external noise levels] × [five CTOA's] × [seven signal contrast levels]). Observers were asked to take a short break between blocks.
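The block structure described above can be enumerated in a few lines. This is a sketch only: the cyclic rotation used for the Latin Square and the random shuffling of trials within a block are assumptions, since the text does not give the specific square or the trial ordering.

```python
import itertools
import random

NOISE_LEVELS = ("zero", "high")
CTOAS_MS = (0, 60, 120, 180, 240)
CONTRAST_INDICES = tuple(range(7))
TRIALS_PER_CONDITION = 4
CUE_TYPES = ("VC", "VP", "AP")


def block_trials(rng):
    """All trials of one block: 2 x 5 x 7 conditions, 4 trials each."""
    conditions = list(itertools.product(NOISE_LEVELS, CTOAS_MS,
                                        CONTRAST_INDICES))
    trials = conditions * TRIALS_PER_CONDITION
    rng.shuffle(trials)  # assumed: conditions are randomly intermixed
    return trials


def block_order(session_index):
    """Cue-type order for a session, rotating through a cyclic square."""
    shift = session_index % len(CUE_TYPES)
    return CUE_TYPES[shift:] + CUE_TYPES[:shift]


trials = block_trials(random.Random(1))
print(len(trials), [block_order(s) for s in range(3)])
```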
Averaged across external noise and signal contrast levels, target localization accuracy for VC, VP and AP was 98.4 ±0.2%, 98.1 ±0.3%, and 93.9 ±0.5% for CP, 99.0 ±0.2%, 98.5 ±0.2%, and 91.3 ±0.5% for HT, and 95.5 ±0.3%, 96.1 ±0.4%, and 86.5 ±0.6% for WC. Repeated measure analysis of variance (ANOVA) was performed on the data using session as the random factor. For all the observers, the difference in localization accuracy was significant between cue types (F(2,18)=32.42, 20.87, 38.16 for CP, HT and WC; all p < 0.0001), but not across CTOA's (F(2,36)=1.619, 0.510, 0.484; all p> 0.15), external noise levels (F(1,9)=1.607, 1.236, 0.033; all p>0.20), and Gabor contrasts (F(6,54)=1.209, 0.811, 0.871; all p>0.30).
Thirty psychometric functions, percent correct Gabor orientation identification as a function of Gabor contrast, were constructed for each observer, covering three types of cues, two external noise levels, and five CTOA's. Threshold contrasts were estimated from the psychometric functions by fitting Weibull functions to them using a maximum likelihood procedure (Hays, 1981). The Weibull function

$$\mathrm{PercentCorrect} = \mathit{max} - (\mathit{max} - 0.25) \times 2^{-(c/\alpha)^{\eta}}$$

provides a descriptive mathematical relationship between PercentCorrect and Gabor contrast c with three fitted parameters: maximum percent correct max, threshold contrast α at (max+0.25) ×50%, and slope η (Wichmann & Hill, 2001a). For each observer, psychometric functions at a given cue type and external noise level but different CTOAs were fit jointly by constraining the Weibull functions to have the same max and η (Lu, Lesmes & Dosher, 2002). The constraint did not significantly reduce the quality of the fits (p>0.10). The average r² was 95.2 ±1.0% for CP, 88.9 ±1.8% for HT, and 89.1 ±1.8% for WC.
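A minimal sketch of this analysis step follows, assuming the Weibull form PercentCorrect = max − (max − 0.25) × 2^(−(c/α)^η), which places the threshold α at (max+0.25) ×50% as stated above. The function names, the example parameter values, and the simulated counts are hypothetical; the log-likelihood omits the constant binomial coefficient, which does not affect the location of the maximum.

```python
import math

CHANCE = 0.25  # four-alternative forced identification


def weibull(c, p_max, alpha, eta):
    """Proportion correct at contrast c under the assumed Weibull form."""
    return p_max - (p_max - CHANCE) * 2.0 ** (-((c / alpha) ** eta))


def threshold_at(p_target, p_max, alpha, eta):
    """Contrast at which the fitted function reaches p_target."""
    if not CHANCE < p_target < p_max:
        raise ValueError("p_target must lie between chance and p_max")
    ratio = (p_max - p_target) / (p_max - CHANCE)
    return alpha * (-math.log2(ratio)) ** (1.0 / eta)


def log_likelihood(contrasts, n_correct, n_trials, p_max, alpha, eta):
    """Binomial log-likelihood of the counts, up to a constant."""
    total = 0.0
    for c, k, n in zip(contrasts, n_correct, n_trials):
        p = min(max(weibull(c, p_max, alpha, eta), 1e-9), 1.0 - 1e-9)
        total += k * math.log(p) + (n - k) * math.log(1.0 - p)
    return total


# Hypothetical fitted parameters, for illustration only.
P_MAX, ALPHA, ETA = 0.95, 0.2, 2.0
for criterion in (0.50, 0.625, 0.75):
    print(criterion, round(threshold_at(criterion, P_MAX, ALPHA, ETA), 4))
```

A maximum likelihood fit would search over max, α and η for the values that maximize `log_likelihood`, with max and η shared across CTOAs in the joint fits.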
Threshold contrasts at 50%, 62.5% and 75% correct were computed from the best fitting Weibull functions. The standard deviations of the thresholds were estimated using a re-sampling method (Maloney, 1990, Wichmann & Hill, 2001b). Figure 3 plots the thresholds (τ) at 62.5% correct (corresponding to d'= 1.24) as functions of CTOA (T) for each observer and their average in the three cuing conditions (VC, VP, and AP) in both zero and high external noise conditions.
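The correspondence between 62.5% correct and d' = 1.24 can be checked numerically. The sketch below assumes the standard equal-variance Gaussian model for an m-alternative forced choice, in which a response is correct when the sample from the signal alternative exceeds the samples from the other m − 1 alternatives; whether the authors used exactly this model is an assumption, and the function names are illustrative.

```python
import math


def normal_pdf(x):
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)


def normal_cdf(x):
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def percent_correct_mafc(d_prime, m=4, step=0.001, span=8.0):
    """P(correct) = integral of pdf(x - d') * cdf(x)**(m - 1) dx.

    Evaluated with the midpoint rule on [-span, span].
    """
    n_steps = int(round(2.0 * span / step))
    total = 0.0
    for i in range(n_steps):
        x = -span + (i + 0.5) * step
        total += normal_pdf(x - d_prime) * normal_cdf(x) ** (m - 1)
    return total * step


print(round(percent_correct_mafc(1.24), 3))
```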
Both visual central and peripheral cuing produced monotonically increasing pre-cuing advantages in the presence of high external noise (τ(240)/τ(0) = 76% and 68%). In the absence of external noise, however, only peripheral cuing produced significant pre-cuing advantages (τ(240)/τ(0) = 80%); contrast threshold remained virtually identical in the central cuing condition across the full range of cue-target lead intervals. In the auditory peripheral cuing condition, unlike either of the visual cuing conditions, the worst performance (the highest contrast threshold) was observed with cues leading the target by 60 ms rather than with simultaneous cues; that is, the simultaneous cues were more effective than significantly advanced precues that allow ample time for orienting attention. Relative to threshold contrasts at 60 ms lead interval, simultaneous cuing reduced contrast threshold by 33.1% and 34.8% in the zero and high noise conditions, respectively. In comparison, pre-cuing at 240 ms reduced contrast threshold by 13.4% and 32.1%. This pattern of non-monotonic cuing functions is consistent across all the observers. We attribute the simultaneous cuing advantage to multi-modal integration and the precuing advantage to cross-modal attention.
The shapes of the pre-cuing threshold functions τ(T) were fit by a descriptive function:
$$\tau(T) = \tau_b - \tau_{max}\, G(T, \alpha, \beta)$$
where τb is the baseline threshold, τmax is the asymptotic threshold reduction by cuing, and G(T, α, β) is a cumulative α-stage Gamma function with time constant β. In the AP condition, τ(0) was not included in the fitting procedure; τb is therefore an estimate of the threshold at T = 0 had multi-modal integration not occurred.
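The descriptive function can be sketched in Python as follows. This is a minimal sketch under an assumed form, τ(T) = τb - τmax × G(T, α, β), which is the simplest expression consistent with the definitions above. The cumulative Gamma function is evaluated with a standard power series so that a non-integer number of stages is allowed. The temporal parameters are those reported for the best-fitting model in the results that follow; the baseline threshold and all function names are hypothetical.

```python
import math


def gamma_cdf(t, stages, beta, max_terms=1000):
    """Cumulative gamma function with shape `stages` and time constant beta.

    Uses the power series of the regularized lower incomplete gamma
    function, so `stages` need not be an integer.
    """
    if t <= 0:
        return 0.0
    x = t / beta
    term = 1.0 / stages
    total = term
    for n in range(1, max_terms):
        term *= x / (stages + n)
        total += term
        if term < total * 1e-15:
            break
    log_prefactor = stages * math.log(x) - x - math.lgamma(stages)
    return min(1.0, total * math.exp(log_prefactor))


def cued_threshold(t, tau_b, tau_max, stages, beta):
    """Assumed form: tau(T) = tau_b - tau_max * G(T, stages, beta)."""
    return tau_b - tau_max * gamma_cdf(t, stages, beta)


# Reported temporal parameters; tau_b is a hypothetical baseline and
# 0.31 is the reported VP high-noise ratio of tau_max to tau_b.
STAGES, BETA = 12.5, 6.9
tau_b = 0.60
tau_max = 0.31 * tau_b
curve = [(t, cued_threshold(t, tau_b, tau_max, STAGES, BETA))
         for t in (0, 60, 120, 180, 240)]
```

With these temporal parameters the cumulative function is still below one half at a 60 ms lead and is essentially at asymptote by 240 ms.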
A model lattice was fit to the data, consisting of (1) the full model in which all 24 parameters for the six cue type × external noise conditions are independent, (2) a model with the same gamma function parameters (α and β) in all the conditions, (3) reduced models from (2) in which one of the τmax values is set to zero, and (4) the most reduced model in which τmax = 0 in all the conditions. The model fitting and comparison procedures are described in Appendix II.
The same functional shape, with similar temporal parameters, described all the cuing functions that exhibited significant cuing effects for all the observers (Table 2). Here we report the statistics on the average observer. Visual central cues produced no significant cuing effect (τmax = 0) in the noiseless condition (p > 0.75, F(11,14) = 0.4814 from the nested model test). Statistically significant cuing effects (τmax > 0) were produced in all the other experimental conditions (p < 0.001, F(7,15) = 12.99 from the nested model test), including AP, VP and VC in the high external noise condition and AP and VP in the zero external noise condition. For all the observers, although the magnitude of the cuing effects differed across cue types, the temporal properties of the cuing functions were the same (setting aside simultaneous cuing in auditory peripheral cuing) - they can be described by the Gamma function with the same parameter values (p > 0.50, F(6,16) = 0.4691 from the nested model test). For the best fitting model, α = 12.5 stages and β = 6.9 ms in all the cuing conditions; the asymptotic pre-cuing advantages (Δτmax/τb) are 0 and 0.23 for VC, 0.18 and 0.31 for VP, and 0.16 and 0.36 for AP, in the zero and high external noise conditions, respectively. In comparison, simultaneous AP reduced contrast thresholds by 0.34 and 0.38 relative to τb in the zero and high external noise conditions. We plot percent threshold reduction as a function of cue-target lead interval T in Figure 4.
Cross-modal auditory cuing improved performance (reduced thresholds) in both noiseless and high external noise display conditions. Excluding the simultaneous cuing condition, the cuing functions in the AP condition are quite similar to those in the VP condition. We exploited this similarity to define the “theoretical” baseline thresholds at zero cue-target delay in the absence of multi-modal integration and therefore to gauge the magnitude of the cuing effects. The estimated theoretical baseline in the AP condition is very similar to the threshold at CTOA=60 ms in the AP condition (Figure 4). Therefore, another way to interpret Figure 4 is that cuing benefits in the AP condition were all relative to the threshold at CTOA=60 ms. Similar to VP (Lu & Dosher, 2000), AP pre-cuing both enhances the stimulus and excludes external noise in the cuing location. On the other hand, in both zero and high external noise conditions, we found that simultaneous auditory peripheral cues were more effective than auditory peripheral cues that preceded the target stimulus with ample time for preparatory orienting. An alternative interpretation of the time course in the AP condition is that AP cues simply produced a masking effect in the CTOA=60 ms condition. However, interpreting the reduced thresholds at all the other CTOA's in the AP condition as reflecting a cuing advantage is more parsimonious and more consistent with the cuing literature and with the results in the VC and VP conditions in this study. To our knowledge, the non-monotonic cuing function for visual identification in the AP condition has not been reported previously. We discuss its implications in the “General Discussion” section.
That cuing reduced contrast thresholds in both the zero and high noise conditions can be accounted for by an attention mechanism consisting of a mixture of external noise exclusion and stimulus enhancement, a multiplicative noise reduction, or both. Even though the endpoint method used in this experiment can distinguish mechanism mixtures (Dosher & Lu, 2000a), we chose to investigate the mechanism of attention with more detailed measurements of the full TvC functions in Experiment 2.
In this experiment, we applied the external noise and perceptual template model approach to investigate and compare mechanisms of attention in intra- and cross-modal cuing of spatial attention. Three types of cues were used, visual central (VC), visual peripheral (VP), and auditory peripheral (AP). TvC functions at three performance criterion levels and three CTOA's were measured in six levels of external noise in each of the three cuing conditions. We identified the mechanisms of attention in intra- and cross-modal cuing based on the PTM framework.
The method was exactly the same as in Experiment 1 except where noted.
Subjects WC and HT continued after finishing Experiment 1. One new graduate student, JZ, was recruited for the experiment. He reported normal hearing and corrected-to-normal vision. Before data collection, JZ ran five practice sessions (7560 trials).
Gaussian distributions with six different standard deviations were used to construct external noise images: 0, 0.02, 0.04, 0.08, 0.16 and 0.32. Based on the results from Experiment 1, three cue-target onset asynchronies (CTOA), 0, 60, and 240 ms, were tested at every external noise level for each cue type.
Each experimental session consisted of three blocks, one for each type of cue. Within each block, trials for each of the 126 conditions ([three CTOA's]×[six external noise levels]× [seven contrast levels]) were intermixed. A total of 504 trials were run in each block: Four trials for each of the 126 conditions. The block orders were arranged using a Latin Square design across sessions. Each observer ran ten experimental sessions (15,120 trials).
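The trial counts in the design can be checked with a few lines of arithmetic. All quantities below are taken from the description above; the variable names are hypothetical.

```python
# Design of one block: CTOA x external noise level x Gabor contrast.
n_ctoa, n_noise, n_contrast = 3, 6, 7
trials_per_condition = 4
blocks_per_session = 3      # one block per cue type (VC, VP, AP)
sessions = 10

conditions_per_block = n_ctoa * n_noise * n_contrast
trials_per_block = conditions_per_block * trials_per_condition
trials_per_session = trials_per_block * blocks_per_session
trials_per_observer = trials_per_session * sessions

# Trials behind one contrast level of one psychometric function,
# before any trial is excluded from the analysis.
trials_per_point = trials_per_condition * sessions

# Five practice sessions, as run by the new observer.
practice_trials = 5 * trials_per_session
```

The same session size also accounts for the 7560 practice trials reported for observer JZ.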
Similar to Experiment 1, observers made two responses in each trial: first to identify the orientation of the Gabor at the target location, and then to identify the target location. A correct response was defined as correct identification of the orientation of the Gabor given correct target location identification.
Target localization accuracy in VC, VP, and AP was 99.1 ± 0.1%, 98.7 ± 0.2%, and 89.5 ± 0.4% for HT; 97.8 ± 0.2%, 97.8 ± 0.2%, and 85.2 ± 0.5% for JZ; and 97.4 ± 0.2%, 96.5 ± 0.3%, and 84.2 ± 0.5% for WC. A repeated-measures analysis of variance (ANOVA) was performed on the data using session as the random factor. For all the observers, the difference in localization accuracy was significant between cue types (F(2,18) = 138.85, 54.74, 41.84 for HT, JZ, and WC; all p < 0.0001), but not across CTOA's (F(2,18) = 0.9350, 2.178, 0.4960; all p > 0.10) or Gabor contrasts (F(6,54) = 0.0730, 0.9150, 0.6000; all p > 0.45). For HT and WC, localization accuracy was not significantly different across external noise levels (F(5,45) = 1.480, 0.2879 for HT and WC; p > 0.20). For JZ, the difference in localization accuracy was significant across external noise levels (F(5,45) = 3.142, p < 0.025). However, it represented a small, non-systematic difference between the best localization accuracy of 94.8% at the highest external noise level (noise contrast standard deviation = 0.32) and the worst accuracy of 92.3% at the second external noise level (noise contrast standard deviation = 0.02).
Only trials with correct target localization were included in generating psychometric functions (percent correct Gabor orientation identification versus Gabor contrast). For each observer, a total of 54 psychometric functions were obtained, covering three types of cues (VC, VP and AP), three cue-target onset asynchronies, and six external noise levels. Each psychometric function was sampled at seven Gabor contrast levels.
Following procedures similar to those of Experiment 1, psychometric functions at different CTOAs were fit jointly by constraining the Weibull functions (Eq. 2) to have the same max and η for each observer in each cue type and external noise condition. The constraint did not significantly reduce the quality of the fits (all p > 0.15). The average r² was 91.0 ± 2.0% for HT, 87.5 ± 1.3% for JZ, and 92.6 ± 0.9% for WC.
For each observer, cue type, CTOA and external noise level, threshold contrasts at three performance levels, 50%, 62.5% and 75% (corresponding to d' values of 0.84, 1.24 and 1.68 in four-alternative forced identification), were computed from the best Weibull fits. A re-sampling method was used to calculate the standard deviation for each estimated threshold.
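The correspondence between percent correct and d' in a four-alternative task can be illustrated numerically. The sketch below assumes the standard equal-variance Gaussian model for m-alternative forced choice, in which a response is correct when the target sample exceeds all m - 1 distractor samples; whether this is exactly the conversion used for the values above is an assumption, and the function names are hypothetical.

```python
import math


def normal_pdf(x):
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)


def normal_cdf(x):
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def percent_correct_mafc(d_prime, m=4, lo=-8.0, hi=8.0, steps=4000):
    """P(correct) for an m-alternative task under the equal-variance
    Gaussian model: the integral of pdf(x - d') * cdf(x)**(m - 1) dx,
    evaluated with the trapezoidal rule."""
    h = (hi - lo) / steps
    total = 0.0
    for i in range(steps + 1):
        x = lo + i * h
        weight = 0.5 if i in (0, steps) else 1.0
        total += weight * normal_pdf(x - d_prime) * normal_cdf(x) ** (m - 1)
    return total * h


# Proportion correct at the three d' values quoted in the text.
pc_by_dprime = {d: percent_correct_mafc(d) for d in (0.84, 1.24, 1.68)}
```

Under this model a d' of zero gives chance performance (25%), and the three quoted d' values land close to the 50%, 62.5% and 75% criteria.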
Thresholds at 62.5% correct are listed in Table 3 and plotted in Figure 5 for each observer and the average of all the observers as TvC functions. Each panel of Figure 5 represents one observer and one cue type, containing three TvC functions from the three CTOA's. The error bars denote ± one standard deviation of the estimated thresholds. The smooth curves represent the predictions of the best-fitting PTM models (see below). The data patterns are very similar in TvC's at 50% and 75% correct threshold levels.
The external noise manipulation was highly effective. Averaged across cue types and CTOA's, contrast threshold increased from 0.18 to 0.58 (+214%) for HT, from 0.21 to 0.58 for JZ (+175%), from 0.18 to 0.51 (+184%) for WC, and from 0.19 to 0.56 (+192%) for the average of three observers (AVG).
In the VC condition, pre-cuing at CTOA=240 ms only reduced contrast threshold in the presence of high external noise, but not in zero or low external noise. Because the thresholds at CTOA=0 and 60 ms are virtually the same, we only report comparisons between thresholds at CTOA =0 and 240 ms. Averaged across the lowest three external noise conditions, contrast thresholds are 0.178, 0.212, 0.181, and 0.190 for observers HT, JZ, WC, and AVG when CTOA=0 ms; and 0.180 (+1.5% from CTOA=0 ms), 0.215 (+1.4%), 0.183 (+1.3%), and 0.193 (+1.4%) for observers HT, JZ, WC, and AVG when CTOA=240 ms. In the highest external noise condition, contrast thresholds are 0.624, 0.591, 0.543, and 0.586 for HT, JZ, WC, and AVG when CTOA=0 ms; and 0.426 (-31.7%), 0.471 (-20.3%), 0.415 (-23.7%), and 0.437 (-25.4%) for these observers when CTOA=240 ms. We conclude that visual central pre-cuing at CTOA=240 ms, compared to CTOA=0 ms, reduced contrast threshold by 25.4% in the presence of high external noise, but left contrast threshold virtually unchanged in the low external noise conditions. Consistent with our previous results in temporal pre-cuing that compared CTOA=0 and 175 ms (Lu and Dosher, 2000) and cue validity manipulation with pre-cuing at CTOA=175 ms (Dosher and Lu, 2000), this data pattern suggests a pure external noise exclusion mechanism of attention.
In the VP condition, pre-cuing at CTOA=240 ms reduced contrast threshold in the presence of high as well as low (or zero) external noise. Again, because contrast thresholds at CTOA=0 and 60 ms are quite similar, we only report comparisons between thresholds at CTOA=0 and 240 ms. Averaged across the lowest three external noise conditions, contrast thresholds are 0.218, 0.230, 0.205, and 0.218 for observers HT, JZ, WC, and AVG when CTOA=0 ms; 0.185 (-14.9%), 0.216 (-6.1%), 0.190 (-7.3%), and 0.197 (-9.4%) for HT, JZ, WC and AVG when CTOA=240 ms. In the highest external noise condition, contrast thresholds are 0.902, 0.674, 0.616, and 0.731 for HT, JZ, WC, and AVG when CTOA=0 ms; 0.423 (-53.1%), 0.572 (-15.2%), 0.446 (-27.7%), and 0.480 (-34.3%) for HT, JZ, WC and AVG when CTOA=240 ms. We conclude that visual peripheral pre-cuing at CTOA=240 ms, compared to simultaneous cuing, reduced contrast threshold by 34.3% in the highest external noise condition, and 9.4% in the low external noise conditions. This data pattern is consistent with Lu and Dosher (2000), who compared temporal pre-cuing effects of visual peripheral cuing with CTOA's of 0 and 175 ms.
In the AP condition, the cuing effect is not a monotonic function of CTOA: A CTOA of 60 ms produced the highest contrast threshold across all the external noise levels (Figure 5), compared to CTOA's of 0 ms and 240 ms. This is consistent with the results from Experiment 1. Averaged across the lowest three external noise conditions, contrast thresholds were 0.194, 0.209, 0.192, and 0.199 for observers HT, JZ, WC, and AVG when CTOA=60 ms; 0.121 (-37.9% compared to thresholds at CTOA=60 ms), 0.128 (-38.8%), 0.130 (-31.2%), and 0.127 (-36.1%) for HT, JZ, WC, and AVG when CTOA=0 ms; and 0.158 (-18.5%), 0.179 (-14.5%), 0.177 (-7.7%), and 0.172 (-13.6%) for these observers when CTOA=240 ms. At the highest external noise level, contrast thresholds were 0.602, 0.730, 0.588, and 0.640 for observers HT, JZ, WC, and AVG when CTOA=60 ms; 0.383 (-36.3%), 0.404 (-45.7%), 0.475 (-19.3%), and 0.421 (-34.3%) for HT, JZ, WC and AVG when CTOA=0 ms; and 0.394 (-34.6%), 0.482 (-33.9%), 0.483 (-17.9%), and 0.453 (-29.2%) for these observers when CTOA=240 ms. Several features in the data are worth noting: (1) the cuing effect is non-monotonic; (2) in low external noise, cuing at CTOA=0 ms is much more effective than cuing at CTOA=240 ms; (3) in high external noise, cuing at CTOA=0 ms is slightly more effective than cuing at CTOA=240 ms. Consistent with Experiment 1, we found that the simultaneous auditory peripheral cue (CTOA=0 ms) is very effective.
Effective cuing only in the presence of high external noise in the VC condition suggests a pure external noise exclusion mechanism of attention. Effective cuing in both low and high noise in VP and AP suggests a mechanism of attention that consists of either a mixture of stimulus enhancement and external noise exclusion, or multiplicative internal noise reduction, or both.
To further quantify the pre-cuing effects and to identify the mechanisms of attention underlying the cuing effects, we fit the PTM with various combinations of attention mechanisms to the TvC functions. TvC functions for the three types of cues were fit separately for each observer and their average. A description of the model fitting and selecting procedure as well as the detailed statistical results is in Appendix III. The parameters of the best fitting models are listed in Table 4. The predictions of the best fitting models for all the observers are plotted as smooth TvC functions in Figure 5 along with the measured TvC functions. We only present a summary of the major statistical results in this section.
In visual central cuing, the best fitting PTM includes a pure mechanism of external noise exclusion at CTOA=240 ms for all observers and their average. For HT and AVG, a slight increase of the impact of external noise at CTOA=60 ms was included in the best fitting model (marginal for AVG). On average, visual central cuing at CTOA=240 ms excluded 23.9% of the external noise, compared to cuing at CTOA=0 ms, but it altered neither additive nor multiplicative noise. Visual central cuing at CTOA=60 ms was essentially the same as cuing at CTOA=0 ms; if anything, it increased external noise by 7.5%, reflecting the fact that contrast thresholds in high external noise conditions at CTOA=60 ms were slightly higher than those at CTOA=0 ms for one of the observers.
In visual peripheral cuing, the best fitting PTM includes a mixture of stimulus enhancement and external noise exclusion at CTOA=240 ms for all observers and their average. At CTOA=60 ms, no attention mechanism was significant for JZ and WC; a pure external noise exclusion mechanism was effective for HT and AVG. On average, visual peripheral cuing at CTOA=240 ms enhanced the stimulus by 41.5% (or equivalently, reduced additive internal noise by 29.4%) and excluded 30.4% of the external noise, compared to cuing at CTOA=0 ms, but it did not alter multiplicative internal noise. Visual peripheral cuing at CTOA=60 ms reduced external noise by 10.7% without changing either additive or multiplicative noise.
In auditory peripheral cuing, the best fitting PTM includes a mixture of stimulus enhancement and external noise exclusion at CTOA=0 ms, compared to CTOA=60 ms, for all observers and their average. For HT, JZ and AVG, the best fitting PTM includes the same type of mechanism mixture at CTOA=240 ms, compared to CTOA=60 ms. For WC, the best fitting PTM at CTOA=240 ms has only a pure external noise exclusion mechanism. On average, auditory peripheral cuing at CTOA=240 ms enhanced the stimulus by 48.0% (or equivalently, reduced additive internal noise by 32.4%) and excluded 30.1% of the external noise, compared to cuing at CTOA=60 ms. Auditory peripheral cuing at CTOA=0 ms enhanced the stimulus by 218% (or equivalently, reduced additive internal noise by 68.6%) and excluded 36.1% of the external noise, compared to cuing at CTOA=60 ms. In no case was there evidence for a change in multiplicative noise. Cuing at CTOA=0 ms is more effective than cuing at CTOA=240 ms, a replication of the “hyper-effective” cuing of simultaneous auditory peripheral cues.
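The paired figures above (stimulus enhancement and the equivalent reduction of additive internal noise) are related by a simple reciprocal. The sketch below assumes that an enhancement of e percent corresponds to multiplying the additive noise by Aa = 1 / (1 + e/100); this assumption reproduces the reported pairs to within rounding. The function name and dictionary keys are hypothetical.

```python
def additive_noise_reduction(enhancement_pct):
    """Percent reduction in additive internal noise equivalent to
    amplifying the stimulus by enhancement_pct percent, assuming
    A_a = 1 / (1 + enhancement)."""
    a_a = 1.0 / (1.0 + enhancement_pct / 100.0)
    return (1.0 - a_a) * 100.0


# Average-observer enhancement values reported in the text (percent).
reported = {"VP, 240 ms": 41.5, "AP, 240 ms": 48.0, "AP, 0 ms": 218.0}
equivalents = {cond: round(additive_noise_reduction(pct), 1)
               for cond, pct in reported.items()}
```

For the two AP conditions the computed values match the reported 32.4% and 68.6%; for VP the computed value differs from the reported 29.4% by 0.1, presumably because the reported enhancement is itself rounded.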
Consistent with previous studies with shorter CTOA's (Lu & Dosher, 2000), we identified a pure external noise exclusion attention mechanism for visual central pre-cuing, and a mixture of stimulus enhancement and external noise exclusion for visual peripheral pre-cuing. For auditory peripheral cuing, we identified a mixture of stimulus enhancement and external noise exclusion for both simultaneous and pre-cuing.
In Experiment 1, we examined the time courses of visual central, visual peripheral, and auditory peripheral cuing of visual spatial attention. Contrast thresholds were measured as a function of cue-target onset asynchrony (CTOA) in both zero and high noise conditions. The central finding is that the effect of cross-modal cuing is not a monotonic function of the cue-target delay and simultaneous auditory peripheral cuing led to better performance than auditory peripheral pre-cues. The second major finding is that the time courses of pre-cuing effects are independent of the types of the cues - excluding the simultaneous AP condition, the time courses of the three cue types were fit by descriptive mathematical functions with identical temporal dynamics but different asymptotic levels. In Experiment 2, we measured cuing effects for the three cue types in a full range of external noise conditions to identify the mechanisms of attention. We found that endogenous attention (visual central cuing) excludes external noise, while exogenous attention (visual and auditory peripheral cuing) enhances the stimulus in addition to excluding external noise if informative peripheral cues are used. It is possible that external noise exclusion follows from an endogenous consequence of informative peripheral cuing.
By using 100% valid cues across all cue-target delays, we eliminated uncertainty about the target location in the decision process. Even for an ideal observer without capacity limitation (Palmer et al., 1993, Solomon, 2002), the presence of such uncertainty would have caused performance losses because information from non-target regions may participate in the decision process (Shaw, 1984, Sperling & Dosher, 1986). We note that removing decision uncertainty about the location of the target in a task does not imply that all possible sources of uncertainty are removed. Uncertainty might still exist in the feature space and in the temporal domain. However, uncertainty in the feature space, if it occurs, is most likely the same across all the cuing conditions. In the temporal domain, simultaneous cues (CTOA=0 ms) in the VC and VP conditions did not lead to improved performance because pre-cuing advantages were found in both conditions. It would be difficult to conceive a theory in which the simultaneous cross-modal AP cue is more accurate in time than the simultaneous cues in the intra-modal VC and VP conditions. The observed cuing advantages are beyond reduction of temporal uncertainty. When decision uncertainty is removed, additional effects of attention on performance reflect functional resource limitations in the perceptual system.
The relatively detailed measurement of the cuing functions also enabled us to quantitatively characterize and compare the temporal properties of the three types of cues. We concluded that the same function with different asymptotic levels describes all the cuing functions in both the zero and high noise conditions across the three types of cues. This suggests that similar temporal processes underlie the observed cuing effects. Using a completely different paradigm, Lesmes et al. (2002) also found an identical time course for visual central and peripheral cues. This conclusion is consistent with some of the studies in the literature (Eriksen & Collins, 1969, Lyon, 1987, Warner et al., 1990), but at odds with others (Cheal & Lyon, 1991a, Cheal & Lyon, 1991b, Mueller & Findlay, 1988, Mueller & Rabbitt, 1989). As discussed in Section 1.4, many experimental factors could affect cuing functions, e.g., the stimulus and cuing parameters, the task and the overall difficulty level of the task in different cuing conditions, and the practice level of the observers. In this study, we measured threshold contrast for target identification as a function of cue-target delay using identical stimulus parameters, tasks, and observers for the three types of cues. We kept task difficulty constant by measuring performance at given threshold levels. In addition, the observers were highly practiced. The generalizability of our conclusion awaits further experimentation.
The observed effects of intra-modal visual cuing, i.e., visual peripheral pre-cuing improved performance in both zero and high external noise while visual central cuing improved performance only in high external noise, are fully consistent with the results of previous observations obtained in a narrower CTOA range (Dosher & Lu, 2000c, Lu & Dosher, 2000, Lu et al., 2002). Whereas the previous observations only compared two CTOA's (175 versus -33 ms), the current study measured the full time course of pre-cuing advantage in a wider range (0 to 240 ms). The detailed sampling of these monotonic CTOA effects in visual cuing might not have seemed to be so necessary or interesting before we obtained detailed measurements of the cuing function with auditory peripheral cues, where the cuing effect was non-monotonic. In fact, without measurements of the full time course, we would have never discovered the non-monotonic cuing effect in the auditory peripheral cuing condition.
Although the improved performance in the simultaneous AP condition might be related to prior reports of multi-modal integration in perceptual strength, target detection and localization (Calvert et al., 1998, Frassinetti et al., 2002, Stein et al., 1996, Vroomen & De Gelder, 2000, Welch & Warren, 1986) in which all the relevant modalities contained task-related information, the auditory peripheral cues in the current experiment provided information only about target location but not target identity. The observed simultaneous auditory peripheral cuing advantage implies an improved representation of the features of the visual target in the simultaneous cuing condition, perhaps from multi-modal integration that is more than a simple visual-auditory summation, which could improve detection but might even have damaged identification. The observed simultaneous cuing advantage in the AP condition is unrelated to that of Lippert, Logothetis & Kayser (2007), who found that simultaneous sounds informative of the precise timing of the target improved visual target detection. Because the simultaneous condition was always mixed with other CTOA conditions in our study, the simultaneous AP cue was non-informative of the precise timing of the visual target in the sense of Lippert et al. (2007) and should therefore not generate any cross-modal advantage by reducing decision uncertainty.
The combined results from the three types of cues suggest two separate processes in cuing of visual spatial attention: an orienting or alerting process that increases its effectiveness with additional cue-target lead interval with identical temporal dynamics but different magnitudes for the three cue types, and a multi-modal integration process that provides the visual system with privileged access to simultaneous targets based on cross-modal interaction (Macaluso et al., 2001, McDonald et al., 2001). The relatively large simultaneous cuing advantage may have significant implications for practical applications. For example, in interface design, auditory peripheral cuing might be more effective than visual cuing. Or perhaps a combined visual and auditory cue will optimize performance. Further research should assess and replicate these two attention processes.
In terms of mechanisms of attention, our results on intra-modal cuing reinforce those of Lu and Dosher (2000). In that study, they compared cuing at CTOAs of -33 ms and 150 ms for visual central and visual peripheral cuing across different observers. Here, we used a within-subject design with detailed sampling of a wider range of CTOA's. Our results on auditory peripheral cuing are novel. What seems to be most interesting is the fact that both types of peripheral cues, visual peripheral and auditory peripheral, generated a pre-cuing advantage via a mechanism of stimulus enhancement, in addition to external noise exclusion, which is the only mechanism visual central cues can invoke in generating a pre-cuing advantage. Here again, we speculate that external noise exclusion is a consequence of endogenous mechanisms even in exogenous peripheral cuing. As discussed earlier, central and peripheral cues have been associated with two separate attention systems: the endogenous and the exogenous (Briand & Klein, 1987; Posner, 1980; Posner & Cohen, 1984). The current study extends the mechanism difference between the endogenous and exogenous attention systems from intra-modal (Lu & Dosher, 2000) to cross-modal situations.
The research was supported by US Air Force Office of Scientific Research, Visual Information Processing Program, and National Institute of Mental Health. WC and CP were supported through the Provost Undergraduate Research Fellowship at the University of Southern California.
Both the external noise plus attention paradigm and the PTM theoretical framework were initially introduced in Lu & Dosher (1998). An in-depth review of the external noise methods and observer models can be found in Lu & Dosher (2008). We briefly summarize the major mathematical results here.
The PTM consists of five components: (1) a perceptual template (e.g., a spatial frequency filter) with a contrast gain to the signal (β) (normalized relative to its gain to the external noise), (2) a nonlinear transducer function, which raises its input to the γ power, (3) a Gaussian-distributed internal multiplicative noise with mean 0 and standard deviation that is proportional to (Nm ×) the contrast energy in the input stimulus, (4) a Gaussian-distributed additive internal noise with mean 0 and a “constant” standard deviation Na, and (5) a decision process.
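In a PTM, accuracy of perceptual task performance is indexed by d':
$$d' = \frac{(\beta c)^{\gamma}}{\sqrt{N_{ext}^{2\gamma} + N_m^{2}\left[(\beta c)^{2\gamma} + N_{ext}^{2\gamma}\right] + N_a^{2}}}$$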
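For a given performance level, d', we can solve Eq. 1 to express threshold contrast cτ as a function of Next in log form:
$$\log c_\tau = \frac{1}{2\gamma}\log\left[(1 + N_m^{2})N_{ext}^{2\gamma} + N_a^{2}\right] - \frac{1}{2\gamma}\log\left(\frac{1}{d'^{2}} - N_m^{2}\right) - \log\beta$$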
For an observer described by the PTM, attention could only improve its performance via one or a combination of three mechanisms: (1) Stimulus enhancement turns up the gain of the perceptual template to the input (both the signal and the external noise) in the attended condition, modeled by multiplying Na by a factor of Aa < 1.0 in the attended condition; (2) External noise exclusion eliminates some of the external noise by tuning the perceptual template around the signal-valued stimulus in the attended condition, modeled by multiplying Next by a factor of Af < 1.0 in the attended condition; and (3) Internal multiplicative noise reduction changes the contrast gain control properties of the perceptual system, modeled by multiplying Nm by a factor Am < 1.0 in the attended condition. If all three mechanisms are operative, the contrast threshold versus external noise function for a PTM becomes:
$$\log c_\tau = \frac{1}{2\gamma}\log\left[(1 + A_m^{2}N_m^{2})(A_f N_{ext})^{2\gamma} + (A_a N_a)^{2}\right] - \frac{1}{2\gamma}\log\left(\frac{1}{d'^{2}} - A_m^{2}N_m^{2}\right) - \log\beta$$
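The signatures of the three mechanisms can be illustrated with a short numerical sketch. It assumes the standard published form of the PTM, d' = (βc)^γ / sqrt(Next^2γ + Nm²[(βc)^2γ + Next^2γ] + Na²), with the attention factors applied as described above (Aa scales Na, Af scales Next, Am scales Nm). All parameter values and function names are hypothetical and are not the fitted values of this study; only the external noise levels are those of Experiment 2.

```python
import math


def ptm_dprime(c, n_ext, beta, gamma, n_mul, n_add,
               a_a=1.0, a_f=1.0, a_m=1.0):
    """d' of a PTM observer at signal contrast c (assumed standard form)."""
    signal = (beta * c) ** (2 * gamma)
    ext = (a_f * n_ext) ** (2 * gamma)
    variance = ext + (a_m * n_mul) ** 2 * (signal + ext) + (a_a * n_add) ** 2
    return math.sqrt(signal / variance)


def ptm_threshold(d_prime, n_ext, beta, gamma, n_mul, n_add,
                  a_a=1.0, a_f=1.0, a_m=1.0):
    """Contrast threshold at criterion d' (one point of a TvC function)."""
    m2 = (a_m * n_mul) ** 2
    headroom = 1.0 / d_prime ** 2 - m2
    if headroom <= 0:
        raise ValueError("criterion d' is unreachable with this multiplicative noise")
    ext = (a_f * n_ext) ** (2 * gamma)
    numerator = (1.0 + m2) * ext + (a_a * n_add) ** 2
    return (numerator / headroom) ** (1.0 / (2 * gamma)) / beta


# Hypothetical observer parameters, for illustration only.
base = dict(beta=1.5, gamma=2.0, n_mul=0.2, n_add=0.005)
noise_levels = [0.0, 0.02, 0.04, 0.08, 0.16, 0.32]

tvc_baseline = [ptm_threshold(1.24, n, **base) for n in noise_levels]
tvc_excluded = [ptm_threshold(1.24, n, a_f=0.7, **base) for n in noise_levels]
tvc_enhanced = [ptm_threshold(1.24, n, a_a=0.7, **base) for n in noise_levels]
```

In this sketch, external noise exclusion (Af < 1) leaves the zero-noise threshold unchanged and lowers thresholds in high external noise, whereas stimulus enhancement (Aa < 1) lowers thresholds in zero noise and has almost no effect in high external noise.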
In reality, mixtures of mechanisms may occur. Of particular concern is that a mixture of stimulus enhancement and external noise exclusion may generate a performance pattern that looks very similar to that for internal multiplicative noise reduction. Mixtures of mechanisms can be distinguished by measuring TvC functions at more than two criterion performance levels (see Dosher & Lu, 1999, Lu & Dosher, 2008).
The least-squares fitting procedure was implemented in Matlab 5.2. For a given set of model parameters, model predictions were generated from the corresponding model equation. The squared difference between the predicted and the measured log thresholds was computed in each condition and summed over all the conditions:
$$L = \sum \left[\log(c_{predicted}) - \log(c_{measured})\right]^{2}$$
The log approximately equates the standard error over large ranges in contrast thresholds, corresponding to weighted least squares, an equivalent to the maximum likelihood solution for continuous data. A gradient descent method adjusted the model parameters to find the minimum L.
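After obtaining the minimum L, the r² statistic was used to evaluate the goodness of the model fit:
$$r^{2} = 1 - \frac{\sum \left[\log(c_{predicted}) - \log(c_{measured})\right]^{2}}{\sum \left[\log(c_{measured}) - \mathrm{mean}\left(\log(c_{measured})\right)\right]^{2}} \qquad \text{(B1)}$$
where Σ and mean() run over all the thresholds for a particular observer.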
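The objective function and the goodness-of-fit statistic can be sketched in a few lines of Python. The sketch assumes that L is the summed squared difference between predicted and measured log thresholds, as described above, and that r² is one minus the ratio of L to the total variation of the measured log thresholds about their mean. The function names and the example values are hypothetical.

```python
import math


def log_sse(predicted, measured):
    """L: summed squared difference between log thresholds."""
    return sum((math.log(p) - math.log(m)) ** 2
               for p, m in zip(predicted, measured))


def r_squared(predicted, measured):
    """r^2 = 1 - L / (total variation of the measured log thresholds)."""
    logs = [math.log(m) for m in measured]
    mean_log = sum(logs) / len(logs)
    total = sum((v - mean_log) ** 2 for v in logs)
    return 1.0 - log_sse(predicted, measured) / total


# Hypothetical measured and predicted thresholds, for illustration only.
measured = [0.18, 0.19, 0.21, 0.27, 0.39, 0.58]
predicted = [0.17, 0.19, 0.22, 0.28, 0.38, 0.60]
fit_quality = r_squared(predicted, measured)
```

Because the differences are taken on log thresholds, a given proportional error contributes equally whether it occurs at a low or a high threshold.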
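An F-test for nested models was used to statistically compare the models. For two nested models with kfull and kreduced parameters, the F statistic is defined as:
$$F(df_1, df_2) = \frac{\left(r^{2}_{full} - r^{2}_{reduced}\right)/df_1}{\left(1 - r^{2}_{full}\right)/df_2} \qquad \text{(B2)}$$
where df1 = kfull - kreduced, and df2 = N - kfull; N is the number of predicted data points.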
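The nested-model comparison can be sketched as follows, assuming the usual r²-based form of the statistic, F = [(r²full - r²reduced)/df1] / [(1 - r²full)/df2], with the degrees of freedom defined above. The function name and the example inputs are hypothetical.

```python
def nested_f(r2_full, r2_reduced, k_full, k_reduced, n_points):
    """F statistic and degrees of freedom for comparing nested models."""
    df1 = k_full - k_reduced
    df2 = n_points - k_full
    if df1 <= 0 or df2 <= 0:
        raise ValueError("need k_reduced < k_full < n_points")
    f = ((r2_full - r2_reduced) / df1) / ((1.0 - r2_full) / df2)
    return f, df1, df2


# Hypothetical example: a 10-parameter model against a 5-parameter
# special case, both fit to 54 thresholds.
f_value, df1, df2 = nested_f(0.95, 0.90, k_full=10, k_reduced=5, n_points=54)
```

Converting F to a p value requires the F distribution, which the Python standard library does not provide; a statistics package such as SciPy (`scipy.stats.f.sf(f_value, df1, df2)`) could be used for that step.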
Within the PTM theoretical framework, each cue type and CTOA is governed by three parameters, Aa, Af, and Am (Eq. A2), corresponding to three attention mechanisms: stimulus enhancement (or equivalently, internal additive noise reduction), external noise exclusion, and multiplicative noise reduction. Given that we had three CTOA's for each type of cuing, nine potential attention parameters could be required to model the TvC functions for each cue type. Among these nine attention parameters, one set of three parameters associated with a baseline CTOA condition were all set to 1.0 [the baseline was set at CTOA = 0 ms for VC and VP, 60 ms for AP]. There were therefore six potential free-to-vary attention parameters, and a total of 64 (2^6) models with combinations of various numbers (0 to 6) of attention parameters. Models are characterized by the particular set of free-to-vary attention parameters and their corresponding attention mechanisms. If an attention parameter was not “free” to vary in a model, the parameter was set to 1.0, implying that the PTM was not modified by the corresponding attention mechanism.
The full set of 64 models was fit to the nine TvC functions (three CTOA's × three performance levels) measured with each cue type for each observer and their average. This resulted in a total of 768 model fits. For each model, the best fitting parameters minimized the least-squares difference between the log of the measured threshold contrast and the log of the predicted threshold contrast using a gradient descent method. This procedure was implemented in Matlab 5.2. The goodness-of-fit was gauged by r² (Eq. B1).
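The size of the model lattice follows from simple counting, which can be made explicit in Python. Each model is represented here as the set of attention parameters that are free to vary; the representation and the names are hypothetical, while the counts are those stated above.

```python
from itertools import combinations


def attention_models(mechanisms=("Aa", "Af", "Am"), free_ctoas=(60, 240)):
    """Every subset of the free-to-vary attention parameters.

    free_ctoas lists the two non-baseline CTOAs: (60, 240) when the
    baseline is CTOA = 0 ms (VC, VP), or (0, 240) when it is 60 ms (AP).
    """
    params = [(m, t) for t in free_ctoas for m in mechanisms]
    models = []
    for k in range(len(params) + 1):
        models.extend(frozenset(c) for c in combinations(params, k))
    return models


models = attention_models()
n_cue_types = 3
n_datasets = 4              # three observers plus their average
n_fits = len(models) * n_cue_types * n_datasets
```

The empty set corresponds to the model with no attention effect, and the full set of six parameters to the most saturated model.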
Different variants of the PTM were compared using an F-test for nested models. A nested model is one that is a special case of another model in which one or more parameters are equated or fixed. In the most saturated model, all the attention parameters may differ in different CTOA conditions. Nested special cases are those in which, for example, the stimulus enhancement parameter Aa is set to 1.0 in the CTOA=240 ms condition, and so on. A (nested) reduced model, which is a special case of a fuller model, is tested for significance using the nested-F test (Eq. B2). The F-statistic compares the mean squared error per added parameter in the fuller model to the mean squared error per remaining degree of freedom after prediction by the fuller model. In other words, the test evaluates whether the variance accounted for by the added parameters in the fuller model is significantly larger than expected by chance. If so, then the added parameter in the fuller model expresses a significant difference between conditions. A lattice of such nested models may be used to identify the model that best accounts for the data with the fewest (significantly different) parameters.
The following procedure was used to select the best model for a given set of TvC functions from a cue type and an observer: (1) Test whether any single attention parameter can be dropped from the most saturated model (the one in which all six attention parameters are free to vary) by comparing each of the five-attention-parameter models with the most saturated model. If a model without a particular attention parameter cannot be rejected relative to the most saturated model (p>0.25), that attention parameter might be excluded from the best fitting model. This step might suggest the exclusion of multiple attention parameters. We defined the model that excludes the union of all the suggested exclusions as the candidate model. (2) Test whether the candidate model can be rejected relative to the most saturated model (criterion for non-rejection: p>0.15). If the candidate model cannot be rejected, then it is sufficient to account for the data. Had a candidate model been rejected, the testing situation could have become rather complicated; in practice, none of the candidate models was rejected relative to the most saturated model. (3) Test whether the candidate model is the most reduced model, i.e., whether all the free-to-vary parameters in the candidate model are necessary to account for the data, by comparing the candidate model with all its reduced models. (4) If the candidate model rejects all its reduced models (p<0.01), we concluded that it is the best fitting model; if a reduced model cannot be rejected relative to the candidate model, we designated the reduced model as the new candidate model and repeated (2) and (3).
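Step (1) of the procedure can be sketched as follows: a parameter stays in the candidate model only if dropping it from the most saturated model produces a fit that can be rejected, i.e., p is not greater than 0.25. The function name, the parameter labels, and the p-values below are hypothetical and serve only to show the logic.

```python
def candidate_model(all_params, p_when_dropped, exclude_above=0.25):
    """Return the free parameters of the candidate model.

    p_when_dropped maps each parameter to the p-value of the nested F-test
    comparing the five-parameter model lacking it with the saturated model.
    A parameter is excluded when that p-value exceeds exclude_above.
    """
    return tuple(p for p in all_params if p_when_dropped[p] <= exclude_above)


# Hypothetical p-values for illustration only; "c1" stands for the second
# non-baseline CTOA.
all_params = ("Aa(240)", "Af(240)", "Am(240)", "Aa(c1)", "Af(c1)", "Am(c1)")
p_when_dropped = {
    "Aa(240)": 0.60, "Af(240)": 0.001, "Am(240)": 0.85,
    "Aa(c1)": 0.70, "Af(c1)": 0.40, "Am(c1)": 0.95,
}
print(candidate_model(all_params, p_when_dropped))
```

With these made-up values only the external noise exclusion parameter at CTOA=240 ms survives, so the candidate model would have a single attention parameter; steps (2) to (4) would then test it against the saturated model and against its own reduced models.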
For observers JZ and WC, the candidate best fitting model has only one attention parameter, Af(240). This model provides a fit that is statistically equivalent to the most saturated model (F(5,44)=0.1189, and 0.2310 for JZ and WC; both p>0.90). The reduced model, which has no attention parameters, can be easily rejected (F(1,48)=18.82, and 54.07; both p<0.0001). We conclude that the best model for JZ and WC is the model with a single Af(240) attention parameter, corresponding to a pure mechanism of external noise exclusion at CTOA=240 ms; cuing at CTOA=60 ms was not effective at all.
For observer HT, the candidate model has three attention parameters: Aa(240), Af(240), and Af(60). This model provided a fit that is statistically equivalent to the most saturated model (F(3,44)=0.3607, p>0.75). In a subsequent test, however, we found that a reduced model with only two attention parameters, Af(240) and Af(60), provided a statistically equivalent fit to the candidate model. We therefore designated that model as the new candidate model. This new candidate model was also statistically equivalent to the most saturated model (F(4,44)=0.4124, p>0.75), and it rejects all its reduced models (F(1,48)=12.59, and 42.35 in comparison to Af(240) only and Af(60) only; both p<0.001; F(2,48)=52.02, p<0.0001, in comparison to the model with no attention effects). We conclude that a pure mechanism of external noise exclusion was effective at CTOA=240 and 60 ms for observer HT.
For the average of all the observers, AVG, the candidate model had two attention parameters: Af(240) and Af(60). The candidate model was equivalent to the most saturated model (F(4,44)=0.4368; p>0.75). It is significantly better than the model with only Af(60) (F(1,48)=49.55; p<0.0001) and the model without any attention parameter (F(2,48)=45.07; p<0.0001) but is only marginally better than the model with only Af(240) (F(1,48)=3.8952; p<0.075). We concluded that a pure mechanism of external noise exclusion was effective at CTOA=240 ms and marginally effective at CTOA=60 ms for AVG.
For observers JZ and WC, the candidate model has two attention parameters: Aa(240) and Af(240). The candidate model is equivalent to the most saturated model (F(4,44)=0.2344, and 0.4842 for JZ and WC; both p>0.70). It is superior to all its reduced models (F(1,48)=14.41, and 32.37; both p<0.0005, compared to the model with only Aa(240); F(1,48)=5.657, and 6.550; both p<0.025, compared to the model with only Af(240); F(2,48)=10.253, and 20.440; both p<0.001, compared to the model with no attention effect). We conclude that a mixture of stimulus enhancement and external noise exclusion was effective at CTOA=240 ms; there was no significant cuing effect at CTOA=60 ms.
For observers HT and AVG, the candidate model has three attention parameters: Aa(240), Af(240), and Af(60). The candidate model is equivalent to the most saturated model (F(3,44)=0.1005, and 0.0113 for HT and AVG; both p>0.95). It is superior to all its reduced models. Specifically, for HT, F(1,47)=9.928, 69.12, and 16.64, in comparison to Aa(240)+Af(240), Aa(240)+Af(60), and Af(240)+Af(60), all p<0.005; F(2,47)=35.52, 13.75, and 43.90, in comparison to Aa(240), Af(240), and Af(60), all p<0.00025; F(3,47)=29.82, p<0.0001, in comparison to the model with no attention parameter. For AVG, F(1,47)=4.818, 46.62, and 11.92, all p<0.05; F(2,47)=24.44, 8.438, and 29.976, all p<0.001; F(3,47)=20.727, p<0.0001. We conclude that a mixture of stimulus enhancement and external noise exclusion was effective at CTOA=240 ms and a pure mechanism of external noise exclusion was effective at CTOA=60 ms.
For observers HT, JZ, and AVG, the candidate model has four attention parameters: Aa(240), Af(240), Aa(0), and Af(0). The candidate model is equivalent to the most saturated model (F(2,44)=0.0000, 0.0000, and 0.0010, for HT, JZ, and AVG; all p>0.95). It rejects all its 16 reduced models at p<0.0001. For observer WC, the candidate model has three attention parameters: Af(240), Aa(0), and Af(0). The candidate model is equivalent to the most saturated model (F(3,44)=0.6988, p>0.50). It rejects all its eight reduced models at p<0.005.
1. But see Klein et al. (1987), who compared simultaneous cues (cue and target occurring simultaneously) with cues at other CTOAs. The authors found that the effects of cuing on visual simple reaction time, although small (about 19 ms), were about the same at all CTOAs, from 0 ms to 500 ms.
2. For paradigms involving decision uncertainty, e.g., visual search or the classical Posner paradigm, the PTM observer model and the associated attention mechanisms would require uncertainty extensions consistent with an ideal observer.