Human vision remains perceptually stable even though retinal inputs change rapidly with each eye movement. Although the neural basis of visual stability remains unknown, a recent psychophysical study pointed to the existence of visual feature-representations anchored in environmental rather than retinal coordinates (e.g. ‘spatiotopic’ receptive fields; Melcher, D., and Morrone, M.C. (2003). Spatiotopic temporal integration of visual motion across saccadic eye movements. Nat Neurosci 6, 877-881). In that study, sensitivity to a moving stimulus presented after a saccadic eye movement was enhanced when preceded by another moving stimulus at the same spatial location prior to the saccade. The finding is consistent with spatiotopic sensory integration, but it could also have arisen from a probabilistic improvement in performance due to the presence of more than one motion signal for the perceptual decision. Here we show that this statistical advantage accounts completely for summation effects in this task. We first demonstrate that measurements of summation are confounded by noise related to an observer's uncertainty about motion onset times. When this uncertainty is minimized, comparable summation is observed irrespective of whether two motion signals occupy the same or different locations in space, and whether they contain the same or opposite directions of motion. These results are incompatible with the tuning properties of motion-sensitive sensory neurons and provide no evidence for a spatiotopic representation of visual motion. Instead, summation in this context reflects a decision mechanism that uses abstract representations of sensory events to optimize choice behavior.
The stability of perception relies on a spatial coding scheme that takes into account changes in gaze direction. In principle, gaze information could be used to construct receptive fields that are selective for a region of external space rather than a region of the retina. To date, however, physiological investigations have yielded little evidence for such a spatiotopic coding scheme (but see Galletti et al., 1993; Duhamel et al., 1997). Other studies have probed for spatiotopic representations by measuring perceptual interactions between stimuli presented before and after a saccade at a common spatial position (e.g. McConkie and Zola, 1979; Bridgeman and Mayer, 1983; O'Regan and Levy-Schoen, 1983; Irwin et al., 1988). This psychophysical approach has supported the primacy of retina-centered rather than environment-centered representations (see Prime et al., 2006).
In a recent study, however, Melcher and Morrone (2003) observed changes in perceptual thresholds for visual motion that are consistent with spatiotopic coding (Fig. 1). In that study, observers monitored randomly moving dots in the periphery for the arrival of two brief probe intervals (M1 and M2) in which a proportion of the dots moved in a common direction. Their task was to determine the direction of motion. Participants either maintained their gaze, such that both motion signals occupied the same spatial and retinal location (Fig. 1a), or they performed an eye movement such that M1 and M2 occupied the same position in space but different positions on the retina (Fig. 1b). In both conditions, the authors observed an enhancement of sensitivity compared with a single-motion baseline.
The prevailing interpretation of these findings is that the enhancement reflects temporal integration of sensory inputs to spatiotopically-tuned motion detectors early in the visual system (Melcher et al., 2004; Melcher, 2005; Prime et al., 2006; d'Avossa et al., 2007; Melcher, 2007). Increased sensitivity for the dual-motion condition may be expected, however, even without a spatiotopic representation in the brain. For example, observers could respond on the basis of the stronger of two sensory representations (‘probability summation’; Watson, 1979; Meese and Williams, 2000; Tyler and Chen, 2000), or combine estimates of motion direction during a decision process akin to statistical inference (Gold and Shadlen, 2000; Knill and Pouget, 2004; Gold and Shadlen, 2007). The doubling of sensitivity in the dual-motion condition observed by Melcher and Morrone (2003) is greater than that predicted by these decision-stage accounts. In a first experiment, however, we show that sensitivity in the single-motion condition – but not the dual-motion condition – is greatly reduced by uncertainty about the onset time of the near-threshold motion signals. This factor, which was not controlled in the original study, thus confounds estimates of summation.
We measured the dual-motion advantage reported by Melcher and Morrone (2003) under a variety of novel conditions that allowed a more complete characterization of the underlying mechanism. Our results show that when temporal uncertainty is minimized, probabilistic decision mechanisms account completely both for our own findings and for those of Melcher and Morrone. This new perspective relies only on well established principles of perceptual decision-making and not on a spatiotopic representation of visual motion.
All experimental procedures were approved by the University of Melbourne Human Research Ethics Committee and the Institutional Review Board of Rutgers University. The stimuli, procedures, and analyses used in the current experiments were comparable to those used in the original study. Some methodological details that were not reported in the original paper were obtained directly from one of the authors (M. C. Morrone, personal communication, December 3, 2006). Note that in our experiments, observers discriminated upward from downward motion and the random-dot stimuli were located to the left and/or right of fixation (i.e., as if the display used in the original study had been rotated by 90°). This configuration was selected for compatibility with a concurrent fMRI investigation of summation (in which motion signals could evoke lateralized BOLD responses; e.g., Merriam et al., 2003). Experiment 1 replicated the summation effect reported by Melcher and Morrone (2003) using our modified design (Fig. 2).
A total of twelve observers (ten males, two females) participated in the experiments, including four authors (APM, JBM, SJC, and JDF), two of whom were naïve to the specific purpose of the experiments at the time of their participation (SJC and JDF), and eight naïve observers (LM, KM, PG, AT, JD, FR, and MQ). Four observers participated in each of the experiments. All participants had normal vision and were aged between 22 and 42 years.
Visual and auditory stimuli were generated and presented using Matlab software (The Mathworks, Inc) in conjunction with the OpenGL-based Psychophysics Toolbox extension (version 3; Brainard, 1997), and running on a Pentium-class computer operating under a Windows XP (SP2) environment. Visual stimuli were displayed using a linearized 22″ CRT monitor (1280 × 1024 resolution) with a refresh rate of 60Hz and viewed from a distance of 57cm. A chin rest (Experiments 1-4) or bite-bar (Experiment 5) was used to stabilize the viewing position. All experiments were performed in a dimly-illuminated testing cubicle.
The random dot motion pattern comprised 58 circular dots (diameter = 0.15°) confined to a 6°×6° square region. Half of the dots were luminance increments and the other half luminance decrements of equal contrast (Weber contrast = +98% and −98%, respectively) against a uniform grey background (mean luminance = 35.1 cd/m²). Each frame comprised complementary proportions of ‘signal’ and ‘noise’ dots. Noise dots were re-plotted at random positions within the aperture on each frame to generate spatiotemporal noise. Signal dots were displaced from their previous positions either upward or downward (depending on the direction of motion assigned to the trial), by a distance consistent with a dot speed of 10°/s. Dots that were selected as signal dots on one frame were ineligible to be selected in the subsequent frame. Signal dots therefore had a limited lifetime of two frames, and the maximum possible level of motion coherence was 50%. The frame rate of the random dot stimulus matched the refresh rate of the display (60Hz). During noise-only intervals, the proportions of noise and signal dots were set to one and zero, respectively. During signal-plus-noise intervals (i.e., coherent motion), the proportion of signal and noise dots was determined by the coherence value assigned to that trial by the adaptive QUEST algorithm (Watson and Pelli, 1983; see Procedure).
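As a concrete illustration, the per-frame update of signal and noise dots can be sketched as follows. This is a minimal Python sketch, not the authors' Matlab code; wrapping dots at the aperture edge and the random-number interface are assumptions not specified in the text.

```python
import numpy as np

N_DOTS = 58
APERTURE = 6.0            # width/height of the square region (deg)
STEP = 10.0 / 60.0        # per-frame signal displacement: speed / frame rate

def update_frame(xy, prev_signal, coherence, direction, rng):
    """Advance the dot field by one frame.

    xy          : (N, 2) array of dot positions in degrees
    prev_signal : boolean mask of dots that carried signal on the last frame
    coherence   : proportion of signal dots (at most 0.5, given the
                  two-frame limited lifetime)
    direction   : +1 for upward motion, -1 for downward
    """
    n_signal = int(round(coherence * len(xy)))
    # Last frame's signal dots are ineligible, enforcing the two-frame lifetime.
    eligible = np.flatnonzero(~prev_signal)
    signal = np.zeros(len(xy), dtype=bool)
    signal[rng.choice(eligible, size=n_signal, replace=False)] = True
    xy = xy.copy()
    xy[signal, 1] += direction * STEP          # coherent vertical displacement
    xy[~signal] = rng.uniform(0.0, APERTURE,   # noise dots replotted at random
                              size=((~signal).sum(), 2))
    return xy % APERTURE, signal
```

Note that because half the dots are excluded on every frame, requesting a coherence above 0.5 would leave too few eligible dots, which is exactly the 50% ceiling described above.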
The auditory cue used to reduce temporal uncertainty in all experiments was a brief, pure tone (60dB, 500Hz, 70ms) presented bilaterally in free-field via speakers mounted behind each side of the display.
Eye position was recorded using an infra-red eye tracking system (Eyelink II; SR Research, Toronto, Canada) for the saccadic experiment (Experiment 5).
Each trial consisted of either one (‘single-motion’) or two (‘dual-motion’) coherent motion signals (150ms duration) embedded within ten seconds of spatiotemporal noise (0% coherence). The two motion signals in the dual-motion condition (M1 and M2) were separated by an inter-stimulus interval (ISI) of 1000ms, and were yoked to the same coherence level and direction of motion (except for Experiment 3, see Results for details). Motion sign (upward or downward) was selected randomly at the start of each trial. Observers were required either to identify the direction of motion (upward or downward; Experiments 1, 2, 4, and 5) or to determine whether the trial contained signal-plus-noise or noise-only (Experiment 3). Observers indicated their response at the end of the trial by pressing one of two buttons on a keyboard, and feedback regarding response accuracy was provided by a change in the color of the fixation point (correct: green; incorrect: red). An adaptive algorithm (QUEST, Watson and Pelli, 1983) was used to set motion coherence on each trial to the current estimate of the signal strength required to yield 75% correct responses. Note that the QUEST algorithm was used only to specify the strength of the motion signal on each trial, and not to provide a final estimate of the observer's sensitivity for a given condition (see Data Analysis). At least six QUEST sessions of 40 trials each were run for each condition in each experiment. The order in which sessions were completed was counterbalanced within and across observers in each experiment.
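The adaptive procedure can be illustrated with a bare-bones QUEST-style track. The grid, prior, and Weibull slope below are illustrative assumptions rather than the settings actually used in these experiments.

```python
import numpy as np

LOG_C = np.linspace(np.log10(0.005), np.log10(0.5), 200)  # candidate log10 thresholds

def p_correct(log_coh, log_thresh, slope=3.5, guess=0.5, lapse=0.01):
    """Assumed Weibull psychometric function for the 2AFC direction task."""
    x = 10.0 ** (log_coh - log_thresh)
    return guess + (1.0 - guess - lapse) * (1.0 - np.exp(-x ** slope))

class QuestTrack:
    """Bare-bones Bayesian track in the spirit of Watson and Pelli (1983)."""
    def __init__(self, prior_mean=np.log10(0.1), prior_sd=0.5):
        # Gaussian prior over the (log) coherence threshold.
        self.log_post = -0.5 * ((LOG_C - prior_mean) / prior_sd) ** 2

    def next_coherence(self):
        # Test each trial at the current posterior mode of the threshold.
        return 10.0 ** LOG_C[np.argmax(self.log_post)]

    def update(self, coherence, correct):
        # Bayesian update of the posterior given the observed response.
        p = p_correct(np.log10(coherence), LOG_C)
        self.log_post += np.log(p if correct else 1.0 - p)
```

As in the experiments, the track only places trials near the current threshold estimate; the final sensitivity estimates come from the separate psychometric-function fit described under Data Analysis.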
Note that the aim of the current study was to probe further the basic mechanisms that underlie the perceptual advantage observed for the dual-motion condition in Melcher and Morrone (2003); the hypotheses apply equally to the conditions in which gaze was fixed throughout the trial and to those in which a saccade was performed between the presentations of motion. Thus, none of the experiments except Experiment 5 included eye movements.
Observers maintained gaze on a fixation point (diameter = 0.3°) located 6° to the left of a central random dot motion stimulus for the entire trial (Fig. 2a). The motion signal in the single-motion condition occurred at the temporal center of the trial except for the addition of a random offset within ±500ms (i.e., motion onset was 4425-5425ms after the start of the trial). The two motion signals in the dual-motion condition (separated by a 1000ms ISI) straddled the temporal center of the trial except for the addition of a random offset within ±500ms (i.e., M1 onset occurred 3850-4850ms after the start of the trial). These motion onset times matched those used in the study by Melcher and Morrone (2003). Trial blocks consisted of either ‘cued’ trials or ‘uncued’ trials. On ‘cued’ trials, an auditory tone was presented 150ms prior to the onset of motion signals. In the dual-motion condition, the cue was presented prior to the onset of M2 (M1 was uncued). No cue was presented on uncued trials, as in the study by Melcher and Morrone. In a variant of the uncued condition, the noise-only epochs at the start and end of each trial were trimmed to shorten the duration of the random-dot pattern from ten to five seconds. The shorter interval between the onset of the random-dot pattern and the onset of coherent motion should reduce temporal uncertainty without explicit cuing (Fraisse, 1984; Gibbon et al., 1997; Leon and Shadlen, 1999; Gallistel and Gibbon, 2000; Leon and Shadlen, 2003; Janssen and Shadlen, 2005). To further increase the predictability of the stimulus in this variant, motion onset times were not jittered, unlike in all other experiments.
The spatial layout of the fixation point and random-dot stimulus in Experiment 2 was identical to that of Experiment 1, as was the timing of motion signals in the dual-motion condition. Unlike Experiment 1, however, thresholds were measured separately for each component of the dual-motion condition. Thus, there were two single-motion conditions: one that included M1 only, and a second that included M2 only. Each of these component motion signals was presented at the same time as it occurred in the dual-motion condition. A cue was presented 150ms prior to the nominal onset of M1 in all conditions (Fig. 4a).
Experiment 3 was identical to Experiment 2 with the following exceptions. The motion signals (one or two, depending on the condition for that block) were presented on only half of the trials (selected at random). The remaining trials contained incoherent motion for the duration of the trial. The observer's task was to determine whether the trial contained signal-plus-noise or noise-only. There were two dual-motion conditions; one in which M1 and M2 were in the same direction (‘correlated’), as in previous experiments; and a second in which M1 and M2 were in opposite directions (‘anticorrelated’; Fig. 5a).
Experiment 4 was identical to Experiment 2 with the following exceptions. M1 and M2 were presented within separate random-dot patterns centered 6° to the right and left of a central fixation point, respectively (Fig. 6a). Importantly, the center-to-center retinal separation of the two motion signals (12°) in the dual-motion condition of Experiment 4 matched that of the trans-saccadic condition of Melcher and Morrone (2003). To facilitate the ability of observers to attend covertly to the appropriate location at the appropriate time, the contrast of the dots in the right aperture (where the onset of M1 was pending) was set to double that of the dots in the left aperture for the first half of the trial, and vice versa for the second half of the trial. Specifically, the Weber contrast of the dots in the left and right apertures was set initially to ±49% and ±98%, respectively. During the middle of the 1000ms interval between M1 and M2, these differential contrast values for the left and right apertures were switched smoothly using inversely-proportional Gaussian contrast ramps (FWHM = 500ms), such that the sum of the contrasts across both apertures was constant. This continuous transition prevented the sense of a sudden jump in the display that would otherwise arise from switching the contrast values with a step function.
Experiment 5 was identical to Experiment 2 with the following exceptions. The display contained separate random-dot patterns positioned 6° above and below the horizontal meridian and centered horizontally (Fig. 7a). As in Experiment 4, the center-to-center retinal separation of these patches matched that of the trans-saccadic condition of Melcher and Morrone (2003). Shortly after (450ms) the nominal onset of M1, the fixation point stepped from 6° to the left of the center of the display to 6° to the right of the center of the display. Observers followed the fixation point with a 12° saccadic eye movement. Trials were rejected if the eye movement was performed at the wrong time (i.e., if the saccade latency was not within 80-400ms) or was spatially inaccurate (i.e., if the primary saccade failed to land within 3° of the saccade target position). There were two dual-motion conditions: one in which M1 and M2 appeared within the upper patch (‘matched’ condition), and a second in which M1 and M2 appeared in the upper and lower patches, respectively (‘non-matched’ condition). There were three separate single-motion conditions (M1-only [upper patch], M2-only [upper patch], M2-only [lower patch]).
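The trial-rejection criteria can be expressed compactly; the function and argument names below are hypothetical, chosen for illustration.

```python
import math

def accept_trial(saccade_latency_ms, landing_xy, target_xy):
    """Return True if the saccade was timely (80-400 ms latency) and
    accurate (primary saccade landed within 3 deg of the target)."""
    if not (80 <= saccade_latency_ms <= 400):
        return False
    dist = math.hypot(landing_xy[0] - target_xy[0],
                      landing_xy[1] - target_xy[1])
    return dist <= 3.0  # degrees of visual angle
```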
The data from each observer were analyzed separately. For each condition in the discrimination experiments, the proportions of correct responses on upward and downward motion trials, PU(C) and PD(C), were determined separately at each coherence level. These proportion scores were then converted to a bias-free measure of sensitivity (d′) for each bin using the formula

d′ = {Z[PU(C)] + Z[PD(C)]} / √2,   (Equation 1)
where Z[·] denotes the inverse normal (z-score) transformation. For Experiment 3, PU(C) and PD(C) were replaced with the hit-rate for signal-plus-noise trials, H, and the false-alarm rate for noise-only trials, F, respectively, and d′ values were not divided by the factor of √2. This latter difference in calculation ensured that sensitivity measures obtained from the different task designs (two-alternative forced-choice vs. yes-no) were nevertheless comparable (Macmillan and Creelman, 1991). Corrected values of 0.99 and 0.01 were substituted for any bins in which the observed proportion correct (or incorrect) was equal to 1 or 0, respectively.
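For concreteness, the sensitivity calculations for the two task designs might be sketched as follows. This is a hedged illustration assuming the Equation 1 form above and the 0.99/0.01 clipping rule; function names are our own.

```python
from scipy.stats import norm

def clip_p(p):
    """Apply the 0.99 / 0.01 correction for perfect proportions."""
    return min(max(p, 0.01), 0.99)

def dprime_discrimination(p_up, p_down):
    """d' for the direction task: [Z(PU) + Z(PD)] / sqrt(2) (Equation 1)."""
    return (norm.ppf(clip_p(p_up)) + norm.ppf(clip_p(p_down))) / 2 ** 0.5

def dprime_detection(hit_rate, fa_rate):
    """d' for the yes-no detection task of Experiment 3: Z(H) - Z(F),
    with no division by sqrt(2)."""
    return norm.ppf(clip_p(hit_rate)) - norm.ppf(clip_p(fa_rate))
```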
To provide a continuous description of how sensitivity (d′) related to motion coherence, the binned data for each condition and observer were fitted with a cumulative Weibull function, F(c), of the form

F(c) = α{1 − exp[−(c/β)^γ]},   (Equation 2)
where c is the motion coherence for the bin and α, β, and γ are the asymptote, spread, and shape parameters, respectively (Wichmann and Hill, 2001b). Because observers tended to achieve near-perfect accuracy at high levels of coherence, the asymptote parameter was fixed to the value of d′ that corresponded to values for PU(C) and PD(C) of 0.99 and 0.01, respectively. For the detection experiment, the asymptote parameter was fixed to the value of d′ that corresponded to a hit-rate of 0.99 and a false-alarm rate equal to the average false-alarm rate across all conditions to be compared. The two free parameters of the model, β and γ, were estimated by minimizing the chi-square statistic

χ² = Σi {[d′i − F(ci)]² / σi²},   (Equation 3)
where d′i and F(ci) are the observed and fitted values, respectively, at coherence ci over N bins. The variance of the d′ estimate for each bin, σi², was calculated using the method of Miller (1996, see Equations 6, 7 and 8), which is the preferred method for estimating the variance of d′ estimates based on small sample sizes. This was necessary because the adaptive algorithm used to determine test values of coherence in these experiments (QUEST) often generated only a small number of trials in some regions of the coherence scale. An important property of the cost function shown in Equation 3 is that it weights the model fit in favor of the most reliable data points. Note that the variance estimates provided by the equations of Miller (1996) were divided by 2 for the discrimination experiments as a reflection of the √2 term in the definition of d′ (Equation 1).
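A sketch of the weighted fit (Equations 2 and 3) is given below. For simplicity it takes the per-bin variances as given, rather than computing them with the small-sample method of Miller (1996); the optimizer choice and starting values are our own assumptions.

```python
import numpy as np
from scipy.optimize import minimize

def weibull(c, alpha, beta, gamma):
    """Equation 2: cumulative Weibull with asymptote alpha, spread beta,
    and shape gamma."""
    return alpha * (1.0 - np.exp(-(c / beta) ** gamma))

def fit_weibull(coh, d_obs, d_var, alpha_fixed):
    """Minimize chi-square = sum_i [d'_i - F(c_i)]^2 / var_i (Equation 3)
    over the two free parameters (beta, gamma), with alpha held fixed."""
    def chi2(params):
        beta, gamma = params
        if beta <= 0 or gamma <= 0:          # keep the search in a valid region
            return np.inf
        resid = d_obs - weibull(coh, alpha_fixed, beta, gamma)
        return np.sum(resid ** 2 / d_var)    # variance-weighted cost
    res = minimize(chi2, x0=[np.median(coh), 2.0], method="Nelder-Mead")
    return res.x  # best-fitting (beta, gamma)
```

Because each squared residual is divided by its bin's variance, noisy bins (few QUEST trials) contribute little to the cost, which is the weighting property noted above.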
The final coherence threshold estimate for a given condition was obtained by substituting the parameters of the best-fitting model into the equation for the Weibull function (Equation 2) and solving for c when d′ was equal to 1.35. This point on the Weibull function corresponds approximately to 75% correct detections. The standard error of this point estimate was obtained by a nonparametric bootstrap procedure in which the correct and incorrect responses from each bin were re-sampled (with replacement) to produce new estimates of d′ across bins (Wichmann and Hill, 2001b). A psychometric function was fitted to each bootstrap sample and a corresponding threshold estimate was derived (as was done for the original data set). For each observer, the standard deviation of 10,000 bootstrap estimates was used to represent the standard error of the reported coherence threshold.
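The threshold read-out and bootstrap might be sketched as follows. The `refit` callable stands in for the full psychometric-function fit, and the trial counts in any usage are illustrative.

```python
import numpy as np

def threshold_from_fit(alpha, beta, gamma, criterion=1.35):
    """Invert Equation 2: the coherence at which d' equals the criterion,
    c = beta * (-ln(1 - criterion/alpha))^(1/gamma)."""
    return beta * (-np.log(1.0 - criterion / alpha)) ** (1.0 / gamma)

def bootstrap_se(n_correct, n_trials, refit, n_boot=10_000, seed=0):
    """Resample each bin's correct/incorrect counts with replacement,
    refit the psychometric function, and return the SD of the resulting
    threshold estimates as the standard error."""
    rng = np.random.default_rng(seed)
    p = n_correct / n_trials
    thresholds = [refit(rng.binomial(n_trials, p)) for _ in range(n_boot)]
    return np.std(thresholds)
```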
Summation in Experiment 1 was quantified by the ratio of single- to dual-motion coherence thresholds. Confidence intervals for these ratio estimates were calculated by taking the ratio of the coherence threshold estimates for each bootstrap sample, and then noting the 2.5% and 97.5% percentiles of the resultant ratio distribution. Using this percentile method, differences in sensitivity were considered significant if the confidence interval for the threshold ratio of the two conditions did not include one (Carpenter and Bithell, 2000; Wichmann and Hill, 2001a). The method for calculating estimates of summation for Experiments 2-5 is described in the Results section.
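The percentile method for the ratio confidence interval reduces to a few lines; the bootstrap samples are assumed to come from the resampling procedure described above.

```python
import numpy as np

def ratio_ci(single_boot, dual_boot, level=0.95):
    """Percentile CI for the single/dual threshold ratio. Summation is
    deemed significant if the interval excludes one."""
    ratios = np.asarray(single_boot) / np.asarray(dual_boot)
    lo, hi = np.percentile(ratios, [100 * (1 - level) / 2,
                                    100 * (1 + level) / 2])
    return lo, hi, not (lo <= 1.0 <= hi)
```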
To assess the effects of temporal uncertainty on performance, Experiment 1 measured sensitivity in single- and dual-motion conditions comparable to the fixation condition of Melcher and Morrone (2003), with and without an auditory cue that marked the onset time of the coherent motion (Fig. 2a). On cued trials, the cue was presented prior to the onset of the motion signal in the single-motion condition and prior to the onset of M2 in the dual-motion condition. Thus, in both single- and dual-motion conditions, there was only a single motion signal for which temporal uncertainty was eliminated. On uncued trials, motion onset was never flagged by a cue, as in Melcher and Morrone's experiments. Importantly, the visual parameters of the task were identical in cued and uncued conditions.
Figure 2b plots psychometric functions relating sensitivity (d′) to motion coherence for cued (thick lines) and uncued (thin lines) conditions in four observers. For a single motion signal embedded within the random-dot pattern (left panel), the temporal cue had a strong effect on sensitivity, as indicated by a leftward shift of the psychometric function for each of the four observers. Coherence thresholds measured for the uncued condition were reliably higher than those obtained in the cued condition (by a factor of 1.66 on average; SEM= 0.11). This strong effect of cuing on performance confirms that temporal uncertainty degrades sensitivity for discrimination of a single motion signal. In contrast, the cue had only a modest effect on performance in the dual-motion condition for most observers (right panel). Coherence thresholds for the uncued dual-motion condition were higher than for the cued dual-motion condition by a factor of just 1.19 on average (SEM= 0.09). This latter result implies that the second motion signal is subject to minimal uncertainty even in the absence of an explicit temporal cue. This would occur, for example, if partial information about the first motion signal – information that was insufficient to sustain reliable direction discrimination – provided a temporal cue to the impending onset of the second motion signal. Regardless of the specific mechanism, the differential effect of cuing on performance in the single- and dual-motion conditions implies that temporal uncertainty may have inflated estimates of summation in the study by Melcher and Morrone (2003).
Figure 3 provides a direct comparison of coherence thresholds in the single- and dual-motion conditions with and without the cue. The right panels show the ratio of coherence thresholds for single- and dual-motion conditions for each observer. A threshold ratio of one indicates that sensitivity for the single- and dual-motion conditions was equivalent (i.e., zero summation), whereas a ratio of two indicates that sensitivity in the dual-motion condition was twice that observed in the single-motion condition (i.e., as expected for a linear integrator; Morrone et al., 1995; Burr and Santoro, 2001). In the absence of a temporal cue, sensitivity for the dual-motion condition was higher than that observed for the single-motion condition for all observers (Fig. 3a). This summation is reflected by the threshold ratios, which were well above one (M = 1.41, SEM = 0.06). This clear advantage for the dual-motion condition replicates that reported by Melcher and Morrone (2003), though the effect appears somewhat smaller than in the original study. We address this issue in the Discussion. Strikingly, the provision of a temporal cue abolished entirely the dual-motion advantage: threshold ratios were around one for all observers (M = 1.03; SEM = 0.06; Fig. 3b). Given that the visual parameters of the task were identical in cued and uncued conditions, these findings are difficult to explain in terms of integration early in the visual system. If it is assumed that visual motion detectors accumulate inputs obligatorily within a finite temporal window (Burr and Santoro, 2001), summation should have been the same for the cued and uncued conditions.
In a variant of the uncued condition, we found that halving the overall duration of the trial – which increases the predictability of motion onset without explicit cuing (Fraisse, 1984; Gibbon et al., 1997; Leon and Shadlen, 1999; Gallistel and Gibbon, 2000; Leon and Shadlen, 2003; Janssen and Shadlen, 2005) – also markedly reduces summation, even though the duration of each coherent motion signal (and the interval between motion signals in the dual-motion condition) was identical in both cases (Supplementary Figure 1). This attenuated summation was mostly attributable to improved sensitivity for the single-motion condition for the short trial relative to the long trial; dual-motion performance was similar in both conditions. This result is again difficult to explain in terms of sensory integration but would be expected if thresholds (in the single-motion condition) are limited by the effects of temporal uncertainty on performance.
It might appear that the effects of decision noise arising from temporal uncertainty provide a complete account of summation in this task. This is not the case, however: in Experiment 2 we had the cue announce the onset of M1 rather than M2 (Fig. 4a). In that case, the cue provides good temporal information about both M1 and M2, and so it would be difficult to attribute any summation observed to the differential effects of uncertainty on performance in single- and dual-motion conditions. To provide more explicit estimates of summation, we measured thresholds for the dual-motion condition as well as for each component motion signal in isolation, that is, when M1 was present but not M2, and when M2 was present but not M1. This approach allows the data to be compared more directly with the predictions of the sensory integration account, on the one hand, and with models that attribute summation to post-sensory decision processes, on the other.
Figure 4b shows coherence thresholds for the dual-motion condition, as well as for each of the component single-motion conditions, when a cue was provided near the nominal onset time of M1. Thresholds in the dual-motion condition were typically lower than those observed in both of the component conditions, demonstrating that summation can be observed under conditions of minimal temporal uncertainty. To assess the linearity of this summation, it is not appropriate to use a simple threshold ratio because sensitivity for each of the component conditions differed within each observer. However, summation can be assessed by expressing the motion strength of each component (M1 and M2) at threshold in the dual-motion condition as a proportion of its corresponding threshold when measured in isolation (Fig. 4c). For linear integration, these normalized quantities should sum to one and the threshold in the dual-motion condition should fall at a point along the diagonal line in the figure (Alais and Burr, 2004).
There is clear evidence of summation for three observers (i.e., their dual-motion thresholds are within the shaded region), but the magnitude is smaller than that expected for a linear sensory integrator. The summation is consistent, however, with the predictions of an alternative model in which M1 and M2 are assumed to be processed independently at a sensory stage and then combined in a statistical sense at a subsequent stage of the perceptual decision process (dashed curve; Ernst and Banks, 2002). This ‘decision-stage integration’ model, which implements a form of maximum likelihood estimation or optimal Bayesian integration, has accounted for a variety of perceptual phenomena in which sensory-level neural interactions are thought to be implausible (e.g. in experiments on multisensory integration; Ernst and Banks, 2002; see Knill and Pouget, 2004 for a review). Because the model weights information in proportion to its reliability during the decision process, it can also account for the finding that no summation was observed when the cue announced the onset of M2 rather than M1 (Experiment 1). In that case, the sensory evidence derived from M1 would be unreliable due to uncertainty and thus receive a low weight in the decision. Responses would instead be guided predominantly by the more reliable information derived from M2, and thresholds in the dual-motion condition would resemble those in the single-motion condition, as we observed. Finally, for comparison, the predicted effects of probability summation are typically smaller than those for each of the models plotted in the figure (Watson, 1979; Meese and Williams, 2000; Tyler and Chen, 2000).
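For two equally detectable components, the three accounts make distinct quantitative predictions for the dual-motion threshold, which can be summarized as follows. The probability-summation exponent k is an illustrative value in the typical range, not a fitted parameter from this study.

```python
def dual_threshold(single_threshold, model, k=3.5):
    """Predicted dual-motion threshold relative to a single-motion
    threshold T, for two equally sensitive components.

    linear   : evidence adds linearly                      -> T / 2
    mle      : d' adds in quadrature (decision-stage
               integration; Ernst and Banks, 2002)         -> T / sqrt(2)
    prob_sum : Minkowski combination with exponent k       -> T / 2**(1/k)
    """
    divisor = {"linear": 2.0,
               "mle": 2 ** 0.5,
               "prob_sum": 2 ** (1.0 / k)}[model]
    return single_threshold / divisor
```

The ordering of the predictions (linear < decision-stage < probability summation) matches the figure: probability summation predicts the smallest improvement, and the observed thresholds for three observers fall near the decision-stage curve.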
In sum, Experiments 1 and 2 demonstrate that summation in the motion discrimination task is strongly modulated by temporal uncertainty and is best explained by a model that assumes a statistical combination of independent sensory events at a decision stage. However, these quantitative comparisons of the data with linear and non-linear models cannot rule out entirely a sensory basis for summation. In the following experiments, we demonstrate that the summation mechanism does not possess either of two fundamental characteristics of visual motion detectors: direction selectivity and spatial selectivity.
Experiment 3 employed a variant of the task in which the direction of motion in the stimulus was irrelevant to the perceptual decision. Specifically, observers were required to distinguish trials that contained coherent motion (i.e., signal-plus-noise) from trials that did not contain coherent motion (i.e., noise-only; Fig. 5a). Crucially, the two motion signals in the dual-motion condition were either in the same direction (‘correlated’) or in opposite directions (‘anticorrelated’). The sensory integration hypothesis predicts that the dual-motion advantage should be greater for the correlated condition than for the anticorrelated condition, because the effects of integration will be maximal when both motion signals stimulate a common population of direction-selective units (Albright, 1984; Meese and Harris, 2001; Clifford and Ibbotson, 2002). In contrast, a decision-stage mechanism predicts no difference in sensitivity between the two conditions because the motion signals provide equally good statistical evidence for the presence of coherent motion regardless of the directional correlation between them.
For three out of the four observers, the coherence thresholds for the dual-motion conditions were lower than those for each of the component conditions (M1-only, M2-only), indicating that both correlated and anticorrelated directions yielded a dual-motion advantage (Fig. 5b). These dual-motion thresholds can be compared with the predictions of the linear sensory integration model and the non-linear, decision-stage integration model (Fig. 5c). Figure 5c has been expanded to allow separate expression of correlated (green symbols) and anticorrelated motion components (magenta symbols). Here, the linear integration model predicts summation only for correlated motion directions (dotted line). By contrast, the decision-stage integration model predicts equivalent summation for correlated and anticorrelated conditions (dashed curve).
The coherence thresholds for the dual-motion condition are clearly inconsistent with the sensory integration model: similar improvements in sensitivity were observed for correlated and anticorrelated conditions. The symmetric thresholds are, however, entirely consistent with the predictions of the decision-stage model. Summation is slightly weaker for this coherence-detection task than for direction discrimination, making it unclear whether the data are best explained by a Bayesian integration model (Ernst and Banks, 2002), or by probability summation (Watson, 1979; Meese and Williams, 2000; Tyler and Chen, 2000).
Melcher and Morrone's key finding was that summation occurred even when a saccade intervened between the presentation of M1 and M2. In that condition, the two motion signals occupied the same position in space but stimulated disparate positions on the retina. To ensure that this summation reflected a spatially selective integration mechanism, they conducted a control experiment in which an observer maintained gaze on a central fixation point and M1 and M2 were presented above and below fixation, respectively. This arrangement approximated the retinal events that occurred during their saccadic condition, except that there was no spatial correspondence between the two motion signals. They observed no summation under these conditions, consistent with their putative spatiotopic integration mechanism. In that experiment, however, an auditory tone was presented during the interval between the two motion signals to cue the observer's attention from one random-dot pattern to the other. In Experiment 1, we showed that cuing the onset of M2 in this way (while leaving M1 onset uncertain) abolishes summation even for motion signals that occupy the same retinal and spatial location. From this perspective, the absence of summation in Melcher and Morrone's control experiments is perhaps not surprising, and this observation prompted us to reinvestigate whether the integration mechanism exhibits spatial selectivity.
In Experiment 4, we examined the spatial specificity of summation using an approach similar to that of Melcher and Morrone's control experiments, except that temporal uncertainty was minimized by cuing the onset of M1 (Fig. 6a). For three out of the four observers, sensitivity in the dual-motion condition exceeded that of both component conditions, indicating that summation occurs even for spatially separated motion signals (Fig. 6b). For the remaining observer (SJC), sensitivity in the dual-motion condition was notably worse than for the condition in which M1 was presented alone. The reason for this unexpected finding is not clear, but may reflect a difficulty in shifting attention from one side of the display to the other in the dual-motion condition. Figure 6c compares thresholds in the dual-motion condition with the predictions of the sensory and decision-stage integration models. A spatially-selective integrator – regardless of whether it is linear or non-linear, and of whether it operates at a sensory or decision-stage – predicts no summation for spatially non-matched motion signals. By contrast, the predictions for a spatially invariant integrator are the same as for the spatially-matched condition in Experiment 2 (cf. Figure 4c). For all observers except SJC, the summation observed with spatially separated motion signals is consistent with the decision-stage integration model. Moreover, the magnitude of summation is comparable to that in which the motion signals occupied the same retinal and spatial position (compare Figure 6c and Figure 4c).
Finally, we conducted a further experiment to rule out the unlikely possibility that saccadic eye movements introduce spatial specificity to an otherwise non-spatial integration mechanism. As in Melcher and Morrone's saccadic task, observers performed a 12° saccadic eye movement during the 1000ms interval that separated the two motion signals. Crucially, the two motion signals occurred at either the same position in space (that is, M1 and M2 both appeared within a patch located above the horizontal meridian; ‘matched’ condition) or at different positions in space (that is, M1 and M2 appeared within a patch above and below the horizontal meridian respectively; ‘non-matched’ condition). The geometry of the task and stimuli was equivalent for the matched and non-matched conditions, and the retinal separation of the two motion signals was equal to that used in Melcher and Morrone's experiments. We also measured sensitivity for each of the component motion signals (M1 only [upper patch], M2 only [upper patch], M2 only [lower patch]) to permit quantification of summation.
Two out of the four observers showed summation in both matched and non-matched conditions (Fig. 7). The magnitude of this effect was similar for the two conditions, and also similar to that observed during fixation (compare Figure 7c with Figures 4c and 6c). These findings are in clear opposition to the predictions of the spatiotopic sensory integration hypothesis, in which summation is expected only for spatially aligned motion signals. The remaining two observers performed worse in the dual-motion condition than in the condition in which M1 was presented alone. These findings are also inconsistent with the sensory integration hypothesis, but can be explained by the decision-stage model if we assume that the eye movement requirements of the task compromised observers' ability to effectively incorporate both motion signals into the perceptual decision. The variable presence of summation across participants in this experiment parallels that observed in Experiments 2 and 4 for discrimination of spatially matched and non-matched motion signals during fixation (cf. Figure 4c and Figure 6c).
The findings of the current study argue strongly against a sensory-integration explanation for the dual-motion advantage observed during fixation and across saccadic eye movements by Melcher and Morrone (2003). The provision of an auditory cue around the time of the (second) motion signal was sufficient to eliminate this summation effect, even though the visual parameters of the task were identical in cued and uncued conditions (Experiment 1). This finding is inconsistent with a motion detector that integrates sensory inputs obligatorily over time (Burr and Santoro, 2001), regardless of whether the detector is spatiotopically or retinotopically tuned. Cuing the onset of the first motion signal (Experiment 2), which provides good temporal information about both motion signals (because of the short ISI), restored the dual-motion advantage for most observers. However, the mechanism that gives rise to summation under conditions of minimal temporal uncertainty is not tuned to motion direction per se, because equivalent summation was observed for detection of motion signals in the same direction as for detection of motion signals in opposite directions (Experiment 3). Moreover, the mechanism is not spatially selective. Similar summation was observed for spatially matched and spatially non-matched motion signals (in retinal and spatial coordinates), regardless of whether gaze was maintained (Experiments 2 and 4) or an intervening saccade was required (Experiment 5). The results of Experiments 3-5 are inconsistent with the sensory integration hypothesis because motion detectors would be expected to combine signals only minimally when those signals move in their non-preferred directions or fall outside their receptive fields (Albright, 1984).
In contrast, the results from all of our experiments are consistent with a model in which the two motion signals are assumed to be processed independently at the sensory stage, but are available to the decision-maker at the time of the perceptual choice. This additional sensory evidence leads to a probabilistic enhancement of perceptual sensitivity, either because the decision is based on the “better” of the two estimates (Watson, 1979; Meese and Williams, 2000; Tyler and Chen, 2000), or because a refined estimate of motion direction is obtained via post-sensory computations akin to near-optimal statistical inference (Gold and Shadlen, 2000; Knill and Pouget, 2004; Gold and Shadlen, 2007). In that case, the quantities that are considered are samples of a ‘decision variable’ – an abstract representation of sensory data that is useful for directing the particular decision at hand, such as the relative likelihood of the available choice alternatives, but which discards other properties of a visual stimulus (e.g. its spatial location, color, form etc.; Gold and Shadlen, 2007). This lack of spatial and feature representation suggests that these decision-stage mechanisms would be of limited use for maintaining the stability of visual perception across saccadic eye movements.
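The two decision-stage accounts described above can be illustrated with a minimal Monte Carlo sketch. This simulation assumes Gaussian internal noise and an arbitrary signal strength; the function name and all parameter values are illustrative choices, not quantities taken from the original study or fitted to our data:

```python
import random

def dual_motion_simulation(mu=1.0, n_trials=200_000, seed=7):
    """Monte Carlo sketch of a two-alternative direction judgement.

    Each motion interval yields a noisy internal estimate x ~ N(s*mu, 1),
    where s = +1 or -1 codes the true direction and mu is an assumed
    (hypothetical) signal strength. Three decision rules are compared:
      'single'  - respond with the sign of one estimate (M1 alone)
      'better'  - respond with the sign of the larger-magnitude estimate
                  (a probability-summation-like max rule)
      'average' - respond with the sign of the mean of both estimates
                  (near-optimal combination of equally reliable samples)
    Returns the proportion of correct responses under each rule.
    """
    rng = random.Random(seed)
    correct = {"single": 0, "better": 0, "average": 0}
    for _ in range(n_trials):
        s = rng.choice((-1.0, 1.0))
        x1 = rng.gauss(s * mu, 1.0)   # internal estimate from M1
        x2 = rng.gauss(s * mu, 1.0)   # independent estimate from M2
        if (x1 > 0) == (s > 0):
            correct["single"] += 1
        best = x1 if abs(x1) >= abs(x2) else x2
        if (best > 0) == (s > 0):
            correct["better"] += 1
        if ((x1 + x2) > 0) == (s > 0):
            correct["average"] += 1
    return {k: v / n_trials for k, v in correct.items()}
```

Under these assumptions both dual-signal rules outperform the single-signal baseline, and averaging approaches the √2 gain in effective sensitivity expected from near-optimal combination. Neither rule makes any reference to the spatial location or direction of the contributing signals, which is the key property distinguishing decision-stage summation from sensory integration.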
This new interpretation stands in opposition to that of Melcher and Morrone (2003) and other authors (e.g., Melcher et al., 2004; Melcher, 2005; Prime et al., 2006; d'Avossa et al., 2007; Melcher, 2007), in which summation is attributed to temporal integration by spatiotopically-tuned motion detectors early in the visual system. These hypothetical detectors were suggested to reside in area MT (V5) of visual cortex – a prediction that was supported by an initial neuroimaging investigation of spatial tuning in human area MT (d'Avossa et al., 2007). Subsequent work, however, demonstrated that ostensibly spatiotopic responses in MT reflect noise rather than the visual stimulus (Gardner et al., 2008). Instead, a retinal frame of reference for visual responses in human MT (and other visual areas) was confirmed, consistent with studies of non-human primates (Krekelberg et al., 2003). Given our reinterpretation of Melcher and Morrone's findings, we do not know of any psychophysical evidence to suggest that spatiotopic coding of visual motion should be found in visual cortex, consistent with the findings of Gardner et al (2008).
One aspect of our results that remains unexplained is that the magnitude of summation observed in the uncued condition of Experiment 1 – which most closely matched the experiments of Melcher and Morrone (2003) – appears to be smaller (a threshold ratio of around ) than that observed in the original study (a mean threshold ratio of 1.73 for the fixation condition). We note two important points that could explain this apparent discrepancy. First, threshold ratios are volatile, as is evident in the large confidence intervals surrounding our estimates in Figure 3. Given that our estimates were based on a larger dataset than in the original experiment (at least six QUEST sessions per condition for each observer compared with four in the original study), we expect similar or larger confidence intervals would surround the ratio estimates of Melcher and Morrone (2003). Thus, the apparent differences in effect size between the two studies are not statistically reliable. Second, Experiment 1 identified temporal uncertainty as an uncontrolled factor in the original study that confounds measurements of sensitivity and summation. This effect of uncertainty on performance was not small, but rather an almost two-fold modulation of sensitivity for discrimination of a single motion signal. Such effects of stimulus uncertainty are well-documented (Cohn and Lasley, 1974; Lasley and Cohn, 1981; Pelli, 1985; Shiu and Pashler, 1994; Luck et al., 1996; Prinzmetal et al., 1997; Luck and Thomas, 1999; Gould et al., 2007). Hence, small differences in levels of uncertainty between the two studies could lead to considerable changes in estimates of summation magnitude. Although we have no direct evidence for increased uncertainty in the original study, we speculate that it could be related to the fact that a wide range of ISIs was used (500-8000ms) compared with the single ISI in the current study (1000ms). 
Alternatively, given the small number of participants (four in the current study, three in the study by Melcher and Morrone), different levels of uncertainty might have arisen simply from natural intersubject variability.
From a broader perspective, the current findings shed new light on other studies that have employed variants of the dual-pulse discrimination task introduced by Melcher and Morrone (2003). Such studies have probed feature-based attentional selection (Melcher et al., 2005), visual selection in the absence of awareness (Melcher and Vidnyanszky, 2006), and attentional modulation of sensory integration time constants (Melcher et al., 2004). In each case, it will be important to determine the potential contribution of temporal uncertainty to the reported psychophysical phenomena.
In sum, the findings of the current study suggest that the dual-motion advantage reported by Melcher and Morrone (2003) – as observed during fixation and across saccadic eye movements – is most parsimoniously explained by a probabilistic advantage at the level of decision-making and not by sensory integration. This new perspective reconciles the findings of Melcher and Morrone with the large body of work which suggests that little information about visual features is retained and integrated across saccadic eye movements (see Prime et al., 2006; Prime et al., 2007) and leaves open the question of how perceptual stability is realized in the brain. Further, our results highlight the importance of decision-making factors beyond the representation of sensory variables and provide a novel example of near-optimal perceptual integration in the human brain.
The authors wish to thank David Melcher and Concetta Morrone for generously providing methodological details and data from their study. This work was supported by an Overseas Biomedical Fellowship from the National Health and Medical Research Council of Australia awarded to APM (525487), an NHMRC Project Grant awarded to JBM, and an NIH grant awarded to BK (EY017605).