A moving object elicits responses from V1 neurons tuned to a broad range of locations, directions, and spatiotemporal frequencies. Global pooling of such signals can overcome their intrinsic ambiguity in relation to the object’s direction/speed (the “aperture problem”); here we examine the role of low spatial frequencies (SF) and second-order statistics in this process. Subjects made a 2AFC fine direction-discrimination judgement of ‘naturally’ contoured stimuli viewed rigidly translating behind a series of small circular apertures. This configuration allowed us to manipulate the scene in several ways: by randomly switching which portion of the stimulus was presented behind each aperture, or by occluding certain spatial frequency bands. We report that global motion integration is (a) largely insensitive to the second-order statistics of such stimuli and (b) rigidly broadband even in the presence of a disrupted low SF component.
Motion perception is an essential component of vision within complex natural environments, and has been a focus of study for more than a century (for review see Bradley and Goyal (2008)). However, the computational issues that motion perception raises, and how these might be addressed in terms of neural modelling, have proven difficult to overcome. Our present understanding rests on the notion that motion is initially encoded by directionally-tuned neurons in cortical area V1, each operating within a narrow range of spatio-temporal frequencies and over a limited area of space (Anderson & Burr, 1987; Hubel & Wiesel, 1968). Populations of such neurons (collectively tuned to a broad range of spatial and temporal frequencies) operate in parallel across the visual field. While such highly spatially-localized and specific signals accurately reflect local retinal motion (i.e. over a small area), they fail to reflect global motion (i.e. of larger moving objects), which elicit responses from neurons tuned to a broad range of spatial frequencies (SF), directions and spatial locations. Consequently, local motion signals must be combined to signal global motion. The present work explores the global pooling of local signals in terms of the integration of signals across SF channels (Experiments 1–3) and the role of second-order spatial statistics (Experiment 2).
In order to effectively encode motion in the natural environment, the integration of V1 signals must reflect the correlations that exist across space and time. We focus on two statistical regularities that we review in turn. The first concerns the second-order correlations within the contours of natural scenes; the second concerns the spatial alignment of structure across spatial frequency (SF) channels (Attneave, 1954; Barlow, 1961).
Natural scenes contain a preponderance of edges (Attneave, 1954; Barlow, 1961) whose properties tend to vary smoothly across a scene, a characteristic termed ‘good continuity’ by the Gestalt psychologists (Wertheimer, 1958). More formally, the relationship has been defined in terms of the probability that one edge point predicts the occurrence of another edge point at a given distance (d), orientation difference (ϕ) and contour angle (θ) (Geisler, Perry, Super, & Gallogly, 2001). Broadly speaking, the smaller ϕ, θ and d, the more likely one is to encounter another edge point. Psychophysicists have examined if and how the visual system exploits such regularities using paradigms in which small oriented elements (typically Gabors) are used to build contours with particular second-order relations (e.g. co-circularity), which are then embedded in a field of randomly-oriented distracter elements (e.g. Field et al, 1997). In this paradigm, contour detection must involve global integration since it operates over spatial distances and across spatial phase in a manner that could not be achieved by conventional V1 neurons (Hess & Dakin, 1997). Sensitivity to contours has been shown to increase with lower curvature (smaller ϕ and θ) and with contour length (Field, Hayes, & Hess, 1993), consistent with the statistics of natural scenes, although inter-element distance does not strongly alter performance (Field, et al., 1993).
While it is clear that the second-order distribution of orientation structure across the visual field is critical for determining our ability to see static extended contours, the role of such statistics in motion processing is less clear. Second-order orientation statistics can certainly influence motion processing when the underlying elements are locally ambiguous. In a classic study, Lorenceau and Shiffrar (1992) demonstrated that the perceived directions of four moving bars (Figure 1) can be dramatically altered by changing the appearance of occluding elements. Although the bars in Figure 1a and 1b move in an identical fashion (sinusoidally translating in the direction perpendicular to their orientation), the perceived directions of motion are different. In Figure 1a the bars are perceived to move as independent pairs, but when the occluders are present in Figure 1b, the individual components ‘cohere’ and appear to move as a rotating diamond whose vertices are occluded. This dramatic change in percept is thought to arise from a change in the classification of the end points from ‘intrinsic’ (i.e. part of the object) to ‘extrinsic’ (arising from occlusion by another object). This argument is intuitive. When the endpoints are considered part of the object eliciting motion there is only one physically realistic interpretation: independent motion. However, if the endpoints are due to an occluding object, the motion signal generated at the intercept bears no relation to object motion. In isolation, this leaves the speed and direction (velocity) of each bar ambiguous, potentially consistent with an infinite range of speeds (as shown in Figure 1d). However, by combining multiple vectors, a unique solution can be found where the constraint lines meet, a direction known as the intersection of constraints (IOC) solution (Movshon, Adelson, Gizzi, & Newsome, 1986).
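The IOC construction can be written as a small linear system. The sketch below is illustrative only: the function name `intersection_of_constraints` and the example contour orientations are assumptions, not details of any cited study. Each contour constrains the unknown velocity v through v · n = s, where n is the unit normal to the contour and s is the speed measured along that normal; two non-parallel contours give two equations in the two velocity components.

```python
import math

def intersection_of_constraints(normal_a_deg, speed_a, normal_b_deg, speed_b):
    """Return the velocity (vx, vy) that satisfies both 1D constraints.

    Each contour only signals the speed along its normal, so the unknown
    velocity v must satisfy v . n = s for each of the two components.
    """
    ax, ay = math.cos(math.radians(normal_a_deg)), math.sin(math.radians(normal_a_deg))
    bx, by = math.cos(math.radians(normal_b_deg)), math.sin(math.radians(normal_b_deg))
    det = ax * by - ay * bx
    if abs(det) < 1e-12:
        raise ValueError("parallel contours leave the velocity ambiguous")
    # Cramer's rule for the 2x2 system [[ax, ay], [bx, by]] @ v = [speed_a, speed_b]
    vx = (speed_a * by - speed_b * ay) / det
    vy = (ax * speed_b - bx * speed_a) / det
    return vx, vy

# Hypothetical example: an object moving straight up at 3.93 deg/s, seen
# through two contours whose normals point at 45 and 135 deg.
true_vx, true_vy = 0.0, 3.93
speed_a = true_vx * math.cos(math.radians(45.0)) + true_vy * math.sin(math.radians(45.0))
speed_b = true_vx * math.cos(math.radians(135.0)) + true_vy * math.sin(math.radians(135.0))
vx, vy = intersection_of_constraints(45.0, speed_a, 135.0, speed_b)
```

With a single contour the determinant vanishes, which is the aperture problem in algebraic form: one constraint line cannot fix two unknowns.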
The manner in which occlusion affects motion processing is consistent with recent research demonstrating that one-dimensional (1D) and two-dimensional (2D) signals are treated quite differently by the motion-processing stream. By measuring the perceived direction of multiple Gabor stimuli, Nishida, Amano, Edwards, and Badcock (2006, 2007) have shown that integration of 1D plaids occurs in a manner consistent with the IOC rule, whilst integration of 2D plaids produces answers in line with predictions from a vector average (VA) rule. Furthermore, Bowns and Alais (2006) have shown that adaptation to stimuli yielding a VA solution generates a large shift in the perceived direction towards the IOC interpretation and vice-versa. Such adaptation suggests that the two solutions operate independently and compete to determine the overall percept of motion. Although differential processing of 1D and 2D stimuli has been demonstrated in both the psychophysical (Nishida, et al., 2006, 2007) and neurophysiological (Adelson & Movshon, 1982; Albright, 1984) literature, it remains unclear whether the aperture problem is solved locally (e.g. IOC; Adelson & Movshon, 1982; Simoncelli & Heeger, 1998), by the non-Fourier component of motion (Wilson, Ferrera, & Yo, 1992), or by the change in the position of features over time (Bowns, 1996).
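To make the contrast between the two rules concrete, the following sketch computes the vector average (VA) of the component normal velocities for a rigidly translating pattern. The function names and the chosen contour normals are hypothetical. For a rigid pattern the IOC direction is the true direction by construction, so the example only needs to show when the VA departs from it.

```python
import math

def normal_speed(true_dir_deg, true_speed, normal_deg):
    """Speed of a rigidly translating pattern measured along a contour normal."""
    return true_speed * math.cos(math.radians(true_dir_deg - normal_deg))

def vector_average(normal_dirs_deg, normal_speeds):
    """Average the component (normal) velocity vectors; returns (vx, vy)."""
    xs = [s * math.cos(math.radians(d)) for d, s in zip(normal_dirs_deg, normal_speeds)]
    ys = [s * math.sin(math.radians(d)) for d, s in zip(normal_dirs_deg, normal_speeds)]
    return sum(xs) / len(xs), sum(ys) / len(ys)

def direction_deg(vx, vy):
    """Direction of a velocity vector in degrees (0 = rightwards, 90 = upwards)."""
    return math.degrees(math.atan2(vy, vx))

# Pattern moving straight up (90 deg) at 1 deg/s.
symmetric = [45.0, 135.0]   # normals on either side of the true direction
one_sided = [20.0, 60.0]    # both normals on the same side of the true direction

va_symmetric = direction_deg(
    *vector_average(symmetric, [normal_speed(90.0, 1.0, n) for n in symmetric]))
va_one_sided = direction_deg(
    *vector_average(one_sided, [normal_speed(90.0, 1.0, n) for n in one_sided]))
```

When the normals straddle the true direction symmetrically the VA and IOC directions coincide; when both normals lie to one side, the VA is pulled towards the components and away from the true direction.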
This uncertainty also abounds in the interpretation of ‘component’ and ‘pattern’ cells in motion sensitive areas of primate and feline brains. While the majority of V1 neurons are selective for the motion of the individual sine-wave components of moving plaids, the majority of cells in area MT respond to the direction of the plaid (Albright, 1984; Movshon, et al., 1986). Again, it is unclear what mechanism mediates the response of the ‘pattern’ selective cells and a number of models have been proposed (Adelson & Bergen, 1985; Adelson & Movshon, 1982; Perrone, 2004; Rust, Mante, Simoncelli, & Movshon, 2006; Simoncelli & Heeger, 1998; Wilson, et al., 1992). Recent research by Majaj, Carandini & Movshon (2007) has shown that the ‘pattern’ selectivity of MT cells is contingent upon local interactions: ‘pattern’ selectivity is only observed when the two components overlap within the receptive field of MT neurons; if the two components are spatially separated, the response profiles of MT ‘pattern’ cells are similar to those of ‘component’ cells. This suggests that the response profiles of ‘pattern’ cells are mediated by local 2D cues, not by the integration of spatially disparate 1D cues (e.g. Figure 1b). In short, the response of individual neurons in area MT is not immune to the aperture problem.
As well as the second-order spatial regularities discussed so far, natural scenes have the tendency for content across spatial frequencies to be spatially aligned (Attneave, 1954; Barlow, 1961). The early decomposition of retinal signals cannot fully encapsulate this property, as the spatial frequency tuning of early filters is too narrow (Anderson & Burr, 1987; Hubel & Wiesel, 1968). Accordingly, signals must be recombined to achieve the broad SF tuning observed in the integration of both static (Dakin & Hess, 1998) and moving contours (Bex, Simmers, & Dakin, 2001; Ledgeway & Hess, 2002, 2006; Ledgeway, Hess, & Geisler, 2005). Such broadband integration is not without danger: an inflexible integration mechanism increases the risk that inappropriate or noisy signals may be integrated. Variation in the extent of integration across scale has been shown in static contour tasks, with contour integration being spatially broadband along straight elements but narrowband at areas of high curvature (Dakin & Hess, 1998). Functionally, this arrangement should reduce the impact of noise by integrating where the signals are likely to be the same across scale (straight edges) but selectively integrating when the signal will vary across frequency (curved edges).
In the motion domain, global integration has been shown to be broadband in detection tasks (Bex & Dakin, 2002) and in motion aftereffects (MAE) when isotropic flickering test stimuli are employed (Ashida & Osaka, 1994; von Grunau & Dube, 1992). For instance, while participants are unable to detect the motion of locally band-pass dots whose spatial frequency content does not overlap (Bex & Dakin, 2002; Ledgeway, 1996), the perception of global motion (rotation, translation and expansion) can be masked by noise elements at spatial frequencies that are remote from signal elements (Bex & Dakin, 2002). This suggests that global motion mechanisms are not only SF broadband but are unable to selectively tune their input with respect to the stimulus type at hand. Such ‘rigid’ integration has also been observed in the orientation domain, where Schrater, Knill & Simoncelli (2000) found that thresholds for a signal embedded in white noise are near optimal when the energy is uniformly spread around one speed plane, but sub-optimal when the energy is confined to isolated sub-sets of the space.
The study of apparent motion has established that dmax (the greatest distance over which motion may be detected across two successive frames) scales inversely with SF under most conditions (Baker, Baydala, & Zeitouni, 1989; Cleary & Braddick, 1990; Eagle & Rogers, 1996; Morgan, 1992), but much less work has studied the influence of SF in global motion tasks. There is some evidence that low SFs play a special role. In Bex and Dakin (2002), masking was strongest for low SF noise elements, even when matched for visibility, suggesting that coarse information is preferentially integrated. Further evidence that low SFs are used to ‘bind’ high SFs comes from the phenomenon of ‘motion capture’, where high SF structure is perceived as moving in the direction of the low SFs, even when the directional signals are centred on opposing directions (Ramachandran & Cavanagh, 1987).
Given that global motion mechanisms appear to be broadly tuned for spatial frequency, we now assess the discriminability of direction within high-pass, low-pass or broadband carrier signals. To date the majority of stimuli used for probing spatial frequency and second-order statistics have relied on either dot stimuli, gratings or drifting Gabors. By contrast here we used more natural contoured stimuli to examine the interplay between these different constraints on direction discrimination.
Three psychophysically experienced observers (DK, SD, JG) with normal or corrected-to-normal vision took part in all experiments. In Experiment 2, subject JG was replaced with JC.
Stimuli were generated on an Apple iMac running MATLAB (MathWorks) using elements of the Psychtoolbox (Brainard 1997, Pelli 1997). Stimuli were displayed on a Dell Trinitron CRT with spatial and temporal resolution set to 1080 × 768 pixels and 85 Hz respectively. The screen was viewed from a distance of 1.5 m so that one pixel subtended 0.35 arc min of visual angle. The monitor signal was passed through an attenuator (Pelli & Zhang, 1991), following which the signal was amplified and copied (using a line-splitter) to the three guns of the monitor, resulting in a pseudo 12 bit monochrome image. Monitor linearization was achieved by recording the relationship between the signal and the monitor intensity (Minolta LS 110 photometer) to create a linearization look-up table that was passed to the Psychtoolbox internal colour look-up table.
The mean luminance of the stimuli was 30.5 cd/m² with a root-mean-square contrast of 0.20. Stimuli were viewed through a large 2D raised cosine aperture (tapered annulus radius: 1.38 arc min) presented in the centre of the display. The radius of the aperture was either 2.95° or 1.17° (two viewing areas were employed to control against ceiling effects; see below). The smaller aperture size was equal to the total signal area in the locally apertured condition in Experiment 2. Due to the tapered annulus used, the visible area was taken to be the area above contrast detection threshold, in keeping with the detectable area of Gabor stimuli (Fredericksen, Bex, & Verstraten, 1997).
Stimuli were generated by spatially band-pass filtering random noise using a 2D Laplacian-of-Gaussian filter (σ = 22.8 arc min) and then thresholding the result at mean luminance to generate binary “blob” images. An example stimulus is illustrated in Figure 2a. This procedure allowed us to rapidly generate complex shapes with a broad SF profile. 200 such images were generated. On each trial a random image was selected, with replacement. Low-pass images were generated by convolving the broadband images with a Gaussian filter (σ = 5.4 arc min, Figure 2b). The use of the same source images reduced any between-image variability. One set of high-pass images (Figure 2c) was generated by subtracting a Gaussian-filtered (σ = 2.1 arc min) version of the broadband images from the source image. This process is “leaky”: it allows through some low-frequency information, so that the Craik–Cornsweet–O’Brien (CCOB) illusion (Cornsweet, 1970; Craik, 1966; O’Brien, 1958) is present in our stimuli (observe how the areas within the contours appear to be light or dark even though the luminance of each patch is equal). To control for the potential influence of this illusory coarse-scale structure, the low-pass image was subtracted twice more (Figure 2d). This procedure has previously been shown (Dakin & Bex, 2003) to completely abolish the CCOB effect, both by further attenuating the low frequencies and by nulling the effect through reversing the polarity of the inter-blob areas.
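The blob-generation step can be sketched as follows. This is a minimal illustration under stated assumptions, not the original MATLAB code: the function name `make_blob_image`, the image size, the pixel-unit σ and the frequency-domain implementation of the Laplacian-of-Gaussian are all choices made here for brevity. At 0.35 arc min per pixel, σ = 22.8 arc min corresponds to roughly 65 pixels; the defaults below are much smaller so the example runs quickly.

```python
import numpy as np

def make_blob_image(size_px=256, sigma_px=16.0, seed=0):
    """Band-pass filter Gaussian noise with a Laplacian-of-Gaussian (LoG)
    and threshold at the mean to obtain a binary (+1 / -1) blob image."""
    rng = np.random.default_rng(seed)
    noise = rng.standard_normal((size_px, size_px))

    # Spatial frequencies in cycles/pixel, converted to squared radial
    # angular frequency (rad/pixel)^2.
    fy = np.fft.fftfreq(size_px)[:, None]
    fx = np.fft.fftfreq(size_px)[None, :]
    w2 = (2.0 * np.pi) ** 2 * (fx ** 2 + fy ** 2)

    # Fourier transform of a LoG: the Gaussian exp(-sigma^2 w^2 / 2)
    # multiplied by -w^2 (the transform of the Laplacian operator).
    log_transfer = -w2 * np.exp(-0.5 * sigma_px ** 2 * w2)

    filtered = np.real(np.fft.ifft2(np.fft.fft2(noise) * log_transfer))
    return np.where(filtered > filtered.mean(), 1.0, -1.0)

blob = make_blob_image(size_px=64, sigma_px=4.0, seed=1)
```

Because the thresholded image has sharp edges, it carries energy well above the pass-band of the LoG filter, which is why the resulting blobs are spatially broadband.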
The carrier component of stimuli translated at a speed of 3.93 deg/s for 0.3 seconds (26 frames at a refresh rate of 85 Hz) in near-upwards directions. Motion was generated using operations built in to the computer’s graphics card, accessed using OpenGL. During each trial the stimulus was passed to the graphics card buffer. Stimuli (11.5 × 11.5 deg) were greater in size than the viewing aperture (radius 2.95 deg or 1.17 deg), so during each frame only a segment of the original image was displayed. By smoothly varying the area of the original image presented to the subject, a percept of rigid translation of the image through the aperture was generated. To avoid the potential effects of an orientation bias, the underlying image was randomly flipped from left to right on each trial. Between trials a phase-scrambled version of the original broadband stimuli was placed within the viewing area.
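The frame count and per-frame displacement follow directly from the display parameters given in the Methods. The short sketch below works through that arithmetic; the constant and function names, and the convention that a positive offset means clockwise from vertical, are assumptions made for illustration.

```python
import math

REFRESH_HZ = 85.0         # monitor refresh rate
DURATION_S = 0.3          # presentation time
SPEED_DEG_S = 3.93        # carrier speed
ARCMIN_PER_PIXEL = 0.35   # pixel size at the 1.5 m viewing distance

# 0.3 s spans 25.5 refresh intervals, which rounds up to 26 frames.
n_frames = math.ceil(DURATION_S * REFRESH_HZ)

# Displacement of the carrier on each frame.
step_arcmin = SPEED_DEG_S / REFRESH_HZ * 60.0
step_px = step_arcmin / ARCMIN_PER_PIXEL

def frame_offsets(direction_offset_deg):
    """(x, y) offset in pixels of the displayed image segment on each frame,
    for motion rotated away from vertical by direction_offset_deg."""
    theta = math.radians(direction_offset_deg)
    dx = step_px * math.sin(theta)
    dy = step_px * math.cos(theta)
    return [(k * dx, k * dy) for k in range(n_frames)]

offsets_7deg = frame_offsets(7.0)
```

For the small direction offsets used here the horizontal step per frame is a fraction of a pixel, so any implementation along these lines would need sub-pixel positioning (for example via texture interpolation) to render the offsets faithfully.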
A method of constant stimuli (MCS) was used to assess fine direction-discrimination with such patterns. A small offset clockwise (CW) or anticlockwise (ACW) was added relative to vertical upwards motion. The observer’s 2AFC task was to fixate on a continuously present cross at the centre of the monitor and to indicate the direction of motion, guessing if necessary. Audio feedback was provided for incorrect answers. The offset was between ±7° (large radius) and ±10° (small radius) at 17 equally spaced intervals. Each point was measured 17 times per run and all participants completed at least 2 runs (i.e. 578 trials per condition); extra trials were added if the psychometric function was under- or over-constrained. All conditions were randomly interleaved.
The procedure for deriving thresholds was identical to that of Dakin, Mareschal, and Bex (2005): the psychometric function was fit with a wrapped Gaussian and the standard-deviation parameter of the best fitting function was taken as the estimated threshold. A bootstrapping technique was employed to estimate 95% confidence intervals on these estimates; data were re-sampled with replacement at each point in the psychometric function (assuming binomial error) a total of 1024 times and the function refit. In all plots, error bars indicate 95% confidence intervals on the threshold estimates.
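A threshold-estimation procedure of this kind might look like the sketch below. Several simplifications are assumed and should not be read as the authors' implementation: an ordinary cumulative Gaussian stands in for the wrapped Gaussian (the two are nearly identical for offsets of a few degrees), the fit is a maximum-likelihood grid search, and the response counts are synthetic values generated from a known σ of 2°. All function names are hypothetical.

```python
import math
import numpy as np

_erf = np.vectorize(math.erf)

def log_likelihood_tables(offsets, mus, sigmas):
    """Log P(CW) and log P(ACW) for every (mu, sigma) pair on the grid."""
    z = (offsets[None, None, :] - mus[:, None, None]) / (
        sigmas[None, :, None] * math.sqrt(2.0))
    p = np.clip(0.5 * (1.0 + _erf(z)), 1e-6, 1.0 - 1e-6)
    return np.log(p), np.log(1.0 - p)

def fit(n_cw, n_trials, log_p, log_q, mus, sigmas):
    """Maximum-likelihood (bias, threshold) by exhaustive grid search."""
    loglik = log_p @ n_cw + log_q @ (n_trials - n_cw)
    i, j = np.unravel_index(np.argmax(loglik), loglik.shape)
    return mus[i], sigmas[j]

def bootstrap_threshold_ci(n_cw, n_trials, log_p, log_q, mus, sigmas,
                           n_boot=1024, seed=0):
    """95% CI on the threshold from binomial resampling at each offset."""
    rng = np.random.default_rng(seed)
    p_obs = n_cw / n_trials
    boots = [fit(rng.binomial(n_trials, p_obs), n_trials,
                 log_p, log_q, mus, sigmas)[1] for _ in range(n_boot)]
    return np.percentile(boots, [2.5, 97.5])

# 17 offsets between +/-7 deg, 17 trials per offset in each of 2 runs.
offsets = np.linspace(-7.0, 7.0, 17)
n_trials = 34
mus = np.linspace(-2.0, 2.0, 41)      # candidate biases (deg)
sigmas = np.linspace(0.5, 8.0, 76)    # candidate thresholds (deg)
log_p, log_q = log_likelihood_tables(offsets, mus, sigmas)

# Synthetic observer with a true threshold of 2 deg and no bias.
true_sigma = 2.0
p_true = 0.5 * (1.0 + _erf(offsets / (true_sigma * math.sqrt(2.0))))
n_cw = np.round(n_trials * p_true)

mu_hat, sigma_hat = fit(n_cw, n_trials, log_p, log_q, mus, sigmas)
ci_low, ci_high = bootstrap_threshold_ci(n_cw, n_trials, log_p, log_q, mus, sigmas)
```

Precomputing the log-probability tables once keeps each bootstrap refit to a pair of matrix products, so 1024 resamples run quickly even with a dense parameter grid.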
In the first experiment we sought to determine the relative influence of information across SF channels in our ‘naturally’ contoured stimuli. Figure 3 plots direction discrimination thresholds for DK, JG and SD, measured with four underlying carrier signals (broadband, leaky high-pass, strictly high-pass and low-pass). Error bars indicate 95% confidence intervals. Note how performance is worse with the smaller aperture (dark grey), indicating that performance in the smaller aperture conditions is not at ceiling. In Figure 3d, thresholds for DK, JG and SD were first mean adjusted to zero to correct for biases, then pooled across participants. Thresholds were broadly similar for the high-pass and broadband conditions, but thresholds were significantly higher for low-pass stimuli in the smaller aperture condition. Thus direction sensitivity increases either with increasing spatial frequency or with increasing aperture size, which suggests that performance depends on the number of cycles of the elements that are present within the aperture. In a noisy system, increasing the number of elements present increases the signal certainty. The observation that performance in the low-pass condition is worse than in the broadband and high-pass conditions indicates that the signal is least reliable at low frequencies (as shown later) and reveals no special role for low SFs, unlike motion capture (Ramachandran & Cavanagh, 1987).
Given that removing the low SF information from broadband images did not substantially impair direction discrimination thresholds, we next asked if the natural contour structure within our stimuli was promoting the integration of high SF motion signals. To test this hypothesis, we assessed the impact of disrupting the second-order motion/orientation statistics of our stimuli by placing apertures over the stimuli (Figure 4a). Global structure could then be disrupted by randomly switching the signals passing under each aperture with another randomly chosen aperture (Figure 4b,d). Scrambling in this manner across all apertures preserved local signals but disrupted global structure. Note that breaking global structure in this way disrupts both the second-order statistics and the low SF components of the signal. Therefore the effect of scrambling can only be identified by comparing performance across both the high-pass and broadband stimuli. Thus if motion processing exploits the statistical regularities of second-order structure in naturalistic images, then performance should deteriorate in both the high-pass and broadband conditions as this structure is abolished. Alternatively, if a detriment to performance is observed only in broadband stimuli, then disruption to the low SFs is driving any observed reduction in performance.
Stimuli were identical to the broadband (Figure 2a) and high-pass (Figure 2c) stimuli of Experiment 1 but were viewed through a mask consisting of a series of circular raised-cosine apertures (radius 16.2 arc min.; tapered region radius 1.38 arc min.). The underlying noise carrier translated upwards and each contour passed through the middle of each aperture during the middle frame of the trial (Figure 4f). This arrangement of the apertures and contours rendered the global structure of the stimuli easily apparent to the observer. Further, centring the apertures over the contours reduced between-trial variability that would have resulted from a random placement of the apertures. Due to the random nature of the stimuli the number of apertures varied, with a mean of 86.4 and a standard deviation of 6.8. Scrambling was achieved by swapping the signal under one aperture with that of another randomly chosen aperture. Scrambling in this manner preserved local signals but disrupted global structure. Example stimuli can be viewed in movie 1 (un-scrambled) and movie 2 (scrambled).
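The scrambling manipulation amounts to reassigning the local movie seen through each aperture to a different aperture location while leaving each local movie intact. The sketch below assumes a random permutation across apertures and uses random arrays as stand-ins for the aperture contents; the function name, array layout and patch size are hypothetical.

```python
import numpy as np

def scramble_apertures(aperture_movies, rng):
    """Shuffle which aperture location each local movie is shown in.

    aperture_movies has shape (n_apertures, n_frames, height, width).
    Every frame of a given local movie moves together, so local motion
    signals are preserved while the global arrangement is disrupted.
    """
    order = rng.permutation(aperture_movies.shape[0])
    return aperture_movies[order], order

rng = np.random.default_rng(0)
# Placeholder content: 86 apertures (close to the reported mean of 86.4),
# 26 frames, 8 x 8 pixel patches.
movies = rng.standard_normal((86, 26, 8, 8))
scrambled, order = scramble_apertures(movies, rng)
```

Applying the same permutation to every frame is the important design point: permuting independently on each frame would destroy the local motion signal as well as the global structure.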
Subjects, procedure and apparatus were identical to Experiment 1. Upper and lower limits of the MCS were ±10° across all conditions.
Results from Experiment 2 are plotted in Figure 5 and show two main effects. First, thresholds for the broadband stimuli (grey triangles) are higher than thresholds obtained without the obscuring apertures (dashed lines). Second, scrambling increased thresholds two-fold in the broadband condition but had only a weak effect on the high-pass stimuli. Interestingly, participants reported a percept of rigid translation under all conditions except the broadband scrambled condition, where a small amount of spatial incoherence was observed. This pattern of results suggests that the second-order statistics do not (significantly) influence motion processing in our experiment, because a dependence on second-order statistics would have predicted an equivalent effect of scrambling for both the high-pass and broadband signals. Instead, our results are consistent with a global motion mechanism that pools directional information across space and SFs but is insensitive to the relative motion information in nearby locations. In this model, scrambling increases the directional bandwidth at low SFs (Figure 5d,e), leading to a loss of sensitivity. The weaker effect observed in the high-pass conditions reflects the weak signal in the low SFs (see Figure 2e). Later sections attempt to justify this position further by isolating the low SF component of the signal (Experiment 3) and assessing the variability in the signal through a model of V1 neurons (see Model).
Experiment 3 was designed to probe the role of low SFs in the scrambling effect observed in Experiment 2. This was achieved by progressively attenuating the high SF component of the broadband signal to isolate the low SF component by convolution of the carrier signal with Gaussian kernels of progressively larger spatial extent. Examples of the scrambled and unscrambled stimuli are shown in Figure 6.
Subjects, procedure and apparatus were identical to Experiment 2. Stimuli were low-pass versions of the broadband stimuli in Experiment 1: five low-pass conditions were created by convolving the broadband images with Gaussian filters of σ = 5.4, 7.8, 11.4, 16.2 and 22.2 arc min. After convolution the contrast for all conditions was set to a root-mean-square contrast of 0.20 (6.0 cd/m²). The five new stimuli were then tested across both the scrambled and unscrambled conditions of Experiment 2 to generate 10 new conditions.
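The blur-then-renormalise step can be sketched as below. Two assumptions are made: the Gaussian convolution is carried out in the frequency domain (which implies wrap-around at the image borders), and RMS contrast is taken to be the standard deviation of luminance divided by mean luminance, which is consistent with the reported values (0.20 × 30.5 cd/m² ≈ 6 cd/m²). The function names and the random binary source image are placeholders.

```python
import numpy as np

ARCMIN_PER_PIXEL = 0.35  # from the display geometry in the Methods

def gaussian_lowpass(image, sigma_px):
    """Circular convolution with a Gaussian of standard deviation sigma_px."""
    fy = np.fft.fftfreq(image.shape[0])[:, None]
    fx = np.fft.fftfreq(image.shape[1])[None, :]
    # Fourier transform of a unit-area Gaussian: exp(-2 pi^2 sigma^2 f^2).
    transfer = np.exp(-2.0 * (np.pi * sigma_px) ** 2 * (fx ** 2 + fy ** 2))
    return np.real(np.fft.ifft2(np.fft.fft2(image) * transfer))

def set_rms_contrast(image, mean_lum=30.5, rms_contrast=0.20):
    """Rescale so that std(luminance) / mean(luminance) equals rms_contrast."""
    centred = image - image.mean()
    return mean_lum * (1.0 + rms_contrast * centred / centred.std())

rng = np.random.default_rng(0)
source = np.where(rng.standard_normal((256, 256)) > 0.0, 1.0, -1.0)

sigmas_arcmin = [5.4, 7.8, 11.4, 16.2, 22.2]
lowpass_stimuli = [
    set_rms_contrast(gaussian_lowpass(source, s / ARCMIN_PER_PIXEL))
    for s in sigmas_arcmin
]
```

Renormalising after the blur matters because Gaussian filtering removes contrast energy; without it, the more heavily blurred conditions would also be lower in contrast.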
Figure 7 shows the results of Experiment 3, which are in good agreement with the results of Experiment 2. Scrambling induced a twofold increase in thresholds at low levels of stimulus blur (σ = 0.09°, i.e. 5.4 arc min). To examine the effects of increasing blur, a straight line was fit to the log of thresholds, separately for the scrambled and unscrambled conditions. The exponent of the fit was recorded and error bars were generated using a bootstrapping procedure with 1024 iterations. The results of the fitting procedure (Figure 7d–f) show that the exponent is higher in the scrambled condition (significantly so for DK and SD). This means that motion discrimination thresholds increase more quickly with blurring for scrambled than for unscrambled conditions. Since increasing the level of blur in the images does not alter the second-order statistics, we conclude that it is the disruption of low SF components of the signal that is driving the effect of scrambling.
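One way to obtain such an exponent is an ordinary least-squares line through log threshold against log blur. The text does not state whether the blur axis was also logarithmic, so the log-log form below is an assumption, and the thresholds are synthetic values generated from a known power law purely to check the code; they are not data from the experiment.

```python
import math

def loglog_slope(x, y):
    """Least-squares slope of log(y) against log(x), i.e. the exponent b
    in the power law y = a * x**b."""
    lx = [math.log(v) for v in x]
    ly = [math.log(v) for v in y]
    mean_x = sum(lx) / len(lx)
    mean_y = sum(ly) / len(ly)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(lx, ly))
    sxx = sum((a - mean_x) ** 2 for a in lx)
    return sxy / sxx

blur_sigmas = [5.4, 7.8, 11.4, 16.2, 22.2]   # arc min, as in Experiment 3

# Synthetic thresholds following threshold = 2 * (sigma / 5.4) ** 0.5.
synthetic_thresholds = [2.0 * (s / 5.4) ** 0.5 for s in blur_sigmas]
exponent = loglog_slope(blur_sigmas, synthetic_thresholds)
```

Comparing the exponent between scrambled and unscrambled conditions then reduces to comparing two slopes, each with a bootstrap interval obtained by refitting resampled thresholds.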
The accurate estimation of motion-direction is trivial for objects containing isotropic orientation structure. Under such conditions the distribution of motion energy is predictable, and veridical estimates of the direction of motion can be obtained by simply calculating the centre of motion energy. However, in natural, unconstrained environments this is rarely, if ever, the case, and biases in motion energy render such a strategy unreliable. The paradigm we have described is able to probe the influence of imbalances in motion energy because the stimuli used exhibited anisotropies in orientation structure that varied randomly from trial to trial. We propose that the psychophysical data reported in this paper can be explained by the manner in which motion energy imbalances vary with the type of stimulus manipulation used (e.g. SF filtering or scrambling). Statistically, the size of the imbalance will vary inversely with the number of cycles present in the stimulus, a fact that can explain the higher thresholds observed in the low-pass condition of Experiment 1. In Experiments 2 and 3, scrambling will induce ‘spurious’ correlations in the low SF component of the signal (see Model), increasing anisotropies in the motion energy and in turn raising psychophysical thresholds.
The lack of an effect of disrupting the second order statistics is surprising considering the importance of second-order statistics in the detection of static (e.g. Field, Hayes et al. (1993)) and moving contours (Bex, et al., 2001; Ledgeway & Hess, 2002, 2006; Ledgeway, et al., 2005). More directly our work appears to contradict the findings of Lorenceau and Alais (2001) who show performance on a motion discrimination task is better for ‘closed’ forms than ‘open’ forms. Although both studies used very similar paradigms, the stimuli employed differed in terms of their perceptual ambiguity: The class of stimuli employed by Lorenceau and Alais (2001) has been well studied and the percept of global motion is ambiguous and bi-stable (McDermott, Weiss, & Adelson, 2001) reflecting the potential of such displays to be consistent with more than one physical interpretation (see Figure 1). In contrast, the signal presented in the current paradigm was consistent with only one interpretation. This suggests that global second order statistics may only influence performance in a motion discrimination task when there are very high levels of uncertainty in the binding of spatially disparate elements. The finding also implies that studies of global motion with random second-order orientation statistics (e.g. Nishida, Amano et al (2006, 2007)) are designed to an appropriate level of abstraction.
Although our results suggest no role for the second-order statistics (within one SF channel) like that shown in contour-detection paradigms (e.g. Field et al, (1993)), the effect of scrambling highlights the importance of the low SF component of motion and how manipulations of spatially disparate elements can dramatically influence the directional signal at this frequency. This observation has implications for a number of other studies using apertured but broadband stimuli (e.g. Lorenceau (2001), Mingolla et al (1992)), where the directional signal of the low-pass component may play an important role.
The rigid integration of the damaged low SF component observed in our study indicates that the motion stream is unable to filter out or ‘ignore’ SF channels on the basis of a high directional bandwidth in the distribution of motion energy. Although in the present stimuli ‘filtering’ out the low-frequency component of motion would likely improve psychometric thresholds, the relationship between signal bandwidth and reliability is not straightforward. For instance, a broad directional bandwidth is often the hallmark of an unambiguous directional signal (e.g. small dot stimuli), an observation that has been incorporated into the model of Weiss and Adelson (1998), where signals with broad directional bandwidths constrain estimates of global judgements to a greater extent than signals with narrow directional bandwidths.
We have applied a model of V1 neurons to the stimuli employed in Experiment 2 with the aim of examining the effect of scrambling and high-pass filtering on our stimulus. It is worth noting that initial attempts to model the data using V1 filters tuned only to the object speed were unable to match psychophysical thresholds, even in the absence of noise. This suggests that for ‘natural’ stimuli (exhibiting irregular orientation structure) motion signals must be integrated across a broader range of V1 neurons to capture the full expression of component motion (i.e. stemming from all orientations, not just those orthogonal to the direction of motion). Accordingly, the model included a broad range of V1 neurons tuned to speeds and SFs above and below that of the carrier signal (see appendix). All V1 neurons had a direction and spatial frequency bandwidth of 45° (full width at half height) and 1.5 octaves respectively. The output of each V1 neuron was divided by the neuron’s maximal possible output; no other forms of gain or inhibition were used. These simplifications were necessary to ensure that the variations in motion energy observed were stimulus-led rather than sensor-led.
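To illustrate what the stated bandwidths imply, the sketch below builds tuning curves with a 45° full width at half height in direction and 1.5 octaves in spatial frequency. The Gaussian (and log-Gaussian) shapes and all names are assumptions made for illustration; the actual filter definitions are in the appendix, which is not reproduced here. Both curves peak at 1, so they are already expressed relative to maximal output.

```python
import math

# Converts a full width at half height into a Gaussian standard deviation.
FWHH_TO_SIGMA = 1.0 / (2.0 * math.sqrt(2.0 * math.log(2.0)))

def direction_tuning(direction_deg, preferred_deg, fwhh_deg=45.0):
    """Relative response to a motion direction (assumed Gaussian tuning)."""
    # Wrap the difference into [-180, 180) so that 350 deg and 10 deg are 20 deg apart.
    delta = (direction_deg - preferred_deg + 180.0) % 360.0 - 180.0
    sigma = fwhh_deg * FWHH_TO_SIGMA
    return math.exp(-0.5 * (delta / sigma) ** 2)

def sf_tuning(sf_cpd, preferred_cpd, fwhh_octaves=1.5):
    """Relative response to a spatial frequency (assumed log-Gaussian tuning)."""
    delta_octaves = math.log2(sf_cpd / preferred_cpd)
    sigma = fwhh_octaves * FWHH_TO_SIGMA
    return math.exp(-0.5 * (delta_octaves / sigma) ** 2)

# Half height is reached 22.5 deg from the preferred direction and
# 0.75 octaves from the preferred spatial frequency.
half_dir = direction_tuning(112.5, 90.0)
half_sf = sf_tuning(0.75 * 2.0 ** 0.75, 0.75)
```

A 45° direction bandwidth means that a neuron preferring upward motion still responds at half strength to motion 22.5° away, which is large relative to the direction offsets of a few degrees that observers discriminated.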
The ‘raw’ motion energy of the model of V1 neurons is shown in velocity-direction plots (Figure 8). The conditions are depicted in the left hand column and the motion sensor is depicted in the bottom row. Note the spatial frequency of the carrier signal and V1 neurons becomes matched at 0.75 c/deg.
The first point to note is that, in the broadband conditions (Figure 8a,b), the peak motion energy follows the temporal frequency tuning of the V1 neurons, not the speed tuning. This pattern is consistent with the observed independence in the spatial and temporal frequency tuning of many V1 neurons (Foster, Gaska, Nagler, & Pollen, 1985; Priebe, Lisberger, & Movshon, 2006; Tolhurst & Movshon, 1975) and results in the motion energy being centred upon the veridical direction only when the spatial frequency of the motion sensor and the carrier signal become matched. Further, the pattern provides an intuitive explanation for the rigid, broadband integration of signals across spatial frequencies observed in global motion studies (Bex & Dakin, 2002; Schrater, et al., 2000): without such broadband integration (from the peak SF of the carrier signal upwards) the full expression of component motion (occurring across temporal frequencies) would not be captured.
Comparison of Figure 8(a) and (b) shows that the effect of scrambling is to dramatically increase the bandwidth of the signal in the lower SF component, indicating that scrambling leads the motion sensors to detect spurious correlations at low SFs. Finally, the directional bandwidths are sharper in the high-pass conditions, reflecting a lower superposition of signals between SFs. This suggests that the low-frequency component of the broadband stimuli leads to ‘masking’ of the high frequencies and provides a plausible explanation for the higher psychophysical thresholds observed in the broadband conditions.
The changing nature of the signal across SF channels has implications for models of global motion processing. The influential model of global motion processing proposed by Simoncelli and Heeger (1998) was based on the observation that the motion energy generated by a rigidly translating object is spread across a plane in spatiotemporal space. In order to capture this pattern of motion energy, the model weights the influence of V1 neurons in a manner proportional to the Euclidean distance from a plane that defines the MT neuron’s velocity tuning. In such a model the optimal stimulus is one in which the signal is fractal and the motion energy is spread equally across a plane in spatiotemporal space. However, the motion energy profile of the present stimuli deviates substantially from this assumption, as the distribution of motion energy is greatest at the orthogonal orientations and the signal is not fractal. To further illustrate this point, Figure 9 plots the spatiotemporal spectrum of the broadband stimuli of Experiment 1 (shown in Figure 2a); in this plot the motion energy was normalised for the expected 1/f spatiotemporal profile (see appendix). The resulting spectrum looks like a donut in space, not a plane. Further, the donut is not circular but elliptical, with an elliptical hole orientated orthogonal to that of the ellipse (Figure 9a). Thus the motion energy has a broader spatiotemporal bandwidth at orientations orthogonal to motion, whilst at orientations parallel to motion the signal is narrower and centred around a higher spatial frequency and a temporal frequency of 0 Hz.
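The plane geometry referred to above can be made concrete with a short sketch. Under one common sign convention (an assumption here), a pattern translating rigidly with velocity (vx, vy) has all of its Fourier energy on the plane vx·fx + vy·fy + ft = 0, so a component whose orientation is parallel to the motion has a temporal frequency of 0 Hz. The function names are illustrative, the distance treats the spatial and temporal frequency axes as equally scaled, and no particular weighting function from Simoncelli and Heeger (1998) is implied.

```python
import numpy as np

def component_tf(sf, theta, vx, vy):
    """Temporal frequency of a grating component of a rigidly translating pattern.

    sf    : spatial frequency of the component
    theta : direction of the component's normal (radians)
    vx, vy: pattern velocity
    Sign convention (assumed): energy lies on vx*fx + vy*fy + ft = 0.
    """
    fx, fy = sf * np.cos(theta), sf * np.sin(theta)
    return -(vx * fx + vy * fy)

def plane_distance(fx, fy, ft, vx, vy):
    """Euclidean distance of the point (fx, fy, ft) from the velocity plane."""
    return np.abs(vx * fx + vy * fy + ft) / np.sqrt(vx**2 + vy**2 + 1.0)

# A component whose normal is aligned with the motion lies on the plane.
ft_on_plane = component_tf(0.75, 0.0, 2.0, 0.0)
distance_on_plane = plane_distance(0.75, 0.0, ft_on_plane, 2.0, 0.0)

# A component whose normal is perpendicular to the motion is static in time.
ft_parallel_orientation = component_tf(0.75, np.pi / 2, 2.0, 0.0)
```

A sensor whose preferred frequency sits off the plane would receive a weight that falls with `plane_distance`; the exact fall-off is left open here.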
Clearly the motion energy generated by the current stimuli is different from the receptive field profile of Simoncelli and Heeger’s (1998) MT neurons, but which profile best reflects naturally occurring stimuli, and which best reflects the spatiotemporal tuning profile of MT neurons?
More work is needed to fully answer this question. However, it is worth noting that the current stimulus (which approaches zero energy at the low spatiotemporal frequencies) can be analysed in relative isolation from the surround (see appendix), and thus the profile generated can be considered as being elicited by the moving carrier signal rather than by a combination of the carrier signal and its surround. Further, using the spatiotemporal profile of our stimuli as a model of MT generates specific predictions. Firstly, motion integration would have a preference for the motion energy to be centred upon the veridical direction and temporal frequency when the spatial frequencies of the underlying carrier and the V1 neuron are matched (low SF). Secondly, global motion integration would have a preference for a stronger signal at the orthogonal orientations than at the parallel orientations. Both predictions appear to contradict the evidence provided by Schrater et al. (2000), who show that performance in a motion detection paradigm is best when the motion energy is spread equally across a spatiotemporal plane rather than confined to isolated sub-sets of the plane. This result does not, however, rule out the possibility that more subtle variations in the distribution of energy across a plane may yield better performance in the same detection paradigm.
The work was funded by the Wellcome Trust. We thank John Greenwood for his input.
256 trials were tested for each condition of Experiment 2. To avoid the artefacts introduced by the horizontal/vertical pixel raster, the direction of motion on each trial was randomised between 0 and 2π, the analysis was conducted, and the final result was corrected for the initial random rotation. V1 neurons were simulated using drifting Gabors (equation 1) tuned to one of eight spatial frequencies and one of thirteen speeds; the aim was to compare the response of neurons to speeds across the full spectrum of component motion. V1 neurons were tuned to one of thirteen evenly spaced speeds from 0% (static) to 150% of the carrier signal speed (3.95 c/deg). The SF tuning went from 50% to 700% of the peak SF of the carrier signal (0.75 c/deg) in eight half-octave steps. To aid the comparison between V1 neurons, the spatial and directional bandwidths of the neurons were held constant at 1.5 octaves and 45° (full width at half height) respectively. The output of each neuron was divided by the sum of the absolute values of the receptive field across space and time; this had the effect of evening out the expected 1/f spatiotemporal frequency spectrum. No normalisation or inhibition occurred between neurons.
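The speed sampling and the receptive-field normalisation described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function names, the Gaussian spatial envelope without a temporal envelope, the pixel/frame units and the choice to normalise by the even-phase (real) part are all assumptions, since equation 1 is not reproduced here.

```python
import numpy as np

# Thirteen evenly spaced speed tunings, expressed as fractions of the
# carrier speed (0% = static, up to 150%).
speed_fractions = np.linspace(0.0, 1.5, 13)

def drifting_gabor(size, frames, sf, tf, theta, sigma):
    """Complex space-time Gabor (hypothetical parameterisation).

    sf in cycles/pixel, tf in cycles/frame, theta in radians,
    sigma is the spatial envelope width in pixels.
    Returned array has axes (y, x, t).
    """
    half = size // 2
    y, x, t = np.meshgrid(np.arange(-half, half + 1),
                          np.arange(-half, half + 1),
                          np.arange(frames) - frames // 2,
                          indexing="ij")
    envelope = np.exp(-(x**2 + y**2) / (2.0 * sigma**2))
    phase = 2.0 * np.pi * (sf * (x * np.cos(theta) + y * np.sin(theta)) - tf * t)
    return envelope * np.exp(1j * phase)

def l1_normalise(rf):
    """Divide by the summed absolute value of the even-phase receptive field."""
    return rf / np.sum(np.abs(rf.real))

# Example sensor: its temporal frequency follows from speed = tf / sf.
sf = 0.1
tf = sf * speed_fractions[4]          # tuned to 50% of a unit carrier speed
rf = l1_normalise(drifting_gabor(16, 8, sf, tf, 0.0, 3.0))
```

With this normalisation the largest response the even-phase filter can give to any input bounded between -1 and 1 is exactly 1, which is one way of reading "divided by the neuron's maximal possible output".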
Convolution of the signal took place in the Fourier domain and the result was inverse-transformed back into the spatial domain. The sum of the squares of the real and imaginary components was taken to represent the motion energy at each point in space for each V1 neuron, a computation that is formally equivalent to summing the squared (full-wave rectified) outputs of odd- and even-phase neurons to generate a phase-invariant output (Adelson & Bergen, 1985). A global motion analysis was achieved by taking the sum of the energy across V1 channels. Each spatial frequency channel could then be represented as a 2D velocity–direction image as shown in Figure 11, in which the intensity of each region represents the global sum of motion energy across V1 neurons whose velocity tuning is denoted by the region’s position in the velocity-direction graph.
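The energy computation described above can be illustrated with a small NumPy sketch. It assumes circular (wrap-around) convolution via the FFT, a receptive field array of the same shape as the stimulus, and a toy drifting grating in place of the actual stimuli; the names `motion_energy`, `global_energy` and `gabor` are hypothetical.

```python
import numpy as np

def motion_energy(stimulus, rf):
    """Phase-invariant energy at each point: |stimulus * rf|^2.

    The convolution is done by multiplying FFTs (circular convolution),
    then the squared real and imaginary parts of the result are summed.
    """
    response = np.fft.ifftn(np.fft.fftn(stimulus) * np.fft.fftn(rf))
    return response.real**2 + response.imag**2

def global_energy(stimulus, rfs):
    """Sum the energy over space and time, giving one value per sensor."""
    return np.array([motion_energy(stimulus, rf).sum() for rf in rfs])

# Toy example: a grating drifting in +x, and two sensors tuned to
# the same and the opposite direction of drift.
n, frames = 32, 16
y, x, t = np.meshgrid(np.arange(n), np.arange(n), np.arange(frames), indexing="ij")
sf, tf = 4 / n, 2 / frames            # cycles/pixel and cycles/frame
stimulus = np.cos(2 * np.pi * (sf * x - tf * t))

def gabor(direction_sign):
    """Complex space-time Gabor centred in the array (illustrative only)."""
    xc, yc, tc = x - n // 2, y - n // 2, t - frames // 2
    envelope = np.exp(-(xc**2 + yc**2) / (2 * 4.0**2) - tc**2 / (2 * 3.0**2))
    return envelope * np.exp(1j * 2 * np.pi * (sf * xc - direction_sign * tf * tc))

energies = global_energy(stimulus, [gabor(+1), gabor(-1)])
```

For a real-valued stimulus, the real and imaginary parts of the complex response are the outputs of the even- and odd-phase filters, so `motion_energy` matches the quadrature-pair sum of squares. In this toy case the sensor tuned to the direction of drift returns far more energy than the sensor tuned to the opposite direction.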
The spatiotemporal frequency spectrum shown in Figure 9 was generated from the mean of the Fourier transforms of 200 example stimuli. The profile was normalised for the expected 1/f spatiotemporal frequency spectrum found in natural images by dividing through by the wavelength of each point in the spatiotemporal representation. The profile was then thresholded at 50% of the motion energy and the resulting 3D surface extracted through Matlab’s isosurface and patch functions. The example stimuli did not include the global aperture because we wanted to examine the profile of the stimuli in isolation from the surround. Note that the fast-Fourier-transform function can only work on a rectangular matrix, and because the fast Fourier transform works over an infinite spatial (and temporal) extent, the stimulus (which is spatially circular) cannot be considered in isolation from its surround. However, because the stimulus approaches zero energy in the low SF component of the signal, the lower SFs found at the oblique orientations (due to the spatially square representation) have relatively little impact. In contrast, including a circular aperture generates significant energy in the low spatiotemporal frequencies, which is impossible to ignore due to the infinite spatial extent of the Fourier analysis.
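The normalisation and thresholding steps can be sketched in NumPy as below. Several choices are assumptions rather than details given in the text: the mean is taken over amplitude spectra, frequencies are in cycles per sample with the spatial and temporal axes scaled equally, and the 50% threshold is taken relative to the maximum of the normalised spectrum. Dividing by the wavelength is implemented as multiplying by the frequency magnitude |f|, since the wavelength is 1/|f|.

```python
import numpy as np

def whitened_spectrum(stimuli):
    """Mean amplitude spectrum of (y, x, t) stimuli, scaled by |f|.

    Multiplying by |f| (dividing by wavelength) offsets an expected 1/f
    fall-off. The DC term becomes zero because |f| = 0 there.
    The result is fftshift-ed so that zero frequency sits at the centre.
    """
    mean_amp = np.mean([np.abs(np.fft.fftn(s)) for s in stimuli], axis=0)
    axes = [np.fft.fftfreq(k) for k in mean_amp.shape]
    fy, fx, ft = np.meshgrid(*axes, indexing="ij")
    radius = np.sqrt(fx**2 + fy**2 + ft**2)
    return np.fft.fftshift(mean_amp * radius)

def half_max_mask(spectrum):
    """Boolean volume marking points at or above 50% of the maximum."""
    return spectrum >= 0.5 * spectrum.max()

# Toy input: a few random space-time volumes stand in for the stimuli.
rng = np.random.default_rng(0)
toy_stimuli = [rng.standard_normal((8, 8, 8)) for _ in range(4)]
spectrum = whitened_spectrum(toy_stimuli)
mask = half_max_mask(spectrum)
```

The boolean volume `mask` plays the role of the thresholded profile; a surface mesh could then be extracted from it with a marching-cubes routine, in the same spirit as the isosurface step.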