Detection of motion is a crucial component of visual processing. To probe the computations underlying motion perception, we created a new class of non-Fourier motion stimuli, characterized by their third- and fourth-order spatiotemporal correlations. As with other non-Fourier stimuli, they lack second-order correlations, and therefore their motion cannot be detected by standard Fourier mechanisms. Additionally, these stimuli lack pairwise spatiotemporal correlation of edges or flicker—and thus, also cannot be detected by extraction of one of these features, followed by standard motion analysis. Nevertheless, many of these stimuli produced apparent motion in human observers. The pattern of responses—i.e., which specific spatiotemporal correlations led to a percept of motion—was highly consistent across subjects. For many of these stimuli, inverting the overall contrast of the stimulus reversed the direction of apparent motion. This “reverse-phi” phenomenon challenges existing models, including models that correlate low-level features and gradient models. Our findings indicate that current knowledge of the computations underlying motion processing is as yet incomplete, and that understanding how high-order spatiotemporal correlations lead to motion percepts will illuminate the computations underlying early motion processing.
Detection of movement is one of the most fundamental and important tasks performed by our visual system. In our everyday life, whenever we are moving, we need to keep track of our surroundings to coordinate ourselves with the environment. To predator and prey alike, motion detection is key to survival.
Motion analysis is generally considered to consist of two stages: an early stage in which local motion is extracted and a later stage in which local motion signals are combined into object motion or flows. Early motion processing is generally considered to be carried out by first-order (Fourier) and second-order (non-Fourier) mechanisms (Lu & Sperling, 2001; Reichardt, 1961). The former extract motion when pairwise spatiotemporal correlations of the luminance signal are present. The latter extract motion under other circumstances and are often modeled as local nonlinear preprocessing, such as flicker detection or extraction of unsigned contrast, followed by a spatiotemporal correlation of the resulting signals.
In parallel with the categorization of early motion processing mechanisms, motion stimuli are also categorized into first- and second-order. First-order (Fourier) stimuli are those that can be detected by first-order motion processing mechanisms. Such motion stimuli must have pairwise spatiotemporal correlation of luminance. For example, drifting sinusoidal gratings are first-order motion stimuli. Motion stimuli that can be detected by second-order mechanisms but not first-order mechanisms are called second-order (non-Fourier) motion stimuli. A typical example of a non-Fourier motion stimulus is a static high spatial frequency random checkerboard whose contrast is modulated by a drifting low spatial frequency sinusoid. This stimulus has pairwise spatiotemporal correlation of contrast but not luminance. Thus it is readily detected by second-order mechanisms but cannot be detected by first-order mechanisms. Motion stimuli can also contain spatiotemporal correlations of a more complex derived signal, such as texture (Lu & Sperling, 2001). The possibility remains that yet other types of stimuli can elicit motion percepts. Here we describe such stimuli and the percepts they produce.
To create these stimuli, we use a “spatiotemporal glider” containing three or more voxels in an arbitrary spatiotemporal configuration. Within each glider, we use a parity rule to generate correlation. Special gliders produce well-known examples of first-order and second-order stimuli (as shown in the Methods section)—but generic gliders result in stimuli that have novel characteristics.
While many stimuli generated by the spatiotemporal glider method elicit a motion percept, most motion processing models fail to generate a correct motion signal, if they generate a motion signal at all. Moreover, even the models that do predict a motion signal fail to account for the perceived direction of motion, or for the instances in which an overall inversion of contrast results in a change in the apparent motion direction. Our findings thus indicate that current knowledge of the computations underlying motion processing is incomplete, and that an understanding of how the spatiotemporal correlations in these stimuli lead to motion percepts will provide a more complete understanding of the computations underlying early motion processing.
To create spatiotemporal movie stimuli with specific high-order spatiotemporal correlations, we generalize the “isodipole texture” method for creation of spatial stimuli with high-order correlations (Julesz, Gilbert, & Victor, 1978). The general isodipole texture algorithm (Victor & Conte, 1991) is as follows: a “glider”, consisting of several nearby checks, is chosen. Then, the texture is colored black and white, such that within any glider, the total number of black checks has a particular parity (even or odd).
For example, to construct the “even texture” described in Victor and Conte (1991), we use a 2 × 2 square as the glider. Assume the texture is an N × N array of checks at coordinates (x, y). In Step 1, checks in the first row (the checks (x, 1)) and the first column (the checks (1, y)) of the texture are randomly assigned to black or white. In Step 2, the glider is placed on the corner of the texture so that it covers 4 checks: (1, 1), (1, 2), (2, 1), and (2, 2). Since (1, 1), (1, 2), and (2, 1) are already colored in Step 1, the fourth check (2, 2) can be determined by counting the total number of black checks among the 3 known checks. If the number is even, the check (2, 2) is colored white; if the number is odd, it is colored black. This way, the total number of black checks within the glider is even. In Step 3, the glider is moved by one unit along the x-direction, and the method in Step 2 is used to determine the color of check (3, 2). The glider is now moved in successive one-check steps, until all checks (x, 2) are now colored. At this point, the glider is moved to the first column of the next row, and the process is repeated. The whole texture is made using this recursive method after initialization, and therefore we can be sure that within any glider, the total number of black checks is even. Note that the above construction can be carried out for gliders of other shapes, not just a 2 × 2 square.
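The three steps above can be sketched in a few lines of Python. This is only an illustrative sketch, not the authors' Matlab routine: the function names `even_texture` and `glider_parities`, the 0/1 coding (1 = black), and the row-major indexing `tex[y][x]` are assumptions made for the example.

```python
import random

def even_texture(n, seed=None):
    """Build an n x n 'even texture' with a 2 x 2 glider (1 = black, 0 = white)."""
    rng = random.Random(seed)
    tex = [[0] * n for _ in range(n)]
    # Step 1: the first row and the first column are assigned at random.
    for x in range(n):
        tex[0][x] = rng.randint(0, 1)
    for y in range(1, n):
        tex[y][0] = rng.randint(0, 1)
    # Steps 2 and 3: slide the glider one check at a time, row by row.
    for y in range(1, n):
        for x in range(1, n):
            known_black = tex[y - 1][x - 1] + tex[y - 1][x] + tex[y][x - 1]
            # Black if the three known checks hold an odd number of black
            # checks, white otherwise, so the glider total is always even.
            tex[y][x] = known_black % 2
    return tex

def glider_parities(tex):
    """Set of parities (0 = even, 1 = odd) found over all 2 x 2 placements."""
    n = len(tex)
    return {
        (tex[y][x] + tex[y][x + 1] + tex[y + 1][x] + tex[y + 1][x + 1]) % 2
        for y in range(n - 1)
        for x in range(n - 1)
    }
```

Calling `glider_parities(even_texture(16, seed=1))` should return `{0}`, confirming that every placement of the glider contains an even number of black checks.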
To extend this idea to spatiotemporal stimuli, we generalize the construction from a two-dimensional spatial array of checks (x, y) to a three-dimensional spatiotemporal array of voxels (x, y, t). Correspondingly, the defining glider is a set of three or four nearby spatiotemporal voxels. The movie (see Supplementary data) is then colored with black and white voxels, with the requirement that within any glider, the total number of black voxels must have a particular parity (even or odd). Checks that cannot be determined by the glider rule, such as the initial frame’s checks or the boundary checks, are randomly assigned black or white. This process is presented in Figure 1, and formally described in Appendix A. Note that in Figure 1 (and later figures), the voxels of the glider are shown by coloring several corners of a wireframe cube. That is, the wireframe cube represents a 2 × 2 × 2 region, and each of its colored corners represents a voxel in the glider. The three colored corners are three voxels that form the glider, with different colors indicating differences in time.
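A minimal sketch of this three-dimensional construction follows. It is not the supplementary Matlab code. The names `glider_movie` and `check_parity`, the offset representation `(dx, dy, dt)`, and the choice to let the rule determine the glider voxel that comes last in raster order are assumptions for illustration.

```python
import random

def glider_movie(glider, size, frames, parity=0, seed=None):
    """Colour a size x size x frames movie under a glider parity rule.

    glider: list of (dx, dy, dt) voxel offsets.
    parity: 0 for an even, 1 for an odd number of black voxels per glider.
    Returns movie[t][y][x] with 1 = black and 0 = white.
    """
    rng = random.Random(seed)
    # The glider voxel visited last in raster order (t, then y, then x)
    # is the one that the parity rule determines.
    last = max(glider, key=lambda v: (v[2], v[1], v[0]))
    others = [
        (dx - last[0], dy - last[1], dt - last[2])
        for dx, dy, dt in glider
        if (dx, dy, dt) != last
    ]
    movie = [[[0] * size for _ in range(size)] for _ in range(frames)]
    for t in range(frames):
        for y in range(size):
            for x in range(size):
                src = [(x + dx, y + dy, t + dt) for dx, dy, dt in others]
                inside = all(
                    0 <= sx < size and 0 <= sy < size and 0 <= st < frames
                    for sx, sy, st in src
                )
                if inside:
                    black = sum(movie[st][sy][sx] for sx, sy, st in src)
                    movie[t][y][x] = (black + parity) % 2
                else:
                    # Initial-frame and boundary voxels are random.
                    movie[t][y][x] = rng.randint(0, 1)
    return movie

def check_parity(movie, glider, parity):
    """True if every glider placement inside the movie has the given parity."""
    frames, size = len(movie), len(movie[0])
    for t in range(frames):
        for y in range(size):
            for x in range(size):
                vox = [(x + dx, y + dy, t + dt) for dx, dy, dt in glider]
                if all(0 <= vx < size and 0 <= vy < size and 0 <= vt < frames
                       for vx, vy, vt in vox):
                    if sum(movie[vt][vy][vx] for vx, vy, vt in vox) % 2 != parity:
                        return False
    return True
```

For example, `glider_movie([(0, 0, 0), (1, 0, 0), (0, 0, 1)], 64, 20, parity=1)` produces an odd-parity movie for a hypothetical three-element glider, with dimensions matching those described later for the experiment.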
We now describe the correlation structure of the spacetime stimuli that result from these constructions. We are interested in correlations of orders 2, 3, and 4. Second-order spatiotemporal correlations are important because they could support Fourier motion; we will see that they are absent. Fourth-order correlations are important because some of them can support standard non-Fourier motion; we will see that these particular correlations are also absent for most of the stimuli constructed by this method. Finally, third-order correlations are present in many of the stimuli, and (see the Discussion section) these provide the simplest means to extract a motion percept.
As mentioned above, our stimuli generalize the “isodipole texture” construction, by replacing two-dimensional spatial gliders with three-dimensional spatiotemporal ones. Second-order correlations are absent from isodipole textures (Julesz et al., 1978; Victor & Conte, 1991), and consequently, second-order spatiotemporal correlations are absent from our stimuli as well. (This can be seen, for example, from the work of Gilbert, 1980, whose proofs of the correlation properties of glider-based textures do not depend on dimension.)
Higher order correlations are present, however. These are determined by the shape of the glider. In particular, the parity rule itself is a statement about correlations among the voxels that form the glider.
Figure 2 shows how for some gliders, these fourth-order correlations are related to non-Fourier motion. In Figure 2A, we apply the even-parity rule to a four-element glider. This glider can be thought of as consisting of two pairs of voxels, each pair parallel to the x-axis, with a spacetime displacement between the pairs (grouped in Figure 2A by dashed lines). Since the parity rule requires that an even number of these voxels are black, the only possibilities are that both pairs contain one black voxel and one white voxel, or that all of the voxels match within each pair (one black pair and one white pair, or all voxels black, or all voxels white). In the first case, there is an edge orthogonal to the x-axis between each pair of voxels; in the second case, there is no edge at either pair of voxels. That is, an edge in one location and time requires a similarly oriented edge in another location and time. The result is an edge that propagates along a spacetime diagonal. In sum, a specific fourth-order correlation (defined by the glider) corresponds to propagation of an edge along a spacetime diagonal. This propagation of a feature yields standard non-Fourier motion. Note that had we used the odd-parity rule, the same pattern of correlations would be present, but some would be negative.
Analogously, a four-element glider that consists of two pairs of voxels arranged as in Figure 2B leads to a different kind of fourth-order correlation. Here, the voxels are arranged in two pairs, with each pair corresponding to voxels in the same location on adjacent frames. Therefore, the even-parity rule means that flicker in one location (a mismatch in one pair) implies flicker in the other location, at a later time (a mismatch in the other pair). So this fourth-order correlation corresponds to propagation of flicker along a spacetime diagonal, i.e., another example of standard non-Fourier motion.
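The argument for both kinds of glider can be checked by brute-force enumeration. The sketch below, with the hypothetical helper name `pair_relation`, lists all colourings of a four-voxel glider that satisfy a parity rule and reports whether a mismatch in one pair always coincides with a mismatch in the other. The two pairs may be read either as spatial neighbours (a mismatch is an edge) or as the same location on adjacent frames (a mismatch is flicker).

```python
from itertools import product

def pair_relation(parity):
    """Relate mismatches in two voxel pairs under a four-voxel parity rule.

    parity: 0 for even, 1 for odd number of black voxels (1 = black).
    Returns 'same' if a mismatch in pair A always accompanies a mismatch
    in pair B, 'opposite' if it always accompanies a match, else 'mixed'.
    """
    outcomes = set()
    for a1, a2, b1, b2 in product((0, 1), repeat=4):
        if (a1 + a2 + b1 + b2) % 2 == parity:
            outcomes.add((a1 != a2) == (b1 != b2))
    if outcomes == {True}:
        return "same"
    if outcomes == {False}:
        return "opposite"
    return "mixed"
```

Under the even rule the function returns `"same"` (an edge or flicker event in one pair implies one in the other), and under the odd rule it returns `"opposite"`, which corresponds to the negative correlation noted above.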
Figures 2C and 2D show x–t slices of the movies corresponding to the gliders in Figures 2A and 2B, and demonstrate that these fourth-order correlations induce a visually obvious diagonal structure in spacetime.
Figure 2E analyzes a subset of the correlations induced by these gliders and shows how this diagonal structure arises. As the above analysis indicates, the glider of Figure 2A leads to a spacetime diagonal of correlated edges orthogonal to the x-axis, and the glider of Figure 2B leads to a spacetime diagonal of correlated flicker. In sum, we have seen that if a glider consists of two parallel pairs of adjacent checks, it will generate a standard non-Fourier stimulus.
Next we consider what happens if we use gliders that do not consist of two parallel pairs of adjacent checks—the gliders that generate the stimuli we study here. One way to do this is to use a glider with four elements, but to choose the elements so that they do not form two parallel pairs (Figure 3A). Another way to do this is to use a glider with only three elements (Figure 3B). Figures 3C and 3D show the x–t slices of the corresponding movies. Figure 3C has no evident visual structure. Figure 3D has a visually obvious diagonal structure, but this does not arise from pairwise correlations. This is illustrated by the absence of correlations in Figure 3E and proved in Appendix A.
While the examples in Figures 3C and 3D give a glimpse of the stimulus properties, they do not capture the entire picture, because they are necessarily limited to single-layer slices parallel to one coordinate plane. The reader is encouraged to view the Supplementary data and examine Appendix A to gain a more complete picture of their appearance and properties. For example, Figure 3C shows no structure at all. This is because the x–t plane cannot contain all voxels of the four-element glider shown in Figure 3A, and therefore none of the checks in the x–t slice are correlated, at any order. For this stimulus, statistical structure is only present when double-layer slices at specific orientations are considered. Visual inspection of Figure 3D can also be misleading. For this and other three-element gliders, a visually evident diagonal structure is present in the appropriate spacetime plane. However, this structure is not based on pairwise correlations and thus is not available to Fourier mechanisms. Moreover, the kinds of spatiotemporal correlations that are present do not correspond to “flicker motion” or “edge motion” and, thus, would not be available to standard non-Fourier mechanisms either. This is shown in Appendix A.
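As an informal numerical illustration of this point (the proof is in Appendix A), the following sketch builds an x–t slice from a hypothetical three-element glider lying entirely in the x–t plane, with voxels at (x − 1, t − 1), (x, t − 1), and (x, t), and estimates the correlations among the glider's voxels. The glider, the ±1 coding (black = +1), and the function names are assumptions for the example and are not taken from the figures.

```python
import random

def xt_slice(width, frames, seed=0):
    """Even-parity x-t slice for a three-voxel glider, coded as +1/-1."""
    rng = random.Random(seed)
    c = [[0] * width for _ in range(frames)]
    for t in range(frames):
        for x in range(width):
            if t == 0 or x == 0:
                c[t][x] = rng.choice((-1, 1))  # initial frame and boundary
            else:
                # An even number of black (+1) checks among three checks
                # means that the product of the three values is -1.
                c[t][x] = -c[t - 1][x - 1] * c[t - 1][x]
    return c

def average(values):
    values = list(values)
    return sum(values) / len(values)

def glider_statistics(c):
    """Pairwise and third-order averages over all interior glider placements."""
    frames, width = len(c), len(c[0])
    sites = [(t, x) for t in range(1, frames) for x in range(1, width)]
    pairs = [
        average(c[t][x] * c[t - 1][x] for t, x in sites),
        average(c[t][x] * c[t - 1][x - 1] for t, x in sites),
        average(c[t - 1][x] * c[t - 1][x - 1] for t, x in sites),
    ]
    triple = average(c[t][x] * c[t - 1][x] * c[t - 1][x - 1] for t, x in sites)
    return pairs, triple

pairs, triple = glider_statistics(xt_slice(200, 200, seed=0))
```

In this sketch the third-order average is exactly −1 by construction, while the three pairwise averages should be close to zero up to sampling noise.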
Even though many gliders, such as those of Figure 3, do not support edge motion or flicker motion, they do have a spacetime “slant”. This slant can be defined by the trajectory of the centroid of the voxels in each time frame of the glider. We call this slant the “centroid direction” and use it as the reference direction in the psychophysical experiments.
Figure 4 illustrates how to find the centroid direction of a glider. We separate a glider into the voxels in the plane at time t (filled green circles), and the voxels in the plane at time t + 1 (filled blue circles). The centroid direction is the vector from the centroid of the voxels at time t (open green circle), to the centroid of the voxels at time t + 1 (open blue circle). Note that for many gliders, the centroid motion direction may be oblique (Figures 4B–4D). The strategy of finding the centroid direction can be extended to gliders that span multiple time slices, by choosing the direction to be the vector that is the best fit to the centroids in each time slice in the least-squares sense.
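The centroid calculation can be sketched as follows. The function name `centroid_direction` and the use of an unweighted least-squares fit (each time slice counts equally, regardless of how many voxels it contains) are assumptions for this example. For a glider spanning two adjacent frames, the fit reduces to the difference between the two centroids.

```python
def centroid_direction(glider):
    """Least-squares velocity (vx, vy) of the per-frame centroids.

    glider: iterable of (x, y, t) voxel coordinates.
    """
    slices = {}
    for x, y, t in glider:
        slices.setdefault(t, []).append((x, y))
    times = sorted(slices)
    if len(times) < 2:
        raise ValueError("glider must span at least two time frames")
    # Centroid of the voxels in each time slice.
    cx = [sum(x for x, _ in slices[t]) / len(slices[t]) for t in times]
    cy = [sum(y for _, y in slices[t]) / len(slices[t]) for t in times]
    t_mean = sum(times) / len(times)
    denom = sum((t - t_mean) ** 2 for t in times)

    def slope(values):
        # Ordinary least-squares slope of centroid position against time.
        v_mean = sum(values) / len(values)
        return sum((t - t_mean) * (v - v_mean)
                   for t, v in zip(times, values)) / denom

    return slope(cx), slope(cy)
```

For instance, `centroid_direction([(0, 0, 0), (1, 0, 0), (1, 0, 1)])` gives `(0.5, 0.0)`, a purely horizontal slant, whereas a glider whose two time slices are identical gives `(0.0, 0.0)`.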
The glider shown in Figure 4E has a centroid direction parallel to the time axis (i.e., there is no spacetime slant). We use this glider as a negative control, since its symmetry properties eliminate the possibility of a net motion direction.
A Matlab (version 2008a) routine was used to generate and display the above stimuli and record subjects’ responses. (The Matlab source code for stimulus generation is provided in the Supplementary data.)
Stimuli consisted of 20 images, each presented for 100 ms (6 hardware frames on a 60-Hz monitor). Each image was a 64 × 64 black-and-white checkerboard, occupying a 25° × 25° region in the subject’s visual field. Ten different three-element gliders and 14 different four-element gliders (including the negative control and the glider that generates standard second-order motion) were used. Each glider was tested with 100 examples of each parity (even and odd), except for the “negative control” (Figure 4E), which was tested with even parity only. For each kind of glider, a random half of the epochs were presented with its direction of motion to the left; in the other half, the glider was flipped in space so that its centroid direction of motion was to the right.
Stimuli were displayed on a 17″, 60-Hz LCD monitor (Dell 1704FPTt) and synchronized to the monitor’s refresh. We also tested two subjects on a 60-Hz CRT (Dell M991) monitor and found no significant difference compared to the results obtained using the LCD monitor. That eliminated the possibility that our findings resulted from artifacts specific to LCD monitors, such as the motion blur effect (Har-Noy & Nguyen, 2008).
Five normal subjects participated in the experiment (1 male, 4 females). Visual acuities were normal or corrected to at least 20/30. Subjects free-viewed the stimuli binocularly at 50 cm in a darkened room and were asked to identify the horizontal direction of motion and ignore any vertical component (2-alternative forced choice, left or right), and to register their response by pressing the left- or right-arrow key on a regular computer keyboard. Tests were self-paced and no feedback was given. Each subject was studied in eight 1-h sessions, each of which started with a short practice to familiarize the subject with the test. Within each session, approximately 600 trials were presented.
Data analysis was performed in Matlab (version 2008a). Binomial confidence intervals were used to determine significance of apparent motion responses (i.e., whether the fraction of trials with perceived motion in the centroid direction is significantly different from 0.5). To detect performance significantly above or below chance, we used two-tailed statistics.
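A closely related calculation is an exact two-tailed binomial test against chance, sketched below. This substitutes a test for the confidence-interval computation described above and is not the authors' Matlab analysis. The function name `two_tailed_binomial_p` and the doubling of the smaller tail are assumptions for the example.

```python
from math import comb

def two_tailed_binomial_p(k, n):
    """Exact two-tailed p-value for k centroid-direction responses in n trials.

    The null hypothesis is chance performance (probability 0.5), for which
    the binomial distribution is symmetric, so the smaller tail is doubled.
    """
    total = 2 ** n
    lower = sum(comb(n, i) for i in range(0, k + 1)) / total  # P(X <= k)
    upper = sum(comb(n, i) for i in range(k, n + 1)) / total  # P(X >= k)
    return min(1.0, 2 * min(lower, upper))
```

With 100 trials per glider and parity, `two_tailed_binomial_p(65, 100)` falls well below 0.05, while `two_tailed_binomial_p(55, 100)` does not.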
As described in the Results section, in many cases, the apparent motion of a stimulus was in the centroid direction for one parity, and opposite to it for the other parity. That is, for one parity (e.g., even), the perceived motion was systematically biased toward the centroid direction (fraction value Feven > 0.5), while for the other parity (e.g., odd), percepts were systematically biased opposite to the centroid direction (Fodd < 0.5). To ask whether the in-centroid direction percept was stronger than the opposite-direction percept, we proceeded as follows.
In the above example, if the in-centroid percept generated by the even parity was stronger than the opposite-direction percept, we would have (Feven – 0.5) > (0.5 – Fodd). Similarly, if the odd-parity stimulus elicited motion in the centroid direction, and this was stronger than the percept of motion in the opposite direction elicited by the even stimuli, we would have (Fodd – 0.5) > (0.5 – Feven). Both of these are equivalent to Feven + Fodd – 1 > 0. So, the index of whether centroid-direction perceived motion for one parity was stronger than the opposite-direction percept for the other parity is, whether Feven + Fodd – 1 is greater than 0.
To analyze the statistical significance of this phenomenon, we proceeded as follows. First, we only considered responses to gliders for which the apparent direction of motion depended on parity. For these gliders, we combined the responses to form an index S, based on Feven + Fodd − 1:
A positive index means that the parity that produced apparent motion in the centroid direction led to a stronger percept (as measured by the positive deviation of fraction value F from 0.5) than the parity that produced the opposite motion (as measured by the negative deviation of F from 0.5). An index of 0 means that the percepts were equally strong, and a negative index means that the percept of centroid motion was weaker.
To determine whether this index deviated significantly from 0, we used a surrogate data method. Ten thousand surrogate data sets were built from each subject’s responses. In surrogate sets, data pairs (Feven and Fodd for one glider) were inverted with respect to chance performance (0.5). That is, the change was made as follows: Fsurrogate-even = 1 – Fodd, Fsurrogate-odd = 1 – Feven.
To create each surrogate data set, this change was applied to a random selection of gliders. We then calculated 10,000 surrogate index values (Ssurrogate) from these data sets, via Equation 1. The fraction of surrogate index values Ssurrogate higher than the index constructed from the original data was used to estimate the probability that the observed value of the index S could be due to chance.
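The surrogate procedure can be sketched as follows. Two points in this sketch are assumptions rather than details given in the text: the index is taken to be the sum of Feven + Fodd − 1 over gliders, and each glider is selected for inversion independently with probability 0.5. The function names and the response fractions used in the usage note are hypothetical.

```python
import random

def index_S(pairs):
    """Assumed form of the index: sum of (F_even + F_odd - 1) over gliders."""
    return sum(f_even + f_odd - 1 for f_even, f_odd in pairs)

def surrogate_p(pairs, n_surrogates=10000, seed=0):
    """Fraction of surrogate data sets whose index exceeds the observed one.

    pairs: list of (F_even, F_odd) fractions, one pair per glider.
    """
    rng = random.Random(seed)
    s_observed = index_S(pairs)
    higher = 0
    for _ in range(n_surrogates):
        surrogate = []
        for f_even, f_odd in pairs:
            if rng.random() < 0.5:
                # Invert the pair with respect to chance performance (0.5).
                surrogate.append((1 - f_odd, 1 - f_even))
            else:
                surrogate.append((f_even, f_odd))
        if index_S(surrogate) > s_observed:
            higher += 1
    return higher / n_surrogates
```

Note that inverting a pair changes the sign of its contribution to the index, since (1 − Fodd) + (1 − Feven) − 1 = −(Feven + Fodd − 1). With made-up data such as eight gliders at `(0.9, 0.2)`, every glider contributes positively, no surrogate can exceed the observed index, and the estimated probability is 0.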
Most of the 24 three- and four-element glider stimuli elicited consistent percepts of apparent motion in all five subjects. Results for stimuli generated with three-element gliders and four-element gliders are shown in Figures 5 and 6, respectively.
Results are represented by the fraction in the centroid direction, calculated as the fraction of trials on which the direction of movement judged by the subject agreed with the centroid direction of that stimulus. Each column shows the result as 5 pairs of points, one pair for each subject. The left point corresponds to the even parity and the right point to the odd parity, produced by the glider underneath the column.
The results are highly consistent across all subjects. Most stimuli (at least 16/23) were perceived as having a definite direction of apparent motion. A percept was counted as significant when the binomial confidence interval did not include the level of random performance, that is, when the fraction fell outside the range 0.4–0.6 (corresponding to p < 0.05, two-tailed). Of the 10 three-element gliders, 9 produced a consistent apparent motion percept. Of the 13 four-element gliders (excluding the negative control), 7 showed consistent apparent motion. Note that we are using the direction of centroid motion simply as a reference, and thus, these counts include all of the stimuli that elicited motion in the centroid direction or opposite to it, as long as it was consistent.
Four-element gliders can be subdivided according to whether they have three elements at one time and one at an adjacent time (Figure 6A), or two elements at one time, two at an adjacent time (Figure 6B). Of the gliders in Figure 6A, 5 out of 8 showed consistent apparent motion.
The gliders containing two elements at one time and two elements at an adjacent time (Figure 6B) include two special gliders: the “negative control” (the last column of Figure 6B) and a glider that generates a standard second-order motion stimulus, consisting of steadily moving edges that change their contrast polarity randomly (column 5 of Figure 6B).
We call the glider in the last column of Figure 6B the “negative control” because its centroid direction has no spatial component. That is, the fourth-order statistics have no directionality, so the direction of apparent motion should be perceived as ambiguous. Only the even parity condition was used. Results show that it did not produce a significant motion percept: as expected, performance was at chance level.
The glider in column 5 of Figure 6B generates standard second-order motion, because every edge in one position at time t is followed by an edge in the adjacent position at time t + 1. This stimulus generated the strongest motion percept among all four-element gliders with two elements at each time, although some of the stimuli with three elements at one time and one element at another time (Figure 6A) and many of the three-element gliders (Figure 5) generated motion percepts that were similar in strength.
As mentioned above, we considered a stimulus to elicit apparent motion if the percepts were consistent across subjects, whether or not it was in the centroid direction. Interestingly, although the centroid direction, by definition, did not depend on the parity of the glider (even or odd), the perceived motion direction often did. This occurred for most (14/16) of the stimuli that had a motion percept. Only one stimulus (column 1 of Figure 5) elicited motion in the same direction for both parities.
This phenomenon can be considered a kind of “reverse-phi” illusion because, like standard reverse phi, the reversal of the apparent direction of motion is induced by inversion of contrast of a portion of the stimulus (Anstis, 1970; Anstis & Mather, 1985; Anstis & Rogers, 1975).
The observation that parity reversal led to a reversal in motion direction is particularly interesting for the three-element glider stimuli. For these stimuli, reversal of parity is equivalent to inversion of contrast polarity of the entire movie. The reason for this is simple: the parity of the rule indicates whether the number of black voxels in a glider is even or odd. For three-element gliders, an even number of black voxels implies an odd number of white voxels, while an odd number of black voxels implies an even number of white voxels. So if one inverts contrast (i.e., exchanges black and white in every voxel), one changes an even-parity rule into an odd-parity rule. That is, contrast inversion is equivalent to changing parity. So our finding (for three-element gliders) that the judgment of apparent motion direction depends on parity means that the judgment of apparent motion inverts with contrast. This, in turn, implies a fundamental asymmetry in how motion mechanisms treat light and dark.
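The counting argument can be confirmed by enumeration. In this sketch, the hypothetical helper `parity_after_inversion` maps each parity of the black-voxel count to the set of parities obtained after contrast inversion, for a glider with a given number of voxels.

```python
from itertools import product

def parity_after_inversion(n_voxels):
    """Map each black-count parity to the parities seen after inverting contrast."""
    result = {0: set(), 1: set()}
    for colouring in product((0, 1), repeat=n_voxels):  # 1 = black
        black = sum(colouring)
        black_after = n_voxels - black  # black and white are exchanged
        result[black % 2].add(black_after % 2)
    return result
```

For three voxels the result is `{0: {1}, 1: {0}}`: inversion always switches parity. For four voxels it is `{0: {0}, 1: {1}}`: inversion leaves the parity unchanged, which is why the equivalence holds only for three-element gliders.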
For four-element gliders, changing parity is not equivalent to inverting contrast. For 6/7 of these stimuli, even parity generally led to a centroid motion percept, while odd parity generally led to a reverse-phi percept (i.e., a fraction correct that was significantly lower than 0.5).
While we use the “direction of centroid motion” primarily as a reference, there is some evidence that it is related to the underlying motion computations. Specifically, for stimuli that elicit a motion percept in the centroid direction at one parity and opposite to it at the other parity, the percept in the centroid direction was generally stronger. That is, the amount by which the fraction exceeded 0.5 (in the centroid direction) was larger than the amount by which it fell below 0.5 (opposite to the centroid direction). This difference was statistically significant for three-element gliders (p < 0.001) and four-element gliders (p < 0.01), via the surrogate data method described in the Methods section.
In summary, many stimuli constructed by parity rules on spatiotemporal gliders lead to a percept of motion. Like previous non-Fourier stimuli, there are no spatiotemporal luminance correlations. However, in addition, there are no pairwise spatiotemporal correlations of low-level features, i.e., flicker or edges. Nevertheless, many of these stimuli produce consistent apparent motion. Moreover, for many of these stimuli, inverting the overall contrast of the stimulus reverses the direction of apparent motion.
We now consider the implications of these observations for computational models of motion.
We begin with the standard (or Reichardt) model (Reichardt, 1961), in which motion is extracted from spatiotemporal correlations of luminance. The stimuli used in our experiment, however, have no pairwise spatiotemporal correlations at all (see Methods section). So for the standard model, there will be no motion signal at all. The motion energy model is mathematically equivalent to the Reichardt model (Adelson & Bergen, 1985). Therefore, it cannot detect any motion signal either.
The standard non-Fourier motion detection model consists of a preprocessing stage in which a feature such as spatial contrast (an edge) or temporal contrast (flicker) is extracted, followed by standard motion analysis (Lu & Sperling, 2001). None of the stimuli presented here have second-order correlations in the spatiotemporal locations of flicker or iso-oriented edges. Thus, standard motion analysis of these features cannot account for a directional motion signal. This is illustrated by example in Figure 3 and shown in general in Appendix A.
Note that this result (the absence of a motion signal in correlations of single pairs of checks, edges, or flicker) generalizes to any mechanism that calculates a quadratic (purely multiplicative) correlation between sums of checks, sums of edge tokens, or sums of flicker tokens. This is because in the pairwise product, the contributions of the individual pairs of checks simply add, and each pair's contribution is zero.
However, interactions of edges with flicker could produce a motion signal that accounts, partially, for what we observe. There are several variations on this idea. We describe these variants—which are presented as “existence proofs” of mechanisms that can generate motion signals from these stimuli—and then mention why these mechanisms can provide only a partial account of our findings.
In the gliders of Figure 6A, three elements occur at one time and one at an adjacent time. Most of these four-element gliders have two spatially adjacent voxels (whose border can form an edge), while the other two voxels occupy a single location in adjacent time frames (and so can produce flicker). As shown in Figure 7A, the parity of the number of black checks within the glider determines whether the presence of an edge at one pair of voxels implies the presence (or absence) of flicker at the other pair. For example, assume that the stimulus is generated using an “even” parity. That is, the total number of black checks at the glider locations is always an even number: 0, 2, or 4. This in turn means that either there is an edge at one check pair and a flicker at the other (1 black check each), or, no edge and no flicker (0 or 2 black checks each). The result is that an edge in one location is correlated with flicker in another location. Therefore, the motion signal can be extracted by correlating edges with flicker (Figure 7B).
This basic idea could also account for motion for some of the gliders in Figure 6B, in which two glider elements are adjacent in one time frame, and the other two glider elements are adjacent in the next time frame. Both pairs can form an edge, but the edges are orthogonal. If a motion mechanism can correlate the presence of orthogonal edges across time, a motion signal could be extracted. However, this kind of mechanism cannot account for a directional motion signal for the three-element gliders of Figure 5 or the four-element gliders of Figure 6A that contain three voxels in one frame, and one in the next.
For the three-element gliders, an interaction of an edge in one location with the luminance in another location and a separate time could lead to a motion signal. This is because the parity rule means that knowing whether two adjacent checks form an edge determines the luminance polarity of the third check (Figure 7C). Thus, a multiplicative interaction between an edge detector at one location and a luminance-sensitive element at a second location can identify the presence of a third-order spatiotemporal glider (Figure 7D).
We can recast the above examples in a way that suggests other kinds of computations that can extract a motion signal via a simple nonlinearity. The key ingredients in this construction are (1) summing luminances within the glider, (2) applying a nonlinearity f that is not merely quadratic, and (3) combining opponent mechanisms. As an example, we start with the same three-element glider in Figure 7C, and consider the nonlinearity $f(z) = z^3$. We denote the luminances of the three checks by $c_1$, $c_2$, and $c_3$, where black is represented by +1 and white by −1. Note that $f(z) = (c_1 + c_2 + c_3)^3$ contains a term $c_1 c_2 c_3$. This term effectively calculates the parity in the glider, since it is negative if there is an even number of black checks, and positive if there is an odd number. So, when the glider is placed on an even stimulus, this term is always negative. Note that this only happens because the three checks are constrained by the glider; if z summed any other set of three checks, the term can be either positive or negative with equal probability. (Since there are no pairwise correlations, the other terms in the expansion of $z^3$ do not contribute to its average over the stimulus.) Thus, we can construct an opponent mechanism by comparing the average value of the nonlinearity $z^3$ when applied to triplets of checks within the glider, to its average value when applied to triplets of checks within another configuration, i.e., the glider facing the opposite direction. For further details on this calculation, see Appendix B.
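The averages in this argument can be computed exactly by enumerating the eight colourings of three checks. The sketch below is an illustration only, not the calculation of Appendix B. The function name `mean_response` is hypothetical, and the comparison triplet (the glider facing the opposite direction) is modelled as three unconstrained checks, as the text states.

```python
from itertools import product

def mean_response(f, parity=None):
    """Average of f(c1 + c2 + c3) over triplets of checks (black = +1, white = -1).

    parity=0 or 1 restricts the triplets to those allowed by an even or
    odd glider rule; parity=None averages over all eight triplets.
    """
    values = []
    for checks in product((+1, -1), repeat=3):
        n_black = checks.count(1)
        if parity is None or n_black % 2 == parity:
            values.append(f(sum(checks)))
    return sum(values) / len(values)

def cube(z):
    return z ** 3

def square(z):
    return z ** 2

def half_square(z):
    return max(z, 0) ** 2

# Opponent signal for an even-parity stimulus: response within the glider
# minus the response within an unconstrained comparison triplet.
cubic_signal = mean_response(cube, 0) - mean_response(cube)
quadratic_signal = mean_response(square, 0) - mean_response(square)
```

Here the cubic nonlinearity gives an opponent signal of −6 for even parity (and +6 for odd parity), whereas squaring gives 0, in line with the requirement that the nonlinearity not be merely quadratic. Half-squaring also separates the two cases, with averages of 0.75 and 1.5.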
Table 1 generalizes this calculation to several nonlinearities. As can be seen, a cubic nonlinearity and half-squaring yield a motion signal for the three-element gliders. A fourth-order nonlinearity and full- or half-wave rectification yield a motion signal for the four-element gliders. The final example in the table, a nonlinearity with a more complex form, yields a motion signal for both three- and four-element gliders. (The example in the table is the front-end nonlinearity inferred by Taub, Victor, & Conte, 1997 that accounts for motion sensitivity for certain kinds of standard non-Fourier gratings.)
However, all of the above possibilities are at most a partial explanation for our findings, for three reasons. First, these mechanisms always yield motion whose direction has a fixed relationship to the centroid motion of each glider. This is not the case for the perceived motion: for some gliders it is in the centroid direction, and for some it is not. This cannot be a matter of the sign convention (e.g., black = +1, white = −1), since if the sign convention is wrong, the model should predict the opposite of the perceived motion for all gliders, and not just for some. Second, this kind of computation predicts that reversing the parity of the glider rule always reverses the sign of the correlation (no matter what is chosen for the sign convention). Thus, these models cannot account for why some gliders did not result in reverse phi. Third, all of these models generate a strong motion signal only when the summing area has the same spatiotemporal shape as the glider, which is not physiologically plausible.
Finally, we consider a model with a very different computational structure: the spatiotemporal gradient model (Johnston, McOwan, & Benton, 1999; Johnston, McOwan, & Buxton, 1992). It is based on the assumption that the luminance of image points is conserved during motion. This leads to the following constraint equation:
$$\frac{\partial I}{\partial x}\,u + \frac{\partial I}{\partial y}\,v + \frac{\partial I}{\partial t} = 0$$
where I denotes the image luminance, u = dx/dt, and v = dy/dt.
Because of the divisive interaction in this model, it is sensitive not only to first- and second-order correlations, but to higher order ones as well. Therefore (depending on the details of the gradient calculation), it could detect the high-order spatiotemporal correlations in our stimuli, and thus generate a motion signal. However, like the elaborations on the Reichardt detector, this can only be a partial account—because a polarity change does not influence the gradient. Thus, although a gradient model may extract a motion signal from the stimuli used here, it cannot account for the reverse phi that is often seen when luminance polarity is reversed.
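The insensitivity to polarity can be seen in a one-dimensional reduction of the luminance-conservation constraint. This is only an illustrative sketch: the symbol I for image luminance and the restriction to one spatial dimension are assumptions made here, and the full model of Johnston et al. involves more elaborate gradient calculations.

```latex
\begin{align*}
I_x\,u + I_t = 0 \;&\Rightarrow\; u = -\frac{I_t}{I_x}, \\
I \to -I \;&\Rightarrow\; u \to -\frac{-I_t}{-I_x} = -\frac{I_t}{I_x} = u .
\end{align*}
```

Both derivatives change sign under contrast inversion, so their ratio, and hence the velocity estimate, is unchanged.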
To probe the underlying computations of early motion processing, we created a set of stimuli that only contain spatiotemporal correlations of order 3 or more. In psychophysical experiments, most of these stimuli generated consistent motion percepts. In addition, for many of the stimuli that were perceived as moving, the direction of motion can be changed by changing the parity rule of the gliders that generate the stimuli, without changing the spatiotemporal configuration of the gliders themselves. Moreover, for a subset of the stimuli (three-element glider stimuli), this means that the perceived motion direction can be reversed by reversing the contrast polarity.
Although these stimuli are not likely to occur in nature, our motion-detecting mechanisms, which presumably are shaped by the characteristics of natural motion, nevertheless detect their spatiotemporal correlations. Simple augmentations of currently proposed models can account for some aspects of the percepts, but not for others—thus suggesting that a full account of motion percepts driven by high-order spatiotemporal correlations will lead to a more complete understanding of the computations underlying early motion processing.
We thank Mary Conte for comments on the manuscript. This work was supported in part by NIH EY7977 and NIH EY9314.
In this appendix, we prove some facts about the stimuli constructed from spatiotemporal gliders. Our main result is that, given certain technical conditions on the glider (that its generating function is a prime polynomial; see definitions below), there are no spatiotemporal correlations between pairs of horizontal luminance edges, between pairs of vertical luminance edges, or between pairs of temporal edges (flicker).
To obtain these results, we use two ingredients. The first ingredient is the work of Gilbert (1980), who studied properties of binary images in the plane—essentially, the 2-D analog of the movies considered here. We utilize specific results he obtained in 2-D that generalize immediately to 3-D, and we also use his general approach, namely, generating functions. Generating functions replace gliders by polynomials and allow us to recast questions about superpositions of gliders into algebraic ones. The second ingredient we use is an important algebraic property of polynomials, namely, that they form a “unique factorization domain” (Lang, 1993, Chap. II.5). This will allow us to reduce questions about quadruplets of voxels to questions about pairs.
Spatiotemporal binary movies, formally, are an assignment of a binary value to each voxel, V(ξ, η, τ)—where ξ is the x-coordinate of the voxel (as an integer), η is the y-coordinate of the voxel, τ is its time slice, and V is 0 or 1, according to the luminance. The spatiotemporal movies we consider here are defined by a “glider rule”: whenever a set of voxels form a translation of the glider shape, then the parity of the sum of their contents is constrained to be a constant b. We choose either b = 0 for the “even” parity movies, or b = 1 for the “odd” parity movies.
A glider is defined by a set of integer coordinates (ξi, ηi, τi; i = 1, …, N). Each such triplet designates the position of one voxel within the glider.
A glider rule is a constraint on the number of black voxels (V = 1) in a set of voxels that are related by a glider. That is, a glider rule is formalized by
$$\sum_{i=1}^{N} V(\xi + \xi_i,\ \eta + \eta_i,\ \tau + \tau_i) = b \qquad \text{(A1)}$$
The left-hand side counts the number of black checks within a placement of the glider. This sum is interpreted mod 2, so that the right-hand side is 0 when an even number of the terms is 1, and 1 when an odd number of the terms is 1.
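As a concrete illustration of a glider rule, the following minimal sketch builds a binary movie in one spatial dimension that obeys the rule for the triangle glider with voxels at (0, 0), (1, 0), and (1, 1) in (x, t). The function names, the movie size, and the seeding scheme (a random first frame plus a random first column, in the spirit of Gilbert's "initial set") are assumptions made for illustration, not the procedure used to generate the experimental stimuli.

```python
import random

def make_triangle_movie(width, frames, b=0, seed=0):
    """Binary movie V[t][x] with V(x,t) + V(x+1,t) + V(x+1,t+1) = b (mod 2)."""
    rng = random.Random(seed)
    # Hypothetical seeding: the first frame and the first voxel of every later
    # frame are random; every other voxel is then forced by the glider rule.
    V = [[rng.randint(0, 1) for _ in range(width)]]
    for t in range(1, frames):
        prev = V[t - 1]
        row = [rng.randint(0, 1)]
        for x in range(1, width):
            row.append((b + prev[x - 1] + prev[x]) % 2)
        V.append(row)
    return V

def glider_parities(V):
    """Set of parities observed over all placements of the triangle glider."""
    return {
        (V[t][x] + V[t][x + 1] + V[t + 1][x + 1]) % 2
        for t in range(len(V) - 1)
        for x in range(len(V[0]) - 1)
    }

even_movie = make_triangle_movie(64, 64, b=0)
odd_movie = make_triangle_movie(64, 64, b=1)
print(glider_parities(even_movie), glider_parities(odd_movie))  # {0} {1}
```

Every placement of the glider has the same parity b, which is exactly the constraint expressed by the glider rule.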
Since the glider rule (Equation A1) applies at every location (ξ, η, τ), it induces constraints among many sets of voxels, not just those within the original single glider. Therefore, to understand the correlation structure of a spatiotemporal movie, we need to consider the implications of repeated instances of the glider rule (Equation A1). Specifically, we determine the effect of applying the glider rule R times; each application of the glider is done at a different starting point (ξ’j, η’j, τ’j), j = 1, …, R. These iterated applications lead to
$$\sum_{j=1}^{R}\sum_{i=1}^{N} V(\xi + \xi_i + \xi'_j,\ \eta + \eta_i + \eta'_j,\ \tau + \tau_i + \tau'_j) = Rb \qquad \text{(A2)}$$
where again this sum is interpreted mod 2. The coordinates of the starting points (ξ’j, η’j, τ’j) can be positive or negative. (For R = 1 and a single starting point at (0, 0, 0), the above equation reduces to Equation A1.)
Equation A2 demonstrates the basic problem we have to solve. As both sums range over their indices i and j, some combinations (ξi + ξ’j, ηi + η’j, τi + τ’j) appear multiple times. If they occur an even number of times, they cancel, since the sum is interpreted mod 2. Thus, repeated application of a glider at the R locations (ξ’j, η’j, τ’j) can result in a relationship (Equation A2) in which there is cancellation among many of the RN terms in the summation. This will in turn result in a parity constraint among far fewer than RN voxels. We need to characterize these possible cancellations to understand whether repeated application of the glider rule can ever result in correlations involving only a small number of voxels.
To determine the effects of multiple applications of the glider rule, we (following Gilbert, 1980) introduce a generating-function approach. We define a generating function for a glider as follows. For a glider G with N elements occupying the voxels at integer coordinates (ξi, ηi, τi; i = 1, …, N), its generating function is defined by
$$G(x, y, t) = \sum_{i=1}^{N} x^{\xi_i} y^{\eta_i} t^{\tau_i} \qquad \text{(A3)}$$
Generating functions thus establish a correspondence between sets of voxels and polynomials.
We can always choose our coordinates so that one voxel of the glider is at (0, 0, 0). With this convention, a glider that is a triangle in the (x, t) plane (Figure 4A) has a generating function Gtriangle = 1 + x + xt. A glider that forms a pyramid (Figure 4C) has a generating function Gpyramid = 1 + xy + y + xt. A glider that has two adjacent voxels at t = 0 along the y-axis that translates along the x-axis at t = 1 has a generating function Gsnf = 1 + y + xt + xyt, so-called because it generates a “standard” non-Fourier movie (column 5 of Figure 6B).
Similarly, if we have a scheme S for repeated application of the glider rule specified by (ξ’j, η’j, τ’j), we can define its generating function by
$$S(x, y, t) = \sum_{j=1}^{R} x^{\xi'_j} y^{\eta'_j} t^{\tau'_j} \qquad \text{(A4)}$$
Since generating functions are polynomials, they can be multiplied. For example,
$$GS = \sum_{i=1}^{N}\sum_{j=1}^{R} x^{\xi_i + \xi'_j}\, y^{\eta_i + \eta'_j}\, t^{\tau_i + \tau'_j} \qquad \text{(A5)}$$
Note that the combinations that occur in the exponents on the right side of Equation A5 are exactly the combinations that occur as additive offsets in Equation A2. This means that the generating function GS corresponds to the voxels that are constrained by the iterated glider rule (Equation A2). This is formalized by Gilbert’s Theorem 2: for configurations T, the total parity of their contents is constrained by the glider G if and only if the generating-function relationship
$$GS = T \qquad \text{(A6)}$$
holds, for some scheme S. Gilbert also showed that if there is no such constraint, then both parities are equally likely. He proved these results for 2-dimensional colorings, but they generalize immediately to colorings of any dimension.
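The correspondence between iterated glider rules and polynomial products can be sketched in a few lines. Here a polynomial with coefficients mod 2 is represented as a set of exponent triples (one per monomial in x, y, t), so that a term appearing twice cancels. The representation, the function name `gf2_mul`, and the particular scheme S are hypothetical choices made for illustration.

```python
from itertools import product

def gf2_mul(A, B):
    """Multiply two mod-2 polynomials given as sets of (ex, ey, et) exponents."""
    out = set()
    for a, b in product(A, B):
        term = (a[0] + b[0], a[1] + b[1], a[2] + b[2])
        out ^= {term}  # a monomial produced an even number of times cancels
    return frozenset(out)

ONE = (0, 0, 0)
G_triangle = {ONE, (1, 0, 0), (1, 0, 1)}   # 1 + x + xt
S = {ONE, (1, 0, 0)}                        # hypothetical scheme: starts at (0,0,0) and (1,0,0)

T = gf2_mul(G_triangle, S)
print(sorted(T))  # [(0, 0, 0), (1, 0, 1), (2, 0, 0), (2, 0, 1)]

# The standard non-Fourier glider is the product of an edge and a displacement.
G_snf = gf2_mul({ONE, (0, 1, 0)}, {ONE, (1, 0, 1)})   # (1 + y)(1 + xt)
print(sorted(G_snf))  # [(0, 0, 0), (0, 1, 0), (1, 0, 1), (1, 1, 1)]
```

In the first example the RN = 6 terms collapse to T = 1 + x² + xt + x²t because the monomial x occurs twice, so two applications of the triangle glider constrain the parity of only four voxels.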
Our goal is to determine conditions on G for which there are no pairwise spatiotemporal correlations between pairs of horizontal luminance edges, pairs of vertical luminance edges, or pairs of temporal edges (flicker). Other than the labels associated with the coordinates, these three cases are identical, so we focus on the case of edges formed by pairs of adjacent voxels along the x-axis. We will show that the crucial condition is that the generating function of G is prime, in the sense defined below.
To do this, we consider all “double-domino” configurations T. A double-domino configuration is a configuration of four voxels, arranged in two parallel pairs of adjacent voxels. We need to show that the total parity of T is unconstrained by the glider. This will imply that the total parity within one domino of T is independent of the total parity in the other domino of T. Since the total parity within one domino indicates the presence of an edge (1) or the absence of an edge (0), this will show that the co-occurrences of edges are uncorrelated. As mentioned above, we assume that the dominos are parallel to the x-axis.
To determine whether the glider constrains the parity of T, we position the first domino of T at coordinates (0, 0, 0) and (1, 0, 0), and the second domino at coordinates (ξ, η, τ) and (ξ + 1, η, τ). The generating function of this four-voxel set is
$$T = 1 + x + x^{\xi} y^{\eta} t^{\tau} + x^{\xi + 1} y^{\eta} t^{\tau} = (1 + x)(1 + x^{\xi} y^{\eta} t^{\tau}) \qquad \text{(A7)}$$
T thus can be factored into two polynomials. The first factor, 1 + x, is the generating function for a domino along the x-axis. The second factor, T’ = 1 + x^ξ y^η t^τ, is the generating function of a 2-voxel configuration that expresses the relative position of the two dominos of T.
According to Gilbert’s Theorem 2 (see Equation A6), the total parity of this configuration is constrained only if GS = T, i.e., only if
$$GS = (1 + x)(1 + x^{\xi} y^{\eta} t^{\tau}) \qquad \text{(A8)}$$
for some configuration S.
We next use the fact that the polynomials with integer coefficients (mod 2) form a unique factorization domain (Lang, 1993): every polynomial has a unique factorization into “prime” polynomials. Here, a “prime” is defined in the usual way: a polynomial that has no factors other than 1 and itself. So for example the polynomial (1 + x) is a prime, but (1 + x³) is not, since (1 + x³) = (1 − x + x²)(1 + x).
Note that for the gliders used to make the novel stimuli, the generating polynomials are prime: for example, Gtriangle = 1 + x + xt and Gpyramid = 1 + xy + y + xt. However, the generator for standard non-Fourier motion, Gsnf = 1 + y + xt + xyt = (1 + y) (1 + xt), is not a prime. This factorization is the algebraic correlate of the fact that in the standard non-Fourier stimulus, an edge—represented by the term 1 + y—is correlated across space and time, represented by the term 1 + xt. In other words, if a glider’s generating function can be factored, then it corresponds to a displaced pair of parallel edges. However, if it is prime—as it is for the stimuli we focus on—then it cannot be decomposed into a displaced pair of edges.
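For gliders this small, primality can be checked by exhaustive search. The sketch below represents a polynomial mod 2 as a set of exponent triples and tries every pair of candidate factors; because per-variable degrees add under multiplication, no factor can have exponents exceeding those of G, so the search space is finite. The representation and the function names are hypothetical, and the brute-force approach is only practical for very small polynomials.

```python
from itertools import combinations, product

def gf2_mul(A, B):
    """Product of two mod-2 polynomials, each a set of (ex, ey, et) exponents."""
    out = set()
    for a, b in product(A, B):
        out ^= {(a[0] + b[0], a[1] + b[1], a[2] + b[2])}  # duplicates cancel
    return frozenset(out)

def degree(P):
    """Per-variable degree (in x, y, t) of a nonzero polynomial."""
    return tuple(max(m[k] for m in P) for k in range(3))

def is_prime(G):
    """True if G cannot be written as a product of two factors other than 1."""
    G = frozenset(G)
    deg = degree(G)
    monomials = list(product(*(range(d + 1) for d in deg)))
    unit = frozenset({(0, 0, 0)})
    candidates = [
        (frozenset(c), degree(c))
        for r in range(1, len(monomials) + 1)
        for c in combinations(monomials, r)
        if frozenset(c) != unit
    ]
    for A, dA in candidates:
        for B, dB in candidates:
            # Degrees add under multiplication, so skip impossible pairs.
            if all(dA[k] + dB[k] == deg[k] for k in range(3)):
                if gf2_mul(A, B) == G:
                    return False
    return True

ONE = (0, 0, 0)
G_triangle = {ONE, (1, 0, 0), (1, 0, 1)}            # 1 + x + xt
G_pyramid = {ONE, (1, 1, 0), (0, 1, 0), (1, 0, 1)}  # 1 + xy + y + xt
G_snf = {ONE, (0, 1, 0), (1, 0, 1), (1, 1, 1)}      # 1 + y + xt + xyt

print(is_prime(G_triangle), is_prime(G_pyramid), is_prime(G_snf))  # True True False
print(is_prime({ONE, (3, 0, 0)}))  # 1 + x^3 -> False
```

The search finds the factorization (1 + y)(1 + xt) of the standard non-Fourier generator and no factorization of the triangle or pyramid generators.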
We now make use of the assumption that G is a prime to show that there is no iterated application of G that can constrain the four voxels in a double-domino configuration T. Because of the unique factorization property, the left and right sides of Equation A8 must have the same prime factors. Since (1 + x) is a prime, it cannot have G as a factor. Thus, the only way that the left and right sides of Equation A8 can have the same prime factors is that T’ = 1 + x^ξ y^η t^τ must be composite and contain G as a factor. That is, if Equation A8 holds, then so does
$$GQ = 1 + x^{\xi} y^{\eta} t^{\tau} \qquad \text{(A9)}$$
for some configuration Q. In sum, we reduced a question about correlations within the 4-voxel set T to a question about correlations within a two-voxel set T’. We now show that Equation A9 is impossible, and, consequently, that Equation A8 is impossible too, thus implying that the total parity of the voxels of T is unconstrained.
If Equation A9 holds, then it would imply that the voxels in the configuration with generating function T’ = 1 + x^ξ y^η t^τ are correlated. This configuration contains one voxel at the origin, and one at the location (ξ, η, τ). The independence of these two voxels follows from Gilbert’s construction of an “initial set” (Gilbert, 1980, Figure 1), in which all voxels are independent. Alternatively, one can see that these two voxels must be independent for specific gliders such as Gtriangle = 1 + x + xt by observing that any product GQ must contain at least three distinct terms that do not cancel: one with the highest exponent of x, one with the highest exponent of t, and one with the lowest total exponent.
Note that the above argument generalizes to any four-voxel configuration T that forms a parallelogram, by replacing the factor (1 + x) in Equation A8 with a factor (1 + x^α y^β t^γ). Thus, even if local features are defined in terms of pairwise correlations across non-adjacent voxels (i.e., by (1 + x^α y^β t^γ)), such local features are themselves pairwise uncorrelated in spacetime.
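The absence of pairwise edge correlations can also be checked empirically on a simulated movie. The following sketch regenerates a triangle-glider movie in one spatial dimension and estimates the correlation between x-edges at several displacements. The function names, movie size, seeding scheme, and choice of displacements are assumptions made for illustration; the estimates are subject to sampling noise and are not a substitute for the proof above.

```python
import random

def make_triangle_movie(width, frames, b=0, seed=0):
    """Binary movie V[t][x] with V(x,t) + V(x+1,t) + V(x+1,t+1) = b (mod 2)."""
    rng = random.Random(seed)
    V = [[rng.randint(0, 1) for _ in range(width)]]   # random first frame
    for t in range(1, frames):
        prev = V[t - 1]
        row = [rng.randint(0, 1)]                      # random first column
        row += [(b + prev[x - 1] + prev[x]) % 2 for x in range(1, width)]
        V.append(row)
    return V

def edge_pair_correlation(V, dx, dt):
    """Mean of +1 (edge states agree) or -1 (disagree) for x-edges displaced
    by (dx, dt), with dx >= 0 and dt >= 0; 0 means uncorrelated."""
    total = count = 0
    for t in range(len(V) - dt):
        for x in range(len(V[0]) - 1 - dx):
            e1 = V[t][x] ^ V[t][x + 1]
            e2 = V[t + dt][x + dx] ^ V[t + dt][x + dx + 1]
            total += 1 if e1 == e2 else -1
            count += 1
    return total / count

movie = make_triangle_movie(120, 120, b=0, seed=3)
for dx, dt in [(1, 0), (0, 1), (1, 1), (2, 1), (3, 2)]:
    print((dx, dt), round(edge_pair_correlation(movie, dx, dt), 3))
```

All estimates should lie close to zero, whereas the three-voxel parity defined by the glider itself is fixed at every placement.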
Finally, we mention that the “initial set” construction of Gilbert (1980) provides a simple demonstration of another property of these stimuli: any number of voxels along any single spacetime ray is uncorrelated.
In this appendix, we show how a motion signal can be extracted by the opponent mechanism that compares the average value of a nonlinearity applied to the glider with the average value of the same nonlinearity applied to the glider facing the opposite direction.
As an example, we apply the nonlinearity f(z) = z³ to a stimulus generated by a three-element glider G and the even-parity rule. We denote the luminances of the three checks within a glider by c₁, c₂, and c₃, where black is represented by +1 and white by −1. Therefore, the coloring of the glider placed at any position and time in the stimulus can be represented by a triplet (c₁, c₂, c₃).
Since this is an opponent mechanism, and it involves the average of all placements of the glider, we first need to list all the possible colorings of the glider G, and of the glider facing the opposite direction, denoted G’. Since the stimulus is constructed with glider G and the even-parity rule, the number of black voxels in G can only be 0 or 2. So the colorings of G have only 4 possibilities: (+1, +1, −1), (+1, −1, +1), (−1, +1, +1), and (−1, −1, −1). In contrast, the coloring of G’ has no such parity constraint, so the colorings can be (+1, +1, +1), (+1, +1, −1), (+1, −1, +1), (+1, −1, −1), (−1, +1, +1), (−1, +1, −1), (−1, −1, +1), and (−1, −1, −1), a total of 8 possibilities.
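Table B1.

| Glider | (c₁, c₂, c₃) | z = c₁ + c₂ + c₃ | f(z) = z³ | Average of f(z) |
|---|---|---|---|---|
| G | (+1, +1, −1) | 1 | 1 | −6 |
| | (+1, −1, +1) | 1 | 1 | |
| | (−1, +1, +1) | 1 | 1 | |
| | (−1, −1, −1) | −3 | −27 | |
| G’ | (+1, +1, +1) | 3 | 27 | 0 |
| | (+1, +1, −1) | 1 | 1 | |
| | (+1, −1, +1) | 1 | 1 | |
| | (+1, −1, −1) | −1 | −1 | |
| | (−1, +1, +1) | 1 | 1 | |
| | (−1, +1, −1) | −1 | −1 | |
| | (−1, −1, +1) | −1 | −1 | |
| | (−1, −1, −1) | −3 | −27 | |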
Next, the luminances within each possible coloring of the glider are summed, and the nonlinearity f applied. Then, we compare the average of this signal for G and G’. In each case, the allowed colorings are all equally likely (see Appendix A and Gilbert, 1980), so they contribute equally to the average. The process is detailed in Table B1.
The table shows that the glider G generates an average signal of −6, and its mirror G’ generates an average signal of 0. That is, for this particular nonlinearity and stimulus, the opponent mechanism yields a negative motion signal: m(G), the signal for G minus the signal for G’, is (−6) − 0 = −6.
Note that the above calculation process can be used on different nonlinearities and stimuli generated with different gliders and parity rules.
Because we want to compare the motion signals generated by different nonlinearities in a manner that focuses on their shape rather than their absolute amplitude, we normalize the motion signal m(G). That is, we divide the motion signal m(G) by the root-mean-squared value that the nonlinearity would produce when placed on a random binary movie. The results in Table 1 are generated by this method.
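The calculation of Table B1, and its extension to other nonlinearities and glider sizes, can be sketched as follows. The function names are hypothetical. The normalization step is one possible reading of the description above, taking the root-mean-squared value of f(z) over all unconstrained colorings; that reading is an assumption, and the resulting numbers are not claimed to reproduce Table 1.

```python
from itertools import product
from math import sqrt

def colorings(n, parity=None):
    """All +/-1 colorings of n checks (black = +1). If parity is given, keep
    only colorings whose number of black checks is even (0) or odd (1)."""
    return [
        c for c in product((+1, -1), repeat=n)
        if parity is None or c.count(+1) % 2 == parity
    ]

def mean_response(f, cols):
    """Average of f(z) over equally likely colorings, with z the summed luminance."""
    return sum(f(sum(c)) for c in cols) / len(cols)

def opponent_signal(f, n, parity):
    """m(G): mean response within the glider minus mean response for the
    unconstrained (opposite-direction) configuration."""
    return mean_response(f, colorings(n, parity)) - mean_response(f, colorings(n))

def normalized_signal(f, n, parity):
    """Assumed normalization: divide by the RMS of f(z) on random colorings."""
    rms = sqrt(mean_response(lambda z: f(z) ** 2, colorings(n)))
    return opponent_signal(f, n, parity) / rms

def cube(z):
    return z ** 3

def square(z):
    return z ** 2

print(opponent_signal(cube, 3, 0))    # -6.0, as in Table B1
print(opponent_signal(square, 3, 0))  # 0.0: a purely quadratic nonlinearity gives no signal
print(opponent_signal(abs, 4, 0))     # -0.5: full-wave rectification, four-element glider
print(normalized_signal(cube, 3, 0))
```

Passing a different function for `f`, a different glider size `n`, or the odd-parity rule (`parity=1`) repeats the same calculation for other cases.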
Commercial relationships: none.
Qin Hu, Tri-Institutional Training Program in Computational Biology and Medicine, New York, NY, USA, & Department of Neurology and Neuroscience, Weill Cornell Medical College, New York, NY, USA.
Jonathan D. Victor, Department of Neurology and Neuroscience, Weill Cornell Medical College, New York, NY, USA.