Using drifting compound grating stimuli matched in energy and frequency spectrum, we previously showed that neurons in the primary visual cortex (V1) were tuned to line-like, edge-like, and intermediate one-dimensional features. Because these compound grating stimuli were drifting, allowing for potential interaction between shape and motion, we examine here the dependence of V1 feature tuning on drift speed. We find that the feature selectivity and specificity of individual V1 neurons strongly depend on speed. A simple model explains these observations in terms of an interaction between linear filtering by the receptive field and the static nonlinearity of spike threshold, embedded in a recurrent network. Although the speed-dependent behaviors in single V1 neurons preclude their acting as extractors of one-dimensional features, the population as a whole retains a representation of a full suite of features.
Lines and edges are salient features and their detection and discrimination are implicated in processes fundamental to object vision, including image segmentation, contour continuation (Field et al. 1993, 2000), and completion (Kovacs and Julesz 1993). Various extrastriate visual cortical areas were previously physiologically identified as candidates to process assignment of boundary ownership, contour integration, and figure–ground segregation (for a thorough review, see, e.g., von der Heydt 2003), all of which depend on low-level local feature extraction and manipulation. The local image processing that takes place in the primary visual cortex (V1) appears to receive global context by top-down modulatory feedback from extrastriate areas that extract texture boundaries (Zipser et al. 1996) or collinear contours (Kourtzi et al. 2003; Polat et al. 1998). However, bottom-up feature processing might already begin in earnest at the earliest cortical stage of visual processing: we provided evidence in our first study of this subject (Mechler et al. 2002) that typical neurons of the primary visual cortex in the anesthetized primate already exhibit “feature tuning” to optimally oriented one-dimensional spatial profiles, including lines and edges. Although in that study we used drifting stimuli, we did not examine how those results might have depended on the drift velocity of the stimulus.
The possible dependence of feature tuning on velocity is important from several points of view. First, V1 neurons can be considered to signal the presence of these features only if they do so in a velocity-independent way. Second, psychophysical studies show various degrees of degradation of visual performance with increasing speed (Burr et al. 1986; Morgan and Castet 1995). Finally, an increasing number of neurophysiological studies suggest, contrary to previous assertions (Ungerleider and Haxby 1994) of parallel processing of shape and motion, that these two streams of scene analysis are not independent at various stages of extrastriate visual processing (Desimone and Schein 1987; Tolias et al. 2005). Our study fits in this context by seeking to elucidate the velocity dependence of how single V1 neurons and their ensembles represent the stimulus attributes that determine one-dimensional spatial features.
The view that single neurons function as feature detectors, which would imply speed invariance among other characteristics, enjoyed early but not uncontroversial popularity (Barlow 1972; Lettvin et al. 1959) and, when applied to the primary visual cortex, initially appeared to gain support from influential early experiments (Hubel and Wiesel 1962) on simple cells. However, decades of work consistently failed to turn up direct experimental evidence for the single-neuron-as-detector view in any cortical area examined. The evidence accumulated in V1, reviewed most recently by Carandini et al. (2005), instead favors the current consensus, according to which V1 neurons represent banks of variously tuned nonlinear filters that adapt to local contrast energy. The “adaptive filter” view is validated by results obtained mostly with stimuli confined to a narrow frequency band such as gratings and Gabor patches. However, salient features such as lines and edges are defined by phase coherences across a range of spatial frequencies (Morrone and Burr 1988). In fact, natural stimuli (which are natural because of, among other factors, their highly nonrandom local phase spectra) highlighted the weaknesses of the current adaptive filter model by pointing to the need for the incorporation of a pattern-selective modulatory influence (Felsen et al. 2005). In the absence of a vetted nonlinear model of sufficient accuracy (Rust and Movshon 2005; Wu et al. 2006), the sensitivity of cortical neurons to features defined by phase cannot be predicted from their sensitivity to sinusoidal gratings or Gabor patches, but rather, must be determined experimentally.
To this end, we use a family of compound gratings (whose spatial frequencies span a sevenfold range), parameterized by phase congruence (Morrone and Burr 1988). The stimuli are matched in spectrum and energy, to eliminate any confounding effects of spatiotemporal filtering on feature tuning. Using this stimulus set, we showed earlier (Mechler et al. 2002) that typical V1 neurons have nonlinearities that allow them to exhibit “feature tuning” to optimally oriented line-like, edge-like, and intermediate one-dimensional spatial profiles. Here we find that speed strongly influenced specificity and depth of feature tuning of individual neurons. These speed-induced changes in feature tuning were comparable in simple cells and complex cells. We also find that, although the feature tuning of individual V1 neurons is strongly speed dependent, the population as a whole retained a full suite of feature analyzers.
Finally, we analyze a simple model to see how well feature tuning is explained, in qualitative terms, by the known basic properties of V1. We consider a recurrent network model for V1 that was proposed to account for the range of behaviors across the simple–complex gamut observed in response to single gratings (Chance et al. 1999). In the model, feature selectivity essentially arises from the interaction between the phase-sensitive linear kernel and the static nonlinearity of the spike threshold. This “iceberg” effect can be either diluted by a phase-insensitive recurrent pooling or compounded by phase-biased recurrent pooling or inhomogeneity in the network. We show that this model accounts for several aspects of responses to compound gratings that we observed experimentally: at each speed, there is a full representation within the V1 population of the entire space of one-dimensional features; there is a comparable degree of feature tuning at different speeds and in simple and complex cells; moreover, this tuning has a comparable degree of speed dependence.
Our results are qualitatively consistent with the consensus view that V1 neurons are adapting nonlinear filters. Specifically, our experimental observations constitute direct evidence against the possibility that individual orientation-selective V1 simple cells function as detectors of oriented lines or edges. Rather, it appears V1 neurons provide an ensemble with selectivity and coding properties that depend dynamically on the stimulus.
Standard acute preparation techniques were used for electrophysiological recordings from single units in the primary visual cortex (V1) of the primate (cynomolgus monkeys, Macaca fascicularis) previously described in detail (Mechler et al. 1998, 2002). All procedures were in accordance with institutional and National Institutes of Health guidelines for the care and experimental use of animals.
In brief, extracellular recordings were made with tetrodes (quartz-coated platinum–tungsten fibers; Thomas Recording, Giessen, Germany) placed in the occipital cortex (near Horsley–Clarke 14 mm posterior, 14 mm lateral) of 14 adult animals under general opiate (sufentanil) anesthesia and muscle paralysis. The analogue signal from each tetrode channel was amplified, filtered (0.6–6 kHz), and digitized (25 kHz). Multiple single units were isolated by cluster analysis of spike waveforms initially performed on-line (Autocut, DataWave Technologies) then off-line (custom software; Reich 2001). Isolation criteria included stability of principal components of spike waveforms and a 1.2-ms minimum interspike interval consistent with a physiologic refractory period. Spike times were identified to 0.1-ms precision. Recording tracks and the laminar position of recording sites were anatomically reconstructed using standard histological techniques (Mechler et al. 2002).
The pupils were dilated with topical atropine and covered with gas-permeable contact lenses (Metro Optics, Houston, TX). Artificial pupils (2 mm) and corrective lenses were used to focus the stimulus on the retina. Optical correction was optimized by the aid of responses of isolated single units to high spatial frequency visual stimuli.
Foveae and the receptive fields of isolated neurons were mapped on a tangent board. Visual stimuli were generated by a special-purpose stimulus generator (Milkman et al. 1978, 1980) under the control of a PDP-11/93 computer and displayed on a Tektronix 608 monochrome oscilloscope (green phosphor, 150 cd/m² mean luminance, 270.32 Hz frame refresh). Luminance of the display was linearized with lookup tables in the range 0 to 300 cd/m². At the 114-cm viewing distance of the animal, the stimuli appeared in a 4° circular aperture on dark background.
The receptive fields of isolated single units fell between 3 and 6° eccentricity and were always fully covered by the stimulus patch. The receptive fields were characterized in a standard way using drifting sine gratings: tuning was measured first for orientation, then for spatial frequency, and finally for temporal frequency, each parameter optimized for subsequent tuning measurements. The contrast response function was measured using the optimal sine grating. With tetrodes, simultaneous isolation of two to eight (on average, three) single units per site was routine. To keep experimental time within practical limits, receptive field characterization (i.e., finding the optimal grating) was limited to the most responsive one or two units.
In each trial of the main experiment, taken at a fixed stimulus drift velocity, each of eight compound gratings, each of the four component gratings, and one blank stimulus was presented for 4 s in a randomly interleaved sequence. Trials were rerandomized and repeated (typically 12 to 25 times) until a target signal-to-noise ratio was obtained for at least one isolated unit. The experiment was then repeated with fourfold increase in the drift speed (by changing the temporal but not the spatial fundamental frequency).
Compound gratings were of near-optimal orientation and drifting in the optimal direction for the V1 neurons. As in our previous study (Mechler et al. 2002), each of our compound-grating stimuli was a superposition of the first four odd harmonics of a common fundamental, each with a contrast inversely proportional to the harmonic number. Here, a brief formal description of the stimuli follows.
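Let ν denote the spatial frequency; f, the temporal frequency; and C, the Michelson contrast of the fundamental component. Formally, the spatiotemporal light intensity variation around its mean for the component grating at harmonic m (m = 1, 3, 5, 7) is given by
$$w_m(x, t; \nu, f, \varphi) = \frac{C}{m}\cos\left[2\pi m(\nu x - f t) - \varphi\right] \tag{1}$$
and, for a compound grating, summing the above components, it is given by
$$W(x, t; \nu, f, \varphi) = \sum_{m \in \{1, 3, 5, 7\}} w_m(x, t; \nu, f, \varphi) \tag{2}$$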
The parameter φ is the phase of each component grating at the origin.
Across a stimulus set, with the spatial and temporal frequencies and the contrasts of the four components fixed, the phase φ was varied systematically to specify the shape of the compound waveform. With the spatial origin (x = 0) centered on the display, all component gratings share the same phase at the center of the display at time t = 0. If φ = 0, each component peaks at x = 0. Because they reinforce each other, they produce a line-like shape. If φ = π/2, the components’ sharpest rising parts coincide at x = 0 and, reinforcing each other, produce an edge-like shape, as expected because they constitute the truncated Fourier approximation of a square wave. Following Morrone and Burr (1988), we therefore designate φ the “congruence phase” of the compound grating.
The feature space, defined by the congruence phase, is periodic in π. Because compound gratings are sums of only odd harmonics, two stimuli whose congruence phases differ by π have identical spatial waveforms save for a half-cycle shift, which makes them equivalent as periodic stimuli. As shown in Fig. 1, we sampled the congruence phase in eight equal steps on the [0, π) phase interval to construct eight different rigidly drifting compound waveforms.
The compound gratings thus constructed constitute a set of equal-energy stimuli because the amplitudes of the components were the same for each stimulus. The root-mean-square contrast was 0.38 for each compound grating, corresponding to C = 0.5 in Eq. 2. Note that the Michelson contrast varies with the congruence phase as ~|cos(φ)|, with the maximum (0.84) realized by the line and the minimum (0.47) by the edge. The reader is referred to the preceding paper (Mechler et al. 2002) for a fuller discussion of the mathematical properties of these compound gratings.
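The contrast values quoted above can be checked numerically. The following is a minimal sketch, assuming each component has the form (C/m)·cos[2πm(νx − ft) − φ] for the odd harmonics m = 1, 3, 5, 7; the sign convention for φ and the function names are assumptions made for illustration and do not affect the contrast values.

```python
import math

HARMONICS = (1, 3, 5, 7)  # first four odd harmonics of the fundamental

def compound_grating(x, t, nu, f, phi, C=0.5):
    """Luminance modulation around the mean (assumed component form)."""
    return sum((C / m) * math.cos(2 * math.pi * m * (nu * x - f * t) - phi)
               for m in HARMONICS)

def contrast_stats(phi, C=0.5, n=4096):
    """RMS and Michelson contrast over one spatial period at t = 0."""
    samples = [compound_grating(i / n, 0.0, 1.0, 0.0, phi, C) for i in range(n)]
    rms = math.sqrt(sum(s * s for s in samples) / n)
    # Odd harmonics make the waveform antisymmetric under a half-period
    # shift, so max = -min and the Michelson contrast is (max - min) / 2.
    michelson = (max(samples) - min(samples)) / 2
    return rms, michelson

# Eight congruence phases sampled in equal steps on [0, pi)
for k in range(8):
    phi = k * math.pi / 8
    rms, mich = contrast_stats(phi)
    print(f"phi = {k}/8 pi   rms = {rms:.3f}   Michelson = {mich:.3f}")
```

Under these assumptions the RMS contrast is the same at every congruence phase, whereas the Michelson contrast is largest for the line (φ = 0) and smallest for the edge (φ = π/2).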
Two drift velocities were used to determine how stimulus speed interacted with a neuron’s sensitivity to congruence phase. Drift velocity, V = f/ν, was changed from V = 3.1 deg/s “low” speed to V = 12.4 deg/s at “high” speed. This was done by increasing the temporal frequency of each component grating fourfold while keeping their spatial frequency fixed (the fundamental was at ν = 0.25 c/deg). The specific temporal frequencies used for the fundamental and the higher harmonics were (values in Hz) f = 0.78, 3f = 2.34, 5f = 3.90, and 7f = 5.46 at low speed; and f = 3.12, 3f = 9.36, 5f = 15.6, and 7f = 21.84 at high speed. Because all recordings were at approximately the same eccentricity, this choice allowed all four components of the compound grating to be within the spatiotemporal pass-band of each cell at a “low” speed. A “data set” denotes recordings of responses of one cell to the eight compound gratings at a single drift velocity.
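As a quick arithmetic check of the values above, the sketch below recomputes the drift velocity V = f/ν and the component temporal frequencies at both speeds. The helper names are illustrative only.

```python
NU = 0.25                  # spatial frequency of the fundamental, c/deg
HARMONICS = (1, 3, 5, 7)   # odd harmonics present in the compound grating

def drift_velocity(f_hz, nu=NU):
    """Drift velocity in deg/s for temporal frequency f_hz (Hz)."""
    return f_hz / nu

def component_frequencies(f_hz, nu=NU):
    """(spatial c/deg, temporal Hz) of each harmonic component."""
    return [(m * nu, m * f_hz) for m in HARMONICS]

for label, f in (("low", 0.78), ("high", 3.12)):
    print(label, f"V = {drift_velocity(f):.2f} deg/s")
    for sf, tf in component_frequencies(f):
        # Every harmonic has the same ratio tf / sf, so the waveform drifts rigidly.
        print(f"  {sf:.2f} c/deg  {tf:.2f} Hz  ->  {tf / sf:.2f} deg/s")
```

Because each harmonic scales the spatial and temporal frequency by the same factor, all four components share one velocity, which is what makes the compound waveform drift rigidly.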
The 63 cells with 100 data sets (out of a total of 226 data sets recorded in 137 cells) selected for analysis were those that 1) maintained good spike isolation throughout the experiment and 2) passed a signal-to-noise criterion in the compound-grating experiments. Signal variance was defined for each Fourier component as the squared Fourier amplitude of the trial-averaged response to each compound grating summed over all stimuli. Noise was defined as the trial-by-trial variance of the same component summed over all stimuli. The selection criterion required that the median ratio of signal over noise variance taken over the first eight Fourier components of the response be >0.3.
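The selection criterion can be sketched as follows. This is an illustration under stated assumptions, not the analysis code used for the paper: responses are taken to be binned firing rates over a whole number of stimulus cycles, the "first eight Fourier components" are taken to be harmonics 1 to 8 (excluding the DC), and the trial-by-trial variance uses the unbiased (n − 1) estimator. The data layout and function names are hypothetical.

```python
import cmath
import statistics

def fourier_component(rate, k):
    """Complex amplitude of the k-th harmonic of a binned response."""
    n = len(rate)
    return sum(r * cmath.exp(-2j * cmath.pi * k * i / n)
               for i, r in enumerate(rate)) / n

def passes_snr(responses, n_components=8, criterion=0.3):
    """responses[stimulus][trial] is a list of binned firing rates."""
    ratios = []
    for k in range(1, n_components + 1):
        signal = 0.0
        noise = 0.0
        for trials in responses:            # one entry per compound grating
            comps = [fourier_component(tr, k) for tr in trials]
            mean_c = sum(comps) / len(comps)   # component of the trial average
            signal += abs(mean_c) ** 2
            noise += sum(abs(c - mean_c) ** 2 for c in comps) / (len(comps) - 1)
        ratios.append(signal / noise if noise > 0 else float("inf"))
    # Median signal-to-noise variance ratio across the Fourier components
    return statistics.median(ratios) > criterion
```

Taking the median across components keeps a single strongly driven harmonic from passing an otherwise noisy data set.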
This data set substantially overlaps with that presented earlier (Mechler et al. 2002), but the two are not identical. The earlier paper, which focused on analyzing single-response harmonics but did not look into the influence of speed, used a different signal-to-noise criterion (it was based on a d′ threshold placed on the Fourier components in comparison to the blank) and also included data sets that were obtained with stimuli of different fundamental frequencies. As a result, the 100 data sets analyzed here included 78 of the 121 presented in the earlier paper and 22 from the same pool that were not analyzed earlier.
Cell classification is based on the modulation ratio (Skottun et al. 1991). According to this convention, the fundamental (F1) of the response to a single drifting grating of near-optimal spatial parameters was compared with the DC component after subtraction of the maintained rate of firing (F0) and a cell was labeled simple if F1/F0 > 1 and complex otherwise. Accordingly, there were 24 complex and 13 simple cells in the speed-paired sample. We analyze and report dependence on the modulation ratio F1/F0 both categorically and parametrically.
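The classification rule can be illustrated with a short sketch. It assumes that the response is a cycle-averaged firing-rate histogram spanning exactly one period of the grating and that F1 is expressed as a peak amplitude; both conventions and the function names are assumptions made for illustration.

```python
import cmath

def modulation_ratio(rate, maintained_rate=0.0):
    """F1/F0 of a cycle-averaged response (one grating period)."""
    n = len(rate)
    f0 = sum(rate) / n - maintained_rate          # DC above the maintained rate
    coeff = sum(r * cmath.exp(-2j * cmath.pi * i / n)
                for i, r in enumerate(rate)) / n
    f1 = 2 * abs(coeff)                           # peak amplitude of the fundamental
    return f1 / f0

def classify(rate, maintained_rate=0.0):
    """Label a cell 'simple' if F1/F0 > 1 and 'complex' otherwise."""
    return "simple" if modulation_ratio(rate, maintained_rate) > 1 else "complex"
```

For example, a half-wave rectified sinusoid gives F1/F0 = π/2 and is labeled simple, whereas a weakly modulated elevated rate gives a ratio well below 1 and is labeled complex.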
Chance et al. (1999) introduced a network model for V1 with variable recurrent gain. In response to drifting gratings, this model produces phase-modulated, simple-like responses at low gain and phase-invariant, complex-like responses at high gain. We asked whether this model could account for various aspects of the feature tuning we observed experimentally. As detailed below, only minor changes to this model were made: we changed the time constant of the feedforward impulse response and we varied the nonlinearity to include nonzero firing thresholds and half-squaring.
In this model, the continuous firing rate of the ith neuron, ri, is instantaneously boosted by the sum of the input from its feedforward sources ($I_i^{\mathrm{ff}}$) and that from its recurrent connections ($I_i^{\mathrm{rec}}$) and relaxes with a time constant τr (set to 1 ms)
$$\tau_r \frac{dr_i}{dt} = -r_i + I_i^{\mathrm{ff}} + I_i^{\mathrm{rec}} \tag{3}$$
Note that there is no spontaneous activity in the model. The effect of including spontaneous activity would be to allow for negative thresholds, but would not alter the simulation results.
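The relaxation dynamics described above can be sketched with a forward-Euler integration of a single unit. The sketch assumes the rate equation has the standard form τr·dr/dt = −r + (feedforward input + recurrent input); the step size and function name are choices made for illustration.

```python
def integrate_rate(total_input, tau_r=1e-3, dt=1e-4, r0=0.0):
    """Forward-Euler integration of tau_r * dr/dt = -r + input.

    total_input: the summed feedforward and recurrent drive at each time step.
    Returns the firing-rate trace, one value per step.
    """
    r = r0
    trace = []
    for drive in total_input:
        r += (dt / tau_r) * (-r + drive)
        trace.append(r)
    return trace

# A constant drive of 10 (arbitrary units) applied for 20 ms
trace = integrate_rate([10.0] * 200)
print(trace[0], trace[-1])
```

With τr = 1 ms the rate settles onto a constant drive within a few milliseconds, so on the time scale of the stimuli the rate effectively tracks its total input.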
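A two-stage linear–nonlinear (LN) operator acting on the stimulus supplies the feedforward input
$$I_i^{\mathrm{ff}}(t) = A\left(\left[\iint G_i(x)\,H(t')\,W(x, t - t'; \nu, f, \varphi)\,dx\,dt' - \theta\right]_+\right)^n \tag{4}$$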
Here the linear filter stage is represented by the convolution of the compound grating W(x, t; ν, f, φ) (Eq. 2), with the separable spatiotemporal kernel Gi(x)H(t). The scale factor A sets the absolute response magnitude. The nonlinear operator has two stages. The first is a static nonlinearity that consists of a threshold θ and a rectifier [x]+ = max (0, x); the second is a power function with an exponent n ≥ 1. As an example, θ = 0 and n = 1 represent perfect half-wave rectification and θ = 0 and n = 2, half-squaring. The value of θ was chosen to be zero for some networks; for other networks, a nonzero θ was chosen such that the response of the neurons with the smallest receptive field to the fundamental component (presented alone) was half-maximal.
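The nonlinear stage can be written compactly. This sketch assumes that the threshold is subtracted before rectification, that is, the output is A·([x − θ]+)^n, where x is the output of the linear filter; the function name is illustrative.

```python
def static_nonlinearity(x, theta=0.0, n=1.0, A=1.0):
    """Threshold, rectify, then raise to the power n (n >= 1)."""
    rectified = max(0.0, x - theta)   # [x - theta]+
    return A * rectified ** n

# theta = 0, n = 1: half-wave rectification; theta = 0, n = 2: half-squaring
print(static_nonlinearity(2.0), static_nonlinearity(2.0, n=2))
print(static_nonlinearity(-1.0), static_nonlinearity(2.0, theta=0.5))
```

With a nonzero θ only the portion of the filtered signal that exceeds threshold drives the cell, so the output depends on how sharply peaked the filtered waveform is and not only on its energy.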
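Gi(x), the spatial filter of the ith cell, is a Gabor function
$$G_i(x) = \exp\left(-\frac{x^2}{2\sigma_i^2}\right)\cos\left(2\pi k_i x + \gamma_i\right) \tag{5}$$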
whose shape is fully determined by the envelope size σi, the carrier (or Gabor) frequency ki, and the carrier (or Gabor) phase under the envelope γi. The model included nk = 7 spatial frequency channels, with the Gabor frequency k sampled in equal steps of 0.5 c/deg from 0.5 to 3.5 c/deg, a 3-octave range. For each Gabor frequency, the Gabor phase was evenly sampled in steps of π/32 radians from the entire [−π, π] interval (nγ = 33). Thus the network size was N = nknγ = 231.
If the shape of receptive field profiles were independent of their size, then σi would be proportional to 1/ki. That is, the dimensionless combination σk, which measures the average number of cycles of the optimal grating “seen” by the neuron within the aperture of its receptive field, would be constant. An alternative to this picture (σ = const/k) would be that receptive field size is independent of the optimal spatial frequency, i.e., σ = const. Macaque V1 neurons apparently represent a compromise between these two possibilities. This is based on the observation of a weak negative correlation between size (σ) and optimum spatial frequency (k) (D Xing, MJ Hawken, and RM Shapley, personal communication). To endow the model with a bit of realism but keep its details simple, we implemented the compromise between constant shape and constant size by allowing two shape factors, a smaller one, that held at large scales (σi2πki = 2.5, ki ≤ 1.5 c/deg), and a slightly larger one that held for small scales (σi2πki = 2.7, ki ≥ 2.0 c/deg). In equivalent terms, the high spatial frequency channels in this model have somewhat narrower frequency bandwidths than the low spatial frequency channels.
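The filter bank described above can be enumerated in a few lines. This is a sketch under stated assumptions: the Gabor is taken to be an unnormalized Gaussian envelope times a cosine carrier, and nγ = 33 phases are spread evenly over the closed interval [−π, π], which gives a spacing of 2π/32 between neighboring samples. Function names are illustrative.

```python
import math

def build_bank(n_freq=7, n_phase=33):
    """Return (k, sigma, gamma) for every unit of the model network."""
    bank = []
    for j in range(n_freq):
        k = 0.5 + 0.5 * j                   # Gabor frequency, c/deg (0.5 to 3.5)
        shape = 2.5 if k <= 1.5 else 2.7    # shape factor sigma * 2*pi*k
        sigma = shape / (2 * math.pi * k)   # envelope size, deg
        for p in range(n_phase):
            gamma = -math.pi + 2 * math.pi * p / (n_phase - 1)
            bank.append((k, sigma, gamma))
    return bank

def gabor(x, k, sigma, gamma):
    """Assumed Gabor form: Gaussian envelope times a cosine carrier."""
    return math.exp(-x * x / (2 * sigma * sigma)) * math.cos(2 * math.pi * k * x + gamma)

bank = build_bank()
print(len(bank))  # network size N = 7 * 33
for k, sigma, _ in bank[::33]:
    print(f"k = {k:.1f} c/deg   sigma = {sigma:.3f} deg")
```

The envelope shrinks slightly more slowly than 1/k across the change in shape factor, which is the compromise between constant shape and constant size described in the text.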
H(t), the temporal response, is a single-parameter biphasic function
scaled by the time constant α. The time constant was set identical for each unit (α = 66 s−1) except as noted.
The recurrent input to each neuron is pooled from all other neurons in the entire network by a kernel defined as a difference of two Gaussians in the space of the Gabor frequencies ki of the feedforward inputs
This pooling kernel is shaped like a Mexican hat, with the excitatory center and inhibitory surround centered on each cell’s own Gabor frequency. The characteristic widths of center and surround are identical for each unit (σc = 0.5 c/deg and σs = 1 c/deg, respectively). The bandwidth of the resulting spatial-frequency tuning curve is similar for all units because it is determined primarily by σc and σs together and depends less on σi, the width of the Gabor envelope of the feedforward input. The gain term gi, normalized by the network size, sets the strength of the recurrent input that each neuron receives. In homogeneous-gain networks, all cells behave like ideal simple cells when g = 0, and increasingly like ideal complex cells as g → gmax, where gmax denotes the maximum gain attainable in homogeneous-gain networks. For gains g ≥ gmax, recurrent amplification makes the network unstable. Numerical values of recurrent gain are presented, even for inhomogeneous-gain (“mixed-gain”) networks, as g/gmax, relative to gmax of the homogeneous-gain networks. However, in mixed-gain networks, g is not bounded by gmax. This is because the true maximum gain is a parameter that depends on other network parameters, including the distribution of gains. In particular, the true maximum gain in a mixed-gain network can be made arbitrarily large if the number of units with very high gain is kept sufficiently low, and this in turn permits some units to have gains g ≥ gmax.
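A difference-of-Gaussians pooling weight of this kind can be sketched as follows. The relative amplitudes of the center and surround are not specified in the text; the sketch assumes each Gaussian is scaled by the inverse of its width (equal area), which is one simple choice that yields an excitatory center and an inhibitory surround. The function name and this normalization are assumptions.

```python
import math

def pooling_weight(k_i, k_j, g, n_units, sigma_c=0.5, sigma_s=1.0):
    """Recurrent weight from unit j onto unit i (hypothetical normalization).

    k_i, k_j: Gabor frequencies (c/deg) of the two units.
    g: recurrent gain of unit i, divided here by the network size.
    """
    d = k_i - k_j
    center = math.exp(-d * d / (2 * sigma_c ** 2)) / sigma_c
    surround = math.exp(-d * d / (2 * sigma_s ** 2)) / sigma_s
    return (g / n_units) * (center - surround)

for dk in (0.0, 0.5, 1.0, 2.0):
    print(f"|k_i - k_j| = {dk:.1f} c/deg   weight = {pooling_weight(1.0, 1.0 + dk, 1.0, 231):+.5f}")
```

Under this assumption, units with the same or a neighboring Gabor frequency excite each other, and units about 1 c/deg apart or more inhibit each other. The weight does not depend on the Gabor phase, which is what allows strong recurrence to dilute phase sensitivity.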
Firing rate responses for each neuron in the network were analyzed in exactly the same way as the spiking responses collected experimentally from V1 neurons. Off-line data analysis and statistical tests were performed using Matlab (The MathWorks, Natick, MA) toolbox functions and custom software written in Matlab.
The 100 data sets selected for analysis in this paper were obtained from 37 cells that provided data suitable for analysis at both drift velocities (74 data sets), 18 cells that provided data at the low drift velocity only (for two of which high-speed responses were measured but excluded by selection criteria), and eight cells that provided data at the high drift velocity only (for all of which low-speed responses were measured but excluded by selection criteria).
Earlier we showed (Mechler et al. 2002) that V1 neurons are tuned to the congruence phase of compound gratings, and that response energy and other response measures based on harmonics beyond the DC are especially sensitive to this tuning. Here we demonstrate that in most V1 neurons, feature tuning is dependent on the drift velocity of the compound gratings.
It is tempting to analyze the responses to compound gratings in terms of the responses to their components and a nonlinear response model. However, as indicated in our earlier study (Mechler et al. 2002), the accounting for the compound-grating responses requires a highly nonlinear model; idealized rectifiers and energy mechanisms do not suffice. This is further illustrated in Fig. 2. It shows the time histograms of the responses of three representative V1 neurons to the compound gratings (arranged along the phase circle in the same way these stimuli were introduced in Fig. 1), as well as to the four component gratings presented alone (stacked in the center, as labeled in Fig. 2A). For each cell, the set of responses on the left correspond to the stimuli drifting at low speed, and the set on the right, to stimuli drifting at high speed. Other examples (not paired for speed) can be found in Mechler et al. (2002).
These examples, especially the complex cells (A and B), illustrate the difficulties that prevent a simple prediction of the responses to the compound stimuli from the responses to the single components. The responses to compound gratings are much more peaked than responses to the components and the magnitude of these peaks is selective for specific congruence phases. Qualitatively, simple thresholds would not account for this kind of behavior, in that peaks in the compound-grating responses occur even though all of the component-grating responses are characterized by a weakly modulated steady firing rate. As analyzed in detail in Mechler et al. (2002), such behavior is also qualitatively inconsistent with global energy models. Note also that overall gain controls cannot confer the observed response selectivity either because all compound grating stimuli are equated for power.
On the other hand, local squaring operations (Burr and Morrone 1992) can provide some feature selectivity. Additionally, the behavior of Fourier components of the response as a function of congruence phase implies the presence of high-order nonlinearities (order ≥3), for both complex and simple cells (Mechler et al. 2002). Another way to rescue a linear-static nonlinear model would be to add phase-sensitive (Felsen et al. 2005) or strongly dynamic nonlinearities. However, specific forms for such nonlinearities have not yet been proposed, so it is difficult to test models of this kind from the data of individual cells.
The example cells of Fig. 2 typify another feature of our data. They exhibit, to various degrees, a more low pass spatial sensitivity at the (fourfold) higher temporal frequency, indicating that spatiotemporal sensitivity of these neurons is not separable in the two frequency domains. On the other hand, their spatial frequency optimum does not seem to decrease in inverse proportion to the temporal frequency change, indicating that these neurons were not exactly tuned to velocity either. Cells like these, whose sensitivity was neither separable in spatial and temporal frequency nor tuned to velocity when assayed with single gratings, were found to constitute a large fraction of cells in V1 (Priebe et al. 2006). This mixed behavior in the responses to single gratings (spatiotemporal inseparability) further complicates predictions of the responses to compound gratings when their drift velocity is varied.
In sum, a cell-by-cell approach to fitting the compound grating responses from the single-grating responses is insufficiently constrained by existing models that could conceivably work (spatiotemporally inseparable models with high-order phase-sensitive and/or dynamic nonlinearities). For this reason, our analytical approach will consist of an attempt to account for the range of behaviors across the population from a minimal network model, rather than the details of individual cells.
The first step is the extraction of indices that describe the responses to the compound gratings. Figure 3 shows the tuning to congruence phase (feature tuning) for the three cells in Fig. 2. The three illustrate the observed range of behavior and are ordered (from top to bottom) by increasing difference between the optimal phases at the two stimulus speeds. Each panel shows the response (total energy) of a single cell at low speed (open symbols) and high speed (filled symbols). Total response energy is defined as the summed squared amplitudes of the DC (after subtracting the baseline level) and the first eight Fourier components of the mean response. It is one of many alternative scalar response measures that were shown in our earlier paper to be consistent in identifying the feature optimum and comparable in their sensitivity (depth) of feature tuning.
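The total-energy measure can be sketched as follows, assuming that the mean response is a binned firing rate covering one period of the stimulus fundamental and that Fourier amplitudes are expressed as peak amplitudes. Both conventions and the function name are assumptions made for illustration.

```python
import cmath

def total_response_energy(mean_rate, baseline=0.0, n_harmonics=8):
    """Summed squared amplitudes of the DC and the first n_harmonics components."""
    n = len(mean_rate)
    dc = sum(mean_rate) / n - baseline            # DC after subtracting baseline
    energy = dc ** 2
    for k in range(1, n_harmonics + 1):
        coeff = sum(r * cmath.exp(-2j * cmath.pi * k * i / n)
                    for i, r in enumerate(mean_rate)) / n
        energy += (2 * abs(coeff)) ** 2           # squared peak amplitude
    return energy
```

For a response 5 + 3·cos(θ) + sin(2θ) with a baseline of 1, this gives 4² + 3² + 1² = 26.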
As in Mechler et al. (2002), we fit these tuning curves with a family of even-harmonic functions of the congruence phase
$$R(\varphi) = a_0 + a_1\cos(2\varphi - \alpha_1) + a_2\cos(4\varphi - \alpha_2) \tag{8}$$
by adjusting the five parameters—a0, a1, a2, α1, α2—to minimize the mean squared error of the fit. This family is a natural choice for the empirical description of feature tuning because it encompasses contributions of nonlinearities up to and including fourth-order and captures much of the variance in the tuning. The best-fitting function from Eq. 8 (thick continuous lines in Fig. 3) was used to extract objective measures of tuning curves and their change for further analysis.
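A minimal sketch of such a fit follows, assuming the parameterization R(φ) = a0 + a1·cos(2φ − α1) + a2·cos(4φ − α2) and eight congruence phases evenly spaced on [0, π). For that sampling the harmonics are mutually orthogonal, so projecting the data onto each harmonic gives the least-squares solution directly. The parameterization and function names are assumptions, and the peak is located by a simple grid search.

```python
import cmath
import math

def fit_even_harmonics(phases, responses):
    """Least-squares (a0, a1, a2, alpha1, alpha2) for evenly spaced phases."""
    n = len(responses)
    a0 = sum(responses) / n
    c2 = 2 * sum(r * cmath.exp(2j * p) for p, r in zip(phases, responses)) / n
    c4 = 2 * sum(r * cmath.exp(4j * p) for p, r in zip(phases, responses)) / n
    return a0, abs(c2), abs(c4), cmath.phase(c2), cmath.phase(c4)

def fitted_curve(phi, params):
    a0, a1, a2, alpha1, alpha2 = params
    return a0 + a1 * math.cos(2 * phi - alpha1) + a2 * math.cos(4 * phi - alpha2)

def optimal_phase(params, n_grid=3600):
    """Congruence phase on [0, pi) at which the fitted curve peaks."""
    grid = [math.pi * i / n_grid for i in range(n_grid)]
    return max(grid, key=lambda p: fitted_curve(p, params))

phases = [k * math.pi / 8 for k in range(8)]
```

The fitted curve can then be passed to `optimal_phase` to obtain the feature optimum, and its value at the sampled phases can be compared with the data to judge the quality of the fit.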
We defined the optimal stimulus by its congruence phase, φopt, at the peak position of the tuning curve (Fig. 3, thin arrows for low speed, thick arrows for high speed). The congruence phase, φ, which parameterizes the feature space, is periodic with period π. φopt = 0 corresponds to a line-like stimulus; φopt = π/2 corresponds to an edge-like stimulus; and intermediate values of the congruence phase correspond to intermediate one-dimensional features.
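To quantify the change in the optimal stimulus, φopt, induced by a change in the drift velocity, we determined
$$\Delta\varphi_{\mathrm{opt}} = \left[\left(\varphi_{\mathrm{opt,high}} - \varphi_{\mathrm{opt,low}} + \tfrac{\pi}{2}\right) \bmod \pi\right] - \tfrac{\pi}{2} \tag{9}$$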
the signed minimum phase-shift. Δφopt must lie between −π/2 and π/2. A value of Δφopt = 0 indicates no speed-dependent change in optimal congruence phase; values of Δφopt = ±π/2 are the maximum possible changes. We also consider the unsigned quantity |Δφopt|, which indicates the change in feature selectivity independent of the direction of change (0 ≤ |Δφopt| ≤ π/2).
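Because the feature space is periodic in π, the phase difference has to be wrapped. The sketch below shows one way to compute the signed minimum shift, taking the change as high speed minus low speed; the sign convention and the function name are assumptions.

```python
import math

def delta_phi_opt(phi_low, phi_high):
    """Signed minimum shift of the optimal congruence phase, in [-pi/2, pi/2)."""
    return (phi_high - phi_low + math.pi / 2) % math.pi - math.pi / 2

# Optima at 0.1*pi (low speed) and 0.9*pi (high speed) are only 0.2*pi apart
# once the periodicity of the feature space is taken into account.
print(delta_phi_opt(0.1 * math.pi, 0.9 * math.pi) / math.pi)
```

The unsigned change is then simply the absolute value of the result.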
To quantify the overall similarity of two tuning curves measured at different velocities, we use the Pearson correlation coefficient, r, which is sensitive to the shape of the phase variation but not to the size of the untuned part (mean elevation) of the tuning curves. For a pair of sinusoidal tuning curves, maximum positive and negative correlation (r = ±1) correspond to minimum (Δφopt = 0) and maximum (|Δφopt| = π/2) phase shifts, respectively, and minimum correlation (r = 0) corresponds to the intermediate shifts (Δφopt = ±π/4). The latter are quarter-cycle shifts of tuning curves in this feature space, defining quadrature pairs. Although r = 1 implies that there is no change in the peak of the tuning curve (Δφopt = 0), the converse is not true because the tuning curve may peak in the same position (Δφopt = 0) yet change in shape (r < 1).
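The relation between phase shift and correlation for sinusoidal tuning curves can be verified with a short sketch. It assumes tuning curves of the form pedestal + cos[2(φ − shift)] sampled at eight congruence phases on [0, π), for which r equals cos(2·shift); the function names are illustrative.

```python
import math

PHASES = [k * math.pi / 8 for k in range(8)]

def sinusoidal_tuning(shift, pedestal=2.0, depth=1.0):
    """Tuning curve with period pi, peaking at congruence phase = shift."""
    return [pedestal + depth * math.cos(2 * (p - shift)) for p in PHASES]

def pearson_r(a, b):
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    var_a = sum((x - ma) ** 2 for x in a)
    var_b = sum((y - mb) ** 2 for y in b)
    return cov / math.sqrt(var_a * var_b)

reference = sinusoidal_tuning(0.0)
for shift in (0.0, math.pi / 4, math.pi / 2):
    # The pedestal and depth of the second curve do not affect r.
    r = pearson_r(reference, sinusoidal_tuning(shift, pedestal=5.0, depth=3.0))
    print(f"shift = {shift / math.pi:.2f} pi   r = {r:+.3f}")
```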
For most neurons, φopt depended on stimulus velocity, but the extent of this dependence varied widely across the population. The same was true for the relative size of the responses to a given spatial waveform. Exemplifying one extreme is the neuron shown in Fig. 3A. This cell responded about twice as vigorously at high velocity (filled symbols) as at low velocity (open symbols). Despite this overall change in responsiveness, the tuning curves at the two velocities were similar in shape (Pearson correlation coefficient, r > 0.8). Correspondingly, the optimal stimulus was line-like (φopt ≈ 0), at both stimulus speeds (|Δφopt| < 0.11π). Illustrating the other extreme, the neuron shown in Fig. 3C was tuned to almost perfectly opponent congruence phases at the two velocities (|Δφopt| ≈ 0.4π). Its tuning curves at the two speeds were strongly anticorrelated (r < −0.6). This neuron decreased, rather than increased, its response magnitude from low speed (open) to high speed (filled). The neuron shown in Fig. 3B was approximately halfway between these extremes. Its phase preference at the two stimulus speeds approximated a quadrature pair (|Δφopt| ≈ 0.25π), and the correlation coefficient (|r| < 0.3) was small, as expected for a quadrature shift. This neuron responded equally vigorously at both speeds.
The range of the speed-induced changes of the optimal phase and of the shape and size of tuning curves in the examples shown in Fig. 3 is representative of the range observed in the entire V1 sample. (The sign and magnitude of the velocity-induced change in response size were not correlated with the velocity-induced change in feature preference, although the three examples of Fig. 3 may give an impression of correlation.) These and other aspects of feature tuning are shown for the entire V1 sample in Fig. 4. The plot on the left (Fig. 4A) summarizes how the optimal congruence phase depends on the drift velocity of the compound gratings. Note that these scattergrams are periodic in π on both dimensions, corresponding to the periodicity of the stimulus space. In these plots, speed invariance would correspond to a concentration of data points near the diagonal and a constant phase shift from low to high speed would correspond to a concentration of data points on a line that is parallel to the diagonal. The pair of dotted off-diagonal lines traces the locus of maximum phase offset (|Δφopt| = 0.5π). In our sample, the optimal features obtained at low speed (φopt,low) and high speed (φopt,high) exhibited no significant (linear) circular association as measured by the circular correlation modulus (Fisher 1993) (|r| ≤ 0.1, P > 0.5). Because the modulus of the circular correlation is not significant, there is no observed tendency for an average speed-induced Δφopt. In sum, we find no evidence either for speed invariance or a net speed dependence of feature tuning in V1. Rather, we find a scattering of tuning at low and high velocities, which, from our finite data sample, is indistinguishable from random.
Simple cells have traditionally been considered as better suited than complex cells for reliably signaling phase information. It is thus natural to ask whether simple cells signal these one-dimensional features (formalized as relative spatial phase) in a more speed-invariant manner than complex cells. The summary answer, derived with limited statistical power from evidence shown in the middle plot (Fig. 4B), is that simple cells’ phase preferences are not more speed invariant. This plot shows how the speed-induced change in feature preference (measured by 0 ≤ |Δopt| on the vertical axis) varies with the F1/F0 modulation ratio, a traditional index of nonlinearity and the simple–complex type (Skottun et al. 1991). F1/F0 is expected to form a bimodal distribution as the result of a nonlinear effect of the spike threshold (Mechler and Ringach 2002), and it does in our sample, too. However, both complex cells (n = 24) and simple cells (n = 13) were broadly scattered with respect to |Δopt| and the negative correlation between |Δopt| and F1/F0 was not significant (Pearson correlation coefficient −0.3 < r < 0 and P > 0.08). Moreover, the distributions of the speed-induced phase shifts, both the signed and unsigned quantity, were statistically indistinguishable in simple and complex cells (Kolmogorov–Smirnov two-sample test, P > 0.05 for |Δopt|, P > 0.2 for Δopt). However, these statistical results are not robust given the rather small sample size. It is possible that with a larger sample size one would find a significantly stronger tendency among simple cells to maintain their phase preference or that the size of the speed-induced change in feature preference is negatively correlated with the index of cell type.
The meaning of the optimal feature parameter depends on the selectivity of tuning. Therefore we also analyzed the selectivity of the tuning, as measured by the circular variance (CV) of the tuning curve. (Here CV denotes 1 minus the usual measure. For calibration, a delta function of a circular variable has CV = 1 and the CV of a cosine raised to a constant pedestal is about half the modulation depth measured by the Michelson contrast.) The CV indicated that at both speeds, most cells were broadly tuned: CV < 0.3 for all but two cells. Unlike the preferred feature, tuning selectivity as measured by the CV was highly correlated at the two speeds (r = 0.71). The median CV at low speed was 0.11 and increased to 0.13 at high speed, a slight and marginally significant change (paired sign-rank test, P < 0.1). As with the speed-induced change in the preferred feature, both the CV and the speed-induced change in the CV were uncorrelated with F1/F0 and these measures were similarly distributed in simple and complex cells (Kolmogorov–Smirnov two-sample test, P > 0.5).
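The two calibration points quoted for this selectivity index can be checked numerically. The Python sketch below assumes that the index is the length of the response-weighted resultant vector, computed on the π-periodic congruence phase (hence the angle doubling); the function name and the sampling of 16 phases are illustrative choices, not details from the study.

```python
import cmath
import math

def feature_selectivity(phases, responses, period=math.pi):
    """1 minus the usual circular variance: |sum r*exp(i*angle)| / sum r.

    Phases are mapped onto the full circle (angle = 2*pi*phase/period),
    so a pi-periodic congruence phase is effectively doubled.
    """
    resultant = sum(r * cmath.exp(2j * math.pi * p / period)
                    for p, r in zip(phases, responses))
    return abs(resultant) / sum(responses)

n = 16
phases = [math.pi * s / n for s in range(n)]   # evenly spaced over [0, pi)

# Calibration 1: a delta-like tuning curve gives an index of 1.
delta_curve = [1.0 if s == 3 else 0.0 for s in range(n)]

# Calibration 2: a cosine on a pedestal, 1 + m*cos(2*(phase - p0)),
# has Michelson contrast m and an index of m/2.
m, p0 = 0.4, 0.3
pedestal_curve = [1.0 + m * math.cos(2 * (p - p0)) for p in phases]

print(feature_selectivity(phases, delta_curve))      # close to 1
print(feature_selectivity(phases, pedestal_curve))   # close to m / 2 = 0.2
```

On this scale, the reported values below 0.3 correspond to curves whose modulation is a modest fraction of their mean response.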
The CV, unlike the bandwidth or the depth of modulation, is a good measure of the overall shape of a tuning curve. The above results did not depend on the measure of selectivity, though: the same conclusions were reached when the measure was the depth of modulation of the tuning curve. Thus the relative magnitudes of the feature-independent and the feature-modulated components of the compound-grating responses of V1 neurons are essentially independent of stimulus speed.
In principle, a speed-induced change in feature tuning could be attributable to a shift in optimal phase, a change in the shape of the tuning curve, or both. The third plot (Fig. 4C) examines this issue. If the tuning curves at low and high velocities were related by a pure shift in optimal phase Δ (i.e., a translation, permitting a rescaling of the tuning curve), it follows that the correlation coefficient r of the two tuning curves is given by Eq. 6.
Here a1 and a2 are the parameters in Eq. 8 that describe the shape of the tuning curve. Because typically a1 > a2, Eq. 6 predicts that the relationship between r and |Δopt| is dominated by a declining sigmoid. This accounts for the general shape of the scattergram in Fig. 4C. Thus a Δ shift accounts for a substantial component of the velocity-induced change in tuning. On the other hand, if a shift in Δ were the sole cause of the velocity-induced change in tuning, then an appropriate translation of the tuning curve measured at high velocity should bring it into coincidence with the tuning curve measured at low velocity (permitting rescaling). We determined this “corrective” phase shift, Δcorr, as the phase shift that maximizes the correlation coefficient r between the tuning curve measured at low velocity and the tuning curve measured at high velocity after a translation by Δcorr. Not surprisingly, Δcorr is highly correlated with the speed-induced shift in the optimal congruence phase Δopt (r > 0.9). However, this translation does not bring the low- and high-velocity tuning curves into coincidence. Rather, the median correlation coefficient between the speed-paired tuning functions was r = 0.73. Thus a translation of the tuning curve accounts for only about half of the variance (r² ≈ 0.5). A change in shape of the tuning curve, as well as measurement error, accounts for the other half of the variance.
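The relation referred to as Eq. 6 is not reproduced in this text. As a hedged reconstruction, suppose the five-parameter harmonic function of Eq. 8 has the form a0 + a1 cos[2(φ − φ1)] + a2 cos[4(φ − φ2)], where φ is the congruence phase (this form is an assumption based on the description of a1 and a2 as second- and fourth-harmonic amplitudes). For a pure translation by Δ with arbitrary rescaling, the correlation would then be r(Δ) = [a1² cos(2Δ) + a2² cos(4Δ)] / (a1² + a2²). The Python sketch below checks this expression numerically and illustrates a grid search for the corrective shift; all parameter values and function names are illustrative.

```python
import math

def pearson(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def tuning(phi, a0, a1, p1, a2, p2):
    """Assumed form of the five-parameter harmonic tuning function."""
    return a0 + a1 * math.cos(2 * (phi - p1)) + a2 * math.cos(4 * (phi - p2))

def predicted_r(delta, a1, a2):
    """Correlation expected for a pure shift by delta (with any rescaling)."""
    return (a1 ** 2 * math.cos(2 * delta)
            + a2 ** 2 * math.cos(4 * delta)) / (a1 ** 2 + a2 ** 2)

N = 64
grid = [math.pi * s / N for s in range(N)]     # congruence phases in [0, pi)
params = (5.0, 1.0, 0.2, 0.4, 0.7)             # a0, a1, p1, a2, p2 (illustrative)
true_shift = 10 * math.pi / N

def high_curve(phi):
    # "High-speed" curve: the low-speed curve shifted, rescaled, and offset.
    return 2.0 * tuning(phi - true_shift, *params) + 1.0

low = [tuning(p, *params) for p in grid]
high = [high_curve(p) for p in grid]

r_measured = pearson(low, high)
r_predicted = predicted_r(true_shift, params[1], params[3])

# Corrective shift: the translation of the high-speed curve that maximizes r.
best_r, delta_corr = max(
    (pearson(low, [high_curve(p + d) for p in grid]), d) for d in grid
)
print(r_measured, r_predicted, best_r, delta_corr / math.pi)
```

In this idealized case the corrective translation restores r = 1, which is the benchmark against which the measured median of r = 0.73 falls short.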
As a final point, we mention that feature preference or tuning depth did not correlate with relative cortical depth. Laminar location was identified histologically for most cells, but possible laminar variations could not be studied because of the small sample size.
Many aspects of the behavior of real V1 neurons can be understood in terms of some variant of the “iceberg effect,” i.e., in terms of the interaction between a linear filter (the spatiotemporal kernel of the receptive field) and a static nonlinearity (that of spike threshold). As we show later, this mechanism is also fundamental in endowing V1 neurons with feature tuning. We now examine to what extent this can account for our data.
A linear operator scales the amplitude and shifts the phase of the frequency components present in the stimulus but adds no new frequency components. Moreover, the amplitude in the output of a linear transform depends only on the frequency but not the phase of the input. Thus neither the amplitude nor its square (the energy), taken in any combination of output components, can exhibit feature tuning for the stimuli used here: feature tuning signifies nonlinearity.
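This argument can be illustrated with a small numerical sketch in Python. It assumes, for illustration only, a compound grating built from odd harmonics (1, 3, 5, 7) with 1/k amplitudes and a common congruence phase; the filter gains, the threshold, and the exponent are arbitrary choices rather than values from the study. The output energy of an arbitrary linear filter is the same at every congruence phase, whereas a thresholded power-law output is not.

```python
import math

def filtered_grating(phi, gains, n_samples=512):
    """One cycle of the compound grating after a linear filter.

    gains maps harmonic k -> (amplitude gain, phase shift); the stimulus
    component at harmonic k is (1/k) * cos(k*x + phi)  (assumed construction).
    """
    cycle = []
    for s in range(n_samples):
        x = 2 * math.pi * s / n_samples
        cycle.append(sum(g / k * math.cos(k * x + phi + shift)
                         for k, (g, shift) in gains.items()))
    return cycle

def energy(signal):
    return sum(v * v for v in signal) / len(signal)

def rectified_mean(signal, theta, exponent):
    """Mean output of the static nonlinearity max(v - theta, 0) ** exponent."""
    return sum(max(v - theta, 0.0) ** exponent for v in signal) / len(signal)

identity = {k: (1.0, 0.0) for k in (1, 3, 5, 7)}
arbitrary = {1: (0.9, 0.3), 3: (1.4, -1.1), 5: (0.5, 2.0), 7: (0.2, 0.7)}
phis = [math.pi * s / 8 for s in range(8)]     # congruence phases in [0, pi)

# Linear stage only: energy does not depend on the congruence phase.
energies = [energy(filtered_grating(p, arbitrary)) for p in phis]

# Linear stage followed by threshold and squaring: the mean response is tuned.
responses = [rectified_mean(filtered_grating(p, identity), theta=1.0, exponent=2)
             for p in phis]

print(max(energies) - min(energies))   # numerically zero
print(responses)                       # varies with congruence phase
```

In this sketch the peaked waveform at phase 0 exceeds the threshold while the flatter waveform at phase π/2 does not, and the tuned response repeats with period π in the congruence phase.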
By general considerations similar to those laid out in Mechler et al. (2002), one can show that an isolated linear–nonlinear (rectified) simple cell receptive field model is expected to exhibit feature tuning, that the tuning is periodic in twice the congruence phase, and that the dominant term in its harmonic expansion in phase is ~ cos [2(φ − φopt)], where φ denotes the congruence phase and φopt its optimal value. Furthermore, the energy model of complex cells that sums with equal weight the squared output of two quadrature pairs of simple cell (rectified linear) subunits (one even symmetric and one odd symmetric as well as their opposites in contrast polarity) will by design produce no phase tuning because the subunits’ outputs combine to a phase-independent constant DC elevation. The key premise necessary to reach these conclusions is that, by design, the congruence phase is the same in each component of a given compound grating. The key observation in the analysis is that for a nonlinear contribution of order n, the output phase is the sum of the phases of the interacting components.
However, simple LN models cannot account for the responses to compound gratings—for example, the peaking of the responses seen in Fig. 2 or the manner by which the response Fourier components depend on the congruence phase (Mechler et al. 2002). Adding phase-sensitive nonlinearities or dynamic gain controls might recover such features within the context of a feedforward model, but concisely parameterized models of this sort capable of predicting responses to moving stimuli are not yet in hand. An alternative approach to determine whether the critical features of our responses could be derived from a physiologically reasonable elaboration of idealized LN models is to incorporate idealized LN neurons into a simple recurrent network (Chance et al. 1999). This model departs from the Hubel and Wiesel (1962) hierarchical (feedforward) model of V1 in which complex cells pool their inputs from simple cells that have complementary receptive field profiles and reflects the growing consensus that cortico-cortical interactions are critical to understanding responses of individual cortical neurons. Chance et al. (1999) proposed that complex-cell responses arise through recurrent amplification of simple-cell responses and that simple and complex cells represent the weakly and highly coupled regimes of the same basic cortical circuit. We now ask whether the same basic network model can account for the characteristics of feature tuning that we observe.
Although the isolated linear–nonlinear receptive field model is tractable (as outlined earlier), interconnection of such units requires numerical simulation to determine the contributions from single-cell receptive fields and network mechanisms that shape feature tuning.
We implemented several variants of the above network model (as detailed in METHODS). Briefly, the network consists of interconnected rectified Gabor units whose receptive fields are identically centered and oriented. Gabor frequency and phase, representing the linear feedforward input to the network, tile the space of spatial frequency and phase. The recurrent gain relative to the strength of the linear kernel can be varied. Previously, we showed that this model could account for much of the diversity of feature preference and selectivity seen in V1 responses to compound gratings (Ohiorhenuan et al. 2004). Here we report that this model captures most of the qualitative behavior of V1 neurons to one-dimensional features and, specifically, the model can explain the pattern of speed dependence of V1 responses to this stimulus set.
To develop an intuition for how the recurrent model leads to feature tuning, we begin with homogeneous-gain models, in which the gain of recurrent feedback is the same for every cell. Figure 5 shows tuning to compound gratings drifting at low and high speeds for model neurons in three homogeneous-gain networks that differed only in the gain parameter. In each data set, neurons are organized in rows by k, their Gabor spatial frequency, and in columns by γ, their Gabor phase. The network of neurons is evenly subsampled for display. For each model neuron, tuning curves are plotted analogously to Fig. 3. In the simulated experiments, the fundamental grating component’s spatial frequency was 0.25 c/deg and its temporal frequency was 1 Hz at low speed, 4 Hz at high speed; each parameter value was chosen to be similar to those used in our V1 experiments.
At the zero-gain extreme (Fig. 5A), the decoupled network becomes a set of isolated simple cells. The most important observation for these model units is that the nonlinear interaction between the spike threshold and the feedforward kernel results in feature tuning. There are several characteristics of feature tuning, detailed later, that are commonly observed in all our simulations and demonstrate the fundamental role of the rectified feedforward input in the genesis of feature tuning in V1. These key results are not particular to the choice of parameters used in the simulations. In the model units shown the threshold was set to a moderate level (defined in METHODS) and the exponent used for the static nonlinearity was n = 2. However, similar feature tuning resulted for other static nonlinearities (not shown). The notable exception is the piecewise linear perfect half-wave rectifier (θ = 0 and n = 1), which uniquely precludes tuning to equal-energy compound gratings because its output preserves the equal-energy property of the input. Next, we describe the characteristics of feature tuning that are common to all model networks studied.
First, at any given stimulus speed, feature sensitivity in each simple cell varies approximately as a ~ cos [2(φ − φopt)] function of the congruence phase φ, with a distinct feature preference, φopt. Thus the simulation of the zero-gain network affirms the qualitative inferences made earlier for the shape of feature tuning in an isolated rectified feedforward unit.
Second, at a given drift velocity, for any particular cell, the feature preference monotonically depends on the receptive field’s Gabor phase, i.e., opt(γ) ~ (γ + const) mod π. This dependence on Gabor phase survives increased recurrent interactions and points to the critical role that the symmetry of the feedforward kernel plays in shaping feature preference in V1. Furthermore, although the form of this dependence does not change with a change in stimulus velocity, the constant offset and thus the tuning optimum itself depends on speed: changing the drift speed V of the stimulus results in a drift-dependent shift, Δopt(V), in the preferred stimulus, i.e., opt(γ, V) ~ [γ + Δopt(V)] mod π.
The dependence of the constant offset is the signature of the complex multipliers of the spatiotemporal kernel. The kernel need not be separable in the frequency domain to have this effect. The phase offset depends on the complex amplitudes (and thus phases) of the spatial and temporal transfer functions of the feedforward kernel. In the simulations discussed so far, all units in a network had identical temporal integration property, which translates into identical complex multipliers in the time domain. Model neurons in different Gabor channels are expected to differ in their spatial complex multipliers, but because of the similar overall shape of their spatial tuning function this difference does not alter the phase dependence very much (its extent is reflected by the scatter in Fig. 7B)—thus the approximately constant phase offset at a fixed stimulus velocity.
The form of this dependence of preferred phase on velocity guarantees that, at any given speed, preferred features cover the entire feature space in a population of cells in which the Gabor phases sample the entire phase space.
Third, within each spatial frequency channel corresponding to a fixed Gabor frequency k, the magnitude of the response varies regularly with γ, the Gabor phase, approximately as ~ cos (2γ). Thus the units with the symmetric Gabor kernel (first column, labeled γ = 0, in the plots shown) have the largest and the units with the asymmetric Gabor kernel (column labeled γ = π/2) have the smallest responses. This pattern arises through the feedforward input because the even-symmetric linear component, taken after rectification, is larger than the odd-symmetric one. A similar pattern would arise in any family of kernels that sample a mixture of odd and even functions.
Because it arises from an interaction between the linear kernel and the static nonlinearity, this pattern is enhanced by an increase in the threshold or in the acceleration (i.e., the exponent) of the power function. This mechanism is especially prominent in the high spatial frequency channels. This is explained as follows. Stimulus energy, by construction, declines with component frequency. Thus cells of the highest Gabor frequency (largest k values) respond to the compound gratings with the smallest magnitude in the entire network, which, assuming a networkwide constant threshold, makes them the most sensitive to clipping.
Our simulations also indicate that changing the drift speed does not affect the ~ cos (2γ) dependence of the magnitude of the responses across units, but can affect the absolute magnitude of the responses as well as the selectivity of the feature-tuning curve in a spatial frequency-dependent manner.
Chance et al. (1999) showed that for homogeneous-gain networks, increasing the gain results in increasing phase-insensitive pooling and leads to single-grating responses that are progressively more complex-like. The same mechanism decreases the sensitivity (modulation depth) of feature tuning to compound gratings, as illustrated by Fig. 5, B and C for various (high) levels of gain. Thus when pooled phases are balanced, recurrent pooling acts against the static nonlinearity of the receptive field by making responses more complex-like. Underscoring the importance of the role that the rectified feedforward component plays in setting up feature tuning is the fact that the recurrent gain must be quite high to generate a noticeable change in the shape of the feature tuning curves. Specifically, feature tuning remains stable while the recurrent gain is raised from zero (g/gmax = 0, all feedforward simple cells) all the way up to an intermediate level (g/gmax = 0.5, a value that results in interacting model neurons that are all borderline simple–complex by the measure of the modulation ratio; not shown). Thus a point of special emphasis here is that intermediate gains generate complex cells that exhibit significant feature (phase) tuning. This is all the more notable because the F1/F0 ratio, the index of the simple–complex continuum, is also a measure of phase sensitivity.
Notice that the preferred feature in each unit is independent of the choice of the static nonlinearity or the recurrent gain, but only if the latter is not too high. At very high homogeneous gains (Fig. 5C), feature tuning becomes homogeneous because all units begin to behave independently of their own afferent input and similarly to the units that respond the most strongly. That is, in the high homogeneous-gain regime, these strongly coupled networks exhibit winner-take-all behavior, which is expected from strongly coupled recurrent networks in general. For these networks, the “winner” among Gabor units of the same spatial frequency k is the one with a symmetric kernel (Gabor phase γ = 0 or γ = π).
This winner-take-all behavior is more prominent when clipping by the rectifier is more severe. This accounts for the more prominent winner-take-all behavior in the higher spatial frequency channels (Fig. 5C, bottom row) because, in these channels (see above), the linearly filtered stimulus energy is smaller. The winner-take-all favoring of the symmetric Gabor is powerfully reinforced by the recurrent excitation from neighboring frequency channels, where this mechanism is similarly prominent.
Note that even though the high-gain regime of the model leads to cells with complex-like behavior in terms of F1/F0 (Chance et al. 1999), the high-gain regime does not lead to energy-like behavior in terms of feature tuning. This follows from the biases set up by the feedforward input as explained earlier, along with the winner-take-all behavior. The selectivity of tuning remains larger in the higher-frequency channels because of the relatively stronger effect of clipping in those channels.
Homogeneous-gain networks illuminate the genesis of feature tuning in model neurons. However, a single homogeneous gain can produce only one kind of behavior, not a simple–complex continuum. Moreover, a well-documented observation about the primate V1 (Ringach et al. 2002) is that simple and complex cells are both present in every cortical layer, with slight variation of their relative abundance across layers but no obvious spatial segregation within layers. Thus by virtue of its ability to generate an arbitrary simple–complex continuum, a random-gain network is likely to be a more realistic model of the V1 population.
Before proceeding to the presentation of the mixed-gain network simulations, a technical point about the behavior of the gain parameter needs to be made. In the preceding analysis of homogeneous-gain networks, we (following Chance et al. 1999) have referenced values of the homogeneous gain g to the maximum stable value of the gain gmax. An inhomogeneous network can remain stable even if some cells have g > gmax—provided that there are not too many of them. Thus for inhomogeneous-gain networks g can be sampled in a wider range than the one limited by gmax of homogeneous networks of otherwise identical parameters.
To illustrate this point and to examine how gain determines the simple–complex character in the mixed-gain network, we plotted in Fig. 6 the F1/F0 modulation ratio for the optimal sine grating as a function of the gain. To facilitate comparison with results for homogeneous-gain networks, we normalized gain with gmax of homogeneous networks of otherwise identical parameters (thus g/gmax > 1 could be realized). Gains were randomly chosen from a uniform distribution over the range g/gmax ∈ [0, 1.4]. The functional relationship is a slowly decaying one, with F1/F0 → 0 at very large gains. (Thus complex cells with F1/F0 < 0.2 can be realized by recurrent gains greater than the range sampled in Fig. 6.) The dependence of F1/F0 on gain is parametric in the Gabor frequency, as indicated by the fine thread-like densities in the scatterplot, each of which is composed of data from units of a particular spatial frequency channel. The asymptotic dependence is very different from the linear relationship (slanted dotted line) known for the homogeneous-gain networks (Chance et al. 1999). This difference is reflected in the range of gains associated with simple cells (triangles) and complex cells (squares). In homogeneous networks, the class boundary (horizontal dotted line) intersects the linear regression of the data in a single point, sharply dividing the continuum of gain between simple cells (g/gmax < 0.41) and complex cells (g/gmax > 0.41). In mixed-gain networks, simple cells are confined to a narrower range of gains and the boundary is not sharp (the scatter of triangles and squares along the abscissa overlaps in Fig. 6). This is because the location of the intersection of the class boundary (horizontal dotted line) with the data depends on the Gabor frequency.
Figure 7A shows the tuning curves for model units in a “mixed-gain” network in an arrangement similar to that in Fig. 5. As may be expected from the observations already made, unit by unit, feature preference in the mixed-gain population closely resembles that observed in the homogeneous intermediate-gain network (Fig. 5B), although there are differences. Selectivity and, especially, response magnitude, the response parameters that depend more strongly on recurrent gain, are more varied in the mixed-gain network. A case in point is the lawful variation of tuning magnitude with Gabor phase observed in homogeneous networks. That pattern, which survived even in strongly coupled units in a network of homogeneous high gain (Fig. 5C), is diluted here. The pattern is expected to be fully eliminated in a sufficiently inhomogeneous network.
To compare the mixed-gain model with the V1 population, Fig. 7, B–D presents the same population analyses as in Fig. 4. In V1 (Fig. 4A), the scattergram of optimal feature at low speed (opt,low) versus high speed (opt,high) showed no statistical association by linear circular correlation statistics. However, the simulations (Fig. 7B) for the recurrent network model show prominent “tracks,” indicating strong correlation between feature tuning at the two speeds. They signify the monotonic dependence of feature preference on Gabor phase, a legacy of the linear kernel. Thus not surprisingly, the tracks were also seen in homogeneous-gain networks and their pattern and position were preserved across gain levels (data not shown). The exact shape of that dependence, and thus the shape of the track (e.g., the degree of deviation of the data points from a line of unity slope), depends on the relative spatial frequency of the stimulus and the Gabor frequency— data from units of the same frequency channel form fine “fibers” within the track. We show later (Fig. 8) that the offset of the track along the axes strongly depends on the temporal integration of the feedforward kernel. Had we allowed for it, a variation in the temporal integration would contribute a much larger scatter to the pattern in Fig. 7B than the variation in Gabor frequency.
To further compare the randomized-gain model with the observed V1 population responses, Fig. 7C shows (exactly as in Fig. 4B) the speed-induced change in feature preference (|Δopt|) versus the F1/F0 modulation ratio. Most values of |Δopt| are near the average phase difference determined by the track pattern seen in Fig. 7B. We show later that this average strongly depends on the time constant of the temporal impulse response function of the feedforward kernel. The range of scatter around the average |Δopt| depends on both the random variation of gain and the diversity of the spatial frequency channels in the sample. In the horizontal scatter, the data with the lowest values correspond to the highest sampled gain values shown in Fig. 6. At the high end, the theoretical upper bound is a constant ≤ 2, defined by the largest F1/F0 associated with the static nonlinearities used by units in the network (F1/F0 ≈ 1.7 for the half-squaring in Fig. 6).
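The quoted value of F1/F0 for half-squaring can be checked with a short numerical sketch in Python. It passes a unit-amplitude sinusoid through a static nonlinearity of the form max(v − θ, 0)^n (the parameterization and function name are illustrative assumptions) and computes the ratio of the fundamental amplitude to the mean. For any non-negative response the ratio cannot exceed 2.

```python
import cmath
import math

def modulation_ratio(threshold=0.0, exponent=1.0, n_samples=4096):
    """F1/F0 of a unit sinusoid passed through max(v - threshold, 0) ** exponent."""
    out = [max(math.sin(2 * math.pi * s / n_samples) - threshold, 0.0) ** exponent
           for s in range(n_samples)]
    f0 = sum(out) / n_samples                               # mean (DC) response
    f1 = 2 * abs(sum(v * cmath.exp(-2j * math.pi * s / n_samples)
                     for s, v in enumerate(out))) / n_samples  # fundamental amplitude
    return f1 / f0

half_wave = modulation_ratio(threshold=0.0, exponent=1)     # analytically pi / 2
half_square = modulation_ratio(threshold=0.0, exponent=2)   # analytically 16 / (3 * pi)
high_threshold = modulation_ratio(threshold=0.9, exponent=1)

print(half_wave, half_square, high_threshold)
```

Half-squaring gives 16/(3π) ≈ 1.70, in line with the value cited for Fig. 6, and raising the threshold pushes the ratio toward, but never beyond, 2.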
The pattern seen in Fig. 7C suggests independence between F1/F0 and |Δopt|, but the very small negative correlation (Pearson r = −0.067) is statistically significant (P < 0.001). The distributions of |Δopt| and gain are similarly independent (not shown). Moreover, there is no statistically significant difference between simple and complex cells in the distributions (Kolmogorov–Smirnov two-sample test, P > 0.5) or the medians (Wilcoxon rank-sum test, P > 0.3) of |Δopt|. This model prediction, that the speed-induced change in feature preference is similar across the simple–complex continuum, is robust to the choice of the temporal integration properties of the units and is consistent with the results in the real V1 sample.
We also compared feature selectivity, as measured by the CV of the tuning curve, between the model and data. The selectivity of feature tuning depends on both the Gabor frequency and the recurrent gain. The medians and spreads of the CV in the entire population of model units were comparable at the two simulated velocities. It appears that the model is able to capture, in the low-to-intermediate frequency channels and at intermediate gains, both the spread of the distribution of the CV and its relative speed invariance that are observed in V1.
Figure 7D shows (exactly as in Fig. 4C) the relationship between the correlation coefficient r of the two tuning curves, taken at low and high speeds, and |Δopt|, the difference in the preferred features. The model predicts a declining sigmoid (thin line in Fig. 7D), r ~ cos (2|Δopt|), which, as discussed with respect to Fig. 4C, is expected when the tuning curve changes in preferred phase and depth, but not in shape. That the model predicts no speed-induced change in the shape of the tuning curves is also implied by the earlier observation (Fig. 5) that all simulated feature-tuning curves are dominated by a single cos (2φ) component, where φ is the congruence phase. However, in the measured V1 population, tuning functions also changed in shape. These recurrent network models cannot account for this component of the observed responses.
One aspect in which the physiologic and simulated data are distinct is the range of speed-induced phase shift (cf. the scatter along the ordinate in Fig. 4B and Fig. 7C). As we subsequently demonstrate (Fig. 8), the model can achieve arbitrary ranges of speed-induced phase shift with an appropriate variation of the parameter of temporal integration that was kept fixed for the simulations shown in Fig. 7B.
Another, more important, difference between the model and V1 neurons is in the detailed shape of their feature-tuning curve. We observe that whereas feature tuning in many V1 neurons exhibits higher even-harmonic distortions (e.g., Fig. 3B), in Gabor units the ~ cos [2(φ − φopt)] term (with φ the congruence phase and φopt its optimal value) is by far the dominant one.
The tuning curves of both model units and V1 neurons are well fit by a five-parameter harmonic function (Eq. 8) and their shape could be quantified by a2/a1, the amplitude ratio of the fourth harmonic over the second harmonic component of the fit. As seen in the examples in Fig. 5, the feature tuning in model neurons is dominated by the second harmonic component. The relative contribution by the fourth harmonic is negligible (a2/a1 < 0.07) for all units in each of the explored model networks. This was true in both homogeneous-gain and random-gain networks independent of the threshold and the nonlinearity used. However, in almost all (36/37) real V1 neurons, this ratio was sizeable (all a2/a1 > 0.07, median 0.9).
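The shape index a2/a1 can be illustrated with a brief Python sketch. It assumes that Eq. 8 has the form a0 + a1 cos[2(φ − φ1)] + a2 cos[4(φ − φ2)], with φ the congruence phase, and that the tuning curve is sampled at evenly spaced congruence phases over [0, π). Under those assumptions a least-squares fit reduces to a discrete Fourier projection. The eight sample phases and the function names are hypothetical.

```python
import cmath
import math

def harmonic_amplitude(responses, order):
    """Amplitude of the cos(2*order*phi) term for samples at phi = pi*s/n, s = 0..n-1.

    Requires more than four samples so that the fourth harmonic is not aliased.
    """
    n = len(responses)
    coeff = sum(r * cmath.exp(-2j * order * math.pi * s / n)
                for s, r in enumerate(responses))
    return 2 * abs(coeff) / n

def shape_ratio(responses):
    """a2/a1: fourth-harmonic amplitude over second-harmonic amplitude."""
    return harmonic_amplitude(responses, 2) / harmonic_amplitude(responses, 1)

n = 8
phases = [math.pi * s / n for s in range(n)]

# A model-like curve (pure second harmonic) and a distorted, V1-like curve.
model_like = [3.0 + 1.0 * math.cos(2 * (p - 0.4)) for p in phases]
distorted = [3.0 + 1.0 * math.cos(2 * (p - 0.4)) + 0.5 * math.cos(4 * (p - 1.0))
             for p in phases]

print(shape_ratio(model_like))   # close to 0
print(shape_ratio(distorted))    # close to 0.5
```

A ratio near zero corresponds to the cosine-dominated curves of the model units, whereas ratios near or above 1 indicate the strongly distorted curves reported for many V1 neurons.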
This dominance by a ~ cos [2(φ − φopt)] term in model neurons arises, in large part, from the low-pass property of the Gabor envelopes. Biasing the linear kernels toward high-pass (e.g., by asymmetric envelope) does enhance the higher-order even-harmonic distortions in the tuning curves, especially in the low-frequency channels of the model (numerical simulations not shown). Thus receptive fields that have high-pass kernels with respect to the stimulus could generate tuning curves whose shape differs noticeably from ~ cos [2(φ − φopt)].
Other possibilities could be realized in a recurrent network. For example, a model neuron in which pooling of recurrent input is asymmetric or biased in the Gabor phase could exhibit distortions from the basic cosine shape of feature tuning curves. However, this effect alone cannot explain the shape of tuning curves in physiologic data because such a mechanism would lead to distortions in complex but not simple cells (because the latter receive little or no recurrent input). This prediction is inconsistent with the observation that large a2/a1 > 1 ratios were equally prevalent in real V1 simple and complex cells. Moreover, there was no significant correlation of the a2/a1 ratio with cell category (P > 0.5 by t-test) and this ratio is uncorrelated with the F1/F0 modulation ratio (r = 0.07, NS).
Another possibility is that the shape of the static nonlinearity can be very different from neuron to neuron in the network. Because the shape of the nonlinearity is an important factor determining the shape of the feature tuning function, such network inhomogeneity of the nonlinearity could explain some of the diversity in V1 feature tuning, especially if compounded with an inhomogeneity of the linear kernel (as mentioned earlier). We did not explore this systematically.
We now return to the nature of the “tracks” in Fig. 7B and their absence in the experimental data. We observed that the positions of these tracks did not depend on the recurrent gain (whether homogeneous or random) or the type of static nonlinearity (assumed the same for all units in a network). Therefore we hypothesized that the position and slope of these tracks would depend on the spatiotemporal response function of the feedforward input. To isolate this factor, we simulated a series of zero-gain (all simple cell), half-squaring homogeneous networks, each with the same range of spatial scales and a different (homogeneous) time constant, including the value (α = 66 s⁻¹) used in Figs. 5–7.
Each panel on the left in Fig. 8A corresponds to one such network (labeled with its value of α). The scatterplots show the optimal phase at low versus high speed, with each symbol corresponding to one neuron of a particular k Gabor frequency and γ Gabor phase. As the Gabor phase varies, so does the phase of the optimal feature: the optimal feature phase traverses its [0, π] range two times as the Gabor phase varies over its [−π, π] range, and this is true at both speeds. There is thus an approximate constant phase difference between optimal features at the two velocities and the plot of optimal feature at high versus low speed forms a “track” that is approximately parallel to the main diagonal.
Any single such track is characterized by a significant linear circular association, but superimposing many such tracks (at different offsets) would dilute this association and cover the domain. The opt,low versus opt,high scatterplot in Fig. 8B is a superposition of data contributed by all panels shown in Fig. 8A. The data fill almost the entire rectangular range. Scatter is increased and blending is more complete in the more realistic mixed-gain network (Fig. 8, C and D). Blending is incomplete if, as in Fig. 8, large numbers of cells are modeled for each of only a few distinct samples of α, although a more realistic sampling approach, in which only a few units are modeled for a large number of distinct samples of the α parameter, will eliminate the appearance of distinct tracks. This suggests that a network in which spatial and temporal integration properties of units are defined by parameter values that are drawn randomly and independently from a broad range could account for the observed absence of correlation between opt,low and opt,high (Fig. 4A).
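The dilution argument can be made concrete with a small Python sketch. It uses one common measure of circular association, the Fisher–Lee T-linear coefficient; whether this is the exact statistic behind the circular correlation modulus reported for Fig. 4A is an assumption, and the idealized tracks (exactly parallel to the diagonal, with evenly spread offsets) are illustrative rather than simulated network output.

```python
import math

def t_linear_association(a, b):
    """Fisher-Lee circular correlation between paired angles a and b (radians)."""
    num = den_a = den_b = 0.0
    n = len(a)
    for i in range(n - 1):
        for j in range(i + 1, n):
            sa = math.sin(a[i] - a[j])
            sb = math.sin(b[i] - b[j])
            num += sa * sb
            den_a += sa * sa
            den_b += sb * sb
    return num / math.sqrt(den_a * den_b)

def track(offset, n_units=24):
    """Idealized track: optimal phase at high speed = low-speed phase + offset (mod pi)."""
    low = [math.pi * u / n_units for u in range(n_units)]
    high = [(p + offset) % math.pi for p in low]
    return low, high

def association(low, high):
    # Optimal phases are pi-periodic, so angles are doubled before correlating.
    return t_linear_association([2 * p for p in low], [2 * p for p in high])

# One track: a single time constant gives one constant offset.
rho_single = association(*track(0.2 * math.pi))

# Many tracks: offsets spread evenly over the pi-periodic feature space.
low_all, high_all = [], []
for m in range(6):
    low, high = track(math.pi * m / 6)
    low_all += low
    high_all += high
rho_mixed = association(low_all, high_all)

print(rho_single, rho_mixed)   # close to 1 and close to 0, respectively
```

In this idealization a single track yields perfect association, whereas evenly spread offsets cancel it entirely, which mirrors the blending described for Fig. 8B.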
This study extends our first investigation (Mechler et al. 2002) of V1 neurons’ tuning to one-dimensional spatial waveforms by experimentally probing the dependence of such tuning on stimulus drift velocity. Morrone and Burr (1992, 1988) argued forcefully that the essence of one-dimensional features is phase congruence, i.e., the correlation of phase across multiple spatial scales. We use equal-energy compound-grating stimuli that permit studying sensitivity to phase congruence in isolation from spatial filtering. Despite this advantage, only a few earlier physiological studies (Levitt et al. 1990; Pollen et al. 1988) used these stimuli. Our primary finding here is that the feature selectivity of typical neurons is highly speed dependent and this lack of speed invariance makes it impossible to regard single V1 neurons as feature detectors per se. We also demonstrate that the existence of feature tuning and its dependence on stimulus velocity are both predicted by a recurrent network model of V1, in which single neurons are modeled as rectified linear feedforward kernels.
In Mechler et al. (2002), we introduced a novel stimulus set that consists of one-dimensional compound gratings that have identical spectral distribution of contrast energy but have different waveforms, including those resembling lines or edges, and others with intermediate shapes. We reported that many V1 neurons were selectively tuned to one or another feature in this space, but the distribution of the optimal features in the V1 population represented the full feature space with little bias. Here, we show that these population results hold at different stimulus speeds, at least for speeds well within the range of velocity-tuning optima in V1 neurons (Priebe et al. 2006). Importantly, we also find that the feature preferences of single V1 neurons can be radically altered by a change in the drift velocity of the stimulus, and there is no detectable correlation between feature preferences at two (sufficiently different) velocities. This contrasts with the orientation preference of individual V1 neurons, which is largely speed invariant. Thus, because the specificity of individual neurons is thoroughly scrambled as velocity changes, V1 neurons cannot signal edges and lines by themselves. It is an open question whether single neurons in any given extrastriate visual cortical area could do that or whether they too represent these features by a population code, as in V1.
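The equal-energy property of such a stimulus family can be checked numerically. The sketch below assumes four odd harmonics with 1/k amplitudes and a common phase offset (the congruence phase) applied to every component. These particular harmonics and amplitudes are illustrative assumptions, not the parameters of the experiments.

```python
import numpy as np

HARMONICS = np.array([1, 3, 5, 7])   # assumed: four odd harmonics
AMPLITUDES = 1.0 / HARMONICS         # assumed: 1/k amplitude falloff

def compound_grating(x, congruence_phase):
    """Sum of cosine components sharing one phase offset at every scale."""
    k = HARMONICS[:, None]
    a = AMPLITUDES[:, None]
    return np.sum(a * np.cos(k * x[None, :] + congruence_phase), axis=0)

# One spatial period, sampled uniformly.
x = np.linspace(-np.pi, np.pi, 1024, endpoint=False)

# Contrast energy (mean squared luminance modulation) at several phases.
phases = np.linspace(0.0, np.pi, 5)
energies = [float(np.mean(compound_grating(x, p) ** 2)) for p in phases]

line_like = compound_grating(x, 0.0)        # even-symmetric waveform
edge_like = compound_grating(x, np.pi / 2)  # odd-symmetric waveform

print("energy spread across phases:", max(energies) - min(energies))
```

Because the congruence phase changes only the phase spectrum, the energy is the same for every waveform in the family, while the shape moves from an even-symmetric (line-like) to an odd-symmetric (edge-like) profile.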
Our results indicate that signaling motion and the shape of the luminance profile in V1 are interdependent. This implies that signaling motion by most of these neurons could be compromised by the variation in stimulus shape. This likely does not matter, however, because most neurons in V1, by reason of their largely independent spatial and temporal frequency selectivity, are not tuned to the velocity of a moving stimulus (recently reviewed, e.g., by Lennie and Movshon 2005) and thus may not be genuinely involved in signaling speed. However, a distinct minority of direction-selective V1 neurons is also speed tuned, independent of the spatial frequency of the stimulus (Priebe et al. 2006). Because direction- and velocity-selective neurons are the most likely source of motion signals sent downstream to extrastriate processing (Felleman and Van Essen 1991), it would be interesting to determine whether feature tuning in those neurons differs from, or is weaker than, that in the majority of V1 neurons. We do not know what fraction of our V1 sample may have been speed tuned, but we do know that about one quarter (16/61 neurons) consisted of direction-selective neurons (defined by a direction selectivity index DI > 0.33, where DI is defined based on responses to gratings drifting in the optimal direction and its opposite in the usual manner), and a similar fraction (13/61) was directionally biased (defined by moderate selectivity, i.e., 0.16 < DI < 0.33). We found that feature tuning (as indexed either by the preferred phase or the CV) was independent of the direction selectivity index. This evidence discounts hypothetical scenarios in which feature signaling in motion-signaling neurons would differ from that in the rest of V1.
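For concreteness, the classification by direction selectivity can be sketched as follows. The thresholds (0.33 and 0.16) are those stated above. The contrast form of the index, DI = (Rpref − Rnull)/(Rpref + Rnull), is an assumption about what "the usual manner" denotes, and the function names are hypothetical.

```python
def direction_index(r_pref, r_null):
    """Assumed contrast form of the direction selectivity index.

    r_pref: response to the grating drifting in the optimal direction.
    r_null: response to the grating drifting in the opposite direction.
    """
    total = r_pref + r_null
    if total <= 0:
        raise ValueError("responses must sum to a positive value")
    return (r_pref - r_null) / total

def classify_direction(di):
    """Apply the thresholds quoted in the text."""
    if di > 0.33:
        return "direction-selective"
    if di > 0.16:
        return "directionally biased"
    return "non-directional"

# Hypothetical firing rates (spikes/s), for illustration only.
examples = {"unit A": (30.0, 10.0), "unit B": (12.0, 8.0), "unit C": (10.0, 10.0)}
labels = {name: classify_direction(direction_index(p, n))
          for name, (p, n) in examples.items()}
print(labels)
```

With this form of the index, DI = 0.33 corresponds to a preferred-direction response about twice the null-direction response. A DI of exactly 0.33 is not covered by either definition in the text; the sketch assigns it to the biased class.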
There is another result of the Priebe study that warrants discussion. Priebe et al. (2006) not only used single gratings but also assayed the responses of direction-selective V1 complex cells with a superposition of two spatial component gratings that drifted rigidly together at the same velocity. They found that these neurons combined components linearly, which would exclude feature tuning as we defined it. The discrepancy between the results of Priebe et al. (2006) and ours may be puzzling, considering the similarity between their stimuli and ours (we used four components drifting rigidly together). Because those authors do not report the phases of the frequency components, a more detailed comparison with our experiments is impossible.
We recorded multiple single neurons with tetrodes, and the stimuli were optimized for the most responsive cell. Thus for the other simultaneously recorded neurons, the compound gratings could be of nonoptimal orientations. The most important effects of using stimuli of nonoptimal orientation are likely to be 1) reduced response magnitude together with increased response variance and 2) a lower high-cut in spatial frequency. These effects may be greatly enhanced at high speed by further low-pass temporal filtering. Therefore nonoptimal orientation could have contributed to diminishing the size and modulation depth of feature tuning at high speed in neurons for which orientation was not optimized, but it is difficult to see how nonoptimal orientation might have induced a velocity dependence of feature tuning.
We intentionally did not optimize spatial frequency and temporal frequency for any of the tetrode-isolated units. By design, the stimulus fundamentals were below a cell’s optimum (see METHODS), so that the cell’s pass-band could accommodate as many components as possible at both drift speeds. In most macaque V1 neurons with parafoveal eccentricities, the spatial frequency peak falls in the 0.8–5 c/deg range, the typical spatial frequency pass-band is 0.5–2.5 octaves (De Valois et al. 1982), the temporal frequency low-cut is <1 Hz, and the high-cut is between 4 and 32 Hz (Hawken et al. 1996). Thus our choice of frequencies typically served the stated goal. With these fixed parameters, there were 8/47 cells (too high-pass) that responded only at 12 deg/s and 2/47 cells (too low-pass) that responded only at 3 deg/s (the data points outside the axes of Fig. 4A). These two velocities bracket the central part of the measured distribution of velocity-tuning optima in V1 neurons (Priebe et al. 2006), supporting our choice of parameters.
Feature tuning and discrimination constitute a phase-sensitive and fundamentally nonlinear operation that must access more than one spatial frequency. Our model simulation suggests that a combination of the threshold and other static nonlinear components of the operator that generates the spike rate output plays a crucial role in the genesis of feature selectivity. In this respect, it is noteworthy that experimentally we found essentially the same feature sensitivity in classically defined simple and complex cells. Consistent with a broad V1 continuum, the relevant nonlinearities were comparable in both cell types when exposed to our broadband stimuli. The similarity of the responses of simple and complex cells obtained here with compound gratings is remarkably different from the categorically distinct responses of these cells routinely obtained with single sine gratings (Skottun et al. 1991), but concurs with recent arguments against a dichotomy in the synaptic organization of V1 (Mata and Ringach 2005; Mechler and Ringach 2002; Priebe et al. 2004). Furthermore, although feature tuning and the modulation ratio both measure some sort of phase sensitivity, the two need not be consistent: the magnitude of F1/F0 indicates the sensitivity to the phase of F1, which is closely related to a notion of position, whereas congruence phase is more closely related to a notion of shape.
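As a reminder of what the modulation ratio measures, the following sketch computes F1/F0 from a response sampled over exactly one stimulus cycle. The two response waveforms are idealized assumptions (a half-wave rectified sinusoid and an elevated mean with weak modulation), not recorded data, and the function name is hypothetical.

```python
import numpy as np

def modulation_ratio(rate):
    """F1/F0 for a response sampled uniformly over exactly one stimulus cycle."""
    rate = np.asarray(rate, dtype=float)
    spectrum = np.fft.rfft(rate) / rate.size
    f0 = spectrum[0].real           # mean firing rate
    f1 = 2.0 * np.abs(spectrum[1])  # amplitude of the fundamental
    if f0 <= 0:
        raise ValueError("mean rate must be positive")
    return float(f1 / f0)

t = np.linspace(0.0, 1.0, 1000, endpoint=False)  # one cycle, arbitrary units

# Idealized simple-cell-like response: half-wave rectified sinusoid.
simple_like = np.maximum(np.sin(2 * np.pi * t), 0.0)

# Idealized complex-cell-like response: elevated mean, weak modulation.
complex_like = 1.0 + 0.2 * np.sin(2 * np.pi * t)

ratio_simple = modulation_ratio(simple_like)
ratio_complex = modulation_ratio(complex_like)
print(f"simple-like: {ratio_simple:.3f}, complex-like: {ratio_complex:.3f}")
```

A half-wave rectified sinusoid has F1/F0 = π/2, above the conventional criterion of 1 that separates simple from complex cells with single sine gratings, whereas the weakly modulated response falls well below it. Neither value says anything about the waveform preferred among equal-energy compound gratings, which is the distinction drawn above.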
Contrast energy-sensitive gain-control mechanisms operate at various levels of the early stages of visual processing, including V1 (Geisler and Albrecht 1992). The fast-acting contrast gain control active within the receptive field center is not tuned to orientation and is likely inherited from subcortical afferents (Bonin et al. 2005; Smith et al. 2006). With increasing contrast, the contrast gain control may advance the response phase (by reducing the stimulus-response delay) and make the temporal filter of the affected receptive fields more high-pass (Albrecht 1995; Holub and Morton-Gibson 1981; Shapley and Victor 1981). Although such an operator might affect feature processing when stimulus contrast is varied, it cannot account for the feature tuning we observed because we assayed it by using equal-energy stimuli. However, incorporating a quantitatively accurate gain-control mechanism is necessary to explain responses to both compound and component gratings in the same model.
Although surround effects were not studied here, it is possible that our results could be explained by a nonlinear nonclassical surround modulation. Neurophysiological (Bonds 1989; Levitt and Lund 1997; Polat et al. 1998) and psychophysical (Polat and Sagi 1994) evidence implicates a modulatory surround mechanism that is sensitive to oriented contours and one that may also be sensitive to the spatial frequency content of the surround stimulus (Polat and Sagi 1993). Because oriented contours, such as the edge and line elements our stimuli approximated, are jointly defined by components taken across multiple spatial scales, their ability to exert modulatory influence may depend just as much on their relative phase as on oriented contrast energy. However, the involvement of such mechanisms must be considered hypothetical because their activity depends on the relative orientation in center and surround, which was not varied here.
Felsen et al. (2005) hypothesize that the enhanced feature gain they measured in V1 complex cells in response to stimuli of natural phase spectra may imply some phase-sensitive nonlinear nonclassical surround modulation, which could encompass phase-sensitive interactions between two gratings of the same orientation. Our stimuli include close approximations of naturally occurring salient features that are defined by phase congruence; thus it is at least possible that our stimuli could tap into these hypothetical modulatory mechanisms as well. Our studies differ in the choice of species and in the use of natural versus “designed” stimuli and, in contrast to Felsen et al. (2005), we find no significant dependence of feature sensitivity on the index of the simple–complex continuum.
Whereas feature tuning in single V1 neurons depends on stimulus speed, the V1 population as a whole preserves the range of tuning. As a result of the shifts of optimum phase, the feature-specific identities of neurons are shuffled with changes of stimulus speed. This mechanism allows the V1 population as a whole to represent a full suite of feature analyzers independent of stimulus speed.
It is difficult to imagine how these neurons, acting individually, could serve to represent one-dimensional features. Our new results mesh with the body of evidence that suggests that V1 neurons act as tuned nonlinear filters that represent a bank of spatially localized intermediate processors, rather than the view that individual neurons serve as detectors. That is, the information-processing strategy of V1 appears not so much one of each individual unit serving as a detector, but rather one providing a range of selectivity realized by an ensemble of neurons that are dynamically selected by the stimulus.
As inferred from perceptual phenomena, mechanisms that extract features in a way that is at least approximately independent of speed are necessary components of visual processing. A corollary of our results is that those hypothetical feature detectors must be able to pool their input signals from V1 ensembles (feature analyzers), even though the latter are defined dynamically by the stimulus instead of by a static place code. The considerable variation that we observe in feature preference in a local ensemble that is confined within the recording volume of our tetrodes (Mechler et al. 2002) puts further constraints on the possible pooling mechanisms. Thus our results make it difficult to ascribe feature detection (even of simple one-dimensional elements) to individual V1 neurons and highlight open questions regarding the cortical circuitry necessary to perform this function.
We thank K. Purpura, D. Reich, and M. Repucci for help in the data collection.
This work was supported by National Institutes of Health Grants EY-9314 to J. D. Victor and F. Mechler and GM-7739 to I. E. Ohiorhenuan.