We have previously analyzed shape processing dynamics in macaque monkey posterior inferotemporal cortex (PIT). We described how early PIT responses to individual contour fragments evolve into tuning for multifragment shape configurations. Here, we analyzed curvature processing dynamics in area V4, which provides feedforward inputs to PIT. We contrasted 2 hypotheses: 1) that V4 curvature tuning evolves from tuning for simpler elements, analogous to PIT shape synthesis and 2) that V4 curvature tuning emerges immediately, based on purely feedforward mechanisms. Our results clearly supported the first hypothesis. Early V4 responses carried information about individual contour orientations. Tuning for multiorientation (curved) contours developed gradually over ~50 ms. Together, the current and previous results suggest a partial sequence for shape synthesis in ventral pathway cortex. We propose that early orientation signals are synthesized into curved contour fragment representations in V4 and that these signals are transmitted to PIT, where they are then synthesized into multifragment shape representations. The observed dynamics might additionally or alternatively reflect influences from earlier (V1, V2) and later (central and anterior IT) processing stages in the ventral pathway. In either case, the dynamics of contour information in V4 and PIT appear to reflect a sequential hierarchical process of shape synthesis.
Early visual representations are highly fragmented, with information distributed across densely sampled retinotopic mosaics. Object perception depends on synthesizing these fragmented signals into larger, more configural representations (Connor et al. 2007). The dynamics of this synthetic process may offer a window into the neural mechanisms underlying object perception. In a previous study, we analyzed the dynamics of object shape processing in posterior inferotemporal cortex (PIT), a later stage in the monkey ventral pathway containing neurons that respond selectively to shape configurations comprising multiple contour fragments (Brincat and Connor 2004). We discovered that PIT shape representations undergo a clear temporal evolution. Early PIT responses exhibit linear tuning for individual straight and curved contour fragments. These early responses transition gradually to nonlinear selectivity for multifragment configurations over the course of ~60 ms (Brincat and Connor 2006).
Here, we expanded on these previous findings by studying shape dynamics in area V4, which is antecedent to PIT in the ventral pathway hierarchy (Felleman and Van Essen 1991). V4 neurons have smaller receptive fields (RFs) and respond to simpler visual patterns than PIT neurons (Kobatake and Tanaka 1994). In particular, V4 neurons are simultaneously tuned for orientation and curvature of contour fragments (Pasupathy and Connor 1999, 2001, 2002; Hegde and Van Essen 2007; Yau et al. 2009). Contour fragment signals in V4 are comparable to early-stage contour fragment responses in PIT (Brincat and Connor 2004, 2006). Based on these background findings in V4 and PIT, we postulated 2 alternative time courses for curvature tuning dynamics in V4 (Fig. 1). One possibility is that contour curvature representations are synthesized through dynamic network interactions. If so, we predicted that the time course of V4 shape processing would complement the PIT dynamics in 2 ways. First, we predicted that V4 tuning for contour fragments would evolve in an analogous way, from tuning for simpler elements (in the V4 case, oriented contours) toward tuning for more complex constructs (in the V4 case, multiorientation contour fragments, i.e., curves and angles). In this case, V4 neurons would respond transiently to individual orientation inputs (Fig. 1A). Responses to multiorientation (curved) contours would also show a transient early response (Fig. 1B, left, shading indicates early response period) dominated by sensitivity to individual orientations. This early phase would then evolve into a later phase (Fig. 1B, right, shading indicates late response period) of curvature selectivity sustained by feedback from other V4 neurons with similar tuning. Second, we predicted that the time course of curved contour fragment tuning in V4 would coincide with the time course of similar tuning in PIT, consistent with V4 being the source of early PIT responses.
The other alternative is that V4 contour curvature tuning could depend on purely feedforward inputs from early visual processing stages (Fukushima 1980; Riesenhuber and Poggio 1999; Serre et al. 2007), analogous to the postulated basis of orientation tuning in V1 (Hubel and Wiesel 1962; Celebrini et al. 1993; Ferster and Miller 2000; Mazer et al. 2002). In this case, curvature selectivity would depend on a threshold nonlinearity, such that V4 neurons would not respond to individual subthreshold orientation inputs (Fig. 1C) but would respond to multiple orientation inputs (in curved contours) that together exceed the threshold (Fig. 1D). Given this scheme, even the earliest V4 response should be tuned for curvature (Fig. 1D).
We tested these hypotheses by characterizing shape tuning dynamics of isolated V4 neurons recorded from awake fixating macaque monkeys. The analyses described here are directly analogous to those used to characterize PIT shape tuning dynamics but adapted to the contour stimulus set used here to study V4. We measured neural responses to contour stimuli containing individual and multiple orientations (edges, angles, and curves). We analyzed neural response patterns using models designed to capture tuning for individual orientations and multiorientation combinations. We fit temporal weighting functions for these models to characterize tuning dynamics. We supplemented these model-driven analyses with model-free analyses based on analysis of variance (ANOVA) and vector strength. The results of all analyses were consistent with the curvature synthesis hypothesis: We observed rapid emergence of tuning for individual orientations followed by more gradual development of selectivity for orientation combinations. Furthermore, the time course of signals for multiorientation (curved) contour fragments in V4 closely matched the time course in PIT. Together, these results imply a partial sequence for dynamic synthesis of complex shape representations through successive stages of the ventral pathway.
We recorded extracellular action potentials from 127 well-isolated V4 neurons in the lower parafoveal representation on the surface of the prelunate gyrus and adjoining banks of the lunate and superior temporal sulci in 2 awake rhesus monkeys (Macaca mulatta) trained to maintain fixation within a 0.5°-radius window. We selected recording sites on the basis of skull landmarks, neural response properties, and inferred positions of sulci. Further details can be found in Pasupathy and Connor (1999), where nondynamic data from these experiments were reported. The neurons analyzed in this report comprise a subset of the V4 cells originally described in Pasupathy and Connor (1999) (see Supplementary Material). All animal procedures were approved by the Johns Hopkins Animal Care and Use Committee and conformed to National Institutes of Health and US Department of Agriculture guidelines.
The visual stimulus set comprised 6 contour fragment shapes: 45°, 90°, and 135° angles and curved B-spline approximations to these angles. These 6 shapes were presented at 8 orientations (45° intervals). In addition, straight contour fragments (lines) were presented at 4 orientations (45° intervals), yielding a total of 52 distinct stimuli (Fig. 2A).
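As a quick check on the stimulus count, the set can be enumerated as below. This is a minimal Python sketch; the shape labels and the assumption that the 4 line orientations run from 0° to 135° are ours, not taken from the original stimulus code.

```python
# Sketch: enumerate the stimulus set (shape labels are hypothetical names).
INTERIOR_ANGLES = (45, 90, 135)       # angle sizes in degrees
SHAPE_ROTATIONS = range(0, 360, 45)   # 8 orientations at 45-degree intervals
LINE_ROTATIONS = range(0, 180, 45)    # assumed: 4 line orientations, 0-135 degrees


def build_stimulus_set():
    stimuli = []
    # Sharp angles and their curved B-spline approximations: 2 x 3 x 8 = 48.
    for shape in ("angle", "curve"):
        for interior in INTERIOR_ANGLES:
            for rotation in SHAPE_ROTATIONS:
                stimuli.append((shape, interior, rotation))
    # Straight contour fragments add 4 more stimuli.
    for rotation in LINE_ROTATIONS:
        stimuli.append(("line", None, rotation))
    return stimuli


stimuli = build_stimulus_set()
print(len(stimuli))  # 52
```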
Stimulus size was scaled according to average V4 RF size at the cell's eccentricity. Stimuli were rendered in the cell's optimum color against a gray background. Stimulus color and luminance were constant within the RF and faded gradually into the background gray over a distance equal to the RF radius. During each fixation trial, 5 randomly chosen stimuli were flashed in sequence for 500 ms each, separated by 250-ms intervals with only the background present. The entire stimulus set was presented 5 times. Further details can be found in Pasupathy and Connor (1999).
For each stimulus, the response rate was calculated by summing spikes over the 500-ms presentation period and averaging across 5 repetitions. We first determined that evoked responses were significantly greater than baseline activity (one-tailed paired t-test, P < 0.05) in each of the 127 V4 neurons. We then fit contour tuning models to the response patterns of these V4 neurons.
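A minimal sketch of the response-rate computation, plus a paired t statistic for the evoked-versus-baseline comparison. The text does not state what the pairing unit was, so the `evoked`/`baseline` inputs here are hypothetical matched samples; a one-tailed P value would come from the t distribution with the returned degrees of freedom (for example via SciPy's `scipy.stats.ttest_rel(..., alternative="greater")`).

```python
import math
import statistics


def response_rate(trials, start=0.0, stop=0.5):
    """Mean rate (spikes/s) in [start, stop), averaged across repetitions.

    trials: one list of spike times (s, relative to stimulus onset) per repetition.
    """
    counts = [sum(start <= t < stop for t in spikes) for spikes in trials]
    return statistics.mean(counts) / (stop - start)


def paired_t(evoked, baseline):
    """Paired t statistic and degrees of freedom for evoked vs. baseline rates."""
    diffs = [e - b for e, b in zip(evoked, baseline)]
    se = statistics.stdev(diffs) / math.sqrt(len(diffs))
    return statistics.mean(diffs) / se, len(diffs) - 1


# Hypothetical spike times (s) for 5 repetitions of one stimulus.
trials = [[0.01, 0.1, 0.2, 0.6], [0.05, 0.3], [0.1, 0.2, 0.3, 0.4, 0.45], [0.2], [0.0, 0.49, 0.5]]
print(response_rate(trials))  # (3 + 2 + 5 + 1 + 2) / 5 counts / 0.5 s = 5.2
```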
We characterized each stimulus in terms of 2 component orientations, θ1 and θ2, and their relative position, θrp, defined as the direction of the line connecting their midpoints. For the 90° angle stimulus, we measured orientation and position at 2 points on either side of the vertex, shifted back from the vertex by 0.5 × RF radius (see Fig. 1 in Pasupathy and Connor 1999). We measured orientations and positions at equivalent positions for all other stimuli.
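The relative-position variable can be illustrated as the direction of the vector joining the 2 measurement points. In this sketch, the choice of which point serves as the reference and the counterclockwise, 0° = rightward convention are assumptions, chosen so that a component directly above the reference gets 90°, in line with the offset angles quoted in the Results.

```python
import math


def relative_position(p_ref, p_other):
    """Direction (deg, in [0, 360)) of the line from p_ref to p_other.

    Assumed convention: 0 deg points right, 90 deg points up (counterclockwise).
    """
    dx = p_other[0] - p_ref[0]
    dy = p_other[1] - p_ref[1]
    return math.degrees(math.atan2(dy, dx)) % 360.0


# Hypothetical measurement points: the second lies directly above the first.
print(relative_position((0.0, 0.0), (0.0, 1.0)))
```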
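For each neuron, we fit a linear/nonlinear contour tuning model designed to capture both tuning for individual orientation components and, separately, tuning for close conjunctions of disparate orientations (i.e., for curvature). The model includes 3 linear subunits, 2 with Gaussian orientation tuning (L1 and L2) and 1 with Gaussian tuning for the relative position of the 2 preferred orientation components (Lrp). The total predicted response to each stimulus was given by a weighted sum of the responses of the 2 orientation-tuned subunits and the product of the 3 linear subunits,

$$R = w_1 L_1(\theta_1) + w_2 L_2(\theta_2) + w_C\, L_1(\theta_1)\, L_2(\theta_2)\, L_{rp}(\theta_{rp}) + b_0,$$

$$L_k(\theta_k) = \exp\left[-\frac{(\theta_k - \mu_k)^2}{2\sigma_L^2}\right]\ (k = 1, 2), \qquad L_{rp}(\theta_{rp}) = \exp\left[-\frac{(\theta_{rp} - \mu_{rp})^2}{2\sigma_{rp}^2}\right],$$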
where θ1 and θ2 are the contour component orientations nearest the L1 and L2 tuning peaks, respectively, θrp is the relative angular position (offset angle) of the 2 components, w1 and w2 are estimated response weights for individual orientation subunits, wC is the estimated response weight for the product of the 3 subunits (i.e., the conjunction of the orientations), and b0 is the baseline response level. The Gaussian tuning peaks for the orientation and relative position subunits are μ1, μ2, and μrp, respectively. The baseline response and each weight coefficient are constrained to be less than or equal to the maximum observed response. The widths of the orientation-tuning functions (constrained to have the same width) and the relative position tuning function are σL and σrp, respectively. The overall contour tuning model thus has a total of 9 free parameters (μ1, μ2, μrp, σL, σrp, w1, w2, wC, and b0). These models were fit to each neuron's time-averaged (across the 500-ms stimulus duration) response pattern using a nonlinear least-squares algorithm (lsqnonlin; MATLAB, MathWorks Inc.) to minimize the sum of squared differences between the observed and predicted responses.
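A minimal Python sketch of the model's forward prediction for one stimulus follows: 2 weighted orientation subunits plus a weighted product of all 3 subunits, plus baseline. It assumes unit-peak Gaussian subunits and treats every angle as 360°-periodic (differences wrapped to ±180°); the dictionary keys and parameter values are hypothetical. Assigning θ1 and θ2 to the components nearest each tuning peak is left to the caller. Fitting is not shown; in Python, `scipy.optimize.least_squares` (which accepts bounds) could play the role lsqnonlin plays in MATLAB.

```python
import math


def wrap_deg(d, period=360.0):
    """Signed angular difference wrapped to [-period/2, period/2)."""
    return (d + period / 2.0) % period - period / 2.0


def gauss(x, mu, sigma):
    """Unit-peak Gaussian tuning on a circular variable (assumed form)."""
    return math.exp(-wrap_deg(x - mu) ** 2 / (2.0 * sigma ** 2))


def predict(theta1, theta2, theta_rp, p):
    """Predicted response: w1*L1 + w2*L2 + wC*L1*L2*Lrp + b0.

    p holds the 9 free parameters under hypothetical key names.
    """
    L1 = gauss(theta1, p["mu1"], p["sigma_L"])
    L2 = gauss(theta2, p["mu2"], p["sigma_L"])        # shared orientation width
    Lrp = gauss(theta_rp, p["mu_rp"], p["sigma_rp"])
    return p["w1"] * L1 + p["w2"] * L2 + p["wC"] * L1 * L2 * Lrp + p["b0"]


# Hypothetical parameter values.
params = {"mu1": 120.0, "mu2": 70.0, "mu_rp": 260.0, "sigma_L": 30.0,
          "sigma_rp": 40.0, "w1": 0.0, "w2": 5.0, "wC": 20.0, "b0": 2.0}
print(predict(120.0, 70.0, 260.0, params))  # 27.0 (every subunit at its peak)
```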
This model provides a simple approach to contrasting our 2 hypotheses, since the linear terms capture separable tuning for orientations and the nonlinear multiplicative term captures tuning for curvature (configuration of 2 orientations) (Zetzsche and Barth 1990; Poirier and Wilson 2006; Gheorghiu and Kingdom 2009). The time-averaged model fits ranged from purely linear tuning for a single component orientation (i.e., selectivity for a single orientation) to predominantly nonlinear tuning for orientation configurations (i.e., selectivity for curvature). This range of V4 shape selectivity is consistent with previous reports (Motter 1994; McAdams and Maunsell 1999; Pasupathy and Connor 1999; Hegde and Van Essen 2007).
Statistical validity of models was tested with a 2-fold randomization cross-validation analysis. For each cell, stimuli were ranked by response level and divided into 2 groups (odd and even rank values). Thus, each group included a full range of high-, moderate-, and low-response stimuli. Separate models were fit to the 2 stimulus groups, and each model was used to predict responses to the other stimulus group. We used an adapted permutation method (Manly 1997) to test for significant cross-prediction between groups. The null hypothesis was that the model did not explain response levels, and thus, correlations between predicted and observed responses were no higher than those expected by chance. The distribution of correlation values expected under this null hypothesis was generated by measuring between-cell correlations (i.e., correlations between model predictions for one cell and observed responses for another; mean correlation = 0.04; Supplementary Fig. S1A). Correlations between observed and predicted responses within cells (across the entire 127-neuron sample) were typically higher (mean correlation = 0.37; Supplementary Fig. S1A). Our threshold for statistically valid models was within-cell correlation above the 95% point of the null hypothesis distribution. This threshold defined the analysis sample of 62 neurons included in the temporal analyses. Validated models explained significantly more response variance than models that failed the validation procedure (2-sample t-test, P < 0.001; Supplementary Fig. S1B). The average variance explained across the 62 significant neurons was 59 ± 1% (mean ± standard error [SE]).
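The bookkeeping for the rank-based split and the between-cell null distribution might look like this in Python. The model-fitting step is omitted, the inputs are hypothetical lists of predicted and observed responses (same stimulus order for every cell), and the 95% point is taken as a simple empirical quantile.

```python
import math


def rank_split(responses):
    """Split stimulus indices into odd- and even-rank groups (rank 1 = strongest)."""
    order = sorted(range(len(responses)), key=lambda i: responses[i], reverse=True)
    return order[0::2], order[1::2]


def pearson(x, y):
    """Pearson correlation between 2 equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def between_cell_null(predicted, observed):
    """Correlations between one cell's predictions and another cell's responses."""
    return [pearson(predicted[a], observed[b])
            for a in range(len(predicted))
            for b in range(len(observed)) if a != b]


def threshold_95(null_values):
    """Empirical 95% point of the null distribution."""
    ranked = sorted(null_values)
    return ranked[min(int(0.95 * len(ranked)), len(ranked) - 1)]


# Hypothetical responses of one cell to 6 stimuli.
odd, even = rank_split([5, 1, 9, 3, 7, 2])
print(odd, even)  # [2, 0, 5] [4, 3, 1]
```

A cell's model would then count as valid if its within-cell cross-prediction correlation exceeded `threshold_95` of the between-cell values.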
We computed instantaneous response rates by convolving the spike trains with a spike density function comprising 2 exponentials:

$$K(t) = \left[1 - \exp\left(-\frac{t + 8}{\tau_g}\right)\right]\exp\left(-\frac{t + 8}{\tau_d}\right) \quad \text{for } t \ge -8~\text{ms}, \qquad K(t) = 0 \text{ otherwise},$$

where t is time (ms) relative to each spike and τg and τd are the time constants for the growth phase and decay phase (5 and 20 ms, respectively). The time constants were chosen to fall within the range of values previously used to generate smoothing kernels (Thompson et al. 1996; Brincat and Connor 2006). The 8-ms offset served to center the peak of the resulting spike density function on each spike. Smoothing spike trains with asymmetric spike density functions yields more accurate instantaneous rate estimates by avoiding backward biasing (Thompson et al. 1996). Smoothed spike trains were averaged across 5 repetitions of each stimulus.
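The kernel and smoothing step can be sketched as follows, assuming 1-ms sampling and a kernel normalized to unit area so that rates come out in spikes/s. The undelayed kernel (1 − e^(−u/τg)) e^(−u/τd) peaks at u = τg ln(1 + τd/τg) = 5 ln 5 ≈ 8.05 ms, which is why an 8-ms shift places the peak on the spike. Function names and the 300-ms normalization support are illustrative choices.

```python
import math

TAU_G = 5.0    # growth time constant (ms)
TAU_D = 20.0   # decay time constant (ms)
OFFSET = 8.0   # shift (ms) that centers the kernel peak on the spike


def kernel(t_ms):
    """2-exponential kernel evaluated t_ms after a spike (zero before t = -OFFSET)."""
    u = t_ms + OFFSET
    if u < 0.0:
        return 0.0
    return (1.0 - math.exp(-u / TAU_G)) * math.exp(-u / TAU_D)


def smooth(spike_times_ms, duration_ms, dt=1.0, support_ms=300.0):
    """Instantaneous rate (spikes/s) sampled every dt ms on [0, duration_ms)."""
    # Discrete kernel area (ms), so each spike integrates to exactly 1 spike.
    first = int(-OFFSET / dt)
    area = dt * sum(kernel(k * dt) for k in range(first, int(support_ms / dt)))
    n_bins = int(duration_ms / dt)
    return [1000.0 * sum(kernel(i * dt - s) for s in spike_times_ms) / area
            for i in range(n_bins)]


# Hypothetical spike times (ms) from one trial.
rate = smooth([100.0, 112.0, 130.0], duration_ms=300.0)
```

Averaging such smoothed traces across the 5 repetitions gives the per-stimulus time course.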
For 62 neurons with valid models, dynamic models were created by fitting temporal weighting functions to predict temporal response profiles. The temporal weighting functions were created by repeatedly refitting the linear weights (w1 and w2) and the nonlinear weight (wC) (holding the other model parameters constant) for successive 5-ms time bins. This approach assumes that each neuron's preferred stimulus features (i.e., orientation components) are unchanged over the response period and allows for changes in only the relative weighting of the individual components and their conjunction. We explicitly tested the hypothesis that contour tuning emerged from the synthesis of component orientation tuning by constraining temporal weighting functions to be greater than or equal to 0. This focus on excitatory tuning also served to facilitate comparison with our previous analysis of PIT response dynamics. To enable comparison of tuning dynamics across the V4 sample, we normalized the temporal weighting functions of each neuron by the maximum (across time) sum of the neuron's linear and nonlinear response components. We validated our model fitting procedures by applying them to simulated neural populations (Supplementary Fig. S2).
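The per-bin refit amounts to a small nonnegative least-squares problem for (w1, w2, wC), with the subunit responses and b0 held fixed. The text does not say which solver enforced the ≥ 0 constraint; the sketch below swaps in plain projected coordinate descent (SciPy's `scipy.optimize.nnls` would be a library alternative), and the input layout is hypothetical.

```python
def nnls_cd(X, y, n_iter=500):
    """Minimize ||y - Xw||^2 subject to w >= 0 by cyclic coordinate descent.

    X: one row of regressors per stimulus; y: one target per stimulus.
    """
    p = len(X[0])
    w = [0.0] * p
    r = [float(v) for v in y]  # residual y - Xw (w starts at 0)
    col_sq = [sum(row[j] ** 2 for row in X) for j in range(p)]
    for _ in range(n_iter):
        for j in range(p):
            if col_sq[j] == 0:
                continue
            # Exact minimizer along coordinate j, projected onto w_j >= 0.
            num = sum(row[j] * ri for row, ri in zip(X, r))
            new = max(0.0, w[j] + num / col_sq[j])
            delta = new - w[j]
            if delta:
                for i, row in enumerate(X):
                    r[i] -= delta * row[j]
                w[j] = new
    return w


def temporal_weights(X, rates_by_bin, b0):
    """Refit (w1, w2, wC) in every time bin, holding subunit tuning and b0 fixed.

    X[s] = [L1, L2, L1 * L2 * Lrp] for stimulus s, from the time-averaged fit.
    rates_by_bin[k][s] = response to stimulus s in time bin k.
    """
    return [nnls_cd(X, [rate - b0 for rate in rates]) for rates in rates_by_bin]


# Hypothetical subunit responses for 4 stimuli and one time bin.
X = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0], [1.0, 1.0, 1.0]]
weights = temporal_weights(X, [[3.0, 1.0, 4.0, 6.0]], b0=1.0)  # approximately [[2.0, 0.0, 3.0]]
```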
Peak linear and nonlinear component response latencies were defined as the first time bin during which 1) component response level exceeded its baseline (mean value from –50 to +50 ms relative to stimulus onset) by 70% of its baseline-to-peak range, 2) response variance explained by the model exceeded 20% of the maximum explained variance, and 3) component response exceeded 10% of total predicted response (sum of linear and nonlinear responses). Figure 5A shows all time points during which responses met these criteria.
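The 3 latency criteria translate directly into a first-crossing search. A minimal sketch, assuming per-bin lists of equal length and bin times in ms relative to stimulus onset (argument names are hypothetical):

```python
def peak_latency(times, component, total, explained_var,
                 frac_range=0.70, frac_ev=0.20, frac_total=0.10):
    """First time bin meeting all 3 latency criteria, or None if none does.

    times: bin times (ms, relative to stimulus onset); the other arguments are
    per-bin values for one component, the total predicted response, and the
    variance explained by the model.
    """
    # Criterion 1: exceed baseline by 70% of the baseline-to-peak range.
    base = [c for t, c in zip(times, component) if -50 <= t <= 50]
    baseline = sum(base) / len(base)
    level = baseline + frac_range * (max(component) - baseline)
    # Criterion 2: explained variance above 20% of its maximum.
    ev_min = frac_ev * max(explained_var)
    for t, c, tot, ev in zip(times, component, total, explained_var):
        # Criterion 3: component above 10% of the total predicted response.
        if c > level and ev > ev_min and c > frac_total * tot:
            return t
    return None


# Hypothetical component ramping from 50 to 100 ms.
times = list(range(-50, 300, 5))
component = [0.0 if t < 50 else min(1.0, (t - 50) / 50) for t in times]
print(peak_latency(times, component, [2.0] * len(times), [1.0] * len(times)))  # 90
```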
The relative contributions of the linear and nonlinear components to the total response were quantified by a nonlinearity index, defined as NL/(L + NL), where L is the neuron's summed linear response components and NL is the neuron's nonlinear response component. This metric ranged from 0 (purely linear) to 1 (purely nonlinear). We used this index to classify neurons as predominantly linear (0–0.33), predominantly nonlinear (0.67–1), or mixed linear/nonlinear (0.34–0.66).
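The index and the 3-way classification are simple enough to write out. How L and NL are summarized from the time-varying weights is not specified here, so they are taken as given nonnegative inputs, and the handling of values falling between 0.33 and 0.34 (or 0.66 and 0.67) is our assumption.

```python
def nonlinearity_index(linear, nonlinear):
    """NL / (L + NL): 0 = purely linear, 1 = purely nonlinear."""
    total = linear + nonlinear
    if total <= 0:
        raise ValueError("components must have a positive sum")
    return nonlinear / total


def classify(index):
    """Map the index onto the 3 response categories (boundary handling assumed)."""
    if index <= 0.33:
        return "linear"
    if index >= 0.67:
        return "nonlinear"
    return "mixed"


print(classify(nonlinearity_index(1.0, 4.0)))  # nonlinear
```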
The degree to which response nonlinearity varied across time in cells with meaningful linear and nonlinear response components was given by
where LPeak and NLPeak are the peak latencies of a cell's linear and nonlinear response components. Negative transition values indicate nonlinear-to-linear response transitions, a value of 0 corresponds to no latency difference between response types, and positive values signify linear-to-nonlinear response transitions.
We compared the temporal evolution of linear and nonlinear responses in the population (Fig. 5C) by averaging component responses across V4 cells and computing the first time point at which each average component response exceeded baseline by 90% of its baseline-to-peak range. We used a randomization procedure to test the statistical significance of the observed time difference. Each neuron's response components were randomly assigned to the linear and nonlinear categories, and latency differences were recomputed. This process was repeated 50000 times to generate a distribution of latency difference values expected under the null hypothesis that linear and nonlinear latencies are equivalent. Observed latency differences were considered statistically significant if they exceeded 95% of the null hypothesis distribution (Manly 1997). To establish the robustness of these statistical results, we performed the same comparison with latency defined as 80% (instead of 90%) of baseline-to-peak range. In addition, we used a randomization Kolmogorov–Smirnov procedure to test whether the cumulative difference between the 2 curves was significant across the 0- to 200-ms time period. The same randomization procedure was used to generate a distribution of maximum cumulative difference between curves. The observed maximum cumulative difference was considered significant if it exceeded 95% of this null hypothesis distribution.
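The population latency comparison and its randomization test can be sketched as follows. Assumptions: the baseline window is the same −50 to +50 ms window used for single-cell latencies, each neuron's 2 component curves are swapped with probability 0.5 in every permutation, and the P value uses the common (count + 1)/(N + 1) convention. The synthetic curves in the example are hypothetical.

```python
import math
import random


def crossing_time(times, curve, frac=0.90):
    """First time the curve exceeds baseline + frac * (peak - baseline), or None."""
    base = [v for t, v in zip(times, curve) if -50 <= t <= 50]
    baseline = sum(base) / len(base)
    level = baseline + frac * (max(curve) - baseline)
    return next((t for t, v in zip(times, curve) if v > level), None)


def mean_curve(curves):
    """Average a list of equal-length curves bin by bin."""
    return [sum(vals) / len(vals) for vals in zip(*curves)]


def latency_difference(times, lin_curves, nl_curves, frac=0.90):
    """Nonlinear minus linear crossing time of the population-average curves."""
    return (crossing_time(times, mean_curve(nl_curves), frac)
            - crossing_time(times, mean_curve(lin_curves), frac))


def randomization_test(times, lin_curves, nl_curves, n_perm=50000, frac=0.90, seed=0):
    """Observed latency difference and its one-tailed randomization P value."""
    rng = random.Random(seed)
    observed = latency_difference(times, lin_curves, nl_curves, frac)
    n_extreme = 0
    for _ in range(n_perm):
        shuffled_lin, shuffled_nl = [], []
        for lin, nl in zip(lin_curves, nl_curves):
            # Randomly reassign this neuron's 2 components to the 2 categories.
            if rng.random() < 0.5:
                lin, nl = nl, lin
            shuffled_lin.append(lin)
            shuffled_nl.append(nl)
        if latency_difference(times, shuffled_lin, shuffled_nl, frac) >= observed:
            n_extreme += 1
    return observed, (n_extreme + 1) / (n_perm + 1)


# Hypothetical population: 10 neurons, linear bump at 100 ms, nonlinear bump at 150 ms.
times = list(range(-50, 300, 5))
lin = [[math.exp(-((t - 100) / 30) ** 2) for t in times] for _ in range(10)]
nl = [[math.exp(-((t - 150) / 30) ** 2) for t in times] for _ in range(10)]
diff, p = randomization_test(times, lin, nl, n_perm=500)
print(diff)  # 50
```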
We supplemented the tuning model–based analysis of component orientation-tuning dynamics with a temporal analysis based on 2-way ANOVA with main effects of component orientations. This approach enabled us to estimate the degree to which 1 or 2 stimulus component orientations modulated response rates. Orientation values used to define each stimulus in the analysis were determined in the same manner as described for the contour tuning models (thus each stimulus was characterized by 2 component orientations). We characterized the evolution of orientation-driven response modulation for each neuron by conducting ANOVA at each time point.
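We supplemented the tuning model–based analysis of curvature dynamics with a temporal analysis based on vector strength on the curvature direction dimension (Yau et al. 2009). At each time point, t, we computed a mean vector index (DI) for each cell:

$$\mathrm{DI}(t) = \frac{\sqrt{\left[\sum_i R_i(t)\cos\theta_i\right]^2 + \left[\sum_i R_i(t)\sin\theta_i\right]^2}}{\sum_i R_i(t)}$$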
where Ri is the response rate to the ith stimulus and θi is the curvature direction of the ith stimulus (the direction of a vector along the stimulus axis of symmetry pointing away from the interior of the angle) (e.g., a 90° angle stimulus pointing to the right has a θi value of 0). Larger DI values indicate stronger curvature selectivity. We tested the statistical significance of population DI values at each time point with a randomization test. At each time point, a mean population DI was determined from DI values calculated for each neuron based on responses randomized across stimuli within neurons. This process was repeated 1000 times to generate a distribution of population DI values expected by chance. For each time point, we considered the observed population DI value to be significant if it exceeded 95% of the values from the randomized distribution.
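Assuming DI takes the standard vector-strength form (the length of the response-weighted sum of unit vectors at each curvature direction, divided by the summed response), the index and its randomization test could be computed as below. Stimuli with no curvature direction (straight lines) are assumed to be excluded from the inputs; names and example data are hypothetical.

```python
import cmath
import math
import random


def direction_index(rates, directions_deg):
    """Vector strength: |sum R_i exp(i*theta_i)| / sum R_i (0 = untuned, 1 = one direction)."""
    total = sum(rates)
    if total <= 0:
        return 0.0
    resultant = sum(r * cmath.exp(1j * math.radians(d)) for r, d in zip(rates, directions_deg))
    return abs(resultant) / total


def population_di_test(rates_by_cell, directions_by_cell, n_perm=1000, seed=0):
    """Mean DI across cells, with a P value from shuffling responses across stimuli within cells."""
    rng = random.Random(seed)
    n_cells = len(rates_by_cell)
    observed = sum(direction_index(r, d)
                   for r, d in zip(rates_by_cell, directions_by_cell)) / n_cells
    n_extreme = 0
    for _ in range(n_perm):
        total = 0.0
        for rates, dirs in zip(rates_by_cell, directions_by_cell):
            shuffled = list(rates)  # copy so the caller's data are untouched
            rng.shuffle(shuffled)
            total += direction_index(shuffled, dirs)
        if total / n_cells >= observed:
            n_extreme += 1
    return observed, (n_extreme + 1) / (n_perm + 1)


# Hypothetical cell tuned for curvature pointing at 0 deg (8 directions, 45 deg apart).
directions = list(range(0, 360, 45))
tuned = [10, 7, 2, 1, 1, 1, 2, 7]
print(direction_index(tuned, directions))
```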
We studied 127 neurons in the V4 lower field representation of 2 macaque monkeys (43 cells from Monkey-V and 84 cells from Monkey-M). The results from analysis of each animal separately revealed no significant response differences (see Supplementary Material). A typical result is shown in Figure 2. Straight, angled, and curved stimuli (Fig. 2A) were flashed in the V4 RF in random order while the monkey performed a fixation task. Average response to each stimulus (during the 500-ms presentation period) is indicated by background gray level (Fig. 2A; see scale bar). We fit the neural response pattern with a linear/nonlinear model based on 2 orientation-tuning functions (Fig. 2A) and a function representing relative angular position of the 2 orientation components (see Materials and Methods). Predicted response was a weighted sum of the 2 orientation subunit terms (Fig. 2A, L1, L2) and a nonlinear term representing a configuration of those 2 orientations at the fitted angular offset (LrpL1L2). Thus, the nonlinear term corresponds to tuning for orientation combinations (i.e., curvature). The largest weight in this example was on the nonlinear term, which represented a configuration of orientations near 120° (blue) and 70° (cyan) with an offset angle of 260°, forming a shallow angle or curve pointing toward the left. Only one linear term had a nonzero weight in this case; across the sample, models ranged between this pattern and equal weighting for 2 linear orientation terms (Supplementary Fig. S3). The correlation between observed responses and responses predicted by this model was 0.74. The model fit was significant (P < 0.05) based on cross-validation (Supplementary Fig. S1A,B). Further analyses presented here are based on a subpopulation of 62 neurons for which tuning model fits were significant (P < 0.05; mean correlation = 0.77 ± 0.13). 
However, analysis of shape dynamics based on the entire V4 sample (127 neurons) yielded similar results and significance levels (Supplementary Fig. S4). Analysis restricted to only neurons exhibiting significant response modulation across the stimulus set (103 neurons) based on a 2-way ANOVA (stimulus shape × stimulus direction, main or interaction effects, P < 0.05) also yielded similar results and significance levels (Supplementary Fig. S5).
To analyze the fine-scale dynamics of neural tuning, we fit temporal weighting functions for the 3 model components (Fig. 2B). For this cell, the weights for the 2 linear terms (blue and cyan) grew rapidly, peaking near 100 ms after stimulus onset. (The linear term L1 had zero weight in the static model but nonzero weight in the temporal model near 100 ms.) The linear weights then declined as the nonlinear weight (red) developed, peaking near 175 ms. Thus, the responses of this cell evolved from initial linear (additive) tuning for individual orientations to subsequent nonlinear (multiplicative) selectivity for a specific configuration of those orientations (specified by the relative position function in the model). This kind of result was consistent with the hypothesis that curvature information is dynamically synthesized from early signals for individual orientation components.
The specific contributions of the linear and nonlinear response components across time are visualized in Figure 3. Each row shows response levels (background gray levels) for all stimuli at 5 time points. The top 3 rows show the response components predicted by the 2 linear and 1 nonlinear model components. The fourth row shows the sum of these components, that is, the total predicted response. The bottom row shows the observed responses. At early time points (75 and 100 ms after stimulus onset), the strong linear components predict responses for a large number of stimuli containing one or both orientations. The sum of these predictions (fourth row) is a characteristic response pattern with broad but distinct peaks that appear also in the observed responses (fifth row). At later time points (175 and 475 ms), the linear component predictions are much lower (top 2 rows) and the nonlinear component (third row) dominates, predicting responses to shallow curves and angles pointing left. This leftward-pointing curvature selectivity is evident in both the composite predicted responses (fourth row) and the observed responses (fifth row). The same temporal pattern is visible in the summed responses of stimulus subgroups containing the component orientations and leftward-pointing configurations (Supplementary Fig. S6).
In contrast to the Figure 2 example, some V4 neurons exhibited predominantly nonlinear selectivity throughout the response period. The neuron shown in Figure 4 responded strongly to acute leftward-pointing curves and angles. The model fit to this neuron's time-averaged responses had a strongly weighted nonlinear term representing configurations of orientations near 70° (blue) positioned directly above (at an offset angle of 90°) orientations near 140° (cyan). The correlation between observed and predicted responses was 0.88. Estimated temporal weighting functions indicated that the neuron's linear responses grew minimally while its nonlinear responses rose sharply at response onset and remained elevated for the duration of the stimulus period (Fig. 4B). Curvature-selective responses appeared within 75 ms after stimulus onset and were evident in each of the subsequent temporal response snapshots (Fig. 4C). This type of result, apparent in approximately 10% of neurons with valid models (see Discussion), is consistent with the hypothesis that curvature tuning emerges immediately, based on a purely feedforward, threshold nonlinearity mechanism.
Population analysis showed that the predominant pattern was gradual transformation from component orientation signals to curvature tuning. Timeline plots (Fig. 5A) of linear and nonlinear response component peaks for each cell show that summed linear responses (blue) peaked earlier and nonlinear responses (red) peaked later. This held true for cells that were primarily linear (Fig. 5A, bottom), primarily nonlinear (top), and transitional (middle). Example time courses illustrate that linear cells responded early (Fig. 5B, bottom), nonlinear cells responded later (Fig. 5B, top), and transitional cells showed early linear tuning followed by delayed nonlinear selectivity (Fig. 5B, middle). These response patterns summed to produce a clear temporal pattern across the entire sample (Fig. 5C). The average linear weight (Fig. 5C, blue) peaked early, reaching 90% of maximum at 95 ms after stimulus onset, and then declined. Average nonlinear weight (Fig. 5C, red) grew more slowly, reaching 90% at 145 ms. The temporal offset between the linear and the nonlinear 90% points was significant (randomization test, P < 0.01). This difference was not dependent on a specific threshold; for example, the temporal difference between the linear and the nonlinear 80% points (85 and 115 ms, respectively) was also significant (randomization test, P < 0.01). In addition, the cumulative difference between the linear and the nonlinear curves over the 0–200-ms period was significant (2-sample Kolmogorov–Smirnov randomization test, P < 0.01). Model performance remained high throughout the response period as average explained variance increased with linear response growth and peaked with the nonlinear response peak (Supplementary Fig. S7).
To clarify the population trends, we classified neurons according to their nonlinearity index into groups exhibiting neural responses that were predominantly linear (n = 17), predominantly nonlinear (n = 20), and mixed linear/nonlinear (n = 25). (Although we defined 3 distinct neuron categories, our results indicate a continuous underlying distribution of response linearity across the population [Supplementary Fig. S1C], similar to the pattern observed in PIT [Brincat and Connor 2006].) These subpopulation temporal weighting functions also illustrated the transition from earlier linear responses to later nonlinear responses (Fig. 5C). In neurons exhibiting mixed linear/nonlinear responses, average linear weights peaked earlier than average nonlinear weights (85 and 145 ms, respectively, using the 90% threshold; randomization test, P < 0.05). (These mixed cell curves also differed significantly using the 80% threshold and in the cumulative difference test [randomization test, P < 0.05].) The response curves for primarily linear and primarily nonlinear cells had significantly different 80% and 90% threshold times (randomization test, P < 0.05), but their cumulative difference, which revealed a trend for delayed nonlinear responses, was not significant, possibly due to the smaller number of neurons involved in the comparison. Together, these results show that both within- and between-cell temporal differences contribute to the observed V4 population dynamics.
We also analyzed linear and nonlinear peak times of individual cells at the population level. We characterized response dynamics within neurons using a transition index (Fig. 6). Positive values signify transition from linear to nonlinear selectivity. The average transition value was 0.52 ± 0.18 (mean ± SE), which was significantly greater than 0 (1-sample 1-tailed t-test, P < 0.005), indicating that response transitions from linear to nonlinear selectivity were stronger and more common than the reverse across the population. (The existence of transitions in the opposite direction may reflect the inherent noisiness of the transition index due to random fluctuations in weighting function values through time [Fig. 5A].) Consistent with this, the distribution of nonlinear 90% peak times across individual neurons (Fig. 7, red) was significantly delayed relative to linear (blue) peak times (2-sample Kolmogorov–Smirnov test, P < 0.01). Thus, the transformation from linear orientation tuning to nonlinear curvature tuning was evident at the level of individual neurons.
The results described above depended on fitting Gaussian tuning models to the data. To ensure that our results were robust, we also characterized V4 response dynamics using more general analyses that do not rely on fitted tuning models (Fig. 8). These analyses corroborated the results based on Gaussian tuning models. We used 2-way ANOVA to estimate response modulation by the 2 component orientations in each stimulus. This ANOVA accounted for the most response variance (across the V4 population) at 115 ms after stimulus onset. We used vector strength as a measure of curvature tuning. Vector strength should be low during periods of component orientation tuning, which produces a dispersed pattern of responses to multiple curvature directions. Vector strength should be high only during periods of tuning for a restricted range of curvature directions. Vector strength peaked at 170 ms, substantially later than the peak time for orientation selectivity indicated by ANOVA. The 55-ms disparity between orientation and curvature tuning peaks is consistent with our analyses based on model fitting. Thus, the response dynamics we observed are robust and do not depend on a specific analysis.
Our analyses revealed a dynamic transformation of shape information in area V4 during the period immediately following stimulus onset. Early V4 responses, beginning around 50 ms after stimulus onset, were dominated by linear tuning for orientation, which peaked and began to decline around 100 ms. Nonlinear selectivity for orientation configurations emerged more gradually, peaking near 150 ms. This transformation was evident in the evolving response profiles of individual neurons as well as at the population level. The transition from early tuning for separate orientation components to later selectivity for orientation configurations was also supported by temporal analyses that did not depend on fitted tuning models. These results imply a dynamic process for synthesizing individual orientation signals into representations of larger, more complex contour fragments characterized by orientation change (curvature).
Transformation from orientation (a first-order derivative of contrast boundary position) to curvature (a second-order derivative) at the V4 level is well motivated in the context of object information processing (Connor et al. 2007). Extraction of local orientation in primary visual cortex (V1) is the first critical step in transforming retinal activation patterns into useful representations of objects (Connor et al. 2007). On the scale of V1 RFs (fractions of a degree), retinal images contain many smooth extended contrast boundaries formed by object edges and other natural object features. The local orientation/spatial frequency transformation in V1 is optimal for sparse representation of natural images on this scale because it captures this statistical regularity (Olshausen and Field 1996; Vinje and Gallant 2000). On the larger scale of V4 RFs (several degrees), contrast boundaries are less likely to be smooth and straight because object edges and other contour features undergo more orientation changes on this scale. Neural tuning in V4 reflects this larger-scale statistical structure, in that many V4 neurons are tuned for curvature (i.e., orientation changes along contours) (Pasupathy and Connor 1999, 2001). Contour curvature is a critical element of form perception to which human observers are extremely sensitive (Treisman and Gormican 1988; Wolfe et al. 1992; Chen and Levi 1996; Wilson et al. 1997; Habak et al. 2004; Ben-Shahar 2006; Gheorghiu and Kingdom 2007, 2008; Haushofer et al. 2008; Bell et al. 2009).
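To make the derivative language concrete: for a contour parameterized as (x(s), y(s)), local orientation is the angle of the tangent, which depends on first derivatives, and curvature is the rate of change of that angle along the contour, which involves second derivatives: κ = (x′y″ − y′x″)/(x′² + y′²)^(3/2). The sketch below is a minimal illustration, assuming a densely and uniformly sampled contour and central finite differences; the function name is hypothetical. It recovers the constant curvature 1/r of a circle and zero curvature (constant orientation) for a straight line.

```python
import math


def orientation_and_curvature(xs, ys, ds):
    """Central-difference estimates at interior samples of a contour.

    xs, ys: coordinates sampled at uniform parameter spacing ds.
    Returns lists of tangent orientation (radians) and signed curvature.
    """
    orientations, curvatures = [], []
    for i in range(1, len(xs) - 1):
        # First derivatives: direction of the tangent (local orientation).
        dx = (xs[i + 1] - xs[i - 1]) / (2 * ds)
        dy = (ys[i + 1] - ys[i - 1]) / (2 * ds)
        # Second derivatives: how the tangent direction changes (curvature).
        ddx = (xs[i + 1] - 2 * xs[i] + xs[i - 1]) / ds ** 2
        ddy = (ys[i + 1] - 2 * ys[i] + ys[i - 1]) / ds ** 2
        orientations.append(math.atan2(dy, dx))
        curvatures.append((dx * ddy - dy * ddx) / (dx ** 2 + dy ** 2) ** 1.5)
    return orientations, curvatures


# A counterclockwise circle of radius 2 has curvature 1/2 everywhere.
r, n = 2.0, 400
ds = 2 * math.pi / n
ts = [i * ds for i in range(n + 1)]
orient_circle, kappa_circle = orientation_and_curvature(
    [r * math.cos(t) for t in ts], [r * math.sin(t) for t in ts], ds
)
# A straight line has a single orientation and zero curvature.
orient_line, kappa_line = orientation_and_curvature(
    [t for t in ts], [0.5 * t for t in ts], ds
)
```

A straight edge thus carries only first-order (orientation) information, whereas a curved edge additionally carries the second-order signal that curvature-tuned V4 neurons are proposed to represent.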
A natural hypothesis is that local orientation signals first extracted in V1 are subsequently combined to generate V4 representations of larger contour regions encompassing multiple orientations. With a few exceptions (see below), the predominant pattern at the single neuron and population levels was gradual emergence of curvature tuning after initial selectivity for component orientations, implying a dynamic process for synthesizing curvature information. The 50-ms time course of this transition is consistent with recurrent network models in which initially linear summation of signals for simpler elements evolves toward nonlinear selectivity for specific element combinations (Salinas and Abbott 1996; Chance et al. 1999; Brincat and Connor 2006). A similar time course (~60 ms) has been reported for the evolution of pattern motion responses to composite multiorientation stimuli in the middle temporal area (MT) of the dorsal pathway (Pack and Born 2001; Pack et al. 2001; Smith et al. 2005). Curvature representation in the ventral pathway and pattern motion detection in the dorsal pathway both require some way of integrating local orientation signals (Zetzsche and Barth 1990; Simoncelli and Heeger 1998; Poirier and Wilson 2006; Rust et al. 2006; Hancock and Peirce 2008; Gheorghiu and Kingdom 2009; McGovern et al. 2011). Conceivably, these 2 pathways implement analogous recurrent network mechanisms for orientation integration. Network processes may effectively result in the multiplication of feedforward orientation signals, a specific type of nonlinearity essential to some models of curvature processing (Zetzsche and Barth 1990; Gheorghiu and Kingdom 2009).
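The difference between linear summation and multiplication of orientation signals can be illustrated with a toy model. This sketch assumes Gaussian tuning (period 180°) for each of the 2 component orientations of a curved stimulus. The tuning width, preferred orientations, and function names are hypothetical and are not the tuning models fitted in this study. Under summation, a stimulus containing only one preferred component still drives about half the maximal response; under multiplication, a strong response requires both components together, which is the kind of conjunction selectivity associated with curvature tuning.

```python
import math


def orientation_tuning(theta, preferred, width=20.0):
    """Gaussian tuning over orientation (period 180 deg), peak value 1."""
    # Wrap the orientation difference into [-90, 90) degrees.
    diff = (theta - preferred + 90.0) % 180.0 - 90.0
    return math.exp(-0.5 * (diff / width) ** 2)


def linear_response(theta1, theta2, pref1=45.0, pref2=135.0):
    # Additive model: each component orientation contributes independently.
    return orientation_tuning(theta1, pref1) + orientation_tuning(theta2, pref2)


def multiplicative_response(theta1, theta2, pref1=45.0, pref2=135.0):
    # Multiplicative model: strong response only for the preferred conjunction.
    return orientation_tuning(theta1, pref1) * orientation_tuning(theta2, pref2)


both = (45.0, 135.0)  # both components at their preferred orientations
one = (45.0, 45.0)    # only the first component at its preferred orientation

lin_ratio = linear_response(*one) / linear_response(*both)
mult_ratio = multiplicative_response(*one) / multiplicative_response(*both)
print(round(lin_ratio, 3), round(mult_ratio, 3))
```

In this toy model, the product is what makes the response specific to the orientation pair; the text's suggestion is that recurrent network dynamics may approximate such a product over time rather than compute it in a single feedforward step.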
Curvature tuning in a small subset of V4 neurons emerged immediately after stimulus onset (Fig. 4), possibly reflecting response selectivity based on a purely feedforward mechanism. Among the 20 neurons classified as predominantly nonlinear, only 3 had peak times before 95 ms, the average linear response peak time (Fig. 5C). Among the 25 neurons classified as mixed linear/nonlinear, only 3 had nonlinear response peak times before 95 ms. Thus, the vast majority of individual response patterns, as well as the population dynamics, were consistent with gradual emergence of curvature tuning following initial orientation selectivity. If these rare, early curvature-tuned responses were the basis for most curvature tuning in V4, curvature tuning should propagate rapidly to other V4 neurons. Instead, the 50-ms time course for the development of curvature tuning across most of the V4 population points to a time-consuming recurrent processing mechanism in which information about orientation conjunctions is gradually reinforced (Brincat and Connor 2006).
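The counts in this paragraph come from locating, for each neuron, the time at which its nonlinear tuning peaks and comparing that time with a reference latency (95 ms). A minimal sketch, assuming tuning strength has already been computed in discrete time bins; the bin centers, the 3 example time courses, and the function names are hypothetical placeholders, not recorded data.

```python
def peak_time(times_ms, strength):
    """Return the time bin at which a tuning-strength time course is maximal."""
    best = max(range(len(strength)), key=lambda i: strength[i])
    return times_ms[best]


def count_early(time_courses, times_ms, reference_ms=95.0):
    """Count neurons whose tuning peaks before the reference latency."""
    return sum(1 for s in time_courses if peak_time(times_ms, s) < reference_ms)


times = [50, 70, 90, 110, 130, 150, 170]  # hypothetical bin centers (ms)
# Hypothetical nonlinear-tuning time courses for 3 illustrative neurons.
early_neuron = [0.2, 0.9, 0.6, 0.4, 0.3, 0.2, 0.1]
late_neuron = [0.0, 0.1, 0.2, 0.5, 0.8, 1.0, 0.7]
later_neuron = [0.0, 0.0, 0.1, 0.3, 0.6, 0.7, 0.9]

n_early = count_early([early_neuron, late_neuron, later_neuron], times)

# Counts reported in the text: 3 of 20 nonlinear and 3 of 25 mixed neurons.
reported_fraction = (3 + 3) / (20 + 25)
print(n_early, round(reported_fraction, 3))
```

Pooled across the 2 classes, the reported counts correspond to 6 of 45 neurons (about 13%) with nonlinear peaks earlier than the average linear peak.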
Orientation information is still present during the late response period, especially among primarily linear neurons (Fig. 5A, bottom), and thus remains available for transmission to higher processing stages. Unambiguous information about curvature is carried by primarily nonlinear neurons (Fig. 5A, top). Mixed linear/nonlinear neurons carry both types of information during late-stage responses (Fig. 5C). This must reflect a network structure in which the relative degree of feedforward and recurrent input varies continuously across the neural population (Salinas and Abbott 1996; Chance et al. 1999; Brincat and Connor 2006). While the ultimate goal may be pure orientation and curvature signals, carried by primarily linear and nonlinear neurons, respectively, biological implementation of a multiplicative operation also produces mixed signals.
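One way to picture a population in which the balance of feedforward (linear) and recurrent (nonlinear) input varies continuously is to give each neuron a weight `alpha` between an additive and a multiplicative combination of the same 2 orientation-tuning signals. The convex mixture, tuning width, preferred orientations, and function names below are hypothetical illustrations, not the model fitted in this study. Intermediate weights yield mixed neurons that still respond partially to a single preferred component (orientation information) while responding most strongly to the preferred pair (curvature information).

```python
import math


def tuning(theta, preferred, width=20.0):
    """Gaussian orientation tuning (period 180 deg), peak value 1."""
    diff = (theta - preferred + 90.0) % 180.0 - 90.0
    return math.exp(-0.5 * (diff / width) ** 2)


def mixed_response(theta1, theta2, alpha, pref1=45.0, pref2=135.0):
    """alpha=0: purely additive (linear); alpha=1: purely multiplicative."""
    g1, g2 = tuning(theta1, pref1), tuning(theta2, pref2)
    # The additive term is halved so both terms equal 1 for the preferred pair.
    return (1.0 - alpha) * 0.5 * (g1 + g2) + alpha * g1 * g2


def single_component_ratio(alpha):
    """Response to one preferred component relative to the preferred pair."""
    return mixed_response(45.0, 45.0, alpha) / mixed_response(45.0, 135.0, alpha)


for alpha in (0.0, 0.5, 1.0):
    print(alpha, round(single_component_ratio(alpha), 3))
```

As `alpha` increases, the response to a single component falls steadily relative to the response to the preferred pair, so a continuum of weights produces a continuum from orientation-dominated to curvature-dominated signals.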
The processing stage subsequent to V4 in monkey ventral pathway is PIT. Many neurons in PIT represent configurations of multiple contour fragments (Brincat and Connor 2004). The dynamics of multifragment tuning in PIT (Brincat and Connor 2006) are analogous to the dynamics of V4 curvature tuning observed here. Linear tuning for individual contour fragments emerges early in PIT, peaking near 150 ms. Nonlinear selectivity for multifragment configurations evolves more slowly, peaking near 200 ms. As in V4, many individual PIT neurons transition from early, exclusively linear tuning to later, combined linear and nonlinear selectivity.
The overall temporal pattern in V4 and PIT suggests a sequential process for shape synthesis (Fig. 9) along the following lines. Signals for individual orientations emerge first (Fig. 9, V4 linear). These signals are gradually synthesized into representations of contour fragments comprising multiple orientations (Fig. 9, V4 nonlinear). Tuning for such contour fragments emerges in PIT with a similar time course (Fig. 9, PIT linear), based on V4 inputs. Further synthesis in PIT generates representations of multifragment shape configurations (Fig. 9, PIT nonlinear). We favor the hypothesis that curvature synthesis occurs in V4 and configural synthesis occurs in PIT. However, our results could additionally or alternatively reflect synthetic processes occurring in other regions that provide inputs to V4 and PIT. In particular, there is evidence for integration of orientation signals in V1 and V2 (Hubel and Wiesel 1965; Dobbins et al. 1987; Hegde and Van Essen 2004; Ito and Komatsu 2004; Anzai et al. 2007), although these integrative mechanisms do not appear to result in unambiguous tuning for specific curvatures.
The time course of shape signals in V4 and PIT measured here is compatible with neural and psychophysical analyses of how object identity information evolves through time. Coarse categorization (e.g., animals vs. cars vs. faces) appears to occur within 100–150 ms of stimulus onset. For example, human observers presented with 2 images can initiate a saccade to the image containing an animal within 120 ms (Kirchner and Thorpe 2006). Event-related potentials differentiating between animals and nonanimals can be observed at 150 ms following stimulus onset (Thorpe et al. 1996). Decoding analyses of human and monkey temporal lobe neural activity can distinguish basic categories near 100 ms post-onset (Hung et al. 2005; Liu et al. 2009). Our results show that during this 100- to 150-ms period, contour fragment signals in V4 and PIT are rising from 50% to 100% of maximum, and multifragment configuration signals are rising from 35% to 70%. This level of shape information might be sufficient to support coarse categorization.
Finer categorization or identification requires more time (Kourtzi and Huberle 2005; Scharnowski et al. 2007; Schendan and Kutas 2007; Fahrenfort et al. 2008; Hegde 2008; Martinovic et al. 2008; Akrami et al. 2009) and thus may depend on further evolution of complex shape constructs in PIT and subsequent ventral pathway stages after 150 ms. For example, discriminating particular classes of animals (dogs vs. nondogs and birds vs. nonbirds) requires 45–60 ms longer than discriminating animals from nonanimals (Mace et al. 2009). Discriminations based on part configurations (comparable to the multifragment constructs evolving in PIT) require 30–50 ms longer than discriminations based on shapes of individual parts (Wolfe and Bennett 1997; Arguin and Saumier 2000). Neural activity pattern analyses require ~50 ms more time to distinguish individual faces than to distinguish faces from nonfaces (Sugase et al. 1999; Liu et al. 2002). Thus, time-consuming synthesis and refinement of configural shape representations in temporal cortex may provide the essential information for discriminating individual objects within general categories. A reasonable working hypothesis is that early fragmentary representations support rapid categorization of ecologically important categories, while subsequent configural representations support discrimination of individual objects. The next step in testing this hypothesis would be to demonstrate precise temporal relationships between successive stages of shape representation and corresponding levels of object discriminability.
U.S. National Institutes of Health (R01EY11797 to C.E.C. and F31NS062511 to J.M.Y.).
We thank E. Carlson, C. Moses, S. Patterson, and H. Dong for technical support. We thank S. Kim for comments and suggestions. Conflict of Interest: None declared.