To understand sensory encoding and decoding, it is essential to characterize the dynamics of population responses in sensory cortical areas. Using voltage-sensitive dye imaging in awake, fixating monkeys, we obtained complete quantitative measurements of the spatiotemporal dynamics of V1 responses over the entire region activated by small, briefly presented stimuli. The responses exhibit several complex properties: they begin to rise approximately simultaneously over the entire active region, but reach their peak more rapidly at the center. At stimulus offset, however, the responses fall simultaneously and at the same rate at all locations. While response onset depends on stimulus contrast, both the peak spatial profile and the offset dynamics are independent of contrast. We show that these results are consistent with a simple population gain-control model that generalizes earlier single-neuron contrast gain-control models. This model provides insight into the mechanisms shaping population dynamics and is likely to be applicable to other brain areas.
Small visual stimuli elicit neural responses that are distributed over a large area in the primate primary visual cortex (V1; e.g., Hubel & Wiesel, 1974; Grinvald, Lieke, Frostig, & Hildesheim, 1994), suggesting that even small stimuli are encoded by a large population of neurons in V1. Furthermore, electrophysiological studies in behaving primates suggest that perception is mediated by a population of neurons rather than by single neurons (Parker & Newsome, 1998; Purushothaman & Bradley, 2005). Thus, to understand the encoding and decoding of visual stimuli in the cortex, it is important to characterize the properties of V1 population responses.
One approach is to estimate population responses from single neuron responses. Single unit recordings in V1 have revealed a number of fundamental properties that ought to contribute to the population responses. First, single neurons have receptive fields with a substantial spatial extent that increases rapidly as a function of retinal eccentricity (Hubel & Wiesel, 1974; Van Essen, Newsome, & Maunsell, 1984). Second, the response amplitude of single neurons increases nonlinearly with contrast, typically reaching response saturation at low to modest contrasts (Albrecht & Hamilton, 1982). Third, the tuning of single neurons is typically invariant with contrast, even in the saturated response range (Albrecht & Hamilton, 1982; Albrecht & Geisler, 1991; Heeger, 1991, 1992). Fourth, the latency of the response of single neurons decreases as a function of stimulus contrast (Dean & Tolhurst, 1986; Carandini & Heeger, 1994; Albrecht, 1995). Although these properties are common in V1 neurons, there is vast heterogeneity among these neurons, and thus it is unclear how the properties are combined and manifested at the population level. In addition, single-unit and multiple-unit studies in V1 have focused mainly on responses at or near the center of activity produced by the stimulus. Responses at locations more peripheral to the center of activity remain largely unknown.
Here we provide a complete quantitative description of the real-time spatiotemporal dynamics of V1 population responses to a small, briefly-presented (200 ms), localized stationary visual stimulus. Most measurements of response properties in V1 have been performed using drifting stimuli with relatively long durations (several seconds) to approximate a steady-state condition. However, natural saccadic inspection of a visual scene typically produces transient stimulation: 200–300 ms fixations separated by rapid eye movements. In addition, while it is common to analyze cortical responses by their peak responses and latencies (phases) for drifting stimuli, the falling edges of the responses can potentially provide useful information for briefly-presented stimuli (Bair, Cavanaugh, Smith, & Movshon, 2002). Thus, to fully understand the properties of the population responses under natural conditions, it is important to measure the complete time courses of responses to briefly-presented stimuli.
We used voltage-sensitive dye imaging (VSDI; Grinvald & Hildesheim, 2004) in alert, fixating monkeys to measure population responses in the superficial layers of macaque V1 over an area of approximately 1 cm². The imaged area covered the entire region activated by the small local stimulus. We found several unexpected properties that are not obvious from single unit responses: First, the spatial profile of the peak response is independent of stimulus contrast. Second, responses start to rise at all locations approximately at the same time, but rise at a faster rate at the center of activity than at peripheral locations. Third, both the latency and steepness of the rising edge of the response depend on stimulus contrast. Finally, after stimulus offset, the responses at all locations fall simultaneously and at the same rate, regardless of stimulus contrast. These complex properties illustrate the importance of quantitative characterization of population responses in both space and time.
Next, we considered whether there is a general mechanism that can account for these rich dynamics. To do this, we explored several well-known families of computational models. We find that the observed response properties are inconsistent with simple models having a fixed linear operation followed by a nonlinearity that operates within the individual neuron (e.g., a spike threshold or a refractory effect), and with models that use slow lateral connections to explain the difference in the response dynamics at different cortical locations. To account for the observed properties in both time and space, we propose a simple feedforward population gain-control (PGC) model that generalizes earlier normalization models for single V1 neurons (Albrecht & Geisler, 1991; Heeger, 1991, 1992; Carandini & Heeger, 1994; Carandini, Heeger, & Movshon, 1997). In this model, the temporal dynamics and the gain of the local responses are controlled by population activity in the network rather than by the nonlinear properties of individual neurons or synapses. We simulated the early visual pathway from the retina to V1 by a two-stage PGC model. The model’s dynamics closely resemble those in the data.
Responses in the retina and LGN also show evidence of nonlinear contrast gain-control (Shapley & Victor, 1978; Sclar, Maunsell, & Lennie, 1990; Kaplan & Benardete, 2001), and thus it is an open question whether a significant part of the nonlinearity in V1 responses is inherited from its input. To address this important question, we used the PGC model to predict how the relative contributions of nonlinearities within layers 2–3 in V1 versus its inputs affect the relationship between stimulus size and V1 response amplitude. The results from a VSDI experiment varying stimulus size were consistent with a PGC model in which most of the nonlinear processing occurs in the first stage, suggesting that the nonlinearities observed in the VSDI responses may be mostly implemented prior to the superficial layers in V1.
In summary, our results illustrate the value of quantitative analysis and computational modeling in testing hypotheses regarding the biophysical and anatomical factors underlying neural population activity. We characterized the spatiotemporal properties of the V1 population responses to small, briefly-presented stimuli that are relevant in natural vision, and found that the response dynamics are complex and could not have been anticipated from single-unit recordings. Importantly, we show that a simple PGC model can qualitatively account for the complex dynamics, suggesting that gain-control is likely to be a general mechanism contributing to neural dynamics in the brain.
We used VSDI to measure V1 population responses to a briefly presented stationary Gabor stimulus while a monkey was performing a fixation task. The goal was to characterize the spatiotemporal dynamics of the population responses over the entire area activated by a small Gabor stimulus. The spatiotemporal dynamics of the fine-scale columnar signals, which are modulated by the orientation of the stimulus, were not measured in the current study and will be addressed in future studies.
The peak responses were obtained by averaging the responses over a fixed 100 ms window (160 to 260 ms after stimulus onset; shaded region in Figure 2a). The spatial distributions of the peak responses at two contrasts are shown in Figures 1a and 1b. The distributions are well fitted by two-dimensional Gaussians (Figure 1c). The Gaussians are elongated because of the anisotropic mapping of visual space in V1 (Van Essen, et al., 1984; Blasdel & Campbell, 2001; Yang, Heeger, & Seidemann, 2007). Importantly, as shown in our previous studies (Chen, Geisler, & Seidemann, 2006, 2008), the widths of the Gaussian fits are not significantly different across contrasts for both the major axis ( σ = 2.1 mm ) and minor axis ( σ = 1.8 mm ) (one-way ANOVA, p > 0.1; Figures 1d and 1e). The widths of the spatial distributions are hence largely contrast-invariant.
We have previously shown that this spread is not significantly affected by the small variability in the monkey’s eye position (Chen, et al., 2006). The width of the stimulus (0.167°) maps to only ~0.5 mm on the cortex through the cortical magnification factor alone (3 mm/deg; obtained in the same animal from retinotopy measurements in another experiment (Palmer, Chen, & Seidemann, 2008)). The ~2 mm wide spatial profile is mainly due to the size and scatter of V1 receptive fields (Hubel & Wiesel, 1974; Dow, Snyder, Vautin, & Bauer, 1981; Van Essen, et al., 1984), which dictate the cortical point image (McIlwain, 1986). In addition, some of this widening could reflect significant lateral spread of activity through horizontal connections in V1 (Gilbert & Wiesel, 1979; Rockland & Lund, 1983; Martin & Whitteridge, 1984) and a significant contribution from feedback connections (Angelucci, Levitt, Walton, Hupe, Bullier, & Lund, 2002). One goal of the modeling component of the current study was to examine the possible contribution of these distinct mechanisms to response spread in V1.
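The arithmetic behind the ~0.5 mm figure, plus a back-of-the-envelope estimate of the point-image width implied by the observed spread, can be sketched as follows. The quadrature step assumes both the retinotopic stimulus image and the cortical point image are roughly Gaussian; that assumption and the resulting estimate are illustrative, not results from the paper.

```python
import math

# Values taken from the text: 0.167 deg stimulus, ~3 mm/deg magnification,
# ~2 mm observed profile width (sigma, major axis ~2.1 mm).
mag_mm_per_deg = 3.0
stim_width_deg = 0.167
stim_width_mm = stim_width_deg * mag_mm_per_deg   # ~0.5 mm on cortex

# If the stimulus image and the cortical point image are both roughly
# Gaussian, their widths combine in quadrature (variances add under
# convolution).  Point-image width needed to produce a ~2 mm profile:
sigma_observed_mm = 2.0
sigma_point_image = math.sqrt(sigma_observed_mm**2 - stim_width_mm**2)
# ~1.9 mm: the point image, not retinotopy, dominates the spread.
```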
Figure 1f shows the average peak response over a circular region of 0.5 mm radius at the center of the response profile (central region outlined in Figure 1c) as a function of stimulus contrast. As with single units, the responses follow a sigmoidal function on a log contrast axis; the solid curve is a Naka-Rushton function fitted to the data (r2 = 0.98).
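A fit of this kind can be sketched in a few lines. The Naka-Rushton form here, R(c) = Rmax·c^n / (c^n + c50^n), is the standard one; the contrast levels, response values, and parameter values are illustrative synthetic stand-ins for the data in Figure 1f.

```python
import numpy as np
from scipy.optimize import curve_fit

def naka_rushton(c, r_max, c50, n):
    """Naka-Rushton contrast-response function."""
    return r_max * c**n / (c**n + c50**n)

# Synthetic peak responses (dF/F-like values) at a few contrasts.
contrasts = np.array([0.05, 0.1, 0.25, 0.5, 1.0])
rng = np.random.default_rng(0)
responses = (naka_rushton(contrasts, 0.1, 0.15, 2.0)
             + rng.normal(0, 0.002, contrasts.shape))

# Recover the parameters by least squares.
params, _ = curve_fit(naka_rushton, contrasts, responses,
                      p0=[0.1, 0.2, 2.0], maxfev=10000)
r_max, c50, n = params
```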
To analyze the response properties at different locations in V1, the imaging pixels were divided into small bins according to their distances from the center of activity. The image was first divided into 0.5 mm wide concentric annular regions, centered at the peak of the spatial response, with the central region being a disc of 0.5 mm radius. The pixels in the central region had an average distance of 0.25 mm from the center, and the average distance increased by 0.5 mm in each annulus. Due to the anisotropic response profile, we considered only the pixels within a 1 mm wide strip along the major axis of the fitted Gaussian profile. Within this strip the relationship between distance and amplitude was nearly constant. The temporal responses of the pixels within each annulus that were also inside of the strip were averaged to produce a single time course for the corresponding distance. Figure 1c shows the bins up to an average distance of 2.75 mm; responses at greater distances were not analyzed because they were weak and noisy, especially at lower contrasts.
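The binning procedure above can be sketched as follows. The function names, the assumed inputs (pixel coordinates, fitted center, major-axis angle), and the grid in the usage below are illustrative; the paper's own bin geometry (0.5 mm bins, 1 mm strip, six bins out to 2.75 mm) is preserved.

```python
import numpy as np

def bin_by_distance(time_courses, xs, ys, center, axis_angle,
                    bin_width=0.5, strip_halfwidth=0.5, n_bins=6):
    """Average pixel time courses in 0.5 mm wide distance bins,
    keeping only pixels inside a 1 mm wide strip along the major axis.

    time_courses: (n_pixels, n_frames); xs, ys: pixel positions in mm.
    Bin 0 is the central disc of radius bin_width; each later bin is
    the intersection of an annulus with the strip.
    """
    dx, dy = xs - center[0], ys - center[1]
    # Distance across the major axis defines the strip.
    across = -dx * np.sin(axis_angle) + dy * np.cos(axis_angle)
    in_strip = np.abs(across) <= strip_halfwidth
    dist = np.hypot(dx, dy)
    binned = []
    for i in range(n_bins):
        sel = in_strip & (dist >= i * bin_width) & (dist < (i + 1) * bin_width)
        binned.append(time_courses[sel].mean(axis=0))
    return np.array(binned)          # (n_bins, n_frames)

# Example on a toy pixel grid where each pixel's "response" equals its
# distance from the center, so bin averages must increase with distance.
xx, yy = np.meshgrid(np.arange(-3, 3.01, 0.1), np.arange(-3, 3.01, 0.1))
xs, ys = xx.ravel(), yy.ravel()
tc = np.tile(np.hypot(xs, ys)[:, None], (1, 4))
binned = bin_by_distance(tc, xs, ys, (0.0, 0.0), 0.0)
```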
Figure 2a shows the average time courses of the responses at the center bin for different stimulus contrasts. To quantitatively characterize these time courses, they were first divided into two parts. The first part, defined as the rising edge, was the response in the first 210 ms after stimulus onset. The rest of the time course was defined as the falling edge. Each individual edge from each trial was smoothed by a five-frame moving average, normalized, and then fitted separately with a logistic function, 1/(1 + exp(−λ(t – t50))) for rising edges and 1/(1 + exp(λ(t – t50))) for falling edges (e.g., Figure 2b). The parameter λ describes the slope of the response, and t50 is the time that the response reaches half of its peak. For example, a λ of 0.05 means that the response takes about 44 ms after t50 to reach 90% of the peak. The same fitting procedure was applied independently at the different locations shown in Figure 1c for each stimulus contrast. We also define the rising edge latency as the time after stimulus onset ( t10 ) for the fitted response to reach 10% of its peak amplitude. Similarly, the latency of a falling edge is the time to decrease by 10% from the peak after stimulus offset. The duration of the rising and falling edges is at least 50 ms (5 frames; see Figure 2a), hence there are enough samples for fitting the logistic functions, which provide a good description of the response dynamics.
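A minimal version of this fitting step is sketched below on a synthetic normalized rising edge sampled every 10 ms. The frame rate, parameter values, and noise level are illustrative; the actual analysis also smoothed and normalized each trial first.

```python
import numpy as np
from scipy.optimize import curve_fit

def logistic(t, lam, t50):
    # Rising-edge form; the sign of lam's term flips for falling edges.
    return 1.0 / (1.0 + np.exp(-lam * (t - t50)))

# Sanity check of the example in the text: with lam = 0.05, the time
# from half peak to 90% of peak is ln(9)/lam ~ 44 ms.
t90_minus_t50 = np.log(9) / 0.05

# Fit a synthetic rising edge over the first 210 ms after onset.
t = np.arange(0, 210, 10.0)
rng = np.random.default_rng(1)
noisy = logistic(t, 0.05, 80.0) + rng.normal(0, 0.02, t.shape)
(lam, t50), _ = curve_fit(logistic, t, noisy, p0=[0.03, 100.0])

# Latency t10: time for the fitted response to reach 10% of peak.
t10 = t50 - np.log(9) / lam
```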
Figure 2c shows the latencies of the rising and falling edges in the center bin as a function of contrast. As observed in single neuron studies (Dean & Tolhurst, 1986; Carandini & Heeger, 1994; Albrecht, 1995) and Figure 2b, the latency of the rising edge decreases as stimulus contrast increases (one-way ANOVA, p < 0.01). On the other hand, there is no significant difference in the falling edge latencies for different contrasts (one-way ANOVA, p > 0.15). Asymmetric properties between rising and falling edges are also observed in their slopes (Figure 2d). The slope of the rising edge increases as contrast increases (one-way ANOVA, p < 0.01), while the slope of the falling edge remains approximately constant (one-way ANOVA, p > 0.15). In other words, the rising edge is accelerated in both latency and slope as contrast increases, while the falling edge is largely independent of contrast.
The space-time color plots in Figure 2e summarize in a compact form all the temporal responses as a function of stimulus contrast and position. The normalized fitted responses for each contrast are shown as separate subplots; within each subplot the time course of the response at each of the 6 location bins from Figure 1c is indicated by a horizontal row, progressing from the center location at the top to the most peripheral location at the bottom. For example, the upper horizontal row in the plot for 100% contrast corresponds to the dark blue curve in Figure 2b. Several qualitative observations can be made from these maps (they will be quantified later). For each contrast, (1) the response latencies at different locations are approximately equal, as can be seen by the vertically aligned transitions from blue to cyan in each map, and (2) the response rises at a slower rate as distance from the center increases, as can be seen by the increase in the tilt of the transition between the colors as the normalized amplitude increases. In addition, for each location, as contrast increases (3) response latency decreases, and (4) the response rises at a faster rate. Finally, (5) after stimulus offset, the falling edges are similar for all locations and contrasts. All these key properties were also observed in additional experiments in a second animal. We next examine these properties quantitatively.
The above observations can be quantitatively evaluated using the logistic fits obtained from individual trials. Figure 3a plots, for each stimulus contrast, the rising edge latency ( t10 ) as a function of distance from the center bin. For each contrast, there was no significant difference in the latencies at different locations (one-way ANOVA, p > 0.1 for all contrasts). However, as observed at the center bin, latency of the rising edge decreased as stimulus contrast increased at all locations (one-way ANOVA, p < 0.01 for all locations). These results confirm observations (1) and (3).
Figure 3b plots the same latency data in Figure 3a, but as a function of the peak response. For the same response amplitude, the latency can be different at different stimulus contrasts (e.g., see latencies at a ΔF / F = 0.065). This important result demonstrates that the dynamics of the response at a given location in V1 do not depend solely on the local response amplitude, but rather, depend on the response amplitudes over a larger region. We will revisit this key property when we discuss possible models of V1 activity.
The rate at which the response rises ( λ ) depends on both stimulus contrast and on location (Figure 3c). For a particular contrast, the slopes decrease significantly as distance from the center increases (one-way ANOVA, p < 0.01 for all contrasts), confirming observation (2). In addition, at a fixed location, the slope of the rising edge increases with contrast (one-way ANOVA, p < 0.01 for all locations), supporting observation (4). Furthermore, the slope also increases with peak response with a correlation coefficient of 0.94 (Figure 3d).
Due to the decreasing slope as a function of distance from the center, the time to half of the peak response ( t50 ) increased at locations peripheral to the center of activity. If t50 were employed as a measure of latency, a traveling wave of activity would appear to originate from the center (Figures 3e and 3f), as observed previously in anesthetized animals (Grinvald, et al., 1994; Jancke, Chavane, Naaman, & Grinvald, 2004; Benucci, Frazor, & Carandini, 2007). The average difference of the time to half peak between the locations 0.25 mm and 2.75 mm away from the center was 6.2 ms. This difference corresponds to a propagation speed of 0.4 mm/ms (for t50), which is at the higher end of the speed of propagation through lateral connections (0.1–0.4 mm/ms; Hirsch & Gilbert, 1991; Murakoshi, Guo, & Ichinose, 1993; Grinvald, et al., 1994; Nelson & Katz, 1995; Gonzalez-Burgos, Barrionuevo, & Lewis, 2000). As we shall see later, such differences in time to half of the peak can be explained by a feedforward population gain-control (PGC) model.
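The speed figure follows directly from the numbers above:

```python
# Apparent "propagation speed" implied by the t50 gradient: a 6.2 ms
# delay between the bins 0.25 mm and 2.75 mm from the center.
delay_ms = 6.2
distance_mm = 2.75 - 0.25
speed_mm_per_ms = distance_mm / delay_ms   # ~0.40 mm/ms
```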
The dynamics of the falling edges of the responses differed markedly from those of the rising edges. As shown in Figure 4, both the latency and slope were independent of contrast and location (two-way ANOVA, p > 0.15 across contrasts and locations for both latency and slope). The responses at all locations therefore fell approximately all at once and at the same rate, regardless of the stimulus contrast and response amplitude, supporting observation (5). The latency was slightly longer ( t10 = 65 ms) and the slope shallower ( λ = 0.026) than those of the rising edges. Such asymmetry in the temporal properties of the falling and rising edges can also be explained by the PGC model.
While the falling edge latency is longer than the rising edge for all contrasts, the reverse relationship has been observed in the firing rates of single units (Bair, et al., 2002). The apparent discrepancy is consistent with the fact that the VSDI measures membrane potentials (Grinvald & Hildesheim, 2004). Because of spike threshold, the onset of spiking activity will lag behind the rise in the VSDI response. On the other hand, for the falling edge the drop in spiking activity will coincide with the drop in the VSDI response (as long as membrane potential is above threshold). Thus, for spikes it is quite possible for the onset latency to be greater than the offset latency. Consistent with this possibility, we recently found the threshold for observing significant spiking activity in V1 to be about 30–40% of the maximal VSDI response (Palmer, et al., 2008).
Is there a simple functional model that can account for these complex spatiotemporal dynamics of V1 population responses? An obvious starting point is to consider previous models that have been proposed to account for the response properties of single neurons. Here we consider three families of such models, scaling them up for population responses by regarding each VSDI pixel as a “single unit” that has the average properties of the neural population falling under that pixel.
A well known model of single neuron responses is the LN model, which consists of two components: an initial linear weighting function or linear filter (L) followed by a static nonlinearity (N) such as spike threshold or response saturation due to refractory effects. This model is popular because it is relatively simple and easy to analyze. For example, when the model is valid, the receptive field estimated using spike-triggered averaging will equal the initial linear weighting function. Unfortunately, this simple class of models is inconsistent with several key properties of the spatiotemporal dynamics of V1 population responses. First, these models predict that the spatial profile of the responses should widen and change shape as stimulus contrast is increased. This is inconsistent with the observed spatial profiles (Figure 1e). Second, these models predict a longer latency for the falling edge of the response at high contrast, which is not observed in the VSDI responses (see Figure 4). Finally, in an LN model the dynamics of the response are tied to response amplitude, yet V1 responses show clear decoupling of amplitude and latency (Figure 3b). For simulations of the LN model’s predictions, see Supplementary Materials 1.
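The first failure can be made concrete with a minimal LN sketch: a Gaussian linear drive followed by a static saturating nonlinearity. All parameter values here are illustrative. Because each location saturates independently, the center flattens first and the half-width of the spatial profile grows with contrast, unlike the contrast-invariant profiles in Figure 1.

```python
import numpy as np

x = np.linspace(-5, 5, 201)            # cortical distance, mm

def ln_profile(contrast, sigma=1.0, semisat=0.3):
    """LN sketch: Gaussian linear drive, then a static saturating
    nonlinearity r = d / (d + semisat)."""
    drive = contrast * np.exp(-x**2 / (2 * sigma**2))
    return drive / (drive + semisat)

def half_width(profile):
    """Width (mm) at half of the profile's own peak."""
    above = x[profile >= profile.max() / 2]
    return above.max() - above.min()

w_low = half_width(ln_profile(0.1))
w_high = half_width(ln_profile(1.0))
# w_high > w_low: the LN profile widens with contrast.
```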
The time to half peak ( t50 ) of the rising edges in the VSDI responses increases with increasing distance from the center of the active region. A common view about such delay is that it results from propagation through slow lateral connections. If lateral spread was indeed the only source of the response beyond a critical distance from the center, then beyond this critical distance the response latency ( t10 ) should increase linearly as a function of distance, which should be evident because of their relatively slow propagation speed (Hirsch & Gilbert, 1991; Murakoshi, et al., 1993; Grinvald, et al., 1994; Nelson & Katz, 1995; Gonzalez-Burgos, et al., 2000). However, there was no significant difference in the rising edge latency of the VSDI responses for a wide range of distances (Figure 3a). In fact, for 50% and 100% contrast stimuli, rising edge latencies remained the same up to a distance of 3.25 mm (within which a reasonably good fit to the data could be obtained; data not shown). These results and additional simulations (see Supplementary Materials 2) suggest that within the examined range the observed dynamics are not consistent with a significant contribution of slow lateral connections to the observed response spread. Below we show that the simple feedforward PGC model can account for the observed dynamics.
Normalization gain-control models have been used to account for many nonlinear properties of single unit responses in the LGN and V1 (Albrecht & Geisler, 1991; Heeger, 1991, 1992; Carandini & Heeger, 1994; Carandini, et al., 1997, Mante, Bonin, & Carandini, 2008). In particular, this family of models can explain response saturation (Albrecht & Hamilton, 1982), contrast-invariant tuning (Sclar & Freeman, 1982; Skottun, Bradley, Sclar, Ohzawa, & Freeman, 1987; Albrecht & Geisler, 1991), and phase advance of response at high stimulus contrasts (Carandini, et al., 1997). As we have seen, these properties are also observed in the VSDI responses. Thus, normalization models appear to be more promising than the other two families of models. However, an important question is whether normalization models can account for the other properties observed in the VSDI data, especially the changes in the rising edge at different locations and the invariance of the slope and latency of the falling edge. In the next two sections, we show that a generalization of the normalization model can qualitatively account for all the spatiotemporal properties of the VSDI responses.
The PGC model is a generalization of earlier single-neuron normalization models (Albrecht & Geisler, 1991; Heeger, 1991, 1992; Carandini & Heeger, 1994; Carandini, et al., 1997, Mante, et al., 2008). In contrast to these early models, the PGC model aims at explaining the responses of the entire active neural population in V1 in both time and space.
In describing the PGC model, we focus on determining the predicted responses to simple Gabor patches like those used in the present experiments. To further simplify the discussion we describe a one-dimensional version of the model, which represents the collapsed data along the major axis (x axis) of the VSDI response profile (see the black rectangular region in Figure 1c). Each layer in the model therefore contains an array of units indexed by x, where each unit represents the average activity of the small neural population within a pixel in the VSDI image. Extension to a full two-dimensional model is straightforward but will not be discussed here.
The visual pathway is represented by a network consisting of an input layer and two stages in Figure 5a. Since VSDI responses in V1 are largely determined by the contrast of the Gabor stimulus, independent of its specific orientation and phase, the stimulus is represented in the input layer by its Gaussian contrast envelope, with magnitude directly proportional to the stimulus contrast. A fixed time delay is also added.
The first stage in the model represents the nonlinear processing that occurs in the retina, LGN, and layer 4 in V1. While it would be more realistic to model each of these areas individually, there are not enough experimental data at the population level to provide sufficient constraints. The second stage represents layers 2–3 in V1 where the VSDI signals are measured. The units within each stage are identical and implement the filtering and normalization circuit illustrated in Figure 5b.
At each stage of the model there is an initial step that represents receptive field summation and normalization pooling. The spatial receptive field of each unit x has a Gaussian weight profile G(x) centered at its location, and the spatial normalization pool has a Gaussian weight profile H(x) centered at the same location. The result of the receptive field summation step, A(x,t), then passes through a resistor-capacitor (RC) circuit, whose conductance is controlled by a normalization pool (Figure 5b; Carandini, et al., 1997; Mante, et al., 2008). As Carandini et al. point out, in such a model the conductance affects both the gain and the response latency. The voltage across the capacitor, V(x,t), which represents the membrane potential, is the response of the unit.
The key property of the model is that the conductance, g(x,t) > 0, of the resistor at each unit is not fixed but increases from a baseline value as a function of the weighted average over a local region of the input. For a static current, i.e., A(x,t) = A(x), the steady state of the voltage across the capacitor is V(x) = A(x) / g(x). In other words, the gain of the circuit is the inverse of conductance, and therefore the conductance has a divisive (normalizing) effect on the output of the receptive field summation. The input region that contributes to g(x,t) is called the normalization pool. The overall strength of normalization activity is controlled by a scale factor on the output of the normalization pool.
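The single-unit version of this circuit can be sketched in a few lines with Euler integration. The parameter values are illustrative, and for simplicity the conductance here is driven by the unit's own input rather than by a weighted spatial pool; the qualitative behavior (divisive gain, input-dependent time constant, common decay at offset) is the point.

```python
import numpy as np

def simulate_unit(drive, g0=0.02, k=0.05, dt=1.0, C=1.0):
    """RC unit with input-dependent conductance:
    C dV/dt = A(t) - g(t) V(t), with g(t) = g0 + k * A(t).
    Conductance divides the steady state (V_ss = A / g) and sets the
    time constant (tau = C / g): strong input means both a saturating
    amplitude and a faster rise.  At offset g returns to g0, so the
    decay is identical whatever the preceding amplitude."""
    V, out = 0.0, []
    for A in drive:
        g = g0 + k * A
        V += dt * (A - g * V) / C
        out.append(V)
    return np.array(out)

t_on, t_off, T = 50, 250, 400
responses = {}
for amp in (0.2, 1.0):                  # "low" vs "high" contrast drive
    drive = np.zeros(T)
    drive[t_on:t_off] = amp
    responses[amp] = simulate_unit(drive)

# Fraction of each response's own plateau reached 20 ms after onset:
frac = {a: r[t_on + 20] / r[t_off - 1] for a, r in responses.items()}
# frac[1.0] > frac[0.2]: the high-contrast response rises faster
# relative to its own peak, as in the VSDI rising edges.
```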
The response at each model pixel represents membrane potential, which dominates the VSDI responses. Since neurons communicate through spikes, the responses in the first stage must be converted into spikes that the second stage receives. In a recent study we found that the VSDI responses are related to spiking activities by a power function (Palmer, et al., 2008). A similar relationship has been found between average membrane potential and spike rate in single unit recordings (Anderson, Lampl, Gillespie, & Ferster, 2000; Finn, Priebe, & Ferster, 2007). A fixed power function is thus applied to the responses in the first stage and the results are fed into the second stage as inputs.
While feeding the stimulus represented in the input layer directly into the first stage provides a reasonably good fit to the data, the predictions shown here are for a model with a second fixed power function applied to the activity in the input layer. This initial nonlinearity is plausible given the accelerating point nonlinearities seen in the earliest levels of the visual system; e.g., the nonlinear relationship between the membrane potential of the photoreceptors and their rate of glutamate release (Witkovsky, Schmitz, Akopian, Krizaj, & Tranchina, 1997). The model’s general behavior is not affected by this nonlinearity.
One way to understand the dynamics of the model is through the time constant of the RC circuit of each unit (Figure 5c). Note that the time constant and the gain of the circuit are inversely related to conductance. When the normalization activity in a unit is high, the conductance is large and the time constant and gain are small. This property can account for much of the dynamics observed in the VSDI responses. At a particular unit, when the stimulus contrast is high, the receptive field summation and hence the normalization activity are large in all the neighboring units, resulting in faster dynamics (Figure 5c). This property is consistent with the observed dynamics in the rising edges of the VSDI responses. In addition, for a Gabor stimulus, the normalization activity is largest at the center, where the contrast is the highest (Figure 5c). The response at the center therefore rises at a faster rate than those at the periphery, which again is consistent with our observations. This is an interesting property because the spatial difference in gains can account for the traveling wave of the time to half peak observed in the rising edges, which is generally attributed to slow lateral connections.
When there is no input, conductance is at the baseline value in all the units; hence, the temporal dynamics are the same everywhere in the model (Figure 5c). Thus, after input offset, the responses at all the units start to decay at the same time and at the same rate, as observed in the falling edges of the VSDI responses. Finally, the divisive effect of conductance in the model also causes the steady state response to saturate when the input amplitude is large. The dynamic nonlinearity in the model therefore can account for many of the observed properties of the responses. Interestingly, there is some evidence that the model’s prediction of asymmetrical effect of contrast on the rising and falling edges of the response holds approximately for single neurons in the primary visual cortex of cat (see Supplementary Materials 3).
How does the width of the normalization pool weighting function affect the rising edge and the spatial profile of the response? Consider a fixed localized input and different Gaussian normalization weighting functions that have the same total weight. If the pool is wide, then the normalization activity will be similar for units near and far from the center of activity. Thus, the difference in the slopes of the rising edges across space will be small. On the other hand, if the normalization pool size is small, there will be a large difference in the time constants of different units. These considerations suggest that the observed difference between the time courses at the different locations could be explained by a feedforward PGC model with an appropriate pool size.
Normalization pool size also influences the spatial profile of the response. Consider a static Gaussian input and its corresponding steady state response, V(x) = A(x)/ g(x). If the width of the pool is much wider than the input, then the normalization activity and hence the conductance g(x) will be the same at all units. In this case, the spatial response profile will simply be a scaled version of the receptive field summation, which is a Gaussian. On the other hand, if the normalization pool is much smaller than the input, then response saturation will occur at a different stimulus contrast for each unit, as in the LN model, thus flattening the response profile at high contrasts (Supplementary Materials 1). As a result, to achieve the contrast-invariant spatial profile observed in the VSDI responses, the normalization pool size must be at least comparable to the size of the stimulus.
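This intuition can be checked directly from the steady-state relation V(x) = A(x)/g(x), with g(x) set by a Gaussian normalization pool H convolved with the input. The baseline conductance, pool strength, and widths below are illustrative values, not fitted parameters.

```python
import numpy as np

x = np.linspace(-6, 6, 241)            # cortical distance, mm

def gaussian(u, s):
    return np.exp(-u**2 / (2 * s**2))

def steady_profile(contrast, pool_sigma, g0=0.05, k=1.0, input_sigma=1.0):
    """Steady state V(x) = A(x) / g(x), with conductance driven by a
    normalized Gaussian normalization pool of width pool_sigma."""
    A = contrast * gaussian(x, input_sigma)
    H = gaussian(x, pool_sigma)
    H /= H.sum()
    g = g0 + k * np.convolve(A, H, mode="same")
    return A / g

def half_width(p):
    above = x[p >= p.max() / 2]
    return above.max() - above.min()

# Wide pool: g(x) is nearly flat over the profile, so the half-width
# barely changes with contrast (contrast-invariant shape).
wide_lo = half_width(steady_profile(0.1, pool_sigma=5.0))
wide_hi = half_width(steady_profile(1.0, pool_sigma=5.0))

# Narrow pool: each unit saturates locally, flattening and widening
# the profile at high contrast, as in the LN model.
narrow_lo = half_width(steady_profile(0.1, pool_sigma=0.3))
narrow_hi = half_width(steady_profile(1.0, pool_sigma=0.3))
```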
In sum, the normalization pool size affects both spatial and temporal properties of the responses. Based on these properties, it is possible to estimate the overall pool size from the data analytically (see Supplementary Materials 4).
VSDI responses were simulated with a network that consisted of an input layer and two subsequent stages (see Figure 5a). Although each stage has its own set of parameters, some of the key parameters were constrained by previous anatomical and physiological measurements reported in the literature (see Experimental Procedure). Figure 6a plots the spatial profiles of the peak responses in the model for different input contrasts. Consistent with the VSDI responses, the widths of the profiles are all the same. Note that profiles will only be contrast-invariant in the model for stimuli with sizes that are smaller than or comparable to the receptive fields of the V1 units; profiles for large stimuli will change shape and width as a function of contrast, due to saturation. The contrast response function of the model is plotted in Figure 6b, which provides a good fit to the data (r2 = 0.98).
Figure 6c shows the space-time plot of the predicted VSDI responses. The model captures qualitatively the observed spatiotemporal properties of the responses. For each contrast, (1) the rising edge latencies (t10) at different locations are similar, with a maximum difference of 2 ms, and (2) the slope of the rising edge becomes shallower as distance from the center increases. For each location, as contrast increases, (3) response latency decreases, and (4) the rising edge becomes steeper. Finally, for all contrasts and locations, (5) latencies and slopes of the falling edges are similar (< 3 ms difference).
In the mammalian visual system contrast gain-control (normalization) has been observed in the retina, LGN and visual cortex, with a spatial scale that progressively increases (see Introduction). Similarly, in the PGC model, normalization operates at two stages, with the sizes of the receptive fields and normalization pools in the second stage being twice those in the first (Sceniak, Chatterjee, & Callaway, 2006). An important question is whether our results provide evidence concerning the relative strength of the normalization in the different stages of visual processing. As it turns out, the stimuli used in our main experiment are not sufficient to discriminate between hypotheses concerning relative normalization strength.
However, by exploring the PGC model we found that the relative strength of normalization at the two stages has a large impact on the expected size tuning of V1 responses. Therefore, by varying the size of the stimulus it is possible to estimate the relative contributions of normalization in the first stage (retina to layer 4 of V1) and the second stage (superficial layers of V1). Figure 7a shows predicted response amplitude at the center of the activated region in the superficial layers of V1 as a function of stimulus size for a 100% contrast Gabor stimulus. Each curve in the figure is for a different strength of normalization in the first stage of the model relative to the total strength in both stages. When normalization only occurs in the first stage of the model (1.00), the response increases with stimulus size because the second stage is linear (i.e., no normalization). As the normalization in the second stage becomes stronger (the other curves), the relative response to the larger stimuli (e.g., σ = 1°) decreases, because normalization has a divisive effect on the input from the first stage.
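The qualitative dependence of size tuning on where the normalization sits can be sketched in steady state. In this illustration all parameter values are our own choices (widths in mm of model cortex, stage-2 widths twice those of stage 1, a squaring nonlinearity between stages, as in the model description); it shows that suppression of the second-stage center response at large stimulus sizes grows much stronger when the normalization acts in the second stage:

```python
import numpy as np

x = np.arange(-100, 101) * 0.1   # 20-mm strip of model cortex, 0.1 mm spacing

def stage(I, sigma_g, sigma_h, b, g0=1.0):
    """Steady state of one PGC stage: V = (I*G) / (g0 + b*(I*H)), with
    Gaussian receptive field G and normalization pool H (widths in mm)."""
    G = np.exp(-x**2 / (2 * sigma_g**2)); G /= G.sum()
    H = np.exp(-x**2 / (2 * sigma_h**2)); H /= H.sum()
    return np.convolve(I, G, "same") / (g0 + b * np.convolve(I, H, "same"))

def size_tuning(b1, b2, sizes=(0.25, 1.0, 4.0)):
    """Second-stage center response vs. stimulus size (Gaussian sigma, mm).
    A squaring nonlinearity links the two stages."""
    out = []
    for s in sizes:
        I = np.exp(-x**2 / (2 * s**2))
        v1 = stage(I, sigma_g=0.5, sigma_h=1.0, b=b1)
        v2 = stage(v1**2, sigma_g=1.0, sigma_h=2.0, b=b2)
        out.append(v2[len(x) // 2])
    return np.array(out)

first_only = size_tuning(b1=20.0, b2=0.0)     # all normalization in stage 1
second_only = size_tuning(b1=0.0, b2=20.0)    # all normalization in stage 2
suppression = lambda v: 1.0 - v[-1] / v.max() # response loss at largest size
```

Moving the normalization from the first to the second stage deepens the relative suppression of the largest stimulus, which is the signature the size-varying experiment exploits.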
In an additional experiment we measured the VSDI responses to 100% contrast Gabor stimuli with σ = 0.167° and 1°. The red dots in Figure 7a plot the relative responses to the two stimuli. The peak response to the large stimulus is about 7% less than to the small stimulus, which is consistent with a strong normalization in the first stage of the model. Similar results were observed in additional experiments in another animal. This surprising result suggests that nonlinearities observed in the data may be mostly implemented before the superficial layers of V1 where the VSDI signals are measured. This pair of normalization strengths was used to obtain the simulation results shown in Figure 6.
Figure 7b plots the predicted size tuning curves at the centers of the two model stages for stimuli at 5% and 100% contrasts, using the same parameters as in Figure 6. Because normalization in the second stage is relatively weak and because the receptive fields in the second stage are larger, the tuning curves for both contrasts peak at larger sizes than those in the first stage. In each stage, the peak of the tuning curve for low contrast occurs at a larger size than that for high contrast, consistent with the observations in single units in the LGN (Bonin, et al., 2005) and V1 (Sceniak, Ringach, Hawken, & Shapley, 1999) and with previous models of normalization at the level of single neurons (Sceniak, Hawken, & Shapley, 2001; Cavanaugh, Bair, & Movshon, 2002; Bonin, Mante, & Carandini, 2005).
VSDI in fixating monkeys was used to characterize the spatiotemporal dynamics of V1 population responses evoked by a small, briefly-presented, visual stimulus. The VSDI signals are particularly informative because they capture responses over the entire active region in V1. The population responses exhibited systematic and unexpected nonlinear properties. At different locations, they started to rise approximately all at once, with the response at the center of the active region rising at a faster rate than those that were further away. Stimulus contrast also affected the response latencies and slopes. In contrast to the relatively complex dynamics at stimulus onset, the responses following stimulus offset fell together at approximately the same time and rate, regardless of stimulus contrast and spatial location. We also found that the spatial profile of the peak response was constant and independent of contrast.
The rich spatiotemporal dynamics observed in the responses place strong constraints on models of V1. Models that rely solely on a single-unit nonlinearity or slow lateral propagation in V1 are inconsistent with the observed properties of the VSDI responses. Instead, we find that a simple canonical normalization-based PGC model can qualitatively account for such dynamics.
We also used the PGC model to examine the degree to which nonlinearities in V1 responses are inherited from its inputs. Contrast gain-control has been used to explain many nonlinear properties of single unit responses in the retina and LGN that are also observed in V1, such as phase advance of response at high contrast (Shapley & Victor, 1978; Victor, 1987), contrast saturation (Bonin, et al., 2005), and size tuning (Bonin, et al., 2005). It is therefore possible that gain-control before V1 contributes significantly to the response nonlinearities in V1. On the other hand, there is some evidence that the P-cells, which provide about 80% of the input to V1, are fairly linear (Derrington & Lennie, 1984; but see Levitt, Schumer, Sherman, Spear, & Movshon, 2001). The PGC model predicts how the responses to a large stimulus depend on the nonlinearity in V1 and its input. Results from an additional VSDI experiment that varied stimulus size suggest that most of the gain-control for localized stimuli is implemented before the superficial layers of V1 (i.e. in the retina, LGN, and/or layer 4 in V1).
It is perhaps not surprising that contrast gain-control is implemented at various stages along the visual pathway, given its crucial role in preserving tuning characteristics of V1 neurons (except contrast tuning), while allowing high sensitivity to contrast. Potential advantages of implementing a large component of the contrast gain-control before V1 are that it could then help preserve tuning in the retina and LGN (as well as in cortex), and that it could be implemented with a spatial pooling that might involve relatively fewer connections than required in V1.
Population gain-control is a simple and effective mechanism that can maintain the sensitivity and tuning of neurons; hence it is quite possible that it operates in most, if not all, sensory cortical areas. If so, then the population dynamics reported here in V1 may be observed in many other cortical areas, and the corresponding pathways might be simulated by a cascade of PGC stages.
When the responses of a large population of neurons are pooled, the result can behave quite differently from the individual neurons that contribute to it. This idea is illustrated nicely in the size tuning behavior in our model. When all of the normalization occurs in the first stage of the model, then the units in this stage have strong size tuning. However, when these units are pooled linearly to produce the response of a unit in the second stage, this unit has much weaker size tuning (Figure 7a). The reason is that as the size of the stimulus increases, some of the units in the first stage that provide input to this unit decrease their responses due to surround suppression, while others increase their responses, because the stimulus now enters the center of their receptive field. The net effect of increasing the stimulus size is therefore much weaker in the second-stage unit than in the individual units in the first stage that provide input to it (Figure 7b). This is one factor that may explain why the VSDI responses (which measure the summed activity of a large population of V1 neurons) are only slightly lower for a large stimulus than for a small stimulus, while strong size tuning has been observed for single neurons in V1 (Sceniak, et al., 1999, 2001; Cavanaugh, et al., 2002; Levitt & Lund, 2002).
A second factor is the heterogeneity in the tuning properties of the neurons within the population. For example, the size tuning of different neurons varies greatly (Cavanaugh, et al., 2002). There was also a large range of suppression: some neurons were suppressed to spontaneous firing rate as stimulus size increased, while others were not suppressed at all. Overall, more than half of the neurons were suppressed by less than 40% of their peak responses. When the responses from these units are pooled together, the combined tuning curve will in general be shallower than the individual curves in the population.
The above discussion illustrates how unexpected properties can emerge at the level of neural population responses (for a more general discussion see Seidemann, Chen, & Geisler, 2009). In many cases it will be difficult or even impossible to predict the population responses based on a small sample of single-unit measurements. Our results on size tuning demonstrate this difficulty and emphasize the importance of direct measurements of population responses.
A central idea of our model is that the gain is controlled through division. A key question is therefore: How is the division achieved in a neuron? It is possible that division is implemented by a combination of different biophysical mechanisms (Kayser, Priebe, & Miller, 2001; Carandini, 2004) at different scales. At the level of individual neurons, local nonlinearities such as synaptic depression (Abbott, Varela, Sen, & Nelson, 1997; Tsodyks & Markram, 1997) have a divisive effect on the presynaptic activity, but these mechanisms are unlikely to account for the long-range effects that we observe. Connections with inhibitory interneurons could deliver the normalization signals at the population level.
Noise-reduction in the membrane potential could also contribute to gain-control by reducing the likelihood of crossing spike threshold (Finn, et al., 2007); however, noise reduction may itself be the result of some form of gain-control (in many systems lowering gain lowers noise). Further investigations will be required to understand the relationship between contrast gain-control and membrane-potential noise.
Another key question is: Where do the signals that control the gain come from? In the feedforward implementation, which is illustrated in Figure 5a, the gain of the individual neuron is computed at the same level as its input, and is provided to the neuron at the same time as (or before) the excitation. Alternatively, in a feedback implementation that has been previously proposed (Heeger, 1992; Carandini, et al., 1997), the gain is computed from the output of the neuron and its neighbors. In this case the gain computation can occur either at the same level, or potentially even in a subsequent stage that then sends fast feedback. While a feedforward circuit appears to be the simplest and most parsimonious implementation of the gain-control, a mechanism that involves very rapid feedback, potentially through a specialized subset of the neurons with fast dynamics, cannot be ruled out. Additional experiments are needed to address this important question.
The PGC model assumes that conductance changes instantaneously with the input. While an instantaneous change is not biophysically realistic, there is evidence suggesting that conductance changes occur within milliseconds (Albrecht, Geisler, Frazor, & Crane, 2002). In addition, simulations of a modified model in which the conductance changed with a time constant of 10 ms showed no qualitative difference in the responses. The basic instantaneous model thus provides a reasonable approximation to a more realistic model.
Consistent with previous VSDI studies (Grinvald, et al., 1994; Jancke, Chavane, Naaman, & Grinvald, 2004; Benucci, Frazor, & Carandini, 2007), our results show that if the latency of the response is estimated from the time to half peak (t50) or from the response phase (using spectral analysis), then a traveling wave would appear to originate from the center and propagate towards the periphery at a moderate speed of ~0.4 mm/ms (see Figure 2e and Figure 3e). Although the accepted hypothesis for such spatiotemporal dynamics is that they reflect propagation of responses through slow lateral connections in V1, our results suggest that lateral connections are not necessarily the major cause for such dynamics. In fact, the lateral connections hypothesis predicts a spatial increase of response onset latency (t10) that is not observed in our data. Our results are therefore inconsistent with a major contribution of slow lateral propagation to the observed dynamics in V1 in responses to small stimuli (see Supplementary Materials 2 for a quantitative analysis of lateral spread). Instead, our measurements suggest that these dynamics are the result of changes in the slope of the rising response, which could be explained as a gain-control effect.
Importantly, we show that a feedforward PGC model, in which the responses reach all locations in V1 at the same time, can account for the spatial changes in t50 or phase of VSDI responses shown here. In Supplementary Materials 5 we also show that the model can account for the previously reported results of Benucci, et al. (2007). Note that while the PGC model can explain these results, it is currently specific to the stimuli used in our experiments; altering the stimulus properties can potentially change the dynamics of the responses. Thus, it remains to be seen if our simple feedforward model can predict the response dynamics for a wider range of stimulus conditions.
To understand the processing of arbitrary visual stimuli in the cortex, it is important to characterize the properties of V1 population responses and evaluate models that can account for them. As an initial step, we used VSDI in fixating monkeys to fully characterize the spatiotemporal dynamics of the population responses in the superficial layers of V1 evoked by a small, briefly-presented, visual stimulus. The population responses exhibited systematic and unexpected nonlinear properties that are not obvious from single unit results. We also showed that models with static nonlinearities in the final stage and models with slow lateral propagation of responses in V1 are inconsistent with the observed properties of the VSDI responses. Instead, a simple canonical population gain-control model was found to qualitatively account for such dynamics. The consistency of our data with population gain-control, and the advantages of such a mechanism for simultaneously providing tuning invariance and high sensitivity to weak signals, suggest that population gain-control is likely to operate in most, if not all, sensory cortical areas.
The results reported here are based on methods that have been described in detail previously (Seidemann, Arieli, Grinvald, & Slovin, 2002; Chen, et al., 2006, 2008). Here we focus on details that are of specific relevance to the current study. All procedures have been approved by the University of Texas Institutional Animal Care and Use Committee and conform to NIH standards.
A monkey was trained to maintain fixation while a small oriented stationary Gabor stimulus was presented on a uniform gray background. Each trial began when the monkey fixated on a small spot of light (0.1° × 0.1°) on a video display. Following an initial fixation, the Gabor stimulus was presented for 200 ms at 2.2° eccentricity, with σ of 0.167° and spatial frequency of 2.5 cycles per degree. Throughout the trial, the monkey was required to maintain gaze within a small window (< 2° full width) around the fixation point in order to obtain a reward. Early fixation breaks invalidated the trials, which were not included in the analysis. Each block of trials contained eight to twelve different contrasts, from 0% (blank) to 100%, presented pseudorandomly, and ten valid trials were run for each condition.
In a separate set of experiments, the width (σ) of the Gabor stimulus was either 0.167° or 1° in each trial. The contrast of the stimulus was always 100%, and it was presented for 100 ms. The other parameters of the stimulus were the same as in the experiment described above.
Imaging data were collected at 100 Hz at a resolution of 512×512 pixels. The size of each pixel was 37×37 µm². Our basic analysis is divided into four steps: (i) normalize the responses at each pixel by the average fluorescence at that pixel across all trials and frames, (ii) remove from each pixel a linear trend estimated on the basis of the response in the 100-ms interval before stimulus onset for each trial, (iii) remove trials with aberrant VSDI responses (generally less than 1% of the trials; see Chen, et al., 2008), (iv) subtract the response to the blank condition from the stimulus-present conditions.
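The four basic steps can be sketched as follows. This is our own simplified illustration (variable names are ours, normalization is done per trial rather than across all trials, and step (iii), which screens aberrant trials across trials, is only indicated):

```python
import numpy as np

def preprocess_trial(frames, n_base, blank_mean):
    """Sketch of the basic analysis for a single trial. `frames` is a
    (time, y, x) array of raw fluorescence, `n_base` is the number of
    frames before stimulus onset, and `blank_mean` is the mean
    preprocessed response to the blank condition."""
    # (i) normalize each pixel by its average fluorescence
    df = frames / frames.mean(axis=0)
    # (ii) remove a per-pixel linear trend fit to the pre-stimulus interval
    t = np.arange(frames.shape[0], dtype=float)
    flat = df[:n_base].reshape(n_base, -1)
    slope, intercept = np.polyfit(t[:n_base], flat, 1)  # per-pixel fit
    trend = np.outer(t, slope) + intercept
    df = df - trend.reshape(df.shape)
    # (iii) aberrant-trial screening operates across trials (not shown)
    # (iv) subtract the response to the blank condition
    return df - blank_mean
```

After these steps, the pre-stimulus baseline is near zero and the stimulus-evoked modulation is expressed as a fractional fluorescence change.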
After the basic analysis described above, the spatial properties of the responses in individual trials were determined. First, the center of the spatial response of each experiment was estimated by fitting a 2D Gaussian to the average response taken over a time window of 160–260 ms after stimulus onset (shaded region in Figure 2a), for all stimulus contrasts (25% to 100%). This center was then held fixed while the average response over the same time window was fitted with a 2D Gaussian to determine the lengths of the major and minor axes and the orientation of the major axis for each trial of the experiment.
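The fitting step can be sketched with a least-squares fit of an oriented 2D Gaussian. This is a hypothetical re-implementation with our own parameterization, on synthetic data standing in for the measured response map:

```python
import numpy as np
from scipy.optimize import curve_fit

def gauss2d(coords, amp, sx, sy, theta, x0=0.0, y0=0.0):
    """Oriented 2D Gaussian with axis standard deviations sx, sy and
    axis orientation theta; x0, y0 default to the pre-estimated center,
    which is held fixed during the fit (as described in the text)."""
    x, y = coords
    xr = (x - x0) * np.cos(theta) + (y - y0) * np.sin(theta)
    yr = -(x - x0) * np.sin(theta) + (y - y0) * np.cos(theta)
    return amp * np.exp(-(xr**2 / (2 * sx**2) + yr**2 / (2 * sy**2)))

# synthetic "average response" map with a known elongated Gaussian
y, x = np.mgrid[0:64, 0:64] - 32.0
coords = (x.ravel(), y.ravel())
rng = np.random.default_rng(0)
data = gauss2d(coords, 1.0, 8.0, 4.0, 0.5) + rng.normal(0.0, 0.01, 64 * 64)

# fit amplitude, axis lengths, and orientation with the center held fixed
popt, _ = curve_fit(gauss2d, coords, data, p0=[0.5, 6.0, 3.0, 0.2])
amp, sx, sy, theta = popt
```

Because `p0` supplies only four starting values, `curve_fit` leaves the center arguments at their defaults, which is one simple way to hold the center fixed.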
To include more trials at each contrast level in the analysis, we combined responses of five experiments from one monkey. Due to the slight difference in the setup of each experiment, the spatial responses could be translated and rotated with respect to each other. The center and average orientation of the 2D Gaussian fit of each experiment were used to transform the data so that the spatial responses aligned and overlapped in all the experiments. Data from individual experiments are similar to the combined data but noisier.
In the linear step of the model, each unit in a stage computes the weighted sum of the input I(x,t) by cross-correlation with a Gaussian spatial receptive field:

A(x,t) = I(x,t) ⊗ G(x)

where G(x) = exp(−x²/2σG²), and ⊗ denotes cross-correlation evaluated at x. Note that if the input is a Gaussian in space, then the weighted sum across the population will also be a Gaussian. The summation then passes through an RC circuit to produce the response. The response of the unit at x can be described by the following RC circuit equation (see Figure 5b):

C ∂V(x,t)/∂t = A(x,t) − g(x,t)V(x,t)
where C is the fixed capacitance, A(x,t) is the receptive field summation activity, and g(x,t) is the conductance of the resistor for the stage. The conductance at each unit increases with the normalization pool activity B(x,t) and is defined as

g(x,t) = g0 + B(x,t)

where g0 is the fixed baseline conductance. For each unit the normalization activity is given by B(x,t) = b · [I(x,t) ⊗ H(x)], where b is a scaling factor that represents the strength of normalization and H(x) is the Gaussian weighting function defining the normalization pool (see Figure 5b). All the parameters are the same for the units in the same stage, but they can differ between stages.
The detailed dynamics of the model are analyzed in Supplementary Materials 6.
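To make these dynamics concrete, here is a minimal forward-Euler simulation of a single PGC stage on a one-dimensional strip. It is our own sketch: the parameter values are illustrative rather than the fitted ones, and the input-layer delay and second stage are omitted:

```python
import numpy as np

def pgc_stage(I, dt, sigma_g, sigma_h, C, b, g0=1.0):
    """One PGC stage on a 1-D strip. I has shape (time, space); each unit
    obeys C dV/dt = A - g V, with A = I*G (receptive-field summation) and
    g = g0 + b*(I*H) (normalization-pool conductance). Forward Euler."""
    n_t, n_x = I.shape
    x = np.arange(n_x) - n_x // 2
    G = np.exp(-x**2 / (2 * sigma_g**2)); G /= G.sum()
    H = np.exp(-x**2 / (2 * sigma_h**2)); H /= H.sum()
    V = np.zeros((n_t, n_x))
    for i in range(1, n_t):
        A = np.convolve(I[i], G, mode="same")
        g = g0 + b * np.convolve(I[i], H, mode="same")
        V[i] = V[i - 1] + dt / C * (A - g * V[i - 1])
    return V

dt, n_t, n_x = 0.5, 800, 200           # ms per step, steps, units (0.1 mm each)
t = np.arange(n_t) * dt
I = np.zeros((n_t, n_x))
gabor_like = np.exp(-(np.arange(n_x) - 100.0)**2 / (2 * 10.0**2))
I[(t >= 50) & (t < 250)] = gabor_like  # 200-ms pulse of a localized input

V = pgc_stage(I, dt, sigma_g=10.0, sigma_h=30.0, C=30.0, b=20.0)
center, flank = V[:, 100], V[:, 130]   # at the center and 3 mm away
```

Because the conductance, and hence the membrane time constant C/g, varies across space only while the stimulus is on, the center and flank begin rising almost simultaneously, the flank rises more slowly, and after offset g returns to g0 everywhere so both locations decay at the identical rate.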
Since the response in each model stage represents average membrane potential, responses in the first stage are converted into spikes by a power function before being sent to the second stage. In other words, the input I2(x,t) that the second stage receives from the first stage is:

I2(x,t) = [V1(x,t)]^n

where V1(x,t) is the response in the first stage and n is the exponent of the power function. The same function is also applied to the activity in the input layer before feeding into the first stage. (Note that applying a power exponent to a Gaussian profile changes the width, but leaves the shape Gaussian.)
The values of the parameters for the two stages in the model were estimated by fitting the responses in the model V1 to the VSDI responses. To reduce the number of free parameters, we assumed g0 = 1 for both stages because it is effectively a scaling factor of the response and the capacitance. The constant delay in the input layer was chosen to be 20 ms, which was a few milliseconds shorter than the shortest latency seen in the data. The exponent n of the power function that converts membrane potential into spikes was chosen to be 2, which is similar to what we found experimentally (Palmer, et al., 2008) and provides a good fit to the data. The same exponent is used in the power function in the input layer. Based on the literature suggesting that the widths of the center and surround in the afferents of V1 are about half of those in V1 (Sceniak, et al., 2006), we also assumed σG,2 = 2σG,1 and σH,2 = 2σH,1. By assuming the width of the VSDI spatial profile to be the result of cascaded receptive field summations and the power functions, we estimated the value of σG,1. Using the difference in the rising edge slopes at different locations, we estimated the normalization pool size in the second stage, σH,2, by the procedure discussed in Supplementary Materials 4.
The remaining free parameters that needed to be estimated were C1, C2, b1, and b2. We first fitted the center’s normalized contrast response function to the data by minimizing the sum of the squared error. This step enabled b1 and b2 to be determined separately from C1 and C2, because the capacitances do not affect the steady state response in the model. After that, the normalization strengths, b1 and b2, were held fixed, while the capacitances were estimated by fitting the slopes of the rising and falling edges at different locations and stimulus contrasts simultaneously. The obtained parameters were σG,1 = 0.983 mm, σH,1 = 1.386 mm, C1 = 3.19, C2 = 2.30, b1 = 1521, and b2 = 2. The model was simulated for a 20 mm long strip (extending the black rectangular region in Figure 1c) using the Matlab function ode45().
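The first fitting step can be illustrated by collapsing the steady-state contrast response at the center to scalars. In this sketch the spatial pooling is absorbed into the constants, so only the shape of the curve (not its scale) is meaningful; we borrow the fitted normalization strengths b1 and b2 purely for illustration:

```python
import numpy as np

def center_crf(c, b1=1521.0, b2=2.0, n=2, g0=1.0):
    """Scalar sketch of the two-stage steady-state contrast response at
    the center: each stage computes V = A / (g0 + b*A), and a power
    function with exponent n links the layers."""
    drive1 = c**n                        # input layer's spike nonlinearity
    v1 = drive1 / (g0 + b1 * drive1)     # stage-1 steady state, V = A/g
    drive2 = v1**n                       # membrane potential -> spikes
    return drive2 / (g0 + b2 * drive2)   # stage-2 steady state

contrasts = np.array([0.05, 0.1, 0.25, 0.5, 1.0])
response = center_crf(contrasts)
```

With these strengths the response is monotone in contrast but deeply saturated at the high end, which is why the normalization strengths can be pinned down from the contrast response function before the capacitances are fit.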
We thank W. Bosking, C. Michelson, C. Palmer, and Z. Yang for assistance with experiments and for discussions, and T. Cakic for technical support. This work was supported by National Eye Institute Grants EY-016454 and EY-016752 to E. Seidemann and EY-02688 to W. S. Geisler and by a Sloan Foundation Fellowship to E. Seidemann.