How do neuronal populations represent concurrent stimuli? We measured population responses in cat primary visual cortex (V1) using electrode arrays. Population responses to two superimposed gratings were weighted sums of the individual grating responses. The weights depended strongly on the relative contrasts of the components. When the contrasts were similar the population performed an approximately equal summation. When the contrasts differed markedly, however, the population performed approximately a winner-take-all competition. Stimuli that were intermediate to these extremes elicited intermediate responses. This entire range of behaviors was explained by a single model of contrast normalization. Normalization captured both the spike responses and the local field potential responses; it even predicted visually evoked currents source-localized to V1 in human subjects. Normalization has profound effects on V1 population responses, and is likely to shape the interpretation of these responses by higher cortical areas.
Even the simplest sensory stimulus or planned movement causes a large pool of neurons to be active. The code for both sensory perception and motor output, therefore, is thought to lie in the collective activation profile of neuronal populations (Georgopoulos et al., 1982; McIlwain, 1986; Nicolelis et al., 1995; Pouget et al., 2000). This profile has been measured mostly when a population is faced with an individual sensory stimulus or a single motor output. In such cases, the profile of population activity is typically bell-shaped, with the strength of each neuron’s response depending on the match between the neuron’s preferences and the sensory signal or the planned movement (Chen et al., 2006; Georgopoulos et al., 1986; Purushothaman and Bradley, 2005).
In nature, however, a sensory system is often confronted with the conjoint presence of multiple stimuli. Likewise, a motor system needs to represent a combination of multiple movements. In these circumstances the population responses will not simply have a single peak centered on a particular stimulus or movement, but rather a combination of multiple peaks (MacEvoy et al., 2009; Pasupathy and Connor, 2002; Treue et al., 2000). Understanding the rules of this combination is fundamental to our comprehension of population coding.
To investigate the representation of multiple stimuli in a neuronal population, we measured responses in primary visual cortex (V1) to sums of two component gratings. To control the strength of sensory stimulation, we varied grating contrast. To control the identity of the stimulated neurons, we varied grating orientation. Having control of both quantities provided a space that is ideal to test models of population coding.
We first investigated the impact of stimulus contrast, and found that population responses are contrast-invariant: contrast scales the population response profiles without changing their shape. By comparing response properties of populations to those of the underlying single neurons we explain how this invariance relates to the invariance of tuning curves seen in single units (Finn et al., 2007; Sclar and Freeman, 1982).
We then asked how population responses in V1 represent two superimposed component gratings (plaids), and we found that these responses could be well approximated by a weighted sum, in which the weights applied to the component gratings depend on contrast. The weights given to the component gratings of the plaid were always smaller than the sum of the weights to the individual components, consistent with the well-known phenomenon of cross-orientation suppression (DeAngelis et al., 1992; Morrone et al., 1982). We also found a profound effect of relative contrast: a gradual transition between two regimes. When the component contrasts are similar the population gives sizeable weights to both components (equal summation). When the contrasts are dissimilar, however, the responses overwhelmingly favor the component with higher contrast (winner-take-all competition).
We were able to capture all these phenomena with a simple model based on contrast normalization. This model involves a division between a numerator that sums contributions of individual components and a denominator that grows with overall contrast. The model has been previously applied to individual neurons that are optimally tuned to one of the component gratings (Carandini et al., 1997; Freeman et al., 2002; Heeger, 1991; Heeger, 1992; Heuer and Britten, 2002). Here we extend it to entire populations and show that it can accurately describe their widely different regimes of operation.
To demonstrate the relevance of these findings to visual perception, we investigate them not only in the spike responses of anesthetized cats but also in visually evoked currents source-localized to V1 of human subjects. Moreover, we show with simple simulations how normalization in area V1 can have profound effects on the interpretation of V1 signals by higher cortical areas, and thus shape perceptual judgments.
To measure the responses of a large population of neurons in area V1 we recorded from a 10×10-electrode array implanted in anesthetized cats (Figure 1A). The array covered an area of 16 mm², so it included regions with a diversity of orientation preferences. As a result, spike responses measured at individual sites exhibited tuning curves whose preferences covered the range of orientations fairly uniformly (Figure 1A).
As expected, a stimulus of a given orientation evokes across the population a response whose profile peaks at the neurons that prefer the stimulus orientation (Figure 1B–C). This profile could be well fit by a simple circular Gaussian function Gϕ(θ) centered on stimulus orientation ϕ and varying with the preferred orientation of the neurons θ (Figure 1C).
We asked how these population responses are affected by stimulus contrast, and we found them to be invariant: changing contrast affected their profile in amplitude but not in width (Figure 1D–E). To test for invariance, we fitted the responses to an oriented stimulus (ϕ) by a separable model, the product of the Gaussian function of preferred orientation θ and a function of stimulus contrast c:

$$\mathbf{R}(c) = f(c)\,\mathbf{G}_{\phi} \tag{1}$$

$$f(c) = r_{\max}\,\frac{c^{n}}{c_{50}^{n} + c^{n}} \tag{2}$$
where the parameters rmax, c50 and n determine the overall responsiveness, the semisaturation contrast, and the exponent of an accelerating nonlinearity related to spike threshold. The separable model provided excellent fits to the population responses. It explained 98.8% of the variance in the example data set (Figure 1D–E) and an average of 98.5% of the variance in all 9 data sets. These results confirm earlier indications obtained by intrinsic imaging (Carandini and Sengpiel, 2004): population responses to individual orientations are contrast invariant.
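As a minimal sketch of the separable model, the snippet below multiplies a Gaussian of the wrapped orientation difference by a hyperbolic-ratio contrast function. The wrapped-difference form of the circular Gaussian, the function names, and the parameter values (σ = 22 deg, c50 = 22%, n = 1.0, loosely taken from the population values reported in this section) are assumptions made for illustration, not the fitting code used in the study.

```python
import math

def circular_gaussian(theta_deg, phi_deg, sigma_deg):
    """Gaussian of the orientation difference, wrapped to [-90, 90) deg (assumed form)."""
    d = (theta_deg - phi_deg + 90.0) % 180.0 - 90.0
    return math.exp(-d * d / (2.0 * sigma_deg ** 2))

def hyperbolic_ratio(c, r_max, c50, n):
    """Contrast response f(c) = r_max * c^n / (c50^n + c^n)."""
    return r_max * c ** n / (c50 ** n + c ** n)

def population_response(prefs_deg, phi_deg, c,
                        r_max=1.0, c50=22.0, n=1.0, sigma_deg=22.0):
    """Separable model: contrast function times orientation profile."""
    f = hyperbolic_ratio(c, r_max, c50, n)
    return [f * circular_gaussian(t, phi_deg, sigma_deg) for t in prefs_deg]

# Preferred orientations in 15-deg bins (illustrative binning).
prefs = [15.0 * k for k in range(12)]
low = population_response(prefs, 90.0, 12.0)   # 12% contrast grating at 90 deg
high = population_response(prefs, 90.0, 50.0)  # 50% contrast grating at 90 deg

# Contrast invariance: the two profiles differ only by a scale factor.
scale = max(high) / max(low)
max_shape_diff = max(abs(h - scale * l) for h, l in zip(high, low))
print(scale, max_shape_diff)
```

By construction the rescaled low-contrast profile coincides with the high-contrast one, which is what contrast invariance means for this model: contrast enters only through the scalar f(c).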
A factor contributing to this invariance of population responses is surely the well-known invariance of tuning curves in individual neurons. Such tuning curves are scaled by stimulus contrast without changes in shape (Finn et al., 2007; Sclar and Freeman, 1982). The invariance seen in single neurons, however, is not sufficient to explain the invariance of population responses. For example, if neurons that are more sharply tuned responded only to higher contrasts, then increasing contrast would narrow the profile of population responses.
The invariance of population responses, indeed, makes a strong prediction about the relationship between selectivity and sensitivity across individual neurons. The prediction is that orientation tuning width and contrast sensitivity should be distributed independently across neurons. Increasing contrast, then, would not preferentially engage neurons that are more or less sharply tuned, thus leaving the width of the population profile unchanged. This intuitive argument is formalized mathematically in the Appendix.
We tested this prediction, and indeed found no systematic relationship between tuning width and contrast sensitivity across individual neurons (Figure 2). We measured orientation tuning curves (Figure 2A) and contrast responses (Figure 2B) in 75 well-isolated V1 neurons individually recorded in a separate set of experiments. We quantified contrast sensitivity of each neuron by the two parameters of the hyperbolic ratio: semisaturation contrast c50, and exponent n. We then asked whether these parameters are independent of the neuron’s tuning width (half-width at half height). We assessed statistical independence by calculating quartiles of the marginal distributions and testing for uniformity of the joint distribution. We saw no departure from independence between tuning width and semisaturation contrast c50 (Figure 2C, χ² = 4.57, df = 7, p = 0.71). Similarly, there was no departure from independence between tuning width and exponent n (not shown, χ² = 7.56, df = 7, p = 0.37).
This analysis gives us the opportunity to compare the properties of individual neurons to those of the population, and reveals that the two can be quite different. The mean tuning width of the individual neurons was 23 ± 14 deg (mean ± SD, N = 75, Figure 2A), comparable to the 22 ± 2 deg measured in the population (N = 9 experiments, Figure 1D). The semisaturation contrast was also similar: 20 ± 11% for individual neurons (median ± MAD, Figure 2B) and 22 ± 9% for the population (Figure 1E). The exponent, however, was much lower in the population than in most of the individual neurons: 2.2 ± 1.0 (median ± MAD, Figure 2B) for individual neurons but only 1.0 ± 0.1 for the population. The contrast responses of the population (Figure 1E) are therefore much shallower than those of most individual neurons.
These similarities and differences between neurons and population are simply explained. We built predicted population responses by averaging the fitted responses of the individual neurons after aligning their preferred orientation (Figure 2A,B, black curves). The predicted population tuning profile has a width of 20 deg, intermediate to that of individual neurons and similar to that of the actual population. The predicted population contrast response, instead, is much shallower than that of the average individual neuron: it has a similar semisaturation contrast (20%) but a lower exponent (1.4). These results echo the measurements made in the actual population. The shallow response derives from the fact that individual neurons have a broad range of semisaturation contrasts (Figure 2B).
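The flattening of the pooled contrast response can be illustrated with a small simulation. The sketch below averages hyperbolic-ratio curves that share the same exponent (n = 2.2, the single-neuron median quoted above) but have different semisaturation contrasts. The particular c50 values and the "dynamic range" measure (ratio of the contrasts at which a curve reaches 90% and 10% of its value at 100% contrast) are hypothetical choices made for illustration, not quantities from the study.

```python
def hyperbolic_ratio(c, c50, n):
    """Normalized contrast response c^n / (c50^n + c^n)."""
    return c ** n / (c50 ** n + c ** n)

def population_curve(c, c50_values, n):
    """Average of single-neuron curves that differ only in c50."""
    return sum(hyperbolic_ratio(c, c50, n) for c50 in c50_values) / len(c50_values)

def contrast_reaching(curve, fraction, lo=1e-3, hi=100.0):
    """Bisection for the contrast where a monotonic curve reaches fraction * curve(100)."""
    target = fraction * curve(100.0)
    for _ in range(100):
        mid = 0.5 * (lo + hi)
        if curve(mid) < target:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def dynamic_range(curve):
    """Ratio of contrasts giving 90% and 10% of the response at 100% contrast."""
    return contrast_reaching(curve, 0.9) / contrast_reaching(curve, 0.1)

N_EXP = 2.2                            # single-neuron median exponent
C50S = [5.0, 10.0, 20.0, 40.0, 80.0]   # hypothetical spread of c50 values (%)

def single(c):
    return hyperbolic_ratio(c, 20.0, N_EXP)

def pooled(c):
    return population_curve(c, C50S, N_EXP)

single_range = dynamic_range(single)
pooled_range = dynamic_range(pooled)
print(single_range, pooled_range)
```

The pooled curve spans a wider range of contrasts than any single steep curve, because neurons with low c50 dominate at low contrast and neurons with high c50 keep the average rising at high contrast. A hyperbolic ratio fitted to such an average would therefore need a lower exponent.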
We then asked how a population represents more than one stimulus, and measured responses to plaids (Figure 3A). Plaids were obtained by superimposing two component gratings with different orientations; the component contrasts were independently varied.
The population responses to these plaids depended strongly on the relative contrasts of the components (Figure 3B). If the component contrasts were similar (e.g., 6% and 12%), or identical (e.g., both 12%) the population response to the plaid exhibited two peaks, one at each of the component orientations. However, if the component contrasts differed considerably (e.g., 50% and 12%), the response to the component of higher contrast (50% vertical) dominated the population response to the plaid (Figure 3B, lower right). In these conditions, it is as if the 12% contrast horizontal grating had almost disappeared. And yet, when this stimulus was presented alone it elicited a large response (Figure 3B, lower left).
To characterize these responses we described them with a weighted-sum model (Figure 3C). In this model, the response to the plaid with component orientations ϕ1 and ϕ2 and contrasts c1 and c2 is given by a linear combination of the responses to the component gratings:

$$\mathbf{R}_{12}(c_1, c_2) = w_1\,\mathbf{R}_1(c_1) + w_2\,\mathbf{R}_2(c_2) \tag{3}$$

where the responses to the individual components R1 (c1) and R2 (c2) are given by Equation (1) with Gaussians centered on ϕ1 and ϕ2, and the scaling factors w1, w2 depend on the combination of contrasts. Here and elsewhere, we express population responses (quantities in bold letters) as vectors: they are functions of stimulus orientation. These vectors are in turn affected by stimulus contrast.
Since the weighted-sum model provided good fits, its best-fitting weights w1, w2 can be used to describe the rules of combination. Across contrast combinations, adding the two component gratings resulted in weights < 1, a signature of cross-orientation suppression. Furthermore, the weights greatly depended on relative component contrasts. Weights were sizeable for both components if the two contrasts were similar (w1 = 0.99, w2 = 0.24) or identical (w1 = w2 = 0.77), resulting in two clear peaks in the activity profile (Figure 3C, left and middle). However, if the component contrasts differed considerably, the weight assigned to the lower-contrast component was close to zero (w1 = 0.002, w2 = 0.92), resulting in a single peak (Figure 3C, right).
To understand these values of the weights, it helps to consider two extreme scenarios: equal weights and winner-take-all weights (Figure 3D,E). In the first scenario, the population applies equal weights to both component responses: w1 (c1, c2) = w2 (c1, c2). This scenario entails the presence of two peaks in all population responses to plaids, leading to reasonable or excellent fits if component contrasts are similar or identical (Figure 3D, left and middle), but unacceptable fits if component contrasts differ considerably (Figure 3D, right). In the second scenario, instead, the population response completely disregards the grating of lower contrast, w1 = 0 if c1 < c2 and vice versa. Assuming winner-take-all weights leads to excellent predictions if component contrasts differ considerably (Figure 3E, right), but not if they are similar or equal (Figure 3E left and middle). Indeed, winner-take-all weights can only predict a single peak in the activity profile. The sets of weights predicted by the two scenarios, therefore, have a complementary pattern of successes and failures.
To extend these qualitative findings we explored population responses obtained with a broad range of contrast combinations and with different plaid angles (Figure 4A–C). As in the previous examples, these population responses were closely described by the weighted-sum model (Figure 4A, solid lines). We searched for the weights w1 and w2 that yielded the best fits and obtained a pair of weights for each contrast combination (Figure 4B). With these best-fitting parameters, the weighted-sum model made excellent predictions (mean fit quality index q = 94.4 ± 0.5% SE, N = 9, Figure 4C).
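For a single contrast combination, finding the best-fitting weights is an ordinary least-squares problem with two unknowns. The sketch below solves it through the 2×2 normal equations on synthetic profiles. The helper names, the Gaussian width, and the synthetic weights (0.9 and 0.1) are assumptions for illustration, and the snippet does not reproduce the fit quality index q used in the text.

```python
import math

def gaussian_profile(prefs_deg, phi_deg, sigma_deg=20.0):
    """Population profile centered on phi_deg (orientation wrapped to 180 deg)."""
    out = []
    for t in prefs_deg:
        d = (t - phi_deg + 90.0) % 180.0 - 90.0
        out.append(math.exp(-d * d / (2.0 * sigma_deg ** 2)))
    return out

def fit_weights(plaid, r1, r2):
    """Least-squares w1, w2 minimizing |plaid - w1*r1 - w2*r2|^2 (normal equations)."""
    a11 = sum(x * x for x in r1)
    a22 = sum(y * y for y in r2)
    a12 = sum(x * y for x, y in zip(r1, r2))
    b1 = sum(p * x for p, x in zip(plaid, r1))
    b2 = sum(p * y for p, y in zip(plaid, r2))
    det = a11 * a22 - a12 * a12
    if abs(det) < 1e-12:
        raise ValueError("component responses are collinear; weights are not identifiable")
    w1 = (b1 * a22 - b2 * a12) / det
    w2 = (b2 * a11 - b1 * a12) / det
    return w1, w2

prefs = [15.0 * k for k in range(12)]     # illustrative 15-deg bins
r1 = gaussian_profile(prefs, 0.0)         # response to component 1 alone
r2 = gaussian_profile(prefs, 90.0)        # response to component 2 alone
# Synthetic plaid response built with known (hypothetical) weights.
plaid = [0.9 * x + 0.1 * y for x, y in zip(r1, r2)]

w1, w2 = fit_weights(plaid, r1, r2)
print(w1, w2)
```

The collinearity check matters in practice: when the two component orientations are close, the two profiles overlap heavily and the individual weights become poorly constrained.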
The best-fitting weights w1 and w2 displayed a marked dependence on component contrast (Figure 4B). First, the weights for plaids were consistently below unity, indicating that the response to a grating forming part of a plaid is smaller than the response to that grating presented alone (cross-orientation suppression). Second, with components of similar contrast both weights are considerable (Figure 4B, diagonal). Third, when component contrasts differ substantially, the weight given to the component of lower contrast is much weaker (and sometimes negligible) than the weight given to the component of higher contrast (Figure 4B, off the diagonal).
These data therefore reveal a range of behaviors extending from a regime of equal summation (at work when component contrasts are similar) to a regime of winner-take-all competition (at work when contrasts are very dissimilar). In intermediate contrast conditions the weights are intermediate between these extremes. To delineate the contrast conditions in which these two regimes operate, we return to the two extreme scenarios mentioned earlier: equal weights and winner-take-all weights (Figure 4D–G).
Assuming equal weights (w1 = w2, Figure 4D) leads to excellent predictions when component contrasts are similar (q = 93.8 ± 0.8% SE, Figure 4E, diagonal region), but results in poor predictions elsewhere (q = 81.7 ± 1.1%). Assuming winner-take-all weights (w1 = 0 if c1 < c2, Figure 4F) yields the opposite pattern of successes and failures (Figure 4G): poor fits when contrasts are similar (q = 72.8 ± 1.2%), and good predictions elsewhere (q = 91.9 ± 0.4%).
The weighted-sum model describes accurately how the population integrates two superimposed inputs (Figure 4A–C), but it does not capture explicitly the effects of component contrast. For each combination of component contrasts c1 and c2 the model requires two free parameters, the weights w1 (c1, c2) and w2 (c1, c2). If a new combination of contrasts c1 and c2 were to be tested, the model would make no prediction for the relevant weights.
What is needed, therefore, is a model that embodies the weighted-sum model while explicitly representing the role of contrast. Such a model should predict approximately equal weights when component contrasts are similar and approximately winner-take-all weights when component contrasts are dissimilar.
A promising candidate is the model of contrast normalization that has been developed for individual neurons. Normalization involves a ratio: the numerator sums the contributions of the different stimuli, weighted nonlinearly by component contrast, and the denominator scales these contributions based on overall contrast (Albrecht and Geisler, 1991; Carandini et al., 1997; Heeger, 1991; Heeger, 1992; Kouh and Poggio, 2008). Normalization explains cross-orientation suppression for individual neurons, the ones that are selective for one of the two orientations in the stimulus (Carandini et al., 1997; Freeman et al., 2002). Can normalization predict the responses of the whole population, and can it predict the gradual transition between the regimes of equal summation and winner-take-all competition?
To apply the normalization model to our population responses, we express it as

$$\mathbf{R}_{12}(c_1, c_2) = r_{\max}\,\frac{c_1^{n}\,\mathbf{G}_1 + c_2^{n}\,\mathbf{G}_2}{c_{50}^{n} + c_{\mathrm{rms}}^{n}} \tag{4}$$

where rmax, c50, and n are constants, and G1, G2 are the usual Gaussians centered on the component orientations ϕ1 and ϕ2. The term $c_{\mathrm{rms}} = \sqrt{c_1^2 + c_2^2}$ is the root-mean-square contrast of the stimulus. In the model, therefore, the component contrasts c1 and c2 appear both in the numerator and in the denominator. If one of these contrasts is zero, i.e. if the stimulus is a single grating, the model reduces to Equations (1) and (2), and hence incorporates contrast invariance. If instead both contrasts are positive, the model provides a closed-form prediction for how the two components of a plaid should combine to yield a single population response.
The normalization model has in principle all the desired properties: it has an explicit role for the component contrasts and it embodies the weighted-sum model by predicting a gradual transition of behaviors from equal summation to winner-take-all competition. As shown in the Appendix, we can derive the weights predicted by the normalization model explicitly in two key cases: when the two component contrasts are similar and when they are very dissimilar. In the first case, the model approximates the scenario of equal weights, and in the second case it approximates the scenario of winner-take-all weights.
The gradual transition between equal summation and winner-take-all competition arises rather intuitively from the combination of operations in the normalization model (Equation (4)). Equal summation is a natural consequence of the sum in the numerator. Winner-take-all competition, in turn, stems from the scaling of the summation terms by contrast: with an exponent > 1, this scaling introduces imbalances in the effects of the two gratings when contrasts differ. Across all contrast conditions, the denominator scales the whole response, causing the effective weights to be < 1 (cross-orientation suppression). This effect is of course most striking in the regime of winner-take-all competition, where the denominator reduces the weight applied to the weaker response to almost zero.
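To make this intuition concrete, the sketch below assumes the normalization model takes the form R12 = rmax (c1^n G1 + c2^n G2) / (c50^n + crms^n), with crms = sqrt(c1² + c2²), and that the response to a single grating is rmax c^n G / (c50^n + c^n). Under these assumptions, dividing each plaid term by the corresponding single-grating response gives an effective weight wj = (c50^n + cj^n) / (c50^n + crms^n). This is one reading of the model written for illustration, not the derivation given in the Appendix. The default parameters are the median fitted values reported in this section (n = 1.5, c50 = 13.1%), and the function names are hypothetical.

```python
def effective_weights(c1, c2, c50=13.1, n=1.5):
    """Weights on the single-grating responses implied by the assumed normalization form."""
    c_rms = (c1 ** 2 + c2 ** 2) ** 0.5
    denom = c50 ** n + c_rms ** n
    w1 = (c50 ** n + c1 ** n) / denom
    w2 = (c50 ** n + c2 ** n) / denom
    return w1, w2

def peak_ratio(c1, c2, n=1.5):
    """Height of the component-1 peak relative to the component-2 peak in the plaid
    response, ignoring overlap between the two Gaussians (shared denominator cancels)."""
    return (c1 / c2) ** n

# Contrast pairs (%) mirroring the examples discussed in the text.
examples = [(12.0, 12.0), (6.0, 12.0), (12.0, 50.0)]
results = {}
for c1, c2 in examples:
    results[(c1, c2)] = (effective_weights(c1, c2), peak_ratio(c1, c2))
    print(c1, c2, results[(c1, c2)])
```

Two features of the sketch match the description above. Both weights fall below 1 whenever both contrasts are positive, because crms exceeds either component contrast. And when contrasts differ, the exponent amplifies the imbalance between the two numerator terms, so the lower-contrast peak shrinks faster than the contrast ratio itself.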
We tested the normalization model on the population responses and found that it provided good fits in both regimes (Figure 5). For the example experiment (Figure 5A) the model captures qualitatively all the fundamental behaviors of the population. Fits to other data sets, including those where the component gratings had different angles, were of comparable or higher quality (Figure S1). The good performance of the model is confirmed by an analysis of effective weights and fit quality across experiments (Figure 5B–C). The model captured the regime of equal summation seen when component contrasts are similar, predicting similar weights for the two component responses (Figure 5B, diagonal) and yielding good fits (q = 91.5 ± 1.0%, Figure 5C). The model also captured the regime of winner-take-all competition, predicting progressively smaller weights for gratings whose contrast decreases relative to the other grating (Figure 5B, off-diagonal) and again yielding good fits (q = 91.7 ± 0.6% SE, Figure 5C). Median parameters of the normalization model were n = 1.5 ± 0.13, c50 = 13.1 ± 4.2%, and σ = 19.0 ± 2.1 deg.
Importantly, in the normalization model the gradual transition from one regime to the other is explicitly controlled by the component contrasts, without requiring a change in model parameters. The normalization model, indeed, performs almost as well as the weighted-sum model with optimal weights. Yet it requires only five parameters: one for overall responsiveness, rmax, two for contrast responses (c50 and n), and two for the circular Gaussian G (width and offset). The weighted-sum model, by comparison, requires the same five parameters plus two weights w1 (c1, c2) and w2 (c1, c2) for each combination of positive grating contrasts c1 and c2. For our data sets that makes 5+16 = 21 free parameters, more than four times the 5 free parameters in the normalization model.
The physiological mechanisms underlying response normalization in individual V1 neurons are at the moment unclear and may rest on a combination of factors (Carandini et al., 1997; Carandini et al., 2002; Chance et al., 2002; Finn et al., 2007; Freeman et al., 2002; Priebe and Ferster, 2006). It is therefore of interest to know to what degree normalization is present in the subthreshold responses as opposed to spiking responses.
To measure subthreshold activity in populations of neurons we analyzed local field potential (LFP) responses measured with the same 10×10 electrode array used to measure the spike responses. The LFP reflects the combined subthreshold activity of the neurons surrounding the electrode (Katzner et al., 2009). To obtain the highest signal/noise ratios, we pooled these LFPs across all responsive sites of the array and across all animals (N = 4). To distinguish the responses to the two component gratings (here termed test and mask), we employed a frequency tagging method typically used in EEG research (Candy et al., 2001; Morrone and Burr, 1986; Regan, 1989): we made the test and mask contrast-reverse at different frequencies. These frequencies effectively act as a tag in the LFP responses, which oscillate at twice the frequency of reversal (Katzner et al., 2009).
This design provides an opportunity to test the same normalization model that we have applied to spike responses. By pooling across sites we can no longer study population responses as a function of preferred orientation. By tagging the two stimuli by frequency, though, we can distinguish the responses to test and mask. The predicted response Rj therefore depends on the stimulus tag, j = 1 for the test and j = 2 for the mask:

$$R_j(c_1, c_2) = r_{\max}\,\frac{c_j^{n}}{c_{50}^{n} + c_{\mathrm{rms}}^{n}} \tag{5}$$

where $c_{\mathrm{rms}}$ is the root-mean-square contrast of the stimulus, as in Equation (4).
Because adding a superimposed grating affects only the denominator, cross-orientation suppression provides a frank reduction in tagged responses.
The LFP signals agreed closely with these predictions (Figure 6A,B). First, the LFPs evoked by the test grew and eventually saturated as the test contrast increased (Figure 6A, open symbols). The normalization model closely predicted this saturation (Figure 6A, curves). Second, these LFPs were reduced by adding a 25% mask grating (Figure 6A, closed symbols). The normalization model closely matched this effect, correctly predicting a rightward shift of the contrast response function (Figure 6A, curves). Third, the suppressive interactions between test and mask were mutual: increasing test contrast reduced the LFPs evoked by the mask (Figure 6B, closed symbols). The model did the same (Figure 6B, curves).
Because the predictions of the normalization model were accurate (98.2% explained variance), the model’s parameters can be used to compare subthreshold responses to spike responses. We found that subthreshold population responses measured from LFPs saturated at lower contrasts (c50 = 13%, n = 1.5) than those measured earlier from population spike responses (c50 = 22%, n = 1.0). A similar difference between contrast responses of subthreshold potentials and spikes has been seen in individual neurons and may be a simple consequence of spike threshold (Finn et al., 2007).
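The three qualitative predictions for frequency-tagged responses can be checked with a short sketch. It assumes that the tagged response is Rj = rmax cj^n / (c50^n + crms^n) with crms = sqrt(c1² + c2²), uses the LFP parameters quoted above (c50 = 13%, n = 1.5) and the 25% mask, and picks an illustrative set of test contrasts. The function name and rmax = 1 are placeholders.

```python
def tagged_response(c_tag, c_other, r_max=1.0, c50=13.0, n=1.5):
    """Response at the tag of one grating in the presence of the other (assumed form)."""
    c_rms = (c_tag ** 2 + c_other ** 2) ** 0.5
    return r_max * c_tag ** n / (c50 ** n + c_rms ** n)

MASK_CONTRAST = 25.0                       # mask contrast (%) used in the text
test_contrasts = [6.0, 12.0, 25.0, 50.0]   # illustrative test contrasts (%)

# 1) Test alone: response grows and saturates with contrast.
test_alone = [tagged_response(c, 0.0) for c in test_contrasts]
# 2) Test plus mask: response to the test is reduced at every contrast.
test_masked = [tagged_response(c, MASK_CONTRAST) for c in test_contrasts]
# 3) Mutual suppression: response to the mask falls as test contrast rises.
mask_response = [tagged_response(MASK_CONTRAST, c) for c in test_contrasts]

for row in zip(test_contrasts, test_alone, test_masked, mask_response):
    print(row)
```

Because the second grating enters only through crms in the denominator, it can reduce a tagged response but never increase it, which is the suppression described in the text.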
Does normalization govern the population responses only under anesthesia or can it be used to explain responses in the awake cortex? Does it apply to humans? We sought an answer to these questions by measuring the electroencephalogram (EEG) of human subjects with a sensor net of 128 electrodes. From these signals we obtained visual evoked potentials (VEP). Human VEP responses to plaid stimuli exhibit suppressive interactions between the grating components (Burr and Morrone, 1987; Candy et al., 2001; Ross and Speed, 1991), and such interactions may be explained by normalization (Candy et al., 2001).
The VEP is a close approximation to the pooled LFP measured within the underlying cortex (Schroeder et al., 1991). To measure it, therefore, we used stimuli similar to those used to measure LFP in anesthetized cats: we made the test and mask gratings reverse in contrast at different frequencies. Test and mask thus caused distinct responses that oscillated at twice the reversal frequencies (Candy et al., 2001). To extract responses originating specifically from area V1, we estimated the current source density across the entire cortex and then defined a region of interest based on maps of retinotopy measured with fMRI (Appelbaum et al., 2006).
The amplitudes of the visually evoked currents source-localized to V1 in humans were remarkably similar to the pooled LFP signals in anesthetized cats (Figure 6C,D). First, increasing test contrast increased the responses to the test (Figure 6C, open symbols). Second, adding the mask to the test reduced these responses (Figure 6C, closed symbols). Third, increases in test contrast reduced the responses to the mask stimulus (Figure 6D). Therefore, in the population responses measured in human subjects (just as in those measured from anesthetized animals) there is competition between the stimuli.
The human V1 population responses, again, were closely predicted by the normalization model (Equation (5)) with fit parameters that were almost identical to those estimated for cat LFPs (Figure 6C–D, curves). As in anesthetized cats, the model (Equation (5)) captures the competitive interactions between test and mask: the contrast response function to the test is shifted to the right in the presence of a mask (Figure 6C, curves), and responses to the mask decrease as the test contrast increases (Figure 6D, curves). As in cat LFPs, these predictions were accurate (96.2% explained variance). The model parameters for fitting these human V1 responses (c50 = 12%, n = 1.4) were similar to those for fitting pooled LFP responses in anesthetized cats (c50 = 13%, n = 1.5). The similarity of subthreshold population responses in anesthetized cats and human observers indicates that V1 responses in both species and across behavioral states are governed by the same functional mechanism: contrast normalization.
We finally ask what consequences our findings may have on the decoding of V1 signals by higher cortical areas. We have seen that normalization explains the regimes of equal summation and competition observed in V1 and that competition can be so extreme as to correspond to a winner-take-all effect. Such profound competition is bound to have consequences on downstream areas.
To investigate these consequences, we implemented a well-established model of a pattern-selective neuron in area MT (Figure 7A–C). When stimulated with plaids, such neurons respond to the global direction of the plaid rather than the direction of motion of the individual components (Movshon et al., 1985). A successful model for these neurons (Rust et al., 2006; Simoncelli and Heeger, 1998) postulates that they sum the activity of a population of V1 neurons with weights appropriate to obtain the desired preferred direction (Figure 7A). We implemented this model for a model MT neuron whose direction tuning curve measured with drifting gratings peaks at 180 deg (Figure 7B). The neuron exhibits the same selectivity when the stimulus is a plaid composed of two gratings differing in direction by 120 deg: it responds most strongly when the global direction of the plaid is 180 deg (Figure 7C, black curve). The model incorporates contrast normalization in V1, but none of these phenomena rests on such normalization: neglecting normalization simply results in larger responses, without changes in tuning (Figure 7C, gray curves).
Normalization in area V1, however, can profoundly affect the responses of this MT neuron to a plaid with unequal component contrasts (Figure 7D). When component contrasts are 25% and 75%, normalization of population responses in area V1 leads to marked competition. The lower-contrast stimulus almost disappears from the population responses of V1, so the model MT neuron responds mostly to the higher contrast stimulus. Its tuning for plaid direction shifts towards 240 deg, i.e. the plaid direction in which the higher contrast grating drifts at 180 deg (Figure 7D, black curve). The model MT neuron, in other words, has largely lost its ability to represent the overall motion of the plaid. This effect is almost entirely explained by normalization: neglecting normalization would lead to much smaller effects (Figure 7D, gray curve). This simple example, therefore, illustrates how normalization of population responses in area V1 can have profound consequences for the response properties of areas downstream in the visual hierarchy.
Experiments were conducted at the Smith-Kettlewell Eye Research Institute under the supervision of the Institutional Animal Care and Use Committee. Detailed methods for these experiments are published elsewhere (Katzner et al., 2009). Briefly, young adult cats were anesthetized with Sodium Pentothal (0.5–2 mg/kg/hr, i.v.) and Fentanyl (typically 10 μg/kg/hr, i.v.), supplemented by inhalation of N2O mixed with O2 (typically in a ratio of 70:30). Eye movements were prevented by a neuromuscular blocker (Pancuronium Bromide, 0.15 mg/kg/hr, i.v.). A 10×10 electrode array (400 μm spacing, 1.5 mm electrode length) was implanted in area V1 to record multiunit activity (MUA) and local field potentials (LFP). Insertion depths were about 0.8–1 mm, resulting in recordings confined mostly to layers 2–3. A Cerebus 128-channel system (Blackrock, Utah) was used to sample the data.
Stimuli were contrast-reversing sinusoidal gratings presented monocularly on a CRT monitor (refresh rate 125 Hz, mean luminance 32 cd/m²). Gratings were modulated sinusoidally in contrast with a temporal frequency of 4 Hz (9 experiments). Spatial frequency was adjusted to optimally drive the majority of sites in the electrode array. Gratings were presented in a circular window (30° diameter) and lasted 2 sec. Grating contrasts were 0, 6, 12, 25, or 50%. Plaids were obtained by summing two gratings. For each experiment, the angle between component orientations in the plaid was fixed. To reduce effects of adaptation, each experiment consisted of three pairs of orientations (e.g., 0°/90°, 30°/120°, 60°/150°). Data for different pairs were collapsed after adjusting for the difference between stimulus orientation and neuronal preference. In addition, we recorded responses to 100% contrast gratings of 12 different orientations to obtain orientation tuning curves for each site in the array. The stimuli were shown in random order in blocks presented at least 8 times.
In each channel of the multi-electrode array we set thresholds to ~4 SD of the background noise. Threshold crossings were considered multi-unit activity and were pooled together for each site. For each site, orientation preference was defined as the vector average of firing rate responses to the 12 full-contrast tuning stimuli. Only responsive sites with at least minimal tuning (circular variance < 0.85) were considered in all further analyses (Ringach et al., 2002). Using this criterion, on average 42±13 sites (median ± MAD) were included. For the computation of population responses, sites were binned (15 deg bin width) according to their preferred orientation. Before averaging across sites, responses for each site were normalized to their average response to the tuning stimuli.
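The site-selection steps above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' analysis code: it assumes the common doubled-angle convention for orientation (period 180 deg), takes circular variance as one minus the length of the normalized vector average, and centres the 15 deg bins on multiples of 15 deg. The function names and the synthetic tuning curve are hypothetical.

```python
import cmath
import math

def orientation_stats(orientations_deg, rates):
    """Return (preferred orientation in deg, circular variance) for one site."""
    total = sum(rates)
    if total <= 0:
        raise ValueError("summed response must be positive")
    # Orientation repeats every 180 deg, so angles are doubled before averaging.
    z = sum(r * cmath.exp(2j * math.radians(th))
            for th, r in zip(orientations_deg, rates)) / total
    preferred = (math.degrees(cmath.phase(z)) / 2.0) % 180.0
    circular_variance = 1.0 - abs(z)
    return preferred, circular_variance

def orientation_bin(preferred_deg, bin_width=15.0):
    """Index of the bin centred on a multiple of bin_width (wrapping at 180 deg)."""
    n_bins = int(round(180.0 / bin_width))
    return int(math.floor(preferred_deg / bin_width + 0.5)) % n_bins

# Hypothetical site tuned to 60 deg, probed with 12 orientations in 15 deg steps.
orientations = [15.0 * k for k in range(12)]
rates = [1.0 + math.cos(math.radians(2.0 * (th - 60.0))) for th in orientations]

preferred, cv = orientation_stats(orientations, rates)
include_site = cv < 0.85  # inclusion criterion quoted in the text
print(round(preferred, 3), round(cv, 3), include_site, orientation_bin(preferred))
```

A site with a flat tuning curve has a circular variance of 1 and would be excluded by the 0.85 criterion.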
The stimuli used in the LFP experiments were identical to the stimuli used in the MUA experiments with the following exception: the two components were modulated with a temporal frequency of 3.5 and 5 Hz. Data were obtained in 4 experiments.
LFPs were sampled at 2 kHz with a wide front-end filter (0.3 Hz – 2 kHz). We further low-pass filtered the LFP below 90 Hz to exclude any contamination by multi-unit responses. LFPs were averaged across all electrodes in the Utah array that were responsive (circular variance of multi-unit responses to 12 different orientations < 0.85). Amplitudes at twice the stimulus frequency were extracted from the averaged data using Fourier analysis.
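As an illustration of the last step, the amplitude at a chosen frequency can be read out by projecting the trace onto a complex exponential (a single-bin discrete Fourier transform). The sketch below is an assumption about how such an analysis might look, not the authors' code; the synthetic trace, its amplitudes, and the function name are hypothetical. It uses a 2 s trace sampled at 2 kHz, so that the second harmonics of 3.5 Hz and 5 Hz components (7 Hz and 10 Hz) complete a whole number of cycles.

```python
import cmath
import math

def amplitude_at(signal, sample_rate_hz, freq_hz):
    """Amplitude of the sinusoidal component of `signal` at freq_hz."""
    n = len(signal)
    # Project the signal onto a complex exponential at the frequency of interest.
    projection = sum(x * cmath.exp(-2j * math.pi * freq_hz * k / sample_rate_hz)
                     for k, x in enumerate(signal))
    return 2.0 * abs(projection) / n

# Synthetic averaged trace: 2 s at 2 kHz, with made-up amplitudes at 7 Hz and 10 Hz.
sample_rate = 2000.0
times = [k / sample_rate for k in range(4000)]
lfp = [3.0 * math.cos(2 * math.pi * 7.0 * t)
       + 1.5 * math.cos(2 * math.pi * 10.0 * t + 0.3) for t in times]

# Contrast-reversing gratings drive responses at twice the stimulus frequency.
amp_component_1 = amplitude_at(lfp, sample_rate, 2 * 3.5)
amp_component_2 = amplitude_at(lfp, sample_rate, 2 * 5.0)
print(round(amp_component_1, 6), round(amp_component_2, 6))
```

Because the two components are tagged with different temporal frequencies, their contributions can be separated from a single averaged trace.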
Methods for VEP recordings and source localization are described elsewhere (Appelbaum et al., 2006). Briefly, the EEG was acquired with 128-channel HydroCell Sensor Nets and a Netstation digitization system (EGI, Eugene OR) in 5 subjects. For each subject, the 3D locations of the sensors were recorded using a ‘Fastrak’ radio-frequency 3D digitizer (Polhemus, Colchester VT) and co-registered to their T1-weighted anatomical Magnetic Resonance (MR) scans, from which a three-shell boundary element model of the skull and scalp was computed.
Stimuli were presented foveally on a CRT monitor (refresh rate 72 Hz, mean luminance 32 cd/m2). Stimuli were viewed binocularly and consisted of contrast-reversing gratings, modulated at temporal frequencies of 4.5 Hz (test) and 3.6 Hz (mask). Gratings were presented within a circular window spanning 5 degrees of visual angle and their spatial frequency was 1 cpd. The test gratings had 0, 6, 12, 25, or 50% contrast; the mask gratings had 0 or 25% contrast. Each combination of grating contrasts was presented at least 15 times. Each trial lasted 11.1 seconds. During the trial, subjects attended to a stream of simultaneously presented letters and were instructed to indicate the presence of a probe letter “T” among distracters “L”.
EEG signals were post-processed using custom software to remove artifacts due to head motion and blinks. To localize the sources of the VEP activity, cortically constrained minimum norm source estimates (Hämäläinen and Ilmoniemi, 1994) were computed and related to each subject’s visual areas as defined by fMRI retinotopic mapping (Engel et al., 1997). We extracted the time course of activity from area V1, and obtained the responses to the test and mask gratings by Fourier analysis.
The models used to fit MUA are given by Equations (1)–(4) in the main text. Before fitting the models, we subtracted spontaneous activity to the gray screen from all responses. The model used to fit LFP and VEP responses is given by Equation (5). All models were fitted by a least-squares algorithm.
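Fit quality was assessed by one of two measures. The first measure is the percentage of variance in the responses r explained by the model predictions m:
$$\text{explained variance} = 100 \times \left(1 - \frac{\sum_i (r_i - m_i)^2}{\sum_i (r_i - \bar{r})^2}\right)$$
where the indices i indicate the bin of orientation preference and $\bar{r}$ indicates the mean of the responses. The second measure (fit quality index) is the root mean square deviation between responses and model, normalized to the mean of the observed responses:
$$\text{fit quality index} = \frac{1}{\bar{r}} \sqrt{\frac{1}{N} \sum_{i=1}^{N} (r_i - m_i)^2}$$
where N is the number of responses.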
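The two fit-quality measures described above can be computed as follows. This is a minimal sketch that assumes the standard definitions (one minus the ratio of residual to total sum of squares, and root mean square error divided by the mean response); the function names and the example numbers are hypothetical.

```python
import math

def explained_variance_pct(responses, predictions):
    """Percentage of variance in `responses` explained by `predictions`."""
    mean_r = sum(responses) / len(responses)
    ss_residual = sum((r - m) ** 2 for r, m in zip(responses, predictions))
    ss_total = sum((r - mean_r) ** 2 for r in responses)
    return 100.0 * (1.0 - ss_residual / ss_total)

def fit_quality_index(responses, predictions):
    """Root mean square deviation, normalized to the mean observed response."""
    mean_r = sum(responses) / len(responses)
    mean_sq_dev = sum((r - m) ** 2
                      for r, m in zip(responses, predictions)) / len(responses)
    return math.sqrt(mean_sq_dev) / mean_r

# Made-up responses and model predictions for four orientation-preference bins.
observed = [1.0, 2.0, 3.0, 4.0]
predicted = [1.0, 2.0, 3.0, 5.0]
print(explained_variance_pct(observed, predicted), fit_quality_index(observed, predicted))
```

Higher explained variance and a lower fit quality index both indicate a better fit; the index has the advantage of remaining defined when the responses vary little around their mean.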
A population of direction-selective V1 neurons was modeled using the average parameters obtained from fitting our population recordings with the normalization model (Equation (4), width, semisaturation contrast and exponent), and assuming an equal spacing of preferred direction across neurons. The weighting profile of the MT neuron is given by a Gaussian with a sigma of 65 deg (inspired by the neuron in Figure 5E of Rust et al. (2006)). V1 population responses to plaids of 120 deg angle were simulated using the normalization model (Equation (4)) and multiplied by the MT weighting profile. Responses below zero were set to zero.
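The simulation described above can be sketched as follows. Several ingredients are assumptions made only for illustration: the normalization is taken to be a contrast-weighted sum of two Gaussian population profiles divided by a semisaturation term plus the overall (root-sum-square) contrast; the tuning width, semisaturation contrast and exponent are placeholder values rather than the fitted averages; and the MT neuron is given a preferred direction of 180 deg. All function names are hypothetical.

```python
import math

def circ_diff(a_deg, b_deg):
    """Signed difference between two directions, wrapped to [-180, 180)."""
    return (a_deg - b_deg + 180.0) % 360.0 - 180.0

def gaussian(d_deg, sigma_deg):
    return math.exp(-0.5 * (d_deg / sigma_deg) ** 2)

def v1_population(dir1, dir2, c1, c2, prefs, width=30.0, c50=0.1, n=2.0):
    """V1 responses to a plaid under the ASSUMED normalization form.

    width, c50 and n are placeholder values, not the fitted parameters.
    """
    denom = c50 ** n + math.sqrt(c1 ** 2 + c2 ** 2) ** n
    return [(c1 ** n * gaussian(circ_diff(dir1, p), width)
             + c2 ** n * gaussian(circ_diff(dir2, p), width)) / denom
            for p in prefs]

def mt_response(plaid_dir, c1, c2, mt_pref=180.0, sigma_mt=65.0,
                plaid_angle=120.0, step=5.0):
    """Gaussian-weighted sum of V1 responses, rectified at zero."""
    prefs = [k * step for k in range(int(360 / step))]  # equally spaced preferences
    dir1 = plaid_dir - plaid_angle / 2.0
    dir2 = plaid_dir + plaid_angle / 2.0
    population = v1_population(dir1, dir2, c1, c2, prefs)
    weights = [gaussian(circ_diff(p, mt_pref), sigma_mt) for p in prefs]
    summed = sum(w * r for w, r in zip(weights, population))
    # Rectification mirrors the text; with purely positive weights it never triggers.
    return max(0.0, summed)

def preferred_plaid_direction(c1, c2, step=5.0):
    """Plaid direction (on a 5 deg grid) that drives the model MT neuron most."""
    directions = [k * step for k in range(int(360 / step))]
    return max(directions, key=lambda d: mt_response(d, c1, c2))

equal = preferred_plaid_direction(0.5, 0.5)
unequal = preferred_plaid_direction(0.75, 0.25)  # higher contrast on component 1
print(equal, unequal)
```

Under these assumptions the model neuron prefers the plaid direction matching its own preference when the component contrasts are equal, and its preference moves toward the plaid direction that places the higher-contrast component at its preferred direction when they are unequal.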
We investigated the representation of concurrent stimuli in the population responses of visual cortex. When the visual system is faced with multiple simultaneous orientations and contrasts, the population response can be described as a weighted sum of the responses to the individual stimulus components. The weights applied to the stimulus components depend on the specific combination of stimulus strengths and lead to a range of behaviors from equal summation to winner-take-all competition.
Our results agree qualitatively but not precisely with a recent study that has explored population responses to a subset of our stimuli, namely plaids whose components have identical contrast (MacEvoy et al., 2009). This study reported that the population response to such plaids resembled the average of the responses to the component gratings, independent of overall contrast. This behavior corresponds to equal-weights summation with weights of 0.5. Similar to their results, we find that the weights for same-contrast plaids do not depend on overall contrast (ANOVA, F = 0.61, df = 3, p = 0.5). However, we consistently found weights higher than 0.5 (average of 0.67 ± 0.14 SD, N = 9). The discrepancy may be due to differences in recorded signals (spikes vs. blood-related signals) and in species (cat vs. tree shrew).
Our study, in fact, considerably extends this previous finding by providing a more direct measure of neuronal responses, by probing the responses to plaids in which component contrasts differ, by showing that they obey a weighted-sum law across all contrast combinations, and by providing a unifying normalization model that accounts for the responses. The model has few free parameters, and exhibits a gradual transition between regimes of equal summation and winner-take-all competition that depends on the stimulus contrasts.
By applying and validating a normalization model to populations, moreover, we substantially extend prior studies of normalization in individual neurons. The normalization model has been shown to predict the responses of single neurons to stimuli that were tailored to the recorded neuron (Carandini et al., 1997; Freeman et al., 2002; Heeger, 1992). These prior results, however, concern only a subset of the neurons in a population: those that are stimulated optimally by either component of the plaids. Such neurons inhabit 2 of the 12 orientation preference bins into which we have divided the population. It was not known whether their collective action could be explained by a single normalization model. Moreover, the majority of neurons, those in the remaining 10 bins, are driven optimally by neither grating. It was unknown whether the normalization model would predict their responses.
While performing the apparently mundane task of contrast gain control, normalization provides a single mechanism that can gradually shift the cortex from equal summation to winner-take-all competition. Work in single neurons had well established that normalization could capture the effects of cross-orientation suppression and thus predict sublinear summation of responses to the individual components. As revealed by our measurements, however, the normalization model exhibits an unintuitive behavior: a strong population response to a single stimulus can be dramatically suppressed by adding a stimulus of higher contrast, so much so that the population response to the compound stimulus seems to represent only the stimulus with higher contrast.
The ability of the normalization model to capture population activity in response regimes ranging from equal summation to winner-take-all competition resonates with a recent proposal of a canonical divisive computation (Kouh and Poggio, 2008). While the mathematical expressions for the two divisive computations are similar, there are also important differences. First, the normalization model acts on the representation of a stimulus in the population activity; this representation is weighted by stimulus strength, not by synaptic strength. Second, the normalization model accounts for all responses with a single set of parameters, not by postulating different circuits with different sets of parameters. Hence, the normalization model predicts that the same neural circuitry can operate across all stimulus conditions.
The neural circuitry underlying response normalization is far from clear, and may involve a combination of synaptic inhibition (Carandini et al., 1997), modulation of input noise (Chance et al., 2002; Finn et al., 2007), and nonlinearities in the input from the LGN (Carandini et al., 1997; Freeman et al., 2002; Priebe and Ferster, 2006). Our results of profound suppression of LFP responses in the presence of multiple stimuli indicate that spike threshold in area V1 may not play a major role in response normalization. This result certainly agrees with measurements of cross-orientation suppression in the membrane potential of V1 simple cells (Priebe and Ferster, 2006).
Similar to LFPs in anesthetized cats, normalization also affects population responses recorded from human V1. Our data are in general agreement with earlier measurements of cross-orientation suppression in the human VEP (Candy et al., 2001; Morrone and Burr, 1986). We extend the results of these studies in three ways. First, by using source-localization combined with fMRI retinotopic mapping we concentrate on responses of area V1 rather than the whole occipital cortex. Second, we relate directly these population responses in humans to measurements of LFP obtained in cats with similar stimulation methods. Third, we demonstrate that both LFP and source-imaged VEP responses can be predicted quantitatively by a normalization model with similar parameter values.
The normalization observed in the human primary visual cortex should have behavioral consequences during the perception of superimposed orientations: perceptual thresholds for the test stimulus should be elevated in the presence of a mask. Such threshold elevations have indeed been found in a number of psychophysical experiments and are well characterized (see e.g., Petrov et al., 2005). Also in agreement with our findings, psychophysical experiments exploiting reflexive ocular following movements suggest an intriguing combination of summation and winner-take-all mechanisms depending on the relative contrast of competing image motions (Sheliga et al., 2006). In the future, combining recordings of neuronal population activity with measurements of perceived contrast promises to advance our understanding of the link between neural population responses and perception.
With simple simulations we show that normalization in the population responses of V1 can have profound effects on neurons in higher visual areas. Strikingly, normalization in V1 makes our model MT neuron lose its pattern-selectivity when the two components of a plaid have dissimilar contrast. This prediction may not be far from the truth. First, it is consistent with a recent abstract reporting exactly such behavior in actual MT neurons (Kumbhani et al.). Second, it is consistent with similar behaviors seen in the responses of MT neurons to multiple stimuli at different positions in their receptive field, which exhibit summation for stimuli with similar contrast and competition for stimuli with different contrast (Britten and Heuer, 1999; Heuer and Britten, 2002). Third, our prediction is consistent with the tendency of human observers to perceive plaids of different component contrast as moving in the direction of the component of higher contrast (Stone et al., 1990).
The normalization model for population responses could serve as a front-end of models of attentional selection. Attention enhances the processing of behaviorally relevant information and reduces the impact of irrelevant information; effects of attention have been likened to changes in effective stimulus contrast (Martinez-Trujillo and Treue, 2002; Reynolds et al., 2000). In our experiments, which do not engage attentional modulation, the sensory representation of multiple stimuli overemphasizes the more salient object. Attention could act on top of these purely sensory effects: spatial attention could further accentuate the existing asymmetry of the sensory representation (Ghose and Maunsell, 2008; Moran and Desimone, 1985; Reynolds et al., 1999; Treue and Martinez-Trujillo, 1999) and feature-based attention could additionally sharpen the population response to the attended feature (Martinez-Trujillo and Treue, 2004). In line with these ideas, the effects of attention on psychophysical performance have been modeled as competitive interactions between visual filters (Lee et al., 1999). Furthermore, it has recently been proposed that attention might use existing normalization circuits to modulate neuronal responses to sensory stimuli (Reynolds and Heeger, 2009).
Normalization, in summary, makes population responses operate not only in a regime of equal summation, where the response to summed stimuli resembles the scaled sum of the responses, but also in a regime of winner-take-all competition, where the response to summed stimuli resembles the response to the stronger stimulus alone. Effects of this kind have been observed in single neurons of regions as diverse as cortical area MT (Britten and Heuer, 1999; Heuer and Britten, 2002) and inferior colliculus (Keller and Takahashi, 2005). They may be due to normalization operating in the population responses in those regions or in earlier stages. These results suggest that normalization may be a fundamental operation, one that shapes population responses more profoundly than might have been expected from earlier studies.
We are grateful to S. Katzner and A. Benucci for help with the experiments in cats, to J. Rowland for help with the experiments in humans, and to D. Fitzpatrick and S. MacEvoy for helpful discussions. This work was supported by National Institutes of Health grants EY-17396 (MC) and EY-18157 (ARW), by a Project Grant from the UK charity Fight for Sight (MC) and by a postdoctoral fellowship of the German Academy of Sciences Leopoldina BMBF-LPD9901/8-165 (LB). MC holds the GlaxoSmithKline/Fight for Sight Chair in Visual Neuroscience.
Here we show that contrast invariance of population responses follows from independence of contrast sensitivity and orientation selectivity across neurons. The argument is that the sum of separable functions with two factors is itself separable only if the two factors are statistically independent. Contrast-invariance for an individual neuron i means that the neuron’s response r_i(θ,c) is the product of a tuning curve for orientation and a contrast response curve, g_i(θ) f_i(c). The population response R_ϕ(θ,c) is the sum of many such individual responses, Σ_i r_i(θ,c), and therefore a sum of products, Σ_i g_i(θ) f_i(c). In general, this sum will not itself be separable (contrast-invariant). Yet, it will be separable in the special case in which the terms g_i(θ) and f_i(c) are statistically independent, i.e. if there is no consistent relationship between a neuron’s orientation tuning and its contrast response.
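A small numerical check can make this argument concrete. The sketch below is only an illustration under assumed functional forms (Gaussian orientation tuning and a hyperbolic-ratio contrast response) with made-up parameter values. In the "independent" population every tuning width is paired with every semisaturation contrast, so the sum of products factorizes into a product of sums. In the "dependent" population each width is tied to one particular semisaturation contrast.

```python
import math

def tuning(theta_deg, width_deg):
    """Assumed Gaussian orientation tuning around a preferred orientation of 0 deg."""
    d = (theta_deg + 90.0) % 180.0 - 90.0
    return math.exp(-0.5 * (d / width_deg) ** 2)

def contrast_response(c, c50, n=2.0):
    """Assumed hyperbolic-ratio contrast response."""
    return c ** n / (c50 ** n + c ** n)

def population_response(neurons, thetas, contrasts):
    """Matrix of summed responses; each neuron is a (width, c50) pair."""
    return [[sum(tuning(th, w) * contrast_response(c, c50) for w, c50 in neurons)
             for c in contrasts] for th in thetas]

def separability_error(matrix):
    """Largest relative deviation from the separable (rank-1) matrix
    rebuilt from the first row and first column."""
    ref = matrix[0][0]
    worst = 0.0
    for i, row in enumerate(matrix):
        for j, value in enumerate(row):
            rebuilt = matrix[i][0] * matrix[0][j] / ref
            worst = max(worst, abs(value - rebuilt) / abs(value))
    return worst

thetas = [0.0, 15.0, 30.0, 45.0]
contrasts = [0.06, 0.12, 0.25, 0.5]
widths = [15.0, 25.0, 40.0]          # hypothetical tuning widths (deg)
c50s = [0.1, 0.2, 0.4]               # hypothetical semisaturation contrasts

independent = [(w, c50) for w in widths for c50 in c50s]  # all combinations
dependent = list(zip(widths, c50s))                       # width tied to c50

err_independent = separability_error(population_response(independent, thetas, contrasts))
err_dependent = separability_error(population_response(dependent, thetas, contrasts))
print(err_independent, err_dependent)
```

The independent population gives an error at the level of floating-point rounding, so its summed response is separable, whereas the dependent population deviates substantially from any separable form.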
If the two component contrasts are similar (c1 ≈ c2 ≈ c), Equation (4) becomes equivalent to the weighted-sum model (Equation (3)) under the scenario of equal weights. Specifically, the predicted weights are
If the two component contrasts are dissimilar (c1 ≫ c2), Equation (4) becomes equivalent to the weighted-sum model (Equation (3)) under the scenario of winner-take-all weights. Specifically, the predicted weights are