Statistical dependencies in the responses of sensory neurons govern both the amount of stimulus information conveyed and the means by which downstream neurons can extract it. Although a variety of measurements indicate the existence of such dependencies1–3, their origin and importance for neural coding are poorly understood. Here we analyse the functional significance of correlated firing in a complete population of macaque parasol retinal ganglion cells using a model of multi-neuron spike responses4,5. The model, with parameters fit directly to physiological data, simultaneously captures both the stimulus dependence and detailed spatio-temporal correlations in population responses, and provides two insights into the structure of the neural code. First, neural encoding at the population level is less noisy than one would expect from the variability of individual neurons: spike times are more precise, and can be predicted more accurately when the spiking of neighbouring neurons is taken into account. Second, correlations provide additional sensory information: optimal, model-based decoding that exploits the response correlation structure extracts 20% more information about the visual scene than decoding under the assumption of independence, and preserves 40% more visual information than optimal linear decoding6. This model-based approach reveals the role of correlated activity in the retinal coding of visual stimuli, and provides a general framework for understanding the importance of correlated activity in populations of neurons.
How does the spiking activity of a neural population represent the sensory environment? The answer depends critically on the structure of neuronal correlations, or the tendency of groups of neurons to fire temporally coordinated spike patterns. The statistics of such patterns have been studied in a variety of brain areas, and their significance in the processing and representation of sensory information has been debated extensively2,3,7–13.
Previous studies have examined visual coding by pairs of neurons11 and the statistics of simultaneous firing patterns in larger neural populations14,15. However, no previous approach has addressed how correlated spiking activity in complete neural populations depends on the pattern of visual stimulation, or has answered the question of how such dependencies affect the encoding of visual stimuli.
Here we introduce a model-based methodology for studying this problem. We describe the encoding of stimuli in the spike trains of a neural population with a generalized linear model (Fig. 1a), a generalization of the well-known linear–nonlinear–Poisson (LNP) cascade model4,5,16,17. In this model, each cell’s input is described by a set of linear filters: a stimulus filter, or spatio-temporal receptive field; a post-spike filter, which captures dependencies on spike-train history (for example, refractoriness, burstiness and adaptation); and a set of coupling filters, which capture dependencies on the recent spiking of other cells. For each neuron, the summed filter responses are exponentiated to obtain an instantaneous spike rate. This is equivalent to exponentiating the filter outputs and then multiplying; the exponentiated post-spike and coupling filters (as plotted in Fig. 1) may therefore be interpreted as spike-induced gain adjustments of the neuron’s firing rate.
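The rate computation described above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the authors' implementation: time is discretized into bins, the stimulus filter output is assumed to be precomputed (`stim_drive`), and all names and toy values (`filter_output`, `conditional_intensity`, `bias`, the filter weights) are hypothetical.

```python
import math

def filter_output(kernel, spikes, t):
    """Causal filter: sum over lags k >= 1 of kernel[k-1] * spikes[t-k]."""
    total = 0.0
    for lag, weight in enumerate(kernel, start=1):
        if t - lag >= 0:
            total += weight * spikes[t - lag]
    return total

def conditional_intensity(t, stim_drive, own_spikes, other_spikes,
                          post_spike_filter, coupling_filters, bias):
    """Exponentiate the summed filter responses to get a spike rate."""
    drive = bias + stim_drive[t]
    drive += filter_output(post_spike_filter, own_spikes, t)
    for spikes, kernel in zip(other_spikes, coupling_filters):
        drive += filter_output(kernel, spikes, t)
    return math.exp(drive)

# Toy values (hypothetical): one neighbouring cell, two-bin filters.
stim_drive = [0.0, 0.2, 0.5, 0.1]      # stimulus filter output per bin
own_spikes = [0, 1, 0, 0]              # this cell's spike history
other_spikes = [[0, 0, 1, 0]]          # one neighbour's spikes
post_spike_filter = [-2.0, -0.5]       # suppressive after own spike
coupling_filters = [[1.0, 0.3]]        # excitatory after neighbour spike
bias = 1.0

rate = conditional_intensity(3, stim_drive, own_spikes, other_spikes,
                             post_spike_filter, coupling_filters, bias)

# Same rate written as a product of gains, as stated in the text.
gain_product = math.exp(bias + stim_drive[3]) * math.exp(-0.5) * math.exp(1.0)
```

Because the exponential of a sum equals the product of exponentials, `rate` and `gain_product` agree, which is why the exponentiated post-spike and coupling filters can be read as multiplicative gain changes.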
Although this model is strictly phenomenological, its components can be loosely compared to biophysical mechanisms: the stimulus filter approximates the spatio-temporal integration of light in the outer retina and passive dendritic filtering; the post-spike filter mimics voltage-activated currents following a spike; coupling filters resemble synaptic or electrical interactions between cells (and can mimic the effects of shared input noise); and the exponential nonlinearity implements a ‘soft threshold’, converting membrane potential to instantaneous spike probability. Note that the post-spike and coupling filters, which allow stochastic spiking in one cell to affect subsequent population activity, give rise to shared, non-Poisson variability in the model response.
We fit the model to data recorded in vitro from a population of 27 ON and OFF parasol retinal ganglion cells (RGCs) in a small patch of isolated macaque monkey retina, stimulated with 120-Hz spatio-temporal binary white noise. The receptive fields of each of the two cell types formed a complete mosaic covering a small region of visual space (Fig. 1b), indicating that every parasol cell in this region was recorded15,18. Such complete recordings, which have not been achieved elsewhere in the mammalian nervous system, are essential for understanding visual coding in neural populations.
The model contains many parameters that specify the shapes of all filters, but fitting by maximizing likelihood remains highly tractable5. A penalty on coupling filters was used to obtain a minimally sufficient set of coupling filters, which yields an estimate of the network’s functional connectivity19,20.
Figure 1 shows the estimated filters describing input to example ON and OFF cells. The stimulus filters exhibit centre-surround receptive field organization consistent with previous characterizations of parasol cells. Post-spike filters show the time course of recovery from refractoriness after a spike, and coupling filters show the effects of spikes from nearby cells: for the ON cell (top), spikes in neighbouring ON cells elicit a large, transient excitation (increasing the instantaneous spike rate by a factor of three), whereas spikes in nearby OFF cells elicit suppression. These effects are reversed in the OFF cell, which is excited/suppressed by spikes in neighbouring OFF/ON cells. Both populations exhibit approximate nearest-neighbour connectivity, with coupling strength falling as a function of distance between receptive field centres15. We found that fitted stimulus filters have smaller surrounds than the spike-triggered average, indicating that a portion of the classical surround can be explained by interactions between cells21 (see Supplementary Information).
To assess accuracy in capturing the statistical dependencies in population responses, we compared the pairwise cross-correlation function (CCF) of RGCs and simulated model spike trains (Fig. 2). For nearby ON–ON and OFF–OFF pairs, the CCF exhibits a sharp peak at zero, indicating the prevalence of synchronous spikes; however, for ON–OFF pairs, a trough at zero indicates an absence of synchrony. For all 351 possible pairings, the model accurately reproduces the CCF (Fig. 2a–c, e, f).
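As an illustration of the comparison above, a pairwise cross-correlation can be estimated from binned spike trains as follows. This sketch computes a mean-subtracted cross-covariance at each lag; the normalization used for the published CCFs is not specified here, so the choice is an assumption, and the function name and toy spike trains are hypothetical.

```python
def cross_covariance(spikes_a, spikes_b, max_lag):
    """Return {lag: covariance of spikes_a[t] with spikes_b[t + lag]}."""
    n = len(spikes_a)
    mean_a = sum(spikes_a) / n
    mean_b = sum(spikes_b) / n
    ccf = {}
    for lag in range(-max_lag, max_lag + 1):
        total = 0.0
        count = 0
        for t in range(n):
            u = t + lag
            if 0 <= u < n:  # keep only bin pairs that fall inside the record
                total += (spikes_a[t] - mean_a) * (spikes_b[u] - mean_b)
                count += 1
        ccf[lag] = total / count
    return ccf

# Toy example (hypothetical): two perfectly synchronous binned trains.
train_a = [0, 1, 0, 0, 1, 0, 0, 0, 1, 0]
train_b = list(train_a)
ccf = cross_covariance(train_a, train_b, max_lag=3)
peak_lag = max(ccf, key=ccf.get)
```

For synchronous trains the estimate peaks at zero lag, which is the signature reported for nearby same-type pairs; an ON–OFF pair would instead show a trough there.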
To examine whether inter-neuronal coupling was necessary to capture the response correlation structure, we re-fitted the model without coupling filters (that is, so that each cell’s response depends only on the stimulus and its own spike-train history). This ‘uncoupled model’ assumes that cells encode the stimulus independently, although correlations may still arise from the overlap of stimulus filters. However, the uncoupled model fails to reproduce the sharp CCF peaks observed in the data. These peaks are also absent from CCFs computed on trial-shuffled data, indicating that fast-timescale correlations are not stimulus-induced and therefore cannot be captured by any independent encoding model.
Higher-order statistical dependencies were considered by inspecting correlations in three-neuron groups: triplet CCFs show the spike rate of one cell as a function of the relative time to spikes in two other cells (Fig. 2e–g)15. For adjacent neurons of the same type, triplet CCFs have substantial peaks at zero (‘triplet synchrony’), which are well matched by the full model.
Although the full and uncoupled models differ substantially in their statistical dependencies, the two models predict average light responses in individual cells with nearly identical accuracy, capturing 80–95% of the variance in the peri-stimulus time histogram (PSTH) in 26 out of 27 cells (Fig. 3a–c). Both models therefore accurately describe average single-cell responses to new stimuli. However, the full model achieves higher accuracy in predicting multi-neuronal spike responses on a single trial (8 ± 3% more bits per spike, Fig. 3d). This discrepancy can be explained by the fact that noise is shared across neurons. Shared variability means that population activity carries information about a single cell’s response (owing to coupling between cells) beyond that provided by the stimulus alone. Individual neurons therefore appear less noisy when conditioned on spiking activity in the rest of the population than they appear in raster plots.
We measured the effect of correlations on single-trial, single-cell spike-train prediction by using the model to draw samples of a single cell’s response given both the stimulus and the spiking activity in the rest of the population on a single trial (Fig. 3e, f). Averaging the resulting raster plot gives a prediction of the cell’s single-trial spike rate, or ‘population-conditioned’ PSTH for a single trial. We compared these predictions with the cell’s true spike times (binned at 2 ms) across all trials and found that on nearly every trial, the model-based prediction is more highly correlated with the observed spikes than the neuron’s full PSTH (Fig. 3g). Note that the full PSTH achieves the highest correlation possible for any trial-independent prediction. Thus, by exploiting the correlation structure, the coupled model predicts single-neuron spike times more accurately than any independent encoding model.
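The two steps of this comparison, averaging sampled rasters into a predicted rate and correlating that prediction with the observed binned spikes, can be sketched as below. The sampled raster here is a hand-written toy standing in for draws from the coupled model, and the helper names (`psth`, `pearson`) are hypothetical.

```python
import math

def psth(raster):
    """Average a list of binned trials into a per-bin spike rate."""
    n_trials = len(raster)
    n_bins = len(raster[0])
    return [sum(trial[t] for trial in raster) / n_trials
            for t in range(n_bins)]

def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)

# Toy stand-in for model samples conditioned on one trial (hypothetical).
sampled_raster = [[0, 1, 0, 1],
                  [0, 1, 0, 0]]
observed_trial = [0, 1, 0, 1]          # the cell's true binned spikes

prediction = psth(sampled_raster)
score = pearson(prediction, observed_trial)
```

Repeating this for every trial, and doing the same with the full PSTH as the prediction, gives the per-trial pairs of correlation values that the comparison rests on.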
Although the full model accurately captures dependencies in the activity of RGCs, it is not obvious a priori whether these dependencies affect the amount of sensory information conveyed by RGC responses. In principle, the correlation structure could be necessary to predict the responses, but not to extract the stimulus information that the responses carry13. To examine this issue directly, we used the full and uncoupled models to perform Bayesian decoding of the population response (Fig. 4a), which optimally reconstructs stimuli given an accurate description of the encoding process. For comparison, we also performed Bayesian decoding under a Poisson (that is, LNP) model and optimal linear decoding6.
Each decoding method was used to estimate short (150-ms) segments of the stimulus given all relevant spike times from the full population (Fig. 4b). Bayesian decoding under the coupled model recovers 20% more information than Bayesian decoding under the uncoupled model, indicating that knowledge of the correlation structure is critical for extracting all sensory information contained in the population response. This improvement was invariant to enhancements of the model’s stimulus filters and nonlinearities (see Supplementary Information), indicating that the difference in performance arises specifically from the coupled model’s ability to incorporate the correlation structure. Our results also show that spike history is relevant for decoding (a Poisson model preserves 6% less information than the uncoupled model22) and that restricting to a linear decoder further reduces the information that can be recovered from RGC responses.
Decoding analysis can also be used to examine the coding fidelity of specific stimulus features. As a simple illustration, we examined the temporal frequency spectrum of reconstructed stimuli and found that the response correlation structure is most important for decoding those stimulus frequencies (6–20 Hz) that are encoded with highest fidelity (Fig. 4c).
These results demonstrate that the responses of a population of retinal ganglion cells are well described by a generalized linear model, and that correlations in the response can be exploited to recover 20% more visual information than if responses are regarded as independent given the stimulus. In contrast, previous studies have reported this information gain to be less than 10% for pairs of neurons9,12. However, pairwise analyses provide little evidence about the importance of correlations across an entire population. Second-order correlations between pairs of neurons could give rise to either much larger (scaling with the number of neurons n) or much smaller (falling as 1/n) gains for a full population (see Supplementary Information). To compare more directly with previous findings, we performed Bayesian decoding using isolated pairs of neurons from the same population; we found a ≤10% gain in sensory information when correlations were included (see Supplementary Information). This is consistent with previous findings, and shows that the information gain for a complete population is larger than that observed for pairs. We also compared the model to a pairwise maximum-entropy model, which has recently been shown to capture the instantaneous spiking statistics of groups of retinal ganglion cells14,15. The coupled model exhibits similar accuracy in capturing these statistics, but has the advantage that it accounts for the temporal correlation structure and stimulus dependence of responses, which are essential for assessing the effect of correlations on sensory coding.
Although it provides an accurate functional description of correlated spike responses, the generalized linear model does not reveal the biophysical mechanisms underlying the statistical dependencies between neurons: coupling does not necessarily imply anatomical connections between cells, but could (for example) reflect dependencies due to shared input noise1. The model also lacks several mechanisms known to exist in retinal ganglion cells (for example, contrast gain-control23), which may be required for characterizing responses to a wider variety of stimuli. One additional caveat is that Bayesian decoding provides a tool for measuring the sensory information available in the population response, but it does not reveal whether the brain makes use of this information. Physiological interpretations of the model and mechanisms for neural read-out of sensory information in higher brain areas are thus important directions for future research.
Nevertheless, the generalized linear model offers a concise, computationally tractable description of the population encoding process, and provides the first generative description of the space–time dependencies in stimulus-induced population activity. It allows us to quantify the relative contributions of stimulus, spike history and network interactions to the encoding and decoding of visual stimuli, and clarifies the relationship between single-cell and population variability. More generally, the model can be used to assess which features of the visual environment are encoded with highest and lowest fidelity, and to determine how the structure of the neural code constrains perceptual capabilities. We expect this framework to extend to other brain areas, and to have an important role in revealing the information processing capabilities of spiking neural populations4,19,24,25.
Multi-electrode extracellular recordings were obtained in vitro from a segment of isolated, peripheral macaque monkey (Macaca mulatta) retina, and analysis was restricted to two cell types (ON and OFF parasol)15,26,27. A standard spike-sorting procedure, followed by a specialized statistical method for detecting simultaneous spikes, was used to sort spikes (see ref. 28). The retina was stimulated with a photopic, achromatic, optically reduced spatio-temporal binary white noise stimulus refreshing at 120 Hz, with a root-mean-square contrast of 96%.
Model parameters were fitted to 7 min of spike responses to a non-repeating stimulus. Each cell’s parameters consisted of a stimulus filter (parametrized as a rank-2 matrix), a spike-history filter, a set of incoming coupling filters and a constant. Temporal filters were represented in a basis of cosine ‘bumps’22. Parameters for the uncoupled and Poisson (LNP) models were fitted independently. Parameters were fitted by penalized maximum likelihood4,5, with an L1 penalty on the vector length of coupling filters to eliminate unnecessary connections.
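The fitting objective can be illustrated with a short sketch. It assumes a discrete-time Poisson approximation to the spike-train likelihood and reads the "L1 penalty on the vector length" as a sum of Euclidean norms, one per coupling filter (a group-sparsity penalty). Both readings, along with the function names and the `strength` parameter, are assumptions for illustration rather than the authors' code.

```python
import math

def poisson_log_likelihood(spike_counts, rates, dt):
    """Log-likelihood of binned counts under Poisson rates (spikes per s)."""
    ll = 0.0
    for count, rate in zip(spike_counts, rates):
        mu = rate * dt  # expected spikes in this bin
        ll += count * math.log(mu) - mu - math.lgamma(count + 1)
    return ll

def coupling_penalty(coupling_filters, strength):
    """Sum of vector lengths of the coupling filters, scaled by strength."""
    return strength * sum(math.sqrt(sum(w * w for w in kernel))
                          for kernel in coupling_filters)

def penalized_objective(spike_counts, rates, dt, coupling_filters, strength):
    """Quantity to maximize: log-likelihood minus the coupling penalty."""
    return (poisson_log_likelihood(spike_counts, rates, dt)
            - coupling_penalty(coupling_filters, strength))

# Toy values (hypothetical): two bins, two incoming coupling filters.
objective = penalized_objective(
    spike_counts=[0, 1],
    rates=[10.0, 10.0],
    dt=0.1,
    coupling_filters=[[3.0, 4.0], [0.0, 0.0]],
    strength=0.5,
)
```

Penalizing each filter's length as a whole, rather than each weight separately, tends to set entire coupling filters to zero, which is what removes unnecessary connections between cells.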
Spike prediction was cross-validated using the log-likelihood of 5 min of novel spiking data (scaled to units of bits per s). Repeat rasters were obtained using 200 presentations of a novel 10-s stimulus. Population-conditional rasters were obtained from the coupled model by sampling the model-defined probability distribution over the neuron’s response given the stimulus and surrounding-population activity on a single trial29.
Population responses were decoded using the Bayes’ least-squares estimator (posterior mean) to reconstruct 18-sample single-pixel stimulus segments (cross-validation data). Linear decoding was performed using the optimal linear estimator6. Decoding performance was quantified using the log signal-to-noise ratio (SNR) of each technique, which gives an estimate of mutual information. Breakdown by temporal frequency was obtained by computing the Fourier power spectra of the stimuli and residuals and then computing log SNR.
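A log SNR of the kind described above can be computed from a stimulus segment and its reconstruction as follows. This sketch takes the ratio of stimulus power to residual power and uses a base-2 logarithm; the base, the function name and the toy values are assumptions, since the text does not fix them.

```python
import math

def log_snr(stimulus, reconstruction):
    """log2 of stimulus power divided by reconstruction-error power."""
    signal_power = sum(s * s for s in stimulus)
    residual_power = sum((s - r) ** 2
                         for s, r in zip(stimulus, reconstruction))
    return math.log2(signal_power / residual_power)

# Toy example (hypothetical): a reconstruction at half the true amplitude.
stimulus = [1.0, -1.0, 1.0, -1.0]
reconstruction = [0.5, -0.5, 0.5, -0.5]
value = log_snr(stimulus, reconstruction)
```

Applying the same ratio to the Fourier power spectra of the stimuli and residuals, frequency by frequency, gives the breakdown by temporal frequency.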
We thank M. Bethge, C. Brody, D. Butts, P. Latham, M. Lengyel, S. Nirenberg and R. Sussman for comments and discussions; G. Field, M. Greschner, J. Gauthier and C. Hulse for experimental assistance; M. I. Grivich, D. Petrusca, W. Dabrowski, A. Grillo, P. Grybos, P. Hottowy and S. Kachiguine for technical development; H. Fox, M. Taffe, E. Callaway and K. Osborn for providing access to retinas; and S. Barry for machining. Funding was provided by a Royal Society USA/Canada Research Fellowship (J.W.P.); NSF IGERT DGE-03345 (J.S.); NEI grant EY018003 (E.J.C., L.P. and E.P.S.); Gatsby Foundation Pilot Grant (L.P.); Burroughs Wellcome Fund Career Award at the Scientific Interface (A.S.); US National Science Foundation grant PHY-0417175 (A.M.L.); McKnight Foundation (A.M.L. and E.J.C.); and HHMI (J.W.P., L.P. and E.P.S.).