Understanding how the brain performs computations requires understanding neuronal firing patterns at successive levels of processing—a daunting and seemingly intractable task. Two recent studies have made dramatic progress on this problem by showing how its dimensionality can be reduced. Using the retina as a model system, they demonstrated that multineuronal firing patterns can be predicted by pairwise interactions.
One of the most challenging problems we face in systems neuroscience is understanding how the brain performs computations. Understanding this means, essentially, understanding how the brain takes a set of inputs and transforms it into a set of outputs. When we study vision, for example, we do this at many levels. We present a visual stimulus (an input) to the photoreceptors and try to determine how it is transformed into a pattern of action potentials (an output) at the level of the ganglion cells, then how the pattern of action potentials at the level of the ganglion cells is transformed into a pattern of action potentials at the level of the lateral geniculate nucleus, then various levels of cortex, until, finally, a behavior is produced.
One of the main reasons this problem has been so difficult is that it requires accurate descriptions of the input and output data at each level. Take the visual system again: at each level (retina, lateral geniculate nucleus, cortex) there are hundreds to thousands of cells in a processing unit, and, at each moment in time, each of these cells is either firing or not firing an action potential. Add to this the fact that the firing of each cell is at least somewhat dependent on the firing of other cells, and one can see right away that the problem is very high dimensional.
So how can it be simplified? Two approaches come to mind. One is a top-down approach. For this, one takes the firing patterns produced when an animal performs a task and determines the crucial features (i.e. those that are needed to perform the task). This provides a way to identify the relevant quantities in the firing patterns (e.g. spike count, spike timing, temporal correlations), and discard the irrelevant ones. The other way to simplify the problem, the way that is the subject of this review, is to directly parameterize the firing patterns in a low-dimensional way. At first glance, it might seem that any low-dimensional parameterization would be hopelessly inaccurate, but what is exciting, and what we review here, is that it is not. Two recent papers show that a low-dimensional parameterization is dramatically effective, at least at the level of the retina.
The two papers are ‘The structure of multi-neuron firing patterns in primate retina’ by Shlens et al. [2••] and ‘Weak pairwise correlations imply strongly correlated network states in a neural population’ by Schneidman et al. [3••].
We start with the Shlens et al. paper [2••]. In this study, the authors focus on finding the simplest characterization of the response distribution of the output cells of the retina. Previous studies had shown that characterizing the response distribution under the assumption that the cells are independent would not suffice, that is, would not give the correct frequency of the joint firing events of multiple cells [4-6]. This led them to take the next simplest approach, which was to characterize the response distribution taking pairwise correlations into account. To carry this out, they used a maximum entropy method (see Box 1). Their rationale was that it gives the simplest distribution consistent with the measured pairwise correlations.
The maximum entropy principle (see references [20,21] for pioneering applications to neuroscience) is a method to create a probability distribution from a limited set of measurements. The basic idea is to determine the most random probability distribution consistent with a set of constraints. Randomness is measured by ‘entropy’: if $P_j$ indicates the probability that event $j$ occurs (with $\sum_j P_j = 1$), then the entropy is defined by $H = -\sum_j P_j \log P_j$. The maximum entropy distribution is the set of probabilities $P_j$ that maximize $H$, subject to a set of specified constraints.
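As a small illustration of the entropy definition above, the following sketch computes H in bits (base-2 logarithm, a choice made here for readability) and compares a uniform distribution with a skewed one over the same four events. The function name `entropy_bits` and the example probabilities are hypothetical, chosen only for this illustration.

```python
import math

def entropy_bits(probs):
    """Shannon entropy H = -sum_j P_j log2 P_j, in bits.

    Events with P_j = 0 contribute nothing to the sum.
    """
    probs = list(probs)
    if abs(sum(probs) - 1.0) > 1e-9:
        raise ValueError("probabilities must sum to 1")
    return -sum(p * math.log2(p) for p in probs if p > 0)

# Hypothetical example: four events, no constraints beyond normalization.
uniform = [0.25, 0.25, 0.25, 0.25]
skewed = [0.7, 0.1, 0.1, 0.1]

h_uniform = entropy_bits(uniform)
h_skewed = entropy_bits(skewed)
print(h_uniform, h_skewed)
```

With no constraint other than normalization, the uniform distribution is the most random one, so its entropy is the larger of the two.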
Since a maximum entropy distribution is as random as possible given a set of constraints, it provides null hypotheses for quantities not explicitly constrained. Shlens et al. [2••] and Schneidman et al. [3••] showed that the empirical frequency of multineuronal firing patterns in the retina was consistent with the maximum entropy null hypothesis generated by their pairwise firing frequencies.
Maximum entropy distributions arise in many familiar contexts. A Gaussian distribution is the maximum entropy distribution constrained by a specified mean and variance. A Poisson process is the maximum entropy distribution of events, constrained by an overall firing rate. A product distribution of several variables is the maximum entropy multivariate distribution constrained by the marginal distribution of the individual variables. The Wiener systems-analytic formalism (and its special cases, spike-triggered averaging and covariance) amounts to modeling a system's input–output distribution as a maximum entropy distribution constrained by the input–output correlations and the response variance.
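For the case relevant to this review, binary firing patterns constrained by single-cell and pairwise firing frequencies, the maximum entropy distribution can be written in closed form up to its parameters. The sketch below follows the standard Lagrange-multiplier argument; the symbols $\sigma_i \in \{0,1\}$ (firing of cell $i$), $h_i$, $J_{ij}$ and $Z$ are notation introduced here for illustration, not taken from the papers under review.

```latex
\begin{aligned}
&\text{maximize}\quad H = -\sum_{\sigma} P(\sigma)\,\log P(\sigma) \\
&\text{subject to}\quad
  \sum_{\sigma} P(\sigma) = 1,\qquad
  \sum_{\sigma} P(\sigma)\,\sigma_i = \langle \sigma_i \rangle,\qquad
  \sum_{\sigma} P(\sigma)\,\sigma_i\sigma_j = \langle \sigma_i\sigma_j \rangle \\[4pt]
&\Longrightarrow\quad
  P(\sigma) = \frac{1}{Z}\,
  \exp\Big(\sum_{i} h_i\,\sigma_i + \sum_{i<j} J_{ij}\,\sigma_i\sigma_j\Big),
\qquad
  Z = \sum_{\sigma}
  \exp\Big(\sum_{i} h_i\,\sigma_i + \sum_{i<j} J_{ij}\,\sigma_i\sigma_j\Big)
\end{aligned}
```

The multipliers $h_i$ and $J_{ij}$ are chosen so that the model reproduces the measured single-cell and pairwise firing frequencies; every higher-order pattern frequency is then a prediction of the model rather than a fitted quantity.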
Specifically, what Shlens et al. [2••] did was analyze the firing patterns of clusters of seven neurons within the retinal mosaic. They tabulated the frequency of each of the possible firing patterns during every 10 ms period. This comes to $2^7 = 128$ firing patterns. (The reason the base is 2 is that ‘firing’ was represented as a binary quantity; that is, each neuron was designated as ‘firing’ or ‘not firing’ during each 10 ms epoch.) The question they asked then was: Can they capture the complete set of firing pattern frequencies using a much smaller number of parameters, namely the firing frequency of each cell by itself (7 parameters, 1 for each cell) and the frequencies of the pairwise firings (21 parameters, 1 for each pair of cells)? They found, remarkably, that the maximum entropy distribution constructed this way was virtually indistinguishable from the true 128-pattern distribution. Even more remarkably, when they performed the same analysis using only the nearest neighbor pairs, they could also recover the original distribution. (See Box 2 for how they quantified the level of agreement between the predictions of the pairwise model and the observed frequencies of the multineuronal firing events.) For a discussion on why local interactions between pairs of neurons can account for global patterns of activity, see Figure 1.
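To make the construction concrete, here is a minimal sketch of fitting a pairwise maximum entropy model to a table of pattern frequencies. It is not the fitting procedure used by Shlens et al.; it assumes a cluster small enough to enumerate every pattern and uses plain gradient ascent on the likelihood, and all function names and the three-cell example distribution are hypothetical.

```python
import itertools
import numpy as np

def all_patterns(n):
    """All 2**n binary firing patterns, one row per pattern."""
    return np.array(list(itertools.product([0, 1], repeat=n)), dtype=float)

def pairwise_features(patterns):
    """Columns: each cell's firing (n columns), then each pair's joint firing."""
    n = patterns.shape[1]
    pair_cols = [patterns[:, i] * patterns[:, j]
                 for i, j in itertools.combinations(range(n), 2)]
    return np.column_stack([patterns] + pair_cols)

def fit_pairwise_maxent(p_emp, n, steps=20000):
    """Fit P(pattern) proportional to exp(theta . features) so that the
    model's single-cell and pairwise firing frequencies match those of p_emp.

    Assumption of this sketch: exact enumeration plus fixed-step gradient
    ascent, which is only practical for small n.
    """
    feats = pairwise_features(all_patterns(n))
    target = p_emp @ feats              # measured firing frequencies
    theta = np.zeros(feats.shape[1])
    lr = 4.0 / feats.shape[1]           # conservative step size for 0/1 features
    for _ in range(steps):
        logits = feats @ theta
        p = np.exp(logits - logits.max())
        p /= p.sum()
        theta += lr * (target - p @ feats)   # gradient of the log-likelihood
    logits = feats @ theta
    p = np.exp(logits - logits.max())
    return p / p.sum()

# Hypothetical 3-cell example whose true distribution has pairwise structure only.
n = 3
feats = pairwise_features(all_patterns(n))
theta_true = np.array([-1.0, -1.0, -1.0, 0.5, 0.5, 0.5])
weights = np.exp(feats @ theta_true)
p_true = weights / weights.sum()

p_fit = fit_pairwise_maxent(p_true, n)
print(np.abs(p_fit - p_true).max())     # small when pairwise terms suffice
```

For seven cells the same feature construction gives 128 patterns and 28 parameters (7 single-cell plus 21 pairwise), matching the counts in the text. If the data contained genuinely higher-order interactions, the fitted distribution would match the first- and second-order frequencies but differ from the full table.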
How can one quantify the agreement of a model for the frequency of global firing patterns (i.e. a probability model) with data? The conceptual problem is that even if one has a correct probability model, one may not be able to predict particular events. For example, even though one knows the correct probability of all poker hands, one cannot predict the results of any particular deal. The basic strategy that Shlens et al. [2••] used was as follows. For each reduced probability model (e.g. a maximum entropy model based on frequencies of pairwise interactions), they calculated the likelihood that the model would predict the observed set of global firing patterns. They then determined the ratio of the likelihood for a reduced model to the likelihood for the full empirical model (i.e. one based on the experimentally determined frequency of each firing pattern).
This likelihood ratio indicates how difficult it is to distinguish the predictions of the reduced model from those of the full empirical model. For example, Shlens et al. [2••] determined that it would take about a minute of data before the likelihood of the pairwise model and the likelihood of the full empirical model would differ by a factor of two.
The logarithm of the likelihood ratio is closely related to the Kullback-Leibler divergence, which is a principled way to measure how difficult it is to distinguish two probability distributions. Schneidman et al. [3••] used a conceptually similar strategy based on the related Jensen–Shannon divergence.
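The link between the likelihood ratio and the Kullback-Leibler divergence can be sketched numerically. Assuming that successive 10 ms bins are treated as independent draws (a simplification adopted here), the expected log likelihood ratio grows by the divergence with each bin, so the recording time needed to reach a given ratio is the bin width times log(ratio) divided by the divergence. The function names are hypothetical and the numbers are illustrative, not values reported in the papers.

```python
import math

def kl_divergence_bits(p, q):
    """Kullback-Leibler divergence D(p || q) in bits for two pattern tables."""
    total = 0.0
    for pi, qi in zip(p, q):
        if pi > 0:
            if qi <= 0:
                raise ValueError("q must be positive wherever p is positive")
            total += pi * math.log2(pi / qi)
    return total

def bins_to_reach_ratio(divergence_bits, ratio=2.0):
    """Expected number of independent bins before the likelihoods of the
    two models differ by the given factor."""
    if divergence_bits <= 0:
        return math.inf
    return math.log2(ratio) / divergence_bits

# Illustrative two-outcome tables (not retinal data).
d_example = kl_divergence_bits([0.5, 0.5], [0.25, 0.75])

# A divergence of 1/6000 bits per 10 ms bin corresponds to one minute of
# data for a factor-of-two difference in likelihood.
bin_seconds = 0.010
seconds_needed = bin_seconds * bins_to_reach_ratio(1.0 / 6000.0)
print(d_example, seconds_needed)
```

Read in reverse, a statement such as "about a minute of data for a factor of two" corresponds, under these assumptions, to a divergence on the order of 1/6000 bits per 10 ms bin.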
Note that this kind of analysis was possible because of the recording techniques Shlens et al. [2••] used. With a multielectrode array, they were able to record from nearly all the ON and OFF cells of one cell class in a small region of retina (ON and OFF parasol cells in a 4 × 8 degree region of peripheral macaque retina). From this, they drew clusters of more or less adjacent cells from either the ON or OFF populations (clusters of 7 cells drawn from 118 ON cells and 175 OFF cells).
The implications of this analysis are most striking when we consider how they apply to a large network of neurons. In a network of N neurons, there are $2^N$ multineuronal firing patterns whose frequencies must be explained. For N = 20, this is over a million; for N = 50, it is over $10^{15}$. In a pairwise model, N + N(N − 1)/2 parameters (the individual neurons' firing frequencies and their pairwise firing frequencies) suffice to account for the multineuronal frequencies; the parameter count has now been reduced substantially (210 for N = 20, 1275 for N = 50). The further reduction provided by the nearest neighbor model reduces the parameter count to approximately 4N: each of the N neurons has approximately 6 nearest neighbors in a retinal mosaic, so there are 6N/2 = 3N nearest neighbor interactions and N individual firing rates. The number of parameters that need to be measured is then proportional to the number of neurons. That is, if nearest neighbor interactions determine the full correlational structure, the complexity of the model, and the length of time required to measure its parameters, are greatly reduced.
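The parameter counts quoted above follow directly from the formulas in the text and can be checked with a few lines. The function name and the default of six neighbors per cell are taken from, or chosen to mirror, the approximation stated in the paragraph.

```python
def parameter_counts(n, neighbors_per_cell=6):
    """Return (pattern count, pairwise-model parameters, nearest-neighbor
    parameters) for a network of n binary neurons."""
    patterns = 2 ** n                               # all firing patterns
    pairwise = n + n * (n - 1) // 2                 # rates + all pairs
    nearest = n + neighbors_per_cell * n // 2       # rates + neighbor pairs
    return patterns, pairwise, nearest

for n in (7, 20, 50):
    print(n, parameter_counts(n))
```

The first entry grows exponentially with n, the second quadratically, and the third linearly, which is the point of the nearest neighbor result.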
Schneidman et al. [3••] performed a similar analysis and also found that maximum entropy models based on pairwise interactions could account for retinal ganglion cell firing patterns. Their paper, though, uses the analysis for an additional purpose—to make inferences about the neural code. Here we briefly discuss what the inferences are and whether or not they are justified.
The main proposal the authors put forth is that the pairwise interactions present in the firing patterns imply that the retina uses an error-correcting code. Briefly, an error-correcting code is one in which the signals in a system are correlated so that messages can be correctly decoded in the face of noise (see Box 3 for an example).
An error-correcting code is a means for obtaining reliable signaling from unreliable components. To see how an error-correcting code works, we consider a toy sensory system and environment. The environment consists of a stimulus that is present half of the time. The sensory system is built out of neurons that signal the presence or absence of the stimulus by the presence or absence of a spike, but do so incorrectly with probability $P$. If there is only one neuron in the sensory system, then the probability of an incorrect message is $P$. The error probability can be reduced by adding identical neurons that observe the same stimulus and by decoding the message by a majority vote. For example, with three neurons, errors will occur only when all neurons signal incorrectly (probability $P^3$) or when two neurons are in error and one is not (probability $3P^2(1 - P)$). The total chance of an error, $P^3 + 3P^2(1 - P)$, is less than $P$ provided that $P < 1/2$. With sufficiently many neurons, this majority-rule code can achieve any desired degree of accuracy.
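The toy majority-vote code above generalizes to any odd number of neurons: an error occurs whenever more than half of them signal incorrectly. The sketch below computes that probability, assuming (as in the toy system) that neurons err independently with the same probability; the function name is hypothetical.

```python
from math import comb

def majority_error(p, n):
    """Probability that a majority of n independent neurons, each wrong
    with probability p, signal incorrectly (n must be odd)."""
    if n % 2 == 0:
        raise ValueError("use an odd number of neurons to avoid ties")
    k_min = n // 2 + 1                   # smallest number of errors that wins the vote
    return sum(comb(n, k) * p ** k * (1 - p) ** (n - k)
               for k in range(k_min, n + 1))

for n in (1, 3, 5, 11):
    print(n, majority_error(0.1, n))
```

For n = 3 the sum reduces to the two terms given in the text, and for p below 1/2 the error probability falls as neurons are added.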
The possibility of error correction occurring in the retina is reasonable. Ganglion cell firing is noisy, so one could imagine that some correlation in the system might be useful for ensuring reliable information transmission. (The benefits, though, must be weighed against a potential loss of efficiency.) What the authors propose, though, is that there is much more correlation than one would have thought, and from this they conclude that error correction is a dominating coding mechanism.
There are two problems with this proposal. The first concerns the method they used to determine the extent of the correlations, and the second concerns the source of the correlations.
To determine the extent of the correlations, Schneidman et al. [3••] used an information-theoretic measure. They applied the measure to analyses of subsets of N observed neurons within the retinal mosaic. For each N-neuron subset, they compared an estimate of the entropy of the observed pattern of responses, $S_N$, with the entropy of a network of independent neurons, $S_1$. The difference between these two quantities, $I_N = S_1 - S_N$, is a measure of the total amount of correlation. Correlations are considered to dominate network behavior when $I_N$ approaches $S_1$ because at this point, the correlations are so strong as to reduce the entropy of observed firing patterns, $S_N$, to 0.
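The measure described above can be sketched for a small table of pattern probabilities. The code computes the entropy of the joint distribution, the entropy the same cells would have if they fired independently with the same individual rates, and their difference. The function names and the two-cell examples are hypothetical and are not drawn from the data of Schneidman et al.

```python
import math

def entropy_bits(probs):
    """Shannon entropy in bits; zero-probability events contribute nothing."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

def correlation_measure(joint):
    """joint maps each binary pattern (a tuple of 0/1) to its probability.

    Returns (S_1, S_N, I_N): independent-model entropy, observed entropy,
    and their difference.
    """
    n = len(next(iter(joint)))
    s_n = entropy_bits(joint.values())
    s_1 = 0.0
    for i in range(n):
        p_fire = sum(p for pattern, p in joint.items() if pattern[i] == 1)
        s_1 += entropy_bits([p_fire, 1.0 - p_fire])
    return s_1, s_n, s_1 - s_n

# Two hypothetical two-cell networks with identical single-cell rates.
independent = {(0, 0): 0.25, (0, 1): 0.25, (1, 0): 0.25, (1, 1): 0.25}
locked = {(0, 0): 0.5, (1, 1): 0.5}

print(correlation_measure(independent))
print(correlation_measure(locked))
```

In the independent network the difference is zero; in the locked network the second cell adds no new information, so the difference equals one full bit.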
Using the pairwise maximum entropy model, Schneidman et al. [3••] determined these quantities for small N and extrapolated to the whole network. In their extrapolation, $I_N$ approaches $S_1$ for $N \sim 200$, that is, correlations dominate. However, the data that form the basis for the extrapolation stop at N = 15. At this point, $I_N$ is only one-tenth of $S_1$, which means that correlations are far from dominating. Thus, their conclusion that error correction dominates is built on an extrapolation that extends an order of magnitude beyond the limits of the data. What is troubling is that the rationale for the form of the extrapolation they used is unclear [25•]. Many other extrapolations would also have fit the data but would have led to different conclusions.
The second issue is the source of the correlation. The authors use natural scenes as their stimuli. This makes sense in that the goal is to determine if error correction occurs with behaviorally relevant stimuli. The problem is that natural scene stimuli themselves have correlations, and this is not controlled for. As a result, it is not clear whether correlations in the ganglion cell output reflect an error-correcting mechanism in the retina, or merely correlations in the stimulus. As the authors agree (Elad Schneidman, personal communication), this could be addressed experimentally.
In sum, two recent studies show that the problem of characterizing the activity of large populations of neurons, at least in the retina, can be dramatically reduced. In the first study, Shlens et al. [2••] found, using a maximum entropy model, that the activity could be accounted for by pairwise interactions. Moreover, nearest neighbor interactions sufficed. Schneidman et al. [3••] used the same approach and also found that pairwise interactions sufficed. They used their analysis, though, for a different purpose: to assert that the retina uses an error-correcting code. The jury is still out on whether their data support this conclusion.
The success of maximum entropy methods in achieving a striking dimensional reduction opens the door to further applications of this approach, in retina and beyond (Ohiorhenuan et al., abstract III-90, CoSyNe, Salt Lake City, UT, February 2007; Tang et al., abstract II-62, CoSyNe, Salt Lake City, UT, February 2007; Yu et al., abstract I-16, CoSyNe, Salt Lake City, UT, February 2007). Further applications could also consider firing patterns distributed over time [9-16] and how firing patterns might be contingent on the stimulus [4,17,18]. Both kinds of extensions are straightforward, at least in principle.
The authors thank Peter Latham for his very helpful comments. This work is supported partly by National Eye Institute Grants 1RO1 EY12978 to S Nirenberg and 2RO1 EY9314 to J Victor.
Papers of particular interest, published within the period of review, have been highlighted as:
• of special interest
•• of outstanding interest