Recordings

Preparation and recording methods are described elsewhere (Litke et al., 2004; Frechette et al., 2005; Chichilnisky and Kalmar, 2002). Briefly, eyes were obtained from 12 deeply and terminally anesthetized macaque monkeys (*Macaca mulatta*) used by other experimenters, in accordance with institutional guidelines for the care and use of animals. Immediately after enucleation, the anterior portion of the eye and the vitreous were removed in room light, and the eye cup was stored in darkness for at least 20 minutes prior to dissection in bicarbonate-buffered Ames’ solution (Sigma; St. Louis, MO) bubbled with 95% O_{2}, 5% CO_{2} at 32–34°C, pH 7.4. Under infrared illumination, pieces of peripheral retina 3–5 mm in diameter, isolated from the retinal pigment epithelium, were placed flat against a planar array of 512 extracellular microelectrodes covering an area of approximately 1,800 × 900 µm. Data were obtained from 15–120 minute periods of recording. The preparation was perfused with Ames’ solution.

The voltage on each electrode was digitized at 20 kHz and stored for off-line analysis. Details of recording methods and spike sorting are given elsewhere (Litke et al., 2004). Briefly, spikes were identified using a threshold of four times the voltage SD. For each spike, the waveform of the spike and the simultaneous waveforms on 6 adjacent electrodes were extracted. Three to five waveform features were identified using principal components analysis. A mixture of Gaussians model was fit to the distribution of features using expectation maximization (Duda et al., 2001). The number of clusters and the initial conditions for the model were determined automatically using an adapted watershed transformation (Castleman, 1996; Roerdink and Meijster, 2001). All clusters were visually inspected, and when necessary a mixture of Gaussians model was fitted using manually selected initial conditions. Clusters with a large number of refractory period violations (more than 10% estimated contamination based on refractory period violations), or with spike rates below 1 Hz, were excluded from further analysis.
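The clustering step above can be sketched as follows: a minimal, numpy-only version that projects waveforms onto their leading principal components and fits a mixture of Gaussians by EM. The synthetic two-unit waveforms, noise level, and fixed component count are illustrative assumptions; the study instead initialized the number of clusters with a watershed transformation.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic "spike waveforms" from two units with distinct mean shapes.
t = np.linspace(0, 1, 40)
shape_a = -np.exp(-((t - 0.3) ** 2) / 0.005)
shape_b = -0.6 * np.exp(-((t - 0.5) ** 2) / 0.02)
waves = np.vstack([shape_a + 0.05 * rng.standard_normal((200, 40)),
                   shape_b + 0.05 * rng.standard_normal((200, 40))])

# Feature extraction: project waveforms onto the top 3 principal components.
centered = waves - waves.mean(axis=0)
_, _, vt = np.linalg.svd(centered, full_matrices=False)
features = centered @ vt[:3].T                    # (400, 3) feature matrix

# Expectation maximization for a 2-component isotropic mixture of Gaussians.
k = 2
n, d = features.shape
mu = features[[0, -1]].copy()                     # init: one point per cluster
var = np.ones(k)
pi = np.full(k, 1.0 / k)
for _ in range(50):
    # E-step: responsibility of each component for each spike.
    d2 = ((features[:, None, :] - mu[None]) ** 2).sum(-1)          # (n, k)
    logp = np.log(pi) - 0.5 * d * np.log(2 * np.pi * var) - d2 / (2 * var)
    logp -= logp.max(axis=1, keepdims=True)
    resp = np.exp(logp)
    resp /= resp.sum(axis=1, keepdims=True)
    # M-step: update mixing weights, means, and variances.
    nk = resp.sum(axis=0)
    pi = nk / n
    mu = (resp.T @ features) / nk[:, None]
    d2 = ((features[:, None, :] - mu[None]) ** 2).sum(-1)
    var = (resp * d2).sum(axis=0) / (d * nk)

labels = resp.argmax(axis=1)                      # cluster assignment per spike
```

In practice the mixture is fitted in the 3–5 dimensional feature space per electrode neighborhood, and cluster quality is then screened by refractory-period contamination as described above.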

Stimulation and receptive field analysis

An optically reduced stimulus from a gamma-corrected cathode ray tube computer display refreshing at 120 Hz was focused on the photoreceptor outer segments. The mean intensity was adjusted to a low photopic light level by including neutral density filters in the light path. The mean photon absorption rate for the long (middle, short) wavelength-sensitive cones was approximately equal to the rate that would have been caused by a spatially uniform monochromatic light of wavelength 561 (530, 430) nm and intensity 4300 (4200, 2400) photons/µm^{2}/sec incident on the photoreceptors. For the collection of ON-parasol cells shown in , the mean firing rate during exposure to a steady, spatially uniform display at this light level was 11.8 ± 3.6 Hz and 22.4 ± 4.8 Hz for the two preparations, respectively. Across all preparations examined, firing rates varied from roughly 5–20 Hz.

Spatio-temporal receptive fields were measured using a dynamic white noise stimulus in which the intensity of each display phosphor at each pixel location was selected randomly and independently over space and time from a binary distribution. RMS stimulus contrast was 96%; stimulus duration was 30 minutes. The pixel size (60 µm in 10 preparations; 96 µm and 120 µm in two others) was selected to accurately capture the spatial structure of parasol cell receptive fields. For each RGC, the spike-triggered average stimulus was computed; this summarizes how the cell integrates visual inputs over space and time (Marmarelis and Naka, 1972; Chichilnisky, 2001). An elliptical two-dimensional Gaussian function was fitted to the spatial profile; outlines in represent the boundary of these fits at the SD values specified in the figure caption. The average major and minor axes of the 1 SD contour of each parasol cell were ~200 µm.
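The spike-triggered average itself is straightforward to compute. Below is a minimal sketch on simulated data; the stimulus dimensions, the simulated cell's preferred pixel, and its 3-frame latency are illustrative assumptions, not values from the recordings.

```python
import numpy as np

rng = np.random.default_rng(1)
n_frames, n_pix, depth = 50_000, 25, 10      # stimulus frames, pixels, STA depth
stim = rng.choice([-1.0, 1.0], size=(n_frames, n_pix))   # binary white noise

# Simulate a cell driven by pixel 12 with a 3-frame latency (illustrative).
drive = np.zeros(n_frames)
drive[3:] = stim[:-3, 12]
spikes = rng.random(n_frames) < 0.02 * np.exp(drive)
spike_frames = np.flatnonzero(spikes)
spike_frames = spike_frames[spike_frames >= depth]

# STA: average the stimulus in the `depth` frames preceding each spike.
sta = np.zeros((depth, n_pix))
for f in spike_frames:
    sta += stim[f - depth + 1 : f + 1]
sta /= len(spike_frames)
# Row r of `sta` corresponds to a lag of (depth - 1 - r) frames before the spike.
```

The recovered STA peaks at the simulated cell's preferred pixel and latency; fitting the elliptical Gaussian to the spatial profile is then a standard nonlinear least-squares step.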

Fraction of spikes in multi-neuron firing patterns

The following method was used to calculate the fraction of spikes in each cell that were synchronized with those of any of its immediate neighbors in the mosaic, over and above the chance expectation. For a given cell, consider the observed probability *p* with which the cell fired simultaneously (±5 ms) with any of its immediate neighbors. Also consider the probability *q* that this would occur if the neurons fired independently, obtained by multiplying the respective firing probabilities of the cells. The probability that the cell participates in unexpected synchronized firing is thus *p* – *q*. Dividing by the total firing probability of the cell *r* yields the fraction of spikes that were part of an unexpected firing pattern, (*p*–*q*)/*r*. Across 20 cells in each of four preparations, this value was 31±5%, 22±4%, 15±2% and 14±4%, respectively (mean ± SD).
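A minimal sketch of this computation on simulated binned spike trains follows; the firing rates, the three-neighbor setup, and the common-input mechanism generating the excess synchrony are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(2)
n_bins = 200_000                         # e.g. 10 ms bins (±5 ms coincidence)
common = rng.random(n_bins) < 0.01       # shared drive producing excess synchrony
cell = (rng.random(n_bins) < 0.05) | common
neighbors = [(rng.random(n_bins) < 0.05) | common for _ in range(3)]

any_neighbor = np.logical_or.reduce(neighbors)

r = cell.mean()                          # total firing probability of the cell
p = (cell & any_neighbor).mean()         # observed coincidence probability
q = r * any_neighbor.mean()              # expectation under independence

excess_fraction = (p - q) / r            # fraction of spikes in unexpected patterns
```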

Measuring deviations from spatial isotropy

The invariant spatial properties of synchronized firing were quantified by measuring the degree of directional preference in the strength of synchrony. For each cell of interest in the mosaic interior, a polygon was constructed in which the angle and magnitude of each point measure the angle and strength of synchrony with a neighboring neuron. The angle and magnitude of these points were uncorrelated (*R*^{2} = 0.02, *n* = 286), and the major and minor axes of an ellipse fitted to all the polygons simultaneously differed by less than 7%, consistent with largely isotropic patterns of synchronized firing. Note that this analysis of cell pairs does not rule out violations of isotropy and translation invariance in higher-order interactions. The observation that synchronized firing was isotropic was consistent across all preparations examined.
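The correlation test in this paragraph can be sketched as follows; the angles and strengths below are simulated isotropic data, not measurements from the recordings.

```python
import numpy as np

rng = np.random.default_rng(4)
n_points = 286                               # one point per cell-neighbor pair
angles = rng.uniform(0.0, 2.0 * np.pi, n_points)        # direction to neighbor
strengths = 0.2 + 0.05 * rng.standard_normal(n_points)  # synchrony strength

# Isotropy predicts no relationship between direction and strength,
# so the squared correlation between angle and magnitude should be near zero.
r = np.corrcoef(angles, strengths)[0, 1]
r_squared = r ** 2
```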

Identifying adjacent neurons in mosaic

Several statistics, as well as the formulation of the pairwise-adjacent model, required defining adjacent cells in the ON-parasol mosaic. A non-parametric technique was used to define adjacency based on receptive field center locations. Each location in space was associated with the nearest receptive field center. This definition delineates a collection of boundaries between territories associated with neighboring cells, known as a Voronoi tessellation (Voronoi, 1907; Okabe et al., 2000). Pairs of cells with shared boundaries are connected with a line in .

The Voronoi tessellation misidentifies adjacency if a hole exists in the mosaic due to an unrecorded cell (e.g. , bottom left). Likewise, the tessellation misidentifies adjacency between cells along the edge of the mosaic. To deal with the latter problem, a secondary stipulation was added: adjacency requires a shared edge within the convex hull of the collection of receptive field centers.
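The adjacency definition above can be sketched with a standard Voronoi routine. The jittered-lattice receptive-field centers are mock data, and the finite-ridge filter below is only a crude stand-in for the convex-hull stipulation described in the text.

```python
import numpy as np
from scipy.spatial import Voronoi

rng = np.random.default_rng(3)
# Mock receptive-field centers: a jittered lattice standing in for the mosaic.
gx, gy = np.meshgrid(np.arange(6), np.arange(6))
centers = np.c_[gx.ravel(), gy.ravel()] * 200.0 + rng.normal(0.0, 15.0, (36, 2))

vor = Voronoi(centers)
# Each row of vor.ridge_points is a pair of input points whose Voronoi
# territories share a boundary.
adjacent = []
for (i, j), ridge in zip(vor.ridge_points, vor.ridge_vertices):
    # Crude stand-in for the convex-hull stipulation: keep only pairs whose
    # shared boundary is a finite segment (a vertex index of -1 marks a
    # boundary running off the edge of the mosaic).
    if -1 not in ridge:
        adjacent.append((int(i), int(j)))
```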

Maximum entropy

Maximum entropy methods are used in statistical inference to identify an unknown distribution given several constraints which are insufficient to fully specify the answer. The method involves selecting the distribution with the greatest *entropy* consistent with the constraints,

*P̃* = argmax_{*P*} [ *H*[*P*] + Σ_{i} λ_{i} *f*_{i}[*P*] ],

where *H*[*P*] = –Σ_{x} *P*(*x*) log_{2} *P*(*x*) is the entropy of a firing pattern distribution *P*(*x*), the *f*_{i}[*P*] are the constrained statistics of *P*, and {λ_{i}} are Lagrange multipliers enforcing the constraints (Jaynes, 1957a; Jaynes, 1957b; Cover and Thomas, 1991; Amari, 2001). In our application the specified constraints are the firing rates, expressed as probabilities, and the joint firing patterns *P*(*x*_{i}, *x*_{j}) for pairs of adjacent neurons *x*_{i}, *x*_{j}. The Lagrange multipliers enforcing each pairwise-adjacent constraint specify the sensitivity of the maximum entropy solution to slight changes in the strength of synchrony (, red lines). Note that a pairwise-adjacent model in which a spike (or no spike) is labeled +1 (or −1) is identical to a nearest-neighbor Ising model (Schneidman et al., 2006; Tang et al., 2008). The interaction terms of this model () do not directly reflect the strength of synchronized firing between a pair of neurons: for example, interaction terms can propagate correlation to explain observed synchronized firing between non-adjacent cells (Shlens et al., 2006).

A dual form of this problem is to recognize that computing the maximum entropy distribution is equivalent to computing the maximum likelihood estimate of the Lagrange multipliers {λ_{i}} given the data (Csiszar, 1975; Berger et al., 1996). Calculating the maximum likelihood estimate of the parameters is a special case of estimating the parameters of a discrete Markov random field (Jordan, 1998) or, equivalently, a Boltzmann machine (Hinton and Sejnowski, 1986). This likelihood is log-concave everywhere, meaning that a unique global maximum exists and can be found with techniques from convex optimization (Press et al., 1988; Malouf, 2002; Berger et al., 1996; Darroch and Ratcliff, 1972), simulated annealing (Hinton and Sejnowski, 1986; Hertz et al., 1991), or approximations to the full likelihood (Perez, 1998; Hinton, 2002). In the case of large models, estimating model parameters is an area of active research (X et al., 2006).

A drawback of these methods is that most optimization techniques require an explicit calculation of the probability distribution, which in turn requires normalizing across all possible firing patterns. In the case of synchronized firing, the firing pattern distribution across *n* neurons is a distribution over binary random variables *x* = (*x*_{1}, *x*_{2}, …, *x*_{n}) with 2^{n} elements. Hence, determining the normalization constant across the firing pattern distribution for the entire neural population (e.g. 2^{104} ≈ 10^{31} patterns) is not practical. The parameters of the pairwise-adjacent model can instead be estimated by making draws from the unnormalized probability distribution using Markov chain Monte Carlo (MCMC) techniques (Kennel et al., 2005; Gamerman and Lopes, 2006). Traditional MCMC sampling can be slow, so a specialized form of MCMC sampling was employed (Swendsen and Wang, 1987). MCMC samples of the estimated distribution were drawn in order to calculate the likelihood gradient (Perez, 1998). To ensure convergence, an adaptive learning procedure was employed, which is equivalent to a form of simulated annealing (Jordan, 1998).
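The sampling idea can be illustrated on a toy pairwise model. For a small population (here *n* = 4, a chain of adjacent cells with illustrative parameter values) the distribution can still be normalized exactly by enumerating all 2^n patterns, which lets the MCMC draws be checked; for n ~ 100 only the sampler is practical. Plain Gibbs sampling is used below as a simple stand-in for the Swendsen-Wang sampler used in the study.

```python
import itertools
import numpy as np

n = 4
lam = np.array([-2.0, -2.0, -2.0, -2.0])        # per-cell terms (illustrative)
pairs = [(0, 1), (1, 2), (2, 3)]                # adjacent pairs (a chain)
J = {p: 1.0 for p in pairs}                     # pairwise interaction terms

def log_weight(x):
    """Log of the unnormalized probability of binary pattern x."""
    return lam @ x + sum(J[i, j] * x[i] * x[j] for i, j in pairs)

# Exact distribution by brute-force normalization (feasible only for small n).
patterns = np.array(list(itertools.product([0, 1], repeat=n)))
logw = np.array([log_weight(x) for x in patterns])
p_exact = np.exp(logw - logw.max())
p_exact /= p_exact.sum()
exact_rates = p_exact @ patterns                # exact P(x_i = 1)

# Gibbs sampling: resample one cell at a time from its conditional
# distribution, which requires only unnormalized probabilities.
rng = np.random.default_rng(6)
x = np.zeros(n)
samples = np.zeros(n)
n_sweeps, burn_in = 20_000, 1_000
for sweep in range(n_sweeps):
    for i in range(n):
        x[i] = 1.0
        e1 = log_weight(x)
        x[i] = 0.0
        e0 = log_weight(x)
        p1 = 1.0 / (1.0 + np.exp(e0 - e1))      # P(x_i = 1 | rest)
        x[i] = float(rng.random() < p1)
    if sweep >= burn_in:
        samples += x
mcmc_rates = samples / (n_sweeps - burn_in)     # should approach exact_rates
```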

Because estimating the model parameters is complex, validation of the results was performed in two ways. First, the convergence of the procedure was examined by checking its ability to reproduce the desired constraints. In the case of the pairwise-adjacent model, this amounts to matching the firing rates of individual neurons and the degree of synchronized firing between adjacent neurons. The trained model accurately matched the observed firing rates and strength of synchrony (*R*^{2} > 0.99). The remaining error may reflect the convergence criterion selected and statistical variability between the training and testing data sets. Second, the overall ability of the learning procedure to correctly infer the parameters of a truly pairwise-adjacent distribution was tested. Thirty minutes' worth of samples were generated from a pairwise-adjacent model, and a new pairwise-adjacent model was estimated from these simulated data samples. The newly trained model recovered an unbiased estimate of the parameters {λ_{i}} from the simulated data (*R*^{2} > 0.99), indicating that the learning procedure correctly inferred the appropriate model structure.

Likelihood and information theoretic analysis

Models were compared using likelihood analysis. Details of this analysis are given elsewhere (Shlens et al., 2006). Briefly, the likelihood of a data set under a given model is the probability of having observed that data set under the assumption that the model is correct. For each of the statistics examined (), the number of observations was counted, producing a histogram *c* = {*c*_{i}}, where *m* = Σ_{i} *c*_{i} is the total number of measurements. This produces an estimate of the probability of each value of the statistic, *P* = {*p*_{i}}, where

*p*_{i} = *c*_{i} / *m*.

For a given model *Q* = {*q*_{i}}, the probability of having observed the histogram counts *c* if the model *Q* actually generated the observations is the multinomial likelihood (Duda et al., 2001)

*L* = *m*! Π_{i} ( *q*_{i}^{*c*_{i}} / *c*_{i}! ).

The likelihood shrinks multiplicatively with increasing measurements (larger *m*). A quantity that is invariant to the number of measurements is the average likelihood

*L̄* = *L*^{1/*m*}.

The Kullback-Leibler divergence from information theory (*D*_{KL}) bears a simple relationship to the average likelihood:

*L̄* → 2^{–*D*_{KL}(*P*‖*Q*)}

in the limit of infinite measurements (*m* → ∞). Estimating the KL divergence from finite data is nontrivial and subject to severe biases, especially for high-dimensional data. To address this issue, techniques for estimating entropy efficiently were used (Paninski, 2003; Krichevsky and Trofimov, 1981; Nemenman et al., 2002; Orlitsky et al., 2004).
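The limiting relationship between the average likelihood and the KL divergence can be checked numerically; the two four-bin distributions below are arbitrary toy examples.

```python
import numpy as np
from math import lgamma

rng = np.random.default_rng(7)
P = np.array([0.5, 0.3, 0.15, 0.05])            # "true" statistic distribution
Q = np.array([0.4, 0.35, 0.15, 0.10])           # candidate model

m = 1_000_000
c = rng.multinomial(m, P)                       # histogram of observations

# Multinomial log-likelihood of the counts under Q, in log base 2
# (lgamma(k + 1) = log k!, avoiding overflow for large counts).
log2_L = (lgamma(m + 1) - sum(lgamma(ci + 1) for ci in c)) / np.log(2) \
         + (c * np.log2(Q)).sum()

avg_log2_L = log2_L / m                         # log2 of the average likelihood
d_kl = (P * np.log2(P / Q)).sum()               # should match -avg_log2_L
```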

Time scale of analysis

The selection of time scale (or bin size) for defining synchronized firing could influence the results. Ideally, the choice of bin size should match the width of the cross-correlation peak (~10 ms; see Shlens et al., 2006) and encompass a significant fraction of the refractory period, in order to accurately reflect the strength of correlation while avoiding statistical dependencies associated with refractoriness. Selecting too small a bin size could produce a poor estimate of correlation strength due to limited counting statistics. Selecting too large a bin size could mask synchrony by averaging over times outside the window of synchronization.

A twofold change in the bin size did not significantly affect the results. For the preparation of (top panels), using bin sizes of 5 ms and 20 ms, the pairwise-adjacent model accounted for, respectively, (99.2%, 99.4%, 99.5%) and (99.0%, 98.2%, 98.6%) of the departures from independence in the three statistics examined, similar to the results obtained with a 10 ms bin size (see insets in ). The fraction of firing entropy of an individual RGC predictable from its neighbors was also little affected by bin size. For the retina in , using bin sizes of 5 ms and 20 ms, 7.3%±0.9% and 5.1%±1.1% of firing entropy respectively was predictable from the firing of neighboring cells. Similar results were obtained in the presence of visual stimulation.

Failures of the pairwise-adjacent model

The pairwise-adjacent model failed to explain ~1–2% of the deviations from statistical independence (). Several potential sources of the deviations were examined.

First, when the waveforms of spikes in different neurons overlap in time, the recorded (summed) voltage waveform differs from the waveform produced by either cell alone, potentially causing spikes from both cells to be missed in the spike sorting procedure and leading to an underestimate of synchronized firing events. To test for this possibility, the following control procedure was performed. Thirty minutes of data were sampled from a pairwise-adjacent model fitted to recorded data. Artifacts were then introduced that mimicked the hypothetical spike sorting bias. In a first test, roughly 5% of pairwise synchronous events between adjacent neurons were removed. In a second test, 5% of triplet synchronous events between adjacent neurons were removed. A new pairwise-adjacent model was then fitted to and compared with the altered data sets. In both manipulations, no systematic biases were found in the three statistics of interest (), and the model accounted for 99.3%, 99.6%, and 99.6% of the failures of statistical independence. Thus, artifacts associated with the spike sorting procedure are unlikely to entirely account for the observed failures of the model.

Second, to assess whether the failures were attributable to non-stationarity or finite counting statistics, the accuracy of the model was compared to that of a second firing pattern distribution (*P*_{emp}) which consists of the firing probabilities estimated directly from an independent period of recording from the same neurons. The accuracy of *P*_{emp} as a model for *P*_{obs} reveals the reproducibility of the experiment. *P*_{emp} always accounted for at least 99.5% of the deviations from statistical independence. Thus, the failures of the model cannot be explained entirely by counting statistics or non-stationarity of the recordings.

Third, the possibility of systematic deviations between the model and the data was examined in more detail (, bottom rows). In 11 of the 12 preparations examined, the model did not systematically over-predict or under-predict any of the three statistics examined. In one preparation (), the model over-predicted the frequency of large firing patterns but not small ones, and the over-prediction grew systematically with the size of the firing pattern. Thus, although the results were not entirely consistent across data sets, systematic deviations could not explain the failures of the model in all data sets.

Finally, a potential source of the discrepancy between data and model is an implicit assumption made about the temporal structure of the data: that sequential time samples from the recording are independent observations. This assumption does not take into account interactions over time in the firing of cells. A notable temporal interaction is action potential refractoriness, which reduces the probability that a neuron will generate a spike for a period of several milliseconds after the occurrence of a spike. For temporal interactions to have a significant effect on the results, the distribution of firing statistics must depend substantially on the pattern of activity in preceding time samples. To test this, time samples were divided into two groups: those for which the preceding time sample contained more than the median number of neurons firing, and those for which it contained fewer. Each of the three statistics was then examined separately for the two groups of firing patterns. The effect of the preceding time sample was large. For example, in the preparation in , the Kullback-Leibler divergence between the full distribution of the number of neurons firing and the distributions in the groups with high and low preceding firing was 1.74e-4 and 1.30e-4 bits/neuron, respectively. These values represent 2.4% and 3.2% of the magnitude of the departures from statistical independence in the data, respectively. Similar departures were observed for the other statistics examined. Thus, the effect of temporal interactions on the measured firing pattern statistics was potentially large enough to account for the observed failures of the model.
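The median-split test above can be sketched on simulated data. The population below has a refractoriness-like negative temporal interaction built in; the cell count, rates, and suppression factor are illustrative, not fitted to the recordings.

```python
import numpy as np

rng = np.random.default_rng(5)
n_cells, n_bins, rate = 20, 50_000, 0.1
x = np.zeros((n_bins, n_cells), dtype=bool)
prev = np.zeros(n_cells, dtype=bool)
for t in range(n_bins):
    # Refractoriness-like interaction: a cell that fired in the previous
    # time sample is less likely to fire in this one.
    p = np.where(prev, 0.3 * rate, rate)
    prev = rng.random(n_cells) < p
    x[t] = prev

counts = x.sum(axis=1)                  # number of cells firing in each bin
prev_counts = np.r_[0, counts[:-1]]
median = np.median(prev_counts)

def count_dist(c):
    h = np.bincount(c, minlength=n_cells + 1).astype(float)
    return h / h.sum()

p_full = count_dist(counts)
groups = {"high": counts[prev_counts > median],
          "low": counts[prev_counts <= median]}
# D_KL(group distribution || full distribution), in bits per time sample:
# nonzero values reveal dependence on the preceding time sample.
kl = {}
for name, c in groups.items():
    q = count_dist(c)
    mask = q > 0
    kl[name] = float((q[mask] * np.log2(q[mask] / p_full[mask])).sum())
```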