Investigations of the neural code have historically focused upon how single neurons respond to stimuli 
. In many systems, this approach has led to an observation that many neurons have “preferred” stimuli, corresponding to an increased discharge rate. Indeed the concept of the classical (and other types of) receptive field has guided investigations of V1 since Hubel and Wiesel introduced it 
. However most V1 neurons are strongly variable trial to trial, even when identical stimuli are presented 
. In addition, the majority of V1 studies have employed simplified “laboratory” type stimuli such as gratings or moving bars (see 
for a review). These issues raise the question of how dominant CRFs are when more “naturalistic” stimuli are used, or if the surround and other factors have increased importance. In this paper we undertook to determine exactly how much of V1 neurons’ spiking (quantified by the pseudo R2
) is due to the CRF versus the surround when naturalistic stimuli are used. We also quantified the roles played by the neuron’s own spike history dependent biophysics, and by the average population activity (LFP).
We found that not only did all of these factors modulate the spiking probability of V1 neurons, but that taken collectively the surround, spike history and LFP were of nearly equal importance to the CRF (60 versus 40%). We note that for each natural scene movie, we only employed two surround modulations, aperture masked and time reversed. Had more modulations been used, the influence of the surround might have been stronger compared to the CRF, although this would depend upon the modulations’ exact nature (naturalistic versus white noise for example). Regardless, the entire stimulus (CRF and surround together) explained a relatively small percentage of the spike statistics, R2
3% (population mean) at 1 ms resolution, and R2
5% at 20 ms resolution. For natural scenes, the CRF, and indeed the stimulus as a whole, is only the tip of the iceberg.
Our study showed strong surround modulation of the V1 neuronal response to natural stimuli. This modulation was both facilitative and suppressive, often in the same neuron. Indeed, the mean firing rates of many neurons varied little. Of particular interest is that modulation was observed not only between FF and AM movies but also between FF and TR movies. Thus the dynamic modulation we observe is evidence of a complex non-linear interaction between CRF and surround, not merely a function of the surround’s presence or absence. Several prior studies have varied the size of a natural scenes stimulus 
. However to our knowledge ours is the first that has changed the correlational (between center and surround) structure of naturalistic stimuli and demonstrated a similar dynamic modulation. We note that this has been done for artificial stimuli, see for example 
It should be noted that determining the exact boundary of the receptive field is difficult, and can depend the exact method used to map it. Barlow used small moving bars and edges to determine the spatial extent of the excitatory region, or “minimum response field” (MRF) 
. Reverse correlation methods using either bars 
white noise 
, randomly flashed squares 
or other artificial stimuli are also commonly used. A third technique is to increase the size of a grating patch and denote the RF as the patch size for which the response no longer increases 
. This third technique tends to give larger estimates than the MRF or reverse correlation techniques, and has been noted to depend upon grating contrast 
. Other researchers have modeled both the excitatory center and inhibitory surround using Gaussian based models 
. Good discussions of these issues can be found in 
. In our case, we used a reverse correlation type procedure that employed a long moving bar stimulus and the computation of an average multi-unit activity response. Since we used MUA it is possible that for some neurons our apertures contained some of the proximal inhibitory surround. However it should also be noted that some studies have suggested the size of the distal surround to be up to five times that of the CRF 
Surround suppression of the V1 neuronal response has long been noted by studies using grating type stimuli in anesthetized cats and monkeys 
and also during the free viewing of natural scenes by awake monkeys 
. Others have found surround driven changes in grating orientation tuning 
. Occasionally surround facilitation has been noted, but deemed weak 
. A recent grating based study in anesthetized cats has placed the number of V1 neurons exhibiting surround facilitation at 6% 
. This result is at odds with our study in which many neurons displayed both facilitation and suppression. The difference may lie in the use of natural scenes versus gratings. Indeed, it has been noted by others that contrast levels can dictate whether the surround is suppressive or facilitative 
Another possibility is that surround suppression versus facilitation is a function of distinct neuron types. Haider et. al. 
performed intracellular recordings in anesthetized cats while varying the size of a natural scenes stimulus. They found that excitatory, regular spiking pyramids tended to exhibit surround suppression, while fast spiking interneurons exhibited enhancement. This study also found that regular spiking neurons tended to spike more reliably when the surround was included, and hypothesized that increased activity of inhibitory cells enforced a sparser code in the pyramids. Several earlier studies have speculated on the role of a natural scenes surround in enforcing sparsity, demonstrating not only sparser coding 
, but also improved information transmission by neurons 
. These studies, and our own strongly indicate that interactions between CRF and surround are fundamental for sensory processing in V1.
In spite of this, forward modeling studies of how visual stimuli are transformed into spiking have tended to focus on estimations of neuron’s spatio-temporal receptive fields (STRFs). Sometimes these are coupled with a non-linearity (LN Model) 
and/or a Poisson spike generator (LNP model) 
. This approach has achieved some success in predicting responses to grating orientation and tuning, although contrast induced non-linearities have often been noted, (see 
for a review). STRF type models have also been used to capture the response of V1 simple cells to natural scenes 
More complex models, such as spike triggered covariance (STC) 
have also been used to describe the response to natural images, particularly the response of complex cells. Forward model quality has generally been evaluated by comparing predictions to the PSTHs of repeated trials in a validation set. When corrected for finite data sizes, this is the “percentage of explainable variance” of Gallant and colleagues. For natural scenes, the percentage of variance explained in V1 has tended to be no more than 40% 
These results should be contrasted with studies in the LGN 
which have achieved much better (~80%) predictability). It should be noted that the explained variance is a very different measure than the pseudo R2
(see below). Possible reasons for poor performance in V1 are temporal non-linearities, center surround interactions, and complex network activity, none of which are captured by the STRF.
Our goal was different from that of forward modelers. We wanted to quantify, on an absolute scale, how much the spiking was driven by the CRF versus the surround. For this reason we employed the pseudo R2
to quantify the single trial statistics, rather than the explained variance. We also did not attempt a forward model, but instead used a non-parametric, basis spline based, model for the stimulus. This has the advantage that the explained variance is extremely high (96% for training and 92% for test data, see Figure S6
) and one does not have to worry that the stimulus model is “wrong” when trying to quantify how important different effects are on the spikes. Our results show that the CRF is not sufficient, even for describing the trial averaged response, suggesting that forward models employing a STRF type filter, localized within the CRF, will always be problematic for natural scenes stimuli and that center surround interactions must be included. Further, the trial averaged response is an exceedingly poor descriptor of the single trial statistics, suggesting that if vision is to be understood, the stimulus must be treated as a small perturbation of an individual neuron’s dynamics.
Spikes are binary variables with a ms precise, time scale dictated by their width. It has been shown that neurons can respond with ms precision to sharp changes in membrane potential 
and are capable of learning fine timing encoding representations 
. It is therefore important to use a model that operates at fine temporal resolution and also respects the binary character of the data. Our spline-based stimulus model was nested in a GLM which simultaneously also models the influence of the spike history and LFP. The dynamics of the stimulus, spike history and LFP all have different time scales. The GLM provides a multiscale model of how they influence the neuron at the time scale appropriate for describing spikes. Still, an argument that could be put forward is that since the stimulus changes at a slower (here 20 ms frame rate) speed, any analysis of its importance should be based upon this time scale. At a 20 ms time scale, the data is described by spike counts, Poisson variables from which a pseudo R2
can also be calculated. When we did this, we found that the natural scene spike counts were slightly better fit than the individual spikes, but not dramatically so. Thus even at stimulus’s own time scale, neuronal activity is not well described. We note that the situation may be different for experiments that generate more coherent neuronal activity, such as those that use grating type stimuli or anesthetized protocols. Indeed we found we could fit the single trial spiking statistics of both grating and moving bar stimuli with much higher accuracy than natural scenes movies ().
Most likely, the dominant factor driving V1 neurons is the network, a view supported by the fact that recurrent, lateral and top down connections dominate over feed forward 
and the fact that ongoing network states are known to strongly influence spiking 
. In the absence of detailed information about the network, we used the LFP as a surrogate. It is important to recognize that the LFP is a population averaged measure of local network activity, whose exact meaning is strongly debated. We found that spikes tended to be localized on sharp LFP transients in the gamma range. This indicates that during natural scenes viewing the ongoing LFP carries information important for predicting spike timing despite there being no gamma peak in the LFP power spectrum. (Figure S8
) It was the use of the PTA, based upon the nested GLM, that allowed us to uncover this feature.
Many studies use the spike triggered average (STA) or spike triggered spectrogram to make inferences about how spikes depend upon the LFP 
. These measures average the LFP (or its power spectra) as a function of when spikes occur. In contrast, the PTA leverages the GLM spike probability model to average the LFP when spikes are most probable
. These are not the same, because spikes do not always occur when they are maximally probable. Directly comparing the PTA and the STA (see Methods
below) shows that the STA largely reflects a deflection after
the spike. In contrast the PTA reveals the entire feature (sharp transient) that increases the spike probability. Another difference between the approaches is that our nested GLM takes into account the effect of stimuli and previous spiking history in addition to the LFP. Thus it can dissociate between the case when spiking and LFP are being simultaneously driven by the stimulus, and are thus correlated, and when spikes are correlated with the LFP independent of the stimulus.
The observation that spike timing is coupled to LFP oscillations is not new. However the majority of studies comparing LFPs to spikes have focused on either spectral power, or phase relationships between low (<10 Hz) frequency LFPs and multiunit activity (MUA) 
. In contrast we found phase relationships between the gamma band and individual spikes. At these higher frequencies, it has been shown that increased gamma power correlates with MUA 
. Further, intracellular studies have shown that inhibitory neurons, thought to be involved in gamma, tend to fire in the gamma trough 
. Still, studies of phase relationships have tended to be confined to grating stimuli, anesthetized animals, or both 
. Based upon these and similar studies, it has been hypothesized that gamma implements a temporal coding scheme (see 
for a review). However gamma power has been shown to be a function of grating contrast 
and it is also known that the LFP power spectrum is sharply modulated by different (grating versus natural scene) stimuli 
(and see Figure S8
). Our results suggest that even when gamma is incoherent, as during our natural scenes stimulus, it may still induce timing codes and play a computational role.
LFPs provide one measure of network activity, and indeed the pseudo R2 of the LFP portion of the GLM was comparable in magnitude to that of the stimulus (). However, the overall fit, even including the LFP was poor (R2<5%). This suggests that under natural scenes conditions, the dynamics of the V1 network are highly complex, and neither the stimulus, nor the LFP, are the dominant drivers of V1 neurons. Instead, ongoing and mostly incoherent network activity driven by input from other cortical areas or even processes intrinsic to V1 predominates at the single neuron level. At some point in the visual pathway, information must be combined into a collective representation. Our results suggest that this is already happening in V1 and that vision must be considered as a tightly integrated, and complex, phenomenon of the network, not the sum of the individual neurons receptive fields.