

Front Comput Neurosci. 2010; 4: 145.

Published online 2010 November 23. doi: 10.3389/fncom.2010.00145

PMCID: PMC2996170

Edited by: Israel Nelken, Hebrew University, Israel

Reviewed by: Stefano Panzeri, Italian Institute of Technology, Italy; Jakob H. Macke, University College London, UK

*Correspondence: Hugo Gabriel Eyherabide, Centro Atómico Bariloche, San Carlos de Bariloche, 8400 Río Negro, Argentina. e-mail: eyherabh@ib.cnea.gov.ar

Received 2010 May 29; Accepted 2010 October 22.

Copyright © 2010 Eyherabide and Samengo.

This is an open-access article subject to an exclusive license agreement between the authors and the Frontiers Research Foundation, which permits unrestricted use, distribution, and reproduction in any medium, provided the original authors and source are credited.


Sensory stimuli are usually composed of different features (the *what*) appearing at irregular times (the *when*). Neural responses often use spike patterns to represent sensory information. The *what* is hypothesized to be encoded in the identity of the elicited patterns (the pattern categories), and the *when*, in the time positions of patterns (the pattern timing). However, this standard view is oversimplified. In the real world, the *what* and the *when* might not be separable concepts, for instance, if they are correlated in the stimulus. In addition, neuronal dynamics can condition the pattern timing to be correlated with the pattern categories. Hence, timing and categories of patterns may not constitute independent channels of information. In this paper, we assess the role of spike patterns in the neural code, irrespective of the nature of the patterns. We first define information-theoretical quantities that allow us to quantify the information encoded by different aspects of the neural response. We also introduce the notion of synergy/redundancy between time positions and categories of patterns. We subsequently establish the relation between the *what* and the *when* in the stimulus with the timing and the categories of patterns. To that aim, we quantify the mutual information between different aspects of the stimulus and different aspects of the response. This formal framework allows us to determine the precise conditions under which the standard view holds, as well as the departures from this simple case. Finally, we study the capability of different response aspects to represent the *what* and the *when* in the neural response.

Sensory neurons represent external stimuli. In realistic conditions, different stimulus features (for example, the presence of a predator or a prey) appear at irregular times. Therefore, an efficient sensory system should not only represent the identity of each perceived stimulus, but also, its timing. Colloquially, qualitative differences between stimulus features have been called the *what* in the stimulus, whereas the temporal locations of the features constitute the *when*. Spike trains can encode both the *what* and the *when*, for example, as a sequence of spike patterns. This idea constitutes a standard view (Theunissen and Miller, 1995; Borst and Theunissen, 1999; Krahe and Gabbiani, 2004), where the timing of patterns indicates *when* stimulus features occur, while the pattern identities tag *what* stimulus features happened (Martinez-Conde et al., 2002; Alitto et al., 2005; Oswald et al., 2007; Eyherabide et al., 2008). The information provided by the distinction between different spike patterns is here called *category information*. In the same manner, the information transmitted by the timing of spike patterns is here called *time information*. According to the standard view, the category and the time information represent the knowledge of the *what* and the *when* in the stimulus, respectively. In this work, we address the conditions under which these assumptions hold, as well as departures from the standard view.

Many studies have shown the ubiquitous presence of patterns in the neural response. The patterns can be, for instance, high-frequency burst-like discharges of varying length and latency. Examples have been found in primary auditory cortex (Nelken et al., 2005), the salamander retina (Gollisch and Meister, 2008), the mammalian early visual system (DeBusk et al., 1997; Martinez-Conde et al., 2002; Gaudry and Reinagel, 2008), and grasshopper auditory receptors (Eyherabide et al., 2009; Sabourin and Pollack, 2009). In other cases, the patterns are spike doublets of different inter-spike interval (ISI) duration. Reich et al. (2000) presented an example of this type in primate V1; and Oswald et al. (2007) found a similar code in the electrosensory lobe of the weakly electric fish. In yet other cases, patterns are more abstract spatiotemporal combinations of spikes and silences defined in single neurons (Fellous et al., 2004) and neural populations (Nádasdy, 2000; Gütig and Sompolinsky, 2006).

If different spike patterns represent different stimulus features, which aspects of the pattern are relevant to the distinction between the different features? To answer this question, previous studies have classified the response patterns into categories based on different response aspects. The relevance of each candidate aspect was assessed using what we here define as the category information. For example, in the auditory cortex, Furukawa and Middlebrooks (2002) assessed how informative patterns were when categorized in three different ways, using the first spike latency, the total number of spikes, or the variability in the spike timing. In an even more ambitious study, Gawne et al. (1996) not only compared the information separately transmitted by response latency and spike count, but also related these two response properties to two different stimulus features: contrast and orientation, respectively. However, these works have not addressed how the stimulus timing is represented by the response patterns.

The role of patterns in signaling the occurrence of stimulus features can only be addressed in experiments where the stimulus features appear at irregular times. In this context, previous approaches have estimated the time information (Gaudry and Reinagel, 2008; Eyherabide and Samengo, 2010), or have employed other statistical measures, such as reverse correlation (Martinez-Conde et al., 2000; Eyherabide et al., 2008). The time information was calculated as the information encoded by the pattern onsets alone, without distinguishing between different types of patterns.

In this paper, we analyze the role of timing and categories of patterns in the neural code. To this aim, we build different representations of the neural response, each preserving one of these two aspects at a time. This allows us to quantify the time and the category information separately. We determine the precise meaning of these quantities and study their variation across different representations of the neural response. Unlike previous works (Gaudry and Reinagel, 2008; Eyherabide et al., 2009; Foffani et al., 2009), we quantify the information preserved and lost when the neural response is read out in such a way that only the categories (or only the timing) of patterns are preserved. As a result, the relevance of each aspect of the neural response is unambiguously determined.

In principle, the timing and the categories of spike patterns may be correlated. These interactions may be due to properties of the encoding neuron (such as latency codes Furukawa and Middlebrooks, 2002; Gollisch and Meister, 2008), properties of the decoding neuron (when reading a pattern-based code Lisman, 1997; Reinagel et al., 1999), the convention used to assign a time reference to the patterns (Nelken et al., 2005; Eyherabide et al., 2008), or the convention used to identify the patterns from the neural response (Fellous et al., 2004; Alitto et al., 2005; Gaudry and Reinagel, 2008). A statistical dependence between timing and categories of patterns may, for example, introduce redundancy between the time and category information. Thus, the same information may be contained in different aspects of the response (categorical or temporal aspects). In addition, the statistical dependence might also induce synergy, in which case extracting all the information about the *what* and the *when* requires the simultaneous read-out of both aspects. The presence of synergy and redundancy between the time and category information may affect the way each of them represents the *what* and the *when* in the stimulus.

In the present study, we provide a formal framework to gain insight into the interaction between the timing and the categories of patterns for different neural codes. We formally define the *what* and the *when* as representations of the stimulus preserving only the identities and timing of stimulus features, respectively. We then establish the conditions under which the pattern categories encode the *what* in the stimulus, and the timings the *when*. We also study departures from this standard interpretation, in particular, when the time position of patterns depends on their internal structure. We show the impact of this dependence both on the link to the *what* and the *when*, and on the relative relevance of the timing and categories of patterns. Our study is therefore intended to motivate more systematic explorations of the neural code in sensory systems.

A *representation* is a description of the neural response. Formally, it is obtained by transforming the recorded neural activity through a deterministic mapping. Throughout this paper, the expressions “deterministic mapping” and “function” are used as synonyms. We only consider functions that transform the unprocessed neural response **U** into sequences of events *e*_{i}=(*t*_{i}, *c*_{i}), characterized by their time positions (*t*_{i}) and categories (*c*_{i}). An event is a definite response stretch. Based on their internal structure, events are classified into different categories, as explained later in this section. Individual spikes may be regarded as the simplest events. In this case, the sequence of events is called the *spike representation* (see Figure 1A), comprising events belonging to a single category: the category “spikes.”

**Representations of the neural response**. **(A)** In the spike representation, only the timing of action potentials is described, discarding the fine structure of the voltage traces. In the pattern representation, only the timing and categories of spike patterns **...**

From the spike representation, we can define more complex events, hereafter called *patterns* (see bold symbols in the spike representation in Figure 1A). Patterns may be defined in terms of spikes, bursts or ISIs (Alitto et al., 2005; Luna et al., 2005; Oswald et al., 2007; Eyherabide et al., 2008). They may involve one or several neurons. Examples of population patterns are coincident firing, precise firing events and sequences, or distributed patterns (Hopfield, 1995; Abeles and Gat, 2001; Reinagel and Reid, 2002; Gütig and Sompolinsky, 2006). The sequence of patterns obtained by transforming the spike representation is called the *pattern representation*. Analogously, the sequences of patterns characterized only by either their time positions or their categories constitute the *time representation* and *category representation*, respectively. Details on how to build these sequences are explained below. For simplicity, these sequences are represented in Figure 1 as sequences of symbols *n*, indicating specific events (*n*>0) and silences (*n*=0).

Formally, to obtain the spike representation (**R**), the unprocessed neural response (**U**) is transformed into a sequence of spikes (1) and silences (0) (Figure 1A). The time bin is taken small enough to include at most one spike. Differences in shape of action potentials are ignored, while their time positions are preserved, with temporal precision limited by the bin size. As a result, several sequences of action potentials may be represented by the same spike sequence (see Figure 1B).

In the pattern representation (**B**), the spike sequence is transformed into a sequence of silences (*n*=0) and spike patterns (*n*=*b*>0), distinguished solely by their category *b*. For example, in Figure 1, patterns are defined as response stretches containing consecutive spikes separated by at most one silence. The time position of each pattern is defined as the time of the first spike in the pattern stretch, whereas patterns with the same number of spikes are grouped into the same pattern category. Only information about pattern categories and time positions remains (compare the bold symbols in the spike and the pattern representations in Figure 1A). By ignoring differences among patterns within categories, several spike sequences can be mapped into the same pattern sequence, as shown in Figure 1C.

The time position of patterns is measured with respect to a common origin, in general, the beginning of the experiment. It can be defined, for example, as the first (or any other) spike of the pattern or as the mean response time (Lisman, 1997; Nelken et al., 2005; Eyherabide et al., 2009). Patterns are classified into categories according to different aspects describing their internal structure, such as the latency, the number of spikes or the spike-time dispersion (Theunissen and Miller, 1995; Gawne et al., 1996; Furukawa and Middlebrooks, 2002). Notice that latencies are usually defined with respect to the stimulus onset, which is not a response property (Chase and Young, 2007; Gollisch and Meister, 2008). Thus, latencies and timing of spike patterns are different concepts, and the latency cannot be read out from the neural response alone. However, latencies have also been defined with respect to the local field potential (Montemurro et al., 2008) or population activity (Chase and Young, 2007). These definitions can be regarded as internal aspects of spatiotemporal spike patterns (Theunissen and Miller, 1995; Nádasdy, 2000).

Categories of patterns can be built by discretizing the range of one or several internal aspects. For example, Reich et al. (2000) defined patterns as individual ISIs, and categorized them in terms of their duration. Three categories were considered, depending on whether the ISI was short, medium or large. In other cases, patterns may be sequences of spikes separated by less than a certain time interval. Categories of patterns can then be defined, depending on the number of spikes in each pattern (Reinagel and Reid, 2000; Martinez-Conde et al., 2002; Eyherabide and Samengo, 2010), as shown in Figure 1, or depending on the length of the first ISI (Oswald et al., 2007). The theory developed in this paper is valid irrespective of the way in which one chooses to define the pattern time positions and the pattern categories.

From the pattern sequence, we obtain the time representation (**T**) by only keeping the time positions of patterns. As a result, the neural response is transformed into a sequence of silences (0) and events (1), indicating the occurrence of a pattern in the corresponding time bin and disregarding its category. The temporal precision of the pattern representation is preserved in the time representation. However, by ignoring differences between categories, different pattern sequences can be mapped into the same time representation, as illustrated in Figure 1D.

The category representation (**C**) is complementary to the time representation. It is obtained from the pattern sequence, by only keeping information about the categories of patterns while ignoring their time positions. The neural response is transformed into a sequence of integer symbols *n*>0, representing the sequence of pattern categories in the response. The exact time position of patterns is lost: only their order remains. Therefore, several pattern sequences may be mapped onto the same category sequence, as indicated in Figure 1E.
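The chain of transformations described above can be sketched in a few lines of Python, using the pattern convention of Figure 1: a pattern is a run of spikes separated by at most one silent bin, timed by its first spike and categorized by its spike count. The function names are ours, introduced only for illustration.

```python
def pattern_representation(spikes):
    """h_{R->B}: parse a binary spike sequence into patterns.

    A pattern is a run of spikes separated by at most one silence
    (the convention of Figure 1). Returns (onset, category) pairs,
    where the onset is the bin of the first spike and the category
    is the number of spikes in the pattern.
    """
    patterns = []
    i, n = 0, len(spikes)
    while i < n:
        if spikes[i] == 1:
            onset, count, j = i, 1, i + 1
            # extend the pattern while at most one silent bin separates spikes
            while j < n and (spikes[j] == 1 or (j + 1 < n and spikes[j + 1] == 1)):
                if spikes[j] == 1:
                    count += 1
                j += 1
            patterns.append((onset, count))
            i = j
        else:
            i += 1
    return patterns


def time_representation(patterns, length):
    """h_{B->T}: keep only the pattern onsets (a 0/1 sequence)."""
    t = [0] * length
    for onset, _ in patterns:
        t[onset] = 1
    return t


def category_representation(patterns):
    """h_{B->C}: keep only the ordered sequence of pattern categories."""
    return [c for _, c in patterns]
```

For instance, the spike sequence 1011001 is parsed into a 3-spike pattern starting at bin 0 and an isolated spike at bin 6; its time representation is 1000001 and its category representation is the sequence (3, 1).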

The spike (**R**), pattern (**B**), time (**T**), and category (**C**) representations are derived through functions that depend only on the previous representation, as denoted by the arrows in Figure 1A, and formally expressed by the following equations:

Neural response: $$\mathbf{\text{U}}\quad (\text{experiment})$$ (1a)

Spike representation: $$\mathbf{\text{R}}={h}_{\text{U}\to \text{R}}(\mathbf{\text{U}})$$ (1b)

Pattern representation: $$\mathbf{\text{B}}={h}_{\text{R}\to \text{B}}(\mathbf{\text{R}})$$ (1c)

Time representation: $$\mathbf{\text{T}}={h}_{\text{B}\to \text{T}}(\mathbf{\text{B}})$$ (1d)

Category representation: $$\mathbf{\text{C}}={h}_{\text{B}\to \text{C}}(\mathbf{\text{B}});$$ (1e)

where *h*_{X→Y} represents the function *h* that is applied to the representation **X** to obtain the representation **Y**. These transformations progressively reduce both the variability in the neural response and the number of possible responses:

$$H(\mathbf{\text{U}})\ge H(\mathbf{\text{R}})\ge H(\mathbf{\text{B}})\ge \begin{cases}H(\mathbf{\text{T}})\\ H(\mathbf{\text{C}})\end{cases};$$

(2a)

$$\left|\mathbf{\text{U}}\right|\ge \left|\mathbf{\text{R}}\right|\ge \left|\mathbf{\text{B}}\right|\ge \begin{cases}\left|\mathbf{\text{T}}\right|\\ \left|\mathbf{\text{C}}\right|\end{cases};$$

(2b)

where *H*(**X**) means the entropy *H* of the set **X** (Cover and Thomas, 1991), and |**X**| indicates its cardinality, i.e., the number of elements of the set **X**.

The *mutual information* *I*(**X**; **S**) between two random variables **X** and **S** is defined as the reduction in the uncertainty of one of the random variables due to the knowledge of the other. It is formally expressed as a difference between two entropies

$$I(\mathbf{\text{X}};\mathbf{\text{S}})=H(\mathbf{\text{X}})-H(\mathbf{\text{X}}|\mathbf{\text{S}});$$

(3)

where *H*(**X**) is the *total entropy* of **X** and *H*(**X**|**S**) represents the *conditional* or *noise entropy* of **X** provided that **S** is known (Cover and Thomas, 1991).
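For discrete stimuli and responses, Eq. 3 can be evaluated directly from the joint distribution *P*(**X**, **S**). A minimal sketch (the function names are ours):

```python
import numpy as np

def entropy(p):
    """Shannon entropy in bits of a probability vector."""
    p = np.asarray(p, dtype=float)
    p = p[p > 0]
    return -np.sum(p * np.log2(p))

def mutual_information(joint):
    """I(X;S) = H(X) - H(X|S), from joint[x, s] = P(X = x, S = s).

    Rows index responses, columns index stimuli.
    """
    joint = np.asarray(joint, dtype=float)
    p_x = joint.sum(axis=1)          # marginal P(x)
    p_s = joint.sum(axis=0)          # marginal P(s)
    h_x = entropy(p_x)               # total entropy H(X)
    # noise entropy H(X|S) = sum_s P(s) H(X | S = s)
    h_x_given_s = sum(
        p_s[s] * entropy(joint[:, s] / p_s[s])
        for s in range(joint.shape[1]) if p_s[s] > 0
    )
    return h_x - h_x_given_s

# Example: a noiseless binary channel carries exactly 1 bit.
joint = np.array([[0.5, 0.0],
                  [0.0, 0.5]])
print(mutual_information(joint))   # → 1.0
```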

We estimate the mutual information between the stimulus **S** and a representation **X** of the neural response using the so-called *Direct Method*, introduced by Strong et al. (1998). The unprocessed neural response **U** is divided into time intervals **U**_{τ} of length τ. Each response stretch **U**_{τ} is then transformed into the discrete-time representation **X**_{τ}=*h*_{U→X}(**U**_{τ}), also called *words*. As a result

$$I(\mathbf{\text{S}};{\mathbf{\text{U}}}_{\tau})\ge I(\mathbf{\text{S}};{\mathbf{\text{X}}}_{\tau}).$$

(4)

This inequality is valid for every time interval of length τ (Cover and Thomas, 1991) and is not limited to the asymptotic regime for long time intervals, as in previous calculations (Gaudry and Reinagel, 2008; Eyherabide et al., 2009). The mutual information calculated with words of length τ only quantifies properly the contribution of spike patterns that are shorter than τ. In order to include the correlations between these patterns, even longer words are needed. Therefore, in this study, the maximum window length ranged between 3 and 4 times the maximum pattern duration.

The total entropy (*H*(**X**_{τ})) and noise entropy (*H*(**X**_{τ}|**S**)) are estimated using the distributions of words **X**_{τ} unconditional (*P*(**X**_{τ})) and conditional (*P*(**X**_{τ}|**S**)) on the stimulus **S**, respectively. The mutual information *I*(**S**; **X**_{τ}) is computed by subtracting *H*(**X**_{τ}|**S**) from *H*(**X**_{τ}) (Eq. 3). This calculation is repeated for increasing word lengths, and the *mutual information rate* *I*(**S**; **X**) between the stimulus **S** and a representation **X** of the neural response is estimated as

$$I(\mathbf{\text{S}};\mathbf{\text{X}})=\underset{\tau \to \infty}{\mathrm{lim}}\frac{I(\mathbf{\text{S}};{\mathbf{\text{X}}}_{\tau})}{\tau}.$$

(5)

This quantity represents the mutual information per unit time when the stimulus and the response are read out with very long words. In this work we always calculate mutual information *rates* unless it is otherwise indicated. However, for compactness, we sometimes refer to this quantity simply as “information.”
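The word-based estimate of Eqs. 3 and 5 can be sketched as follows. This is a simplified illustration (names are ours): it assumes discrete, equal-length stimulus and response sequences, uses non-overlapping words, and conditions the noise entropy on the stimulus word rather than on repeated stimulus presentations as in the full Direct Method.

```python
import numpy as np
from collections import Counter

def word_information(stimulus, response, tau):
    """Estimate I(S; X_tau) in bits for words of tau bins."""
    n_words = len(response) // tau
    words = [tuple(response[i * tau:(i + 1) * tau]) for i in range(n_words)]
    stims = [tuple(stimulus[i * tau:(i + 1) * tau]) for i in range(n_words)]

    def H(counts, n):
        # plug-in entropy of empirical word frequencies
        return -sum((c / n) * np.log2(c / n) for c in counts.values())

    h_total = H(Counter(words), n_words)          # H(X_tau)
    # noise entropy: average over stimulus words of H(X_tau | S)
    by_stim = {}
    for s, w in zip(stims, words):
        by_stim.setdefault(s, []).append(w)
    h_noise = sum((len(ws) / n_words) * H(Counter(ws), len(ws))
                  for ws in by_stim.values())
    return h_total - h_noise

def information_rate(stimulus, response, taus):
    """I(S; X_tau)/tau for each word length; the rate of Eq. 5 is
    read off from the large-tau behaviour of this curve."""
    return {tau: word_information(stimulus, response, tau) / tau
            for tau in taus}
```

In practice the curve I(S; X_τ)/τ is computed for increasing τ and extrapolated, subject to the sampling constraints discussed below.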

The estimation of information suffers from both bias and variance (Panzeri et al., 2007). In this work, the sampling bias of the information estimation was corrected using the NSB approach for the experimental data (Nemenman et al., 2004). For the simulations, we used instead the quadratic extrapolation (Strong et al., 1998), due to its simplicity and the possibility of generating large amounts of data. The standard deviation of the information was estimated from the linear extrapolation to infinitely long words (Rice, 1995). The bias correction was always lower than 1.5% and the standard deviation, always lower than 1%, for all simulations and all word lengths; thus error bars are not visible in the figures. When comparisons between information estimations were needed, one-sided *t*-tests were performed (Rice, 1995).
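The quadratic extrapolation used for the simulations can be sketched as follows: the raw information estimate is recomputed on decreasing fractions of the data, fit by a quadratic in the inverse data-set size, and the intercept is taken as the bias-corrected value. A sketch under our own naming:

```python
import numpy as np

def quadratic_extrapolation(fractions, info_estimates):
    """Sampling-bias correction by quadratic extrapolation
    (Strong et al., 1998). The raw information, estimated on
    fractions f = 1, 1/2, 1/4, ... of the data (N proportional
    to f), is fit as I_est = I_inf + a/N + b/N**2; the intercept
    I_inf is the bias-corrected estimate."""
    x = 1.0 / np.asarray(fractions, dtype=float)   # proportional to 1/N
    coeffs = np.polyfit(x, np.asarray(info_estimates, dtype=float), 2)
    return coeffs[-1]                              # value at 1/N -> 0
```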

Simulations are used to exemplify the theoretical results and to gain additional insight into how different response conditions affect information transmission in well-known neural models and neural codes. They represent highly idealized cases, with unrealistically long recordings and numbers of trials, that allow us to readily exemplify the theoretical results and to obtain reliable information estimates transparently. Firstly, we define the parameters used in the simulations and relate them to specific aspects of the stimulus and the response. Then, we report the specific parameter values.

In the simulations, the stimulus consists of a random sequence of instantaneous discrete events, here called *stimulus features*. Each stimulus feature is characterized by specific physical properties, as for example, the color of a visual stimulus, the pitch of an auditory stimulus, the intensity of a tactile stimulus, or the odor of an olfactory stimulus (Poulos et al., 1984; Rolen and Caprio, 2007; Nelken, 2008; Mancuso et al., 2009). In the real world, however, features are not necessarily discrete. If they are continuous, one can discretize them by dividing their domain into discrete categories (Martinez-Conde et al., 2002; Eyherabide et al., 2008; Marsat et al., 2009). The present framework sets no upper limit to the number of features, nor to the similarity between different categories. In addition, features might not be instantaneous but rather develop in extended time windows, as happens with the chirps in the weakly electric fish (Benda et al., 2005), the oscillations in the electric field potential (Oswald et al., 2007) and the amplitude of auditory stimuli (Eyherabide et al., 2008). In order to capture the duration of real stimuli, in the simulations we define a *minimum inter-feature interval* ${\lambda}_{\mathrm{min}}^{s},$ for each feature *s*. After the presentation of a feature *s*, no other feature may appear within an interval shorter than or equal to ${\lambda}_{\mathrm{min}}^{s}.$

In the simulated data, each stimulus feature elicits a neural response (see Figure 2A). Since in this paper we are interested in pattern-based codes, each feature generates a pattern of spikes belonging to some pattern category. The correspondence between stimulus features and pattern categories may be noisy. We consider both categorical noise (the pattern category varies from trial to trial) and temporal noise (the timing of the pattern varies from trial to trial). In Figure 2B, we show examples of all noise conditions using burst-like response patterns. In those examples, categories were defined according to the number of spikes in each burst.

**Simulations: design and construction**. **(A)** Example of a stimulus stretch and the elicited response. The stimulus is depicted as an integer sequence of silences (0) and features (*s*>0), one symbol per time bin of size Δ **...**

Symbolically, the stimulus **S** is represented as a sequence of symbols *s*, one per time bin Δ*t*. Each *s* is drawn randomly from the set of all possible outcomes Σ_{s}={0, 1,…,*N*_{S}}. The symbol *s*=0 indicates a silence (the absence of a feature), whereas *s*>0 tags the presence of a given feature. Each feature *s* elicits a response pattern **r**, drawn from the set Σ_{r} of all possible patterns, with probability *P*_{r}(**r**|*s*). The response pattern **r** may appear with latency μ_{r}, which might depend on the evoked pattern **r**. A neural response **R**, elicited by a sequence of stimulus features, may be composed of several response patterns (see bold symbol sequences in Figure 2A).

Figure 2B shows example neural codes with no noise (upper left panel), categorical noise alone (upper right), temporal noise alone (lower left), and a mixture of categorical and temporal noise (lower right). The categorical noise is defined by *P*_{b}(*b*|*s*), quantifying the probability that a response category *b* be elicited in response to stimulus *s* (see Appendix A for the relation between *P*_{b}(*b*|*s*) and *P*_{r}(**r**|*s*)). The temporal noise is implemented as jitter in the pattern onset time. That is, temporal jitter affects the pattern as a whole, displacing all spikes in the pattern by the same amount of time. The temporal displacement is drawn from a uniform distribution in the interval (−σ_{b}, σ_{b}), where the jitter σ_{b} may depend on the pattern *b*.
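The two noise sources can be sketched as follows; the parameter values and function names are illustrative, not the paper's exact simulation code. Note that the jitter shifts the burst as a whole, so intra-burst ISIs are unaffected:

```python
import numpy as np

rng = np.random.default_rng(1)

def emit_pattern(s, p_b_given_s, sigma, latency, isi=2):
    """Simulate the response pattern elicited by feature s.

    Categorical noise: the burst category b (its spike count) is
    drawn from p(b|s). Temporal noise: the whole burst is shifted
    by a jitter drawn uniformly from (-sigma[b], sigma[b]), so all
    spikes move together. Returns the category and the spike times
    (in bins) relative to the feature time.
    """
    cats = np.arange(1, len(p_b_given_s[s]) + 1)
    b = rng.choice(cats, p=p_b_given_s[s])      # categorical noise
    jitter = rng.uniform(-sigma[b], sigma[b])   # temporal noise
    onset = latency + jitter
    return b, [onset + k * isi for k in range(b)]

# Example: feature 1 mostly elicits 1-spike bursts, sometimes 2-spike bursts.
p_b_given_s = {1: [0.9, 0.1, 0.0, 0.0]}
sigma = {b: 1.0 for b in range(1, 5)}
b, spikes = emit_pattern(1, p_b_given_s, sigma, latency=1.0)
```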

Simulated neural responses consisted of four different patterns, elicited by a stimulus with four different features. The response patterns were bursts of spikes, containing between 1 and 4 spikes. The intra-burst ISI was γ_{min}=2ms. However, since the neural response is transformed into the pattern representation, the results are valid irrespective of the nature of the patterns (see Section 2.1). The stimulus was presented 200 times, each one lasting for 2000s. The minimum inter-feature time interval is λ_{min}=12ms. In all cases, no interference between patterns was considered (see Section 3.8). We used a time bin of size Δ*t*=1ms.

This simulation is used to illustrate the effect of using different representations of the neural response, and to compare an ideal situation, where the correspondence between features and patterns is known, with a more realistic case, where the neural code is unknown. The temporal jitter was σ=1ms and the latency was μ=1ms. The stimulus feature probabilities *p*(*s*) were set to: *p*(1)=0.06, *p*(2)=0.04, *p*(3)=0.03, *p*(4)=0.02. The categorical noise (*p*(*b*|*s*), *b*≠*s*) was: *p*(*i*+1|*i*)=0.1(4−*i*) for 0<*i*<4; otherwise *p*(*b*|*s*)=0.

These simulations are used to address the role of the timing and category of patterns in the neural code, and to study the relation with the *what* and the *when* in the stimulus. The latency was μ=1ms. When present, the temporal jitter was set to σ=1ms, and the categorical noise (*p*(*b*|*s*), *b*≠*s*) was given by: *p*(*i*+1|*i*)=*p*(*i*|*i*+1)=*p*(3|1)=*p*(2|4)=0.1 for 0<*i*<4; otherwise *p*(*b*|*s*)=0. The stimulus feature probabilities were *p*(*s*)=0.025 for 0<*s*≤4.
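These rules can be assembled into a categorical-noise matrix as a sketch; the 0-based array indices stand for the 1-based feature and pattern labels in the text, and each diagonal entry is fixed by normalization:

```python
import numpy as np

# Categorical-noise matrix: p[b-1, s-1] = p(b|s), columns sum to 1.
N = 4
p = np.zeros((N, N))
for i in range(1, N):               # the rules for 0 < i < 4
    p[i, i - 1] = 0.1               # p(i+1 | i)
    p[i - 1, i] = 0.1               # p(i | i+1)
p[2, 0] = 0.1                       # p(3 | 1)
p[1, 3] = 0.1                       # p(2 | 4)
for s in range(N):
    p[s, s] = 1.0 - p[:, s].sum()   # p(s | s) by normalization
assert np.allclose(p.sum(axis=0), 1.0)
```

Each feature thus elicits its matching pattern with probability 0.8, and one of two neighbouring categories with probability 0.1 each.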

Experimental neural data were provided by Ariel Rokem and Andreas V. M. Herz; they performed intracellular recordings *in vivo*, on the auditory nerve of *Locusta migratoria* (see Rokem et al., 2006, for details). Auditory stimuli consisted of a 3kHz carrier sine wave, amplitude modulated by a low-pass filtered signal with a Gaussian distribution. The AM signal had a mean amplitude of 53.9dB, a 6dB standard deviation and a cut-off frequency of 25Hz (see Figure 3A, upper panel). Each stimulation lasted for 1000ms with a pause of 700ms between repeated presentations of the stimulus, in order to minimize the influence of slow adaptation. To eliminate fast adaptation effects, the first 200ms of each trial were discarded. The recorded response (see Figure 3A, lower panel) consisted of 479 trials, with a mean firing rate of 108 ± 6 spikes/s (mean ± standard deviation across trials). Burst activity was observed and associated with specific features in the stimulus (see Eyherabide et al., 2008, for the analysis of burst activity in the whole data set). Bursts contained up to 14 spikes; Figure 3B shows the firing probability distribution as a function of the intra-burst spike count.

In order to understand how stimuli are encoded in the neural response, the recorded neural activity **U** is transformed into several different representations. Each representation keeps some aspects of the original neural response while discarding others. The spike representation **R** is probably the most widely used (see Section 2.1). We define the *spike information* *I*(**S**; **R**) as the mutual information rate between the stimulus **S** and the spike representation **R** of the neural response.

The spike sequence can be further transformed into a sequence of patterns of spikes, called the pattern representation **B**. To that end, all possible patterns of spikes are classified into pre-defined categories, for example, burst codes, ISI codes, etc. (see Section 2.1 and references therein). We define *pattern information* *I*(**S**; **B**) as the information about the stimulus **S**, carried by the sequence of patterns **B**.

The pattern information cannot be greater than the spike information, which in turn cannot be greater than the information in the unprocessed neural response

$$I(\mathbf{\text{S}};\mathbf{\text{B}})\le I(\mathbf{\text{S}};\mathbf{\text{R}})\le I(\mathbf{\text{S}};\mathbf{\text{U}}).$$

(6)

This result can be directly proved from the deterministic relation between **U**, **R** and **B** (Eqs. 1) and the *data processing inequality* (Cover and Thomas, 1991). Notwithstanding, several neuroscience papers have reported data contradicting Eq. 6 (see Section 4.3). Intuitively, out of all the information carried by the unprocessed neural response, the spike information only contains the information preserved in the spike timing. Analogously, out of the information carried in the spike representation, the pattern information only preserves the information carried by both the time positions and the categories of the chosen patterns.
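Equation 6 can be checked on a toy example: any deterministic merging of response symbols (here, collapsing two of three responses into one category, a stand-in for the read-out *h*_{R→B}) can only reduce the mutual information. The distributions below are invented for illustration:

```python
import numpy as np

def mi(joint):
    """Mutual information (bits) of a joint distribution P(x, s)."""
    joint = np.asarray(joint, dtype=float)
    px = joint.sum(axis=1, keepdims=True)   # marginal P(x)
    ps = joint.sum(axis=0, keepdims=True)   # marginal P(s)
    nz = joint > 0
    return float(np.sum(joint[nz] * np.log2(joint[nz] / (px @ ps)[nz])))

# Three response symbols, two stimuli: P(r, s).
p_rs = np.array([[0.25, 0.00],
                 [0.20, 0.05],
                 [0.05, 0.45]])

# Deterministic read-out h: merge responses 0 and 1 into one category.
p_bs = np.vstack([p_rs[0] + p_rs[1], p_rs[2]])

assert mi(p_bs) <= mi(p_rs)   # data processing inequality (Eq. 6)
```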

In this paper, we quantify the amount of time and category information encoded by pattern-based codes. This information depends critically on the choice of the pattern representation. In this subsection, we discuss how to evaluate whether a given choice is convenient or not. One can choose any set of pattern categories to define the alphabet of the pattern representation. Some choices, however, preserve more information about the stimulus than others. The comparison between the information carried by different pattern representations gives insight into how relevant the preserved structures are to information transmission (Victor, 2002; Nelken and Chechik, 2007), i.e., formally, into whether they constitute sufficient statistics (Cover and Thomas, 1991). A suitable representation should reduce the variability in the neural response due to noise, while preserving the variability associated with variations in the encoded stimulus. Thus, any representation preserving less information than the spike information is neglecting informative variability. In addition, one may also be interested in a neural representation that can be easily or rapidly read out, or that is robust to environmental changes, etc. The chosen neural representation typically results from a trade-off between these requirements.

Here we focus on analyzing whether the chosen representation alters the correspondence between the stimulus and the response. For us, a good representation is one where the informative variability is preserved, and the non-informative variability is discarded. As an example, we analyze two different situations (Figure 4). In Figure 4A, we use simulated data, where we know exactly how the neural code is structured. We can therefore compare the performance of the spike representation with two pattern representations: one of them intentionally tailored to capture the true neural code that generated the data, and another representation discarding some informative variability. The neural response consists of a sequence of four different patterns, associated with each of four stimulus features, in the presence of temporal jitter and categorical noise (see Section 2.3.2 Simulation 1). In Figure 4B, we study experimental data (see Section 2.4), so the neural code is unknown. Therefore, in this case we compare the spike representation with two candidate pattern representations, ignoring *a priori* which is the most suitable.

**Information per unit time transmitted by different choices of patterns**. The spike representation (**R**) is transformed into a sequence of patterns grouped in categories according to the intra-pattern spike count (**B**^{α}), which is further transformed **...**

For both simulated and experimental data, we estimated the information conveyed by the spike representation **R**; a pattern representation **B**^{α}, where all bursts are grouped into categories according to their intra-burst spike count; and a second pattern representation **B**^{β}, with only two categories comprising isolated spikes and complex patterns. This is shown in Figure 4, where the information per unit time is plotted as a function of the window size used to read the neural response. The representations are related through functions, in such a way that **B**^{β} is a transformation of **B**^{α}, which is in turn a transformation of **R**. Therefore, *I*(**S**; **B**^{β}) ≤ *I*(**S**; **B**^{α}) ≤ *I*(**S**; **R**), for all finite response windows (see Eq. 6). Nevertheless, notice that **B**^{β} may be a faster-to-read code than **B**^{α}, since the latter requires a time window long enough to distinguish not only the differences between isolated spikes and bursts, but also the differences among bursts of different categories.

In the simulation (Figure 4A), the information carried by **B**^{α} is equal to the spike information (*I*_{Sim}(**S**; **R**) = *I*_{Sim}(**S**; **B**^{α}) = 254.2 ± 0.2 bits/s, one-sided *t*-test, *p*(10) = 0.5). This is expected since, by construction, the neural code used in the simulations is, indeed, **B**^{α}. Therefore, in this case, **B**^{α} is a lossless representation. The choice of an adequate representation is more difficult in the experimental example (Figure 4B), where the neural code is not known beforehand. In this case, **B**^{α} preserves less information than the spike sequence (*I*_{Exp}(**S**; **R**) = 133 ± 4 bits/s, *I*_{Exp}(**S**; **B**^{α}) = 121 ± 3 bits/s, one-sided *t*-test, *p*(10) = 0.004). The information *I*(**S**; **B**^{α}) represents 91% of the spike information. In general, whether this amount of information is acceptable or not depends on whether the loss is compensated by the advantages of attaining a reduced representation of the response (Nelken and Chechik, 2007).

Distinguishing only between isolated spikes and bursts (**B**^{β}) diminishes the information considerably in both examples (one-sided *t*-test, *p*(10) < 0.001, both cases). In the simulation, the information carried by **B**^{β} is *I*_{Sim}(**S**; **B**^{β}) = 208.7 ± 0.6 bits/s, representing about 82.1% of the spike information. This is expected since, by construction, different stimulus features are encoded by different patterns. For the experimental data, *I*_{Exp}(**S**; **B**^{β}) = 91 ± 7 bits/s, representing about 68% of the spike information. In both examples, the representation **B**^{α} is “more sufficient” than **B**^{β}. The difference *I*(**S**; **B**^{α}) − *I*(**S**; **B**^{β}) constitutes a quantitative measure of the role of distinguishing between bursts of 2, 3, …, *n* spikes, provided that the distinction between isolated spikes and bursts has already been made (*I*(**S**; **B**^{α}|**B**^{β})). However, **B**^{α} still preserves other response aspects, such as pattern timing, number of patterns, etc. In what follows, we study the role of different response aspects in information transmission.

The pattern representation may preserve one or several aspects of the neural response that could, in principle, encode information about the stimulus. More specifically, if the response is analyzed using windows of duration τ, there are several candidate response aspects that might be informative, namely:

- the number of patterns in the window (number of events – Figure 5A)
- the precise timing of each pattern in the window (time representation – Figure 1D)
- the pattern categories present in the window with no specification of their ordering (response set of categories – Figure 5B)
- the temporally ordered pattern categories in the window (category representation – Figure 1E).
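These four aspects can be extracted mechanically from a windowed pattern sequence. In the sketch below, patterns are assumed to be given as (time, category) pairs, as a burst detector might output; the function name and format are illustrative assumptions:

```python
def response_aspects(patterns):
    """Split a windowed pattern sequence into the four candidate response
    aspects (a)-(d).  `patterns` is a list of (time, category) pairs."""
    patterns = sorted(patterns)                              # order by time
    return {
        "a_count":     len(patterns),                           # (a) number of events
        "b_timing":    tuple(t for t, _ in patterns),           # (b) time representation
        "c_unordered": tuple(sorted(c for _, c in patterns)),   # (c) multiset of categories
        "d_ordered":   tuple(c for _, c in patterns),           # (d) category representation
    }

asp = response_aspects([(12.0, "burst3"), (4.0, "isolated"), (30.5, "burst2")])
# Aspect (a) is recoverable from (b), (c) or (d); (c) is recoverable from (d).
```

Note that aspect *c* must keep category multiplicities (a multiset), otherwise aspect *a* could not be recovered from it.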

We find that these aspects are related through deterministic functions. Indeed, aspect *a* can be univocally determined from aspects *b*, *c* or *d*. Thus, the information transmitted by aspect *a* is also carried by any of the other aspects. In the same manner, aspect *c* can be determined from *d*. However, in Appendix B we prove that the number of patterns in the window (aspect *a*) makes a vanishing contribution to the information rate. That is, although aspect *a* might be informative for a finite window of length τ, its contribution becomes negligible in the limit of long windows. Surprisingly, the unordered set of pattern categories (aspect *c*) also makes no contribution to the information rate, as shown in Appendix C. Moreover, the entropy rates of both aspects tend to zero in the limit of long time windows. Therefore, their information rate with respect to any other aspect, of the stimulus and/or the neural response, vanishes as the window size increases. We thus do not discuss aspects *a* and *c* any further.

This is not the case for response aspects *b* and *d*: they may sometimes be informative, since their definitions do not constrain them to be non-informative. Therefore, in what follows, we transform the pattern representation into two other representations preserving the precise timing of each pattern (the time representation) and the temporally ordered pattern categories (the category representation). Our goal is to determine in which way the precise timing of each pattern conveys information about the time positions of stimulus features (the *when*), and how the temporally ordered pattern categories provide information about the identity of the stimulus features (the *what*).

We define the *time information* *I*(**S**; **T**) as the mutual information rate between the stimulus **S** and the time representation **T**. In addition, we define the *category information* *I*(**S**; **C**) as the mutual information rate between the stimulus **S** and the category representation **C**. The category information is a novel quantity that complements previous studies (Gaudry and Reinagel, 2008; Eyherabide et al., 2009) by allowing us to address the relevance of pattern categories in the neural code (see Section 3.5). Since both **T** and **C** are transformations of the pattern representation **B** (see Eqs 1), the time and category information cannot be greater than the pattern information, i.e.,

$$\left.\begin{array}{c}I(\mathbf{\text{S}};\mathbf{\text{T}})\\ I(\mathbf{\text{S}};\mathbf{\text{C}})\end{array}\right\}\le I(\mathbf{\text{S}};\mathbf{\text{B}}).$$

(7)

When **T** and **C** are read out simultaneously, the pair (**T**,**C**) carries the same information as the pattern sequence **B** (*I*(**T**,**C**;**S**) =*I*(**B**; **S**)). In fact, **B** and the pair (**T**,**C**) are related through a bijective function. To prove this, consider any pattern representation **B**_{i} of a neural response **U**_{i}. The pair (**T**_{i}, **C**_{i}) associated with **U**_{i} is a function of **B**_{i} (see Eqs. 1). Conversely, given the pair (**T**_{i}, **C**_{i}) associated with **U**_{i}, all the information about the time positions and categories of patterns present in **U**_{i} is available, and thus **B**_{i} is univocally determined. Notice that the pairs (**T**, **C**) are a subset of the Cartesian product **T**× **C**.
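A minimal sketch of this bijection, again assuming patterns arrive as time-ordered (time, category) pairs (the function names are illustrative):

```python
def split(b):
    """B -> (T, C): map a pattern sequence, given as time-ordered
    (time, category) pairs, to its time and category representations."""
    return tuple(t for t, _ in b), tuple(c for _, c in b)

def merge(t_rep, c_rep):
    """(T, C) -> B: read out jointly, the timing and the categories
    restore the full pattern sequence, so the map is invertible."""
    return tuple(zip(t_rep, c_rep))

b = ((4.0, "isolated"), (12.0, "burst3"), (30.5, "burst2"))
t_rep, c_rep = split(b)
b_back = merge(t_rep, c_rep)   # round-trip recovers B exactly
```

The valid pairs are exactly those with matching lengths, which is why (**T**, **C**) ranges over a subset of the Cartesian product **T** × **C**.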

The time positions of patterns may depend on their categories, and vice versa. To explore this relationship, and how it affects the transmitted information, we separate the pattern information as

$$I(\mathbf{\text{B}};\mathbf{\text{S}})=I(\mathbf{\text{S}};\mathbf{\text{T}})+I(\mathbf{\text{S}};\mathbf{\text{C}})+{\Delta}_{SR};$$

(8)

where Δ_{SR} represents the synergy/redundancy between the time and the category representations, defined by

$${\Delta}_{SR}=-I(\mathbf{\text{S}};\mathbf{\text{T}};\mathbf{\text{C}}).$$

(9)

Here, *I*(*X*; *Y*; *Z*) = *I*(*X*; *Y*) − *I*(*X*; *Y*|*Z*) is called *triple mutual information* (Cover and Thomas, 1991; Tsujishita, 1995). If Δ_{SR} is positive, time and category information are synergistic: more information is available when **T** and **C** are read out simultaneously. Conversely, if Δ_{SR} is negative, time and category information are redundant. The proof of Eqs. 8 and 9 is shown in Appendix D. Previous studies have already defined the synergy/redundancy for populations of neurons (Schneidman et al., 2003). It has also been applied to single neurons, to determine how different aspects of response patterns encode the identity of single stimulus features (Furukawa and Middlebrooks, 2002; Nelken et al., 2005). Here we extend the concept to encompass dynamic stimuli whose features arrive at random times, as well as arbitrary patterns, defined in time and/or across neurons.
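The sign conventions of Eqs. 8 and 9 can be verified on two toy joint distributions (both assumed for illustration): an xor-like code, in which **S** can only be decoded by reading **T** and **C** jointly, and a copy code, in which both carry the same bit:

```python
import math
from collections import defaultdict

def H(dist):
    """Entropy in bits of a distribution {outcome: p}."""
    return -sum(p * math.log2(p) for p in dist.values() if p > 0)

def marginal(joint, idx):
    """Marginalize a joint {(s, t, c): p} onto the coordinates in idx."""
    m = defaultdict(float)
    for k, p in joint.items():
        m[tuple(k[i] for i in idx)] += p
    return dict(m)

def delta_sr(joint_stc):
    """Δ_SR = I(S;T,C) − I(S;T) − I(S;C)  (Eq. 8 rearranged)."""
    def mi(a, b):
        return (H(marginal(joint_stc, a)) + H(marginal(joint_stc, b))
                - H(marginal(joint_stc, a + b)))
    return mi((0,), (1, 2)) - mi((0,), (1,)) - mi((0,), (2,))

# Synergy: S = T xor C, with T and C fair independent bits -> Δ_SR = +1 bit,
# even though I(S;T) = I(S;C) = 0.
xor = {(t ^ c, t, c): 0.25 for t in (0, 1) for c in (0, 1)}
# Redundancy: T and C both copy S -> Δ_SR = −1 bit.
copy = {(s, s, s): 0.5 for s in (0, 1)}
```

The xor case also shows that synergy can arise even when the time and category information individually vanish.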

As an example, consider the data presented in Figure 4, where the neural responses are represented as a sequence of bursts (**B**^{α}). For the case of the simulations (Figure 4A), the time information is *I*_{Sim}(**S**, **T**^{α}) = 180.4 ± 0.2 bits/s, and the category information, *I*_{Sim}(**S**, **C**^{α}) = 74.2 ± 0.5 bits/s. The synergy/redundancy term is slightly negative, but not significant (${\Delta}_{SR}^{\text{Sim}}=-0.4\pm 0.5$ bits/s, two-sided *t*-test, *p*(15) = 0.44). By construction, in the simulation the time and category information are neither redundant nor synergistic. For the experimental data (Figure 4B), *I*_{Exp}(**S**, **T**^{α}) = 63 ± 2 bits/s and *I*_{Exp}(**S**, **C**^{α}) = 50.6 ± 0.6 bits/s. In this case, we do not know beforehand whether the time information and the category information are redundant or synergistic. Yet, by comparing them with the pattern information we obtain ${\Delta}_{SR}^{\text{Exp}}=7\pm 3$ bits/s, indicating that timings and categories of patterns are slightly synergistic (two-sided *t*-test, *p*(15) = 0.063).

The pattern, time and category information depend on the choice of the alphabet of patterns. For example, the category information may increase or decrease depending on the nature of the aspect defining the pattern categories (Furukawa and Middlebrooks, 2002; Gollisch and Meister, 2008). No general rules predicting these changes can be given: they depend on the neural representation at hand. However, when the alternative pattern representations are linked through functions, some relations between their variations can be predicted without numerical calculations. Compare, for instance, **B**^{α} and **B**^{β} as defined in Section 3.1. By grouping all bursts with more than one spike into a single category, not only is **B**^{β} a function *h*_{α→β} of **B**^{α} (**B**^{β} = *h*_{α→β}(**B**^{α})), but also **C**^{β} = *h*_{α→β}(**C**^{α}). The time representation remains intact (**T**^{β} = **T**^{α}). As a result, neither the pattern information nor the category information can increase, whereas the time information remains constant. In addition, if **T**^{α} and **C**^{α} are independent and conditionally independent given the stimulus, so are **T**^{β} and **C**^{β}. Therefore, the difference in the category information equals the difference in the pattern information (*I*(**S**; **C**^{α}) − *I*(**S**; **C**^{β}) = *I*(**S**; **B**^{α}) − *I*(**S**; **B**^{β})).

Analogously, consider a representation **B**^{γ} in which the time positions of patterns identified in **B**^{α} are read out with lower precision (2Δ*t*). Since **B**^{γ} is a function of **B**^{α}, two different responses ${\mathbf{\text{B}}}_{i}^{\alpha}$ and ${\mathbf{\text{B}}}_{j}^{\alpha}$ that differ only slightly in the pattern time positions are indistinguishable in the representation ${\mathbf{\text{B}}}^{\gamma}$ (${\mathbf{\text{B}}}_{i}^{\gamma}={\mathbf{\text{B}}}_{j}^{\gamma}$). In this case, the comparison between **B**^{α} and **B**^{γ} is analogous to the case analyzed in the previous paragraph, with the roles of the time and category representations interchanged.

We illustrate these results with an example. In Figure 6, the pattern, time and category information are shown for three different choices of the pattern representation. The simulated neural response is taken from Figure 4A. In the three cases, there is no synergy or redundancy between the time and the category information (Δ_{SR} = 0). From Figure 4A, we already know that *I*(**S**; **B**^{β}) < *I*(**S**; **B**^{α}). Comparing the left and middle panels of Figure 6, we find that this reduction is due to a decrement in the category information (*I*(**S**, **C**^{α}) = 74.2 ± 0.5 bits/s, *I*(**S**, **C**^{β}) = 28.6 ± 0.3 bits/s, one-sided *t*-test, *p*(10) < 0.001), as expected (see Section 3.1). In agreement with the theoretical prediction, the time information remains unchanged (*I*(**S**, **T**^{α}) = *I*(**S**, **T**^{β}) = 180.4 ± 0.2 bits/s, one-sided *t*-test, *p*(10) = 0.5).

**Pattern, time and category information carried by different neural representations**. The spike representation is transformed into a sequence of patterns: **B**^{α}*(left)*: grouped in categories according to the intra-pattern spike count; **B**^{β} *(middle):* **...**

Analogously, compare the left and right panels of Figure 6. In this case, both the pattern and time information decrease (*I*(**S**, **B**^{α}) = 254.2 ± 0.2 bits/s, *I*(**S**, **B**^{γ}) = 230.1 ± 0.1 bits/s, *I*(**S**, **T**^{α}) = 180.4 ± 0.2 bits/s, *I*(**S**, **T**^{γ}) = 156.0 ± 0.2 bits/s, in both cases, one-sided *t*-test, *p*(10) < 0.001), while the category information remains unchanged (*I*(**S**, **C**^{α}) = *I*(**S**, **C**^{γ}) = 74.2 ± 0.5 bits/s, one-sided *t*-test, *p*(10) = 0.5). Thus, as mentioned previously, a reduction in the precision with which the patterns are read out always decreases the time information, while keeping the category information constant.

In other examples, the variations in the time and category information may not be directly accompanied by variations in the pattern information, due to the presence of synergy and redundancy. For example, Alitto et al. (2005) studied the encoding properties of tonic spikes, long-ISI tonic spikes (tonic spikes preceded by long ISIs) and bursts. To evaluate the relevance of distinguishing between tonic spikes and long-ISI tonic spikes, one can compare the information conveyed by two representations: **B**^{ξ}, preserving the difference between tonic spikes and long-ISI tonic spikes, and **B**^{ɸ}, grouping them into the same category (Gaudry and Reinagel, 2008). **B**^{ξ} and **B**^{ɸ} only differ in the category representation, like **B**^{α} and **B**^{β}. However, unlike those representations, ${\Delta}_{SR}^{\xi}$ and ${\Delta}_{SR}^{\u0278}$ need not be either equal or zero, and thus $\left[I(\mathbf{\text{S}};{\mathbf{\text{B}}}^{\xi})-I(\mathbf{\text{S}};{\mathbf{\text{B}}}^{\u0278})\right]=\left[I(\mathbf{\text{S}};{\mathbf{\text{C}}}^{\xi})-I(\mathbf{\text{S}};{\mathbf{\text{C}}}^{\u0278})\right]+\left[{\Delta}_{SR}^{\xi}-{\Delta}_{SR}^{\u0278}\right].$ Indeed, by simultaneously reading the timing and category of a pattern, the uncertainty about whether the following pattern will be a long-ISI tonic spike is reduced. This reduction is a source of redundancy in **B**^{ξ}, where the long-ISI tonic spikes are explicitly identified. On the other hand, the inter-pattern time interval (IPI) preceding a long-ISI tonic spike may reveal the duration of the previous pattern. Any information contained in it constitutes a source of synergy in **B**^{ξ}. The distinction between tonic spikes and bursts produces analogous effects on the synergy and redundancy, affecting both representations **B**^{ξ} and **B**^{ɸ}.

As shown in Cover and Thomas (1991), *I*(**S**; **T**; **C**) is symmetric in **S**, **T** and **C**. Hence, Δ_{SR} is bounded above and below by

$$-I(X;Y)\le {\Delta}_{SR}\le I(X;Y|Z);$$

(10)

where *X*, *Y* and *Z* represent the variables **S**, **T** and **C** in such an ordering that *I*(*X*; *Y*) = min{*I*(**T**; **C**), *I*(**S**; **T**), *I*(**S**; **C**)} (see proof in Appendix E). The same ordering applies for both bounds, in such a way that, for example, if *I*(**S**; **T**|**C**) is the least upper bound, then −*I*(**S**; **T**) is the greatest lower bound, from the set of bounds derived in Eq. 10. These bounds are novel, and tighter than those previously reported by Schneidman et al. (2003).
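Eq. 10 can be checked numerically on randomly drawn joint distributions over binary *S*, *T* and *C* (a verification sketch; the distributions are arbitrary):

```python
import math
import random
from collections import defaultdict

def H(dist):
    return -sum(p * math.log2(p) for p in dist.values() if p > 0)

def marginal(joint, idx):
    m = defaultdict(float)
    for k, p in joint.items():
        m[tuple(k[i] for i in idx)] += p
    return dict(m)

def mi(joint, a, b):
    return H(marginal(joint, a)) + H(marginal(joint, b)) - H(marginal(joint, a + b))

def cmi(joint, a, b, z):
    """I(A;B|Z) = H(A,Z) + H(B,Z) − H(Z) − H(A,B,Z)."""
    return (H(marginal(joint, a + z)) + H(marginal(joint, b + z))
            - H(marginal(joint, z)) - H(marginal(joint, a + b + z)))

random.seed(0)
ok = True
for _ in range(500):
    w = [random.random() for _ in range(8)]
    tot = sum(w)
    joint = {(s, t, c): w[4 * s + 2 * t + c] / tot
             for s in (0, 1) for t in (0, 1) for c in (0, 1)}
    delta = cmi(joint, (0,), (1,), (2,)) - mi(joint, (0,), (1,))  # Δ_SR
    # candidate (I(X;Y), I(X;Y|Z)) pairs entering Eq. 10
    pairs = [(mi(joint, (0,), (1,)), cmi(joint, (0,), (1,), (2,))),   # (S,T|C)
             (mi(joint, (0,), (2,)), cmi(joint, (0,), (2,), (1,))),   # (S,C|T)
             (mi(joint, (1,), (2,)), cmi(joint, (1,), (2,), (0,)))]   # (T,C|S)
    low, up = min(pairs)   # minimal I(X;Y) selects the tight pair of bounds
    ok = ok and (-low - 1e-9 <= delta <= up + 1e-9)
```

Since Δ_{SR} = *I*(*X*; *Y*|*Z*) − *I*(*X*; *Y*) for every ordering, the pair with the smallest *I*(*X*; *Y*) automatically has the smallest *I*(*X*; *Y*|*Z*), which is why the same ordering yields both the greatest lower bound and the least upper bound.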

If the left side of Eq. 10 is zero, time and category information are non-redundant (Δ_{SR} ≥ 0). However, they may still be synergistic (Δ_{SR} > 0), even in the case when both informations vanish (*I*(**S**; **T**) = *I*(**S**; **C**) = 0 only implies Δ_{SR} ≥ 0). This property has often been overlooked (see, for example, Foffani et al., 2009). Time and category information are non-synergistic if and only if the right side of Eq. 10 is zero. From the definition of the synergy/redundancy Δ_{SR} (Eq. 9), we show that

$${\Delta}_{SR}=0\iff I(X;Y)=I(X;Y|Z);$$

(11)

where *X*, *Y* and *Z* represent the variables **S**, **T** and **C** in any order. In this case, the time and category information add up to the pattern information. This situation may occur when either *I*(*X*; *Y*)= *I*(*X*; *Y*|*Z*)=0 or *I*(*X*; *Y*)= *I*(*X*; *Y*|*Z*)>0 (Nirenberg and Latham, 2003; Schneidman et al., 2003).

Previous studies have addressed the relevance of pattern timing in information transmission by quantifying the time information and comparing it with the pattern information (Denning and Reinagel, 2005; Gaudry and Reinagel, 2008; Eyherabide et al., 2009). In other words, the relevance of pattern timing is given by the amount of information carried by a representation that only preserves the time positions of patterns. We call this paradigm *criterion I*. Indeed, one can also address the relevance of pattern categories using *criterion I*. However, instead of quantifying the amount of information carried by the category representation, these previous works have determined the information loss due to ignoring the pattern categories. Here, this point of view is called *criterion II*. In what follows, we prove that *criterion I* and *criterion II* take into account different information, and can thus lead to opposite results when both of them are applied to the same aspect of the response.

Formally, under *criterion I*, the pattern timing is relevant (or sufficient) for information transmission if

$$I(\mathbf{\text{S}};\mathbf{\text{B}})-\Delta {I}_{th}^{I}\le I(\mathbf{\text{S}};\mathbf{\text{T}}).$$

(12)

Here, $\Delta {I}_{th}^{I}$ represents a previously set threshold. Although Cover and Thomas (1991) have defined sufficiency only for the case when $\Delta {I}_{th}^{I}=0,$ in practice, some amount of information loss ($\Delta {I}_{th}^{I}>0$) is usually accepted (Nelken and Chechik, 2007). We can also employ this criterion to address the relevance of pattern categories, comparing

$$I(\mathbf{\text{S}};\mathbf{\text{B}})-\Delta {I}_{th}^{I}\le I(\mathbf{\text{S}};\mathbf{\text{C}}).$$

(13)

On the other hand, under *criterion II*, the pattern categories are relevant to information transmission if

$$I(\mathbf{\text{S}};\mathbf{\text{T}})\le I(\mathbf{\text{S}};\mathbf{\text{B}})-\Delta {I}_{th}^{II}.$$

(14)

Therefore, pattern categories are relevant if pattern timings transmit little information, irrespective of the information carried by categories themselves. Remarkably, if $\Delta {I}_{th}^{I}=\Delta {I}_{th}^{II},$ the pattern categories are relevant (irrelevant) if and only if the pattern timings are irrelevant (relevant) (compare Eqs. 12 and 14).

From the bijectivity between **B** and (**T**; **C**) (see Section 3.4), we find that *criterion II* can be written as

$$\Delta {I}_{th}^{II}-{\Delta}_{SR}\le I(\mathbf{\text{S}};\mathbf{\text{C}}).$$

(15)

As a result, under *criterion II*, the relevance of an aspect depends not only on the information conveyed by that very aspect – as in *criterion I*– but also on the synergy/redundancy between that aspect and the complementary ones. Both criteria coincide when $\Delta {I}_{th}^{I}+\Delta {I}_{th}^{II}=I(\mathbf{\text{S}};\mathbf{\text{B}})+{\Delta}_{SR}$(compare Eqs. 13 and 15), implying that equality in the thresholds is neither necessary nor sufficient to obtain a coincidence.

By using *criterion I* for the relevance of pattern timing and *criterion II* for the relevance of pattern categories, the information that is repeated in both aspects (redundant information) only contributes to the relevance of the pattern timing. However, the information that is carried by both aspects simultaneously (synergistic information) only contributes to the relevance of the pattern categories. The discrepancies induced in this way are shown in the following example. Consider that *I*(**S**; **R**) = 10 bits/s, *I*(**S**; **T**) = 9 bits/s, *I*(**S**; **C**) = 10 bits/s and Δ*I*_{th} = 2 bits/s. Since *I*(**S**; **C**) ≤ *I*(**S**; **B**) ≤ *I*(**S**; **R**), it follows that *I*(**S**; **B**) = 10 bits/s. Under *criterion II*, **C** is irrelevant because *I*(**S**; **B**) − *I*(**S**; **T**) = 1 bit/s. Nevertheless, under *criterion I*, **C** is necessarily relevant, since it constitutes a sufficient statistic (*I*(**S**; **B**) = *I*(**S**; **C**)). Analogous results are obtained for the relevance of pattern timing. In addition, different thresholds are used for the relevance of each aspect (compare Eqs. 12 and 14). In the previous example, the pattern timing is relevant only if *I*(**S**; **T**) ≥ 8 bits/s whereas the pattern categories are relevant only if *I*(**S**; **C**) ≥ 2 bits/s, showing an unjustified asymmetry between both aspects.
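The two criteria, and the discrepancy in the worked example above, can be stated compactly as follows (the function names are ours; the numbers are those of the example):

```python
def relevant_criterion_I(i_pattern, i_aspect, threshold):
    """Criterion I (Eqs. 12-13): an aspect is relevant if it alone
    retains almost all of the pattern information."""
    return i_pattern - threshold <= i_aspect

def relevant_criterion_II(i_pattern, i_complement, threshold):
    """Criterion II (Eq. 14): an aspect is relevant if keeping only the
    *complementary* aspect loses at least `threshold` bits/s."""
    return i_complement <= i_pattern - threshold

# Worked example (bits/s): I(S;B) = 10, I(S;T) = 9, I(S;C) = 10, threshold = 2.
i_b, i_t, i_c, th = 10.0, 9.0, 10.0, 2.0
c_relevant_I  = relevant_criterion_I(i_b, i_c, th)    # C is a sufficient statistic
c_relevant_II = relevant_criterion_II(i_b, i_t, th)   # only 1 bit/s lost without C
t_relevant_I  = relevant_criterion_I(i_b, i_t, th)    # timing under criterion I
```

The same aspect (**C**) is thus judged relevant under *criterion I* and irrelevant under *criterion II*, because the latter never looks at *I*(**S**; **C**) itself.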

Many studies have interpreted that pattern-based codes function as *feature extractors*, where the identity of each stimulus feature (the *what*) is represented in the pattern category **C**, and the timing of each stimulus feature (the *when*), in the pattern temporal reference **T** (see Introduction and references therein). To assess this standard view, we formally define the *what* and the *when* in the stimulus, and relate them with the time and category information. In the next subsection, we determine the conditions that are necessary and sufficient for the standard view to hold. Finally, we show that small category-dependent changes in the timing of patterns (such as latencies) may induce departures from the standard view (altering both the amount and the composition of the information carried by **T** and **C**).

Since the stimulus **S** is composed of discrete features (see Methods for a discussion on continuous stimuli), it can also be written in terms of a time (**S**_{T}) and a category (**S**_{C}) representation, such that **S** and the pair (**S**_{T}, **S**_{C}) are related through a bijective map. We formally define the *what* in the stimulus as the category representation **S**_{C}, and the *when* as the time representation **S**_{T}. Indeed, **S**_{T} indicates *when* the stimulus features occurred, whereas **S**_{C} tags *what* features appeared.

The *stimulus entropy* is defined as the entropy rate *H*(**S**), while the stimulus *time entropy* and *category entropy* are the entropy rates *H*(**S**_{T}) and *H*(**S**_{C}), respectively. The time and category entropies are intimately related to *when* and *what* features happened: they are a measure of the variability in the time positions and categories of stimulus features, respectively. These quantities were previously defined for Poisson stimuli in Eyherabide and Samengo (2010), and here these definitions are generalized to encompass any stochastic stimulus. Since **S** and (**S**_{T}, **S**_{C}) are related through a bijective function,

$$H(\mathbf{\text{S}})=H({\mathbf{\text{S}}}_{\text{T}})+H({\mathbf{\text{S}}}_{\text{C}})-I({\mathbf{\text{S}}}_{\text{T}};{\mathbf{\text{S}}}_{\text{C}});$$

(16)

where the information rate *I*(**S**_{T}; **S**_{C}) is a measure of the redundancy between the time and category entropies of the stimulus. Since *I*(**S**_{T}; **S**_{C}) is always non-negative, **S**_{T} and **S**_{C} cannot be synergistic.
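Eq. 16 is a direct consequence of the bijection between **S** and (**S**_{T}, **S**_{C}); a quick check on an assumed joint distribution in which the *what* is correlated with the *when*:

```python
import math

def H(dist):
    """Entropy in bits of a distribution {outcome: p}."""
    return -sum(p * math.log2(p) for p in dist.values() if p > 0)

# Assumed joint over (S_T, S_C): feature identity correlated with timing,
# so the "what" and the "when" are not separable.
joint = {("early", "A"): 0.4, ("early", "B"): 0.1,
         ("late",  "A"): 0.1, ("late",  "B"): 0.4}
p_t, p_c = {}, {}
for (t, c), p in joint.items():
    p_t[t] = p_t.get(t, 0.0) + p
    p_c[c] = p_c.get(c, 0.0) + p

# Redundancy between the what and the when, from the MI definition:
i_tc = sum(p * math.log2(p / (p_t[t] * p_c[c])) for (t, c), p in joint.items())
h_s = H(joint)   # bijection: H(S) = H(S_T, S_C)
# Eq. 16: H(S) = H(S_T) + H(S_C) − I(S_T; S_C)
```

Here *I*(**S**_{T}; **S**_{C}) > 0, so the stimulus entropy is strictly smaller than the sum of the time and category entropies.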

The standard view of the role of patterns formally implies that the category information *I*(**S**; **C**) (respectively, the time information *I*(**S**; **T**)) can be reduced to the mutual information *I*(**S**_{C}; **C**) (respectively, *I*(**S**_{T}; **T**)). Therefore, *H*(**S**_{C}) and *H*(**S**_{T}) must be upper bounds for the category and time information, respectively. However, these bounds are not guaranteed by the mere presence of patterns in the neural response. Some cases may be more complicated because, for example, **S**_{C} and **S**_{T} may not be independent variables (see Section 2.3). A dependency between these two stimulus properties implies that the *what* and the *when* are not separable concepts.

In this section, we determine the conditions under which the standard interpretation holds: The category information represents the knowledge on the *what* in the stimulus, and the time information, the knowledge on the *when*. To that aim, we define a *canonical feature extractor* as a neuron model in which

$$I(\mathbf{\text{T}};{\mathbf{\text{S}}}_{\text{C}}|{\mathbf{\text{S}}}_{\text{T}})=0$$

(17a)

$$I\left(\mathbf{\text{C}};{\mathbf{\text{S}}}_{\text{T}}|{\mathbf{\text{S}}}_{\text{C}}\right)=0.$$

(17b)

Under each of these conditions, the time and category information become

$$I(\mathbf{\text{T}};\mathbf{\text{S}})=I\left(\mathbf{\text{T}};{\mathbf{\text{S}}}_{\text{T}}\right)\le H\left({\mathbf{\text{S}}}_{\text{T}}\right)$$

(18a)

$$I(\mathbf{\text{C}};\mathbf{\text{S}})=I(\mathbf{\text{C}};{\mathbf{\text{S}}}_{\text{C}})\le H({\mathbf{\text{S}}}_{\text{C}}).$$

(18b)

Consequently, the response pattern categories represent *what* stimulus features are encoded, whereas the pattern time positions represent *when* the stimulus features occur. In particular, the time and category information are upper-bounded by the stimulus time and category entropies, respectively.

Condition 17a implies that all the information *I*(**S**_{C}; **T**) is already contained in the information *I*(**S**_{T}; **T**). In other words, *I*(**S**_{C}; **T**) is completely redundant with *I*(**S**_{T}; **T**), and *I*(**S**_{C}; **T**)≤ *I*(**S**_{T}; **T**). In this sense, we say that the time information represents the *when* in the stimulus. Analogous implications can be obtained from condition 17b for the category representation **C**, by interchanging **T** with **C**, and **S**_{T} with **S**_{C} (see formal proof in Appendix F). Therefore, conditions 17 are necessary and sufficient to ensure that the standard view of the role of patterns in the neural code actually holds (see Section 4.1).

A canonical feature extractor does not require **T** and **C** to be either independent or conditionally independent given the stimulus. In other words, the time and category information may or may not be synergistic or redundant, and the timing (category) of each individual pattern may or may not be correlated with other pattern time positions (pattern categories) or even with pattern categories (pattern time positions). In addition, conditions 17 may also encompass situations in which some information about **S**_{C} (**S**_{T}) is carried by **T** (**C**), but not by **C** (**T**).

In order to see how synergy and redundancy behave in a canonical feature extractor, we substitute Eqs. 18 into Eq. 8, and obtain

$$I(\mathbf{\text{B}};\mathbf{\text{S}})=I(\mathbf{\text{T}};{\mathbf{\text{S}}}_{\text{T}})+I(\mathbf{\text{C}};{\mathbf{\text{S}}}_{\text{C}})+{\Delta}_{SR}.$$

(19)

We find that, for a canonical feature extractor, the synergy/redundancy ${\Delta}_{SR}$ is lower-bounded by

$$-I({\mathbf{\text{S}}}_{\text{T}};{\mathbf{\text{S}}}_{\text{C}})\le {\Delta}_{SR};$$

(20)

(see proof in Appendix G). In other words, the synergy/redundancy term Δ_{SR} cannot be smaller than the redundancy – already present in the stimulus – between the timing and categories of stimulus features. In addition, the absence of redundancy in the stimulus (*I*(**S**_{T}; **S**_{C})=0) constrains the neural model to be non-redundant (Δ_{SR}≥ 0).

Consider a neural model in which **T** = *f*(**S**_{T}; ψ_{T}) and **C** = *g*(**S**_{C}; ψ_{C}), where ψ_{T} and ψ_{C} are independent sources of noise, such that *p*(ψ_{T}, ψ_{C}, **S**_{T}, **S**_{C}) = *p*(ψ_{T}) *p*(ψ_{C}) *p*(**S**_{T}) *p*(**S**_{C}).

The independent channels of information may be regarded as the simplest canonical feature extractor. Since **T** and **C** are independent and conditionally independent given **S**, the time and category information add up to the pattern information (Δ_{SR} = 0). An example of this model is shown in Figure 7. In the four simulations carried out, the neural responses consist of a sequence of four different patterns, associated with four different stimulus features, in the presence or absence of temporal jitter and categorical noise (see Section 2.3.2 Simulation 2 for a detailed description; Figure 2B shows examples of the different noise conditions). In Figure 7, the spike information is omitted because it coincides with the pattern information (all cases, one-sided *t*-test, *p*(10) = 0.5). Indeed, by construction, all the information is transmitted by patterns, which can be univocally identified in the response. In agreement with the theoretical results (Eq. 18), the time and the category information are always upper-bounded by the stimulus time and category entropy, respectively (all cases, one-sided *t*-test, *p*(10) > 0.4).
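The independent-channels model can be sketched with small discrete channels (the stimulus and noise distributions below are assumptions made for illustration); conditions 17, the bounds of Eqs. 18 and Δ_{SR} = 0 then hold exactly:

```python
import math
from collections import defaultdict

def H(dist):
    return -sum(p * math.log2(p) for p in dist.values() if p > 0)

def marginal(joint, idx):
    m = defaultdict(float)
    for k, p in joint.items():
        m[tuple(k[i] for i in idx)] += p
    return dict(m)

def mi(joint, a, b):
    return H(marginal(joint, a)) + H(marginal(joint, b)) - H(marginal(joint, a + b))

# Independent channels: T depends on S_T only (temporal jitter), C on S_C
# only (categorical noise); stimulus timing and identity are independent.
p_st = {0: 0.5, 1: 0.5}                     # when: feature in early/late bin
p_sc = {"A": 0.7, "B": 0.3}                 # what: feature identity
jitter   = {0: {0: 0.9, 1: 0.1}, 1: {0: 0.1, 1: 0.9}}              # p(T | S_T)
catnoise = {"A": {"A": 0.8, "B": 0.2}, "B": {"A": 0.2, "B": 0.8}}  # p(C | S_C)

joint = {(st, sc, t, c): p_st[st] * p_sc[sc] * jitter[st][t] * catnoise[sc][c]
         for st in p_st for sc in p_sc for t in (0, 1) for c in ("A", "B")}

i_s_t  = mi(joint, (0, 1), (2,))     # time information I(S;T), S = (S_T, S_C)
i_st_t = mi(joint, (0,),   (2,))     # I(S_T;T)
i_s_c  = mi(joint, (0, 1), (3,))     # category information I(S;C)
i_sc_c = mi(joint, (1,),   (3,))     # I(S_C;C)
delta  = mi(joint, (0, 1), (2, 3)) - i_s_t - i_s_c   # Δ_SR (Eq. 8)
```

Increasing the jitter (flattening p(T | S_T)) degrades only `i_s_t`, while flattening the categorical noise degrades only `i_s_c`, mirroring the panel comparisons of Figure 7.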

**Information transmitted by a canonical feature extractor under different noise conditions**. The left side of each panel shows the stimulus entropy, whereas the right side shows the pattern, time and category information. In all cases, Δ_{SR}= **...**

Comparing the upper and lower panels of Figure 7, we show that the time information is degraded by the addition of temporal jitter (both cases, one-sided *t*-test, *p*(10) < 0.001), while the category information remains constant (both cases, one-sided *t*-test, *p*(10) > 0.14). Analogously, comparing the left and right panels of Figure 7, we find that the addition of categorical noise decreases the category information (both cases, one-sided *t*-test, *p*(10) < 0.001), while keeping the time information constant (Figures 7A,B, *I*(**S**; **T**^{A}) = 223.3 ± 0.1 bits/s, *I*(**S**; **T**^{B}) = 222.8 ± 0.1 bits/s, one-sided *t*-test, *p*(10) = 0.08; Figures 7C,D, one-sided *t*-test, *p*(10) = 0.5). This is expected since, by construction, the categorical noise only depends on the stimulus categories and affects solely the pattern categories, whereas the temporal jitter considered here only affects the pattern time positions, irrespective of their categories or the stimulus.

The example shown in Figure 7 turns out to be more complicated if the pattern timing depends on the pattern category, as occurs in latency codes (Gawne et al., 1996; Furukawa and Middlebrooks, 2002; Chase and Young, 2007; Gollisch and Meister, 2008). Indeed, in those cases, the comparison between the timing of response patterns and the timing of stimulus features carries information about the stimulus categories (*I*(**S**_{C}; **T**|**S**_{T})>0). As a result, Eq. 17a does not hold. Latency codes may be an intrinsic property of the encoding neuron, may result as a consequence of synaptic transmission (Lisman, 1997; Reinagel et al., 1999), or may arise from the convention used to construct the pattern representation, for example, ascribing the timing of a pattern to the mean response time, the first or any other spike inside the pattern (Nelken et al., 2005; Eyherabide et al., 2008). In all these cases, a latency-like dependence between the time positions and categories of patterns may arise.

To assess the effect of different latencies associated with each pattern category on the neural response, consider the neural model used in Figure 7, except that now, the pattern latencies vary with the pattern category *b*, according to μ_{b}=1 +α_{μ} (4 −*b*). Here α_{μ} is the *latency index*, representing the difference between the latencies of consecutive pattern categories. Three values of α_{μ} were considered: 0, 2, and 4ms. When α_{μ}=0ms, all patterns have the same latencies. This case was analyzed in Figure 7. As α_{μ} increases, so does the latency difference between different patterns.
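The coupling *I*(**S**_{C}; **T**|**S**_{T})>0 induced by category-dependent latencies can be illustrated with a minimal sketch (assumed equiprobable feature times and no jitter; all values are illustrative, not those of the simulations):

```python
import math
from collections import defaultdict

def mi(joint):
    """Mutual information (bits) of a dict {(x, y): p}."""
    px, py = defaultdict(float), defaultdict(float)
    for (x, y), p in joint.items():
        px[x] += p
        py[y] += p
    return sum(p * math.log2(p / (px[x] * py[y]))
               for (x, y), p in joint.items() if p > 0)

def latency_timing_info(alpha):
    """I(S_C; T | S_T) for the deterministic latency rule mu_b = 1 + alpha*(4 - b),
    with four equiprobable categories and two equiprobable feature times."""
    total = 0.0
    for s_t in (0.0, 10.0):              # assumed feature times
        joint = defaultdict(float)
        for b in (1, 2, 3, 4):           # pattern categories
            t = s_t + 1 + alpha * (4 - b)   # pattern time = feature time + latency
            joint[(b, t)] += 0.25
        total += 0.5 * mi(dict(joint))   # average over S_T
    return total

# Equal latencies: timing carries nothing about the category.
assert latency_timing_info(0) < 1e-12
# Distinct latencies: timing fully reveals the category (2 bits here).
assert abs(latency_timing_info(2) - 2.0) < 1e-12
```

With jitter, the conditional information would lie between these two extremes; the point is only that it is strictly positive whenever α_{μ}>0.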

Due to the deterministic link between the pattern latencies and pattern categories, the pattern representations (**B**^{0}, **B**^{2}, and **B**^{4}) associated with the different values of α_{μ} are related bijectively. In addition, the category representation does not depend on α_{μ}. Only the time representation is altered by a change in the latency index, irrespective of the presence or absence of temporal jitter and categorical noise. Therefore, any change in the time information is immediately reflected in the synergy/redundancy term

$${\Delta}_{SR}^{x}=I(\mathbf{\text{S}};{\mathbf{\text{T}}}^{0})-I(\mathbf{\text{S}};{\mathbf{\text{T}}}^{x}).$$

(21a)

$$=-\left[H({\mathbf{\text{T}}}^{x})-H({\mathbf{\text{T}}}^{0})\right]+\left[H({\mathbf{\text{T}}}^{x}|\mathbf{\text{S}})-H({\mathbf{\text{T}}}^{0}|\mathbf{\text{S}})\right].$$

(21b)

Here, ${\Delta}_{SR}^{x}$ and **T**^{x} represent the synergy/redundancy term and the time representation, respectively, for α_{μ}=*x*ms.
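Eq. 21 can be verified numerically on a minimal single-pattern model (two feature times, two categories, and an assumed categorical-noise probability *q*=0.2; all values are illustrative, not those of Simulation 2):

```python
import math
from collections import defaultdict

def entropy(dist):
    """Entropy (bits) of a dict of probabilities."""
    return -sum(p * math.log2(p) for p in dist.values() if p > 0)

def timing_model(alpha, q=0.2):
    """Joint P(s, t) for one pattern per trial: s = (feature time, category b),
    t = feature time + latency of the noisy pattern category c."""
    joint = defaultdict(float)
    for s_t in (0.0, 10.0):                       # equiprobable feature times
        for b in (1, 2):                          # equiprobable categories
            for c, p_c in ((b, 1 - q), (3 - b, q)):   # categorical noise flips b
                t = s_t + 1 + alpha * (2 - c)     # latency mu_c = 1 + alpha*(2 - c)
                joint[((s_t, b), t)] += 0.25 * p_c
    return joint

def time_entropies(joint):
    """Return H(T) and H(T | S) from the joint over (s, t)."""
    pt, ps = defaultdict(float), defaultdict(float)
    for (s, t), p in joint.items():
        pt[t] += p
        ps[s] += p
    return entropy(pt), entropy(joint) - entropy(ps)   # H(T|S) = H(S,T) - H(S)

h0, h0s = time_entropies(timing_model(alpha=0))
hx, hxs = time_entropies(timing_model(alpha=2))
lhs = (h0 - h0s) - (hx - hxs)        # I(S; T^0) - I(S; T^x)   (Eq. 21a)
rhs = -(hx - h0) + (hxs - h0s)       # Eq. 21b
assert abs(lhs - rhs) < 1e-12
```

In this toy instance the two terms of Eq. 21b act in opposite directions, exactly as the trade-off described below.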

The impact of different latencies is twofold. In the first place, the presence of categorical noise increments the temporal noise through the deterministic link between latencies and categories. Therefore, the time noise entropy (time information) when α_{μ}>0 is greater (less) than that when α_{μ}=0. However, this does not occur when the time and category representations are read out simultaneously. Indeed, given the category representation, any time representation for α_{μ}=*x*>0 can be univocally determined from the time representation for α_{μ}=0, and vice versa, counteracting the effect of the temporal noise. Therefore, the variation in the time noise entropy (*H*(**T**^{x}|**S**) −*H*(**T**^{0}|**S**) in Eq. 21) can be regarded as a source of synergy.

In the second place, the variation in the latencies modifies the inter-pattern time interval distribution, incrementing the time total entropy (and the time information) when α_{μ}>0 with respect to the case when α_{μ}=0. In addition, this variation introduces information about the pattern categories in the inter-pattern time interval, and consequently it also introduces information about the stimulus identities. For example, a short interval between two consecutive patterns indicates that the second pattern belongs to a category with a short latency. In consequence, the increment in the time total entropy (*H*(**T**^{x}) −*H*(**T**^{0}) in Eq. 21) can be regarded as a source of redundancy.

To illustrate these theoretical inferences, the results of the simulations are shown in Figure 8. As expected, when α_{μ}=*x*>0, the latencies alter the time information. However, they alter neither the pattern nor the category information, and thus any variation in the time information is compensated by an opposite variation in the synergy/redundancy term. Notice that the changes in the time information not only depend on the latency index, but also on the presence of temporal and categorical noise. Indeed, in the absence of categorical noise, *H*(**T**^{x}|**S**) =*H*(**T**^{0}|**S**) =0, and thus Δ_{SR}≤0. The effect of the temporal jitter depends on its distribution as well as the distribution of the inter-pattern time intervals, so this analysis is left for future work.

**Examples of departures from the behavior of the canonical feature extractor: The effect of pattern-category dependent latencies**. *From left to right*: Absence (**A,C**) and presence (**B,D**) of categorical noise. *From top to bottom*: Absence (**A,B**) and presence **...**

In these examples we see that for non-canonical feature extractors, one can no longer say that the pattern categories represent the *what* in the stimulus and the pattern timings represent the *when*, not even in the absence of synergy/redundancy. As shown in Eq. 21, Δ_{SR} results from a complex trade-off between the effect of categorical noise on the total and noise time response entropies. This trade-off depends on the latency index and the amount of temporal noise in the system, as shown in Figure 8.

Latency-like effects may be involved in a translation from a pattern duration code into an inter-spike interval code (Reich et al., 2000; Denning and Reinagel, 2005). Indeed, bursts may increase the reliability of synaptic transmission (Lisman, 1997), making postsynaptic firing more probable at the end of the burst. In that case, the duration of the burst determines the latency of the postsynaptic firing. In particular, this indicates that bursts can be simultaneously involved in noise filtering and stimulus encoding, in spite of the belief that these two functions cannot coexist (Krahe and Gabbiani, 2004). Notice that here, latency codes have been studied for well-separated stimuli. However, if patterns are elicited close enough in time, they may interfere in a diversity of manners (Fellous et al., 2004), precluding the code from being read out. Although we cannot address all these cases in full generality, the framework proposed here remains valid for each particular case.

In this paper, we have focused on the analysis of temporal and categorical aspects, both in the stimulus and the response. Our results, however, are also applicable to other aspects. In the case of responses, these aspects can be latencies, spike counts, spike-timing variability, autocorrelations, etc. Examples of stimulus aspects are color, contrast, orientation, shape, pitch, position, etc. The only requirement is that the considered aspects be obtained as transformations of the original representation, as defined in Section 2.1 (see Section 4.2). The information transmitted by generic aspects can be analyzed by replacing **B** (**S**) with a vector representing the selected response (stimulus) aspects. The amount of synergy/redundancy between aspects is obtained from the comparison between the simultaneous and individual readings of the aspects. In addition, the results can be generalized for aspects defined as statistical (that is, non-deterministic) transformations of the neural response, or of the stimulus. The data processing inequality also holds in those cases (Cover and Thomas, 1991).

In this paper, we defined the category and the time information in terms of properties of the neural response. The category (time) information is the mutual information between the *whole stimulus* **S** and the categories **C** (timing **T**) of response patterns (see Figure 9A). These definitions only require the neural response to be structured in patterns. No requirement is imposed on the stimulus, i.e., the stimulus need not be divided into features. Our definitions, hence, are not symmetric in the stimulus and the response. In some cases, however, the stimulus is indeed structured as a sequence of features. One may ask how the stimulus identity (the *what*) and timing (the *when*) are encoded in the neural response (see Figure 9B). To that end, we defined the *what* in the stimulus in terms of the category representation (**S**_{C}), and the *when*, in terms of the time representation (**S**_{T}).

**Analysis of the role of spike patterns: relationship with the what and the when in the stimulus**.

These rigorous definitions allowed us to disentangle how the *what* and the *when* in the stimulus are encoded in the category and time representations of the neural response. We calculated the mutual information rates between different aspects of the stimulus and different aspects of the neural response (see Figure 9C). In the standard view, the pattern categories are assumed to encode the *what* in the stimulus, and the timing of patterns, the *when* (Theunissen and Miller, 1995; Borst and Theunissen, 1999; Martinez-Conde et al., 2002; Krahe and Gabbiani, 2004; Alitto et al., 2005; Oswald et al., 2007; Eyherabide et al., 2008). These assumptions have been stated in qualitative terms. There are two different ways in which the standard view can be formalized as a precise assertion.

On one hand, the standard view can be seen as the assumption that the category (time) representation only conveys information about the *what* (the *when*). Evaluating this assumption involves the comparison between the information conveyed by the category (time) representation about the whole stimulus (dotted lines in Figure 9C) with the information that this same representation conveys about the *what* (the *when*) in the stimulus (solid lines in Figure 9C). Formally, this means addressing whether *I*(**S**; **C**)= *I*(**S**_{C}; **C**) (whether *I*(**S**; **T**)= *I*(**S**_{T}; **T**)). In this sense, we say that the category (time) information only represents the *what* (the *when*) in the stimulus. A system complying with this first interpretation of the standard view was called a canonical feature extractor (see Section 3.7).

On the other hand, the second way to define the standard view rigorously is to assume that the *what* (the *when*) is completely encoded by the category (time) representation. Testing this second assumption involves the comparison between the information about the *what* (the *when*), conveyed by the category (time) representation (solid lines in Figure 9C) and by the pattern representation of the neural response (dashed lines in Figure 9C). Formally, it involves assessing whether *I*(**S**_{C}; **B**)= *I*(**S**_{C}; **C**) (whether *I*(**S**_{T}; **B**)= *I*(**S**_{T}; **T**)). In this sense, we say that all the information about the *what* (the *when*) in the stimulus is encoded in the category (time) representation of the neural response. A system for which these equalities hold is called a canonical feature interpreter. It is analogous to the canonical feature extractor, with the role of the stimulus and the response interchanged (see Appendix H).
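Both sets of defining equalities can be checked numerically on a factorized toy code (assumed noise levels; a sketch, not the original simulation code), in which **T** is a noisy copy of **S**_{T} and **C** a noisy copy of **S**_{C}:

```python
import itertools
import math
from collections import defaultdict

def mi(joint):
    """Mutual information (bits) of a dict {(x, y): p}."""
    px, py = defaultdict(float), defaultdict(float)
    for (x, y), p in joint.items():
        px[x] += p
        py[y] += p
    return sum(p * math.log2(p / (px[x] * py[y]))
               for (x, y), p in joint.items() if p > 0)

def project(joint4, fx, fy):
    """Collapse a dict over (s_t, s_c, t, c) to a pairwise joint."""
    out = defaultdict(float)
    for k, p in joint4.items():
        out[(fx(k), fy(k))] += p
    return dict(out)

# Factorized toy code: T depends only on S_T, C only on S_C, with S_T and
# S_C independent fair bits (noise levels are illustrative).
jitter, catnoise = 0.1, 0.2
joint4 = {}
for s_t, s_c, t, c in itertools.product((0, 1), repeat=4):
    p_t = 1 - jitter if t == s_t else jitter
    p_c = 1 - catnoise if c == s_c else catnoise
    joint4[(s_t, s_c, t, c)] = 0.25 * p_t * p_c

S = lambda k: (k[0], k[1]); S_T = lambda k: k[0]; S_C = lambda k: k[1]
T = lambda k: k[2]; C = lambda k: k[3]; B = lambda k: (k[2], k[3])

# Canonical feature extractor: I(S; C) = I(S_C; C) and I(S; T) = I(S_T; T).
assert abs(mi(project(joint4, S, C)) - mi(project(joint4, S_C, C))) < 1e-12
assert abs(mi(project(joint4, S, T)) - mi(project(joint4, S_T, T))) < 1e-12
# Canonical feature interpreter: I(S_C; B) = I(S_C; C) and I(S_T; B) = I(S_T; T).
assert abs(mi(project(joint4, S_C, B)) - mi(project(joint4, S_C, C))) < 1e-12
assert abs(mi(project(joint4, S_T, B)) - mi(project(joint4, S_T, T))) < 1e-12
```

In this fully factorized case the system is both an extractor and an interpreter; breaking the factorization (e.g., category-dependent latencies) violates the equalities independently.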

The two formalizations of the standard view are complementary. The first one assesses how different aspects of the stimulus are encoded in each aspect of the neural response. The second one focuses on how each aspect of the stimulus is encoded in different aspects of the neural response. Thus, the second approach is a symmetric version of the first one. However, a canonical feature extractor might or might not be a canonical feature interpreter, and vice versa. A perfect correspondence between the *what* and the *when* on one side, and pattern timing and categories, on the other, is found for systems that are canonical feature extractors and canonical feature interpreters, simultaneously.

In order to understand a neural code, one needs to identify those aspects of the neural response that are relevant to information transmission. To that aim, two different paradigms have been used: *criterion I*, assessing the information that one aspect conveys about the stimulus, and *criterion II*, assessing the information loss due to ignoring that aspect (see Section 3.5). Previous studies have used *criterion I* to analyze the relevance of spike counts (Furukawa and Middlebrooks, 2002; Foffani et al., 2009), spike patterns (Reinagel et al., 1999; Eyherabide et al., 2008), and pattern timing (Denning and Reinagel, 2005; Gaudry and Reinagel, 2008; Eyherabide et al., 2009). However, when assessing the relevance of the complementary aspects, such as spike timing and internal structure of patterns, these studies have used *criterion II*. As a result, in these studies the relevance of the tested aspect is conditioned on the irrelevance of the other aspects.

There are cases where building a representation that preserves a definite response aspect is not evident (nor perhaps possible). Such is the case, for example, when assessing the differential roles of spike timing and spike count: It is not possible to build a representation preserving the timing of the spikes without preserving the spike count (see Section 3.3). It is instead possible to only preserve the spike count. Since the spike-count representation is a function of the spike-timing representation, one may argue that there is an intrinsic hierarchy between the two aspects. The same situation is encountered when evaluating the information encoded by the pattern representation, as compared to the spike representation (see Section 3.1). There, it was not possible to construct a representation only containing those aspects that had been discarded in the pattern representation. However, this is not the case when evaluating the differential role between pattern timing and pattern categories, or the relevance of a specific pattern category.

In the present study, we take advantage of both approaches. Firstly, we notice that pattern timing and pattern categories are complementary response aspects, and quantify the information preserved by each aspect (see Section 3.4). Then, we determine whether there is synergy or redundancy between the time and category information, which is formally equivalent to comparing the information preserved by (*criterion I*) and lost due to ignoring (*criterion II*) each of the two aspects. As a result, we gain insight on the relevance of each aspect as well as how the aspects interact to transmit information (see Section 4.4). These procedures can be extended to encompass any two different aspects of the neural response (see Section 4.5).

Notice that the role of correlations, both in time and/or across neurons, has been evaluated using *criterion II* (Brenner et al., 2000; Dayan and Abbott, 2001; Nirenberg et al., 2001; Petersen et al., 2002; Schneidman et al., 2003; Montemurro et al., 2007). However, these authors did not build two complementary representations of the neural response ignoring and preserving the correlations, as proposed here. Instead, they ignored correlations by constructing artificial neural responses (or artificial response probabilities) where different neurons were independent or conditionally independent. Thus, their analysis involves a comparison between the real and the artificial neural code. Our analysis, instead, is completely based on complementary reductions of the real neural response. Moreover, in previous studies, the artificial neural responses are not a transformed version of the real response in a well defined time window. Thus, in some cases, the difference between the information with and without preserving correlations is not guaranteed to be non-negative by the *data processing inequality* (Cover and Thomas, 1991).

In some previous studies, the information encoded by different response aspects was assessed, as here, by transforming, through a function, each neural response window (**R**_{τ}) of size τ into the pattern representation (**B**_{τ}) (Furukawa and Middlebrooks, 2002; Petersen et al., 2002; Nelken et al., 2005; Gollisch and Meister, 2008). Examples of those response aspects are the first spike latencies, spike counts, spike-timing variabilities and first (second, third, etc.) spikes in a pattern. As a result, the information carried by the individual response aspects cannot be greater than that provided by the neural response in the same window (*I*(**S**; **B**_{τ})≤ *I*(**S**; **R**_{τ})), irrespective of the length τ (see Section 2.2). In other studies, however, the pattern representation was obtained by transforming the spike representation inside a sliding window of variable length: the length of the window depended on the category of the actual pattern. The resulting representation (here denoted **B̃**_{τ}) was then read out with time windows of size τ. That is the case, for example, when addressing the information conveyed by inter-spike intervals of length >38ms using words of length τ=14.8ms (Reich et al., 2000) and by patterns of length >104, >10, and >56ms, using time windows up to 64, 3.2, and 16ms, respectively (Reinagel and Reid, 2000; Eyherabide et al., 2008; Gaudry and Reinagel, 2008). Unlike the first approach, in this case the *data processing inequality* does not apply, since **B̃**_{τ} is not a function of **R**_{τ}. Therefore, *I*(**S**; **B̃**_{τ}) can be larger or smaller than *I*(**S**; **R**_{τ}). However, when τ→∞, **B̃**_{τ}=**B**_{τ}, so asymptotically, both approaches coincide.
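The guarantee provided by the first approach follows directly from the data processing inequality, which can be demonstrated on an arbitrary joint distribution; the reduction `f` below is a hypothetical deterministic transformation, standing in for any of the aspect-extracting functions above:

```python
import math
import random
from collections import defaultdict

def mi(joint):
    """Mutual information (bits) of a dict {(x, y): p}."""
    px, py = defaultdict(float), defaultdict(float)
    for (x, y), p in joint.items():
        px[x] += p
        py[y] += p
    return sum(p * math.log2(p / (px[x] * py[y]))
               for (x, y), p in joint.items() if p > 0)

random.seed(0)
# A random joint distribution P(s, r): 4 stimuli, 8 response windows.
w = [[random.random() for _ in range(8)] for _ in range(4)]
z = sum(map(sum, w))
joint_sr = {(s, r): w[s][r] / z for s in range(4) for r in range(8)}

# Any deterministic reduction f(R_tau) -> B_tau, e.g. a coarse binning
# of the response (hypothetical; the choice of f is irrelevant).
f = lambda r: r % 3
joint_sb = defaultdict(float)
for (s, r), p in joint_sr.items():
    joint_sb[(s, f(r))] += p

# Data processing inequality: I(S; B_tau) <= I(S; R_tau).
assert mi(dict(joint_sb)) <= mi(joint_sr) + 1e-12
```

No such guarantee exists for the sliding-window construction, precisely because it is not a function of **R**_{τ}.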

One of the main goals of the analysis of the neural code is to identify the response aspects that are relevant to information transmission. In this context, two important questions arise: how relevant the chosen aspects are, and how autonomously they stand. Their relevance to information transmission is assessed with information-theoretical measures, as exemplified here with the category and time information (see Section 4.2). Their autonomy refers to whether each aspect transmits information by itself or not, and whether the transmitted information is shared by other aspects or not. The degree of autonomy is assessed by quantifying the synergy/redundancy term (Δ_{SR}) between the different aspects.

The concept of synergy/redundancy entails the comparison between the effect of the whole and the sum of the individual effects of the constituent parts. The concept requires the constituent parts to be univocally determined by the whole, as well as the whole to be completely determined given its constituent parts. In other words, the whole and the constituent parts must be related through a bijective function. In neuroscience, the synergy/redundancy between groups of neurons has been addressed by comparing the information carried by the group of neurons (the whole) and the sum of the information of each and every neuron from the group (the constituent parts) (Brenner et al., 2000; Schneidman et al., 2003). As a result, Δ_{SR} can be interpreted as a trade-off between synergy and redundancy (Schneidman et al., 2003).

Intuitively, the presence of synergy (Δ_{SR}>0) between two aspects indicates that, for many responses, the aspects must be read out simultaneously in order to obtain information about the stimulus. For some specific responses, however, one of the aspects may be enough to identify the stimulus. But on average, aspects cooperate. On the other hand, the presence of redundancy (Δ_{SR}<0) indicates that, for many responses, the information conveyed by both aspects overlaps. Therefore, some of the information that can be extracted from one aspect taken alone can also be extracted from the other aspect taken alone. There might still be a few individual responses for which it is necessary to read both aspects simultaneously to obtain information about the stimulus. But on average, messages tend to be replicated in the different aspects.
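The two regimes can be made concrete with minimal textbook examples (XOR for synergy, duplication for redundancy; these toy channels are illustrative, not taken from the simulations):

```python
import math
from collections import defaultdict

def mi(joint):
    """Mutual information (bits) of a dict {(x, y): p}."""
    px, py = defaultdict(float), defaultdict(float)
    for (x, y), p in joint.items():
        px[x] += p
        py[y] += p
    return sum(p * math.log2(p / (px[x] * py[y]))
               for (x, y), p in joint.items() if p > 0)

def delta_sr(joint_stc):
    """Delta_SR = I(S; T, C) - I(S; T) - I(S; C) for a dict {(s, (t, c)): p}."""
    st, sc = defaultdict(float), defaultdict(float)
    for (s, (t, c)), p in joint_stc.items():
        st[(s, t)] += p
        sc[(s, c)] += p
    return mi(joint_stc) - mi(dict(st)) - mi(dict(sc))

# Synergy: S = T XOR C with T, C independent fair bits. Each aspect alone
# carries nothing about S; read together they identify it completely.
xor = {(t ^ c, (t, c)): 0.25 for t in (0, 1) for c in (0, 1)}
assert abs(delta_sr(xor) - 1.0) < 1e-12       # +1 bit of synergy

# Redundancy: both aspects are exact copies of a fair bit S, so each alone
# already conveys the full message.
copy = {(s, (s, s)): 0.5 for s in (0, 1)}
assert abs(delta_sr(copy) + 1.0) < 1e-12      # -1 bit of redundancy
```

Real codes typically sit between these extremes, which is why Δ_{SR} is best read as a net trade-off rather than as pure synergy or pure redundancy.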

In the absence of synergy/redundancy (Δ_{SR}=0), the aspects might or might not be independent and conditionally independent given the stimulus (Nirenberg and Latham, 2003; Schneidman et al., 2003). If they are, then both aspects are fully autonomous. However, if they are not, then synergy and redundancy coexist. Some responses might require the simultaneous read out of both aspects. However, for other responses, at least one of the individual aspects might be enough to obtain information about the stimulus. In this case, by considering both aspects separately, one cannot recover the entire encoded information.

The main ideas in this paper can also be extended to encompass any neuron response aspects, different from pattern timing and pattern category. In particular, they allow us to analyze the information conveyed by different types of patterns and the synergy/redundancy between them, extending the formalism derived in Eyherabide et al. (2008). In addition, aspects may also be defined in continuous time (MacKay and McCulloch, 1952). Moreover, any neural population response can be represented as a single sequence of colored spikes, each color indicating the neuron that fired the spike (Brown et al., 2004). Therefore, single neuron codes and population codes can be analyzed under the same formalism.

For example, during the last decades, many studies have focused on assessing whether different neurons transmit information about different stimulus aspects (Gawne et al., 1996; Denning and Reinagel, 2005; Eyherabide et al., 2008). To that aim, different neurons (and different neural response aspects) have been interpreted as information channels (Dan et al., 1998; Montemurro et al., 2008; Krieghoff et al., 2009), often addressing whether they constitute independent channels of information (see Section 3.7). However, these studies have focused on whether the two aspects (or neurons) are independent and conditionally independent given the stimulus (Gawne and Richmond, 1993; Schneidman et al., 2003). Indeed, in this case, the response aspects constitute independent channels of information. However, these conditions do not identify which stimulus aspects are encoded by different neurons, nor do they guarantee that they are independent.

To gain insight into the relation between stimulus and response aspects, we determine whether the neuron constitutes a canonical feature extractor and/or a canonical feature interpreter. For independent channels of information, the response aspects are canonical feature extractors, canonical feature interpreters, and also independent and conditionally independent given the stimulus. However, none of these conditions can be derived from the others. In effect, a canonical feature extractor or a canonical feature interpreter may or may not exhibit synergy or redundancy between the time and category information (see Section 3.7 and Appendix H). Moreover, even if **T** and **C** are independent and conditionally independent given the stimulus, **T** (**C**) may still convey information about **S**_{C} (**S**_{T}) once the information about **S**_{T} (**S**_{C}) has been read out. Formally, each of the equalities defining a canonical feature extractor or a canonical feature interpreter constitutes a relation between one aspect of the stimulus and one aspect of the response. Such relations cannot be derived from the independence or conditional independence between two aspects of the response. For the same reason, the *what* and the *when* are not guaranteed to be independent aspects.

Finally, the analysis performed in this work relies on the mutual information between the stimulus and different aspects of neural response, and thus it is related to both the encoding operation and the decoding operation (Shannon, 1948; Brown et al., 2004; Nelken and Chechik, 2007). Indeed, the mutual information is symmetric by definition (Cover and Thomas, 1991). Therefore, one can interpret the information between the stimulus and a particular aspect of the neural response from both points of view, characterizing both how the stimulus is encoded into a specific response aspect (Reich et al., 2000; Reinagel and Reid, 2000; Nelken et al., 2005; Eyherabide et al., 2008; Gollisch and Meister, 2008) and what can be inferred about the stimulus from a decoder that only decodes that specific aspect. To this aim, an explicit representation of each response aspect is needed, for example, as defined in Section 2.1. Such representations are not always available, as for example, in studies assessing the role of correlations (see Section 4.2).

The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

This work was supported by the Alexander von Humboldt Foundation, the Consejo de Investigaciones Científicas y Técnicas, the Agencia de Promoción Científica y Tecnológica of Argentina, and Comisión Nacional de Energía Atómica.

The categorical noise is characterized by the probability *P*_{b}(*b*|*s*) that a stimulus feature *s* elicits a pattern response of category *b*. This probability is related to *P*_{r}(**r**|*s*), the probability of inducing the response **r** due to the feature *s*, according to

$${P}_{\text{b}}(b|s)={\displaystyle \sum _{\text{r}}{P}_{\text{r}}(\mathbf{\text{r}}|s),}$$

(A-1)

where the sum runs through all patterns of spikes **r** whose category is *b*.

Previous studies have shown that the information per unit time carried by the spike count decreases with the size of the response time window (Petersen et al., 2002; Montemurro et al., 2007). In this appendix, we formally prove this result and also that the information per unit time vanishes in the limit of long windows. We extend its validity not only to spikes, but to any response patterns, as defined in Section 2.1, irrespective of the number of pattern categories. To that aim, consider a representation η that only preserves the number of patterns in each response segment **R**_{τ} of length τ (Figure 5A). In this representation, two response stretches ${\mathbf{\text{R}}}_{\tau}^{1}$ and ${\mathbf{\text{R}}}_{\tau}^{2}$ are different if and only if they contain a different number of patterns $(\eta ({\mathbf{\text{R}}}_{\tau}^{1})\ne \eta ({\mathbf{\text{R}}}_{\tau}^{2}));$ otherwise they are equal.

In a real experiment, patterns (and spikes) are not instantaneous (MacKay and McCulloch, 1952). Thus, without loss of generality, consider the time divided into time bins of size Δ*t* shorter than the shortest pattern. The number of events present in any response stretch **R**_{w} of length *w* bins is bounded by 0 ≤η_{w}≤*w*, and therefore *H*(η_{w}) ≤log(*w*+1). Hence, the entropy rate *H*(η) becomes zero, since

$$H(\eta )=\underset{w\to \infty}{\mathrm{lim}}\frac{H\left({\eta}_{w}\right)}{w}\le \underset{w\to \infty}{\mathrm{lim}}\frac{\mathrm{log}(w+1)}{w}=0.$$

(B-1)

As a result, the information rate carried by η about any other random variable vanishes. In particular, *I*(**S**;η) ≤*H*(η) =0. The result is valid for response patterns of any nature (see Section 2.1 for the definition and examples of patterns).
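The bound of Eq. B-1 is easy to verify numerically; the sketch below also computes the exact count entropy under an assumed Bernoulli firing model (one pattern per bin with probability *p*; the model and its parameters are illustrative, not those of Section 2.3):

```python
import math

# The bound H(eta_w) <= log2(w + 1) forces the count entropy *rate* to zero.
bound = lambda w: math.log2(w + 1) / w
rates = [bound(w) for w in (10, 100, 1000, 10000)]
assert all(a > b for a, b in zip(rates, rates[1:]))   # monotonically decreasing
assert rates[-1] < 0.002                              # already near zero

def count_entropy(w, p):
    """Entropy (bits) of the pattern count when each of w bins holds a
    pattern independently with probability p (a Binomial(w, p) count)."""
    h = 0.0
    for k in range(w + 1):
        log2pk = ((math.lgamma(w + 1) - math.lgamma(k + 1)
                   - math.lgamma(w - k + 1)) / math.log(2)
                  + k * math.log2(p) + (w - k) * math.log2(1 - p))
        h -= 2.0 ** log2pk * log2pk
    return h

# The actual count entropy sits strictly below the log2(w + 1) bound of Eq. B-1.
assert count_entropy(100, 0.3) < math.log2(101)
```

Any count-only representation is therefore asymptotically uninformative per unit time, regardless of the pattern definition.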

In this appendix, we prove that the information per unit time transmitted by the response set of pattern categories decreases with the length of the response time window, and it vanishes in the limit of long time windows. To this aim, we consider a representation Θ in which two response segments are indistinguishable if and only if they have the same pattern categories, irrespective of their temporal ordering (see Figure 5B). Hence, two neural responses can be different in the category representation and equal in the Θ representation. Analogously to Appendix B, we only assume that the response events are not instantaneous.

We first prove the result for the case where the number |Σ_{b}| of possible different pattern categories is finite; a neural response **B** may be composed of several response patterns. This is indeed the most frequent situation in real neural systems (MacKay and McCulloch, 1952), valid for all the examples of pattern-based codes mentioned in Section 2.1 and throughout this paper. Consider that the neural response is read with words of length *w* bins, each bin smaller than the shortest pattern. The number of patterns is bounded by 0 ≤η_{w}≤*w* (see Appendix B). In addition, each response pattern may belong to one out of |Σ_{b}| pattern categories. Thus, the number of possible different responses Θ_{w} in the representation Θ is upper-bounded by $\left|{\Theta}_{w}\right|\le {(w+1)}^{\left|{\Sigma}_{b}\right|}$ (Cover and Thomas, 1991). As a result, its entropy is upper-bounded by *H*(Θ_{w}) ≤log |Θ_{w}|, and its entropy rate is

$$H(\Theta )=\underset{w\to \infty}{\mathrm{lim}}\frac{H({\Theta}_{w})}{w}\le \underset{w\to \infty}{\mathrm{lim}}\left|{\Sigma}_{b}\right|\frac{\mathrm{log}(w+1)}{w}=0.$$

(C-1)

Therefore, there is no mutual information rate between the response set of pattern categories and any other random variable. Particularly, *I*(**S**;Θ)=0.

We now generalize the result to infinite codes, under the only condition that patterns belonging to different categories have different durations. These codes can be regarded as academic examples since, in any real condition, they would be impractical due to the long time periods required to read out the codewords. Examples of such infinite codes are burst codes with no restriction on their duration, inter-spike intervals or latencies divided into an infinite number of finite ranges, and the number of spikes in arbitrarily long response time windows. In a neural response of size *w* bins, only patterns up to length *w* can be read (see Section 4.3 for examples). In addition, a neural response may contain several patterns. Thus, the sum of the lengths of the patterns cannot be greater than the length of the response containing them. Under these conditions, the number of response sets of pattern categories |Θ_{w}| is upper-bounded by

$$\left|{\Theta}_{w}\right|\text{\hspace{0.17em}\hspace{0.17em}}\le {\displaystyle \sum _{k=0}^{w}p(k)};$$

(C-2)

where *p*(*k*) represents the number of partitions of the integer number *k*. By using the *Hardy-Ramanujan–Uspensky* asymptotic approximation (Apostol, 1990)

$$\left|{\Theta}_{w}\right|\text{\hspace{0.17em}}\le {\displaystyle \sum _{k=0}^{w}p(k)}\approx \mathbf{\text{C}}+{\displaystyle \sum _{k={k}_{0}}^{w}\frac{{\text{e}}^{A\sqrt{k}}}{\mathbf{\text{B}}k}\le \frac{{\text{e}}^{A\sqrt{w}}}{\mathbf{\text{B}}};}$$

(C-3)

where *A* and **B** are positive constants, *k*_{0} represents an integer beyond which the approximation is valid, $\mathbf{\text{C}}={\sum}_{k=0}^{k={k}_{0}-1}p(k),$ and the right-most inequality is valid for long enough words. Therefore, the entropy rate *H*(Θ) results

$$H(\Theta )=\underset{w\to \infty}{\mathrm{lim}}\frac{\mathrm{log}(|{\Theta}_{w}|)}{w}$$

(C-4a)

$$\text{\hspace{1em}\hspace{1em}\hspace{0.17em}\hspace{0.17em}}\approx \underset{w\to \infty}{\mathrm{lim}}\frac{A\mathrm{log}(\text{e})}{\sqrt{w}}-\underset{w\to \infty}{\mathrm{lim}}\frac{\mathrm{log}(\mathbf{\text{B}})}{w}$$

(C-4b)

$$=0.$$

(C-4c)

Thus, the entropy rate *H*(Θ) tends to zero, and consequently the mutual information rate that the response set of pattern categories can carry about any other random variable vanishes.
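The growth of the bound in Eq. C-2 can be checked with the standard integer-partition recurrence (a sketch; the dynamic program below is the textbook counting scheme, not taken from the paper):

```python
import math

def partitions_upto(w):
    """p(0..w) by the standard recurrence: p[k] counts the integer
    partitions of k (every part size from 1 to w is allowed)."""
    p = [0] * (w + 1)
    p[0] = 1
    for part in range(1, w + 1):
        for k in range(part, w + 1):
            p[k] += p[k - part]
    return p

rates = []
for w in (50, 200, 800):
    total = sum(partitions_upto(w))       # upper bound on |Theta_w|  (Eq. C-2)
    rates.append(math.log2(total) / w)    # entropy-rate bound ~ A*log2(e)/sqrt(w)
# The bound on the entropy rate decreases toward zero, as in Eq. C-4.
assert all(a > b for a, b in zip(rates, rates[1:]))
```

The sub-exponential growth *p*(*k*) ~ e^{A√k} of the Hardy-Ramanujan estimate is what makes the per-bin rate vanish despite |Θ_{w}| being unbounded.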

As mentioned previously, the pattern sequence **B** and the pair (**T**,**C**) carry the same information about the stimulus, since they are related through a bijective transformation. Therefore

$$I(\mathbf{\text{S}};\mathbf{\text{B}})=I(\mathbf{\text{S}};\mathbf{\text{T}},\mathbf{\text{C}})$$

(D-1a)

$$=I(\mathbf{\text{S}};\mathbf{\text{T}})+I(\mathbf{\text{S}};\mathbf{\text{C}}|\mathbf{\text{T}})+I(\mathbf{\text{S}};\mathbf{\text{C}})-I(\mathbf{\text{S}};\mathbf{\text{C}})$$

(D-1b)

$$=I(\mathbf{\text{S}};\mathbf{\text{T}})+I(\mathbf{\text{S}};\mathbf{\text{C}})-(I(\mathbf{\text{S}};\mathbf{\text{C}})-I(\mathbf{\text{S}};\mathbf{\text{C}}|\mathbf{\text{T}}))$$

(D-1c)

$$=I(\mathbf{\text{S}};\mathbf{\text{T}})+I(\mathbf{\text{S}};\mathbf{\text{C}})-I(\mathbf{\text{S}};\mathbf{\text{T}};\mathbf{\text{C}})$$

(D-1d)

$$=I(\mathbf{\text{S}};\mathbf{\text{T}})+I(\mathbf{\text{S}};\mathbf{\text{C}})+{\Delta}_{SR};$$

(D-1e)

where Eq. D-1e is obtained from Eq. D-1d by defining ${\Delta}_{SR}=-I(\mathbf{\text{S}};\mathbf{\text{T}};\mathbf{\text{C}}).$ Here, *I*(*X*; *Y*; *Z*) =*I*(*X*; *Y*) −*I*(*X*; *Y*|*Z*) represents the triple mutual information (Cover and Thomas, 1991; Tsujishita, 1995).
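Decomposition D-1 holds for any discrete joint distribution, so it can be sanity-checked numerically. The sketch below is our own illustration (the helper functions and the random joint over (**S**, **T**, **C**) are not part of the paper); it verifies Eq. D-1e in bits.

```python
import itertools, math, random

def marg(p, idx):
    """Marginal of the joint dict p[state] -> prob over the coords in idx."""
    out = {}
    for x, q in p.items():
        key = tuple(x[i] for i in idx)
        out[key] = out.get(key, 0.0) + q
    return out

def mi(p, a, b):
    """Mutual information I(A;B) in bits; a and b are coordinate-index tuples."""
    pa, pb, pab = marg(p, a), marg(p, b), marg(p, a + b)
    return sum(q * math.log2(q / (pa[k[:len(a)]] * pb[k[len(a):]]))
               for k, q in pab.items() if q > 0)

def cmi(p, a, b, z):
    """Conditional mutual information I(A;B|Z), via the chain rule."""
    return mi(p, a, b + z) - mi(p, a, z)

# Arbitrary joint distribution over (stimulus, pattern timing, pattern categories)
random.seed(0)
states = list(itertools.product(range(2), range(3), range(2)))
w = [random.random() for _ in states]
p = {x: v / sum(w) for x, v in zip(states, w)}

S, T, C = (0,), (1,), (2,)
triple = mi(p, S, C) - cmi(p, S, C, T)   # I(S;T;C) = I(S;C) - I(S;C|T)
delta_sr = -triple                       # definition used in Eq. D-1e
assert abs(mi(p, S, T + C)
           - (mi(p, S, T) + mi(p, S, C) + delta_sr)) < 1e-12
```

The assertion is exact up to rounding: substituting the definition of Δ_{SR} turns the right-hand side back into the chain rule *I*(**S**;**T**) +*I*(**S**;**C**|**T**).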

The synergy/redundancy term (Δ_{SR}), defined in Eq. 9, can be written as

$${\Delta}_{SR}=I(\mathbf{\text{S}};\mathbf{\text{T}}|\mathbf{\text{C}})-I(\mathbf{\text{S}};\mathbf{\text{T}})$$

(E-2a)

$$=I(\mathbf{\text{S}};\mathbf{\text{C}}|\mathbf{\text{T}})-I(\mathbf{\text{S}};\mathbf{\text{C}})$$

(E-2b)

$$=I(\mathbf{\text{T}};\mathbf{\text{C}}|\mathbf{\text{S}})-I(\mathbf{\text{T}};\mathbf{\text{C}}).$$

(E-2c)

Hence, the upper- and lower-bounds of Δ_{SR} are

$$-\mathrm{min}\left\{\begin{array}{c}I(\mathbf{\text{T}};\mathbf{\text{C}})\\ I(\mathbf{\text{S}};\mathbf{\text{T}})\\ I(\mathbf{\text{S}};\mathbf{\text{C}})\end{array}\right\}\le {\Delta}_{SR}\le \mathrm{min}\left\{\begin{array}{c}I(\mathbf{\text{T}};\mathbf{\text{C}}|\mathbf{\text{S}})\\ I(\mathbf{\text{S}};\mathbf{\text{T}}|\mathbf{\text{C}})\\ I(\mathbf{\text{S}};\mathbf{\text{C}}|\mathbf{\text{T}})\end{array}\right\}.$$

(E-3)

In addition, these upper- and lower-bounds are related through Eqs. E-2, in such a way that

$$\mathrm{min}\left\{\begin{array}{c}I(\mathbf{\text{T}};\mathbf{\text{C}})\\ I(\mathbf{\text{S}};\mathbf{\text{T}})\\ I(\mathbf{\text{S}};\mathbf{\text{C}})\end{array}\right\}=I(X;Y)\iff \mathrm{min}\left\{\begin{array}{c}I(\mathbf{\text{T}};\mathbf{\text{C}}|\mathbf{\text{S}})\\ I(\mathbf{\text{S}};\mathbf{\text{T}}|\mathbf{\text{C}})\\ I(\mathbf{\text{S}};\mathbf{\text{C}}|\mathbf{\text{T}})\end{array}\right\}=I(X;Y|Z);$$

(E-4)

where *X*, *Y*, and *Z* represent the variables **S**, **T**, and **C** in any order. This proves the upper- and lower-bounds for the synergy/redundancy term Δ_{SR} of Eq. 10.
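The bounds of Eq. E-3 can likewise be exercised numerically. The following is our own sketch (helper functions and the random joint are illustrative, not the authors' code): it draws a random joint distribution over (**S**, **T**, **C**) and checks both inequalities.

```python
import itertools, math, random

def marg(p, idx):
    """Marginal of the joint dict p[state] -> prob over the coords in idx."""
    out = {}
    for x, q in p.items():
        key = tuple(x[i] for i in idx)
        out[key] = out.get(key, 0.0) + q
    return out

def mi(p, a, b):
    """Mutual information I(A;B) in bits; a and b are coordinate-index tuples."""
    pa, pb, pab = marg(p, a), marg(p, b), marg(p, a + b)
    return sum(q * math.log2(q / (pa[k[:len(a)]] * pb[k[len(a):]]))
               for k, q in pab.items() if q > 0)

def cmi(p, a, b, z):
    """Conditional mutual information I(A;B|Z), via the chain rule."""
    return mi(p, a, b + z) - mi(p, a, z)

# Random joint distribution over (stimulus, pattern timing, pattern categories)
random.seed(1)
states = list(itertools.product(range(2), range(3), range(2)))
w = [random.random() for _ in states]
p = {x: v / sum(w) for x, v in zip(states, w)}

S, T, C = (0,), (1,), (2,)
delta_sr = cmi(p, S, C, T) - mi(p, S, C)             # Eq. E-2b
lower = -min(mi(p, T, C), mi(p, S, T), mi(p, S, C))  # Eq. E-3, left side
upper = min(cmi(p, T, C, S), cmi(p, S, T, C), cmi(p, S, C, T))  # right side
assert lower - 1e-12 <= delta_sr <= upper + 1e-12
```

Each bound follows from one of the three forms in Eqs. E-2 together with the non-negativity of (conditional) mutual information, so the assertion holds for any joint distribution.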

The information that a representation **X** of the neural response conveys about the stimulus can be decomposed as

$$I(\mathbf{\text{S}};\mathbf{\text{X}})=I({\mathbf{\text{S}}}_{\text{T}};\mathbf{\text{X}})+I({\mathbf{\text{S}}}_{\text{C}};\mathbf{\text{X}})+{\Delta}_{SR}^{X};$$

(F-1)

where *I*(**S**_{T}; **X**) is the information conveyed by **X** about the *when* in the stimulus, and *I*(**S**_{C}; **X**) is the information conveyed by **X** about the *what*. Here, ${\Delta}_{SR}^{X}$ represents the synergy/redundancy between the information conveyed about the *when* and the *what*, and it is given by

$${\Delta}_{SR}^{X}=-I({\mathbf{\text{S}}}_{\text{T}};{\mathbf{\text{S}}}_{\text{C}};\mathbf{\text{X}});$$

(F-2)

which is lower-bounded by the redundancy in the stimulus

$${\Delta}_{SR}^{X}\ge -I({\mathbf{\text{S}}}_{\text{T}};{\mathbf{\text{S}}}_{\text{C}}).$$

(F-3)

Tighter upper- and lower-bounds for ${\Delta}_{SR}^{X}$ can be derived analogously to the ones derived for Δ_{SR} (see Eq. 10), as well as analogous conditions for the absence of either synergy or redundancy between *I*(**S**_{T}; **X**) and *I*(**S**_{C}; **X**).

Notice that when ${\Delta}_{SR}^{X}>0,$ the information provided about the stimulus is greater than the sum of the information about *when* and *what* stimulus features happen, i.e.,

$$I(\mathbf{\text{S}};\mathbf{\text{X}})>I({\mathbf{\text{S}}}_{\text{T}};\mathbf{\text{X}})+I({\mathbf{\text{S}}}_{\text{C}};\mathbf{\text{X}}).$$

(F-4)

This may occur, for example, if the latency in the response depends on the feature category. In this case, the information that the time representation **T** carries about the time positions of stimulus features **S**_{T} might be increased due to the knowledge of the feature categories **S**_{C}. In conclusion, there is an information component that is not uniquely related to either *when* or *what*: it refers to both.
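A toy instance makes this latency mechanism concrete. Suppose the stimulus has an independent, uniform time bit *S*_{T} and category bit *S*_{C}, and the response representation *X* is a single latency *X* =*S*_{T} +*S*_{C}, i.e., a hypothetical one-bin extra delay for category 1 (this model is our illustration, not taken from the paper). Then knowing the category resolves the ambiguity in reading the time off the latency:

```python
import itertools, math

def marg(p, idx):
    """Marginal of the joint dict p[state] -> prob over the coords in idx."""
    out = {}
    for x, q in p.items():
        key = tuple(x[i] for i in idx)
        out[key] = out.get(key, 0.0) + q
    return out

def mi(p, a, b):
    """Mutual information I(A;B) in bits; a and b are coordinate-index tuples."""
    pa, pb, pab = marg(p, a), marg(p, b), marg(p, a + b)
    return sum(q * math.log2(q / (pa[k[:len(a)]] * pb[k[len(a):]]))
               for k, q in pab.items() if q > 0)

# Joint over (S_T, S_C, X): independent uniform stimulus bits,
# latency X = S_T + S_C (category-dependent delay)
p = {(st, sc, st + sc): 0.25
     for st, sc in itertools.product((0, 1), (0, 1))}

ST, SC, X = (0,), (1,), (2,)
i_when = mi(p, ST, X)        # I(S_T; X) = 0.5 bit
i_what = mi(p, SC, X)        # I(S_C; X) = 0.5 bit
i_both = mi(p, ST + SC, X)   # I(S; X)   = 1.5 bit
assert i_both > i_when + i_what   # Eq. F-4: 0.5 bit of synergy
```

Individually, the latency carries half a bit about each stimulus aspect (the middle latency value is ambiguous), but jointly the pair (*S*_{T}, *S*_{C}) is perfectly decodable, yielding ${\Delta}_{SR}^{X}=0.5$ bit of synergy.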

In the case that *I*(**S**_{C}; **X**|**S**_{T}) =0, the synergy/redundancy ${\Delta}_{SR}^{X}$ becomes

$${\Delta}_{SR}^{X}=-I({\mathbf{\text{S}}}_{\text{C}};\mathbf{\text{X}}).$$

(F-5)

Thus, *I*(**S**_{C}; **X**) is completely redundant with, and no greater than, *I*(**S**_{T}; **X**). That is,

$$I(\mathbf{\text{S}};\mathbf{\text{X}})=I({\mathbf{\text{S}}}_{\text{T}};\mathbf{\text{X}})+I({\mathbf{\text{S}}}_{\text{C}};\mathbf{\text{X}})+{\Delta}_{SR}^{X}$$

(F-6a)

$$I({\mathbf{\text{S}}}_{\text{C}};\mathbf{\text{X}})+I({\mathbf{\text{S}}}_{\text{T}};\mathbf{\text{X}}|{\mathbf{\text{S}}}_{\text{C}})=I({\mathbf{\text{S}}}_{\text{T}};\mathbf{\text{X}})$$

(F-6b)

$$I({\mathbf{\text{S}}}_{\text{C}};\mathbf{\text{X}})\le I({\mathbf{\text{S}}}_{\text{T}};\mathbf{\text{X}}).$$

(F-6c)

Notice that this is the case of the time representation in a canonical feature extractor. Indeed, the definition of a canonical feature extractor states that *I*(**S**_{C}; **T**|**S**_{T}) =0, and consequently

$${\Delta}_{SR}^{T}=-I({\mathbf{\text{S}}}_{\text{C}};\mathbf{\text{T}}).$$

(F-7)

The implications of condition 17a mentioned in Section 3.7 follow directly from this equation. By interchanging **S**_{T} and **S**_{C} in Eqs. F-5 and F-6, analogous conclusions can be derived for the category representation.

To prove Eq. 20, we expand

$$I(\mathbf{\text{T}},{\mathbf{\text{S}}}_{\text{T}};\mathbf{\text{C}},{\mathbf{\text{S}}}_{\text{C}})=I\left({\mathbf{\text{S}}}_{\text{T}};{\mathbf{\text{S}}}_{\text{C}}\right)+\underset{=0\,(\text{Eq. }17\text{a})}{\underbrace{I\left(\mathbf{\text{T}};{\mathbf{\text{S}}}_{\text{C}}|{\mathbf{\text{S}}}_{\text{T}}\right)}}+\underset{=0\,(\text{Eq. }17\text{b})}{\underbrace{I\left({\mathbf{\text{S}}}_{\text{T}};\mathbf{\text{C}}|{\mathbf{\text{S}}}_{\text{C}}\right)}}+I\left(\mathbf{\text{T}};\mathbf{\text{C}}|{\mathbf{\text{S}}}_{\text{T}},{\mathbf{\text{S}}}_{\text{C}}\right)$$

(G-1a)

$$=I\left(\mathbf{\text{T}};\mathbf{\text{C}}\right)+\underset{\ge 0}{\underbrace{I\left({\mathbf{\text{S}}}_{\text{T}};\mathbf{\text{C}}|\mathbf{\text{T}}\right)}}+\underset{\ge 0}{\underbrace{I\left(\mathbf{\text{T}};{\mathbf{\text{S}}}_{\text{C}}|\mathbf{\text{C}}\right)}}+\underset{\ge 0}{\underbrace{I\left({\mathbf{\text{S}}}_{\text{T}};{\mathbf{\text{S}}}_{\text{C}}|\mathbf{\text{T}},\mathbf{\text{C}}\right)}}.$$

(G-1b)

Applying conditions 17a and 17b, the second and third terms of Eq. G-1a vanish. Since the stimulus **S** comprises **S**_{T} and **S**_{C}, the last term of Eq. G-1a equals *I*(**T**;**C**|**S**); equating Eqs. G-1a and G-1b then gives *I*(**S**_{T};**S**_{C}) +*I*(**T**;**C**|**S**) ≥*I*(**T**;**C**), so the synergy/redundancy between the time and category information (Δ_{SR}) is lower-bounded by

$$-I({\mathbf{\text{S}}}_{\text{T}};{\mathbf{\text{S}}}_{\text{C}})\le {\Delta}_{SR}.$$

(G-2)

This is the bound that we wanted to prove.

We define a *canonical feature interpreter* as a neuron model in which

$$I(\mathbf{\text{C}};{\mathbf{\text{S}}}_{\text{T}}|\mathbf{\text{T}})=0$$

(H-3a)

$$I(\mathbf{\text{T}};{\mathbf{\text{S}}}_{\text{C}}|\mathbf{\text{C}})=0.$$

(H-3b)

Under each of these conditions, the information conveyed by the neural response **B** about the *what* (*I*(**B**; **S**_{C})) and about the *when* (*I*(**B**; **S**_{T})) becomes

$$I\left(\mathbf{\text{B}};{\mathbf{\text{S}}}_{\text{T}}\right)=I\left(\mathbf{\text{T}};{\mathbf{\text{S}}}_{\text{T}}\right)\le H\left({\mathbf{\text{S}}}_{\text{T}}\right)$$

(H-4a)

$$I\left(\mathbf{\text{B}};{\mathbf{\text{S}}}_{\text{C}}\right)=I\left(\mathbf{\text{C}};{\mathbf{\text{S}}}_{\text{C}}\right)\le H\left({\mathbf{\text{S}}}_{\text{C}}\right).$$

(H-4b)

Consequently, the *what* (the *when*) in the stimulus is completely represented in the category (time) representation. In other words, *I*(**S**_{C}; **T**) (*I*(**S**_{T}; **C**)) is completely redundant with *I*(**S**_{C}; **C**) (*I*(**S**_{T}; **T**)), and *I*(**S**_{C}; **T**) ≤*I*(**S**_{C}; **C**) (*I*(**S**_{T}; **C**) ≤*I*(**S**_{T}; **T**)). The canonical feature interpreter is analogous to the canonical feature extractor. In fact, it can be obtained by interchanging the role of the stimulus and the response in Section 3.7.

- Abeles M., Gat I. (2001). Detecting precise firing sequences in experimental data. J. Neurosci. Methods 107, 141–15410.1016/S0165-0270(01)00364-8 [PubMed] [Cross Ref]
- Alitto H. J., Weyand T. H., Usrey W. M. (2005). Distinct properties of stimulus-evoked bursts in the lateral geniculate nucleus. J. Neurosci. 25, 514–52310.1523/JNEUROSCI.3369-04.2005 [PubMed] [Cross Ref]
- Apostol T. M. (1990). Modular Functions and Dirichlet Series in Number Theory, 2nd Edn New York: Springer-Verlag
- Benda J., Longtin A., Maler L. (2005). Spike-frequency adaptation separates transient communication signals from background oscillations. J. Neurosci. 25, 2312–232110.1523/JNEUROSCI.4795-04.2005 [PubMed] [Cross Ref]
- Borst A., Theunissen F. E. (1999). Information theory and neural coding. Nat. Neurosci. 2, 947–95710.1038/14731 [PubMed] [Cross Ref]
- Brenner N., Strong S. P., Koberle R., Bialek W., de Ruyter van Steveninck R. R. (2000). Synergy in a neural code. Neural. Comput. 12, 1531–155210.1162/089976600300015259 [PubMed] [Cross Ref]
- Brown E. N., Kass R. E., Mitra P. P. (2004). Multiple neural spike train data analysis: state-of-the-art and future challenges. Nat. Neurosci. 7, 456–46110.1038/nn1228 [PubMed] [Cross Ref]
- Chase S. M., Young E. D. (2007). First-spike latency information in single neurons increases when referenced to population onset. Proc. Natl. Acad. Sci. U.S.A. 104, 5175–518010.1073/pnas.0610368104 [PubMed] [Cross Ref]
- Cover T. M., Thomas J. A. (1991). Elements of Information Theory. New York: Wiley; 10.1002/0471200611 [Cross Ref]
- Dan Y., Alonso J. M., Usrey W. M., Reid R. C. (1998). Coding of visual information by precisely correlated spikes in the lateral geniculate nucleus. Nat. Neurosci. 1, 501–50710.1038/2217 [PubMed] [Cross Ref]
- Dayan P., Abbott L. F. (2001). Theoretical Neuroscience: Computational and Mathematical Modeling of Neural Systems. Cambridge, MA: MIT Press
- DeBusk B. C., DeBruyn E. J., Snider R. K., Kabara J. F., Bonds A. B. (1997). Stimulus-dependent modulation of spike burst length in cat striate cortical cells. J. Neurophysiol. 78, 199–213 [PubMed]
- Denning K. S., Reinagel P. (2005). Visual control of bursts priming in the anesthetized lateral geniculate nucleus. J. Neurosci. 25, 3531–353810.1523/JNEUROSCI.4417-04.2005 [PubMed] [Cross Ref]
- Eyherabide H. G., Rokem A., Herz A. V. M., Samengo I. (2008). Burst firing is a neural code in an insect auditory system. Front. Comput. Neurosci. 2: 3.10.3389/neuro.10.003.2008 [PMC free article] [PubMed] [Cross Ref]
- Eyherabide H. G., Rokem A., Herz A. V. M., Samengo I. (2009). Bursts generate a non-reducible spike pattern code. Front. Neurosci. 3: 1.10.3389/neuro.01.002.2009 [PMC free article] [PubMed] [Cross Ref]
- Eyherabide H. G., Samengo I. (2010). The information transmitted by spike patterns in single neurons. J. Physiol. (Paris) 104, 147–15510.1016/j.jphysparis.2009.11.018 [PubMed] [Cross Ref]
- Fellous J. M., Tiesinga P. H. E., Thomas P. J., Sejnowski T. J. (2004). Discovering spike patterns in neuronal responses. J. Neurosci. 24, 2989–300110.1523/JNEUROSCI.4649-03.2004 [PMC free article] [PubMed] [Cross Ref]
- Foffani G., Morales M. L., Aguilar J. (2009). Spike timing, spike count, and temporal information for the discrimination of tactile stimuli in the rat ventrobasal complex. J. Neurosci. 29, 5964–597310.1523/JNEUROSCI.4416-08.2009 [PubMed] [Cross Ref]
- Furukawa S., Middlebrooks J. C. (2002). Cortical representation of auditory space: information-bearing features of spike patterns. J. Neurophysiol. 87, 1749–1762 [PubMed]
- Gaudry K. S., Reinagel P. (2008). Information measure for analyzing specific spiking patterns and applications to lgn bursts. Netw. Comput. Neural Syst. 19, 69–9410.1080/09548980701819198 [PubMed] [Cross Ref]
- Gawne T. J., Kjaer T. W., Richmond B. J. (1996). Latency: Another potential code for feature binding in striate cortex. J. Neurophysiol. 76, 1356–1360 [PubMed]
- Gawne T. J., Richmond B. J. (1993). How independent are the messages carried by adjacent inferior temporal cortical neurons? J. Neurosci. 13, 2758–2771 [PubMed]
- Gollisch T., Meister M. (2008). Rapid neural coding in the retina with relative spike latencies. Science 319, 1108–111110.1126/science.1149639 [PubMed] [Cross Ref]
- Gütig R., Sompolinsky H. (2006). The tempotron: a neuron that learns spike timing-based decisions. Nat. Neurosci. 9, 420–42810.1038/nn1643 [PubMed] [Cross Ref]
- Hopfield J. J. (1995). Pattern recognition computation using action potential timing for stimulus representation. Nature 376, 33–3610.1038/376033a0 [PubMed] [Cross Ref]
- Krahe R., Gabbiani F. (2004). Burst firing in sensory systems. Nat. Rev. Neurosci. 5, 13–2310.1038/nrn1296 [PubMed] [Cross Ref]
- Krieghoff V., Brass M., Prinz W., Waszak F. (2009). Dissociating what and when of intentional actions. Front. Hum. Neurosci. 3 [PMC free article] [PubMed]
- Lisman J. E. (1997). Bursts as a unit of neural information: making unreliable synapses reliable. Trends Neurosci. 20, 38–4310.1016/S0166-2236(96)10070-9 [PubMed] [Cross Ref]
- Luna R., Hernández A., Brody C. D., Romo R. (2005). Neural codes for perceptual discrimination in primary somatosensory cortex. Nat. Neurosci. 8, 1210–121910.1038/nn1513 [PubMed] [Cross Ref]
- Mackay D. M., McCulloch W. S. (1952). The limiting information capacity of a neuronal link. Bull. Math. Biophys. 14, 127–13510.1007/BF02477711 [Cross Ref]
- Mancuso K., Hauswirth W. W., Li Q., Connor T. B., Kuchenbecker J. A., Mauck M. C., Neitz J., Neitz M. (2009). Gene therapy for red-green colour blindness in adult primates. Nature 461, 784–78810.1038/nature08401 [PMC free article] [PubMed] [Cross Ref]
- Marsat G., Proville R. D., Maler L. (2009). Transient signals trigger synchronous bursts in an identified population of neurons. J. Neurophysiol. 102, 714–72310.1152/jn.91366.2008 [PubMed] [Cross Ref]
- Martinez-Conde S., Macknik S. L., Hubel D. H. (2000). Microsaccadic eye movements and firing of single cells in the striate cortex of macaque monkeys. Nat. Neurosci. 3, 251–25810.1038/72961 [PubMed] [Cross Ref]
- Martinez-Conde S., Macknik S. L., Hubel D. H. (2002). The function of bursts of spikes during visual fixation in the awake primate lateral geniculate nucleus and primary visual cortex. Proc. Natl. Acad. Sci. U.S.A. 99, 13920–1392510.1073/pnas.212500599 [PubMed] [Cross Ref]
- Montemurro M. A., Panzeri S., Maravall M., Alenda A., Bale M. R., Brambilla M., Petersen R. S. (2007). Role of precise spike timing in coding of dynamic vibrissa stimuli. J. Neurophysiol. 98, 1871–188210.1152/jn.00593.2007 [PubMed] [Cross Ref]
- Montemurro M. A., Rasch M. J., Murayama Y., Logothetis N. K., Panzeri S. (2008). Phase-of-firing coding of natural visual stimuli in primary visual cortex. Curr. Biol. 18, 375–38010.1016/j.cub.2008.02.023 [PubMed] [Cross Ref]
- Nádasdy Z. (2000). Spike sequences and their consequences. J. Physiol. (Paris) 94, 505–52410.1016/S0928-4257(00)01103-7 [PubMed] [Cross Ref]
- Nelken I. (2008). Processing of complex sounds in the auditory system. Curr. Opin. Neurobiol. 18, 413–41710.1016/j.conb.2008.08.014 [PubMed] [Cross Ref]
- Nelken I., Chechik G. (2007). Information theory in auditory research. Hear. Res. 229, 94–10510.1016/j.heares.2007.01.012 [PubMed] [Cross Ref]
- Nelken I., Chechik G., Mrsic-Flogel T. D., King A. J., Schnupp J. W. H. (2005). Encoding stimulus information by spike numbers and mean response time in primary auditory cortex. J. Comput. Neurosci. 19, 199–22110.1007/s10827-005-1739-3 [PubMed] [Cross Ref]
- Nemenman I., Bialek W., de Ruyter van Steveninck R. R. (2004). Entropy and information in neural spike trains: Progress on the sampling problem. Phys. Rev. E 69, 056111.10.1103/PhysRevE.69.056111 [PubMed] [Cross Ref]
- Nirenberg S., Carcieri S. M., Jacobs A. L., Latham P. E. (2001). Retinal ganglion cells act largely as independent decoders. Nature 411, 698–70110.1038/35079612 [PubMed] [Cross Ref]
- Nirenberg S., Latham P. E. (2003). Decoding neuronal spike trains: How important are correlations. Proc. Natl. Acad. Sci. U.S.A. 100, 7348–735310.1073/pnas.1131895100 [PubMed] [Cross Ref]
- Oswald A. M. M., Doiron B., Maler L. (2007). Interval coding. I. burst interspike intervals as indicators of stimulus intensity. J. Neurophysiol. 97, 2731–274310.1152/jn.00987.2006 [PubMed] [Cross Ref]
- Panzeri S., Senatore R., Montemurro M. A., Petersen R. S. (2007). Correcting for the sampling bias problem in spike train information measures. J. Neurophysiol. 98, 1064–107210.1152/jn.00559.2007 [PubMed] [Cross Ref]
- Petersen R. S., Panzeri S., Diamond M. E. (2002). The role of individual spikes and spike patterns in population coding of stimulus location in rat somatosensory cortex. BioSystems 67, 187–19310.1016/S0303-2647(02)00076-X [PubMed] [Cross Ref]
- Poulos D. A., Mei J., Horch K. W., Tuckett R. P., Wei J. Y., Cornwall M. C., Burgess P. R. (1984). The neural signal for the intensity of a tactile stimulus. J. Neurosci. 4, 2016–2024 [PubMed]
- Reich D. S., Mechler F., Purpura K. P., Victor J. D. (2000). Interspike intervals, receptive fields, and information encoding in primary visual cortex. J. Neurosci. 20, 1964–1974 [PubMed]
- Reinagel P., Godwin D., Sherman S. M., Koch C. (1999). Encoding of visual information by lgn bursts. J. Neurophysiol. 81, 2558–2569 [PubMed]
- Reinagel P., Reid R. C. (2000). Temporal coding of visual information in the thalamus. J. Neurosci. 20, 5392–5400 [PubMed]
- Reinagel P., Reid R. C. (2002). Precise firing events are conserved across neurons. J. Neurosci. 22, 6837–6841 [PubMed]
- Rice J. A. (1995). Mathematical Statistics and Data Analysis, 2nd Edn Belmont, CA: Duxbury Press, Wadsworth Publishing Company
- Rokem A., Watzl S., Gollisch T., Stemmler M., Herz A. V. M., Samengo I. (2006). Spike-timing precision underlies the coding efficiency of auditory receptor neurons. J. Neurophysiol. 95, 2541–255210.1152/jn.00891.2005 [PubMed] [Cross Ref]
- Rolen S. H., Caprio J. (2007). Processing of bile salt odor information by single olfactory bulb neurons in the channel catfish. J. Neurophysiol. 97, 4058–406810.1152/jn.00247.2007 [PubMed] [Cross Ref]
- Sabourin P., Pollack G. S. (2009). Behaviorally relevant burst coding in primary sensory neurons. J. Neurophysiol. 102, 1086–109110.1152/jn.00370.2009 [PubMed] [Cross Ref]
- Schneidman E., Bialek W., Berry M. J. (2003). Synergy, redundancy, and independence in population codes. J. Neurosci. 23, 11539–11553 [PubMed]
- Shannon C. E. (1948). A mathematical theory of communication. Bell Syst. Tech. J. 27, 379–423
- Strong S. P., Koberle R., de Ruyter van Steveninck R. R., Bialek W. (1998). Entropy and information in neural spike trains. Phys. Rev. Lett. 80, 197–20010.1103/PhysRevLett.80.197 [Cross Ref]
- Theunissen F., Miller J. P. (1995). Temporal encoding in nervous systems: a rigorous definition. J. Comput. Neurosci. 2, 149–16210.1007/BF00961885 [PubMed] [Cross Ref]
- Tsujishita T. (1995). On triple mutual information. Adv. Appl. Math. 16, 269–27410.1006/aama.1995.1013 [Cross Ref]
- Victor J. D. (2002). Binless strategies for estimation of information in neural data. Phys. Rev. E 66, 051903.10.1103/PhysRevE.66.051903 [PubMed] [Cross Ref]
