Point process observation model of neural spiking activity

We assume that the spike train that is the observation process of our state-space model can be represented as a point process (Brown 2005; Brown et al. 2003; Daley and Vere-Jones 2003; Kass et al. 2005). A point process is a time series of random binary (0, 1) events that occur in continuous time. For a neural spike train, the 1's are the individual spike times and the 0's are the times of no spike occurrences. Given a trial interval (0, *T*], let *N*(*t*) be the number of spikes counted in the interval (0, *t*] for *t* ∈ (0, *T*] in some trial. A point process model of a neural spike train can be completely characterized by its conditional intensity function, *λ*(*t*|*H*_{t}), defined as

*λ*(*t*|*H*_{t}) = lim_{Δ→0} Pr[*N*(*t* + Δ) − *N*(*t*) = 1|*H*_{t}]/Δ (1)

where *H*_{t} denotes the history of spikes up to time *t* (Brown 2005; Brown et al. 2003; Daley and Vere-Jones 2003). It follows from Eq. 1 that, given the history up to time *t*, the probability of a single spike in a small interval (*t*, *t* + Δ] is approximately *λ*(*t*|*H*_{t})Δ. The conditional intensity function generalizes the rate function of a Poisson process to a rate function that is history dependent. Because the conditional intensity function completely characterizes a point process, defining a model for the conditional intensity function defines a model for the spike train.
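The local Bernoulli approximation Pr{spike in (*t*, *t* + Δ]} ≈ *λ*(*t*|*H*_{t})Δ also gives a simple way to simulate a history-dependent point process. The sketch below is our own illustration (not part of the original analysis); the intensity function `lam`, with its hypothetical 2-ms refractory period, is an assumed example.

```python
import random

def simulate_spikes(cond_intensity, T=1.0, delta=1e-3, seed=1):
    """Simulate a point process on (0, T] via the Bernoulli approximation:
    Pr{spike in (t, t + delta]} ~= lambda(t | H_t) * delta.
    `cond_intensity(t, history)` returns a rate in spikes/s given the spike
    times observed so far (the history H_t)."""
    rng = random.Random(seed)
    spikes = []
    n_bins = int(T / delta)
    for i in range(n_bins):
        t = i * delta
        p = min(1.0, cond_intensity(t, spikes) * delta)
        if rng.random() < p:
            spikes.append(t + delta)  # record the spike at the bin's right edge
    return spikes

# Hypothetical example: a 20 spikes/s rate that is suppressed for 2 ms
# after each spike (an absolute refractory period).
def lam(t, history):
    if history and (t - history[-1]) < 0.002:
        return 0.0
    return 20.0

spike_times = simulate_spikes(lam, T=10.0)
```

Because the intensity here depends on the last spike time, the simulated train is not Poisson: interspike intervals shorter than the refractory period never occur.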

To facilitate presentation of our model, we present the specific form of the conditional intensity function in discrete time. Assume that a repeated-trial experiment is conducted with *K* trials of the same stimulus or task and that neural activity is recorded simultaneously with the application of the stimulus or the execution of the task. We index the trials *k* = 1, …, *K*. We define the observation interval as (0, *T*], where *T* is the length of each trial. To obtain the discrete-time representation of the conditional intensity function, we choose *L* large and divide the time interval *T* into subintervals of width Δ = *TL*^{−1}. We choose *L* large so that each subinterval contains at most one spike. We index the subintervals ℓ = 1, …, *L* and define *n*_{k,ℓ} to be the indicator, which is 1 if there is a spike in subinterval ((ℓ − 1)Δ, ℓΔ] on trial *k* and is 0 otherwise. We let *n*_{k} = {*n*_{k,1}, …, *n*_{k,L}} be the set of spikes recorded on trial *k* and *N*_{1:k} = {*n*_{1}, …, *n*_{k}} be the set of spikes recorded from trial 1 to trial *k*. We let *H*_{k,ℓ} denote the history of spikes in trial *k* up to time ℓΔ.
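The discretization above can be sketched as a small helper (our own illustration, not the authors' code), which maps one trial's spike times on (0, *T*] to the binary indicators and enforces the at-most-one-spike-per-bin requirement.

```python
def bin_spike_train(spike_times, T, L):
    """Discretize one trial's spike times on (0, T] into L indicator bins of
    width Delta = T / L, using right-closed bins ((l - 1)Delta, l*Delta].
    Raises if any bin would contain more than one spike (L too small)."""
    delta = T / L
    n = [0] * L
    for t in spike_times:
        # Small offset keeps a spike exactly on a bin edge in the left bin.
        l = min(L - 1, int((t - 1e-12) / delta))
        if n[l]:
            raise ValueError("bin width too coarse: two spikes in one bin")
        n[l] = 1
    return n

# Three spikes in a 1-s trial, binned at Delta = 10 ms (L = 100).
n = bin_spike_train([0.10, 0.35, 0.81], T=1.0, L=100)
```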

We assume that the conditional intensity function for the spike trains recorded in our experiment at time ℓΔ of trial *k* has the form

*λ*_{k}(ℓΔ|*θ*_{k}, *γ*, *H*_{k,ℓ}) = *λ*^{S}(ℓΔ|*θ*_{k})*λ*^{H}(ℓΔ|*γ*, *H*_{k,ℓ}) (2)

where the component *λ*^{S}(ℓΔ|*θ*_{k}) describes the effect of the stimulus on the neural response and the component *λ*^{H}(ℓΔ|*γ*, *H*_{k,ℓ}) defines the effect of spike history on the neural response. The units of *λ*_{k}(ℓΔ|*θ*_{k}, *γ*, *H*_{k,ℓ}) are spikes per second. By convention (Kass and Ventura 2001) we also take *λ*^{S}(ℓΔ|*θ*_{k}) in our analysis to have units of spikes per second and let *λ*^{H}(ℓΔ|*γ*, *H*_{k,ℓ}) be dimensionless. The idea of expressing the conditional intensity function as a product of a stimulus component and a temporal or spike-history component was first suggested by Kass and Ventura (2001). An important appeal of this approach is that if the spike-history component does not contribute to the model fit (i.e., if its value is close to 1 for all times), then Eq. 2 becomes an inhomogeneous Poisson process within each trial. In this way, Kass and Ventura pointed out that this formulation of the conditional intensity function makes it possible to measure the degree to which complex spiking models differ from the more widely used Poisson models. Furthermore, the spiking model should explain two effects established as important in previous studies: the stimulus and spike-history effects (Dayan and Abbott 2001). Although there are many ways to model these two effects, Eq. 2 is equivalent to expressing the log of the conditional intensity function as the sum of these two effects (see Eq. 5). Therefore we can view this formulation as approximating the first two terms in a Wiener kernel expansion of the log of a general function of the stimulus and the spike history (Rieke et al. 1997). We have successfully used this formulation of the conditional intensity function in previous reports (Frank et al. 2002, 2004; Kass et al. 2005; Truccolo et al. 2005). If this formulation of the conditional intensity function fails to provide a good description of the data, then this lack of fit will be apparent in the goodness-of-fit analysis described in the following text. The parameters *γ* and *θ*_{k} are defined subsequently.

To model the effect of the stimulus or task on the neural spiking activity, we assume that the first component of the conditional intensity function of Eq. 2 has the form

*λ*^{S}(ℓΔ|*θ*_{k}) = exp{Σ_{r=1}^{R} *θ*_{k,r}*g*_{r}(ℓΔ)} (3)

where the *g*_{r}(ℓΔ) are a set of *R* functions that model the within-trial stimulus or task-specific effect on spiking activity, parameterized by *θ*_{k} = {*θ*_{k,1}, …, *θ*_{k,r}, …, *θ*_{k,R}} for each trial *k* = 1, …, *K*. That is, Eq. 3 defines for each trial how the stimulus or task modulates the spiking activity. There is a different *θ*_{k} for each trial. We use a state-space model to define the relation between the *θ*_{k} values on different trials. This dependence defines the between-trial dynamics. There is flexibility in the choice of the *g*_{r}(ℓΔ) functions. Possible choices include polynomials, splines, and unit pulse functions. We use unit pulse functions to represent the *g*_{r}(ℓΔ) so that, as subsequently described, the PSTH is a special case of our state-space model.
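With unit pulse basis functions the stimulus design reduces to a block indicator matrix, which can be sketched as follows (our own illustration, assuming *L* is divisible by *R*).

```python
def unit_pulse_basis(L, R):
    """The R unit pulse basis functions evaluated on the L time bins:
    G[r][l] = 1 if bin l lies in the r-th of R equal-length blocks, else 0.
    Assumes L is divisible by R."""
    width = L // R
    return [[1 if l // width == r else 0 for l in range(L)] for r in range(R)]

# With this basis, exactly one pulse covers each bin l, so the stimulus
# component exp(sum_r theta[r] * G[r][l]) reduces to exp(theta[r]).
G = unit_pulse_basis(L=100, R=10)
```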

To model the effect of spike history on current neural spiking activity, we assume that the second component on the right side of Eq. 2 can be defined as

*λ*^{H}(ℓΔ|*γ*, *H*_{k,ℓ}) = exp{Σ_{j=1}^{J} *γ*_{j}*n*_{k,ℓ−j}} (4)

for *k* = 1, …, *K*, where *γ* = (*γ*_{1}, …, *γ*_{j}, …, *γ*_{J}) is the vector of parameters that describes the dependence of the current spiking activity on recent spike history. In this model, we take the history to be *H*_{k,ℓ} = {*n*_{k,ℓ−J}, …, *n*_{k,ℓ−1}}, which is the spiking activity during the *J* time intervals preceding time ℓΔ. We chose the linear autoregressive expansion on the right side of Eq. 4 to relate the spike history to current spiking activity because it has been used successfully in many other spike train analyses (Brillinger 1988; Kass and Ventura 2001; Kass et al. 2005; Truccolo et al. 2005). The coefficients of the model in Eq. 4 capture spike-history effects at a resolution that is at most Δ. In the model in Eq. 4 we assume that the dependence on spike history defined by the parameter *γ* does not change across trials. However, the effects of the spike history will differ at different times because the history of spiking activity preceding different times differs.
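The spike-history factor can be evaluated directly from the lagged spike indicators. The sketch below is our own minimal illustration; the *γ* values are hypothetical (a strongly negative *γ*_{1} enforces refractoriness in the bin right after a spike).

```python
import math

def history_component(n, l, gamma):
    """Evaluate the spike-history factor at bin l of one trial:
    lambda_H = exp(sum_{j=1..J} gamma[j-1] * n[l-j]),
    with n the binary spike indicators and lags before the trial start
    treated as 0 (no spikes)."""
    J = len(gamma)
    s = 0.0
    for j in range(1, J + 1):
        if l - j >= 0:
            s += gamma[j - 1] * n[l - j]
    return math.exp(s)

# Hypothetical coefficients: strong suppression at lag 1, mild at lag 2.
n = [0, 1, 0, 0]       # a spike in bin 1
gamma = [-5.0, -1.0]   # J = 2 history lags
```

With these values the factor is far below 1 in the bin after the spike and returns toward 1 two bins later, which is the multiplicative, dimensionless role the text assigns to *λ*^{H}.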

By combining Eqs. 2, 3, and 4, we see that the conditional intensity function, which defines the observation equation of our state-space model of neural activity, may be written as

*λ*_{k}(ℓΔ|*θ*_{k}, *γ*, *H*_{k,ℓ}) = exp{Σ_{r=1}^{R} *θ*_{k,r}*g*_{r}(ℓΔ) + Σ_{j=1}^{J} *γ*_{j}*n*_{k,ℓ−j}} (5)

State-space model of between-trial neural dynamics

To complete the definition of our state-space model and to model the relation of spiking activity between trials we define the state equation. We assume stochastic dependence between the *θ*_{k} = {*θ*_{k}_{,1}, …, *θ*_{k}_{,}_{r}, …, *θ*_{k}_{,}_{R}} defined by the random walk model

*θ*_{k} = *θ*_{k−1} + *ε*_{k} (6)

for *k* = 1, …, *K*, where *ε*_{k} is an *R*-dimensional Gaussian random vector with mean 0 and unknown covariance matrix Σ. We also assume that the initial vector *θ*_{0} is unknown. The random walk model provides a simple way to let the coefficients of the stimulus functions differ across trials, subject to a plausible constraint that prevents the model estimation problem from becoming intractable. It is reasonable to assume that the coefficients on the different trials differ in order to model between-trial dynamics. However, without a further constraint this assumption creates a challenging estimation problem because it means that there is a large number of free parameters to estimate in each analysis. The constraint imposed by the random walk model is the plausible assumption that the stimulus coefficients on trial *k* are close to the coefficients on trial *k* − 1. Equation 6 states this constraint in stochastic terms. That is, if no other information is available, then given the coefficients on trial *k* − 1, the expected value of the coefficients on trial *k* is the value of the coefficients on trial *k* − 1. The variance about this mean is defined by Σ. Stated otherwise, given the coefficients on trial *k* − 1 and no other information, a good guess for the values of the coefficients on trial *k* is the coefficients on trial *k* − 1. When spike train data are observed, the relation between *θ*_{k} and *θ*_{k−1} is given by the Filter and Fixed-Interval algorithms in APPENDIX A.

Equation 6 is a stochastic continuity constraint in that, as the variance of the *ε*_{k} goes to zero, the probability that *θ*_{k} differs from *θ*_{k−1} goes to zero. The smaller (larger) the components of Σ, the smaller (larger) the average difference between *θ*_{k} and *θ*_{k−1} and the smoother (rougher) the between-trial differences in the stimulus component.
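The stochastic continuity constraint can be visualized by simulating Eq. 6 forward. The sketch below is our own illustration and assumes a diagonal Σ purely for simplicity.

```python
import random

def simulate_random_walk(theta0, variances, K, seed=0):
    """Simulate the state equation theta_k = theta_{k-1} + eps_k for K trials,
    with eps_k a zero-mean Gaussian vector; Sigma is taken diagonal here
    (per-component variances in `variances`) to keep the sketch simple."""
    rng = random.Random(seed)
    R, path = len(theta0), [list(theta0)]
    for _ in range(K):
        prev = path[-1]
        path.append([prev[r] + rng.gauss(0.0, variances[r] ** 0.5)
                     for r in range(R)])
    return path[1:]  # theta_1, ..., theta_K

# Small variances keep consecutive trials' coefficients close together.
thetas = simulate_random_walk([0.0, 0.0], variances=[0.01, 0.01], K=50)
```

Shrinking the variances toward zero makes every trial's coefficients nearly identical (the PSTH limit described below the state-space formulation); inflating them lets the per-trial coefficients drift freely.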

Equations 5 and 6 define our state-space generalized linear model (SS-GLM). This model can be used to analyze between-trial and within-trial neural spiking dynamics provided we can estimate the unknown parameter *ψ* = (*γ*, *θ*_{0}, Σ). Because the values of *θ*_{k} are unobservable and *ψ* is an unknown parameter, we use the EM algorithm to compute their estimates by maximum likelihood (Dempster et al. 1977). The EM algorithm is a well-known procedure for performing maximum-likelihood estimation when there is an unobservable process or missing observations. In our model, the unobservable or hidden processes are the *θ*_{k} coefficients that define the between-trial dynamics. The EM algorithm has been used previously to estimate state-space models from point process observations with linear Gaussian state processes (Smith and Brown 2003). Our EM algorithm is based on the same ideas and its derivation is given in APPENDIX A. We denote the maximum-likelihood estimate of the model parameters *ψ* as *ψ̂* = (*γ̂*, *θ̂*_{0}, Σ̂).

To understand what is accomplished in the EM model fitting, we note that the log of the joint probability density of the spike train data and the state process, or the complete data log likelihood, is (APPENDIX A, Eq. A1)

Equation 7 is a penalized log-likelihood function and shows, as mentioned earlier, that the random walk model (Eq. 6) for the stimulus coefficients imposes a multivariate stochastic smoothness or regularization constraint on the between-trial component of the conditional intensity or spike rate function (Fahrmeir and Tutz 2001; Kitagawa and Gersch 1996). The factor Σ^{−1} is the smoothing or regularization matrix. In general, the larger the elements of Σ, the rougher the estimates of the between-trial component of the spike rate function, whereas the smaller the elements of Σ, the smoother the estimates of this component. Thus the maximum-likelihood estimate of Σ governs the smoothness of the spike rate function by determining the degree of smoothing that is most consistent with the data.

The PSTH and GLM are special cases of the SS-GLM

If we assume that the effect of the stimulus is the same between trials, then *θ*_{k} = {*θ*_{k}_{,1}, …, *θ*_{k}_{,}_{r}, …, *θ*_{k}_{,}_{R}} = *θ* is the same for all trials. If we further assume that there is no dependence on spike history, then the GLM model becomes an inhomogeneous Poisson process with conditional intensity function

*λ*(ℓΔ|*θ*) = exp{Σ_{r=1}^{R} *θ*_{r}*g*_{r}(ℓΔ)} (8)

for *k* = 1, …, *K* and ℓ = 1, …, *L*. Thus the conditional intensity function is the same for all trials *k* = 1, …, *K*. Next, we choose the basis functions *g*_{r} to be unit pulse functions of equal length

*g*_{r}(ℓΔ) = 1 if ℓ ∈ ((*r* − 1)*L*/*R*, *rL*/*R*] and 0 otherwise (9)

for ℓ = 1, …, *L*. For the bin in which *g*_{r}(ℓΔ) = 1, the spiking activity obeys a homogeneous Poisson process with mean rate exp(*θ*_{r}). It is easy to show that because the basis functions in Eq. 9 are orthogonal, the values exp(*θ*_{r}) (*r* = 1, …, *R*) can be estimated independently of one another. Furthermore, it can be shown that the maximum-likelihood estimate of exp(*θ*_{r}) is the number of spikes that occur in the bin in which *g*_{r}(ℓΔ) = 1, summed across all trials, divided by the number of trials and the bin width. This is equal to the height of the corresponding bar in a PSTH. Because this is true for all pulse functions, it follows that the PSTH is the maximum-likelihood estimate of the conditional intensity function defined by Eqs. 8 and 9. That is, we choose the unit pulse functions as the basis functions for the right side of Eq. 3 so that the PSTH is a special case of our SS-GLM model. Therefore we term the model of Eq. 8, with the *g*_{r} being equal-length unit pulse functions (Eq. 9), the PSTH model.

The state-space model in Eqs. 5 and 6 is an extension of the GLM model in Truccolo et al. (2005). In that report, the stimulus effect is the same across trials. That is, the conditional intensity function has the form

*λ*_{k}(ℓΔ|*θ*, *γ*, *H*_{k,ℓ}) = exp{Σ_{r=1}^{R} *θ*_{r}*g*_{r}(ℓΔ) + Σ_{j=1}^{J} *γ*_{j}*n*_{k,ℓ−j}} (10)

Thus the conditional intensity at trial *k* and time ℓΔ is a function of the spike history in trial *k* and of the time since the onset of the stimulus. This conditional intensity differs between trials because, in any given trial, it depends on the spike history in that trial. The spike history for different trials or, more generally, for different spike events is almost always different. This model can therefore implicitly capture between-trial neural dynamics. We refer to this model as a GLM model.

Equations 2–4 make the GLM model with a log-link function (Eq. 10) a special case of our SS-GLM model when the coefficients of the stimulus term are the same for all trials. Truccolo et al. (2005) showed that this formulation of the GLM provides a simple way to perform maximum-likelihood analyses of neural spike trains by using a point process likelihood discretized at a resolution of Δ, readily available GLM fitting procedures such as glmfit in Matlab (The MathWorks, Natick, MA), and point process–based goodness-of-fit procedures based on the time-rescaling theorem, as subsequently described.
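As a toy illustration of maximizing the discretized point-process likelihood (a deliberately scalar stand-in for a full GLM fit such as glmfit, not the paper's procedure), one can fit a single constant log-rate by Newton's method; the data below are hypothetical.

```python
import math

def fit_constant_rate(n, delta, iters=20):
    """Newton's-method ML fit of a constant log-rate theta under the
    discretized point-process (Poisson) likelihood
    sum_l [n_l * theta - exp(theta) * delta].
    A minimal scalar stand-in for a full GLM fit (e.g., glmfit in Matlab)."""
    theta = 0.0
    L = len(n)
    for _ in range(iters):
        mu = math.exp(theta) * delta
        grad = sum(n) - L * mu   # score: d(log-likelihood)/d(theta)
        hess = -L * mu           # second derivative (always negative)
        theta -= grad / hess
    return math.exp(theta)       # fitted rate in spikes/s

# 10 spikes in 100 bins of 10 ms (1 s total): the ML rate is 10 spikes/s,
# i.e., total spikes divided by total time.
rate_hat = fit_constant_rate([0] * 90 + [1] * 10, delta=0.01)
```

The closed-form answer, total spikes over total time, is recovered by the iteration, which is the one-parameter analogue of the IRLS computation a GLM routine performs with the full design matrix of Eqs. 3 and 4.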

Model goodness-of-fit

Before making an inference from a statistical model, it is crucial to assess the extent to which the model describes the data. Measuring quantitatively the agreement between a proposed model and a neural spike train or point process time series is a more challenging problem than measuring agreement between a model and continuous-valued data. Standard distance measures applied in analyses of continuous data, such as the average sum of squared errors, are not designed for point process data. One solution to this problem is to apply the time-rescaling theorem (Brown et al. 2002; Ogata 1988; Truccolo et al. 2005) to transform point process measurements such as a neural spike train into a continuous measure appropriate for a goodness-of-fit assessment.

Once a conditional intensity function model has been fit to a neural spike train, we can compute the rescaled times from the estimated conditional intensity function as

*z*_{k,m} = 1 − exp{−∫_{*u*_{k,m−1}}^{*u*_{k,m}} *λ*_{k}(*t*|*θ̂*_{k|K}, *γ̂*, *H*_{k,t})d*t*} (11)

where *u*_{k,m} is the time of spike *m* in trial *k* for *m* = 1, …, *M*_{k}, where *M*_{k} is the number of spikes in trial *k*, for *k* = 1, …, *K*, and *γ̂* is the vector of maximum-likelihood estimates of the parameters *γ*. The vectors *θ̂*_{k|K} = (*θ̂*_{k,1}, …, *θ̂*_{k,R}) are the estimates of the unobserved states *θ*_{k} based on all *K* trials and on the maximum-likelihood estimates of the parameter *ψ*. We obtain *θ̂*_{k|K} from the E-step of the last EM iteration (APPENDIX A). The *z*_{k,m} values will be independent, uniformly distributed random variables on the interval [0, 1) if the conditional intensity function model is a good approximation to the true conditional intensity of the point process. Because the transformation in Eq. 11 is one to one, measuring the agreement between the *z*_{k,m} and a uniform distribution directly evaluates how well the original model agrees with the spike train data.

To evaluate whether the *z*_{k,m} values are uniformly distributed, we order them from smallest to largest, denote the ordered values by *z*_{k*}, and plot the quantiles of the cumulative distribution function of the uniform density on [0, 1), defined as *b*_{k*} = (*k** − 0.5)/*K**, against the *z*_{k*} for *k** = 1, …, *K**, where *K** is the total number of interspike intervals to which the model was fit. We term these Kolmogorov–Smirnov (K-S) plots. If the model is consistent with the data, then the points should lie on a 45° line. Approximate large-sample 95% confidence bounds for the degree of agreement between a model and the data may be constructed as *b*_{k*} ± 1.36*K**^{−1/2}, using the distribution of the K-S statistic (Brown et al. 2002).
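The time-rescaling transformation and the K-S distance can be sketched numerically as follows. This is our own approximation (a midpoint Riemann sum for the integrated intensity); in the paper the integral is evaluated from the fitted discrete-time intensity.

```python
import math

def rescaled_times(spike_times, intensity, delta=1e-4):
    """Time-rescaling: z_m = 1 - exp(-integral of lambda over the m-th
    interspike interval). Under a correct model the z_m are i.i.d.
    Uniform[0, 1). The integral is a midpoint Riemann sum of step ~delta."""
    z, prev = [], 0.0
    for t in spike_times:
        n_steps = max(1, int((t - prev) / delta))
        step = (t - prev) / n_steps
        integral = sum(intensity(prev + (i + 0.5) * step)
                       for i in range(n_steps)) * step
        z.append(1.0 - math.exp(-integral))
        prev = t
    return z

def ks_statistic(z):
    """Kolmogorov-Smirnov distance between the ordered z and the uniform
    quantiles b_j = (j - 0.5)/N; compare against the 95% band 1.36/sqrt(N)."""
    zs, N = sorted(z), len(z)
    return max(abs(zs[j] - (j + 0.5) / N) for j in range(N))

# Constant 2 spikes/s intensity with spikes at 0.5 s and 1.0 s: each
# interspike interval integrates to exactly 1, so z = 1 - exp(-1) for both.
z = rescaled_times([0.5, 1.0], lambda t: 2.0)
```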

Evaluating whether the *z*_{k,m} values are independent is in general difficult. Here we can take advantage of the structure of our problem and assess independence by further transforming the rescaled times to

Φ^{−1}(*z*_{k,m}) (12)

where Φ(·) is the cumulative distribution function of a Gaussian random variable with mean 0 and variance 1. If the rescaled times *z*_{k,m} are independent with a uniform distribution on the interval [0, 1), then the transformed times Φ^{−1}(*z*_{k,m}) are independent Gaussian random variables with mean 0 and variance 1 (and vice versa). Independence of a set of Gaussian random variables can be evaluated by analyzing their correlation structure because, for Gaussian random variables, being uncorrelated is equivalent to being independent. Thus if the Gaussian rescaled times are uncorrelated, then they are independent and hence the *z*_{k,m} are independent. We assess the independence of the rescaled times by plotting the autocorrelation function (ACF) of the Gaussian rescaled times (Eq. 12) with its associated approximate confidence bounds, calculated as ±*z*_{1−(α/2)}*K**^{−1/2}, where *z*_{1−(α/2)} is the 1 − (*α*/2) quantile of the Gaussian distribution with mean 0 and variance 1 (Box et al. 1994). If the ACF of the Gaussian rescaled times is indistinguishable from zero up to lag *r*, then we can conclude that the rescaled times are independent up to a lag of *r* consecutive interspike intervals.
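The Gaussian transform and the ACF check can be sketched in pure Python using the standard library's inverse normal CDF; this is our own illustration, with simulated (genuinely independent) uniforms standing in for fitted rescaled times.

```python
import random
from statistics import NormalDist, mean

def gaussian_acf(z, max_lag=5):
    """Map rescaled times z (Uniform[0, 1)) to standard Gaussians with the
    inverse normal CDF, then return the sample autocorrelation at lags
    1..max_lag. Under a good model each value should fall within roughly
    +/- 1.96 / sqrt(N)."""
    nd = NormalDist()
    # Clip to the open interval so inv_cdf is always defined.
    x = [nd.inv_cdf(min(max(v, 1e-12), 1.0 - 1e-12)) for v in z]
    N, mu = len(x), mean(x)
    var = sum((v - mu) ** 2 for v in x) / N
    return [sum((x[i] - mu) * (x[i + lag] - mu) for i in range(N - lag))
            / (N * var)
            for lag in range(1, max_lag + 1)]

# Independent uniforms: the ACF should hover near zero at every lag.
rng = random.Random(7)
acf = gaussian_acf([rng.random() for _ in range(500)])
```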

Answering specific neurophysiological questions

Once the SS-GLM model has been estimated and its goodness of fit has been established, the principal objective is to use functions of the model and the model parameters to make inferences about the neural system being studied from the recorded experimental data. An important property of maximum-likelihood estimation is invariance. That is, if *ψ̂* is the maximum-likelihood estimate of a parameter *ψ* in a given model and *h*(*ψ*) is a function of *ψ*, then the maximum-likelihood estimate of *h*(*ψ*) is *h*(*ψ̂*) (Pawitan 2001). In particular, all of the optimality properties of maximum-likelihood estimation that hold for *ψ̂* also hold for *h*(*ψ̂*) (Pawitan 2001). This seemingly obvious property is not true of all estimation procedures. We use this invariance principle explicitly in all of our analyses by simply computing *h*(*ψ̂*) as the obvious and optimal estimate of *h*(*ψ*).

The five principal, neurophysiologically relevant functions, *h*(*ψ*), we compute using this important property of maximum-likelihood estimation are: *1*) the stimulus effect on the spiking activity; *2*) the effect of spike history (intrinsic and network dynamics) on current spiking activity; *3*) the spike rate per trial; *4*) between-trial comparisons of the spike rate function; and *5*) within-trial comparisons of the spike rate function. We show how these functions can be used to answer specific neurophysiological questions.

ESTIMATE OF THE STIMULUS EFFECT ON THE SPIKING ACTIVITY. From Eq. 3 it follows that the maximum-likelihood estimate of the stimulus effect, *λ*^{S}(ℓΔ|*θ*_{k}), at time ℓΔ and trial *k*, is

*λ̂*^{S}(ℓΔ|*θ̂*_{k|K}) = exp{Σ_{r=1}^{R} *θ̂*_{k|K,r}*g*_{r}(ℓΔ)}

where *θ̂*_{k|K} = (*θ̂*_{k,1}, …, *θ̂*_{k,R}) is the estimate of the hidden state *θ*_{k} based on all *K* trials and on the maximum-likelihood estimates of the parameter *ψ*. We obtain *θ̂*_{k|K} from the E-step of the last EM iteration (APPENDIX A). The construction of simultaneous confidence intervals for the stimulus component is described in APPENDICES B and D.

ESTIMATE OF THE SPIKE-HISTORY FUNCTION We denote exp{*γ*_{j}}, *j* = 1, …, *J* as the spike-history function, which describes the effects on spiking propensity of the absolute and relative refractory period as well as other biophysical properties such as rebound hyperpolarization and network dynamics. This function estimates the extent to which current spiking activity can be predicted from recent spiking history. Its maximum-likelihood estimate is

exp{*γ̂*_{j}}

for *j* = 1, …, *J*. Because each *γ̂*_{j} is the maximum-likelihood estimate of *γ*_{j}, its approximate distribution is Gaussian (Pawitan 2001). Therefore we can use the fact that exp{*γ̂*_{j}} is approximately distributed as a lognormal random variable to compute the confidence intervals for exp{*γ*_{j}} (Smith and Brown 2003). The algorithm to compute the standard errors for the model's spike-history component is given in APPENDIX C.

ESTIMATE OF THE SPIKE RATE FUNCTION. We define the expected number of spikes on trial *k* in the interval [*t*_{1}, *t*_{2}] as *Λ*_{k}(*t*_{1}, *t*_{2}) = ∫_{*t*_{1}}^{*t*_{2}} *λ*_{k}(*t*|*θ*_{k}, *γ*, *H*_{k,t})d*t*. Furthermore, we define the spike rate function on the interval [*t*_{1}, *t*_{2}] as

(*t*_{2} − *t*_{1})^{−1}*Λ*_{k}(*t*_{1}, *t*_{2}) (16)

The maximum-likelihood estimate of the spike rate function on trial *k* in the interval [*t*_{1}, *t*_{2}] is

(*t*_{2} − *t*_{1})^{−1}*Λ̂*_{k}(*t*_{1}, *t*_{2}) = (*t*_{2} − *t*_{1})^{−1} Σ_{ℓ=ℓ_{1}}^{ℓ_{2}} *λ*_{k}(ℓΔ|*θ̂*_{k|K}, *γ̂*, *H*_{k,ℓ})Δ (17)

where ℓ_{1}Δ = *t*_{1} and ℓ_{2}Δ = *t*_{2}. We compute confidence intervals for *Λ*_{k}(*t*_{1}, *t*_{2}) using the Monte Carlo method in APPENDIX E.
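Given the fitted conditional intensity evaluated on the time bins, the spike rate estimate amounts to a Riemann sum. The helper below is our own sketch, assuming *t*_{1} and *t*_{2} fall on bin boundaries.

```python
def spike_rate(lam_bins, delta, t1, t2):
    """Spike rate (spikes/s) on [t1, t2] from a fitted per-bin conditional
    intensity lam_bins (spikes/s, bin width delta): the Riemann sum of
    lambda * delta over the covered bins -- the expected spike count
    Lambda(t1, t2) -- divided by the interval length."""
    l1, l2 = int(round(t1 / delta)), int(round(t2 / delta))
    expected = sum(lam_bins[l] for l in range(l1, l2)) * delta
    return expected / (t2 - t1)

# Hypothetical fit on 100 bins of 10 ms: 10 spikes/s for the first 0.5 s,
# then 30 spikes/s for the remainder of the trial.
lam_bins = [10.0] * 50 + [30.0] * 50
```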

BETWEEN-TRIAL COMPARISONS OF THE SPIKE RATE FUNCTION. An important objective of these analyses is to compare spike rate functions between two trials. These comparisons are central to identifying a learning effect in behavioral learning experiments. We can do this by comparing the spike rates in the two trials over the same time interval. That is, for a given interval [*t*_{1}, *t*_{2}] and trials *m* and *k*, we can compute, using the maximum-likelihood estimates in Eq. 17

Pr[(*t*_{2} − *t*_{1})^{−1}*Λ*_{m}(*t*_{1}, *t*_{2}) > (*t*_{2} − *t*_{1})^{−1}*Λ*_{k}(*t*_{1}, *t*_{2})]

for any *k* = 1, …, *K* − 1 and *m* > *k*. We compute this probability using the Monte Carlo methods described in APPENDIX F.
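The Monte Carlo comparison of trial spike rates (APPENDIX F in the paper; the details below are our simplification) can be sketched for a single unit pulse bin, where the posterior of each log-rate coefficient is approximately Gaussian. All numerical values here are hypothetical.

```python
import random

def prob_rate_increase(theta_k, theta_m, se_k, se_m, draws=2000, seed=0):
    """Monte Carlo estimate of Pr(rate on trial m > rate on trial k) for a
    single unit pulse bin: draw each log-rate from its approximate Gaussian
    posterior (mean theta, standard error se) and count how often trial m
    wins. exp() is monotone, so comparing log-rates compares rates."""
    rng = random.Random(seed)
    hits = sum(1 for _ in range(draws)
               if rng.gauss(theta_m, se_m) > rng.gauss(theta_k, se_k))
    return hits / draws

# Hypothetical posteriors: log-rates 2.0 vs. 2.5, each with SE 0.2; the
# exact probability is Phi(0.5 / sqrt(0.08)) ~ 0.96.
p = prob_rate_increase(theta_k=2.0, theta_m=2.5, se_k=0.2, se_m=0.2)
```

A probability near 1 (or near 0) for many trial pairs is the kind of evidence the text uses to identify a learning effect across trials.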

WITHIN-TRIAL COMPARISONS OF THE SPIKE RATE FUNCTION. A second important objective of these analyses is to compare the spike rate function between two disjoint periods within a trial. For example, it is important to compare spiking activity during a baseline period with spiking activity during another period in the trial, such as the scene period or the delay period. Differences in neural activity between a baseline and a specified period are used as one criterion to classify the neurophysiological properties of neurons (Wirth et al. 2003; Wise and Murray 1999). We compare the two spike rates on trial *k* given the two intervals [*t*_{1}, *t*_{2}] and [*t*_{3}, *t*_{4}], defined from Eq. 16 as (*t*_{2} − *t*_{1})^{−1}*Λ*_{k}(*t*_{1}, *t*_{2}) and (*t*_{4} − *t*_{3})^{−1}*Λ*_{k}(*t*_{3}, *t*_{4}), and the maximum-likelihood estimate of the difference between these two spike rates is

(*t*_{4} − *t*_{3})^{−1}*Λ̂*_{k}(*t*_{3}, *t*_{4}) − (*t*_{2} − *t*_{1})^{−1}*Λ̂*_{k}(*t*_{1}, *t*_{2})

If the entire confidence interval for the difference between the two spike rate functions is positive (negative), then it is reasonable to infer that the spike rate in period [*t*_{3}, *t*_{4}] is larger (smaller) than the spike rate in period [*t*_{1}, *t*_{2}]. If the confidence interval for the difference contains zero, then we conclude that the two spike rates do not differ. We compute confidence intervals for the difference (*t*_{4} − *t*_{3})^{−1}*Λ*_{k}(*t*_{3}, *t*_{4}) − (*t*_{2} − *t*_{1})^{−1}*Λ*_{k}(*t*_{1}, *t*_{2}) using the Monte Carlo methods in APPENDIX E.

Experimental protocol: location-scene association learning task

To illustrate our framework in the analysis of actual experimental data, we analyze the responses of six hippocampal neurons recorded from macaque monkeys performing a location-scene association task. A detailed description of the experiment is given in Wirth et al. (2003). The objective of that study was to characterize the responses of the neurons to individual scenes and to relate them to learning behavior. Here, we analyze in detail the responses of a single hippocampal neuron (see RESULTS) and summarize the analyses of the five other neurons.

The spikes of the neuron that we analyze in detail (RESULTS) were recorded during the presentation of the same scene across 55 trials. Each trial lasted 1,700 ms and the spikes were recorded with a resolution of 1 ms. For this neuron, each trial was divided into four epochs: the fixation period from 0 to 300 ms; the scene presentation period from 300 to 800 ms; the delay period from 800 to 1,500 ms; and the response period from 1,500 to 1,700 ms. Across these 55 trials, this neuron fired 1,911 spikes in response to this scene.