|Home | About | Journals | Submit | Contact Us | Français|
Kernel smoother and a time-histogram are classical tools for estimating an instantaneous rate of spike occurrences. We recently established a method for selecting the bin width of the time-histogram, based on the principle of minimizing the mean integrated square error (MISE) between the estimated rate and unknown underlying rate. Here we apply the same optimization principle to the kernel density estimation in selecting the width or “bandwidth” of the kernel, and further extend the algorithm to allow a variable bandwidth, in conformity with data. The variable kernel has the potential to accurately grasp non-stationary phenomena, such as abrupt changes in the firing rate, which we often encounter in neuroscience. In order to avoid possible overfitting that may take place due to excessive freedom, we introduced a stiffness constant for bandwidth variability. Our method automatically adjusts the stiffness constant, thereby adapting to the entire set of spike data. It is revealed that the classical kernel smoother may exhibit goodness-of-fit comparable to, or even better than, that of modern sophisticated rate estimation methods, provided that the bandwidth is selected properly for a given set of spike data, according to the optimization methods presented here.
Neurophysiologists often investigate responses of a single neuron to a stimulus presented to an animal by using the discharge rate of action potentials, or spikes (Adrian 1928; Gerstein and Kiang 1960; Abeles 1982). One classical method for estimating spike rate is the kernel density estimation (Parzen 1962; Rosenblatt 1956; Sanderson 1980; Richmond et al. 1990; Nawrot et al. 1999). In this method, a spike sequence is convoluted with a kernel function, such as a Gauss density function, to obtain a smooth estimate of the firing rate. The estimated rate is sometimes referred to as a spike density function. This nonparametric method is left with a free parameter for kernel bandwidth that determines the goodness-of-fit of the density estimate to the unknown rate underlying data. Although theories have been suggested for selecting the bandwidth, cross-validating with the data (Rudemo 1982; Bowman 1984; Silverman 1986; Scott and Terrell 1987; Scott 1992; Jones et al. 1996; Loader 1999a, b), individual researchers have mostly chosen bandwidth arbitrarily. This is partly because the theories have not spread to the neurophysiological society, and partly due to inappropriate basic assumptions of the theories themselves. Most optimization methods assume a stationary rate fluctuation, while the neuronal firing rate often exhibits abrupt changes, to which neurophysiologists, in particular, pay attention. A fixed bandwidth, optimized using a stationary assumption, is too wide to extract the details of sharp activation, while in the silent period, the fixed bandwidth would be too narrow and may cause spurious undulation in the estimated rate. It is therefore desirable to allow a variable bandwidth, in conformity with data.
The idea of optimizing bandwidth at every instant was proposed by Loftsgaarden and Quesenberry (1965). However, in contrast to the progress in methods that vary bandwidths at sample points only (Abramson 1982; Breiman et al. 1977; Sain and Scott 1996; Sain 2002; Brewer 2004), the local optimization of bandwidth at every instant turned out to be difficult because of its excessive freedom (Scott 1992; Devroye and Lugosi 2000; Sain and Scott 2002). In earlier studies, Hall & Schucany used the cross-validation of Rudemo and Bowman, within local intervals (Hall and Schucany 1989), yet the interval length was left free. Fan et al. applied cross-validation to locally optimized bandwidth (Fan et al. 1996), and yet the smoothness of the variable bandwidth was chosen manually.
In this study, we first revisit the fixed kernel method, and derive a simple formula to select the bandwidth of the kernel density estimation, similar to the previous method for selecting the bin width of a peristimulus time histogram (See Shimazaki and Shinomoto 2007). Next, we introduce the variable bandwidth into the kernel method and derive an algorithm for determining the bandwidth locally in time. The method automatically adjusts the flexibility, or the stiffness, of the variable bandwidth. The performance of our fixed and variable kernel methods are compared with established density estimation methods, in terms of the goodness-of-fit to underlying rates that vary either continuously or discontinuously. We also apply our kernel methods to the biological data, and examine their ability by cross-validating with data.
Though our methods are based on the classical kernel method, their performances are comparable to various sophisticated rate estimation methods. Because of the classics, they are rather convenient for users: the methods simply suggest bandwidth for the standard kernel density estimation.
In neurophysiological experiments, neuronal response is examined by repeatedly applying identical stimuli. The recorded spike trains are aligned at the onset of stimuli, and superimposed to form a raw density, as
where n is the number of repeated trials. Here, each spike is regarded as a point event that occurs at an instant of time ti (i=1,2,, N) and is represented by the Dirac delta function δ(t). The kernel density estimate is obtained by convoluting a kernel k(s) to the raw density xt,
Throughout this study, the integral that does not specify bounds refers to . The kernel function satisfies the normalization condition, , a zero first moment, , and has a finite bandwidth, . A frequently used kernel is the Gauss density function,
where the bandwidth w is specified as a subscript. In the body of this study, we develop optimization methods that apply generally to any kernel function, and derive a specific algorithm for the Gauss density function in the Appendix.
Assuming that spikes are sampled from a stochastic process, we consider optimizing the estimate to be closest to the unknown underlying rate λt. Among several plausible optimizing principles, such as the Kullback-Leibler divergence or the Hellinger distance, we adopt, here, the mean integrated squared error (MISE) for measuring the goodness-of-fit of an estimate to the unknown underlying rate, as
where E refers to the expectation with respect to the spike generation process under a given inhomogeneous rate λt. It follows, by definition, that Ext=λt.
In deriving optimization methods, we assume the Poisson nature, so that spikes are randomly sampled at a given rate λt. Spikes recorded from a single neuron correlate in each sequence (Shinomoto et al. 2003, 2005, 2009). In the limit of a large number of spike trains, however, mixed spikes are statistically independent and the superimposed sequence can be approximated as a single inhomogeneous Poisson point process (Cox 1962; Snyder 1975; Daley and Vere-Jones 1988; Kass et al. 2005).
Given a kernel function such as Eq. (3), the density function Eq. (2) is uniquely determined for a raw density Eq. (1) of spikes obtained from an experiment. A bandwidth w of the kernel may alter the density estimate, and it can accordingly affect the goodness-of-fit of the density function to the unknown underlying rate λt. In this subsection, we consider applying a kernel of a fixed bandwidth w, and develop a method for selecting w that minimizes the MISE, Eq. (4).
The integrand of the MISE is decomposed into three parts: . Since the last component does not depend on the choice of a kernel, we subtract it from the MISE, then define a cost function as a function of the bandwidth w:
Rudemo and Bowman suggested the leave-one-out cross-validation to estimate the second term of Eq. (5) (Rudemo 1982; Bowman 1984). Here, we directly estimate the second term with the Poisson assumption (See also Shimazaki and Shinomoto 2007).
By noting that λt=Ext, the integrand of the second term in Eq. (5) is given as
Here, to obtain the second equality, we used the assumption of the Poisson point process (independent spikes).
Equation (8) is composed of observable variables only. Hence, from sample sequences, the cost function is estimated as
In terms of a kernel function, the cost function is written as
The minimizer of the cost function, Eq. (10), is an estimate of the optimal bandwidth, which is denoted by w. The method for selecting a fixed kernel bandwidth is summarized in Algorithm 1. A particular algorithm developed for the Gauss density function is given in the Appendix.
The method described in Section 2.3 aims to select a single bandwidth that optimizes the goodness-of-fit of the rate estimate for an entire observation interval [a, b] . For a non-stationary case, in which the degree of rate fluctuation greatly varies in time, the rate estimation may be improved by using a kernel function whose bandwidth is adaptively selected in conformity with data. The spike rate estimated with the variable bandwidth wt is given by
Here we select the variable bandwidth wt as a fixed bandwidth optimized in a local interval. In this approach, the interval length for the local optimization regulates the shape of the function wt, therefore, it subsequently determines the goodness-of-fit of the estimated rate to the underlying rate. We provide a method for obtaining the variable bandwidth wt that minimizes the MISE by optimizing the local interval length.
To select an interval length for local optimization, we introduce the local MISE criterion at time t as
where is an estimated rate with a fixed bandwidth w. Here, a weight function localizes the integration of the squared error in a characteristic interval W centered at time t. An example of the weight function is once again the Gauss density function. See the Appendix for the specific algorithm for the Gauss weight function. As in Eq. (5), we introduce the local cost function at time t by subtracting the term irrelevant for the choice of w as
The optimal fixed bandwidth w is obtained as a minimizer of the estimated cost function:
The derivation follows the same steps as in the previous section. Depending on the interval length W, the optimal bandwidth w varies. We suggest selecting an interval length that scales with the optimal bandwidth as γ−1w. The parameter γ regulates the interval length for local optimization: With small γ(1) , the fixed bandwidth is optimized within a long interval; With large γ(~1) , the fixed bandwidth is optimized within a short interval. The interval length and fixed bandwidth, selected at time t, are denoted as and .
The locally optimized bandwidth is repeatedly obtained for different t([a,b]). Because the intervals overlap, we adopt the Nadaraya-Watson kernel regression (Nadaraya 1964; Watson 1964) of as a local bandwidth at time t:
The variable bandwidth obtained from the same data, but with different γ, exhibits different degrees of smoothness: With small γ(1) , the variable bandwidth fluctuates slightly; With large γ(~1) , the variable bandwidth fluctuates significantly. The parameter γ is thus a smoothing parameter for the variable bandwidth. Similar to the fixed bandwidth, the goodness-of-fit of the variable bandwidth can be estimated from the data. The cost function for the variable bandwidth selected with γ is obtained as
where is an estimated rate, with the variable bandwidth . The integral is calculated numerically. With the stiffness constant γ that minimizes Eq. (18), local optimization is performed in an ideal interval length. The method for optimizing the variable kernel bandwidth is summarized in Algorithm 2.
By using spikes sampled from an inhomogeneous Poisson point process, we examined the efficiency of the kernel methods in estimating the underlying rate. We also used a sequence obtained by superimposing ten non-Poissonian (gamma) sequences (Shimokawa and Shinomoto 2009), but there was practically no significant difference in the rate estimation from the Poissonian sequence.
Figure Figure11 displays the result of the fixed kernel method based on the Gauss density function. The kernel bandwidth selected by Algorithm 1 applies a reasonable filtering to the set of spike sequences. Figure 1(d) shows that a cost function, Eq. (10), estimated from the spike data is similar to the original MISE, Eq. (4), which was computed using the knowledge of the underlying rate. This demonstrates that MISE optimization can, in practice, be carried out by our method, even without knowing the underlying rate.
Figure Figure2(a)2(a) demonstrates how the rate estimation is altered by replacing the fixed kernel method with the variable kernel method (Algorithm 2), for identical spike data (Fig. 1(b)). The Gauss weight function is used to obtain a smooth variable bandwidth. The manner in which the optimized bandwidth varies in the time axis is shown in Fig. Fig.2(b):2(b): the bandwidth is short in a moment of sharp activation, and is long in the period of smooth rate modulation. Eventually, the sharp activation is grasped minutely and slow modulation is expressed without spurious fluctuations. The stiffness constant γ for the bandwidth variation is selected by minimizing the cost function, as shown in Fig. 2(c).
We wish to examine the fitting performance of the fixed and variable kernel methods in comparison with established density estimation methods, by paying attention to their aptitudes for either continuous or discontinuous rate processes. Figure 3(a) shows the results for sinusoidal and sawtooth rate processes, as samples of continuous and discontinuous processes, respectively. We also examined triangular and rectangular rate processes as different samples of continuous and discontinuous processes, but the results were similar. The goodness-of-fit of the density estimate to the underlying rate is evaluated in terms of integrated squared error (ISE) between them.
The established density estimation methods examined for comparison are the histogram (Shimazaki and Shinomoto 2007), Abramson’s adaptive kernel (Abramson 1982), Locfit (Loader 1999b), and Bayesian adaptive regression splines (BARS) (DiMatteo et al. 2001; Kass et al. 2003) methods, whose details are summarized below.
A histogram method, which is often called a peristimulus time histogram (PSTH) in neurophysiological literature, is the most basic method for estimating the spike rate. To optimize the histogram, we used a method proposed for selecting the bin width based on the MISE principle (Shimazaki and Shinomoto 2007).
Abramson’s adaptive kernel method (Abramson 1982) uses the sample point kernel estimate , in which the bandwidths are adapted at the sample points. Scaling the bandwidths as was suggested, where w is a pilot bandwidth, , and is a fixed kernel estimate with w. Abramson’s method is a two-stage method, in which the pilot bandwidth needs to be selected beforehand. Here, the pilot bandwidth is selected using the fixed kernel optimization method developed in this study.
The Locfit algorithm developed by Loader (1999b) fits a polynomial to a log-density function under the principle of maximizing a locally defined likelihood. We examined the automatic choice of the adaptive bandwidth of the local likelihood, and found that the default fixed method yielded a significantly better fit. We used a nearest neighbor based bandwidth method, with a parameter covering 20% of the data.
The BARS (DiMatteo et al. 2001; Kass et al. 2003) is a spline-based adaptive regression method on an exponential family response model, including a Poisson count distribution. The rate estimated with the BARS is the expected splines computed from the posterior distribution on the knot number and locations with a Markov chain Monte Carlo method. The BARS is, thus, capable of smoothing a noisy histogram without missing abrupt changes. To create an initial histogram, we used 4 [ms] bin width, which is small enough to examine rapid changes in the firing rate.
Figure Figure3(a)3(a) displays the density profiles of the six different methods estimated from an identical set of spike trains (n=10) that are numerically sampled from a sinusoidal or sawtooth underlying rate (2 [s]). Figure 3(b) summarizes the goodness-of-fit of the six methods to the sinusoidal and sawtooth rates (10 [s]) by averaging over 20 realizations of samples.
For the sinusoidal rate function, representing continuously varying rate processes, the BARS is most efficient in terms of ISE performance. For the sawtooth rate function, representing discontinuous non-stationary rate processes, the variable kernel estimation developed here is the most efficient in grasping abrupt rate changes. The histogram method is always inferior to the other five methods in terms of ISE performance, due to the jagged nature of the piecewise constant function.
We examine, here, the fixed and variable kernel methods in their applicability to real biological data. In particular, the kernel methods are applied to the spike data of an MT neuron responding to a random dot stimulus (Britten et al. 2004). The rates estimated from n=1, 10, and 30 experimental trials are shown in Fig. 4. Fine details of rate modulation are revealed as we increase the sample size (Bair and Koch 1996). The fixed kernel method tends to choose narrower bandwidths, while the variable kernel method tends to choose wider bandwidths in the periods in which spikes are not abundant.
The performance of the rate estimation methods is cross-validated. The bandwidth, denoted as wt for both fixed and variable, is obtained with a training data set of n trials. The error is evaluated by computing the cost function, Eq. (18), in a cross-validatory manner:
where the test spike times are obtained from n spike sequences in the leftovers, and . Figure 4(d) shows the performance improvements by the variable bandwidth over the fixed bandwidth, as evaluated by Eq. (19). The fixed and variable kernel methods perform better for smaller and larger sizes of data, respectively. In addition, we compared the fixed kernel method and the BARS by cross-validating the log-likelihood of a Poisson process with the rate estimated using the two methods. The difference in the log-likelihoods was not statistically significant for small samples (n=1, 5 and 10), while the fixed kernel method fitted better to the spike data with larger samples (n=20 and 30).
In this study, we developed methods for selecting the kernel bandwidth in the spike rate estimation based on the MISE minimization principle. In addition to the principle of optimizing a fixed bandwidth, we further considered selecting the bandwidth locally in time, assuming a non-stationary rate modulation.
We tested the efficiency of our methods using spike sequences numerically sampled from a given rate (Figs. 1 and and2).2). Various density estimators constructed on different optimization principles were compared in their goodness-of-fit to the underlying rate (Fig. 3). There is in fact no oracle that selects one among various optimization principles, such as MISE minimization or likelihood maximization. Practically, reasonable principles render similar detectability for rate modulation; the kernel methods based on MISE were roughly comparable to the Locfit based on likelihood maximization in their performances. The difference of the performances is not due to the choice of principles, but rather due to techniques; kernel and histogram methods lead to completely different results under the same MISE minimization principle (Fig. 3(b)). Among the smooth rate estimators, the BARS was good at representing continuously varying rate, while the variable kernel method was good at grasping abrupt changes in the rate process (Fig. 3(b)).
We also examined the performance of our methods in application to neuronal spike sequences by cross-validating with the data (Fig. 4). The result demonstrated that the fixed kernel method performed well in small samples. We refer to Cunningham et al. (2008) for a result on the superior fitting performance of a fixed kernel to small samples in comparison with the Locfit and BARS, as well as the Gaussian process smoother (Cunningham et al. 2008; Smith and Brown 2003; Koyama and Shinomoto 2005). The adaptive methods, however, have the potential to outperform the fixed method with larger samples derived from a non-stationary rate profile (See also Endres et al. 2008 for comparisons of their adaptive histogram with the fixed histogram and kernel method). The result in Fig. 4 confirmed the utility of our variable kernel method for larger samples of neuronal spikes.
We derived the optimization methods under the Poisson assumption, so that spikes are randomly drawn from a given rate. If one wishes to estimate spike rate of a single or a few sequences that contain strongly correlated spikes, it is desirable to utilize the information as to non-Poisson nature of a spike train (Cunningham et al. 2008). Note that a non-Poisson spike train may be dually interpreted, as being derived either irregularly from a constant rate, or regularly from a fluctuating rate (Koyama and Shinomoto 2005; Shinomoto and Koyama 2007). However, a sequence obtained by superimposing many spike trains is approximated as a Poisson process (Cox 1962; Snyder 1975; Daley and Vere-Jones 1988; Kass et al. 2005), for which dual interpretation does not occur. Thus the kernel methods developed in this paper are valid for the superimposed sequence, and serve as the peristimulus density estimator for spike trains aligned at the onset or offset of the stimulus.
Kernel smoother is a classical method for estimating the firing rate, as popular as the histogram method. We have shown in this paper that the classical kernel methods perform well in the goodness-of-fit to the underlying rate. They are not only superior to the histogram method, but also comparable to modern sophisticated methods, such as the Locfit and BARS. In particular, the variable kernel method outperformed competing methods in representing abrupt changes in the spike rate, which we often encounter in neuroscience. Given simplicity and familiarity, the kernel smoother can still be the most useful in analyzing the spike data, provided that the bandwidth is chosen appropriately as instructed in this paper.
We thank M. Nawrot, S. Koyama, D. Endres for valuable discussions, and the Diesmann Unit for providing the computing environment. We also acknowledge K. H. Britten, M. N. Shadlen, W. T. Newsome, and J. A. Movshon, who made their data available to the public, and W. Bair for hosting the Neural Signal Archive. This study is supported in part by a Research Fellowship of the Japan Society for the Promotion of Science for Young Scientists to HS and Grants-in-Aid for Scientific Research to SS from the MEXT Japan (20300083, 20020012).
Open Access This article is distributed under the terms of the Creative Commons Attribution Noncommercial License which permits any noncommercial use, distribution, and reproduction in any medium, provided the original author(s) and source are credited.
In this appendix, we derive definite MISE optimization algorithms we developed in the body of the paper with the particular Gauss density function, Eq. (3).
The estimated cost function is obtained, as in Eq. (10):
where, from Eq. (11),
A symmetric kernel function, including the Gauss function, is invariant to exchange of ti and tj when computing kw(ti−tj) . In addition, the correlation of the kernel function, Eq. (11), is symmetric with respect to ti and tj. Hence, we obtain the following relationships
where . A simplified equation is obtained by evaluating the MISE in an unbounded domain: a→−∞ and b→+∞. Using , we obtain
The local cost function is obtained, as in Eq. (15):
where, from Eq. (16),
For the summations in the local cost function, Eq. (15), we have equalities:
Hideaki Shimazaki, Email: pj.nekir.niarb@ikazamihs.
Shigeru Shinomoto, Email: pj.ca.u-otoyk.syhpcs@otomonihs.