For the analysis of neuronal cooperativity, simultaneously recorded extracellular signals from neighboring neurons need to be sorted reliably by a spike sorting method. Many algorithms have been developed to this end; to date, however, none of them fulfills a set of demanding requirements. In particular, it is desirable to have an algorithm that operates online, detects and classifies overlapping spikes in real time, and adapts to non-stationary data. Here, we present a combined spike detection and classification algorithm which explicitly addresses these issues. Our approach makes use of linear filters to find a new representation of the data and to optimally enhance the signal-to-noise ratio. We introduce a method called “Deconfusion” which de-correlates the filter outputs and provides source separation. Finally, a set of well-defined thresholds is applied, which leads to simultaneous spike detection and spike classification. By incorporating direct feedback, the algorithm adapts to non-stationary data and is, therefore, well suited for acute recordings. We evaluate our method on simulated and experimental data, including simultaneous intra/extra-cellular recordings made in slices of a rat cortex and recordings from the prefrontal cortex of awake behaving macaques. We compare the results to existing spike detection as well as spike sorting methods. We conclude that our algorithm meets all of the mentioned requirements and outperforms other methods under realistic signal-to-noise ratios and in the presence of overlapping spikes.
In order to understand higher brain functions and the interactions between single neurons, an analysis of the simultaneous activity of a large number of individual neurons is essential. One common way to acquire the necessary amount of neuronal activity data is to use simultaneous extracellular recordings, either with single electrodes or, more recently, with multi-electrodes such as tetrodes (O’Keefe and Recce 1993). However, the recorded data do not directly provide the isolated activity of single neurons, but a mixture of activity from many neurons that is additionally corrupted by noise. The task of so-called “spike sorting” algorithms is to reconstruct the single neuron signals (i.e. spike trains) from these recordings. Many approaches for analyzing the data after acquisition, i.e. offline spike sorting algorithms, have been developed in recent years; see for example Vargas-Irwin and Donoghue (2007), Delescluse and Pouzat (2006), Pouzat et al. (2004), Kim and Kim (2003), Takahashi et al. (2003), Shoham et al. (2003), Hulata et al. (2002), Lewicki (1998), Fee et al. (1996a). Although more methods are available in this category, there are several reasons to favor methods which provide results during the recording itself, termed realtime online sorting algorithms. For example, realtime online spike sorting techniques are indispensable for conducting “closed-loop” experiments and for brain-machine interfaces (Rutishauser et al. 2006; Obeid and Wolf 2004). The few existing approaches to realtime online sorting (Thakur et al. 2007; Rutishauser et al. 2006; Aksenova et al. 2003) are clustering based and have at least one of the following drawbacks: 1) they are not explicitly formulated for data acquired from multi-electrodes, 2) they do not resolve overlapping spikes, 3) they do not perform well on data with a low signal-to-noise ratio, 4) they are not able to adapt to non-stationarities of the data as caused by tissue drifts.
We discuss the reasons and importance of these issues in the following:
An approach based on blind source separation (BSS) techniques and addressing primarily problems 1) and 4) was presented in Takahashi et al. (2002), in which independent component analysis (ICA) was applied to multichannel data recorded by tetrodes (4 channels). Later, the method was adapted to data recorded by dodecatrodes (12 channels) (Takahashi and Sakurai 2005). However, both approaches had to deal with several new problems: among others, time delays between the channels were not considered, biologically meaningless independent components had to be discarded manually, and different neuronal signals with similar channel distributions could not be classified correctly. Furthermore, the methods can only be applied to data recorded with certain electrode types (i.e. tetrodes, dodecatrodes). The most severe problem, though, is that the method cannot deal with data containing activity from more neurons than there are recording channels (over-completeness).
In this work, we present a realtime online spike sorting method based on the BSS idea, which explicitly addresses the four issues 1)–4), but also avoids the drawbacks of the method in Takahashi et al. (2002) and Takahashi and Sakurai (2005). In sum, a spike sorting algorithm for multi electrode data, which detects and resolves overlapping spikes with the same computational cost as non-overlapping spikes, is formulated. The method makes optimal use of an arbitrary number of simultaneously recorded channels and can even run on single channel data. Moreover, since spike detection, spike alignment, and spike classification are not separate parts, but are combined into a single algorithm, our method performs well on data with low SNR and containing many overlapping spikes. By incorporating a direct feedback, the algorithm adapts to varying spike shapes and to non-stationary noise characteristics. The algorithm is fully automatic and due to its linear and parallel computation steps it is ideally suited for realtime applications (see Fig. 4 for a summary of our method).
This paper is organized as follows: In Section 2 we present our method step by step. First, we briefly introduce linear filters. These filters were used in radar applications (Turin 1960), geophysics (Robinson and Treitel 1980) as well as for spike detection (Thakur et al. 2007; Vollgraf et al. 2005), but to our knowledge have not been applied to spike sorting yet. Moreover, in contrast to those studies, we do not directly apply a threshold to the filter outputs, but consider them as a new representation of the data. In this representation the spike sorting task can be handled as a well-defined BSS problem, which we solve with an un-mixing technique we will refer to as “Deconfusion”.
The evaluation of our method is done on two different datasets from real recordings and also on simulated data. The experimental setup, the equipment used and the characteristics of the recorded data are described in Section 3. The advantages and abilities of the method are demonstrated in Section 4. Evaluations of the spike detection performance are done using data from simultaneous intra- and extracellular recordings made in slices of rat visual cortex, and show that the proposed algorithm is superior to conventional spike detection methods. The noise robustness and the ability to successfully resolve overlapping spikes are evaluated systematically on synthetic data. Finally, the method is applied to data from extracellular recordings made in the prefrontal cortex of awake behaving macaques. These data are particularly challenging because the tetrodes are not implanted chronically, but inserted anew before every experiment, leading to tissue drifts. We conclude that our method adapts to non-stationarities and also successfully resolves overlapping spikes in real data. A summary and a discussion of further improvements are given in Section 5.
We use a notation in which symbols for scalar quantities are represented by lower case letters, vectorial quantities are represented by bold lower case letters, and operators or matrices are represented by bold upper case letters. Matrices representing several vectorial quantities, but not linear transformations, are labeled with an additional bar. In Table 1 all important quantities are listed. The corresponding vectorial quantities are defined by concatenating all channel-wise defined vectors. As an example the vectorial template of neuron i is given by
where the superscript means transpose. The vectors , , are defined in the same way. Analogously, covariance matrices, e.g, the data covariance matrix , are defined as
with . is a symmetric N ·Tf by N ·Tf Toeplitz matrix. Alternatively, it can be expressed as
We assume an explicit model for the neuronal data recorded extracellularly. The underlying assumptions are:
As discussed extensively in Pouzat et al. (2002), these assumptions are reasonable and are used explicitly or implicitly in most spike sorting techniques. Consequently the measured data can be expressed as
The measured data are a convolution of the mean waveforms with the corresponding intrinsic spike trains corrupted by colored Gaussian noise (see also Fig. 1(a)–(c)).
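To make the generative model concrete, the following minimal Python sketch builds a single-channel recording as a sum of spike trains convolved with fixed waveforms. The function name `simulate_recording`, the toy templates, and the use of white instead of colored Gaussian noise are illustrative assumptions, not part of the original method.

```python
import numpy as np

def simulate_recording(templates, spike_times, n_samples, noise_std=0.0, seed=0):
    """Single-channel sketch: sum over neurons of (spike train * template) + noise."""
    rng = np.random.default_rng(seed)
    data = np.zeros(n_samples)
    for template, times in zip(templates, spike_times):
        train = np.zeros(n_samples)
        train[np.asarray(times, dtype=int)] = 1.0
        # Convolution places one copy of the template at every spike time,
        # so overlapping spikes add linearly.
        data += np.convolve(train, template)[:n_samples]
    # Assumption: white Gaussian noise stands in for the colored noise of the model.
    data += noise_std * rng.standard_normal(n_samples)
    return data

# Two hypothetical templates whose spikes overlap by two samples.
templates = [np.array([0.0, 1.0, -0.5]), np.array([0.0, -2.0, 1.0])]
x = simulate_recording(templates, spike_times=[[10], [11]], n_samples=32)
```

In the noise-free case the overlap region is simply the sample-wise sum of both waveforms, which is the linearity that the later un-mixing step relies on.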
Spike sorting is achieved when the intrinsic spike trains are reconstructed from the measured data. Since, according to the model assumptions, the data were generated by a convolution of intrinsic spike trains with fixed waveforms, the most straightforward procedure would be to apply a deconvolution to the data in order to retrieve the spike trains. For an exact deconvolution a filter with an infinite impulse response is necessary. In general, such a filter is not stable and would amplify noise (Robinson and Treitel 1980). Nevertheless, a noise-robust approximation of an exact deconvolution can be achieved with finite impulse response filters, which we will refer to as linear filters.
Let us briefly summarize the idea of these filters: The goal is to construct a set of filters, one per template, such that each filter has a well-defined response of 1 to its matching template at shift 0, but minimal response to the rest of the data. This means that the spikes of neuron i are the signal for filter i to detect, but will be treated as noise by every other filter j ≠ i.
Incorporating these conditions leads to a constrained optimization problem
to which the solution are the desired filters (see Appendix A for a more detailed derivation). A major advantage is the fact that the mentioned optimization problem can be solved analytically. In particular, the filters are given by the following expression:
where is the data covariance matrix defined in Section 2.1. Linear filters maximize the signal-to-noise ratio and minimize the sum of false negative and false positive detections, and are, therefore, optimal in this sense (Melvin 2004).
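The filter expression itself is not legible in this copy of the text. As a hedged illustration, the sketch below uses the standard closed-form solution of the constrained problem described above (unit response to the matching template, minimal output variance on the data), namely f = R⁻¹q / (qᵀR⁻¹q). The symbols `q` for the template and `R` for the data covariance matrix, and the toy numbers, are assumptions chosen for this example.

```python
import numpy as np

def linear_filter(template, data_cov):
    """Minimise f^T R f subject to f^T q = 1 (q: template, R: data covariance)."""
    r_inv_q = np.linalg.solve(data_cov, template)
    return r_inv_q / (template @ r_inv_q)

# Hypothetical 4-sample template and a positive definite covariance matrix.
q = np.array([0.0, 1.0, -0.5, 0.2])
R = 0.5 * np.eye(4) + 0.1 * np.ones((4, 4))
f = linear_filter(q, R)

# Any other filter with unit response, e.g. the scaled template itself,
# has an output variance at least as large as that of f.
g = q / (q @ q)
```

Solving the linear system with `np.linalg.solve` avoids forming the explicit inverse of the covariance matrix, which is usually preferable numerically.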
Once the filters are calculated, they are cross-correlated with the measured signal. Note that we do not have to pre-process the data with a whitening filter; the filters can be applied directly to the measured data. This is because the noise statistics are already captured in the data covariance matrix.
From a different point of view, the filtering just changes the representation of the templates. While in the original space the template i was represented by , its representation in the filter output space is given by the vectors , j=1,...,M, where , see also Fig. 2. This interpretation of filtering will be useful in the next section.
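The representation of a template in the filter output space can be sketched as the cross-correlation function of a filter with that template. The helper names `filter_response` and `peak_and_shift` are hypothetical, and the example is restricted to a single channel for brevity.

```python
import numpy as np

def filter_response(filt, template):
    """Cross-correlate a filter with a template over all shifts.

    Returns (lags, response); the response at lag 0 is the plain dot product.
    """
    n = len(filt)
    response = np.correlate(template, filt, mode="full")
    lags = np.arange(-(n - 1), len(template))
    return lags, response

def peak_and_shift(filt, template):
    """Height and lag of the maximal response of a filter to a template."""
    lags, response = filter_response(filt, template)
    k = int(np.argmax(response))
    return float(response[k]), int(lags[k])

# Toy example: the template peaks one sample later than the filter.
height, shift = peak_and_shift(np.array([0.0, 1.0, 0.0]), np.array([0.0, 0.0, 1.0]))
```

Collecting `height` and `shift` for every filter and template pair gives the kind of table of peak responses and shifts that the next section works with.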
The linear filters derived in Section 2.3 should suppress all signal components except their corresponding template with zero shift. Thus, the filter response to all templates (and their shifted variants) has to be minimal. This already leads to minimization constraints; a number which is normally greater than the number of free variables of a filter which is Tf ·N. In addition, if the SNR is low, the noise covariance matrix dominates Eq. (1).
The lower the SNR, the fewer spikes from other neurons a filter will suppress. Thresholding every filter output individually will, thus, lead to false positive detections. The idea is to de-correlate the filter outputs in order to achieve an improved spike detection and classification.
We have seen in the previous section that each template can be represented in the filter output by M vectors (j=1,...,M). Since the detection and classification of the spikes is based on the detection of high positive peak values in the filter output (by construction), all values below zero in the filter output are irrelevant and can be discarded. We therefore apply a half-wave rectification I(x) to the filter output, where I(x) = x for x > 0 and I(x) = 0 otherwise.
The next step is to consider the rectified filter output as a linear mixture of different sources, where every source is the intrinsic spike train of a neuron. Since there are as many filters as neurons, the dimension of the filter output space is equal to the number of neurons, and therefore, the detection and classification problem can be considered as a complete BSS problem. However, it is not guaranteed that the maximal response of filter i to spikes from neuron j will be at a shift of 0, i.e., when the filter and the template overlap entirely. This leads to the following model for the rectified filter output:
with being the mixture matrix, and τi,j being the shifts between the maximal response of filter to template ; i.e.,
where and τi,i=0 ∀i by construction. We want to reconstruct the sources by solving the corresponding inverse problem:
with . Here, the relation to ICA becomes clear, since this is similar to the inverse problem that ICA solves. In contrast to ICA, we do not have to estimate the mixture matrix and the shifts τi,j from the data, but can calculate them directly from the responses (i.e. cross-correlation functions) of all filters to all templates, as illustrated in Fig. 2.
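A minimal sketch of the un-mixing step is given below. It assumes a particular sign convention, which may differ from the one in the original equations: a spike of neuron j at time t produces a peak of height A[i, j] in the output of filter i at time t + τ[i, j]. The function name `deconfuse` and the use of circular shifts via `np.roll` are simplifications made for this example.

```python
import numpy as np

def deconfuse(filter_out, mixing, shifts):
    """Rectify and un-mix filter outputs.

    filter_out: (M, T) array, one row per filter.
    mixing:     (M, M) peak responses A[i, j] of filter i to template j.
    shifts:     (M, M) integer lags tau[i, j] of those peaks (tau[i, i] = 0).
    """
    rectified = np.maximum(filter_out, 0.0)   # half-wave rectification
    unmix = np.linalg.inv(mixing)             # W = A^-1
    sources = np.zeros_like(rectified)
    m = mixing.shape[0]
    for i in range(m):
        for j in range(m):
            # Re-align the output of filter j to the timing of neuron i
            # before weighting it (circular shift: a simplification).
            sources[i] += unmix[i, j] * np.roll(rectified[j], -shifts[j, i])
    return sources
```

Because the mixing matrix and the shifts are computed from the filters and templates rather than estimated from the data, the un-mixing costs one small matrix inversion per adaptation step and a fixed number of shifted additions per sample.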
All steps of this procedure are summarized under the term “Deconfusion” (see also Fig. 1(d)–(e) for a schematic illustration). After Deconfusion the false responses of the filters to non-matching templates are suppressed (see Fig. 3). In principle, it is possible that the inverse problem in Eq. (8) is not exactly solvable, if the shifts are not consistent. Consistent shifts have to satisfy the following equation:
A derivation is given in Appendix B. For arbitrary templates and data covariance structures, Eq. (9) can in principle be violated. However, with templates from real experiments we did not observe this to be a problem.
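The consistency equation is not reproduced in this copy. Under the sign convention assumed here (a spike of neuron j peaks in filter i at a lag of τ[i, j]), exact un-mixing requires τ[j, k] = τ[j, i] + τ[i, k] for all i, j, k; this form is an assumption about what Eq. (9) expresses. The following sketch checks that condition for a given integer shift matrix.

```python
import numpy as np

def shifts_consistent(shifts):
    """True if tau[j, k] == tau[j, i] + tau[i, k] for all i, j, k."""
    tau = np.asarray(shifts)
    # total[j, i, k] = tau[j, i] + tau[i, k]
    total = tau[:, :, None] + tau[None, :, :]
    return bool(np.all(total == tau[:, None, :]))

consistent = shifts_consistent([[0, 2], [-2, 0]])
inconsistent = shifts_consistent([[0, 2], [-1, 0]])
```

A practical use of such a check would be to flag template sets for which the un-mixing is only approximate.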
In the final step, thresholding is applied to every row i of the Deconfusion output. Again, by construction we only have to consider positive peaks. All local maxima after a threshold crossing are identified as spiking times of neuron i. In this sense, spike detection and spike classification are performed simultaneously.
The threshold is set for each row of individually such that the total error of false negative and false positive detections is minimal. Amongst others, the threshold depends on the variance of the noise, on the Deconfusion output, and on the firing frequencies of the neurons. A detailed derivation is given in Appendix C.
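The detection rule can be sketched as follows. The threshold is taken as a given number here; its derivation (Appendix C) is not reproduced, and the function name `detect_spikes` is hypothetical.

```python
import numpy as np

def detect_spikes(row, threshold):
    """Indices of local maxima of `row` that lie above `threshold`."""
    row = np.asarray(row, dtype=float)
    inner = row[1:-1]
    # A sample is a peak if it exceeds the threshold, is not smaller than
    # its left neighbour and is strictly larger than its right neighbour.
    is_peak = (inner > threshold) & (inner >= row[:-2]) & (inner > row[2:])
    return np.flatnonzero(is_peak) + 1

spike_idx = detect_spikes([0, 0.2, 1.0, 0.3, 0, 0.1, 0.9, 0.1, 0], threshold=0.5)
```

Applying this function to row i of the un-mixed output yields the spike times of neuron i directly, so no separate classification step is needed.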
Artifacts were removed from our data in two ways. First, all periods during which the animal had to perform a physical task (e.g., pressing a button) were not considered for further analysis. Secondly, for each period of length 10 ms the number of zero-crossings on each data channel was counted and summed up. All periods, in which this number was below 10% of the maximal number of possible zero crossings, were not considered for further analysis. This second type of heuristic removal aims at eliminating artifacts caused by oscillations of the electrode shaft inside the guiding tube (e.g., caused by movement of the animal).
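The zero-crossing heuristic can be sketched as below. The text does not state how the maximal number of possible zero crossings is computed; the sketch assumes one possible crossing between each pair of consecutive samples on each channel. The function name `artifact_windows` and its parameters are hypothetical.

```python
import numpy as np

def artifact_windows(data, fs, window_s=0.010, min_fraction=0.10):
    """Flag windows with too few zero crossings.

    data: (channels, samples) array; fs: sampling rate in Hz.
    Returns one boolean per complete window (True = treated as artifact).
    """
    data = np.atleast_2d(np.asarray(data, dtype=float))
    n_ch, n_samples = data.shape
    win = int(round(window_s * fs))
    n_win = n_samples // win
    # Assumed upper bound: every consecutive sample pair changes sign.
    max_crossings = n_ch * (win - 1)
    flags = np.zeros(n_win, dtype=bool)
    for w in range(n_win):
        seg = data[:, w * win:(w + 1) * win]
        sign_change = np.signbit(seg[:, 1:]) != np.signbit(seg[:, :-1])
        flags[w] = sign_change.sum() < min_fraction * max_crossings
    return flags
```

Slow, large-amplitude oscillations cross zero rarely, which is why a low count is taken as a sign of a movement artifact.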
The noise covariance matrix is determined by calculating the auto- and cross-correlation functions of every channel. Only data points that were part of neither a spike nor an artifact period were used for the calculation. The noise covariance matrix is needed for the initialization phase, see Section 2.10, and for evaluation of the sorting result on real data, see Section 4.2.3.
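For a single channel, the construction reduces to a Toeplitz matrix built from the sample autocovariance, as in the following sketch. The multi-channel case with cross-correlation blocks is omitted, and treating the noise samples as one contiguous segment is a simplifying assumption. The function name `noise_covariance` is hypothetical.

```python
import numpy as np

def noise_covariance(noise, n_lags):
    """Toeplitz covariance matrix from the autocovariance of one channel."""
    x = np.asarray(noise, dtype=float)
    x = x - x.mean()
    n = len(x)
    # Biased sample autocovariance at lags 0 .. n_lags - 1.
    acov = np.array([x[:n - k] @ x[k:] / n for k in range(n_lags)])
    # Entry (i, j) depends only on the lag |i - j|.
    idx = np.abs(np.arange(n_lags)[:, None] - np.arange(n_lags)[None, :])
    return acov[idx]
```

With `n_lags` set to the template length, the resulting matrix has the size needed to whiten or filter waveforms of that length.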
Due to tissue relaxation the measured waveforms change over time as the relative distance between the multi-electrode and the neurons changes. In order to track these changes we re-estimate the templates as well as the data covariance matrix after every time period of length T. Each template is re-estimated as the mean of the last 350 spikes (see Section 5 for a discussion of this value) detected from neuron i, where the spikes of neuron i are aligned on the maximal peak of the response of filter i. For the re-estimation only spikes which were classified by our method as non-overlapping spikes are used. The data covariance matrix is re-estimated from the last 30 s of the recordings and the linear filters are re-calculated. Consequently, the Deconfusion and the thresholds are re-computed as well. In Section 4.2.3 we show that we can indeed track drifts with this approach.
Templates whose SNR decreases over time might be a concern. By constantly adapting the template, finally, there is a risk of getting a template which is very close to the noise signature, and the corresponding filter will detect pure noise. This can be prevented by removing filters at the appropriate moment. Consequently, we stop tracking templates whose SNR drops below 0.65. This value proved to be appropriate during simulations (see Section 4.2.2).
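The adaptation rule for a single template can be sketched with a bounded buffer. The class name `TemplateTracker` and the caller-supplied `snr_fn` are hypothetical; alignment of the waveforms and the re-estimation of the covariance matrix and filters are assumed to happen elsewhere.

```python
from collections import deque
import numpy as np

class TemplateTracker:
    """Keeps the most recent non-overlapping spikes of one neuron."""

    def __init__(self, initial_template, n_spikes=350, min_snr=0.65):
        self.template = np.asarray(initial_template, dtype=float)
        self.recent = deque(maxlen=n_spikes)  # oldest spikes drop out automatically
        self.min_snr = min_snr
        self.active = True

    def add_spike(self, waveform, overlapping=False):
        # Overlapping spikes are excluded from the template estimate.
        if not overlapping:
            self.recent.append(np.asarray(waveform, dtype=float))

    def update(self, snr_fn):
        """Call once per adaptation period T; returns False once tracking stops."""
        if self.recent:
            self.template = np.mean(list(self.recent), axis=0)
        if snr_fn(self.template) < self.min_snr:
            self.active = False
        return self.active
```

A `deque` with `maxlen` implements the "last 350 spikes" window without explicit bookkeeping.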
Most of the analysis in the preceding sections was based on the assumption of known initial templates. Hence, before applying our method, one needs an initialization phase during which the templates are found. In principle, any supervised or unsupervised learning method can be applied.
We want to emphasize that the initialization phase is only necessary at the beginning of a recording session (Fig. 4): Once the initial templates are estimated, the main algorithm runs online. Furthermore, because of the feedback described in Section 2.9, the initialization does not have to be very accurate, as the templates are re-estimated after every period of length T. Usually we used an initialization phase of about 30 s in our real recordings (Section 3.3). This time window is short enough so that the templates change only very slightly in time and can, therefore, be clustered reliably, but long enough to acquire enough spikes to estimate robustly the mean waveforms.
During the initialization phase spike detection can be done with any conventional technique. We used an energy based approach, since it usually delivers a better performance than other methods (Mtetwa and Smith 2006; Obeid and Wolf 2004).
In particular, we applied the MTEO detector (see Section 4.1 for definition) with k-values [1,3,5] to each recording channel separately and set the threshold to 3.5 times the median of its output. Spike periods were defined as intervals of length 1.5 ms, in which the output of the MTEO detector exceeded the threshold value at least once.
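The definition of the MTEO detector referred to above is not reproduced in this copy. The sketch below follows a commonly cited form of the multiresolution Teager energy operator: the k-TEO x[n]² − x[n−k]·x[n+k], smoothed with a Hamming window of length 4k+1 and maximized over k. The window normalization in particular is an assumption and may differ from the one used by the authors.

```python
import numpy as np

def teo(x, k):
    """k-TEO: x[n]^2 - x[n-k] * x[n+k], set to zero at the edges."""
    out = np.zeros(len(x), dtype=float)
    out[k:-k] = x[k:-k] ** 2 - x[:-2 * k] * x[2 * k:]
    return out

def mteo(x, ks=(1, 3, 5)):
    """Maximum over k of the smoothed k-TEO outputs."""
    x = np.asarray(x, dtype=float)
    outputs = []
    for k in ks:
        win = np.hamming(4 * k + 1)
        # Assumed normalisation making the outputs for different k comparable.
        win = win / np.sqrt(3.0 * np.sum(win ** 2) + np.sum(win) ** 2)
        outputs.append(np.convolve(teo(x, k), win, mode="same"))
    return np.max(outputs, axis=0)

def mteo_threshold_mask(x, factor=3.5):
    """Boolean mask of samples whose MTEO output exceeds factor * median."""
    out = mteo(x)
    return out > factor * np.median(out)
```

Grouping the supra-threshold samples into spike periods of 1.5 ms is left out of the sketch.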
Correct spike alignment is crucial for a good clustering result. While in many studies an alignment based on the maximal and/or minimal peak value of a spike is used, again, methods based on the energy of a spike usually yield better results (Fee et al. 1996a). After cutting out all spikes around the peak of the detector, we used the following algorithm for alignment:
In our experiments described in Section 3.3 the average number of spikes in the first 30 s of recordings is around 2500 and convergence is obtained after 15 to 20 iterations.
Although a broad range of sophisticated clustering algorithms is available, we used a standard approach, since a very accurate initialization is not crucial for our method. The aligned spikes are whitened (e.g., see Pouzat et al. 2002) and projected into the space of the first 6 principal components. The clustering consists of a Gaussian mixture model in combination with the Expectation-Maximization algorithm (Xu and Wunsch 2005). For every number of cluster means between 1 and 15 the clustering procedure is executed 3 times with random initial means. The covariance matrices are fixed to 2.5 times the identity matrix. The run and the number of means with the highest score according to the Bayesian information criterion (Xu and Wunsch 2005) are selected as initialization for the main algorithm.
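The clustering step can be sketched as EM for a Gaussian mixture whose covariance matrices are fixed to 2.5 times the identity, with model selection by a BIC-type score. The whitening and PCA projection are assumed to have been applied already. The function names, the random-sample initialization, and the sign convention of the score (log-likelihood minus penalty, higher is better) are assumptions of this example.

```python
import numpy as np

def log_joint(X, means, weights, var):
    """log(weight_k * N(x | mean_k, var * I)) for all points and components."""
    d = X.shape[1]
    sq = ((X[:, None, :] - means[None, :, :]) ** 2).sum(axis=2)
    return np.log(weights) - 0.5 * sq / var - 0.5 * d * np.log(2 * np.pi * var)

def fit_gmm_fixed_cov(X, k, var=2.5, n_iter=100, seed=0):
    """EM with fixed spherical covariance; returns (means, BIC-type score)."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    means = X[rng.choice(n, size=k, replace=False)]
    weights = np.full(k, 1.0 / k)
    for _ in range(n_iter):
        lj = log_joint(X, means, weights, var)
        resp = np.exp(lj - np.logaddexp.reduce(lj, axis=1, keepdims=True))  # E-step
        nk = resp.sum(axis=0) + 1e-12
        means = (resp.T @ X) / nk[:, None]                                  # M-step
        weights = nk / nk.sum()
    loglik = np.logaddexp.reduce(log_joint(X, means, weights, var), axis=1).sum()
    n_params = k * d + (k - 1)  # means and mixing weights only
    return means, loglik - 0.5 * n_params * np.log(n)

def select_model(X, max_k=15, n_runs=3):
    """Best run over all cluster counts and random restarts."""
    best = None
    for k in range(1, max_k + 1):
        for run in range(n_runs):
            means, score = fit_gmm_fixed_cov(X, k, seed=1000 * k + run)
            if best is None or score > best[1]:
                best = (means, score)
    return best
```

Fixing the covariance keeps the number of free parameters small, which suits an initialization that only needs rough cluster means.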
The SNR is a scalar value which is an indicator for the difficulty of detecting a signal in noisy data. In this sense, the SNR definition should be dependent on the method used for signal detection. Several definitions of the SNR are used in the spike sorting literature. A very common one is to define the SNR by some maximal value, e.g., the maximal amplitude, the maximal difference in amplitudes (peak to peak distance), or the maximum of the absolute value of the amplitude, divided by the variance of noise σ2, i.e.,
(e.g. see Choi et al. 2006). Another common definition of the SNR is based on the energy of a signal, i.e.,
(e.g. see Rutishauser et al. 2006). We introduce a definition of SNR which is based on the Mahalanobis distance of a template to zero:
In the special case of single electrode data and of 1-dimensional templates (Tf=1), all SNR definitions are equivalent. To show that is an appropriate SNR definition for the linear filters, while the other definitions are in contradiction with the meaning of signal-to-noise ratio, we simulated datasets containing a single neuron, which fired according to a Poisson statistic, and a noise covariance matrix , where denotes the identity matrix, and is a noise covariance matrix from one of the experiments described in Section 3.1, with for all i. The used template was extracted from the same experiment. We simulated datasets for ten different α values between 0 and 1. The decreased with increasing α, and consistently the detection performance of our method decreased, see Fig. 5. Note that for all α values, which means that those definitions are inappropriate for the proposed method. Nevertheless, we always provide values for all three definitions of SNR in order to allow comparisons with other publications.
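The three SNR equations are not legible in this copy. The sketch below gives one plausible set of definitions; the normalizations (noise standard deviation, square root of the template length) are assumptions, chosen so that the three values coincide for a one-dimensional template as stated above.

```python
import numpy as np

def snr_peak(template, noise_std):
    """Maximal absolute amplitude relative to the noise level."""
    return np.max(np.abs(template)) / noise_std

def snr_energy(template, noise_std):
    """Root-mean-square amplitude relative to the noise level."""
    return np.linalg.norm(template) / (np.sqrt(template.size) * noise_std)

def snr_mahalanobis(template, noise_cov):
    """Mahalanobis distance of the template to zero, per dimension."""
    d2 = template @ np.linalg.solve(noise_cov, template)
    return np.sqrt(d2 / template.size)
```

Only the Mahalanobis variant changes when the noise becomes correlated while its variance stays fixed, which is the property the simulation with varying α examines.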
For the performance evaluation of our method, three different datasets were used. All experiments were performed in accordance with German law for the protection of experimental animals, approved by the local authorities (“Regierungspräsidium”), and are in full compliance with the guidelines of the European Community (EUVD 86/609/EEC) for the care and use of laboratory animals.
The experiments were done in acute brain slices from Long Evans rats (P17–P25). In every experiment a pyramidal cell from visual cortex, Layer 3 or 5 depending on the experiment, was simultaneously recorded intracellularly and extracellularly. Extracellular spike waveforms were recorded using a 4-core-Multifiber Electrode (Tetrode) from Thomas RECORDING GmbH, Germany. The cell was intracellularly stimulated by a current injection (varying from experiment to experiment between 80 pA and 350 pA). Extracellular recordings were sampled at 28 kHz and filtered with a bandpass FIR filter (300 Hz to 5000 Hz).
The intracellularly recorded spikes were detected using a manually set threshold on the membrane potential. The threshold crossings in the membrane potential were used as triggers to cut out periods from the extracellular recordings (2 ms before and 5 ms after the trigger). In total, data was recorded from 6 different cells, which resulted in 9957 intracellularly detected spikes. For analysis only the recording channel with the highest SNR was considered. The SNR of the different experiments varied from (, ) to (, ). A short period of recordings with a moderate SNR (, , ) is shown in Fig. 6, top row.
The artificially generated data simulates a single channel recording of 15 s length at a sample frequency of 32 kHz containing activity from three neurons. Every dataset contained exactly 750 equidistantly distributed spikes of every neuron, which corresponds to a firing frequency of 50 Hz. The three used templates were extracted from the recordings described in Section 3.1 and had a length of 2.1 ms. The noise was generated by an ARMA model (Hayes 1996) approximating the noise characteristic shown in Fig. 5(b).
The relative number of overlapping spikes was systematically varied from 1% up to 50%. 75% of all overlapping spikes consist of overlaps between two templates (25% for each combination), and 25% of all overlapping spikes consist of overlaps between all three templates. The amount of overlap, i.e., how much the templates overlap, is distributed according to a uniform distribution on the interval [1/3, 1]. The SNR was kept constant for all overlapping ratios, namely, all three templates were scaled to an equal SNR, which was . This corresponds to and (average values over the three templates).
The was systematically varied from 0.6 to 1.4 (which is equivalent to 2.71 to 6.32 average and 1.06 to 2.48 average ). The amount of overlapping spikes was constant and set to 7%, which is approximately the overlap ratio that results by chance under the assumption of independent spike trains.
The over-completeness, the equal SNR of all three templates, and the presence of overlapping spikes make these datasets particularly challenging.
Tetrodes were placed in ventral prefrontal cortex for individual recording sessions, sampling data from the same region across experiments. Recordings were performed simultaneously from up to 16 adjacent sites with an array of individually movable fiber micro-tetrodes (Eckhorn and Thomas 1993). Recording positions of individual tetrodes were manually chosen to maximize the recorded activity and the signal quality. Data were sampled at 32 kHz and bandpass filtered between 0.5 kHz and 10 kHz.
Neuronal activity was recorded while 2 macaque monkeys performed a visual short-term memory task. The task required the monkeys to compare a test stimulus to a sample stimulus presented after a 3 s long delay and to decide by differential button press whether both stimuli were the same or not. Stimuli consisted of 20 different pictures of fruits and vegetables which were presented for 0.5 s (test stimulus) or for 2 s (sample stimulus). Correct responses were rewarded. Match and non-match trials were randomly presented with an equal probability. This experimental setup was presented in Wu et al. (2008).
The monkeys perform approximately 2000 trials per session, which is equivalent to almost 4 h of recording time. For the evaluation of our algorithm only the first 5 s of every trial were processed, as the remaining data might contain severe artifacts caused by the monkey’s movement.
The performance of a spike sorting method depends on its capability to detect spikes and to assign every spike to a putative neuron. As described in Section 2.6, our method achieves both simultaneously. We evaluated the performance of our approach, first, as a pure detection method, and then, as a combined detection and classification technique. In both categories we compared it against techniques commonly used.
The evaluation was done on the in-vitro dataset described in Section 3.1. Although the extracellular signal was recorded with a tetrode, we used only one recording channel for further analysis, since most conventional spike detection methods are only defined for single channel data. The detectors used are:
A short piece of the recordings and some of the corresponding detector outputs are shown in Fig. 6.
We compared the performance of the different spike detection methods using receiver operating characteristic (ROC) curves. For every detector the threshold is systematically varied between 0, resulting in zero false negative detections (FN), and the minimal value which does not detect any spikes; i.e., zero true positive detections (TP). For every threshold the percentage of TP is plotted against the false positive (FP) rate. Such a curve is shown for one exemplary experiment in Fig. 7. The curve for the best possible detector (i.e. no FP, but 100% TP detections) would pass through the point (0,100). The area under such a curve (AUC) is, thus, a measure for the performance of a detector. The normalized AUC values for the area up to 30 Hz of FPs of all detectors averaged over all available datasets are shown in Table 2. Although only the average performance is presented, our method and the optimal linear filter also achieved higher scores on every individual dataset described in Section 3.1. In all experiments the optimal filter was superior to the other detectors, while our method scored second with a very similar performance. This shows that taking into account the full waveforms as well as the data statistics always greatly improves the detection performance. The optimal linear filter was included in the evaluation to provide an upper bound on the performance one can achieve with our method. Our method offers another advantage for the detection of spikes, namely a greater robustness to threshold variations, see Fig. 8. This means that a deviation from the optimal threshold has a less drastic impact on the total error (FP + FN) than for the other methods.
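The ROC evaluation can be sketched as follows. The sketch assumes that detector peaks have already been matched to the intracellular ground truth, so that each peak score is known to belong either to a true spike or to a false event; the function names and the step-wise integration on a fixed grid are choices made for this example.

```python
import numpy as np

def roc_curve(true_scores, false_scores, duration_s):
    """FP rate (Hz) and TP percentage as the threshold sweeps downward."""
    true_scores = np.asarray(true_scores, dtype=float)
    false_scores = np.asarray(false_scores, dtype=float)
    thresholds = np.sort(np.concatenate([true_scores, false_scores]))[::-1]
    tp = np.array([100.0 * np.mean(true_scores >= t) for t in thresholds])
    fp = np.array([np.sum(false_scores >= t) / duration_s for t in thresholds])
    # Prepend the point for a threshold above every score (nothing detected).
    return np.concatenate([[0.0], fp]), np.concatenate([[0.0], tp])

def normalized_auc(fp_rate, tp_percent, max_fp_rate=30.0):
    """Area under the ROC curve up to max_fp_rate, scaled to [0, 1]."""
    grid = np.linspace(0.0, max_fp_rate, 301)
    # Best TP percentage reachable without exceeding each FP rate on the grid.
    idx = np.searchsorted(fp_rate, grid, side="right") - 1
    tp_on_grid = tp_percent[idx]
    area = np.sum(np.diff(grid) * (tp_on_grid[1:] + tp_on_grid[:-1]) / 2.0)
    return area / (100.0 * max_fp_rate)
```

A detector whose true-spike scores all exceed its false-event scores reaches a normalized AUC of 1, corresponding to a curve through the point (0,100).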
We recall that the operations applied to the recorded data can be summarized in Eq. (8). The cross-correlation between the filters and the data is a linear operation. The subsequent Deconfusion consists of a half-wave rectification, which is a non-linear operation but affects only noise and not the action potentials (as represented in the filter output), and the un-mixing, which is linear again. Hence, one can expect that if the superposition of spike waveforms is also linear, overlaps should be resolved successfully. We validated this assumption on the dataset described in Section 3.2.1. The algorithm was executed in the same way as described in Section 2. In order to allow the method to adapt (Section 2.9), the method was iterated 5 times on the same dataset. We also compared the performance of our method to that of two popular clustering based offline methods, one of them being the method described in Section 2.10.2, which will be abbreviated as “GMM”. Since this is also the method which is used for initialization of our algorithm, the comparison with GMM directly provides information about the improvements in sorting when our method is used.
The other algorithm, called “KlustaKwik”, was explicitly developed for clustering neuronal data and was first introduced in Harris et al. (2000). The clustering parameters were set to their default values. Spike detection and alignment were done in the same way as described in Section 2.10.1. To provide an upper bound on the performance our approach could achieve, we included the evaluation with the optimal filters calculated directly from the real templates. Note that other existing, purely clustering-based sorting methods, either in the PCA space or in the original data space, would perform similarly to GMM and KlustaKwik.
The simulations show that our method indeed resolves overlapping spikes and outperforms the clustering based methods; see Fig. 9. Our method works even for datasets with a large amount of overlapping spikes, and the performance is close to the theoretical bound of this approach. On the other hand, the performance of the purely clustering based methods rapidly decreases with an increasing amount of overlapping spikes. Overlapping spikes are mostly detected as single events by conventional spike detection techniques, which leads to a high FN rate. Furthermore, since the waveforms of overlapping spikes are distorted, their distances to the corresponding cluster means are large, making it difficult to assign them to a neuron. This results in a low TP score for clustering based methods.
The evaluation on the dataset with a varying SNR (see Section 3.2.2) was done in the same way as in the previous section. The results are shown in Fig. 10. The performance of the clustering based methods is severely affected by a low SNR. The performance of the proposed method follows that of the GMM algorithm, since it relies on its output for initialization. Nevertheless, our method is always superior to it. Because of the rapid decrease in performance from an SNR level of 0.7 to an SNR level of 0.6, we stopped the algorithm from detecting spikes for templates with an SNR lower than 0.65 in real recordings by deleting the corresponding templates and filters. In contrast, the optimal filter method is only slightly affected by a low SNR level, indicating that a more elaborate initialization would increase the performance of the proposed method on datasets with very low SNRs.
We applied our method to data recorded in the prefrontal cortex of monkeys performing a short-term memory task as described in Section 3.3. For illustrative purposes, we show the results obtained by processing data from one tetrode, since the qualitative outcomes from processing other tetrodes and different recording sessions are similar.
For the initialization phase we used the first 7 trials of the recording. The initial spike detection and clustering were performed as described in Section 2.10, resulting in a total of 3219 detected spikes, which were assigned to 8 clusters. This basic clustering was used as an initialization for the main algorithm, which was executed in the same way and with the same parameters as described in Section 2 (see also Fig. 4 for a summary). The 7 trials used for initialization were also processed with the main method in order to improve the sorting quality.
The templates after the first 90 trials are shown in Fig. 11 and appear reasonable on visual inspection by an expert. In total, our method found almost 200000 spikes (57111, 18060, 50724, 51709, 3974, 7057, 444, 10915 for each template). Two well-established tests to quantitatively assess the sorting quality of a method applied to real data are the inter-spike interval distribution and the projection test (Rutishauser et al. 2006; Pouzat et al. 2002); the evaluation of our sorting with both tests is shown in Fig. 11. The relative number of spikes during the first 3 ms is smaller than 1.5% for all neurons, implying that the refractory period is respected. The projection test, in turn, verifies that the spikes of a single neuron have not been artificially split by the sorting algorithm into multiple clusters and that spikes from multiple neurons have not been assigned to the same cluster. The sorting of our method also passes the projection test, since the cluster distributions do not overlap and are close to the theoretical prediction of a normal distribution with a variance of 1. In sum, the good results of these two tests imply that the found clusters are well separated and indeed correspond to single neurons, and that the assumptions made in Section 2.2 are justified.
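The refractory-period check described above can be sketched in a few lines. This is only an illustration, not the evaluation code used for Fig. 11: the function name, the example spike train, and the choice to report the fraction of inter-spike intervals below 3 ms are assumptions made here.

```python
def refractory_violation_fraction(spike_times_ms, refractory_ms=3.0):
    """Return the fraction of inter-spike intervals shorter than refractory_ms."""
    times = sorted(spike_times_ms)
    # Differences between consecutive spike times of one putative neuron.
    intervals = [later - earlier for earlier, later in zip(times, times[1:])]
    if not intervals:
        return 0.0
    violations = sum(1 for isi in intervals if isi < refractory_ms)
    return violations / len(intervals)


# Hypothetical spike train in ms; the intervals are 10, 2, 20, 15, 1.5 and 30.
example_train = [0.0, 10.0, 12.0, 32.0, 47.0, 48.5, 78.5]
fraction = refractory_violation_fraction(example_train)
```

A well-isolated unit should yield a small fraction; the text above reports less than 1.5% for all neurons in the recording.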
Since we inserted the tetrodes anew before every experiment, our algorithm has to deal with the variability in the data caused by tissue drifts. The adaptation procedure described in Section 2.9 was executed after every trial and adapted the algorithm correspondingly. The time period over which the templates were assumed to be constant was set to T = 5 s (see footnote 1). As a result, 2 neurons could be tracked from the beginning to the very end of the experiment; see Fig. 12. The other templates were deleted earlier, since their SNR dropped below 0.65. The importance of taking temporal variations into account for sorting is demonstrated in Fig. 13. If the drift is not accounted for, the clusters are elongated and their spread is larger, making any classification more difficult.
The disappearance of neurons from the recording volume is a common phenomenon in our recordings. However, the opposite, i.e., the appearance of new neurons during recordings, is rarely observed. This might be explained by the fact that, at the beginning of the experiments, the tetrodes are explicitly placed at a position where strong neuronal activity is measured. Therefore, it is more probable that the highly active population of neurons disappears during tissue drifts than that new, highly active neurons appear. We also discuss this problem in Section 4.4.
In Section 4.2.1 we have already demonstrated on simulated data the ability of our method to resolve overlapping spikes instantaneously. This is also the case for real data; see Fig. 14. The same figure also shows that it would be very difficult to classify these overlapping spikes correctly with a purely clustering-based algorithm.
The evaluation in Fig. 11 and Fig. 13 shows that the clustered spikes, although whitened, are not perfectly Gaussian distributed. This deviation is caused by overlapping spikes, but it is also due to an intrinsic waveform variability, as observed, for example, during bursts (Fee et al. 1996b). In this sense, the generative model assumed in Section 2.2 is no longer strictly valid. Nevertheless, our method achieves a good performance, even for datasets containing bursting neurons identified by visual inspection. This can be explained by the fact that the scaling of the waveform during bursts is close to linear (Rutishauser et al. 2006). Because of the linear character of our method (see, e.g., Section 4.2.1), the response to a linearly scaled waveform is scaled by the same factor. Hence, the algorithm classifies spikes from bursting neurons correctly as long as the amplitude degradation of the spikes is not too strong.
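The linearity argument above can be illustrated with a toy single-channel example. The template, the filter coefficients, the threshold of 0.6, and the attenuation factors are all hypothetical values chosen for illustration; the filter is simply normalized so that its response to the unscaled template is 1, as required of the filters in this paper.

```python
def filter_response(filt, waveform):
    """Linear filter response at the aligned time point (a dot product)."""
    return sum(f * w for f, w in zip(filt, waveform))


def detected(filt, waveform, threshold):
    """A spike counts as detected if the response reaches the threshold."""
    return filter_response(filt, waveform) >= threshold


template = [0.0, 1.0, -2.0, 0.5]       # hypothetical spike template
raw_filter = [0.1, 0.3, -0.4, 0.2]     # hypothetical filter coefficients
gain = filter_response(raw_filter, template)
filt = [f / gain for f in raw_filter]  # normalize: response to template is 1

threshold = 0.6                            # hypothetical detection threshold
mild = [0.8 * s for s in template]         # mild attenuation within a burst
strong = [0.4 * s for s in template]       # strong attenuation within a burst

mild_response = filter_response(filt, mild)      # scales by 0.8
strong_response = filter_response(filt, strong)  # scales by 0.4
```

In this sketch the mildly attenuated spike still crosses the threshold while the strongly attenuated one does not, which mirrors the statement that classification remains correct only as long as the amplitude degradation is not too strong.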
We have shown that our method has great potential for spike detection and classification applications. However, there is a principal limitation: since the filtering and the Deconfusion are linear operations, it is impossible to discriminate waveforms which are strictly linearly dependent, i.e., when the spike waveform of one neuron is a multiple of the waveform of another neuron. A possible way to solve this problem is to sort the templates according to their SNR. Spikes with the highest SNR are detected first. Whenever a spike is found, the corresponding template is subtracted from the data and all other filter outputs are re-calculated for the affected period. This procedure is repeated for templates with a lower SNR. Further, if the sum of the waveforms of two different neurons with a certain shift is nearly identical to another neuron's spike waveform, it is impossible to judge whether a spike is an overlap or not. Only probabilistic methods or soft clustering could give a hint as to where the waveform came from.
We have not addressed the problem of neurons which are not detected during the initialization phase. Just as we observe neurons whose SNR decreases due to tissue drifts until they finally disappear completely from the recorded data, the opposite might also happen; i.e., previously undetected neurons might slowly appear in the recording volume. A possible solution would be to run a conventional spike detection method in parallel to our method. All spikes detected by the conventional spike detection technique, but not by our method, could be collected, aligned and clustered. Based on the newly found clusters, corresponding filters could be initialized and the Deconfusion procedure adapted accordingly.
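The first step of the proposed remedy, collecting the spikes found by a conventional detector but missed by our method, could be sketched as follows. The function name, the matching tolerance, and the example time stamps are hypothetical; alignment, clustering and filter initialization are not shown.

```python
import bisect


def unmatched_spikes(conventional_times, method_times, tolerance):
    """Return conventional detections with no detection of our method nearby.

    A conventional detection at time t counts as matched if our method
    reported any spike within [t - tolerance, t + tolerance].
    """
    reference = sorted(method_times)
    leftovers = []
    for t in conventional_times:
        # Index of the first reference spike at or after t - tolerance.
        i = bisect.bisect_left(reference, t - tolerance)
        if i == len(reference) or reference[i] > t + tolerance:
            leftovers.append(t)
    return leftovers


# Hypothetical time stamps in samples.
conventional = [10, 55, 120, 203, 310]
ours = [11, 121, 300]
candidates = unmatched_spikes(conventional, ours, tolerance=2)
```

The returned candidates would then be the input to the alignment and clustering step that initializes filters for newly appearing neurons.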
The runtime of an algorithm is crucial, especially for a real-time implementation. After the initialization phase, the proposed method consists mainly of linear operations. The adaptation of the covariance matrix, of the templates and of the Deconfusion parameters needs to be computed only every few seconds. Therefore, the computational burden lies in the application of the linear filters and the Deconfusion to a new sample of recorded (multichannel) data. The current implementation was written in Matlab; however, the source code is not yet ready for publication. We will make the method available, e.g., on ModelDB, as soon as the implementation is finished.
If a new multichannel sample of data is recorded, first the cross-correlation between the filters and the data has to be calculated, and afterwards Deconfusion is applied. The number of operations needed for the cross-correlation of one filter with the data (the number of filters equals the number of neurons $M$) is directly proportional to the product of the filter length $T_f$ and the number of recording channels $N$. The Deconfusion procedure consists of a half-wave rectification, which is just a trivial sample-wise non-linearity, and a matrix-vector multiplication between the square matrix $W$ of dimension $M \times M$ and the shifted and half-wave rectified filter outputs. To sum up, the computational complexity for a newly arriving data sample is $\mathcal{O}(M \cdot T_f \cdot N + M^2)$. Since we can assume the number of filters to be higher than the number of recording channels, the resulting complexity is $\mathcal{O}(M^2 \cdot T_f)$. This means the runtime complexity mainly depends on the number of filters and the filter length (see footnote 2).
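The per-sample cost described above can be made concrete with a small sketch that counts multiply-add operations. It is a simplified illustration rather than the Matlab implementation: the data layout (`filters[m][n][t]`, `window[n][t]`), the function name, and the example values are assumptions, and the time shifts of the filter outputs used by Deconfusion are omitted.

```python
def process_sample(filters, window, W):
    """Apply M multichannel filters and a simplified Deconfusion step.

    filters: M filters, each a list of N channels with Tf coefficients.
    window:  the most recent Tf samples for each of the N channels.
    W:       M x M Deconfusion matrix (time shifts omitted in this sketch).
    Returns the deconfused outputs and the number of multiply-adds.
    """
    ops = 0
    outputs = []
    for filt in filters:                      # M filters
        acc = 0.0
        for chan_filt, chan_data in zip(filt, window):   # N channels
            for f, x in zip(chan_filt, chan_data):       # Tf taps
                acc += f * x
                ops += 1
        outputs.append(acc)
    # Half-wave rectification: a trivial sample-wise non-linearity.
    rectified = [max(0.0, y) for y in outputs]
    deconfused = []
    for row in W:                             # M x M matrix-vector product
        acc = 0.0
        for w, r in zip(row, rectified):
            acc += w * r
            ops += 1
        deconfused.append(acc)
    return deconfused, ops


# Hypothetical example with M = 2 filters, N = 2 channels, Tf = 3 taps.
filters = [
    [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]],
    [[0.0, 0.0, 1.0], [-1.0, 0.0, 0.0]],
]
window = [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]]
W = [[1.0, -0.5], [-0.5, 1.0]]
deconfused, ops = process_sample(filters, window, W)
```

For this example the operation count is M·N·Tf + M² = 12 + 4 = 16, matching the two terms in the complexity estimate.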
It is important to note that the cross-correlations for every filter (even for every channel of every filter) are independent of each other and can thus be computed in parallel as simple vector-matrix multiplications. For a so-called vector processor, such a multiplication would be a single operation. For example, this could be implemented on modern consumer graphics hardware or on programmable digital signal processors.
An automatic method for simultaneous spike detection and spike classification was presented, having several advantages which were demonstrated on various datasets. Explicitly, the method makes use of the additional information provided by multi electrodes and has no constraints concerning the number of recording channels or the number of neurons present in the data. It resolves overlapping spikes instantaneously, performs well on datasets with a low SNR, and it adapts to non-stationarities present in the data. Moreover, the method operates online and is well suited for a realtime implementation.
In the first step of our algorithm, optimal linear filters were used to enhance the SNR. Linear filters, being an approximation to an exact deconvolution, account for the noise statistics as well as for the full, multi-channel template, and are, therefore, superior to other methods in detecting spikes of a specific neuron. An evaluation on simultaneous intra/extra-cellular recordings in slices of rat visual cortex and on realistic synthetic data shows that the difference in performance is considerable.
Further, we used the output of the linear filters as a new representation of the data. The advantage of the filter output space is that its dimension is equal to the number of neurons, whereas this was not the case in the original data space. This allowed us to treat the spike sorting problem as a well defined source separation problem and solve it by Deconfusion.
In the final step, a channel-specific threshold was applied, providing simultaneous spike detection and classification. Unlike in many other methods, the thresholds need not be set manually by a human supervisor but are determined automatically in an optimal way. The advantage of a combined spike detection and classification, in contrast to existing spike sorting methods, was demonstrated on simulated datasets. Especially in the presence of overlapping spikes and low SNR, our method achieved better performance. We showed that, in the case of linear filters, a proper definition of the signal-to-noise ratio is based on the Mahalanobis distance, whereas other commonly used definitions do not reflect the difficulty in detecting the signal.
By iteratively updating all quantities, namely the linear filters, the Deconfusion parameters, and the thresholds, the algorithm adapts to non-stationarities present in the data. As such, the method is also suitable for recordings made in acute experiments in which the multi electrodes are inserted anew each time. The number of detected spikes used to calculate each template was set manually to a fixed value, identical for all filters. Instead, one could develop a model for the tissue drift and derive an optimal value which depends on the estimated drifting velocity, the firing rate of the neurons, the SNR, and the error tolerance. This is the aim of a future study.
Two drawbacks of the proposed method were discussed, namely the inability to detect newly appearing neurons and the problem of strictly linearly dependent templates. However, for both problems a possible solution was sketched. The detailed study and realization of these solutions will be the scope of a future study.
By qualitative arguments, systematic runs on realistically simulated data and on real data from awake behaving macaques, we have shown that the algorithm is capable of resolving overlapping spikes without additional computing time. However, for the acute recordings in awake behaving monkeys we cannot prove that the found solution is correct, since the ground truth is unknown. Only massive simultaneous intra- and extracellular recordings in vivo could be used to assess the quality of the sorting in real experiments. Due to technical limitations, such a dataset is currently not available.
The algorithm mainly consists of linear, independent operations, which can be executed in parallel and implemented in hardware. Therefore, the algorithm can be used for realtime implementations, making it a potential spike sorting method for brain-machine interfaces and for the execution of closed-loop experiments.
This research was supported by the Federal Ministry of Education and Research (BMBF) with the grants 01GQ0743 and 01GQ0410. We thank Sven Dähne for technical support.
Open Access This article is distributed under the terms of the Creative Commons Attribution Noncommercial License which permits any noncommercial use, distribution, and reproduction in any medium, provided the original author(s) and source are credited.
Filter should respond with a peak to its matching template , but should have minimal response to the rest of the data. In particular, one demands that the response to the matching template is 1, i.e. . The response of the filter to the data is , where . Using the third assumption of Section 2.2 the response of a filter to will be small (and therefore well distinguishable from the peak response of 1 to the matching template) if the variance of the filter output is small, i.e., one has to minimize . In summary, the constrained minimization problem is stated as
A short calculation shows that
Thus, the Lagrangian L of this minimization problem is given by
where λ is the Lagrange multiplier. Since the objective function is convex in , there exists a single minimum, which can be found by solving . In fact, the minimum is attained at
Often, linear filters are derived in the frequency domain instead, but linear filters defined in the time domain have several advantages; see Vollgraf and Obermayer (2006).
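The constrained minimization described above can be sketched as follows. The notation is an assumption made for this illustration, since the symbols are not fixed in this section: $f$ denotes the filter, $\xi$ its matching template (all channels stacked into one vector), $C$ the covariance matrix of the remaining data, assumed positive definite, and $\lambda$ the Lagrange multiplier.

```latex
\begin{align}
  &\min_{f}\; f^{\top} C f
     \quad \text{subject to} \quad \xi^{\top} f = 1, \\
  &L(f, \lambda) = f^{\top} C f + \lambda \left( \xi^{\top} f - 1 \right), \\
  &\nabla_{f} L = 2\, C f + \lambda\, \xi = 0
     \;\Longrightarrow\; f = -\tfrac{\lambda}{2}\, C^{-1} \xi, \\
  &\xi^{\top} f = 1
     \;\Longrightarrow\; \lambda = -\frac{2}{\xi^{\top} C^{-1} \xi}, \\
  &f = \frac{C^{-1} \xi}{\xi^{\top} C^{-1} \xi}.
\end{align}
```

Under these assumptions, the variance of the filter output is the quadratic form $f^{\top} C f$, which is convex in $f$ for positive definite $C$, so the stationary point of the Lagrangian is the unique minimum.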
If we assume that the noise in the Deconfusion output is still a mixture of Gaussians (as an approximation for a mixture of truncated Gaussians), it follows for its variance
where are shifted covariance matrices, i.e. taking temporal correlations into account of order Tf+|τk,j−τk,i|.
The optimal threshold for the detection and classification of spikes from neuron k is chosen such that the overlap between the distribution of the spikes from neuron k and the distribution of the other spikes (from neurons j, j=1,...,M, j≠k) is minimal. We assume the distributions to be Gaussian, with means μk,j and variances σj². The μk,j are given by the maximal response values of filter j to template k after Deconfusion, i.e.
whereas the variance is given by Eq. (20). One has only to consider the maximal false response and not the whole response, because the refractory period is in general longer than the length of the template. Thus the optimal threshold εk is given by
where erfc denotes the complementary error function, and βj is a normalized weight proportional to the firing frequency of neuron j in order to minimize the total error. Note that the threshold must lie in the interval [0,1], hence this minimization problem can be solved numerically with a line search algorithm, for example using the “fminbnd” command of MATLAB.
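The threshold selection described above can be sketched with the Python standard library in place of MATLAB's “fminbnd”. This is an illustration under stated assumptions, not the authors' objective function: `means[j]` is taken to be the peak response seen at the output for neuron k when neuron j fires (with a response of 1 for neuron k itself), `sigmas[j]` the corresponding standard deviation, and `weights[j]` the normalized firing-frequency weight. The golden-section search and all example values are choices made here.

```python
import math


def tail_above(x, mean, sigma):
    """P(response > x) for a Gaussian response, written with erfc."""
    return 0.5 * math.erfc((x - mean) / (math.sqrt(2.0) * sigma))


def total_error(eps, k, means, sigmas, weights):
    """Weighted misses of neuron k plus false detections from the others."""
    miss = weights[k] * (1.0 - tail_above(eps, means[k], sigmas[k]))
    false_alarms = sum(
        weights[j] * tail_above(eps, means[j], sigmas[j])
        for j in range(len(means))
        if j != k
    )
    return miss + false_alarms


def golden_section_minimize(func, lo, hi, tol=1e-6):
    """Bounded line search for a unimodal function on [lo, hi]."""
    inv_phi = (math.sqrt(5.0) - 1.0) / 2.0
    a, b = lo, hi
    c = b - inv_phi * (b - a)
    d = a + inv_phi * (b - a)
    while b - a > tol:
        if func(c) < func(d):
            b = d
        else:
            a = c
        c = b - inv_phi * (b - a)
        d = a + inv_phi * (b - a)
    return 0.5 * (a + b)


# Hypothetical two-neuron example: neuron 0 is the target (response 1),
# neuron 1 produces a false response of 0.2; equal spread and weights.
means = [1.0, 0.2]
sigmas = [0.1, 0.1]
weights = [0.5, 0.5]
eps_opt = golden_section_minimize(
    lambda eps: total_error(eps, 0, means, sigmas, weights), 0.0, 1.0
)
```

With equal spreads and weights, the minimum lies midway between the two mean responses, so the search should settle near 0.6 in this example.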
1 The value of T was set to 5 s merely for convenience of implementation, since the first 5 s of each trial were processed.
2 In principle, the cross-correlation can be calculated more efficiently with the help of the fast Fourier transform. However, this pays off only for long pieces of data and would thus require buffering the data first, which defeats the purpose of real-time processing.
Felix Franke, Email: ed.nilreb-ut.sc@ff.
Michal Natora, Email: ed.nilreb-ut.sc@arotan.
Clemens Boucsein, Email: ed.grubierf-inu.eigoloib@niescuob.
Matthias H. J. Munk, Email: firstname.lastname@example.org.
Klaus Obermayer, Email: ed.nilreb-ut.sc@ybo.