Home | About | Journals | Submit | Contact Us | Français |

**|**HHS Author Manuscripts**|**PMC2799196

Formats

Article sections

- Abstract
- 1 Introduction
- 2 Discrete-Time Markov Modulated Probabilistic State-Space Model
- 3 Continuous-Time Markovian and Semi-Markovian State-Space Models
- 4 Experimental Results
- 5 Discussion
- 6 Conclusion
- References

Authors

Related links

Neural Comput. Author manuscript; available in PMC 2009 December 29.

Published in final edited form as:

PMCID: PMC2799196

NIHMSID: NIHMS142917

Zhe Chen, Neuroscience Statistics Research Laboratory, Department of Anesthesia and Critical Care, Massachusetts General Hospital, Harvard Medical School, Boston, MA 02114, U.S.A., and Department of Brain and Cognitive Sciences, Massachusetts Institute of Technology, Cambridge, MA 02139, U.S.A;

Zhe Chen: ude.tim@nehcehz; Sujith Vijayan: ude.dravrah.tsop@nayajiv; Riccardo Barbieri: ude.tim.tatsoruen@ireibrab; Matthew A. Wilson: ude.tim@nosliwm; Emery N. Brown: ude.tim.tatsoruen@bne

The publisher's final edited version of this article is available at Neural Comput

See other articles in PMC that cite the published article.

UP and DOWN states, the periodic fluctuations between increased and decreased spiking activity of a neuronal population, are a fundamental feature of cortical circuits. Understanding UP-DOWN state dynamics is important for understanding how these circuits represent and transmit information in the brain. To date, limited work has been done on characterizing the stochastic properties of UP-DOWN state dynamics. We present a set of Markov and semi-Markov discrete- and continuous-time probability models for estimating UP and DOWN states from multiunit neural spiking activity. We model multiunit neural spiking activity as a stochastic point process, modulated by the hidden (UP and DOWN) states and the ensemble spiking history. We estimate jointly the hidden states and the model parameters by maximum likelihood using an expectation-maximization (EM) algorithm and a Monte Carlo EM algorithm that uses reversible-jump Markov chain Monte Carlo sampling in the E-step. We apply our models and algorithms in the analysis of both simulated multiunit spiking activity and actual multiunit spiking activity recorded from primary somatosensory cortex in a behaving rat during slow-wave sleep. Our approach provides a statistical characterization of UP-DOWN state dynamics that can serve as a basis for verifying and refining mechanistic descriptions of this process.

The state of the neural system reflects the phase of an active recurrent network, which organizes the internal states of individual neurons into synchronization through recurrent network synaptic activity with balanced excitation and inhibition.^{1} The neuronal state dynamics can be externally or internally driven. The externally driven dynamics results from either sensory-driven adaptation or encoding of sensory percept; the internally driven dynamics results from changes in internal factors, such as attention shift. Different levels of neuronal state also bring in the dynamics of state transition. Generally state transitions are network controlled and can be triggered by the activation of single cells, which are reflected by changes in their intracellular membrane conductance.

From a computational modeling point of view, two types of questions arise. First, how do neurons generate, maintain, and transit between different states? Second, given the neuronal (intracellular or extracellular) recordings, how can the neuronal states be estimated? The computational solutions to the first question emphasize the underlying neuronal physiology or neural mechanism, which we call mechanistic models, whereas the solutions to the second question emphasize the representation or interpretation of the data, which we call statistical models. In this article, we are taking the second approach to model a specific phenomenon regarding the neuronal state.

The notion of neuronal UP and DOWN states refers to the observation that neurons have two distinct subthreshold membrane potentials that are relevant for action potential (i.e., spike) generation. A neuron is said to be depolarized (or excited) if its intracelluar membrane potential is above the resting membrane potential threshold (around −70 to −80 mV) and is said to be hyperpolarized (or inhibited) if its membrane potential is below the threshold. When a sufficient level of excitation is reached, a spike is likely to occur. Essentially, membrane potential fluctuations define two states of the neocortex. The DOWN state defines a quiescent period during which little or no activity occurs, whereas the UP state corresponds to an active cortical state with depolarized membrane potentials and action potential firing driven by synaptic input. It was generally believed that the spontaneous UP and DOWN states are generated by a balance of excitatory and inhibitory neurons in recurrent networks (Haider, Duque, Hasentaub, & McCormick, 2006). In recent years, many neurophysiological studies have been reported regarding the neuronal UP and DOWN states, ranging from intracellular or extracellular recordings (e.g., Sanchez-Vives & McCormick, 2000; Haider, Duque, Hasentaub, Yu, & McCormick, 2007). The UP and DOWN states that are characterized by the cortical slow oscillation in intracellular membrane potentials are also reflected in extracellular recordings, such as local field potential (LFP) or electroencephalograph (EEG), single-unit activity or multiunit activity (MUA). In the literature, the UP and DOWN states have been characterized by examining extracellular LFP recordings (Sirota, Csicsvari, Buhl, & Buzsáki, 2003; Battaglia, Sutherland, & McNaughton, 2004; Wolansky, Clement, Peters, Palczak, & Dickson, 2006) in either the somatosensory cortex of anesthetized or awake animals (e.g., Haslinger, Ulbert, Moore, Brown, & Devor, 2006; Luczak, Barthó, Marguet, Buzsáki, & Harris, 2007) or the visual cortex of nonanesthetized animals during sleep (e.g., Ji & Wilson, 2007). Recently, attention has also turned to multiunit spike trains in an attempt to relate spike firing activities with EEG recordings (Ji & Wilson, 2007).

In order to examine the relationship between sleep and memory in rats or animals, simultaneous recordings are often conducted in the neocortex and hippocampus with the goal of studying the cortico-hippocampal circuit and the functional connectivity of these two regions while the animals perform different tasks. It has been reported (e.g., Volgushev, Chauvette, Mukovski, & Timofeev, 2006) that during the slow wave sleep (SWS), which is characterized by 0.5 to 2.0 Hz slow oscillations (Buzsáki, 2006), neorcortical neurons undergo near-synchronous transitions, every second or so, between UP and DOWN states. The process of the alternating switch between the two states appears to be a network phenomenon that originates in the neocortex (Ji & Wilson, 2007; Vijayan, 2007).

The work reported here was driven by the experimental data accumulated in our lab (Ji & Wilson, 2007; Vijayan, 2007). The growing interest in UP and DOWN states in the neuroscience literature motivated us to develop probabilistic models for the UP and DOWN modulated MUA. Specifically, the UP-DOWN states are modeled as a latent two-state Markovian (or semi-Markovian) process (Battaglia et al., 2004), and the modeling goal is to establish the probability for state transition or the probability density of UP or DOWN state duration and the likelihood model that takes into account both a global hidden state variable and individual history dependence of firing. In comparison with the standard and deterministic threshold-based method, our proposed stochastic models provide a means for representing the uncertainty of state estimation given limited experimental recordings.

A stochastic process is said to be Markovian if it satisfies the Markov property; the knowledge of the previous history of states is irrelevant for the current and future states. A Markov chain is a discrete-time Markov process with the Markov property. The Markov process and Markov chain are both “memoryless.” A Markov process or Markov chain contains either continuous-valued or finite discrete-valued states. A discrete-state Markov process contains a finite alphabet set (or finite state space), with each element representing a distinct discrete state. The change of the state is called the transition, and the probability of changing from one state to the other is called the transition probability. For the Markov chain, the current state has only finite-order dependence on the previous states. Typically the first-order Markov property is assumed; in this case, the probability of *S _{k}*

When the state space is not directly observable, a Markov process is called hidden or latent. The so-called hidden Markov process is essentially a probabilistic function of the stochastic process (for a review, see Ephraim & Merhav, 2002). In the discrete-time context, the hidden Markov model (HMM) is a probabilistic model that characterizes the hidden Markov chain. The HMM is a generative model in that its full model {** π**,

A point process is a continuous-time stochastic process with observations being either 0 or 1. Spike trains recorded from either single or multiple neurons are point processes. We will give a brief mathematical background for point process in a later section and refer the reader to Brown (2005) and Brown, Barbieri, Eden, and Frank (2003) for the complete and rigorous mathematical details of point processes in the context of computational neuroscience treatment. An important feature of spike trains is that the point process observations are not independently distributed; in other words, the current observation (either 0 or 1) is influenced by the previous spiking activities. This type of history dependence requires special attention for probabilistic modeling of the point process.

A Cox process is a doubly stochastic process, which defines a generalization of Poisson process (Cox & Isham, 1980; Daley & Vere-Jones, 2002). Specifically, the time-dependent conditional intensity function (CIF), often denoted as λ_{t}, is a stochastic process by its own.^{2} A representative example of the Cox process is the Markov-modulated Poisson process, which has a state-dependent Poisson rate parameter.

Hidden Markov processes have a rich history of applications in biology. Tremendous effort has been devoted to modeling ion channels as discrete- or continuous-time Markov chains; several inference algorithms were developed for these models (Chung, Krishnamurthy, & Moore, 1991; Fredkin & Rice, 1992; Ball, Cai, Kadane, & O'Hagan, 1999). However, the observations used in ion-channel modeling are continuous, and the likelihood is often modeled by a gaussian or gaussian mixture distribution.

For discrete observations, Albert (1991) proposed a two-state Markov mixture model of a counting Poisson process and provided a maximum likelihood estimate (MLE) for the parameters. A more efficient forward-backward algorithm was later proposed by Le, Leroux, and Puterman (1992) with the same problem setup. In these two models, the two-state Markov transition probability is assumed to be stationary; although Albert also pointed out the possibility of modeling nonstationarity, no exact algorithm was given. In addition, efficient EM algorithms have been developed for discrete- or continuous-time Markov-modulated point processes (Deng & Mark, 1993; Rydén, 1996; Roberts, Ephraim, & Dieguez, 2006), but applying them to neural spike trains is not straightforward.

In the context of modeling neural spike trains, many authors (e.g., Radons, Becker, Dülfer, & Krüger, 1994; Abeles et al., 1995; Gat, Tishby, & Abeles, 1997; Jones, Fontanini, Sadacca, & Katz, 2007; Achtman et al., 2007; Kemere et al., 2008) used HMM for the purpose of analyzing and classifying the patterns of neural spike trains, but their models are restricted to discrete time and the Markov chain is homogeneous (i.e., the transition probability is stationary). In these studies, the hidden states are discrete, and the spike counts were used as the discrete observations for the likelihood models. Smith and Brown (2003) extended the standard linear state-space model (SSM) with continuous state and observations to an SSM with a continuous state Markov-modulated point process, and an EM algorithm was developed for the hidden state estimation problem. Later the theory was extended to the SSM with mixed continuous, binary, and point process observations (Coleman & Brown, 2006; Prerau et al., 2008; Eden & Brown, 2008), but the latent process was still limited to the continuous-valued state. In a similar context, Danóczy and Hahnloser (2006) also proposed a two-state HMM for detecting the “singing-like” and “awake-like” states of sleeping songbirds with neural spike trains; their model assumes a continuous-time Markov chain (with the assumption of knowing the exact timing of state transitions), and the sojourn time follows an exponential distribution; in addition, the CIF of the point process was assumed to be discrete in their work. All of these restricted assumptions have limited the computational model for analyzing real-world spike trains. Recently, more modeling efforts have been dedicated to estimating the hidden state and parameters using an HMM (or its variants) for estimating the stimulus-response neuronal model (Jones et al., 2007; Escola & Paninski, 2008). Xi and Kass (2008) recently also used a RJMCMC method to characterize the bursty and nonbursty states from goldfish retinal neurons.

In modeling the hidden semi-Markov processes or semi-Markov chains, in which the sojourn time is no longer exponentially distributed, Guon (2003) developed an EM algorithm for a hidden semi-Markov chain with finite discrete-state sojourn time, but the computational complexity of the EM algorithm is much greater than the conventional HMM.^{3}

In this article, with the goal of estimating the population neuron's UP or DOWN state, we propose discrete-state Markov or semi-Markov probabilistic models for neural spikes trains, which are modeled as doubly stochastic point processes. Specifically, we propose discrete-time and continuous-time SSMs and develop the associated inference algorithms for tackling the joint (state and parameter) estimation problem.

Our contributions have three significant distinctions from the published literature: (1) the point-process observations are not i.i.d. Specifically, the rate parameters or the CIFs of the spike trains are modulated by a latent discrete-state variable and past spiking history. (2) In the continuous-time probabilistic models, the state transition is not necessarily Markovian; in other words, the hidden state is semi-Markovian in the sense that the sojourn time is no longer exponentially distributed. (3) The maximum likelihood inference algorithms are derived for discrete-time and continuous-time probabilistic models for estimating the neuronal UP or DOWN states, and the proposed Monte Carlo EM (MCEM) algorithm is rooted in a RJMCMC sampling method and is well suited for various probabilistic models of the sojourn time.

The rest of the article is organized as follows. In section 2, we present the discrete-time HMM and the EM-based inference algorithm. In section 3, we develop the continuous-time probabilistic Markov and semi-Markov chains and their associated inference algorithms. In section 4, we demonstrate and validate our proposed models and inference algorithms with both simulated and real-world spike train data. We present some discussions in section 5, followed by the conclusion in section 6.

In neural spike analysis, we examine spike trains from either single or multiunit activity. Due to digitalized recordings, we assume that the time interval [0, *T*] of continuous-time neural spike train observations is properly discretized with a small time resolution Δ, so the time indexes are discrete integers within *k* [1, *K*], such that *k*Δ ((*k* − 1)Δ, *k*Δ] and *K*Δ = *T*. Let
${N}_{{t}_{k}}^{c}\equiv {\mathit{\text{N}}}_{k}^{c}$ denote the counting process for spike train *c* at time *t _{k}*, and let
$d{N}_{{t}_{k}}^{c}\equiv d{\mathit{\text{N}}}_{k}^{c}$ denote the indicator variable for 0/1 observation:
$d{\mathit{\text{N}}}_{k}^{c}=1$ if there is a spike and 0 otherwise. Other notations are rather straightforward, and we will define them in the proper places. Most notations used in this article are summarized in Table 1.

To infer the neuronal UP and DOWN states, in this section we develop a simple, discrete-time Markov modulated state-space model that can be viewed as a variant of the standard HMM applied to spike train analysis. The underlying probabilistic structure is Markovian and homogeneous, and the inference algorithm is efficient in identifying the statistics of the hidden state process. Based on that, in the next section we develop a continuous-time probabilistic model in order to overcome some of limitations imposed by this discrete-time probabilistic model.

Let us consider a discrete-time homogeneous Markov chain. By discrete time, we assume that the time is evenly discretized into fixed-length intervals, which have time indices *k* = 1, …, *K*. The neuronal UP or DOWN state, which is characterized by a latent discrete-time first-order Markov chain, is unobserved (and therefore hidden), and the observed spike trains or the spike counts recorded from the MUA are functionally determined by the hidden state. The standard HMM is characterized by three elements: transition probability, emission probability,^{4} and initial state probability (Rabiner, 1989). At the first approximation, we assume that the underlying latent process follows a two-state HMM with stationary transition and emission probabilities.

- The initial probability of state is denoted by a vector
= {*π**π*}, where_{i}*π*= Pr(_{i}*S*_{0}=*i*) (*i*= 0, 1). Without loss of generality, we assume that the amplitude of the hidden state is predefined, and the discrete variable*S*{0, 1} indicates either a DOWN (0) or UP (1) state._{k} - The transition probability matrix is written aswith$$\mathbf{P}=\left(\begin{array}{cc}{P}_{00}& {P}_{01}\\ {P}_{10}& {P}_{11}\end{array}\right),$$(2.1)
*P*_{01}= 1 −*P*_{00}and*P*_{10}= 1 −*P*_{11}corresponding to the transition probabilities from state 0 to state 1 and from state 1 to state 0, respectively. - Given the discrete hidden state
*S*, the observed numbers of total spikes across all tetrodes (i.e., MUA),_{k}*y*_{1},*y*_{2}, …,*y*(_{K}*y*), follow probability distributions that depend on the time-varying rate λ_{k},_{k}where the parameter λ$$\text{Pr}({Y}_{k}={y}_{k}\mid {S}_{k}=i)=\frac{{e}^{{-\mathrm{\lambda}}_{k}}{\mathrm{\lambda}}_{k}^{{y}_{k}}}{{y}_{k}!},$$(2.2)is determined by_{k}where exp($${\mathrm{\lambda}}_{k}=exp(\mu +\alpha {S}_{k}+\beta ({N}_{k-1}-{N}_{k-J})),$$(2.3)*μ*) denotes the baseline firing rate and*S*denotes the hidden discrete-state variable at time_{k}*k*. The term*N*_{k−1}−*N*_{k−J}represents the total number of spikes observed during the history period (*k*−*J*,*k*− 1], which accounts for the history dependence of neuronal firing. The choice of the length of history dependence is often determined empirically based on the preliminary data analysis, such as the histogram of the interspike interval (ISI). Equations 2.2 and 2.3 can be understood in terms of a generalized linear model (GLM) (e.g., McCullagh & Nelder, 1989; Truccolo, Eden, Fellow, Donoghue, & Brown, 2005), where the link function is a log function and the distribution is Poisson. Note that when*β*= 0 (i.e., history independence is assumed), we obtain an inhomogeneous Poisson process, and λ_{k}reduces to a Poisson rate parameter.

Taking the logarithm to both sides of equation 2.2, equation 2.3 can be rewritten as

$$log{\mathrm{\lambda}}_{k}=\mu +\alpha {S}_{k}+\beta {\stackrel{\sim}{n}}_{k},$$

(2.4)

where *ñ _{k}* =

$$\begin{array}{l}log{\mathrm{\lambda}}_{k}=\mu +\alpha {S}_{k}+\sum _{j}{\beta}_{j}{\stackrel{\sim}{n}}_{k,\phantom{\rule{0.1em}{0ex}}j}\\ \phantom{\rule{2.6em}{0ex}}=\mu +\alpha {S}_{k}+{\mathbf{\beta}}^{T}{\stackrel{\sim}{\mathbf{n}}}_{k}\phantom{\rule{0.1em}{0ex}},\end{array}$$

(2.5)

where ** β** = {

$$p({y}_{1:K}\mid {S}_{1:K},\mathbf{\theta})=\prod _{k=1}^{K}\frac{exp({-\mathrm{\lambda}}_{k}){\mathrm{\lambda}}_{k}^{{y}_{k}}}{{y}_{k}!}.$$

(2.6)

Note that λ* _{k}* λ(

$$p({S}_{1:K},{y}_{1:K}\mid \mathbf{\theta})=p({y}_{1:K}\mid {S}_{1:K},\mathbf{\theta})p({S}_{1:K}\mid \mathbf{\theta}).$$

(2.7)

And the complete data log likelihood, denoted as , is derived as (by ignoring the constant)

$$\begin{array}{l}\mathcal{L}=log\phantom{\rule{0.2em}{0ex}}p({S}_{0:K},{y}_{1:K}\mid \mathbf{\theta})=\sum _{k=1}^{K}({y}_{k}\phantom{\rule{0.2em}{0ex}}log\phantom{\rule{0.2em}{0ex}}{\mathrm{\lambda}}_{k}-{\mathrm{\lambda}}_{k})+\sum _{i=0}^{1}i\phantom{\rule{0.2em}{0ex}}log\phantom{\rule{0.2em}{0ex}}{\pi}_{i}\\ \phantom{\rule{10.8em}{0ex}}+\phantom{\rule{0.2em}{0ex}}\sum _{k=2}^{K}\phantom{\rule{0.1em}{0ex}}\sum _{i=0}^{1}\phantom{\rule{0.1em}{0ex}}\sum _{j=0}^{1}{\xi}_{k}(i,\phantom{\rule{0.1em}{0ex}}j)\phantom{\rule{0.2em}{0ex}}log\phantom{\rule{0.2em}{0ex}}{P}_{ij}\phantom{\rule{0.1em}{0ex}},\end{array}$$

(2.8)

where *ξ _{k}*(

The inference and learning procedure for the standard HMM is given by an efficient estimation procedure known as the EM algorithm, which is also known as the Baum-Welch algorithm (Baum et al., 1970; Baum, 1972). Rooted in maximum likelihood estimation, the EM algorithm iteratively and monotonically maximizes (or increases) the log-likelihood function given the incomplete data (Dempster et al., 1977). In the E-step, a forward-backward procedure is used to recursively estimate the hidden state posterior probability. In the M-step, based on the missing state statistics (estimated from the E-step), the reestimation procedure and Newton-Ralphson algorithm are used to estimate the unknown parameters ** θ** = (

$$\begin{array}{l}Q({\mathbf{\theta}}^{\mathit{\text{new}}}\mid {\mathbf{\theta}}^{\mathit{\text{old}}})=\mathbb{E}[log\phantom{\rule{0.2em}{0ex}}p({\widehat{S}}_{1:K},{y}_{1:K}\mid \mathbf{\theta})\mid {\mathbf{\theta}}^{\mathit{\text{old}}}]\\ \phantom{\rule{6.2em}{0ex}}=\mathbb{E}\phantom{\rule{0.2em}{0ex}}[\sum _{k=1}^{K}({y}_{k}\phantom{\rule{0.2em}{0ex}}log\phantom{\rule{0.1em}{0ex}}{\widehat{\mathrm{\lambda}}}_{k}-{\widehat{\mathrm{\lambda}}}_{k})+\sum _{i=0}^{1}i\phantom{\rule{0.2em}{0ex}}log\phantom{\rule{0.1em}{0ex}}{\widehat{\pi}}_{i}\\ \phantom{\rule{7em}{0ex}}+\phantom{\rule{0.2em}{0ex}}\sum _{k=2}^{K}\phantom{\rule{0.1em}{0ex}}\sum _{i=0}^{1}\phantom{\rule{0.1em}{0ex}}\sum _{j=0}^{1}{\xi}_{k}\phantom{\rule{0.1em}{0ex}}(i,j)\phantom{\rule{0.2em}{0ex}}log\phantom{\rule{0.2em}{0ex}}{\widehat{P}}_{ij}\mid {\mathbf{\theta}}^{\mathit{\text{old}}}]\phantom{\rule{0.1em}{0ex}};\end{array}$$

(2.9)

the new * θ^{new}* is obtained by maximizing the incomplete data likelihood conditional on the old parameters

In the E-step, the major task of the forward-backward procedure is to compute the conditional state probabilities for the two states:

$$\begin{array}{l}\text{Pr}({S}_{k}=1\mid {y}_{1:K}\phantom{\rule{0.1em}{0ex}},\mathbf{\theta})=\frac{\text{Pr}({y}_{1:K}\phantom{\rule{0.1em}{0ex}},{S}_{k}=1\mid \mathbf{\theta})}{\text{Pr}({y}_{1:K}\mid \mathbf{\theta})}\\ \phantom{\rule{8em}{0ex}}=\frac{\text{Pr}({y}_{1:K}\phantom{\rule{0.1em}{0ex}},{S}_{k}=1\mid \mathbf{\theta})}{{\sum}_{l=0}^{1}\text{Pr}({y}_{1:K}\phantom{\rule{0.1em}{0ex}},{S}_{k}=l\mid \mathbf{\theta})}\end{array}$$

(2.10)

$$\text{Pr}({S}_{k}=0\mid {y}_{1:K}\phantom{\rule{0.1em}{0ex}},\mathbf{\theta})=1-\text{Pr}({S}_{k}=1\mid {y}_{1:K}\phantom{\rule{0.1em}{0ex}},\mathbf{\theta}),$$

(2.11)

as well as the conditional state joint probability:

$$\begin{array}{l}\text{Pr}({S}_{k-1}=i,\phantom{\rule{0.2em}{0ex}}{S}_{k}=j\mid {y}_{1:K}\phantom{\rule{0.1em}{0ex}},\mathbf{\theta})\\ \phantom{\rule{1em}{0ex}}=\frac{\text{Pr}({y}_{0:K}\phantom{\rule{0.1em}{0ex}},\phantom{\rule{0.2em}{0ex}}{S}_{k-1}=i,\phantom{\rule{0.2em}{0ex}}{S}_{k}=j\mid \mathbf{\theta})}{\text{Pr}({y}_{1:K}\mid \mathbf{\theta})}\\ \phantom{\rule{1em}{0ex}}=\frac{\text{Pr}({y}_{0:K}\phantom{\rule{0.1em}{0ex}},\phantom{\rule{0.2em}{0ex}}{S}_{k-1}=i,\phantom{\rule{0.2em}{0ex}}{S}_{k}=j\mid \mathbf{\theta})}{{\sum}_{l=0}^{1}\phantom{\rule{0.1em}{0ex}}{\sum}_{m=0}^{1}\phantom{\rule{0.1em}{0ex}}\text{Pr}({y}_{1:K}\phantom{\rule{0.1em}{0ex}},\phantom{\rule{0.2em}{0ex}}{S}_{k-1}=l,\phantom{\rule{0.2em}{0ex}}{S}_{k}=m\mid \mathbf{\theta})}.\end{array}$$

(2.12)

To make the notation simple, in the derivation below, we let the conditional ** θ** be implicit in the equation.

To estimate equations 2.10 and 2.11, we first factorize the joint probability as

$$\begin{array}{l}\text{Pr}({y}_{1:K}\phantom{\rule{0.1em}{0ex}},\phantom{\rule{0.2em}{0ex}}{S}_{k}=l)=\text{Pr}({y}_{1:k}\phantom{\rule{0.1em}{0ex}},\phantom{\rule{0.2em}{0ex}}{S}_{k}=l)\phantom{\rule{0.2em}{0ex}}\text{Pr}({y}_{k+1:K}\mid {y}_{1:k}\phantom{\rule{0.1em}{0ex}},\phantom{\rule{0.2em}{0ex}}{S}_{k}=l)\\ \phantom{\rule{6.6em}{0ex}}=\text{Pr}({y}_{1:k}\phantom{\rule{0.1em}{0ex}},\phantom{\rule{0.2em}{0ex}}{S}_{k}=l)\phantom{\rule{0.2em}{0ex}}\text{Pr}({y}_{k+1:K}\mid {y}_{1:k}\phantom{\rule{0.2em}{0ex}}{S}_{k}=l)\\ \phantom{\rule{6.6em}{0ex}}\equiv {a}_{k}\phantom{\rule{0.1em}{0ex}}(l)\phantom{\rule{0.1em}{0ex}}{b}_{k}\phantom{\rule{0.1em}{0ex}}(l)\phantom{\rule{1em}{0ex}}\text{for}\phantom{\rule{0.5em}{0ex}}l=0,1,\end{array}$$

(2.13)

where

$$\begin{array}{l}{a}_{k}\phantom{\rule{0.1em}{0ex}}(l)=\text{Pr}({y}_{1:k},\phantom{\rule{0.2em}{0ex}}{S}_{k}=l)\phantom{\rule{1em}{0ex}}\text{for}\phantom{\rule{0.5em}{0ex}}k=2,\dots ,K\\ {a}_{1}(l)=\text{Pr}({y}_{1:k},\phantom{\rule{0.2em}{0ex}}{S}_{k}=l)\phantom{\rule{0.2em}{0ex}}\text{Pr}({y}_{1}\mid {S}_{1}=l)\\ {b}_{k}\phantom{\rule{0.1em}{0ex}}(l)=\text{Pr}({y}_{k+1:K}\mid {S}_{k}=l)\phantom{\rule{1em}{0ex}}\text{for}\phantom{\rule{0.5em}{0ex}}k=1,\dots ,K-1\\ {b}_{1}(l)=1,\end{array}$$

and the forward and backward messages *a _{k}*(

$$\begin{array}{l}{a}_{k}\phantom{\rule{0.1em}{0ex}}(l)=\sum {a}_{k-1}(i)\phantom{\rule{0.1em}{0ex}}{P}_{il}\phantom{\rule{0.2em}{0ex}}\text{Pr}({y}_{k}\mid {S}_{k}=l)\\ \phantom{\rule{2.5em}{0ex}}=\sum {a}_{k-1}(i)\phantom{\rule{0.1em}{0ex}}{P}_{il}\frac{exp(-{\mathrm{\lambda}}_{k}){\mathrm{\lambda}}_{k}^{{y}_{k}}}{{y}_{k}!}\\ {b}_{k}\phantom{\rule{0.1em}{0ex}}(l)=\sum {b}_{k+1}(i)\phantom{\rule{0.1em}{0ex}}{P}_{li}\phantom{\rule{0.2em}{0ex}}\text{Pr}({y}_{k+1}\mid {S}_{k+1}=i)\\ \phantom{\rule{2.5em}{0ex}}=\sum {b}_{k+1}(i)\phantom{\rule{0.1em}{0ex}}{P}_{li}\frac{exp(-{\mathrm{\lambda}}_{k+1}){\mathrm{\lambda}}_{k+1}^{{y}_{k+1}}}{{y}_{k+1}!},\end{array}$$

where *P _{il}* denotes the transition probability from state

Given {*a _{k}*,

$$\text{Pr}({S}_{k-1}=i,\phantom{\rule{0.2em}{0ex}}{S}_{k}=j\mid {y}_{1:K}\phantom{\rule{0.1em}{0ex}},\phantom{\rule{0.2em}{0ex}}\mathbf{\theta})={a}_{k}\phantom{\rule{0.1em}{0ex}}(i)\phantom{\rule{0.1em}{0ex}}{P}_{ij}\phantom{\rule{0.2em}{0ex}}\text{Pr}({y}_{k+1}\mid {S}_{k+1}=j){b}_{k+1}(j).$$

(2.14)

Furthermore, we can compute the observed likelihood (of the incomplete data) by

$$p({y}_{1:K})=\sum _{l=0}^{1}\text{Pr}({y}_{1:K}\phantom{\rule{0.1em}{0ex}},\phantom{\rule{0.2em}{0ex}}{S}_{K}=l)=\sum _{l=0}^{1}{a}_{K}\phantom{\rule{0.1em}{0ex}}(l).$$

(2.15)

Given equations 2.13 and 2.15, the state posterior conditional probability is given by Bayes' rule,

$$\text{Pr}({S}_{k}=i\mid {y}_{1:K})=\frac{\text{Pr}({S}_{k}=i,\phantom{\rule{0.1em}{0ex}}{y}_{1:K})}{p({y}_{1:K})}=\frac{{a}_{k}\phantom{\rule{0.1em}{0ex}}(i)\phantom{\rule{0.1em}{0ex}}{b}_{k}\phantom{\rule{0.1em}{0ex}}(i)}{{\sum}_{l=0}^{1}{a}_{K}\phantom{\rule{0.1em}{0ex}}(l)}.$$

(2.16)

In the term of the computational overhead for the above-described two-state HMM, the forward-backward procedure requires a linear order of computational complexity (4*K*) and memory storage (2*K*).

In the M-step, we update the unknown parameters (based on their previous estimates) by setting the partial derivatives of the Q-function to zeros: $\frac{\partial Q(\mathbf{\theta})}{\partial \mathbf{\theta}}=0$, from which we may derive either closed-form or iterative solutions.

Let *ξ _{k}*(

$${\gamma}_{k}\phantom{\rule{0.1em}{0ex}}(i)=\frac{{a}_{k}\phantom{\rule{0.1em}{0ex}}(i)\phantom{\rule{0.1em}{0ex}}{b}_{k}\phantom{\rule{0.1em}{0ex}}(i)}{{\sum}_{i=0}^{1}{a}_{k}\phantom{\rule{0.1em}{0ex}}(l)\phantom{\rule{0.1em}{0ex}}{b}_{k}\phantom{\rule{0.1em}{0ex}}(l)}=\sum _{j}{\xi}_{k}\phantom{\rule{0.1em}{0ex}}(j,i)=\sum _{j}{\xi}_{k+1}(i,\phantom{\rule{0.2em}{0ex}}j).$$

(2.17)

The transition probabilities are given by Baum's reestimation procedure:

$${\widehat{P}}_{ij}=\frac{{\sum}_{k=2}^{K}{\xi}_{k}\phantom{\rule{0.1em}{0ex}}(i,\phantom{\rule{0.2em}{0ex}}j)}{{\sum}_{k=2}^{K}{\sum}_{j}{\xi}_{k}\phantom{\rule{0.1em}{0ex}}(i,\phantom{\rule{0.2em}{0ex}}j)}=\frac{{\sum}_{k=2}^{K}{\xi}_{k}\phantom{\rule{0.1em}{0ex}}(i,\phantom{\rule{0.2em}{0ex}}j)}{{\sum}_{k=2}^{K}{\gamma}_{k}\phantom{\rule{0.1em}{0ex}}(i)}.$$

(2.18)

Specifically the transition probabilities *P*_{01} and *P*_{10} are estimated by closed-form expressions,

$${\widehat{P}}_{01}=\frac{{\sum}_{k=2}^{K}{\xi}_{k}\phantom{\rule{0.1em}{0ex}}(0,1)}{{\sum}_{k=2}^{K}{\sum}_{j=0}^{1}{\xi}_{k}\phantom{\rule{0.1em}{0ex}}(0,\phantom{\rule{0.2em}{0ex}}j)}=\frac{{\sum}_{k=2}^{K}{\xi}_{k}\phantom{\rule{0.1em}{0ex}}(0,1)}{{\sum}_{k=2}^{K}{\gamma}_{k}\phantom{\rule{0.1em}{0ex}}(0)},$$

(2.19)

$${\widehat{P}}_{10}=\frac{{\sum}_{k=2}^{K}{\xi}_{k}\phantom{\rule{0.1em}{0ex}}(1,0)}{{\sum}_{k=2}^{K}{\sum}_{j=0}^{1}{\xi}_{k}\phantom{\rule{0.1em}{0ex}}(1,\phantom{\rule{0.2em}{0ex}}j)}=\frac{{\sum}_{k=2}^{K}{\xi}_{k}\phantom{\rule{0.1em}{0ex}}(1,0)}{{\sum}_{k=2}^{K}{\gamma}_{k}\phantom{\rule{0.1em}{0ex}}(1)}.$$

(2.20)

Next, we need to estimate the other unknown parameters (*μ*, *α*, *β*) that appear in the likelihood model. Since there is no closed-form solution for *μ*, *α*, and *β*, we may use the iterative optimization methods, such as the Newton-Ralphson algorithm or the iterative weighted least squares (IWLS) algorithm (e.g., Pawitan, 2001), to optimize the parameters in the M-step.

Let
${\widehat{S}}_{k}={\sum}_{i=0}^{1}i\phantom{\rule{0.1em}{0ex}}{\gamma}_{k}\phantom{\rule{0.1em}{0ex}}(i)$ denote the computed mean statistic of a hidden state at time *k;* by setting the derivatives of with regard to the parameters *α*, *μ*, and *β* (and similarly for vector ** β**), to zeros, we obtain

$$\sum _{k=J}^{K}{\widehat{S}}_{k}{y}_{k}=\sum _{k=J}^{K}{\widehat{S}}_{k}\phantom{\rule{0.1em}{0ex}}exp(\mu +\alpha {\widehat{S}}_{k}+\beta {\stackrel{\sim}{n}}_{k}),$$

(2.21)

$$\sum _{k=J}^{K}{\stackrel{\sim}{n}}_{k}{y}_{k}=\sum _{k=J}^{K}{\stackrel{\sim}{n}}_{k}\phantom{\rule{0.1em}{0ex}}exp(\mu +\alpha {\widehat{S}}_{k}+\beta {\stackrel{\sim}{n}}_{k}),$$

(2.22)

$$\sum _{k=J}^{K}{y}_{k}=\sum _{k=J}^{K}exp(\mu +\alpha {\widehat{S}}_{k}+\beta {\stackrel{\sim}{n}}_{k}),$$

(2.23)

respectively. Typically, a fixed number of iterations is preset for the Newton-Ralphson algorithm in the internal loop within the M-step.

Finally, the convergence of the EM algorithm is monitored by the incremental changes of the log likelihood as well as the parameters. If the absolute value of the change is smaller than 10^{−5}, the EM algorithm is terminated.

Upon estimating parameters ** θ** = (

$${\widehat{S}}_{k}^{\text{MAP}}=\underset{i\in \left\{0,1\right\}}{argmax{\gamma}_{k}\phantom{\rule{0.1em}{0ex}}(i)}\phantom{\rule{1em}{0ex}}1\le k\le K.$$

(2.24)

The computational overhead of the forward Viterbi algorithm has an overall time complexity (4*K*) and space complexity (2*K*).

The discrete-time probabilistic model discussed in the previous section imposes strong assumptions on the transition between the UP and DOWN states. First, it is stationary in the sense that the transition probability is time invariant; second, the transition is strictly Markovian. In this section, we relax these assumptions and further develop continuous-time, data-driven (either Markovian or semi-Markovian) state-space models, which is more appropriate and realistic in characterizing the nature or statistics of the state transitions. In addition, the inference algorithm for the continuous-time models uses the estimation output from the discrete-time model (developed in section 2) as the initialization condition, which also helps to accelerate the algorithmic convergence process.

One important distinction between the discrete-time and continuous-time Markov chains is that the former allows state changes to occur only at regularly spaced intervals, whereas the latter is aperiodic, in that the time between state changes is exponentially distributed. Therefore, the notion of a “single-step transition probability” is no longer valid in continuous time since the “step” does not exist. In fact, the transition probability in continuous time is characterized by either the transition rate or the sojourn time probability density function (pdf) between the two state change events. Let us assume that the pdf of the sojourn time for state *j* characterized by a parameter vector *θ _{j}*. Hence, the transition probability between state 0 (DOWN) and 1 (UP) is characterized by the corresponding pdfs

$$F(z)\equiv Pr[0\le z\le t]={\int}_{0}^{t}\phantom{\rule{0.2em}{0ex}}p({\theta}_{j},z)\phantom{\rule{0.1em}{0ex}}dz,$$

(3.1)

and the probability of remaining in the present state will be computed from

$$Pr[z>t]=1-F(z)={\int}_{t}^{\infty}\phantom{\rule{0.2em}{0ex}}p({\theta}_{j},z)\phantom{\rule{0.1em}{0ex}}dz,$$

(3.2)

which is known as the survival function in reliability and survival analysis. As seen, the transition probability matrix in continuous time now depends on the elapsed time (starting from the state onset) as well as the present state status. In general, we write the transition probability matrix as a parameterized form ** P**(

In modeling the neural spike train point processes, the CIF characterizes the instantaneous firing probability of a discrete event (i.e., spike). Specifically, the product between the CIF λ(*t*) and the time interval Δ tells approximately the probability of observing a spike within the interval [*t*, *t* + Δ) (e.g., Brown et al., 2003):

$$\mathrm{\lambda}(t)=\underset{\mathrm{\Delta}\to 0}{lim}\frac{Pr\{N(t+\mathrm{\Delta})-N(t)=1\mid {\mathscr{H}}_{0:t}\}}{\mathrm{\Delta}}.$$

For each spike train, we model the CIF in a parametric form,^{5}

$${\mathrm{\lambda}}^{c}(t)\equiv {\mathrm{\lambda}}^{c}(t\mid {\mathscr{H}}_{0:t})=exp\phantom{\rule{0.2em}{0ex}}\left({\mu}_{c}+{\alpha}_{c}\phantom{\rule{0.1em}{0ex}}{S}_{t}+{\gamma}_{c}{\int}_{0}^{t}{e}^{-{\beta}_{c}\tau}\phantom{\rule{0.1em}{0ex}}d{N}^{c}(t-\tau )\phantom{\rule{0.1em}{0ex}}d\tau \right),$$

(3.3)

where exp(*μ _{c}*) denotes the baseline firing rate for the

$${\mathrm{\lambda}}_{k}^{c}\equiv {\mathrm{\lambda}}^{c}(k\mid {\mathscr{H}}_{1:k})=exp\phantom{\rule{0.2em}{0ex}}({\mu}_{c}+{\alpha}_{c}\phantom{\rule{0.1em}{0ex}}{S}_{k}+{\mathbf{\beta}}_{c}^{T}\phantom{\rule{0.1em}{0ex}}{\stackrel{\sim}{\mathbf{n}}}_{k}),$$

(3.4)

where * ñ_{k}* is a vector containing the number of spike counts within the past intervals, and the length of the vector defines a finite number of windows of spiking history. By assuming that the spike trains are mutually independent, the observed data likelihood is given by (Brillinger, 1988; Brown et al., 2003)

$$p\left(d{N}_{1:K}^{1:C}\mid {S}_{1:K},\mathbf{\theta}\right)=\prod _{c=1}^{C}\prod _{k=1}^{K}exp\phantom{\rule{0.1em}{0ex}}\left(d{\mathit{\text{N}}}_{k}^{c}log\phantom{\rule{0.2em}{0ex}}\left[{\mathrm{\lambda}}_{k}^{c}\mathrm{\Delta}\right]-{\mathrm{\lambda}}_{k}^{c}\mathrm{\Delta}\right).$$

(3.5)

The complete statistics of the continuous-time latent process may be characterized by a triplet: = (*n*, ** τ**,

$$p(d{N}_{1:K}^{1:C}\mid \mathcal{S},\mathbf{\theta})=\prod _{c=1}^{C}\prod _{l=1}^{n}Pr\phantom{\rule{0.2em}{0ex}}(d{N}_{l}^{1:C}\mid {\chi}_{l},\mathbf{\theta}),$$

(3.6)

where
$d{N}_{l}^{1:C}$ denotes all of spike train observations during the sojourn time [*ν*_{l−1}, *ν _{l}*] for the continuous-time (semi-)Markov process {

If we model each spike train as an inhomogeneous Poisson process with time-varying CIF λ* ^{c}*(

$${\mathrm{\lambda}}_{l}^{c}={\int}_{{\nu}_{l-1}}^{{\nu}_{l}}{\mathrm{\lambda}}^{c}\phantom{\rule{0.1em}{0ex}}(t)\phantom{\rule{0.1em}{0ex}}dt.$$

(3.7)

Correspondingly, the observed data likelihood function, equation 3.5, is given by (Daley & Vere-Jones, 2002)

$$\begin{array}{l}p\left(d{N}_{1:K}^{1:C}\mid \mathcal{S},\mathbf{\theta}\right)=\prod _{c=1}^{C}\prod _{l=1}^{n}\phantom{\rule{0.2em}{0ex}}\left(exp\phantom{\rule{0.1em}{0ex}}\left(-{\mathrm{\lambda}}_{l}^{c}\right)\prod _{i=1}^{{\mathit{\text{y}}}_{l}^{c}}{\mathrm{\lambda}}^{c}({t}_{i})\right)\\ \phantom{\rule{6.8em}{0ex}}=\prod _{c=1}^{C}\prod _{l=1}^{n}\phantom{\rule{0.2em}{0ex}}\left(exp\phantom{\rule{0.1em}{0ex}}\left(-{\mathrm{\lambda}}_{l}^{c}\right)\frac{{({\mathrm{\lambda}}_{l}^{c})}^{{\mathit{\text{y}}}_{l}^{c}}}{{\mathit{\text{y}}}_{l}^{c}!}\right)\phantom{\rule{0.1em}{0ex}},\end{array}$$

where
${\mathit{\text{y}}}_{l}^{c}$ denotes the spike counts of the *c*th spike train during the interval (*ν*_{l−1}, *ν _{l}*], and

Ultimately, we can write the complete-data log likelihood in a compact form:^{6}

$$\begin{array}{l}\mathcal{L}=log\phantom{\rule{0.1em}{0ex}}p({S}_{0:T},{y}_{1:n}\mid \mathbf{\theta})\\ \phantom{\rule{1em}{0ex}}=\sum _{l=1}^{n}\phantom{\rule{0.1em}{0ex}}\sum _{j=0}^{1}\phantom{\rule{0.2em}{0ex}}\left({\xi}_{l}\phantom{\rule{0.1em}{0ex}}(j,j)log[F({\mathbf{\theta}}_{j},{\tau}_{l})]+\sum _{i\ne j}{\xi}_{l}\phantom{\rule{0.1em}{0ex}}(i,j)\phantom{\rule{0.1em}{0ex}}log[1-F({\mathbf{\theta}}_{j},{\tau}_{l})]\right)\\ \phantom{\rule{1.8em}{0ex}}+\phantom{\rule{0.2em}{0ex}}\sum _{c=1}^{C}\sum _{l=1}^{n}\phantom{\rule{0.2em}{0ex}}\left({\int}_{{\nu}_{l-1}}^{{\nu}_{l}}log{\mathrm{\lambda}}^{c}\phantom{\rule{0.1em}{0ex}}(t)\phantom{\rule{0.1em}{0ex}}d{N}^{c}\phantom{\rule{0.1em}{0ex}}(t)-{\mathrm{\lambda}}_{l}^{c}\right)\phantom{\rule{0.1em}{0ex}},\end{array}$$

(3.8)

where * θ_{j}* denotes the parameter(s) of the probability model of the sojourn time associated with state

In a continuous-time Markov chain (i.e., Markov process), state transitions from one state to another can occur at any instant of time. Due to the Markov property, the time that the system spends in any given state is memoryless, and the distribution of the survival time depends on the state (but not on the time already spent in the state); in other words, the sojourn time is exponentially distributed,^{7} which can be characterized by a single rate parameter. The rate parameter, also known as the continuous-time state transition rate, defines the probability per time unit that the system makes a transition from one state to the other during an infinitesimal time interval:

$${q}_{ij}=\underset{\mathrm{\Delta}t\to 0}{lim}\frac{Pr({S}_{t+\mathrm{\Delta}t}=j\mid {S}_{t}=i)}{\mathrm{\Delta}t},\phantom{\rule{1em}{0ex}}i\ne j.$$

(3.9)

The total transition rate of state *i* satisfies the rate balance condition:

$${r}_{i}\equiv {q}_{ii}=-\sum _{j\ne i}{q}_{ij}.$$

(3.10)

The holding time of the sojourn for state *i* follows an exponential distribution exp(−*r _{i}τ*), or, equivalently, the transition times of state

$$Q=\left(\begin{array}{c}{q}_{00}\\ {q}_{10}\end{array}\begin{array}{c}{q}_{01}\\ {q}_{11}\end{array}\right)=\left(\begin{array}{c}{r}_{0}\\ -{r}_{1}\end{array}\begin{array}{c}-{r}_{0}\\ -{r}_{1}\end{array}\right).$$

(3.11)

For an exponential random variable *z*, its cdf is computed as
$Pr[z\le t]={\int}_{-\infty}^{t}r{e}^{-rz}\phantom{\rule{0.1em}{0ex}}dz=1-{e}^{-rt}$, where *re*^{−rz} is the pdf of the exponential distribution with a rate parameter *r*. The reciprocal of the rate parameter, 1/*r*, is sometimes called the survival parameter in the sense that the exponential random variable *z* that survives the duration of time has the mean [*z*] = 1/*r*. In light of equations 3.1 and 3.2, at a given specific time *t*, the probability of remaining within the current state sojourn is
$Pr[z>t]=1-Pr[z\le t]={\int}_{t}^{\infty}r{e}^{-rz}\phantom{\rule{0.1em}{0ex}}dz={e}^{-rt}$. Let *r*_{0} and *r*_{1} denote the transition rate for states 0 and 1, respectively. Let *τ* = *t* − *ν* denote the elapsed time from the state transition up to the current time instant *t*; then the parameterized transition probability ** P** = {

$${P}_{ij}(\tau \mid {r}_{i})=\{\begin{array}{ll}exp(-{r}_{i}\tau ),& i=j\\ 1-exp(-{r}_{i}\tau ),& i\ne j\end{array}\phantom{\rule{0.2em}{0ex}}(i,j\in \{0,1\}).$$

(3.12)

Now, the transition probability, instead of being constant, is a probabilistic function of the time after the Markov process makes a transition to or from a given state (the holding time from the last transition or the survival time to the next transition).

Due to biophysical or physiological constraints, the sojourn time for a specific state might be subject to a certain range constraint, reflected in terms of the pdf. Without loss of generality, let *p*(*τ*) denote the standard pdf for a random variable *τ*, and let (*τ*) denote the “censored” version of *p*(*τ*),^{8} which is defined by

$$\stackrel{\sim}{p}(\tau )=\frac{1}{c}p(\tau ){\mathbb{I}}_{\left[a,b\right]}(\tau )=\{\begin{array}{ll}\frac{1}{c}p(\tau ),& \tau \in \left[a,b\right]\\ 0,& \text{otherwise}\end{array},$$

where (·) is an indicator function and *a* > 0 and *b* (*a*, ∞) are the lower and upper bounds of the constrained random variable *τ* (which is always positive for the duration of the sojourn time), respectively. The scalar *c* is a normalized constant determined by *c* = *F* (*b*) − *F* (*a*), where *F* (·) denotes the corresponding cdf of the standard pdf *p*(*τ*) and *F* (∞) = 1. Likewise, the censored version of the cdf is computed by
$\stackrel{\sim}{F}(\tau )={\int}_{-\infty}^{\tau}\stackrel{\sim}{p}(\tau )\phantom{\rule{0.1em}{0ex}}d\tau =\frac{1}{c}{\int}_{a}^{\tau}\phantom{\rule{0.1em}{0ex}}p(\tau )\phantom{\rule{0.1em}{0ex}}d\tau $.

Suppose the sojourn time *τ* of a continuous-time Markov chain follows a censored version of the exponential distribution; then we can write its censored pdf as

$$\stackrel{\sim}{p}(\tau )=\{\begin{array}{ll}\frac{rexp\left(-r\tau \right)}{exp\left(-ra\right)},& \tau \ge a>0\\ 0,& \text{otherwise}\end{array},$$

(3.13)

where the normalizing constant is given by *c* = *F*(∞) − *F*(*a*) = 1 − [1 − exp(−*ra*)] = exp(−*ra*).

In contrast to the Markov process, the semi-Markov process is a continuous-time stochastic process {*S _{t}*} that draws the sojourn time {

Summary of Continuous-Time Probability Models for the Transition Probability Density Function *p*(*τ*) (Where *τ* Is a Nonnegative or Positive Random Variable That Denotes the Holding Time), Cumulative Distribution Function *F*(*τ*), and **...**

To characterize the nonexponential survival time behavior of semi-Markov processes, here we restrict our attention to three probability distributions that belong to the two-parameter exponential family of continuous probability distributions. We define the censored versions of three pdfs as follows:

- The censored gamma distribution (
*τ*;*s*,*κ*):where$$\stackrel{\sim}{p}(\tau ;s,\kappa )=\frac{1}{c}{\mathbb{I}}_{\left[a,b\right]}\phantom{\rule{0.1em}{0ex}}(\tau )\phantom{\rule{0.1em}{0ex}}p\phantom{\rule{0.1em}{0ex}}(\tau ;s,\kappa )=\frac{1}{c}{\mathbb{I}}_{\left[a,b\right]}\phantom{\rule{0.1em}{0ex}}(\tau ){\tau}^{s-1}\frac{exp(-\tau /\kappa )}{\Gamma (s){\kappa}^{s}},$$*κ*and*s*represent the scale and shape parameters, respectively. If*s*is an integer, then the gamma distribution represents the sum of*s*exponentially distributed random variables, each with a mean*κ*. Similarly,*c*is a normalized constant for the censored pdf (*τ*;*s*,*κ*):*c*=*F*(*b*) −*F*(*a*), and*F*(·) is the cdf of the standard gamma distribution. - The censored log-normal distribution (
*τ*;*μ*,*σ*):where the mean, median, and variance of the log-normal distribution are exp($$\begin{array}{l}\stackrel{\sim}{p}(\tau ;\mu ,\sigma )=\frac{1}{c}{\mathbb{I}}_{\left[a,b\right]}\phantom{\rule{0.1em}{0ex}}(\tau )\phantom{\rule{0.1em}{0ex}}p\phantom{\rule{0.1em}{0ex}}(\tau ;\mu ,\sigma )\\ \phantom{\rule{4em}{0ex}}=\frac{1}{c}{\mathbb{I}}_{\left[a,b\right]}\phantom{\rule{0.1em}{0ex}}(\tau )\frac{1}{x\sigma \sqrt{2\pi}}exp\phantom{\rule{0.2em}{0ex}}\left(-\frac{{\left[ln(\tau )-\mu \right]}^{2}}{2{\sigma}^{2}}\right),\end{array}$$*μ*+*σ*^{2}/2), exp(*μ*), and exp(2*μ*+*σ*^{2})[exp(*σ*^{2}) − 1], respectively. - The censored inverse gaussian distribution (
*τ*;*μ*,*s*):where$$\begin{array}{l}\stackrel{\sim}{p}(\tau ;\mu ,s)=\frac{1}{c}{\mathbb{I}}_{\left[a,b\right]}\phantom{\rule{0.1em}{0ex}}(\tau )\phantom{\rule{0.1em}{0ex}}p\phantom{\rule{0.1em}{0ex}}(\tau ;\mu ,s)\\ \phantom{\rule{4em}{0ex}}=\frac{1}{c}{\mathbb{I}}_{\left[a,b\right]}\phantom{\rule{0.1em}{0ex}}(\tau )\sqrt{\frac{s}{2\pi {\tau}^{3}}}exp\phantom{\rule{0.2em}{0ex}}\left(-\frac{s{\left(\tau -\mu \right)}^{2}}{2{\mu}^{2}\tau}\right),\end{array}$$*μ*and*s*represent the mean and shape parameters, respectively.

The choice of the last two probability distributions is mainly motivated by the empirical data analysis published earlier (Ji & Wilson, 2007). Typically, for a specific data set, a smoothed histogram analysis is conducted to visualize the shape of the distribution, and the Kolmogorov-Smirnov (KS) test can be used to empirically evaluate the fit of specific probability distributions. Mostly likely, none of parametric probability distribution candidate would fit perfectly (i.e., within 95% confidence interval) for the real-world data; we often choose the one that has the best fit.^{9} In the simulation data shown in Figure 1, we have used the following constraints for the UP and DOWN states: [0.1, 3] (unit: second) for UP state and [0.05, 1] (unit: second) for DOWN state. The lower and upper bounds of these constraints are chosen in light of the results reported from Ji and Wilson (2007). Note that the shapes of the log-normal and inverse gaussian pdfs and cdfs are very similar, except that inverse gaussian distribution is slightly sharper when the value of the random variable is small (Takagi, Kumagai, Matsunaga, & Kusaka, 1997). In addition, the tail behavior of these two distributions differs; however, provided we consider only their censored versions (with finite duration range), the tail behavior is not a major concern here.

In the continuous-time model, we treat the individual spike trains separately and aim to estimate their own parameters. Let
$\mathbf{\theta}=({\mathbf{\theta}}_{\mathit{\text{up}}},{\mathbf{\theta}}_{\mathit{\text{down}}},{\{{\mu}_{c}\}}_{c=1}^{C},{\{{\alpha}_{c}\}}_{c=1}^{C},{\{{\mathbf{\beta}}_{c}\}}_{c=1}^{C},)$ denote the unknown parameters of interest, where *θ _{up}* and

Let denote all the spiking timing information of the observed spike trains. Similar to the discrete-time HMM, we aim to maximize the Q-function, defined as follows:

$$Q(\mathbf{\theta})=\sum _{S}p\left(\mathcal{S}\mid \mathcal{Y},\mathbf{\theta}\right)log\phantom{\rule{0.1em}{0ex}}p(\mathcal{Y},\mathcal{S}\mid \mathbf{\theta}).$$

(3.14)

The inference can be tackled in a similar fashion by the EM algorithm.

First, let us consider a simpler task where the transition time of the latent process is known and the number of state jumps is given. In other words, the parameters *n* and ** τ** are both available, so the estimation goal becomes less demanding. We need to estimate only

Since the number of state transition, *n*, is known, *p*(, ** θ**) is simplified to

$$p\left(\mathcal{S},\mathcal{Y}\mid \mathbf{\theta}\right)=\sum _{l=1}^{n}{P}_{{\mathcal{S}}_{l-1},{\mathcal{S}}_{l}}\phantom{\rule{0.1em}{0ex}}({\tau}_{l})\phantom{\rule{0.1em}{0ex}}P({\mathcal{Y}}_{l}\mid \mathcal{S}({\chi}_{l})).$$

(3.15)

Let *ξ _{l}*(

$$\begin{array}{l}\mathcal{L}\left(\mathcal{S},\mathbf{\theta}\right)=log\phantom{\rule{0.2em}{0ex}}p({S}_{0:T,}\mathcal{Y}\mid \mathbf{\theta})\\ \phantom{\rule{3.3em}{0ex}}=\sum _{l=1}^{n}\phantom{\rule{0.2em}{0ex}}\sum _{j=0}^{1}\phantom{\rule{0.2em}{0ex}}\left({\xi}_{l}\phantom{\rule{0.1em}{0ex}}(j,j)(-{r}_{j}{\tau}_{l})+\sum _{i\ne j}{\xi}_{l}\phantom{\rule{0.1em}{0ex}}(j,i)log\phantom{\rule{0.1em}{0ex}}[1-exp(-{r}_{j}{\tau}_{l})]\right)\\ \phantom{\rule{4em}{0ex}}+\sum _{c=1}^{C}\phantom{\rule{0.2em}{0ex}}\sum _{l=1}^{n}\phantom{\rule{0.1em}{0ex}}\left({\int}_{{\nu}_{l-1}}^{{\nu}_{l}}log{\mathrm{\lambda}}^{c}(t)\phantom{\rule{0.1em}{0ex}}d{N}^{c}(t)-{\mathrm{\lambda}}_{l}^{c}\right),\end{array}$$

(3.16)

where ** θ** = (

$$\begin{array}{l}\mathcal{L}=log\phantom{\rule{0.1em}{0ex}}p\left({S}_{0:T},\mathcal{Y}\mid \mathbf{\theta}\right)\\ \phantom{\rule{1em}{0ex}}=\sum _{l=1}^{n}\phantom{\rule{0.2em}{0ex}}\left(\sum _{j=0}^{1}{\xi}_{l}\phantom{\rule{0.1em}{0ex}}\left(j,\phantom{\rule{0.2em}{0ex}}j\right)\phantom{\rule{0.1em}{0ex}}log\phantom{\rule{0.1em}{0ex}}[1-F({\tau}_{l};{\mathbf{\theta}}_{j})]+\sum _{i\ne j}{\xi}_{l}\phantom{\rule{0.1em}{0ex}}\left(j,\phantom{\rule{0.1em}{0ex}}i\right)log\phantom{\rule{0.1em}{0ex}}[F({\tau}_{l};{\mathbf{\theta}}_{j})]\right)\\ \phantom{\rule{1.6em}{0ex}}+\phantom{\rule{0.2em}{0ex}}\sum _{c=1}^{C}\phantom{\rule{0.2em}{0ex}}\sum _{l=1}^{n}\phantom{\rule{0.2em}{0ex}}\left({\int}_{{\nu}_{l-1}}^{{\nu}_{l}}log{\mathrm{\lambda}}^{c}\left(t\right)\phantom{\rule{0.1em}{0ex}}d{N}^{c}\left(t\right)-{\mathrm{\lambda}}_{l}^{c}\right),\end{array}$$

(3.17)

where *F* (·) denotes the cdf of the nonexponential probability distribution.

Conditioned on the parameter ** θ**, the posterior probabilities

$${\mathbf{\theta}}^{\mathit{\text{new}}}=arg\underset{\mathbf{\theta}}{max}\sum _{\mathcal{S}}Pr\{\mathcal{S}\mid \mathcal{Y},\phantom{\rule{0.1em}{0ex}}{\mathbf{\theta}}^{\mathit{\text{old}}}\}log\left(Pr\left\{\mathcal{S},\phantom{\rule{0.2em}{0ex}}\mathcal{Y}\mid \mathbf{\theta}\right\}\right).$$

(3.18)

More specifically, for the parameters associated with the transition probability density model, we might, for example, assume that the UP and DOWN state durations are both log normal distributed with parameters * θ_{j}* = {

$$\begin{array}{l}\{{\mu}_{j},\phantom{\rule{0.1em}{0ex}}{\sigma}_{j}\}=arg\underset{\mu ,\phantom{\rule{0.1em}{0ex}}\sigma}{max}\phantom{\rule{0.3em}{0ex}}\{\sum _{l=1}^{n}{\xi}_{l}\phantom{\rule{0.1em}{0ex}}\left(j,\phantom{\rule{0.1em}{0ex}}j\right)log[1-F({\tau}_{l};\mu ,\sigma )]\\ \phantom{\rule{4.2em}{0ex}}+\phantom{\rule{0.2em}{0ex}}\sum _{l=1,i\ne j}^{n}{\xi}_{l}\phantom{\rule{0.1em}{0ex}}\left(j,\phantom{\rule{0.1em}{0ex}}i\right)log[F({\tau}_{l};\mu ,\sigma )]\}.\end{array}$$

(3.19)

In light of Table 2, setting the derivatives of *μ _{j}* and

$$\begin{array}{l}0=\sum _{l=1}^{n}\frac{-1}{\sigma}exp\phantom{\rule{0.2em}{0ex}}\left(-\frac{{\left(ln{\tau}_{l}\phantom{\rule{0.2em}{0ex}}-\phantom{\rule{0.2em}{0ex}}\mu \right)}^{2}}{2{\sigma}^{2}}\right)\phantom{\rule{0.2em}{0ex}}\left(\frac{{\xi}_{l}\phantom{\rule{0.1em}{0ex}}\left(j,\phantom{\rule{0.1em}{0ex}}j\right)}{1-F\left({\tau}_{l};\mu ,\sigma \right)}+\frac{{\sum}_{i\ne j}{\xi}_{l}\phantom{\rule{0.1em}{0ex}}\left(j,\phantom{\rule{0.1em}{0ex}}i\right)}{F\left({\tau}_{l};\mu ,\sigma \right)}\right),\\ 0=\sum _{l=1}^{n}\frac{\mu -ln{\tau}_{l}}{{\sigma}^{2}}exp\phantom{\rule{0.2em}{0ex}}\left(-\frac{{\left(ln{\tau}_{l}-\mu \right)}^{2}}{2{\sigma}^{2}}\right)\phantom{\rule{0.2em}{0ex}}\left(\frac{{\xi}_{l}\phantom{\rule{0.1em}{0ex}}\left(j,\phantom{\rule{0.1em}{0ex}}j\right)}{1-F\left({\tau}_{l};\mu ,\sigma \right)}+\frac{{\sum}_{i\ne j}{\xi}_{l}\phantom{\rule{0.1em}{0ex}}\left(j,\phantom{\rule{0.1em}{0ex}}i\right)}{F\left({\tau}_{l};\mu ,\sigma \right)}\right),\end{array}$$

where we have used $\frac{d\phantom{\rule{0.1em}{0ex}}\text{erf}\phantom{\rule{0.1em}{0ex}}\left(z\right)}{dz}=\frac{2}{\sqrt{\pi}}exp(-{z}^{2})$ in light of Table 2. The above two equations can be solved iteratively with the Newton-Ralphson algorithm.

Once the state estimate *Ŝ*(*t*) is available from the E-step,^{10} the parameters associated with the likelihood model can also be estimated by the Newton-Ralphson or the IWLS algorithm,

$$\{{\mu}_{c},{\alpha}_{c},{\beta}_{c}\}=arg\underset{\mu ,\alpha ,\beta}{max}\phantom{\rule{0.3em}{0ex}}\left(\sum _{l=1}^{n}{\int}_{{\nu}_{l-1}}^{{\nu}_{l}}log\phantom{\rule{0.2em}{0ex}}{\mathrm{\lambda}}^{c}\left(t\right)\phantom{\rule{0.2em}{0ex}}d{N}^{c}\left(t\right)-{\mathrm{\lambda}}_{l}^{c}\right),$$

(3.20)

where λ^{c}(*t*) and
${\mathrm{\lambda}}_{l}^{c}$ are defined by equations 3.3 (or 3.4) and 3.7, respectively.

Notably, the EM algorithm described above has a few obvious drawbacks. Basically, it assumes that the information as to when and how many state transitions occur during the latent process is given; once the number of state jumps (say, *n*) is determined, it does not allow the number *n* to change, so it is incapable of online model selection. Furthermore, it is very likely that the EM algorithm suffers from the local maximum problem, especially if the initial conditions of the parameters are far from the true estimates. In the following, we present an alternative inference method to overcome these two drawbacks, and the method can be regarded as a generalization of the EM algorithm, except that the E-step state estimation is replaced by a Monte Carlo sampling procedure. This method is often called the Monte Carlo EM (MCEM) algorithm (Chan & Ledolter, 1995; McLachlan & Krishnan, 1996; Tanner, 1996; Stjernqvist, Rydén, Sköld, & Staaf, 2007). The essence of MCEM is the theory of Markov chain Monte Carlo (MCMC), which will be detailed below.

The basic idea of MCMC sampler is to draw a large number of samples randomly from the posterior distribution and then obtain a sample-based numerical approximation of the posterior distribution. Unlike other Monte Carlo samplers (such as importance sampling and rejection sampling), the MCMC method is well suited for sampling from a complex and high-dimensional probability distribution. Instead of drawing independent samples from the posterior distribution directly, MCMC constructs a Markov chain such that its equilibrium will eventually approach the posterior distribution. The Markov chain theory states that given an arbitrary initial value, the chain will ultimately converge to the equilibrium point provided the chain is run sufficiently long. In practice, determining the convergence as well as the “burn-in time” for MCMC samplers requires diagnostic tools (see Gilks, Richardson, & Spiegelhalter, 1995, for general discussions of the MCMC methods). Depending on the specific problem, the MCMC method is typically computationally intensive, and the convergence process can be very slow. Nevertheless, here we focus on methodology development, and therefore the computational demand is not the main concern. Specifically, constructing problem-specific MCMC proposal distributions (densities) is the key to obtain an efficient MCMC sampler that has a fast convergence speed and a good mixing property (Brooks, Guidici, & Roberts, 2003). For the variable-dimension RJMCMC method, we present some detailed mathematical treatments in appendix A.

The Monte Carlo EM (MCEM) algorithm works just like an ordinary EM algorithm, except that in the E-step, the expectation operations (i.e., computation of sufficient statistics) are replaced by Monte Carlo simulations of the missing data. Specifically, *M* realizations of the latent process *S*(*t*) (0 ≤ *t* ≤ *T*) are simulated, and in this case the Q-function can be written as

$$Q\left(\mathcal{S},\mathbf{\theta}\right)\approx \widehat{Q}\left(\mathcal{S},\mathbf{\theta}\right)=\frac{1}{M}\sum _{m=1}^{M}Q\left({\mathcal{S}}^{\left(m\right)},\mathbf{\theta}\right),$$

(3.21)

where ^{(m)} denotes the *m*th simulated latent process for the unknown state (missing data).

The M-step of MCEM is the same as the conventional M-step in the EM algorithm. Specifically, the parameters of the CIF appearing in the likelihood model are estimated using equation 3.20. However, the estimation of the parameters for the UP or DOWN state sojourn depends on the type of probability distribution. Here we distinguish three different possible scenarios.

First, when the latent process is a continuous-time Markov chain and the sojourn time durations for the UP and DOWN states are both exponentially distributed, then * θ_{up}* and

$$\begin{array}{l}Q\left(\mathcal{S},\mathbf{\theta}\right)=\sum _{i=0,j\ne i}^{1}({\widehat{n}}_{ij}\phantom{\rule{0.2em}{0ex}}log{q}_{ij}+{r}_{i}{T}_{i})\\ \phantom{\rule{4em}{0ex}}+\phantom{\rule{0.1em}{0ex}}\sum _{c=1}^{C}\phantom{\rule{0.1em}{0ex}}\sum _{l=1}^{n}\phantom{\rule{0.2em}{0ex}}\left({\int}_{{\nu}_{l-1}}^{{\nu}_{l}}log{\mathrm{\lambda}}^{c}\left(t\right)\phantom{\rule{0.1em}{0ex}}d{N}^{c}\left(t\right)-{\mathrm{\lambda}}_{l}^{c}\right),\end{array}$$

(3.22)

where *n _{ij}* denotes the number of jumps from state

$$r=\frac{1}{M}\sum _{m=1}^{M}\frac{{\sum}_{l=0}^{n}\mathbb{I}\left({\chi}_{l}^{\left(m\right)}=0\right)}{{\int}_{0}^{T}\mathbb{I}\left({S}^{\left(m\right)}\left(t\right)=0\right)\phantom{\rule{0.2em}{0ex}}dt}.$$

In the second scenario, when the latent process is a continuous-time semi-Markov chain where the sojourn time durations for both UP and DOWN states are nonexponentially distributed, the Q-function can be written as

$$\begin{array}{l}Q\left(\mathcal{S},\mathbf{\theta}\right)=\frac{1}{M}\sum _{m=1}^{M}\sum _{l=0}^{n-1}\phantom{\rule{0.2em}{0ex}}\left(\sum _{j=0}^{1}\sum _{j\to i}^{i\ne j}log\phantom{\rule{0.2em}{0ex}}\left[1-F\left({\tau}_{l}^{\left(m\right)};{\mathbf{\theta}}_{j}\right)\right]\right)\\ \phantom{\rule{3.8em}{0ex}}+\phantom{\rule{0.2em}{0ex}}\sum _{c=1}^{C}\sum _{l=1}^{n}\phantom{\rule{0.2em}{0ex}}\left({\int}_{{\nu}_{l-1}}^{{\nu}_{l}}log{\mathrm{\lambda}}^{c}\left(\overline{S}\left(t\right),\mathbf{\theta}\right)\phantom{\rule{0.2em}{0ex}}d{N}^{c}\left(t\right)-{\mathrm{\lambda}}_{l}^{c}\right),\end{array}$$

(3.23)

where
$\overline{S}\left(t\right)=\frac{1}{M}{\sum}_{m=1}^{M}{\widehat{S}}^{\left(m\right)}\left(t\right)$ is obtained from the Monte Carlo mean statistic of *M* simulated latent processes. Similarly, the parameters of the nonexponential probability distributions can be estimated by their MLE based on their Monte Carlo realizations. For instance, in the case of inverse gaussian distribution, the mean parameter is given by
${\mu}_{j}=\frac{1}{{n}_{j}\phantom{\rule{0.1em}{0ex}}M}{\sum}_{m=1}^{M}{\sum}_{l=0}^{n}{\tau}_{l}^{\left(m\right)}\mathbb{I}({\chi}_{l}^{\left(m\right)}=j)$, and the shape parameter is given by
${\sigma}_{j}=[\frac{1}{{n}_{j}\phantom{\rule{0.1em}{0ex}}M}{\sum}_{m=1}^{M}{\sum}_{l=0}^{n}(({\tau}_{l}^{(m)}{)}^{-1}-{\mu}_{j}^{-1})\mathbb{I}({\chi}_{l}^{(m)}=j){]}^{-1}$. In the case of log-normal distribution, the mean parameter is given by
${\mu}_{j}=\frac{1}{{n}_{j}\phantom{\rule{0.1em}{0ex}}M}{\sum}_{m=1}^{M}{\sum}_{l=0}^{n}\mathbb{I}({\chi}_{l}^{\left(m\right)}=j)ln{\tau}_{l}^{\left(m\right)}$, and the SD parameter is given by
${\sigma}_{j}=\frac{1}{{n}_{j}\phantom{\rule{0.1em}{0ex}}M}{\sum}_{m=1}^{M}{\sum}_{l=0}^{n}\mathbb{I}({\chi}_{l}^{\left(m\right)}=j)|ln\phantom{\rule{0.2em}{0ex}}{\tau}_{l}^{\left(m\right)}-{\mu}_{j}|$.

If, in the third scenario, one of the state sojourn time durations is exponentially distributed and the other is nonexponential, then the resulting latent process is still a semi-Markov chain, and the estimation procedure remains similar to that in the above two cases.

It is important to choose sensible initial values for the simulated (semi-) Markov chain since a poor choice of the initial ^{(0)} can lead to a sampler that takes a very long time to converge or result in a poor mixing of the (semi-) Markov chain. In our experiment, we typically use a discrete-time HMM model (with a 10 ms bin size) to estimate the hidden state sequence and then interpolate it to obtain an initial estimate of the continuous-time state process (with 1 ms bin size), from which we obtain the initial values of {*n*, ** τ**,

In summary, the MCEM algorithm is run as follows:

- Initialize the MCMC sampler state for = {
*n*,,*τ*}.*ν* - Iterate the MCEM algorithm's E and M steps until the log likelihood reaches a local maximum or saddle point.
- Monte Carlo E-step: Given an initial state
^{(0)}, run the RJMCMC sampling procedure to draw*M*Monte Carlo samples ${\{{\mathcal{S}}^{\left(m\right)}\}}_{m=1}^{M}$, based on which to compute the necessary Monte Carlo sufficient statistics. - M-step: estimate the parameters {
,**θ**_{up}} with their MLE.**θ**_{down} - M-step: optimize the parameters ${\{{\mu}_{c},{\alpha}_{c},{\mathbf{\beta}}_{c}\}}_{c=1}^{C}$ according to equation 3.20.

- Upon convergence, reconstruct the hidden state
*Ŝ*(*t*) in the continuous-time domain. - Compute λ
(^{c}*t*) for each spike train*c*, and conduct goodness-of-fit tests (see below) for all spike trains being modeled.

There are two ways to reconstruct the hidden state of the latent process. The first is to apply the Viterbi algorithm once the MCEM inference is completed (when *n* and ** τ** have been determined). In the second, and simpler, approach, we can determine the state by the following rule (Ball et al., 1999). For

$${\widehat{S}}^{\left(m\right)}\left(t\right)=\{\begin{array}{ll}1,& \text{if}\phantom{\rule{0.5em}{0ex}}{\chi}_{t}^{\left(m\right)}\in \text{UP},\\ 0,& \text{if}\phantom{\rule{0.5em}{0ex}}{\chi}_{t}^{\left(m\right)}\in \text{DOWN},\end{array}$$

(3.24)

and let $\overline{S}\left(t\right)=\frac{1}{M}{\sum}_{m=1}^{M}{\widehat{S}}^{\left(m\right)}\left(t\right)$, and the point estimate of the hidden state is

$$\widehat{S}\left(t\right)=\{\begin{array}{ll}1& \text{if}\phantom{\rule{0.3em}{0ex}}\overline{S}\left(t\right)\ge 0.5,\\ 0& \text{if}\phantom{\rule{0.3em}{0ex}}\overline{S}\left(t\right)<\mathrm{0.5.}\end{array}$$

(3.25)

Furthermore, the marginal prior probability of the hidden state is given by

$${\widehat{\pi}}_{i}=\frac{1}{M}\sum _{m=1}^{M}\mathbb{I}\phantom{\rule{0.1em}{0ex}}\left({S}^{\left(m\right)}\left(0\right)=i\right).$$

(3.26)

Upon estimating the CIF model λ* ^{c}*(

Furthermore, we measure the independence of the time-rescaled time series by computing the autocorrelation function of *g _{j}*s:
$\text{ACF}(m)=\frac{1}{J-m}{\sum}_{j=1}^{J-m}{g}_{j}\phantom{\rule{0.1em}{0ex}}{g}_{j+m}$. Since

Let the triple = (*n*, ** τ**,

To simulate the Markov chain, we first obtain the initial conditions for both the state and parameters {^{(0)}, *θ*^{(0)}}. Next, we run the MCMC sampler (see appendix A for details) to generate a sequence of Monte Carlo samplers {^{(k)}} for *k* = 1, 2, …, *M*, and the realizations {^{(k)}} can be viewed as the samples drawn from the conditional posterior *p*( , ** θ**). At each MCEM step, the parameter vector

At the end of each Monte Carlo sampling step, the sufficient statistics for computing the acceptance probability and updating the parameter ** θ** (in the M-step) need to be stored, and all new information (

In terms of convergence, Markov chain theory tells us that when *M* → ∞, the samples are asymptotically drawn from the desired target (equilibrium) distribution. However, choosing the proper value of *M* is often problem dependent, and the convergence diagnosis of the MCMC sampler remains an active research topic in the literature (e.g., Gelman & Rubin, 1992; Cowles & Carlin, 1996).

First, we simulate four spike trains with the time-rescaling theorem (see Figure 2 for a snapshot of one realization). The latent state variable is assumed to be drawn from a two-state discrete space: *S*(*t*) {0, 1}. The simulation is conducted with a 1 ms time bin size with the following model:

Synthetic data. (a) The simulated UP and DOWN hidden state process. (b) The simulated time-varying traces of conditional intensity function (CIF) λ^{c}(*t*) (*c* = 1, …, 4). (c) The four simulated spike trains. (d) The averaged firing rate across **...**

$$\begin{array}{ll}{\mathrm{\lambda}}^{c}(t)=exp({\mu}_{c}+{\alpha}_{c}\phantom{\rule{0.1em}{0ex}}S(t)+{\beta}_{c}\phantom{\rule{0.1em}{0ex}}\stackrel{\sim}{n}(t)),& (c=1,\dots ,4),\end{array}$$

where *ñ*(*t*) denotes the number of spike counts across all spike trains during the previous 100 ms prior to the current time index *t*, and the parameters of individual spike trains are set as follows:

$$\begin{array}{llll}{\mu}_{1}=-3.5,& {\mu}_{2}=-4.0,& {\mu}_{3}=-3.8,& {\mu}_{4}=-3.8,\\ {\alpha}_{1}=7.0,& {\alpha}_{2}=8.0,& {\alpha}_{3}=7.6,& {\alpha}_{4}=7.6,\\ {\beta}_{1}=0.06,& {\beta}_{2}=0.05,& {\beta}_{3}=0.03,& {\beta}_{4}=0.05.\end{array}$$

For the simulated hidden latent process, the total duration is *T* = 30 s, and the number of jumps varies from 35 to 45, yielding an average occurrence rate of state transitions of about 80 min^{−1}. Furthermore, we assume that the sojourn time durations for both UP and DOWN states follow a log-normal distribution. For the UP state, the survival time length is randomly drawn from logn(−0.4005, 0.8481) (such that the mean and median value of the duration are 0.67 s and 0.96 s, respectively), with a lower bound of 0.15 s; and for the DOWN state, the survival time length is randomly drawn from logn(−1.9661, 0.6231) (such that the mean and median value of the duration are 0.14 s and 0.17 s, respectively), with a lower bound of 0.05 s.

To test the discrete-time HMM model, the spike trains are binned in 10 ms and collected by spike counts. We employed the EM algorithm with the following initialization parameters: *π*_{0} = *π*_{1} = 0.5, *P*_{00} = *P*_{11} = 0.9, *P*_{01} = *P*_{10} = 0.1. For the synthetic data, the EM algorithm typically converges within 200 iterations. The forward-backward algorithm computes all necessary sufficient statistics. Upon convergence, the Viterbi algorithm produced the ultimate state sequence output, yielding an average decoding error rate of 1.52% (averaged over 10 independent runs). In this case, since the CIF model is given (no model mismatch issue is involved), the decoding error rate is reasonably low even with the discrete-time HMM. As a comparison, we also employed the threshold-based method (Ji & Wilson, 2007; see appendix B for brief descriptions) to classify the UP and DOWN states using the simulated spike trains. It was found (see Table 3) that the discrete-time HMM method yields better performance than the threshold-based method. Figure 3 plots a snapshot of hidden state estimation obtained from the discrete-time HMM in our experiments.

A snapshot of UP and DOWN state estimation obtained from the discrete-time HMM for the simulated spike train data.

Next, we applied the MCEM algorithm to refine the latent state estimation in the continuous-time domain. Naturally, with a smaller bin size, the continuous-time model allows precisely segmenting the UP and DOWN states for identifying the location of state transition. With the initial conditions obtained from the discrete-time EM algorithm, we simulated the Markov chains for 20,000 iterations and discarded the first 1000 iterations (“burn-in” period). For the synthetic data, the MCEM algorithm converged after 20 to 30 iterations, and we were able to further improve the decoding accuracy by reducing the average error rate to 1.26%. As seen in Table 3, the continuous-time model outperformed the discrete-time HMM model in terms of the lower estimation error. However, in terms of estimating the correct number of state transitions, the HMM obtained nearly 100% accuracy in all 10 Monte Carlo trials (except for two trials that miscount two more jumps); in this sense, the HMM estimation result can be treated as a very good initial state as the input to the continuous-time semi-Markov chain model. In addition, the continuous-time model yields a 10 times greater information transmission rate (1 bit/ms) than the discrete-time model (1 bit/10 ms). We also computed the probability distribution statistics of the UP and DOWN states from both estimation methods. In the discrete-time HMM, we used the sample statistics of the UP and DOWN state durations as the estimated results. These results were also used as the initial values for the continuous-time semi-Markov process, where the MCEM algorithm was run to obtain the final estimate. The results are summarized in Table 4.

Once the estimates of {*S*(*t*)} and {*μ _{c}*,

The fitted KS plots (top row) and autocorrelation plots (bottom row) for the four simulated spike trains from one Monte Carlo experiment (dotted and dashed lines in the plots indicate the 95% confidence bounds).

Finally we also do extra simulation studies by examining the sensitivity of different methods regarding the change of two factors: the modulation gains of the hidden state and the number of observed spike trains. The first issue examines the impact of the global network activity during the UP state, that is, the *α _{c}* component appearing in λ

$$\begin{array}{llll}{\alpha}_{1}=6.7,& {\alpha}_{2}=7.8,& {\alpha}_{3}=7.3,& {\alpha}_{4}=7.3,\end{array}$$

such that each λ* _{c}* is reduced about 20% during the UP state period. It appears that the average performance of the threshold-based method degraded from the original 2.91% to 3.62% (with a trial-and-error selected threshold), while the performance of the probabilistic models remained almost unchanged. This is partly because of the fact that a correct generative model is used and the uncertainties of the hidden state were taken into account during the final estimation (see Figure 5). In the meantime, if

Snapshot illustrations of simulated synthetic spike trains and the estimated state posterior probability from the (a) HMM and (b) continuous-time semi-Markov model (b). The shaded area denotes the posterior probability of the hidden state being in an **...**

The second issue examines the estimation accuracy of the missing variable against the number of observed variables. It is expected that as more and more observations are added, the resulting discrepancy between the threshold-based method and the probabilistic models will also become smaller, since the uncertainty of the hidden state is less (or the posterior of the hidden variable is larger with more spike train observations). In our simulations, we did extra experiments by either reducing or increasing the number of simulated spike trains, followed by rechecking the results across different setups for different methods. The estimation error results are summarized in Figure 6. Specifically, for the threshold-based method, as more and more spike trains are added, its estimation error gradually improves. This is expected since the threshold selection criterion (see appendix B) heavily depends on the number of the spike train observations, and adding more spike train observations help to disambiguate the boundary between the UP and DOWN states. Meanwhile, for the probabilistic models, the estimation performance either slightly improves (in the discrete-time HMM) or remains roughly the same (in the continuous-time model). This is partly because adding more observations will also increase the number of parameters to be estimated in the continuous-time model, so the difficulty of inference also increases; whereas the HMM performance is likely to saturate quickly due to either the insufficiency of the model or the local minimum problem inherent in the EM algorithm. This observation implies that the probabilistic models are particularly valuable when the number of spike train observations is relatively small and that the simple threshold-based method becomes more and more reliable in terms of estimation accuracy—yet its performance is still worse than that of two probabilistic models. This is probably because it is difficult to find an optimal kernel smoothing parameter or the two threshold values (see appendix B). Moreover, the threshold-based method cannot produce the statistics of interest (e.g., posterior probability, transition probability).

Next, we apply our models to validate some real-world simultaneously recorded spike trains collected from one behaving rat (Vijayan, 2007), where the MUA, cortical and hippocampal EEGs, and EMG have been simultaneously recorded (see Figure 7 for a snapshot). We presented the spike train data of a single animal (in one day), recorded from the primary somatosensory cortex (SI) during SWS. Neurophysiological studies of neural spike trains and EEGs across different rats and different recording days, as well as the comparison between the cortex and hippocampus, will be presented elsewhere. In this study, 20 clearly identified cortical cells from eight tetrodes were recorded and sorted. All spiking activity from these eight tetrodes was used to determine the UP and DOWN states.

A snapshot of recordings of cortical MUA, raw cortical EEG, cortical theta wave (4–8 Hz), cortical delta wave (2–4 Hz), raw hippocampal EEG, hippocampal ripple power (more than 100 Hz), hippocampal theta wave, and EMG.

For this study, we selected about 15.7 minutes of recordings from a total of 11 (interrupted) SWS periods of one rat,^{11} where the mean ± SD length of SWS periods is 85.7 ± 35.8 s (maximum 156.1 s, minimum 30.7 s). We pulled out the multiunit spikes from eight tetrodes (no spike sorting is necessary here). For each spike train (from one tetrode), we empirically chose the following CIF model, as defined in the continuous-time domain:^{12}

$$\begin{array}{l}{\mathrm{\lambda}}^{c}(t)=exp\phantom{\rule{0.2em}{0ex}}\left({\mu}_{c}+{\alpha}_{c}\phantom{\rule{0.1em}{0ex}}{S}_{t}+{\gamma}_{c}{\int}_{0}^{t}{e}^{-{\beta}_{c}\tau}\phantom{\rule{0.1em}{0ex}}dN(t-\tau )\phantom{\rule{0.1em}{0ex}}d\tau \right)\\ \phantom{\rule{2em}{0ex}}\approx exp\phantom{\rule{0.2em}{0ex}}\left({\mu}_{c}+{\alpha}_{c}\phantom{\rule{0.1em}{0ex}}{S}_{t}+{\gamma}_{c}{\int}_{0}^{\overline{\tau}}{e}^{-{\beta}_{c}\tau}\phantom{\rule{0.1em}{0ex}}dN(t-\tau )\phantom{\rule{0.1em}{0ex}}d\tau \right)\\ \phantom{\rule{2em}{0ex}}\approx exp\phantom{\rule{0.2em}{0ex}}\left({\mu}_{c}+{\alpha}_{c}\phantom{\rule{0.1em}{0ex}}{S}_{t}+{\gamma}_{c}\phantom{\rule{0.1em}{0ex}}{\mathbf{\beta}}_{c}^{T}{\mathbf{N}}_{t-\overline{\tau}:t}\right),\end{array}$$

where the exponential decaying parameter *β _{c}* is initially set to a value that lets

Since ** θ** will be largely dependent on in the MCEM algorithm, a sensible choice of initial state

$${\mathrm{\lambda}}_{k}=exp(\mu +\alpha \phantom{\rule{0.1em}{0ex}}{S}_{k}+{\beta}_{1}{n}_{k-2:k-1}+{\beta}_{2}{n}_{k-4:k-2}+{\beta}_{3}{n}_{k-6:k-4}),$$

and *n*_{k−2:k−1} defines the number of spike counts (across all spike trains) within the previous 10 ms time interval prior to time index *k* or *t _{k}* (with bin size 10 ms). Therefore, the spiking history dependence is described by the number of spike counts in the past three history windows: 10–20 ms, 20–40 ms, and 40–60 ms. The initial parameters were set as

Real-world MUA spike trains of eight tetrodes recorded from the primary somatosensory cortex of one rat (note that each tetrode might contain varying number of single cells). (a) A selected 5 s segment of the MUA spike trains during SWS and its UP and **...**

State Estimation Discrepancy Between the Proposed Algorithms and the Threshold-Based Method for the Real-World Spike Trains Data.

Estimated Statistics of the UP and DOWN State Durations (in msec) for the Real-World Spike Train Data.

In order to choose a proper parametric model for the sojourn time duration for the UP and DOWN states, we used the state classification result from the discrete-time HMM (see Figure 9). Based on the histogram data analysis, we chose the exponential pdf as the probability model for the sojourn duration of the DOWN state and the log-normal pdf as the probability model for the sojourn duration of the UP state. We also computed their sample statistics (mean, SD) and used them as the initial parameters for the continuous-time probabilistic model. The lower bounds for the UP and DOWN state duration lengths are both set as 40 ms.

Fitting the real-world sojourn-time duration length for the DOWN and UP states, where the UP or DOWN state classification is obtained from the discrete-time HMM estimation result. (Left panels) Histograms. (Right panels) Fitting the sojourn durations **...**

Since the recording time of the real-world spike trains data is rather long (about 60 times longer than the synthetic data), the computational overhead is much greater for the MCEM algorithm. In RJMCMC sampling, 300,000 iterations were run to simulate the Markov chain,^{17} and the first 3000 iterations were discarded as the burn-in period. After that, we fed the obtained parameter estimates using the complete data set. After an additional 100 MCEM iterations, the algorithm converged (when the iterative log-likelihood increase is sufficiently small), and we obtained the final parameter estimates. With these estimates, we simulated another 1000 realizations of the semi-Markov chain and used them for the final hidden state reconstruction (see equations 3.22 and 3.23). The convergence plots of the semi-Markov chain and the MCEM algorithm are shown in Figure 10.

Convergence plot of the simulated Markov chain. (a) Trajectory of the number of state jumps (inset: the trajectory within the first 1000 iterations). (b) Trajectory of the log likelihood in running the MCEM algorithm.

Several noteworthy comments are in order:

- As a comparison, we also used the estimated hidden state {
*S*(*t*)} to fit a GLM model (using glmfit function in Matlab, Δ = 1 ms) by modeling the history dependence with eight discrete windows (1–5, 5–10, 10–15, 15–20, 20–30, 30–40, 40–50 ms). Upon fitting the GLM model, we obtained the estimated spiking history dependence coefficients for the individual spike trains (see Figure 11); as seen from the results, their curves all have an approximately exponential-decaying shape. Finally, the KS plots and autocorrelation plots are shown in Figures 12 and and13,13, respectively. Overall, the goodness of fit is quite satisfactory.Fitted KS plots of the real-world MUA spike trains (dotted lines along the diagonal indicate the 95% confidence bounds). - In comparison with the discrete-time HMM-EM method (see Tables 5 and and6),6), the continuous-time MCEM method yields less frequent state jumps. As a consequence, the MCEM result is accompanied with less short sojourn durations since it allows a potential merge of neighboring sojourns during the RJMCMC procedure (see move type 3 in appendix A) that considers the joint likelihoods of the neighboring sojourns. Furthermore, in comparison with the threshold-based method, the continuous-time semi-Markov model is more powerful in representing the uncertainty as well as inferring the underlying neural dynamics. Its estimated model parameters (the shape of the transition and duration probability density) might reveal the some neural mechanism or physiology behind the transitory dynamics (e.g., the inhibitory period after the last transition event). In our experiments, the MCEM method obtained the lowest estimate of the number of state transitions (see Table 5), yielding a transition occurrence rate about 82 min
^{−1}(slightly greater than the rate reported in visual cortex; Ji & Wilson, 2007). Despite its flexibility, the MCEM method is much more computationally intensive than the HMM-EM method. The implementation of HMM is simpler and has a faster convergence speed (the EM algorithm typically converged within 100–200 steps, although the local maximum problem remains). In contrast, the MCEM method relies on simulation of state sequences at every iteration and is required to evaluate the point-process joint likelihood (see equation 3.5) for each possible move. The calculation of every single spike's likelihood contribution is time-consuming and is a bottleneck in the computation. In addition, the convergence speed of the MCEM algorithm becomes slower in the end. This is because when the Markov chain gradually approaches the equilibrium, many moves are rejected and a small modification of the hidden state or the parameterwould not change very much in terms of the joint log likelihood of the data. This can be seen in the flat plateau of the log-likelihood curve near the end of the convergence in Figure 10b. For the real-world spike train data, the simulation of Markov chain needs to be very long in order to pass through all of move possibilities, especially if the number of potential state transitions is large (here, on the order of thousands). Even so, no optimum stop criterion can be provided with a guarantee; hence, the trade-off between the computation cost and the estimation accuracy remains in any Monte Carlo optimization problem.*θ* - To compare these three classification methods, we also computed the cortical EEG averages (mean ± standard error of mean) triggered by the their UP state starting and ending time stamps, respectively (recall note 15). The results are compared in Figure 14. Although the figures all look similar (due to large timescale), on close examination of the plots, it appears that the EEG averages from the MCEM method result in a more accurate detection of the onset of the UP state.
- Given the MUA spike train data analyzed for the behaving rat, the latent process
*S*stays longer during the UP state than the DOWN state, indicating that the population neurons remained dominantly active during SWS._{t}^{18}Whether these neuronal firing patterns contain any “memory replay” compared with the earlier firing pattern during the RUN behavior will be the subject of future investigation.

As observed from the analysis of the recorded multiple spike trains, the somatosensory cortical neurons undergo near-synchronous transitions between the UP and DOWN states, from every tens of milliseconds to a few seconds or so. The neuronal firing activities inside the UP state are mainly characterized by duration length and spiking rate. It would be interesting to see if there are any firing “patterns” embedded in these UP-state periods, in either multiunit or single-unit activity (e.g., Luczak et al., 2007; Ji & Wilson, 2007).

On estimating the latent state process, we obtain two features: one is the log (natural basis) of duration, the other the number of spikes per second per tetrode. After collecting these two features from the experimental data shown earlier, we resort to the clustering tool for feature visualization. The soft-clustering algorithm we use here is a greedy clustering algorithm (Verbeek, Vlassis, & Kröse, 2003) based on fitting a gaussian mixture model. In the greedy learning algorithm, the optimal number of mixtures is automatically determined during the learning process. The algorithm was run 20 times, and the best result (with the highest log likelihood) was chosen (see Figure 15). Hence, the neuronal firing pattern can be characterized by a finite number of parameters (mean and covariance), from which we can compare the different neuronal firing across different animals, different days, different brain regions (cortex versus hippocampus), different sleep stages (Pre-RUN versus Post-RUN), and so on. Since this article mainly focuses on the statistical modeling methodology, further quantitative data analysis and its link to neurophysiology will be presented and discussed elsewhere.

When investigating the real-world recording spike trains data, an important task of computational modeling is to identify the functional form of CIF (see equations 2.3 and 3.3 or 3.4). Unlike the simulated spike train data, the CIF of the real-world spike trains is not known in advance and needs to be identified before the probabilistic inference is carried on. In this article, for simplicity, we have presumed that the CIF can be approximated by a GLM (McCullagh & Nelder, 1989; Truccolo et al., 2005) which includes the hidden state and firing history as variables. We have also assumed that the spike trains across tetrodes are mutually independent. Most likely, the neuronal spiking is influenced not only by its own firing history but also by the other spike trains. Despite these simplifications, we think the models presented here still serve as a valuable first step to represent the temporal dynamics of the observed MUA spike trains. Above all, to quote George Box, “All models are wrong, but some are useful.” In addition, theoretically, given sufficient data and under some regular conditions (Pawitan, 2001), the MLE for a GLM is consistent even when the model (e.g., the link function) is chosen incorrectly (Paninski, 2004). From a practical point of view, we have ignored the possible estimation bias here.

For the real-world spike train data, there is no ground truth available for . A common practice is to select a small data set, and the UP and DOWN states are first identified by the threshold-based or the HMM method and reconfirmed by human inspection (with extra help of EEG measurements). Based on that information and the assumption that the CIF might be identified by a GLM, we can use the GLM fit for model selection. The model fit would be shown by the deviance and validated by the KS test. If the KS plot falls inside the 95% confidence intervals, it indicates that the CIF model fits well with the given spike train data. Unfortunately, in practice, this is not always the case given only a limited amount of the observed data and an economical size of parameter space for the GLM, indicating a lack of discrepancy between the model and the data.

In this article, the sojourn time survival function for the UP and DOWN states is assumed and modeled as being continuous and parametric. More generally, if the histogram analysis of the data indicates that the true distribution is far away from any parametric (exponential or nonexponential) probability density, we might also employ a discretized probability model for the survival probability of the sojourn time. Specifically, let [*a*, *b*] denote the range for the sojourn time; we may split the range evenly into *L* bins and model the discrete probability at each piece as *P _{i}*(

Although in this article, we have exclusively discussed a two-state (0 and 1) Markov model, it is easy to extend the framework to a general *N*-state Markovian model. Indeed, it is quite possible to add an intermediate state between DOWN and UP as the transitory state. The reason for this argument arises from the observation from the real-world MUA spike trains: in many circumstances, there is no clear evidence that the MUA are either up-modulated or completely silent. Nor does the EEG in any obvious fashion help to differentiate these ambiguous periods. Unfortunately, how to define a transitory state presumably remains a nontrivial problem, and no attempt was made to explore this direction here.

In the MCEM algorithm discussed above, we consider Monte Carlo sampling only in the E-step, whereas the M-step uses a standard deterministic optimization procedure. Potentially we can use the MCMC procedure for both state and parameter estimation {^{(k)}, *θ*^{(k)}} for *k* = 1, 2, …, *M*, from which we can obtain the full posterior distribution *p*(, ** θ** ) instead of the marginal

There are several obvious assumptions used in our statistical modeling approach. First, the statistical mutual independence is assumed across neural spike trains, without explicit modeling of the recurrent network activity.^{19} Second, the observed data are assumed to be stationary in the sense that the state transition and the CIF parameters are estimated from a long period of spike train recordings when those parameters are assumed to remain constant. Finally, an identical CIF model is also assumed across all neural spike trains. Nevertheless, these limitations by no means diminish the value of the models and methods proposed here, since this article can be treated as a pilot study toward the ultimate modeling goal.

We have considered several future investigation efforts in the line with the work reported here. From a computational modeling point of view, we can extend the model by including continuous-valued observations (e.g., Srinivasan, Eden, Mitter, & Brown, 2007). For instance, the LFP or EEG measurements have been simultaneously recorded from both cortical and hippocampal regions. The detection of K-complexes from the cortical EEG and detection of the sharp wave-ripple complexes (SPW-Rs) from the hippocampal EEG would be beneficial to the identification of UP and DOWN states (Siriota et al., 2003; Battaglia et al., 2004; Ji & Wilson, 2007). Furthermore, it is possible to build a more complex SSM by allowing both continuous- and discrete-valued hidden variables—for instance, a switching SSM where the two latent processes interact with each other (e.g., Ghahramani & Hinton, 2000; Srinivasan et al., 2007).

From a neurophysiological point of view, we are also interested in studying the population neuronal firing causality and latency between the cortex and hippocampus, as well as their spike patterns relevant to the rat's RUN behavior. It is well known that sleep is a key factor that may promote the transfer of memory from the hippocampus to the cortex, and during sleep, replays in these two regions occur synchronously (Mehta, 2007; Ji & Wilson, 2007). Based on the extracelluar recordings (MUA and LFP), it would be interesting to investigate the UP and DOWN activities during multiple processing stages and sites in the cortico-hippocampal circuits, and the UP and DOWN state transitions can be used to quantify the functional connectivity of the neural circuits. An in-depth exploration of the LFP (e.g., K-complexes, SPW-Rs), with single- and multiunit firing activities in both cortical and hippocampal regions, would be key to understanding memory consolidation during sleep.

We have developed both discrete- and continuous-time probabilistic models and inference algorithms for inferring population neurons' UP and DOWN states, using the MUA spike trains. Compared to the deterministic threshold-based method (see appendix B) used in the literature, our probabilistic paradigms offer a stochastic approach to analyze the spike trains as well as provide a generative model to simulate the spike trains. Furthermore, the hidden state estimate is treated as a random variable with certain uncertainty (encoded by its posterior probability), whereas the threshold-based method cannot represent such uncertainties.

The discrete-time HMM provides a reasonable state estimate with a rather fast computing speed. However, the model is restricted to locate the UP and DOWN state transition with a relatively large time bin size (here, 10 ms). Another drawback of the HMM is that it is prone to get stuck in the local solution; in other words, the number of state transitions typically remains unchanged after a few EM iterations. In contrast, one advantage of the continuous-time probabilistic model is that it allows estimating the exact locations of state jumps. By using the RJMCMC sampling technique, the number of jumps as well as the locations of the jumps are allowed to be modified during the inference procedure, which offers a way to escape from the local minimum and tackle the model selection problem. The only shortcoming of the RJMCMC method is its greater computational complexity and the tremendous demand of computational power. In practice, the number of steps required to reach equilibrium often demands sensible initial conditions and diagnostic monitoring during the convergence process. We found that the inference solution obtained from the discrete-time HMM yields a reasonable initial state sequence to feed into the MCMC sampler. Once the number and the locations of the state jumps are determined, we can use the Monte Carlo statistics to infer the latent process. For practitioners who are more concerned about the processing speed than the accuracy of hidden state estimation, the discrete-time HMM might offer a reasonable guess (depending on the data characteristics). Nevertheless, no claim is made here that our proposed models and algorithms could always produce a correct UP or DOWN state classification result. The final justification might still rely on careful human inspection, but our estimation results certainly provide a good starting point with high confidence for follow-up.

In analyzing the simultaneously recorded spike trains, identifying an accurate CIF model is crucial to the probabilistic inference. However, there is no free-lunch-recipe to obtain the ultimate answer. In practice, it often requires some empirical data analysis (Brown, Kass, & Mitra, 2004; Kass, Ventura, & Brown, 2005), such as the interspike interval histogram, firing-rate trend dependence analysis, or fitting a GLM (Truccolo et al., 2005) or a non-Poisson model (Barbieri, Quirk, Frank, Wilson, & Brown, 2001; Brown et al., 2003).

Finally, we hope that our proposed statistical models can shed some light on developing physiologically plausible mechanistic models. A better understanding of the transition mechanism between the UP and DOWN states would also help to improve the statistical description of the data.

The research reported here was supported by NIH/NIDA grant R01-DA015644 to E.N.B. and M.A.W., an NIH Director Pioneer Award DP1-OD003646 to E.N.B., and an NIH/NHLBI grant R01-HL084502 to Z.C. and R.B.S.V. was supported by NIH institutional NRSA grant T32 HL07901. We thank R. Haslinger, D. Ji, and D. Nguyen for some helpful discussions. We also thank two anonymous reviewers for their valuable comments that helped to improve the presentation of this article.

Reversible-jump MCMC (RJMCMC) is a Metropolis-Hastings-type sampling algorithm with a transdimensional proposal. The term *reversible jump* refers to the ability of the Markov chain to “jump” between two parameter spaces that have different dimensions. The sampler explores the state spaces of variable dimensionality by various modifications through the Metropolis-Hastings proposals. Each Metropolis-Hastings proposal has a respective reverse proposal. For every proposal, the acceptance probability is computed according to a certain rule. The goal of the RJMCMC algorithm is to design efficient moves that allow the simulated Markov chain to reach the desired equilibrium (posterior) distribution within a reasonable amount of time. Unlike the fixed-dimensional MCMC algorithms, RJMCMC allows the state transitions to occur between spaces with different dimensions, say, → ′, where *dim*() ≠ *dim*(′).

In this appendix, we present a detailed elaboration of the RJMCMC algorithm in the context of simulating a continuous-time (semi-) Markov chain for the problem. In the following, we use similar notations and formulations of Ball et al. (1999).

Let *n* denote the number of jumps between two distinct discrete states in the latent process {*S*(*t*); 0 ≤ *t* ≤ *T*}, where *S*(*t*) {0,1}. Let = (*n*, ** τ**,

The following seven types of moves are considered in the Metropolis-type proposal:

- Move a boundary between two successive sojourns of . First, decide which boundary to move by sampling
*j*uniformly from {0, 1, …,*n*− 1}. Let*χ*=_{j}*i*_{1},*χ*_{j+1}=*i*_{2}, and then we have two alternative sampling options:^{20}- Sample
*u*from a uniform distribution (*a*_{i1},*τ*+_{j}*τ*_{j+1}−*a*_{i2}), where*a*_{i1},*a*_{i2}denote the lower-bound constraints of the sojourn time of states*χ*and_{j}*χ*_{j+1}, respectively. The proposal ′ is obtained by moving the boundary between the*j*th and (*j*+ 1)st sojourns from*ν*_{j+1}to*ν*+_{j}*u*. In this case, ′ = (*n*′,′,*τ*′), where ${\tau}_{j}^{\prime}=u,{\tau}_{j+1}^{\prime}={\tau}_{j}+{\tau}_{j+1}-u$.*χ* - Sample
*u*from a gaussian distribution (0,*σ*^{2}), where*σ*^{2}denotes the (user-specified) variance parameter. The proposal ′ is obtained by moving the boundary between the*j*th and (*j*+ 1)st sojourns from*ν*_{j+1}to*ν*_{j+1}+*u*. In this case, ′ = (*n*′,′,*τ*′), ${\tau}_{j}^{\prime}={\tau}_{j}+u,{\tau}_{j+1}^{\prime}={\tau}_{j+1}-u$,*χ*

For both sampling options,*n*′ =*n*, ${\tau}_{l}^{\prime}={\tau}_{l}\phantom{\rule{0.2em}{0ex}}(l\ne j,j+1)$, and ${\chi}_{l}^{\prime}={\chi}_{l}\phantom{\rule{0.2em}{0ex}}(l=1,2,\dots ,{n}^{\prime})$. - Insert a sojourn in one of the existing sojourns of . First, sample
*l** uniformly from {0, 1, …,*n*}, and let*i*=*χ*_{l*}. Determine the state for the inserted sojourn by sampling*j*from the probability_{1}(*j**i*,) (*θ**j*≠*i*). In the two-state space,_{1}(*j**i*) = 1 (i.e., deterministic). Sample*u*from a uniform distribution (*a*,_{i}*τ*_{l*}−*a*) (where_{i}*a*denotes the lower bound of the sojourn time for state_{i}*i*), and then sample*v*from the censored version of the exponential (or log-normal or inverse gaussian) distribution with parameters**θ**_{j}*r*(or_{j}{**θ**_{j}*μ*,_{j}*σ*} for log normal, or_{j}{**θ**_{j}*μ*,_{j}*s*} for the inverse gaussian) truncated at_{j}*τ*_{l}_{*}−*u*−*a*, that is, from the distribution that has the following conditional pdf (_{i}*v**u*)$$\begin{array}{l}\stackrel{\sim}{p}\left(v\mid u\right)=\frac{{p}_{v}({\mathbf{\theta}}_{j};v)}{{F}_{v}\phantom{\rule{0.1em}{0ex}}({\mathbf{\theta}}_{j};{\tau}_{l\ast}-u-{a}_{i})-{F}_{v}\phantom{\rule{0.1em}{0ex}}({\mathbf{\theta}}_{j};{a}_{j})}\\ \phantom{\rule{4.2em}{0ex}}\times \phantom{\rule{0.1em}{0ex}}({a}_{j}\le v\le {\tau}_{l\ast}-u-{a}_{i}),\end{array}$$(A.1)where*a*denotes the lower bound of the sojourn time for state_{j}*j*. Then a sojourn in state*j*(*j*≠*i*) of length*v*is inserted in the*l**th sojourn of . And the new proposal ′ = (*n*′,′,*τ*′) is given by*χ**n*′ =*n*+ 2, $({\tau}_{l}^{\prime},{\chi}_{l}^{\prime})=({\tau}_{l},{\chi}_{l})(l=0,1,l\ast -1)$, ${\tau}_{l\ast}^{\prime}=u$, ${\tau}_{l\ast +1}^{\prime}=v$, ${\tau}_{l\ast +2}^{\prime}={\tau}_{l\ast}-u-v$, $({\tau}_{l}^{\prime},{\chi}_{l}^{\prime})=({\tau}_{l-2},{\chi}_{l-2})(l=l\ast +3,l\ast +4,\dots ,n+2)$, ${\chi}_{l\ast}^{\prime}={\chi}_{l\ast +2}^{\prime}={\chi}_{l\ast}(=i)$, ${\chi}_{l\ast +1}^{\prime}=j$. Note that if*τ*_{l}_{*}−*u*− 2*a*<_{i}*a*, move 2 will not be executed._{j} - Delete an intermediate sojourn of whose two adjacent sojourns are in the same state (namely, merge one sojourn with its neighboring sojourns). Sample
*l** uniformly from {0, 1, …,*n*− 1}, and delete the*l**th sojourn of . The new ′ is given by*n*′ =*n*− 2, $({\tau}_{l}^{\prime},{\chi}_{l}^{\prime})=({\tau}_{l},{\chi}_{l})(l=0,1,\dots ,l\ast -2)$, ${\tau}_{l\ast -1}^{\prime}={\tau}_{l\ast -1}+{\tau}_{l\ast}+{\tau}_{l\ast +1},{\chi}_{l\ast -1}^{\prime}={\chi}_{l\ast -1}$, $({\tau}_{l}^{\prime},{\chi}_{l}^{\prime})=({\tau}_{l+2},{\chi}_{l+2})\phantom{\rule{0.1em}{0ex}}(l=l\ast ,l\ast +1,\dots ,n-2)$. - Split the first sojourn of . First, let
*i*=*χ*_{0}, and sample*u*from the uniform distribution (*a*,_{j}*τ*_{0}−*a*), where_{i}*a*and_{i}*a*denote the lower-bound constraints of the sojourn time for states_{j}*i*and*j*, respectively (*j*≠*i*; in the two-state case, the choice of*j*is deterministic). The new proposal ′ is given by*n*′ =*n*+ 1, ${\tau}_{0}^{\prime}=u$, ${\chi}_{0}^{\prime}=j$, ${\tau}_{1}^{\prime}={\tau}_{0}-u$, ${\chi}_{1}^{\prime}=i$, $({\tau}_{l}^{\prime},{\chi}_{l}^{\prime})=({\tau}_{l-1},{\chi}_{l-1})\phantom{\rule{0.1em}{0ex}}(l=1,2,\dots ,n-1)$. Note that if*τ*_{0}<*a*+_{i}*a*, move 4 will not be executed._{j} - Delete the first sojourn of . This move is deterministic. The new proposal ′ is given by
*n*′ =*n*− 1, ${\tau}_{0}^{\prime}={\tau}_{0}+{\tau}_{1}$, ${\chi}_{0}^{\prime}={\chi}_{1}$, $({\tau}_{l}^{\prime},{\chi}_{l}^{\prime})=({\tau}_{l+1},{\chi}_{l+1})\phantom{\rule{0.1em}{0ex}}(l=1,2,\dots ,n-1)$. - Split the last sojourn of . First, let
*i*=*χ*, and sample_{n}*u*from a uniform distribution (*a*,_{i}*τ*), where_{n}*a*denotes the lower-bound constraint of the sojourn time for state*i*. Next, sample*j*from the distribution_{3}(*j**i*) (*j*≠*i*; in the two-state state space, the choice of*j*is deterministic). The new proposal ′ is given by*n*′ =*n*− 1, $({\tau}_{l}^{\prime},{\chi}_{l}^{\prime})=({\tau}_{l},{\chi}_{l})\phantom{\rule{0.2em}{0ex}}(l=1,2,\dots ,n-1)$, ${\tau}_{n}^{\prime}=u$, ${\chi}_{n}^{\prime}={\chi}_{n}$, ${\tau}_{n+1}^{\prime}={\tau}_{n}-u$, ${\chi}_{n+1}^{\prime}=j$. Note that if*τ*<_{n}*a*, move 6 will not be executed._{i} - Delete the last sojourn of . This move type is deterministic, and the new proposal ′ is given by
*n*′ =*n*− 1, $({\tau}_{l}^{\prime},{\chi}_{l}^{\prime})=({\tau}_{l},{\chi}_{l})\phantom{\rule{0.3em}{0ex}}(l=1,2,\dots ,n-2)$, ${\tau}_{n-1}^{\prime}={\tau}_{n-1}+{\tau}_{n}$, ${\chi}_{n-1}^{\prime}={\chi}_{n-1}$.

Of the above moves, moves 2 to 7 are discrete and transdimensional, and move 1 is continuous. For the convenience of technical treatment, we classify the seven moves into three classes: *A* = {1}, *B* = {2, 4, 6}, *C* = {3, 5, 7}, which correspond to *boundary move*, *insertion*, and *deletion*, respectively. Specifically, the individual moves in class B are the respective inverses of those in class C.

The move types chosen for the update are obtained by sampling independently from the distribution of *q _{i}* (

To compute the acceptance probabilities: for *i* = 1, 2, …, 7, let *R*^{(i)}( → ′; ** θ**) be the density of the transition kernel associated with the proposal ′ for the move type (

$$R(\mathcal{S}\to \mathcal{S}\prime ;\mathbf{\theta})=\sum _{i=1}^{7}{q}_{i}{R}^{(i)}(\mathcal{S}\to \mathcal{S}\prime ;\mathbf{\theta}).$$

Among the seven move proposals, move type 1 does not change the dimension of , and its acceptance probability is = min(1, ), where

$$\mathcal{B}=\frac{p(\mathcal{S}\prime ,\mathbf{\theta},y)R(\mathcal{S}\prime \to \mathcal{S};\mathbf{\theta})}{p(\mathcal{S},\mathbf{\theta},y)R(\mathcal{S}\to \mathcal{S}\prime ;\mathbf{\theta})}.$$

(A.2)

On the other hand, move types 2 to 7 change the dimension of , and their acceptance probabilities are given by = min(1, ), where

$$\mathcal{B}=\frac{p(\mathcal{S}\prime ,\mathbf{\theta},y)R(\mathcal{S}\prime \to \mathcal{S};\mathbf{\theta})}{p(\mathcal{S},\mathbf{\theta},y)R(\mathcal{S}\to \mathcal{S}\prime ;\mathbf{\theta})}|\mathcal{J}\phantom{\rule{0.2em}{0ex}}|,$$

(A.3)

where || denotes the determinant of the Jacobian. The Jacobian measures the ratio of the volume of two state spaces.

For presentation convenience, we often factorize the acceptance probability as = _{1}_{2}_{3} (prior ratio × likelihood ratio × proposal probability ratio),^{21} where

$$\begin{array}{ccc}{\mathcal{B}}_{1}=\frac{p\left(\mathcal{S}\prime |\mathbf{\theta}\right)}{p\left(\mathcal{S}|\mathbf{\theta}\right)},& {\mathcal{B}}_{2}=\frac{p\left(y|\mathcal{S}\prime ,\mathbf{\theta}\right)}{p\left(y|\mathcal{S},\mathbf{\theta}\right)},& {\mathcal{B}}_{3}=\frac{R\left(\mathcal{S}\prime \to \mathcal{S};\mathbf{\theta}\right)}{R\left(\mathcal{S}\to \mathcal{S}\prime ;\mathbf{\theta}\right)}.\end{array}$$

(A.4)

In the context of two-state continuous-time (semi-) Markov chain that is used in this article, we compute these three probability ratios for these seven moves in the following.

For the first sampling option, let *i*_{1} = *χ _{j}* and

$$\begin{array}{l}{\mathcal{B}}_{1}^{(1)}=\frac{\stackrel{\sim}{p}({\mathbf{\theta}}_{{i}_{\text{1}}};u)}{\stackrel{\sim}{p}({\mathbf{\theta}}_{{i}_{\text{1}}};{\tau}_{j})}\frac{\stackrel{\sim}{p}({\mathbf{\theta}}_{{i}_{2}};{\tau}_{j+1}+{\tau}_{j}-u)}{\stackrel{\sim}{p}({\mathbf{\theta}}_{{i}_{\text{2}}};{\tau}_{j+1})}\phantom{\rule{1em}{0ex}}(\text{option}\phantom{\rule{0.2em}{0ex}}1)\\ {\mathcal{B}}_{2}^{(1)}=\prod _{c=1}^{C}\phantom{\rule{0.3em}{0ex}}\{exp({\mathrm{\lambda}}_{{i}_{1}}^{c}+{\mathrm{\lambda}}_{{i}_{2}}^{c}-{\stackrel{\sim}{\mathrm{\lambda}}}_{{i}_{1}}^{c}-{\stackrel{\sim}{\mathrm{\lambda}}}_{{i}_{2}}^{c})\\ \phantom{\rule{3.8em}{0ex}}\times \phantom{\rule{0.2em}{0ex}}\frac{{\prod}_{k=1}^{{\stackrel{\sim}{y}}_{{i}_{1}}^{c}}{\mathrm{\lambda}}^{c}({t}_{k},{\mathcal{S}}_{{\nu}_{j}:{\nu}_{j}+u}^{\prime}){\prod}_{k=1}^{{\stackrel{\sim}{y}}_{{i}_{2}}^{c}}{\mathrm{\lambda}}^{c}({t}_{k},{\mathcal{S}}_{{\nu}_{j}+u:{v}_{j+2}}^{\prime})}{{\prod}_{k=1}^{{y}_{{i}_{1}}^{c}}{\mathrm{\lambda}}^{c}({t}_{k},{\mathcal{S}}_{{\nu}_{j}:{\nu}_{j+1}}){\prod}_{k=1}^{{y}_{{i}_{2}}^{c}}{\mathrm{\lambda}}^{c}({t}_{k},{\mathcal{S}}_{{\nu}_{j+1}:{\nu}_{j+2}})}\}\phantom{\rule{1em}{0ex}}(\text{option}\phantom{\rule{0.2em}{0ex}}1)\\ {\mathcal{B}}_{3}^{(1)}=\frac{R(\mathcal{S}\prime \to \mathcal{S};\mathbf{\theta})}{R(\mathcal{S}\to \mathcal{S}\prime ;\mathbf{\theta})}=1,\end{array}$$

where (** θ**; ·) is defined by the censored version of the parametric pdf of the sojourn time (for either UP or DOWN state). The ratios for the second sampling option are conceptually similar, and we show only
${\mathcal{B}}_{1}^{(1)}$ here:

$${\mathcal{B}}_{1}^{(1)}=\frac{\stackrel{\sim}{p}({\mathbf{\theta}}_{{i}_{1}};{\tau}_{j}+u)}{\stackrel{\sim}{p}({\mathbf{\theta}}_{{i}_{1}};{\tau}_{j})}\frac{\stackrel{\sim}{p}({\mathbf{\theta}}_{{i}_{2}};{\tau}_{j+1}-u)}{\stackrel{\sim}{p}({\mathbf{\theta}}_{{i}_{2}};{\tau}_{j+1})}\phantom{\rule{1em}{0ex}}(\text{option}\phantom{\rule{0.2em}{0ex}}2).$$

Let *i*_{1} = *i*, *i*_{2} = *j*, and let
${y}_{{i}_{1}}^{c}$,
${\stackrel{\sim}{y}}_{{i}_{1}}^{c}$,
${\stackrel{\sim}{y}}_{{i}_{2}}^{c}$,
${\widehat{y}}_{{i}_{1}}^{c}$ denote the number of spike counts for spike train *c* during the time intervals [*ν _{l}*

$$\begin{array}{l}{\mathcal{B}}_{1}^{(2)}=\frac{\stackrel{\sim}{p}({\mathbf{\theta}}_{{i}_{1}};u)\stackrel{\sim}{p}({\mathbf{\theta}}_{{i}_{2}};v)\stackrel{\sim}{p}({\mathbf{\theta}}_{{i}_{1}};{\tau}_{l\ast}-u-v)}{\stackrel{\sim}{p}({\mathbf{\theta}}_{{i}_{1}};{\tau}_{l\ast})}\\ {\mathcal{B}}_{2}^{(2)}=\prod _{c=1}^{C}\phantom{\rule{0.3em}{0ex}}\{exp({\mathrm{\lambda}}_{{i}_{1}}^{c}-{\stackrel{\sim}{\mathrm{\lambda}}}_{{i}_{1}}^{c}-{\stackrel{\sim}{\mathrm{\lambda}}}_{{i}_{2}}^{c}-{\widehat{\mathrm{\lambda}}}_{{i}_{1}}^{c})\\ \phantom{\rule{1em}{0ex}}\times \frac{{\prod}_{k=1}^{{\stackrel{\sim}{y}}_{{i}_{1}}^{c}}{\mathrm{\lambda}}^{c}({t}_{k},{\mathcal{S}}_{{\nu}_{l\ast}:{\nu}_{l\ast}+u}^{\prime}){\prod}_{k=1}^{{\stackrel{\sim}{y}}_{{i}_{2}}^{c}}{\mathrm{\lambda}}^{c}({t}_{k},{\mathcal{S}}_{{\nu}_{l\ast}+u:{\nu}_{l\ast}+u+v}^{\prime}){\prod}_{k=1}^{{\widehat{y}}_{{i}_{1}}^{c}}{\mathrm{\lambda}}^{c}({t}_{k},{\mathcal{S}}_{{\nu}_{l\ast}+u+v:{\nu}_{l\ast +1}}^{\prime})}{{\prod}_{k=1}^{{y}_{{i}_{1}}^{c}}{\mathrm{\lambda}}^{c}({t}_{k},{\mathcal{S}}_{{\nu}_{l\ast}:{\nu}_{l\ast +1}})}\}\\ {\mathcal{B}}_{3}^{(2)}=\frac{R(\mathcal{S}\prime \to \mathcal{S};\mathbf{\theta})}{R(\mathcal{S}\to \mathcal{S}\prime ;\mathbf{\theta})}=\frac{\frac{{q}_{3}}{n+1}}{\frac{{q}_{2}}{n+1}\frac{1}{{\tau}_{l\ast}-2{a}_{i}}\stackrel{\sim}{p}(v\mid u)}=\frac{{\tau}_{l\ast}-2{a}_{i}}{\stackrel{\sim}{p}(v\mid u)},\end{array}$$

where in computing
${\mathcal{B}}_{3}^{(2)}$, we have used *q*_{2} = *q*_{3} and
$p(u)\frac{1}{{\tau}_{l\ast}-2{a}_{i}}$, and (*v* *u*) is given by equation A.1. Since move type 2 changes the dimension of the , we need to compute the associated Jacobian's determinant. Note that ** τ**′ is obtained from

$$\begin{array}{l}\left|\mathcal{J}\phantom{\rule{0.2em}{0ex}}\right|=\left|\frac{\partial {\mathcal{S}}^{\prime}(u,v,{\tau}_{l}^{\ast}-{a}_{i}-u-v))}{\partial \mathcal{S}(u,v,{\tau}_{l}^{\ast})}\right|\\ \phantom{\rule{1.2em}{0ex}}=\left|\begin{array}{ccc}\frac{\partial u}{\partial {\tau}_{i\ast}}& \frac{\partial u}{\partial u}& \frac{\partial u}{\partial v}\\ \frac{\partial v}{\partial {\tau}_{i\ast}}& \frac{\partial v}{\partial u}& \frac{\partial v}{\partial v}\\ \frac{\partial ({\tau}_{l\ast}-{a}_{i}-u-v)}{\partial {\tau}_{l\ast}}& \frac{\partial ({\tau}_{l\ast}-{a}_{i}-u-v)}{\partial u}& \frac{\partial ({\tau}_{l\ast}-{a}_{i}-u-v)}{\partial v}\end{array}\right|=\left|\begin{array}{ccc}0& 1& 0\\ 0& 0& 1\\ 1& -1& -1\end{array}\right|=1.\end{array}$$

Moves 3 and 2 are inverses of each other. Let *i*_{1} = *χ _{l}*

$$\begin{array}{l}{\mathcal{B}}_{1}^{(3)}=\frac{\stackrel{\sim}{p}({\mathbf{\theta}}_{{i}_{1}};{\tau}_{l\ast -1}+{\tau}_{l\ast}+{\tau}_{l\ast +1})}{\stackrel{\sim}{p}({\mathbf{\theta}}_{{i}_{1}},{\tau}_{l\ast -1})\stackrel{\sim}{p}({\mathbf{\theta}}_{{i}_{2}},{\tau}_{l\ast})\stackrel{\sim}{p}({\mathbf{\theta}}_{{i}_{1}},{\tau}_{l\ast +1})}\\ {\mathcal{B}}_{2}^{(3)}=\prod _{c=1}^{C}\phantom{\rule{0.3em}{0ex}}\{exp({\mathrm{\lambda}}_{{i}_{1}}^{c}+{\mathrm{\lambda}}_{{i}_{2}}^{c}+{\widehat{\mathrm{\lambda}}}_{{i}_{1}}^{c}-{\stackrel{\sim}{\mathrm{\lambda}}}_{{i}_{1}}^{c})\\ \phantom{\rule{2.7em}{0ex}}\times \phantom{\rule{0.2em}{0ex}}\frac{{\prod}_{k=1}^{{\stackrel{\sim}{y}}_{{i}_{1}}^{c}}{\mathrm{\lambda}}^{c}({t}_{k},{\mathcal{S}}_{{\nu}_{l\ast -1}:{\nu}_{l\ast +2}}^{\prime})}{{\prod}_{k=1}^{{y}_{{i}_{1}}^{c}}{\mathrm{\lambda}}^{c}({t}_{k},{\mathcal{S}}_{{\nu}_{l\ast -1}:{\nu}_{l\ast}}){\prod}_{k=1}^{{y}_{{i}_{2}}^{c}}{\mathrm{\lambda}}^{c}({t}_{k},{\mathcal{S}}_{{\nu}_{l\ast}:{\nu}_{l\ast +1}}){\prod}_{k=1}^{{\widehat{y}}_{{i}_{1}}^{c}}{\mathrm{\lambda}}^{c}({t}_{k},{\mathcal{S}}_{{\nu}_{l\ast +1}:{\nu}_{l\ast +2}})}\}\\ {\mathcal{B}}_{3}^{(3)}=\frac{\frac{{q}_{2}}{n+1}\frac{1}{({\tau}_{l\ast -1}+{\tau}_{l\ast}+{\tau}_{l\ast +1})}{\stackrel{\sim}{p}}_{v}({\tau}_{l\ast}\mid {\tau}_{l\ast +1})}{\frac{{q}_{3}}{n+1}}=\frac{{\stackrel{\sim}{p}}_{v}({\tau}_{l\ast}\mid {\tau}_{l\ast +1})}{{\tau}_{l\ast -1}+{\tau}_{l\ast}+{\tau}_{l\ast +1}}\end{array}$$

where *q*_{2} = *q*_{3}, and * _{v}*(

$${\stackrel{\sim}{p}}_{v}({\tau}_{l\ast}\mid {\tau}_{l\ast +1})=\frac{p({\mathbf{\theta}}_{{i}_{2}};{\tau}_{l\ast})}{{\int}_{{a}_{{i}_{2}}}^{{\tau}_{l\ast}+{\tau}_{l\ast +1}}p({\mathbf{\theta}}_{{i}_{2}};z)\phantom{\rule{0.2em}{0ex}}dz}.$$

Let *i*_{1} = *j*, *i*_{2} = *i*, and let
${\mathrm{\lambda}}_{{i}_{2}}^{c}={\int}_{0}^{{\nu}_{1}}{\mathrm{\lambda}}^{c}(t;{S}_{0:{\nu}_{1}}={i}_{2})\phantom{\rule{0.2em}{0ex}}dt$,
${\stackrel{\sim}{\mathrm{\lambda}}}_{{i}_{2}}^{c}={\int}_{0}^{u}{\mathrm{\lambda}}^{c}(t;{S}_{0:u}={i}_{2})\phantom{\rule{0.2em}{0ex}}dt$,
${\stackrel{\sim}{\mathrm{\lambda}}}_{{i}_{1}}^{c}={\int}_{u}^{{\nu}_{1}}{\mathrm{\lambda}}^{c}(t;{S}_{u:{\nu}_{1}}={i}_{1})\phantom{\rule{0.2em}{0ex}}dt$, and let
${y}_{{i}_{2}}^{c}$,
${\stackrel{\sim}{y}}_{{i}_{2}}^{c}$, and
${\stackrel{\sim}{y}}_{{i}_{1}}^{c}$ denote the number of spike counts for spike train *c* observed within the intervals [0, *ν*_{1}], [0, *u*], and [*u*, *ν*_{1}], respectively. Let *π*_{i1} and *π*_{i2} denote the prior probabilities of the initial sojourn in state *i*_{1} and *i*_{2}, respectively. Then we have

$$\begin{array}{l}{\mathcal{B}}_{1}^{(4)}=\frac{{\pi}_{{i}_{1}}}{{\pi}_{{i}_{2}}}\phantom{\rule{0.2em}{0ex}}\frac{\stackrel{\sim}{p}({\mathbf{\theta}}_{{i}_{1}};u)\stackrel{\sim}{p}({\mathbf{\theta}}_{{i}_{2}};{\tau}_{0}-u)}{\stackrel{\sim}{p}({\mathbf{\theta}}_{{i}_{2}};{\tau}_{0})}\\ {\mathcal{B}}_{2}^{(4)}=\prod _{c=1}^{C}\phantom{\rule{0.3em}{0ex}}\left\{exp({\mathrm{\lambda}}_{{i}_{2}}^{c}-{\stackrel{\sim}{\mathrm{\lambda}}}_{{i}_{2}}^{c}-{\stackrel{\sim}{\mathrm{\lambda}}}_{{i}_{1}}^{c})\frac{{\prod}_{k=1}^{{\stackrel{\sim}{y}}_{{i}_{2}}^{c}}{\mathrm{\lambda}}^{c}({t}_{k},{\mathcal{S}}_{0:u}^{\prime}){\prod}_{k=1}^{{\stackrel{\sim}{y}}_{{i}_{1}}^{c}}{\mathrm{\lambda}}^{c}({t}_{k},{\mathcal{S}}_{u:{\nu}_{1}}^{\prime})}{{\prod}_{k=1}^{{y}_{{i}_{2}}^{c}}{\mathrm{\lambda}}^{c}({t}_{k},{\mathcal{S}}_{0:{\nu}_{1}})}\right\}\\ {\mathcal{B}}_{3}^{(4)}=\frac{{q}_{5}}{{q}_{4}\frac{1}{{\tau}_{0}-{a}_{{i}_{2}}-{a}_{{i}_{1}}}}={\tau}_{0}-{a}_{{i}_{2}}-{a}_{{i}_{1}},\end{array}$$

where *q*_{4} = *q _{B}*

$$\left|\mathcal{J}\phantom{\rule{0.2em}{0ex}}\right|=\left|\begin{array}{cc}\frac{\partial u}{\partial {\tau}_{0}}& \frac{\partial u}{\partial u}\\ \frac{\partial ({\tau}_{0}-u)}{\partial {\tau}_{0}}& \frac{\partial ({\tau}_{0}-u)}{\partial u}\end{array}\right|=\left|\begin{array}{cc}0& 1\\ 1& -1\end{array}\right|=1.$$

Moves 5 and 4 are inverses of each other. Let *i*_{1} = *χ*_{0}, *i*_{2} = *χ*_{1},
${\mathrm{\lambda}}_{{i}_{1}}^{c}={\int}_{0}^{{\nu}_{1}}{\mathrm{\lambda}}^{c}(t;{S}_{0:{\nu}_{1}}={i}_{1})\phantom{\rule{0.2em}{0ex}}dt$,
${\mathrm{\lambda}}_{{i}_{2}}^{c}={\int}_{{\nu}_{1}}^{{\nu}_{2}}{\mathrm{\lambda}}^{c}(t;{S}_{{\nu}_{1}:{\nu}_{2}}={i}_{1})\phantom{\rule{0.2em}{0ex}}dt$,
${\stackrel{\sim}{\mathrm{\lambda}}}_{{i}_{2}}^{c}={\int}_{0}^{{\nu}_{1}}{\mathrm{\lambda}}^{c}(t;{S}_{0:{\nu}_{2}}={i}_{2})\phantom{\rule{0.2em}{0ex}}dt$, and let
${y}_{{i}_{1}}^{c}$,
${y}_{{i}_{2}}^{c}$, and
${\stackrel{\sim}{y}}_{{i}_{2}}^{c}$ denote the observed number of spike counts for spike train *c* within the intervals [0, *ν*_{1}], [*ν*_{1}, *ν*_{2}], and [0, *ν*_{2}], respectively. Then we have

$$\begin{array}{l}{\mathcal{B}}_{1}^{(5)}=\frac{{\pi}_{{i}_{2}}}{{\pi}_{{i}_{1}}}\phantom{\rule{0.2em}{0ex}}\frac{\stackrel{\sim}{p}({\mathbf{\theta}}_{{i}_{2}};{\tau}_{0}+{\tau}_{1})}{\stackrel{\sim}{p}({\mathbf{\theta}}_{{i}_{1}};{\tau}_{0})\stackrel{\sim}{p}({\mathbf{\theta}}_{{i}_{2}};{\tau}_{1})}\\ {\mathcal{B}}_{2}^{(5)}=\prod _{c=1}^{C}\phantom{\rule{0.3em}{0ex}}\left\{exp({\mathrm{\lambda}}_{{i}_{1}}^{c}+{\mathrm{\lambda}}_{{i}_{2}}^{c}-{\stackrel{\sim}{\mathrm{\lambda}}}_{{i}_{2}}^{c})\frac{{\prod}_{k=1}^{{\stackrel{\sim}{y}}_{{i}_{2}}^{c}}{\mathrm{\lambda}}^{c}({t}_{k},{{\mathcal{S}}^{\prime}}_{0:{\nu}_{2}})}{{\prod}_{k=1}^{{y}_{{i}_{1}}^{c}}{\mathrm{\lambda}}^{c}({t}_{k},{\mathcal{S}}_{0:{\nu}_{1}}){\prod}_{k=1}^{{y}_{{i}_{2}}^{c}}{\mathrm{\lambda}}^{c}({t}_{k},{\mathcal{S}}_{{\nu}_{1}:{\nu}_{2}})}\right\}\\ {\mathcal{B}}_{3}^{(5)}=\frac{{q}_{4}\frac{1}{{\tau}_{0}+{\tau}_{1}}}{{q}_{5}}=\frac{1}{{\tau}_{0}+{\tau}_{1}},\end{array}$$

where *q*_{4} = *q*_{5}. Similarly, ** τ**′ ← (

Let *i*_{1} = *i*, *i*_{2} = *j*,
${\mathrm{\lambda}}_{{i}_{1}}^{c}={\int}_{{\nu}_{n}}^{{\nu}_{n+1}}{\mathrm{\lambda}}^{c}(t;{S}_{{\nu}_{n}:{\nu}_{n+1}}={i}_{1})\phantom{\rule{0.2em}{0ex}}dt$,
${\stackrel{\sim}{\mathrm{\lambda}}}_{{i}_{1}}^{c}={\int}_{{\nu}_{n}}^{{\nu}_{n}+u}{\mathrm{\lambda}}^{c}(t;{S}_{{\nu}_{n}:{\nu}_{n}+u}={i}_{1})\phantom{\rule{0.2em}{0ex}}dt$,
${\stackrel{\sim}{\mathrm{\lambda}}}_{{i}_{2}}^{c}={\int}_{{\nu}_{n}+u}^{{\nu}_{n+1}}{\mathrm{\lambda}}^{c}(t;{S}_{{\nu}_{n}+u:{\nu}_{n+1}}={i}_{2})\phantom{\rule{0.2em}{0ex}}dt$, and let
${y}_{{i}_{1}}^{c}$,
${\stackrel{\sim}{y}}_{{i}_{1}}^{c}$, and
${\stackrel{\sim}{y}}_{{i}_{2}}^{c}$ denote the number of spike counts for spike train *c* observed within the intervals [*ν _{n}*,

$$\begin{array}{l}{\mathcal{B}}_{1}^{(6)}=\frac{\stackrel{\sim}{p}({\mathbf{\theta}}_{{i}_{1}};u)\stackrel{\sim}{p}({\mathbf{\theta}}_{{i}_{2}};{\tau}_{n}-u)}{\stackrel{\sim}{p}({\mathbf{\theta}}_{{i}_{1}};{\tau}_{n})}\\ {\mathcal{B}}_{2}^{(6)}=\prod _{c=1}^{C}\phantom{\rule{0.3em}{0ex}}\{exp({\mathrm{\lambda}}_{{i}_{1}}^{c}-{\stackrel{\sim}{\mathrm{\lambda}}}_{{i}_{2}}^{c}-{\stackrel{\sim}{\mathrm{\lambda}}}_{{i}_{2}}^{c})\\ \phantom{\rule{2.7em}{0ex}}\times \phantom{\rule{0.2em}{0ex}}\frac{{\prod}_{k=1}^{{\stackrel{\sim}{y}}_{{i}_{1}}^{c}}{\mathrm{\lambda}}^{c}({t}_{k},{{\mathcal{S}}^{\prime}}_{{\nu}_{n}:{\nu}_{n}+u}){\prod}_{k=1}^{{\stackrel{\sim}{y}}_{{i}_{2}}^{c}}{\mathrm{\lambda}}^{c}({t}_{k},{{\mathcal{S}}^{\prime}}_{{\nu}_{n}+u:{\nu}_{n+1}})}{{\prod}_{k=1}^{{y}_{{i}_{1}}^{c}}{\mathrm{\lambda}}^{c}({t}_{k},{\mathcal{S}}_{{\nu}_{n}:{\nu}_{n+1}})}\}\\ {\mathcal{B}}_{3}^{(6)}=\frac{{q}_{7}}{{q}_{6}\frac{1}{{\tau}_{n}-{a}_{{i}_{1}}}}={\tau}_{n}-{a}_{{i}_{1}},\end{array}$$

where *q*_{6} = *q*_{7}. Similarly ** τ**′ ← (

$$\left|\mathcal{J}\phantom{\rule{0.2em}{0ex}}\right|=\left|\begin{array}{cc}\frac{\partial u}{\partial u}& \frac{\partial u}{\partial {\tau}_{n}}\\ \frac{\partial ({\tau}_{n}-u)}{\partial u}& \frac{\partial ({\tau}_{n}-u)}{\partial {\tau}_{n}}\end{array}\right|=\left|\begin{array}{cc}1& 0\\ -1& 1\end{array}\right|=1.$$

Moves 7 and 6 are inverses of each other. Let *i*_{1} = *χ*_{n−1}, *i*_{2} = *χ _{n}*,
${\mathrm{\lambda}}_{{i}_{1}}^{c}={\int}_{{\nu}_{n-1}}^{{\nu}_{n}}{\mathrm{\lambda}}^{c}(t;{S}_{{\nu}_{n-1}:{\nu}_{n}}={i}_{1})\phantom{\rule{0.2em}{0ex}}dt$,
${\mathrm{\lambda}}_{{i}_{2}}^{c}={\int}_{{\nu}_{n}}^{{\nu}_{n+1}}{\mathrm{\lambda}}^{c}(t;{S}_{{\nu}_{n}:{\nu}_{n+1}}={i}_{2})\phantom{\rule{0.2em}{0ex}}dt$,
${\stackrel{\sim}{\mathrm{\lambda}}}_{{i}_{1}}^{c}={\int}_{{v}_{n-1}}^{{v}_{n+1}}{\mathrm{\lambda}}^{c}(t;{{S}^{\prime}}_{{v}_{n-1}:{v}_{n+1}}={i}_{1})\phantom{\rule{0.2em}{0ex}}dt$, and let
${y}_{{i}_{1}}^{c}$,
${y}_{{i}_{2}}^{c}$, and
${\stackrel{\sim}{y}}_{{i}_{1}}^{c}$ denote the observed number of spike counts for spike train

$$\begin{array}{l}{\mathcal{B}}_{1}^{(7)}=\frac{\stackrel{\sim}{p}({\mathbf{\theta}}_{{i}_{1}};{\tau}_{n-1}+{\tau}_{n})}{\stackrel{\sim}{p}({\mathbf{\theta}}_{{i}_{1}};{\tau}_{n-1})\stackrel{\sim}{p}({\mathbf{\theta}}_{{i}_{2}};{\tau}_{n})}\\ {\mathcal{B}}_{2}^{(7)}=\prod _{c=1}^{C}\phantom{\rule{0.3em}{0ex}}\left\{exp({\mathrm{\lambda}}_{{i}_{1}}^{c}+{\mathrm{\lambda}}_{{i}_{2}}^{c}-{\stackrel{\sim}{\mathrm{\lambda}}}_{{i}_{1}}^{c})\frac{{\Pi}_{k=1}^{{\stackrel{\sim}{y}}_{{i}_{1}}^{c}}{\mathrm{\lambda}}^{c}({t}_{k},{{\mathcal{S}}^{\prime}}_{{v}_{n-1}:{v}_{n+1}})}{{\Pi}_{k=1}^{{y}_{{i}_{1}}^{c}}{\mathrm{\lambda}}^{c}({t}_{k},{\mathcal{S}}_{{v}_{n-1}:{v}_{n}}){\Pi}_{k=1}^{{y}_{{i}_{2}}^{c}}{\mathrm{\lambda}}^{c}({t}_{k},{\mathcal{S}}_{{v}_{n}:{v}_{n+1}})}\right\}\\ {\mathcal{B}}_{3}^{(7)}=\frac{{q}_{6}\frac{1}{{\tau}_{n-1}+{\tau}_{n}}}{{q}_{7}}=\frac{1}{{\tau}_{n-1}+{\tau}_{n}},\end{array}$$

where *q*_{6} = *q*_{7}. In addition, we have ** τ**′ ← (

The experimental recordings are relatively long (varying 15–30 minutes for different rats or dates), and the MCMC sampling for the continuous-time model (with 1 ms bin size) is quite time-consuming. We need to design an efficient (problem-specific) sampling procedure for tackling this problem. One important issue is the initialization of state. As discussed earlier, this will be obtained from the estimation result of the discrete-time HMM. Another issue is to design data-driven MCMC proposals (e.g., Tu & Zhu, 2002) that “intelligently” select moves that also satisfy the detailed balance condition. Specifically, we use a few heuristics in carrying out RJMCMC sampling:

- Move type 1: Given a reasonably initialized state, use option 2 instead of option 1.
- Move type 2: Implement it favorably for those long sojourn time durations, and execute it only for those sojourn time durations with at least four times the minimum length.
- Move type 3: Implement it favorably for those short sojourn time durations.
- Move type 4: Execute it only when the initial sojourn time has at least four times the minimum length.
- Move type 6: Execute it only when the final sojourn time has at least four times the minimum length.

As far as the current UP and DOWN estimation problem is concerned, move types 1 and 3 are the most important ones. When implementing move type 3, we employ a heuristic importance sampling trick. Specifically, the probability of choosing a sojourn time to merge with its neighboring sojourns is inversely proportional to the sojourn length: the shorter the duration, the more likely to be picked out to be merged. Similarly, this trick can be utilized in move type 2 to determine where to split a DOWN state sojourn. The probability of the selected position will be inversely proportional to the observed number of instantaneous MUA spike counts.

In what follows, we derive a special example, in which the sojourn time durations of both UP and DOWN states are modeled by a censored exponential distribution as given in equation 3.12. This example can be viewed as a special case of the result from Ball et al. (1999) in which no constraint was imposed for the pdf. Let *r*_{0} and *r*_{1} denote the rate parameters associated with the exponential distribution for the DOWN and UP states, respectively. And let *a*_{0} > 0 and *a*_{1} > 0 denote the lower bounds of the sojourn durations for the DOWN and UP states, respectively. The probability ratios _{1} and _{3} for the seven move types are as follows:

- Move type 1:where$$\begin{array}{l}{\mathcal{B}}_{1}^{(1)}=\frac{{c}_{1}{r}_{{i}_{1}}exp(-{r}_{{i}_{1}}u){c}_{2}{r}_{{i}_{2}}exp[-{r}_{{i}_{2}}({\tau}_{j+1}+{\tau}_{j}-u)]}{{c}_{3}{r}_{{i}_{1}}exp(-{r}_{{i}_{1}}{\tau}_{j}){c}_{4}{r}_{{i}_{2}}exp(-{r}_{{i}_{2}}{\tau}_{j+1})}\\ \phantom{\rule{2.1em}{0ex}}=\frac{{c}_{1}{c}_{2}exp[({r}_{{i}_{1}}-{r}_{{i}_{2}})({\tau}_{i}-u)]}{{c}_{3}{c}_{4}}\phantom{\rule{2em}{0ex}}(\text{option}\phantom{\rule{0.2em}{0ex}}1)\\ {\mathcal{B}}_{1}^{(1)}=\frac{{c}_{5}{r}_{{i}_{1}}exp[-{r}_{{i}_{1}}({\tau}_{j}+u)]{c}_{5}{r}_{{i}_{2}}exp[-{r}_{{i}_{2}}({\tau}_{j+1}-u)]}{{c}_{3}{r}_{{i}_{1}}exp(-{r}_{{i}_{1}}{\tau}_{j}){c}_{4}{r}_{{i}_{2}}exp(-{r}_{{i}_{2}}{\tau}_{j+1})}\\ \phantom{\rule{2.1em}{0ex}}=\frac{{c}_{5}{c}_{6}exp[({r}_{{i}_{1}}-{r}_{{i}_{2}})u]}{{c}_{3}{c}_{4}}\phantom{\rule{4.5em}{0ex}}(\text{option}\phantom{\rule{0.2em}{0ex}}2)\\ {\mathcal{B}}_{3}^{(1)}=1\phantom{\rule{12.5em}{0ex}}(\text{option}\phantom{\rule{0.2em}{0ex}}1\phantom{\rule{0.2em}{0ex}}\text{and}\phantom{\rule{0.2em}{0ex}}2),\end{array}$$
*c*_{1},*c*_{2},*c*_{3},*c*_{4},*c*_{5},*c*_{6}are the normalized coefficients (details are ignored here; see the description after equation 3.13) - Move type 2:where$$\begin{array}{l}{\mathcal{B}}_{1}^{(2)}=\frac{{c}_{1}{r}_{{i}_{1}}exp(-{r}_{{i}_{1}}u){c}_{2}{r}_{{i}_{2}}exp(-{r}_{{i}_{2}}v){c}_{3}{r}_{{i}_{1}}exp[-{r}_{{i}_{1}}({\tau}_{l\ast}-u-v)]}{{c}_{4}{r}_{{i}_{1}}exp(-{r}_{{i}_{1}}{\tau}_{l\ast})}\\ \phantom{\rule{2.1em}{0ex}}=\frac{{c}_{1}{c}_{2}{c}_{3}{r}_{{i}_{1}}{r}_{{i}_{2}}exp[({r}_{{i}_{1}}-{r}_{{i}_{2}})v]}{{c}_{4}}\\ {\mathcal{B}}_{3}^{(2)}=\frac{({\tau}_{l\ast}-2{a}_{{i}_{1}})\phantom{\rule{0.1em}{0ex}}\left(exp(-{r}_{{i}_{2}}{a}_{{i}_{2}})-exp[-{r}_{{i}_{2}}({\tau}_{l\ast}-u-{a}_{{i}_{1}})]\right)}{{r}_{{i}_{2}}exp(-{r}_{{i}_{2}}v)},\end{array}$$
*c*_{1},*c*_{2},*c*_{3},*c*_{4}are the normalized coefficients - Move type 3:where$$\begin{array}{l}{\mathcal{B}}_{1}^{(3)}=\frac{{c}_{1}{r}_{{i}_{1}}exp[-{r}_{{i}_{1}}({\tau}_{l\ast -1}+{\tau}_{l\ast}+{\tau}_{l\ast +1})]}{{c}_{2}{r}_{{i}_{1}}exp(-{r}_{{i}_{1}}{\tau}_{l\ast -1}){c}_{3}{r}_{{i}_{2}}exp(-{r}_{{i}_{2}}{\tau}_{l\ast}){c}_{4}{r}_{{i}_{1}}exp(-{r}_{{i}_{1}}{\tau}_{l\ast +1})}\\ \phantom{\rule{2.1em}{0ex}}=\frac{{c}_{1}exp[({r}_{{i}_{2}}-{r}_{{i}_{1}}){\tau}_{l\ast}]}{{c}_{2}{c}_{3}{c}_{4}{r}_{{i}_{1}}{r}_{{i}_{2}}}\\ {\mathcal{B}}_{3}^{(3)}=\frac{{r}_{{i}_{2}}exp(-{r}_{{i}_{2}}{\tau}_{l\ast})}{({\tau}_{l\ast -1}+{\tau}_{l\ast}+{\tau}_{l\ast +1})\phantom{\rule{0.1em}{0ex}}\left(exp(-{r}_{{i}_{2}}{a}_{{i}_{2}})-exp[-{r}_{{i}_{2}}({\tau}_{l\ast}+{\tau}_{l\ast +1})]\right)},\end{array}$$
*c*_{1},*c*_{2},*c*_{3},*c*_{4}are the normalized coefficients - Move type 4:where$$\begin{array}{l}{\mathcal{B}}_{1}^{(4)}=\frac{{\pi}_{{i}_{1}}}{{\pi}_{{i}_{2}}}\frac{{c}_{1}{r}_{{i}_{1}}exp(-{r}_{{i}_{1}}u){c}_{2}{r}_{{i}_{2}}exp[-{r}_{{i}_{2}}({\tau}_{0}-u)]}{{c}_{3}{r}_{{i}_{2}}exp(-{r}_{{i}_{2}}{\tau}_{0})}\\ \phantom{\rule{2.1em}{0ex}}=\frac{{\pi}_{{i}_{1}}}{{\pi}_{{i}_{2}}}\frac{{c}_{1}{c}_{2}{r}_{1}exp[({r}_{{i}_{2}}-{r}_{{i}_{1}})u]}{{c}_{3}}\\ {\mathcal{B}}_{3}^{(4)}={\tau}_{0}-{a}_{{i}_{1}}-{a}_{{i}_{2}},\end{array}$$
*c*_{1},*c*_{2},*c*_{3}are the normalized coefficients - Move type 5:where$$\begin{array}{l}{\mathcal{B}}_{1}^{(5)}=\frac{{\pi}_{{i}_{2}}}{{\pi}_{{i}_{1}}}\frac{{c}_{1}{r}_{{i}_{2}}exp[-{r}_{{i}_{2}}({\tau}_{0}+{\tau}_{1})]}{{c}_{2}{r}_{{i}_{1}}exp(-{r}_{{i}_{1}}{\tau}_{0}){c}_{3}{r}_{{i}_{2}}exp(-{r}_{{i}_{2}}{\tau}_{1})}\\ \phantom{\rule{2.1em}{0ex}}=\frac{{\pi}_{{i}_{2}}}{{\pi}_{{i}_{1}}}\frac{{c}_{1}exp[({r}_{{i}_{1}}-{r}_{{i}_{2}}){\tau}_{0}]}{{c}_{2}{c}_{3}{r}_{1}}\\ {\mathcal{B}}_{3}^{(5)}=\frac{1}{{\tau}_{0}+{\tau}_{1}},\end{array}$$
*c*_{1},*c*_{2},*c*_{3}are the normalized coefficients - Move type 6:where$$\begin{array}{l}{\mathcal{B}}_{1}^{(6)}=\frac{{c}_{1}{r}_{{i}_{1}}exp(-{r}_{{i}_{1}}u){c}_{2}{r}_{{i}_{2}}exp[-{r}_{{i}_{2}}({\tau}_{n}-u)]}{{c}_{3}{r}_{{i}_{1}}exp(-{r}_{{i}_{1}}{\tau}_{n})}\\ \phantom{\rule{2.1em}{0ex}}=\frac{{c}_{1}{c}_{2}{r}_{2}exp[({r}_{{i}_{1}}-{r}_{{i}_{2}})({\tau}_{n}-u)]}{{c}_{3}}\\ {\mathcal{B}}_{3}^{(6)}={\tau}_{n}-{a}_{{i}_{1}},\end{array}$$
*c*_{1},*c*_{2},*c*_{3}are the normalized coefficients - Move type 7:where$$\begin{array}{l}{\mathcal{B}}_{1}^{(7)}=\frac{{c}_{1}{r}_{{i}_{1}}exp[-{r}_{{i}_{1}}({\tau}_{n-1}+{\tau}_{n})]}{{c}_{2}{r}_{{i}_{1}}exp(-{r}_{{i}_{1}}{\tau}_{n-1}){c}_{3}{r}_{{i}_{2}}exp(-{r}_{{i}_{2}}{\tau}_{n})}=\frac{{c}_{1}exp[({r}_{{i}_{2}}-{r}_{{i}_{1}}){\tau}_{n}]}{{c}_{2}{c}_{3}{r}_{2}}\\ {\mathcal{B}}_{3}^{(7)}=\frac{1}{{\tau}_{n-1}+{\tau}_{n}},\end{array}$$
*c*_{1},*c*_{2},*c*_{3}are the normalized coefficients.

The standard threshold-based method for determining the UP and DOWN states based on MUA spike trains (Ji & Wilson, 2007) consists of three major steps.

First, we bin the spike trains into 10 ms windows and calculate the raw spike counts for all time intervals. The raw count signal is smoothed by a gaussian window with an SD of 30 ms to obtain the smoothed count signal over time. We then calculate the first minimum (count threshold value) of the smoothed spike count histogram during SWS. As the spike count has been smoothed, the count threshold value may be a noninteger value.

Second, based on the count threshold value, we determine the active and silent periods for all 10 ms bins. The active periods are set to 1, and silent periods are set to 0. Next, the duration lengths of all silent periods are computed. We then calculate the first local minimum (gap threshold) of the histogram of the silent period durations.

Third, based on the gap threshold value, we merge those active periods separated by silent periods in duration less than the gap threshold. The resultant active and silent periods are classified as the UP and DOWN states, respectively. Finally, we recalculate the duration lengths of all UP and DOWN state periods and compute their respective histograms and sample statistics (min, max, median, mean, SD).

In summary, the choices of the spike count threshold and the gap threshold will directly influence the UP and DOWN state classification and their statistics (in terms of duration length and occurrence frequency). However, the optimal choices of these two hand-tuned parameters are rather ad hoc and dependent on several issues (e.g., kernel smoothing, bin size; see Figure 16 for an illustration). In some cases, no minimum can be found in the smoothed histogram, and then the choice of the threshold is problematic. Note that the procedure will need to be repeated for different data sets such that the UP and DOWN states statistics can be compared.

^{1}The neuronal state sometimes can refer to a single cell level, during which neurons exhibit different lengths or durations of depolarizing shift (e.g., Fujisawa, Matsuki, & Ikegaya, 2005).

^{2}The CIF is also known as the *hazard rate function* in survival analysis. The value λ* _{t}*Δ measures the probability of a failure or death of an event in [

^{3}In the worst case, the complexity is (*NT*(*N* + *T*)) in time, in contrast to (*N*^{2}*T*)) for the HMM, where *N* denotes the number of discrete state and *T* denotes the total length of sequences.

^{4}The term *emission probability* arose from the HMM literature in the context of speech recognition; it refers to the probability of observing a (finite) symbol given a hidden state (finite alphabet).

^{5}Here we assume that the individual CIF λ* ^{c}*(

^{6}In addition to the compact representation, another main reason for this reformulation is the efficiency and stability of numerical computation in calculating the observed data likelihood or likelihood ratio.

^{7}The exponential distribution with mean 1/λ is the maximum entropy distribution among all continuous distributions with nonnegative support that have a mean 1/λ.

^{8}*Censoring* is a term used in statistics that refers to the condition that the value of an observation is partially known or the condition that a value occurs outside the range of measuring tool.

^{9}Alternatively, one can use a discrete nonparametric probability model for the sojourn time, which is discussed in section 5.

^{10}Note that *Ŝ*(*t*) is not the same as *χ _{l}* (

^{11}The sleep stage classification was based on the recordings of electromyography (EMG) and hippocampal and cortical EEGs (ripple power, theta and delta power). SWS is characterized as having low EMG, high hippocampal ripple, low hippocampal theta (4–8 Hz), and high cortical delta (2–4 Hz). In practice, we varied the cut-off thresholds of those parameters (via grid search) to obtain a suboptimal SWS classification for a specific rat.

^{12}The model was empirically verified by model selection based on the GLM fit of a small data set using the glmfit function in Matlab. The model with the lowest deviance (defined by twofold negative log likelihood) or the smallest Akaike's information criterion (AIC) value will be chosen.

^{13}A sensible initial parameter value will be an important factor for obtaining a fast convergence of the simulated Markov chain. In practice, one can fit a small spike train data set (with preidentified hidden state values) with a GLM.

^{14}We computed the mean and variance of spike counts given all 10 ms time bins and obtained a mean 2.42 and a variance 4.45. The deviance between the mean and variance statistics suggested that the inhomogeneous Poisson probability model is inaccurate, and this fact motivated us to include history-dependent covariates in the rate parameter.

^{15}A video demo file is provided online (https://neurostat.mit.edu/zhechen/UpDownDemo.avi) for readers interested in a detailed result comparison for a selected 5 minute recording.

^{16}The cortical EEG averages have special waveforms triggered by the start and the end times of the UP state; furthermore, ripple events (150–300 Hz) occur much more frequently during the UP state (Ji & Wilson, 2007).

^{17}In implementation by Matlab version 7.0, that roughly amounts to about 15 hours of CPU time in a personal computer equipped with an Intel Core 2 Du processor.

^{18}This is in contrast to the anesthetized animals, in which the DOWN states occupy a larger fraction of time than the UP states.

^{19}Recently, complementary work has been reported in modeling self-organized recurrent network model of excitatory and inhibitory neurons for spontaneous UP and DOWN state transitions (Kang, Kitano, & Fukai, 2008).

^{20}Given a reasonably accurate initial state, the second sampling option would be more efficient since its rejection rate would be lower. For the current continuous-time estimation problem, the SD parameter *σ* in the second sampling option is chosen to be 2 ms, twice that of the bin size.

^{21}To avoid numerical problems, it is more convenient to calculate the ratios in the logarithm domain.

Zhe Chen, Neuroscience Statistics Research Laboratory, Department of Anesthesia and Critical Care, Massachusetts General Hospital, Harvard Medical School, Boston, MA 02114, U.S.A., and Department of Brain and Cognitive Sciences, Massachusetts Institute of Technology, Cambridge, MA 02139, U.S.A.

Sujith Vijayan, Program in Neuroscience, Harvard University, Cambridge, MA 02139, U.S.A., and Picower Institute for Learning and Memory, Massachusetts Institute of Technology, Cambridge, MA 02139, U.S.A.

Riccardo Barbieri, Neuroscience Statistics Research Laboratory, Department of Anesthesia and Critical Care, Massachusetts General Hospital, Harvard Medical School, Boston, MA 02114, U.S.A.

Matthew A. Wilson, Picower Institute for Learning and Memory, RIKEN-MIT Neuroscience Research Center, Department of Brain and Cognitive Sciences and Department of Biology, Massachusetts Institute of Technology, Cambridge, MA 02139, U.S.A.

Emery N. Brown, Neuroscience Statistics Research Laboratory, Department of Anesthesia and Critical Care, Massachusetts General Hospital, Harvard Medical School, Boston, MA 02114, U.S.A., Harvard-MIT Division of Health Sciences and Technology, Cambridge, MA 02139, U.S.A., and Department of Brain and Cognitive Sciences, Massachusetts Institute of Technology, Cambridge, MA 02139, U.S.A.

- Abeles M, Bergman H, Gat I, Meilijson I, Seidemann E, Tishby N, et al. Cortical activity flips among quasi-stationary states. Proc Natl Acad Sci USA. 1995;92:8616–8620. [PubMed]
- Achtman N, Afshar A, Santhanam G, Yu BM, Ryu SI, Shenoy KV. Free paced high-performance brain-computer interfaces. Journal of Neural Engineering. 2007;4:336–347. [PubMed]
- Albert PS. A two-state Markov mixture model for a time series of epileptic seizure counts. Biometrics. 1991;47:1371–1381. [PubMed]
- Ball FG, Cai Y, Kadane JB, O'Hagan A. Bayesian inference for ion-channel gating mechanisms directly from single-channel recordings, using Markov chain Monte Carlo. Proceedings of the Royal Society of London, A. 1999;455:2879–2932.
- Bannerjee AK, Bhattacharyya GK. Bayesian results for the inverse gaussian distribution with an application. Technometrics. 1979;21(2):247–251.
- Barbieri R, Quirk MC, Frank LM, Wilson MA, Brown EN. Construction and analysis of non-Poisson stimulus response models of neural spike train activity. Journal of Neuroscience Methods. 2001;105:25–37. [PubMed]
- Battaglia FP, Sutherland GR, McNaughton BL. Hippocampal sharp wave bursts coincide with neocortical “up-state” transitions. Learning and Memory. 2004;11:697–704. [PubMed]
- Baum L. An inequality and associated maximization technique in statistical estimation for probabilistic function of Markov processes. Inequalities. 1972;3:1–8.
- Baum LE, Petrie T, Soules G, Weiss N. A maximization technique occurring in the statistical analysis of probabilistic functions of Markov chains. Annals of Mathematical Statistics. 1970;41(1):164–171.
- Bellman R. Dynamic programming. Princeton, NJ: Princeton University Press; 1957.
- Brillinger DR. Maximum likelihood analysis of spike trains of interacting nerve cells. Biological Cybernetics. 1988;59:189–200. [PubMed]
- Brooks S, Giudici P, Roberts G. Efficient construction of reversible jump MCMC proposal distributions (with discussion) Journal of the Royal Society of London, Series B. 2003;65(1):3–56.
- Brown EN. Theory of point processes for neural systems. In: Chow CC, Gutkin B, Hansel D, Meunier C, Dalibard J, editors. Methods and models in neurophysics. Amsterdam: Elsevier; 2005. pp. 691–727.
- Brown EN, Barbieri R, Eden UT, Frank LM. Likelihood methods for neural data analysis. In: Feng J, editor. Computational neuroscience: A comprehensive approach. London: CRC Press; 2003. pp. 253–286.
- Brown EN, Barbieri R, Ventura V, Kass RE, Frank LM. The time-rescaling theorem and its application to neural spike data analysis. Neural Comput. 2002;14(2):325–346. [PubMed]
- Brown EN, Kass RE, Mitra PP. Multiple neural spike train data analysis: State-of-the-art and future challenges. Nature Neuroscience. 2004;7:456–461. [PubMed]
- Buzsáki G. Rhythms of the brain. New York: Oxford University Press; 2006.
- Chan KS, Ledolter J. Monte Carlo EM estimation for time series models involving counts. Journal of the American Statistical Association. 1995;90:242–252.
- Chung SH, Krishnamurthy V, Moore JB. Adaptive processing techniques based on hidden Markov models for characterizing very small channel currents buried in noise and deterministic interferences. Philosophical Transactions of the Royal Society, London B. 1991:357–384. [PubMed]
- Coleman TP, Brown EN. A recursive filter algorithm for state estimation from simultaneously recorded continuous-valued, point process and binary observations. Proc. Asilomar Conference on Signals, Systems, and Computers; Piscataway, NJ: IEEE Press; 2006. pp. 1949–1953.
- Cowles MK, Carlin BP. Markov chain Monte Carlo convergence diagnostics: A comparative review. Journal of the American Statistical Association. 1996;91:883–904.
- Cox DR, Isham V. Point processes. London: Chapman & Hall; 1980.
- Daley D, Vere-Jones D. An introduction to the theory of point processes, Vol. 1: Elementary theory and methods. New York: Springer-Verlag; 2002.
- Danóczy MG, Hahnloser RHR. Efficient estimation of hidden state dynamics from spike trains. In: Weiss Y, Schölkopf B, Platt J, editors. Advances in neural information processing systems, 18. Cambridge, MA: MIT Press; 2006. pp. 227–234.
- Dempster A, Laird N, Rubin D. Maximum likelihood from incomplete data via the EM algorithm. Journal of the Royal Statistical Society, Series B. 1977;39(1):1–38.
- Deng L, Mark JW. Parameter estimation for Markov modulated Poisson processes via the EM algorithm with time discretization. Telecommunication Systems. 1993;1(1):321–338.
- Durbin R, Eddy S, Krough A, Mitchison G. Biological sequence analysis: Probabilistic models of proteins and nucleic acids. Cambridge: Cambridge University Press; 1998.
- Eden UT, Brown EN. Mixed observation filtering for neural data. Proc. IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP'08); Piscataway, NJ: IEEE Press; 2008. pp. 5201–5203.
- Ephraim Y, Merhav N. Hidden Markov processes. IEEE Transactions on Information Theory. 2002;48:1518–1569.
- Escola S, Paninski L. Hidden Markov models for the complex stimulus-response relationship of multi-state neurons. Conference abstract, Frontiers in Computational Neuroscience, Bernstein Symposium.2008.
- Forney GD. The Viterbi algorithm. Proceedings of the IEEE. 1973;61(3):268–278.
- Fredkin DR, Rice JA. Maximum likelihood estimation and identification directly from single channel recordings. Proceedings of the Royal Society of London, B. 1992;249:125–132. [PubMed]
- Fujisawa S, Matsuki N, Ikegaya Y. Single neurons can induce phase transitions of cortical recurrent networks with multiple internal states. Cerebral Cortex. 2005;16(5):639–654. [PubMed]
- Gat I, Tishby N, Abeles M. Hidden Markov modeling of simultaneously recorded cells in the associated cortex of behaving monkeys. Networks: Computation in Neural Systems. 1997;8(3):297–322.
- Gelman A, Rubin DB. Inference from iterative simulation using multiple sequences. Statistical Sciences. 1992;7:457–472.
- Geyer CJ. Practical Markov chain Monte Carlo. Statistical Science. 1992;7:473–511.
- Ghahramani Z, Hinton GE. Variational learning for switching state-space models. Neural Comput. 2000;12(4):831–864. [PubMed]
- Gilks WR, Richardson S, Spiegelhalter DJ, editors. Markov chain Monte Carlo in practice. London: Chapman & Hall/CRC; 1995.
- Green PJ. Reversible jump Markov chain Monte Carlo computation and Bayesian model determination. Biometrika. 1995;82:711–732.
- Guon Y. Estimating hidden semi-Markov chains from discrete sequences. Journal of Computational Graphical Statistics. 2003;12:604–639.
- Haider B, Duque A, Hasenstaub AR, McCormick DA. Neocortical network activity in vivo is generated through a dynamic balance of excitation and inhibition. Journal of Neuroscience. 2006;26:4535–4545. [PubMed]
- Haider B, Duque A, Hasentaub AR, Yu Y, McCormick DA. Enhancement of visual responsiveness by spontaneous local network activity in vivo. Journal of Neurophysiology. 2007;97:4186–4202. [PubMed]
- Haslinger R, Ulbert I, Moore CI, Brown EN, Devor A. Analysis of LFP phase predicts sensory response of barrel cortex. Journal of Neurophysiology. 2006;96:1658–1663. [PubMed]
- Ji D, Wilson MA. Coordinated memory replay in the visual cortex and hippocampus during sleep. Nature Neuroscience. 2007;10(1):100–107. [PubMed]
- Jones LM, Fontanini A, Sadacca BF, Katz DB. Natural stimuli evoke analysis dynamic sequences of states in sensory cortical ensembles. Proc Natl Acad Sci USA. 2007;104:18772–18777. [PubMed]
- Kang S, Kitano K, Fukai T. Structure of spontaneous UP and DOWN transitions self-organizing in a cortical network model. PLoS Computational Biology. 2008;4(3):e1000022. doi: 10.1371/journal.pcbi.1000022. [PMC free article] [PubMed] [Cross Ref]
- Kass RE, Ventura V, Brown EN. Statistical issues in the analysis of neuronal data. Journal of Neurophysiology. 2005;94:8–25. [PubMed]
- Kemere C, Santhanam G, Yu BM, Afshar A, Ryu SI, Meng TH, et al. Detecting neural-state transitions using hidden Markov models for motor cortical prostheses. Journal of Neurophysiology. 2008;100:2441–2452. [PubMed]
- Le ND, Leroux BG, Puterman ML. Exact likelihood evaluation in a Markov mixture model for time series of seizure counts. Biometrics. 1992;48:317–323. [PubMed]
- Luczak A, Barthó P, Marguet SL, Buzsáki G, Harris KD. Sequential structure of neocortical spontaneous activity in vivo. Proc Natl Acad Sci. 2007;104:347–352. [PubMed]
- McCullagh P, Nelder J. Generalized linear models. London: Chapman & Hall; 1989.
- McLachlan GJ, Krishnan T. The EM algorithm and extensions. Hoboken, NJ: Wiley; 1996.
- Mehta MR. Cortico-hippocampal interaction during up-down states and memory consolidation. Nature Neuroscience. 2007;10(1):13–15. [PubMed]
- Paninski L. Maximum likelihood estimation of cascade point-process neural encoding models. Network: Computation in Neural Systems. 2004;15:243–262. [PubMed]
- Paninski L, Pillow J, Lewi J. Statistical models for neural encoding, decoding, and optimal stimulus design. In: Cisek P, Drew T, Kalaska J, editors. Computational neuroscience: Theoretical insights into brain function. Amsterdam: Elsevier; 2007. [PubMed]
- Pawitan Y. In all likelihood: Statistical modelling and inference using likelihood. New York: Oxford University Press; 2001.
- Prerau MJ, Smith AC, Eden UT, Yanike M, Suzuki W, Brown EN. A mixed filter algorithm for cognitive state estimation from simultaneously recorded continuous-valued and binary measures of performance. Biological Cybernetics. 2008;99(1):1–14. [PMC free article] [PubMed]
- Rabiner LR. A tutorial on hidden Markov models and selected applications in speech recognition. Proceedings of the IEEE. 1989;77(2):257–286.
- Radons G, Becker JD, Dülfer B, Krüger J. Analysis, classification, and coding of multielectrode spike trains with hidden Markov models. Biological Cybernetics. 1994;71(4):359–373. [PubMed]
- Robert CP, Rydén T, Titterington DM. Bayesian inference in hidden Markov models through reversible jump Markov chain Monte Carlo method. Journal of the Royal Statistical Society, Series B. 2000;62:57–75.
- Roberts WJJ, Ephraim Y, Dieguez E. On Rydén's EM algorithm for estimating MMPPs. IEEE Signal Processing Letters. 2006;13(6):373–376.
- Rydén T. An EM algorithm for estimation in Markov-modulated Poisson processes. Computational Statistics and Data Analysis. 1996;21:431–447.
- Sanchez-Vives MV, McCormick DA. Cellular and network mechanisms of rhythmic recurrent activity in neocortex. Nature Neuroscience. 2000;3:1027–1034. [PubMed]
- Sirota A, Csicsvari J, Buhl D, Buzsáki G. Communication between neocortex and hippocampus during sleep in rodents. Proc Nat Acad Sci USA. 2003;100:2065–2069. [PubMed]
- Smith AC, Brown EN. Estimating a state-space model from point process observations. Neural Comput. 2003;15:965–991. [PubMed]
- Srinivasan L, Eden UT, Mitter SK, Brown EN. General-purpose filter design for neural prosthetic devices. Journal of Neurophysiology. 2007;98:2456–2475. [PubMed]
- Stjernqvist S, Rydén T, Sköld M, Staaf J. Continuous-index hidden Markov modelling of array CGH copy number data. Bioinformatics. 2007;23(8):1006–1014. [PubMed]
- Takagi K, Kumagai S, Matsunaga I, Kusaka Y. Application of inverse gaussian distribution to occupational exposure data. Annals of Occupational Hygiene. 1997;41(5):505–514.
- Tanner MA. Tools for statistical inference. 3rd. New York: Springer-Verlag; 1996.
- Truccolo W, Eden UT, Fellow M, Donoghue JD, Brown EN. A point process framework for relating neural spiking activity to spiking history, neural ensemble and covariate effects. Journal of Neurophysiology. 2005;93:1074–1089. [PubMed]
- Tu Z, Zhu S. Image segmentation by data-driven Markov chain Monte Carlo. IEEE Transactions on Pattern Analysis and Machine Intelligence. 2002;24(5):657–673.
- Tuckwell HC. Stochastic processes in the neurosciences. Philadelphia: SIAM; 1989.
- Verbeek JJ, Vlassis N, Kröse B. Efficient greedy learning of gaussian mixture models. Neural Comput. 2003;15(2):469–485. [PubMed]
- Vijayan S. Unpublished doctoral dissertation. Harvard University; 2007. Characterization of activity in the primary somatosensory cortex during active behavior and sleep.
- Viterbi J. Error bounds for convolutional codes and an asymptotically optimal decoding algorithm. IEEE Transactions on Information Theory. 1967;13:260–269.
- Volgushev M, Chauvette S, Mukovski M, Timofeev I. Precise long-range synchronization of activity and silence in neocortical neurons during slow-wave sleep. Journal of Neuroscience. 2006;26:5665–5672. [PubMed]
- Wolansky T, Clement EA, Peters SR, Palczak MA, Dickson CT. Hippocampal slow oscillation: A novel EEG state and its coordination with ongoing neocortical activity. Journal of Neuroscience. 2006;26:6213–6229. [PubMed]
- Xi J, Kass RE. Detection of bursts in neuronal spike trains using hidden semi-Markov point process models. 2008 Unpublished manuscript.

PubMed Central Canada is a service of the Canadian Institutes of Health Research (CIHR) working in partnership with the National Research Council's national science library in cooperation with the National Center for Biotechnology Information at the U.S. National Library of Medicine(NCBI/NLM). It includes content provided to the PubMed Central International archive by participating publishers. |