Search tips
Search criteria 


Logo of nihpaAbout Author manuscriptsSubmit a manuscriptHHS Public Access; Author Manuscript; Accepted for publication in peer reviewed journal;
J Stat Plan Inference. Author manuscript; available in PMC 2012 July 1.
Published in final edited form as:
J Stat Plan Inference. 2011 July 1; 141(7): 2209–2227.
doi:  10.1016/j.jspi.2011.01.016
PMCID: PMC3086408

Quasi- and pseudo-maximum likelihood estimators for discretely observed continuous-time Markov branching processes


This article deals with quasi- and pseudo-likelihood estimation in a class of continuous-time multi-type Markov branching processes observed at discrete points in time. “Conventional” and conditional estimation are discussed for both approaches. We compare their properties and identify situations where they lead to asymptotically equivalent estimators. Both approaches possess robustness properties, and coincide with maximum likelihood estimation in some cases. Quasi-likelihood functions involving only linear combinations of the data may be unable to estimate all model parameters. Remedial measures exist, including the resort either to non-linear functions of the data or to conditioning the moments on appropriate sigma-algebras. The method of pseudo-likelihood may also resolve this issue. We investigate the properties of these approaches in three examples: the pure birth process, the linear birth-and-death process, and a two-type process that generalizes the previous two examples. Simulations studies are conducted to evaluate performance in finite samples.

Keywords: Continuous-time branching process, Conditional inference, Non-identifiability, Estimating equation, Optimality, Birth process, Birth-and-death process

1. Introduction

A vast body of literature is concerned with problems of statistical inference for branching processes. Many publications have investigated the Bienaymé-Galton-Watson process (see Guttorp (1991), Yanev (2008) and references therein, for instance). Much fewer have considered continuous-time branching processes, especially when the process is partially observed.

Kendall (1949), Darwin (1956), and Keiding (1974, 1975) investigated maximum likelihood estimation for the pure birth and the linear birth-and-death processes. Oh, Severo and Slivka (1991) proposed approximate maximum likelihood estimators for a class of pure birth processes. Becker and Hasofer (1997) investigated estimation for the linear birth-and-death process under a sampling schema slightly different than the one considered herein. Nedelman, Downs and Pharr (1985) constructed maximum likelihood estimators for a multi-type Bellman-Harris branching process model of the proliferation of mast cells. Maximum likelihood estimation remains, however, generally unattainable for multi-type Bellman-Harris branching processes. Hoel and Crump (1974) considered moment-based estimation using asymptotic approximations to the moments of the process. Boucher et al (1999) used a least squares estimator for a multi-type Bellman-Harris branching process describing the generation of oligodendrocytes in vitro. More recently, Monte Carlo approaches have been proposed for the multi-type Bellman-Harris process and one of its extensions. Zorin et al (2000) constructed a simulated maximum likelihood estimator using a stochastic approximation algorithm. Hyrien (2007) and Hyrien et al (2005a, 2005b, 2010a, 2010c) considered a pseudo-likelihood approach solely based on the first two moments of the process. Quasi-likelihood estimation was considered by Hyrien et al (2010). Chen et al (2010) investigated composite likelihood estimators. Hyrien and Zand (2008) and Hyrien, Chen and Zand (2010) proposed a method combining a mixture model and composite likelihood for CFSE-labeling data.

This paper deals with the methods of quasi- and pseudo-likelihood for discretely observed continuous-time Markov branching processes. Maximum likelihood estimation will generally outperform these alternative approaches, except when they coincide. These two methods are still worthy of consideration because maximum likelihood estimation may sometime be difficult to carry out, either because an expression for the likelihood function cannot be obtained in closed-form (this is particularly true for age-dependent branching processes) or because evaluation of the likelihood function is numerically challenging. Although Markov processes do not always provide realistic models, investigation in this setting remains of interest for several reasons, including: they provide useful tools for preliminary investigation of complex phenomena; explicit results can sometimes be obtained, thereby allowing potential pitfalls to be more easily identified. The methods of quasi- and pseudo-likelihood have been extensively studied in various contexts. These methods may give strongly consistent and asymptotically normal estimators under appropriate, often mild, regularity conditions. We refer to Heyde (1997) for an overview of quasi-likelihood, and to Gourieroux, Monfort and Trognon (1984) for a presentation of pseudo likelihood estimation. Heyde and Lin (1992) proposed quasi-likelihood estimators for a Bienaymé-Galton-Watson process with immigration. Klimko and Nelson (1978) and Jacob (2010) considered conditional least squares for general classes of stochastic processes including discrete time branching processes. Hyrien (2007) investigated a pseudo-maximum likelihood estimator for the multi-type Bellman-Harris branching process (see also Hyrien et al (2005a, 2005b, Hyrien et al (2010)).

The objective of this paper is to outline the main properties of quasi- and pseudo-likelihood estimation. Both methods rest solely on moments of the process, which can be computed with relative ease. They allow “conventional” and conditional inference, and offer reasonable alternatives to maximum likelihood estimation. Although quasi- and pseudo-maximum likelihood estimators are generally different, we show that they may be asymptotically equivalent on the set of non-extinction. We discuss possible connections with maximum likelihood estimation. In the case of quasi-likelihood for the pure birth process, this is accomplished by using the fact that the distribution of the process belongs to the linear exponential family. Unlike quasi-likelihood, the method of pseudo-likelihood as originally proposed by Gourieroux, Monfort and Trognon (1984) has not been shown to enjoy any optimality properties. We construct optimal pseudo-maximum likelihood estimators using the theory of estimating equations (Godambe, 1985). We show that the choice of the conditioning sigma-algebra may be important in making model parameters estimable when using the method of conditional quasi-likelihood. Large sample properties (consistency and asymptotic normality) of the proposed estimators are outlined in the general case, and discussed in detail in a few examples. These properties are investigated when an increasing number of independent realizations of the process are observed at bounded numbers of time points, and when a single realization is observed at an increasing number of discrete time points. The finite sample performance of the estimators are investigated and compared using simulations.

This paper is organized as follows. The process is presented in Section 2, along with expressions for its expected value and covariance function. Section 3 describes the methods of quasi- and pseudo-likelihood. Section 4 considers application to three specific examples: the pure birth process, the linear birth-and-death process, and a two-type process generalizing the previous two examples.

2. A multi-type continuous-time Markov branching process

2.1. The process

The process under consideration describes a population consisting of K cell types, K [set membership] N. Without loss of generality, it begins at time 0 with a single ancestor cell of age zero and type 1. Upon completion of its lifespan, every type-k cell produces ξk,1 type-1 cells, …, ξk,K type-K cells, where ξk = (ξk,1, …, ξk,K) is a random vector with probability distribution function pk(x) = P(ξk = x). Denote the support of ξk by An external file that holds a picture, illustration, etc.
Object name is nihms277454ig1.jpgk. Every type-k cell lives a random lifespan τk with cumulative distribution function Fk(t) = P(τkt). The random variables τk and ξk are usually assumed mutually independent (see Athreya and Ney, 1972), so that their joint distribution can be written as P(τkt, ξk = x) = pk(x)Fk(t). In the present paper, we consider a natural generalization of this process, which allows the distribution of the lifespan τk to depend on the progeny vector ξk, and we write Fk(t)=xSkpk(x)Fk(t,x), where Fk(t, x) = P(τkt[mid ]ξk = x) denotes the conditional cumulative distribution function of τk given the type-k cell generates a progeny vector ξk = x when completing its lifespan. The lifespan τk, given ξk = x, is assumed to follow an exponential distribution with parameter λk,x. Last, every cell of the population evolves independently of every other cell.

This process is a particular case of a multi-type continuous-time branching process considered by Hyrien et al (2005a), which was built on a two-type process earlier proposed by Jagers (1975). It is also akin to Sevast'yanov (1971)'s process in that both processes allow the progeny vector and the cell lifespan to be dependent. The former process was used to analyze the division of the so-called O-2A progenitor cells and their differentiation into terminally differentiated oligodendrocytes by Hyrien et al (2005a), where dissimilar distributions were used to model the distributions of the time to occurrence of these two events. An analysis of experimental data on this cell system suggested the need for such a feature (Hyrien et al, 2005a), and time-lapse cinematography data provided further evidence in support of the proposed model extension (Hyrien et al, 2006). Of note, the above process reduces to the “usual” multi-type Markov process (Athreya and Ney, 1972) if λk,x = λk for every x [set membership] An external file that holds a picture, illustration, etc.
Object name is nihms277454ig1.jpgk.

We assume that pk(x) and λk,x are functions of a finite dimensional parameter θ0 [set membership] Θ [subset or is implied by] Rp, [Theta with macron] compact. Any element of Θ will be denoted by θ. Our objective is to construct estimators for θ on the basis of partial observations of the above process. The data structure will be presented in section 3.1.

2.2. Moments of the process

For every k, l = 1, …, K, let Zk(l)(t) be an N-valued stochastic process denoting the number of type-k cells at any time t ≥ 0 arising from a single type-l cell born at t = 0, and introduce the vector Z(l)(t)={Z1(l)(t),,ZK(l)(t)}. We investigate estimators defined from marginal and conditional mean and variance-covariance of Z(l)(t), and we present below expressions for these moments. Their derivation proceeds from arguments similar to those employed for the multi-type Bellman-Harris branching process (See Sevast'yanov and Chistyakov (1971) for instance). Technical details are therefore omitted.

Let s = (s1, …, sK), x = (x1, …, xK), 0 = (0, …, 0)K, and 1 = (1, …, 1)K. For every k = 1, …, K, let ek be a 1 × K vector having all elements equal to zero, except for a one in the k-th place. Define the probability generating functions (p.g.f.)



This function is parameterized by θ, but we shall use H(l)(s; t) and H(l)(s; t, θ) interchangeably depending upon the context. The same will be true for the expectation and variance-covariance functions to be defined below. Also, we shall omit unnecessary sub- and superscripts when K = 1.

The functions H(l)(s; t) satisfy the set of functional equations


Let mk(l)(t)=E{Zk(l)(t)}, mk(l)(tu)=E{Zk(l)(t)Z(l)(u)} and mk(l){tZ(l)(u)0}=E{Zk(l)(t)Z(l)(u)0} denote marginal and conditional expectations, and write m(l)(t)={m1(l)(t),,mK(l)(t)}, m(l)(tu)={m1(l)(tu),,mK(l)(tu)}, and m(l)(tZ(l)(u)0)={m1(l)(tZ(l)(u)0),,mK(l)(tZ(l)(u)0)}. Similarly, let vjk(l)(t,u)=cov{Zj(l)(t),Zk(l)(u)}, vjk(l)(tu)=cov{Zj(l)(t),Zk(l)(t)Z(l)(u)} and vjk(l)(tZ(l)(u)0)=cov{Zj(l)(t),Zk(l)(t)Z(l)(u)0} denote the marginal and conditional covariances, and introduce the matrix notation V(l)(t,u)=[vjk(l)(t,u)]j,k=1,,K, V(l)(tu)=[vjk(l)(tu)]j,k=1,,K and V(l)(tZ(l)(u)0)=[vjk(l)(tZ(l)(u)0)]j,k=1,,K. Since mk(l)(t)=H(l)(s;t)/sks=1, the expectations satisfy the system of functional equations


where χ{l=k} denotes the indicator function. The systems (2) contains renewal-type equations, and explicit solutions can be obtained by solving the system iteratively. These solutions are linear combinations of certain functions taking the form


for some (k1, (...), km) [set membership] Nm, x(k1), (...), x(km) [set membership] {1, (...), K}[multiply sign in circle]K, where


stands for the convolution of Fk1(·, x(k1)) and Fk2(·, x(k2)). The function C(u; k1, …, km; x(k1), …, x(km)) is the c.d.f. of a General Erlang distribution (Johnson and Kotz, 1970) or of a Phase-type distribution, and it can be computed explicitly. An example shall be treated in Section 4.3.

To compute the variance-covariance function of the process, compute first the matrix of second order moments An external file that holds a picture, illustration, etc.
Object name is nihms277454ig2.jpg{Z(l)(t)Z(l)(u)′}. Let Ujk(l)(u)=E{Zj(l)(u)Zk(l)(u)}, and write Uk(l)(u)={Uk1(l)(u),,UkK(l)(u)}. The following Lemma is established by using the Markov Property:

Lemma 1

For every tu ≥ 0 and l = 1, …, K, we have

  1. m(l)(tu)=k=1KZk(l)(u)m(k)(tu);
  2. m(l)(t[mid ]Z(l)(u) ≠ 0) = m(l)(t[mid ]u)/{1 − H(l)(0; u)},
  3. V(l)(t,u)=k=1Km(k)(tu)Uk(l)(u)m(l)(t)m(l)(u),
  4. V(l)(tu)=k=1KZk(l)(u)V(k)(tu,tu),
  5. V(l)(t[mid ]Z(l)(u) ≠ 0) = V(l)(t[mid ]u)/{1 − H(l)(0; u)},

where H(l)(0; u) = P{Z(l)(u) = 0}.

To derive an expression for Ujk(l)(u), we have


where the partial second order derivatives are the solutions to the system of integral equations


for l = 1, (...), K, and where


These equations also can be solved explicitly (see remark above for renewal equations).

3. Estimation

3.1. Data structure

Let n denote the number of independent realizations of the process observed during the experiment. For every i = 1, …, n, let 0 = ti0 < ti1 < (...) < tini < ∞ be ni + 1 non-negative ordered time points. For every j = 0, …, ni, define the K × 1 vectors Yi j = (Yi j1, …, Yi jK)′, where Yi jK, k = 1, …, K, denotes the number of type-k cells for the i-th realization of the process at time ti j. By convention, the first cell shall be of type 1 such that Yi0=e1. Set Yi=(Yi1,,Yini). We assume that the vectors Yi, i = 1, …, n, are independent realizations of the vector {Z(1)(ti1)′, …, Z(1)(tini)′}′. The two most common experimental designs are (i) when multiple realizations of the process are each observed at a single or a small number of time points, possibly distinct for different realizations; and (ii) when a single realization of the process is observed repeatedly over time.

Another situation that is of practical interest is when the process begins with an arbitrary, possibly large, number of cells. We do not consider this situation here, but the method generalizes to this setting by adapting the expression for the moments of the process. We refer to Yanev (1975), Dion and Yanev (1994), and Yakovlev and Yanev (2009) for related results. The estimators obtained when n independent realizations of the process are observed separately are not always identical to those obtained when the process starts with n ancestors, even when the times of observations are identical for every realization.

Let the Kni × 1 vector μi(θ) = {m(1)(ti1; θ)′, …, m(1)(tini; θ)′}′ and the Kni × Kni matrix


denote the expectation and the variance-covariance matrix of the number of cells counted in any realization of the process at the time points ti1, …, tini. Let F = {Fij, j = 1, …, ni, i = 1, 2 (...)} denote any collection of sigma-algebras, and let μij(θ)=Eθ{Z(1)(tij)i,j1} and vij(θ)=Varθ{Z(1)(tij)i,j1} denote associated conditional expectation and variance-covariance matrix. The most obvious choice for F is the natural filtration denoted by F2 = {Fij : Fij = σ{Yi0, …, Yij}}, but other choices are sometimes warranted; for instance, we shall use the collection F3 = {Fij : Fij = σ{Yij0}} to remediate a non-identifiability issue for the linear birth-and-death process. In what follows, we consider estimating θ on the basis of Y1, …, Yn using the moment-based methods of quasi-and pseudo-likelihood.

3.2. Quasi-likelihood estimation

A quasi-likelihood estimator can be computed by solving the quasi-score equation Q1(θ) = 0, provided such a solution exists, where


denotes the quasi-score function. It is an optimal estimating function within the class of estimating functions

C1={i=1nai(θ){Yiμi(θ)},whereai(θ)is non-random}

(Godambe, 1985; Heyde, 1997). We shall denote the corresponding estimator by [theta w/ hat]1, and refer to it as a “conventional” estimator because Q1(θ) does not involve conditional moments.

As an alternative, one could define conditional quasi-likelihood estimators using the class of estimating functions


Under certain regularity conditions, an optimal estimating function within An external file that holds a picture, illustration, etc.
Object name is nihms277454ig3.jpg2 will be given by


(Godambe, 1985; Heyde, 1997). In what follows, we denote the quasi-likelihood function and estimator corresponding to the natural sigma-algebra F2 by Q2(θ) and [theta w/ hat]2, and the ones corresponding to F3 by Q3(θ) and [theta w/ hat]3. It is not difficult to see that the quasi-score function Q2(θ) reduces to






Moreover, when K = 1, the quasi-score function simplifies to


Since An external file that holds a picture, illustration, etc.
Object name is nihms277454ig2.jpgθ0{Qa(θ0)} = 0, a = 1, 2, 3, [theta w/ hat]1, [theta w/ hat]2 and [theta w/ hat]3 are expected to be at least weakly consistent under certain regularity conditions (Heyde, 1997). Furthermore, when n → ∞, supi=1,2(...) ni < ∞ and supi,j=1,2(...) tij < ∞, these estimators are also n-consistent and asymptotically Gaussian. It is well-kown that the asymptotic variance-covariance matrix of [theta w/ hat]1, for instance, is given by


(Heyde, 1997). The asymptotic properties of [theta w/ hat]1 are more difficult to derive in the general case when supi=1,2(...) ni → ∞, and they depend on properties of the process. Those of the conditional quasi-likelihood estimators may be easier to derive than those of the conventional estimators by exploiting, for instance, the martingale difference property of Q2(θ) and Q3(θ). We refer to Dion and Keiding (1978) and to Feigin, (1976) for related discussions. Other techniques can be resorted to, including straightforward applications of existing asymptotic results for discrete-time branching processes using an embedded Bienaymé-Galton-Watson process, as proposed by Yanev (1975).

The classes An external file that holds a picture, illustration, etc.
Object name is nihms277454ig3.jpg1 and An external file that holds a picture, illustration, etc.
Object name is nihms277454ig3.jpg2 consist of estimating functions that are linear combinations of the Yij. In principle, classes of nonlinear functions of the data could also be used. For instance, when K = 1, θ could be estimated using the class of quadratic estimating functions


Alternative classes could be easily constructed either by using higher order moments of the data or by combining An external file that holds a picture, illustration, etc.
Object name is nihms277454ig3.jpg2 and An external file that holds a picture, illustration, etc.
Object name is nihms277454ig3.jpg3, for instance. The optimal quadratic estimating equation within An external file that holds a picture, illustration, etc.
Object name is nihms277454ig3.jpg3 is the one for which the weights are given by


These weights require knowledge of the third and fourth order conditional moments of the process. This may render estimation less robust and not particularly precise, in addition to being more complex. The approach becomes justified when the classes An external file that holds a picture, illustration, etc.
Object name is nihms277454ig3.jpg1 and An external file that holds a picture, illustration, etc.
Object name is nihms277454ig3.jpg2 do not allow estimation of all parameters (see section 4.2 on the birth-and-death process for an example).

When F = F2, the quasi-likelihood function involves only first- and second-order moments of the process. While robust with respect to mispecification of higher order moments, the method does not always permit estimation of all parameters. Choosing a different F may help with resolving identifiability issues, but possibly at the expense of robustness. For instance, the birth and the death rates of the linear birth-and-death process cannot be estimated using F = F2, but they become estimable using F = F3 (see section 4.2). As noted in Lemma 1, the conditional expectation of the process depends on the probability of extinction, which the model has to specify correctly for the estimating function to remain unbiased.

3.3. Pseudo- and conditional pseudo-maximum likelihood estimation

As an alternative, estimation can proceed from the method of pseudo-likelihood (Gourieroux, Monfort and Trognon, 1984; Concordet and Nũnez, 2002; Hyrien et al, 2005a, 2005b, 2010; Hyrien, 2007). A (conventional) pseudo-maximum likelihood estimator can be defined as the vector [theta w/ tilde]1 that maximizes the Gaussian pseudo-likelihood function


where |Ωi(θ)| denotes the determinant of the matrix Ωi(θ). When n → ∞, supi=1,2(...) ni < ∞ and supi,j=1,2(...) tij < ∞, we have that [theta w/ tilde]1 converges in probability to θ0, and


where I(θ0) = limn→∞ Varθ0 {n−1[partial differential]P1(θ0)/[partial differential]θ} and J(θ0) = limn→∞ An external file that holds a picture, illustration, etc.
Object name is nihms277454ig2.jpgθ0 {[partial differential]2P1(θ0)/[partial differential]θ[partial differential]θ′}/n (Hyrien, 2007). The asymptotic behavior of [theta w/ tilde]1 is more difficult to study in the general case when ni → ∞, and it depends on properties of the process and of the experimental design.

A wider class of pseudo-maximum likelihood estimators can be constructed by using estimating functions that rely on the quadratic exponential family. Gourieroux, Monfort and Trognon (1984) proposed the pseudo-likelihood function P(θ)=i=1nHi(θ), where


for properly chosen functions Ai(θ), Bi(Yi), Ci(θ) and Di(θ) that involve certain characteristics of the data (such as their first and second order moments). For instance, in the case of the Gaussian pseudo-likelihood, we have Ai(θ) = −μi(θ)′Ωi(θ)−1μi(θ) − log |Ωi(θ)|, Bi(Yi) = 0, Ci(θ) = 2Ωi(θ)−1μi(θ), and Di(θ) = −Ωi(θ)−1. Other pseudo-likelihood functions exist (including Poisson, gamma, multinomial), the choice of which can sometimes be guided by proximity to the actual distribution of the data. For instance, in the case of a pure birth process, it is well-known that the process, when normalized by its expectation, converges to an exponential distribution with parameter 1. Since, for an exponential distribution, when ni = 1 and K = 1, we have Ai(θ) = − log μi(θ), Bi(Yi) = 0, Ci(θ) = 1/μi(θ), and Di(θ) = 0, θ can be estimated by maximizing the (exponential) pseudo-likelihood function


or solving the associated estimating equation [partial differential]Pexp(θ)/[partial differential]θ = 0. It is not difficult to show that An external file that holds a picture, illustration, etc.
Object name is nihms277454ig2.jpgθ0 {[partial differential]Pexp(θ0)/[partial differential]θ} = 0, so that the corresponding estimator is expected to be consistent under certain regularity conditions (Heyde, 1997). One necessary conditions is that θ can be identified based solely on the expectation of the process. Therefore, the choice of a pseudo-likelihood function should also be driven by identifiability considerations.

The above class of pseudo-maximum likelihood estimators can be enriched by considering conditional pseudo-maximum likelihood estimators. For instance, a conditional Gaussian pseudo-maximum likelihood estimator maximizes the conditional (Gaussian) pseudo-likelihood function


The corresponding estimating equations are given by [partial differential]PF(θ)/[partial differential]θ = 0, where


It can be shown without much difficulty that [partial differential]PF(θ)/[partial differential]θ is centered about zero, thereby yielding also consistent estimators. It also defines a sequence of martingale differences when F is a filtration. Let P2(θ) and [theta w/ tilde]2 denote the (Gaussian) pseudo-maximum likelihood function and estimator corresponding to F = F2.

It is worth-noting that PF (θ) and QF (θ) are related through


so that the corresponding pseudo-maximum and quasi-likelihood estimators will be asymptotically equivalent whenever the sum in the right-hand side of equation (9) is asymptotically negligible compared to QF (θ). For instance, when K = 1 and F denotes the natural filtration, [partial differential]P2(θ)/[partial differential]θ simplifies to


Let n = #{i : Yij → ∞, as tij → ∞, i = 1, …, n} denote the number of realizations of the process which increase indefinitely. Suppose that n/n does not converge to zero, that m1(1)(tijti,j1;θ)/θr0, r = 1, …, dim(θ), and that [exists]0 < c < C < ∞ such that infi,j{|tijti,j−1|} > c and supi,j{|tijti,j−1|} < C. Since Yi,j11/2{YijYi,j1m1(1)(tijti,j1;θ)}=Op(1), the above assumptions, together with equation (6) and additional, mild regularity conditions, imply that the sum in the right-hand side of equation (10) may be of smaller order than Q2(θ), and that pseudo-and quasi-likelihood estimating functions will be asymptotically equivalent in such circumstances. In particular, the quasi- and pseudo-maximum likelihood estimators will be asymptotically equivalent whenever the method of quasi-likelihood can estimate consistently the entire vector θ (see section 4.1 on the pure birth process for an example). Partition now θ into θ1 and θ2, and suppose that the quasi-score function Q2(θ) can estimate consistently θ1 but not θ2. One can show that the quasi- and pseudo-maximum likelihood estimators of θ1 will be asymptotically equivalent, but the lack of estimability of θ2 by quasi-likelihood does not necessarily imply that it cannot be estimated using the method of pseudo-likelihood. The birth-and-death process is an example where neither Q1(θ) nor Q2(θ) can estimate both the birth and the death rates, but the Gaussian pseudo-likelihood will (see Section 4.2). Thus the method of Gaussian pseudo-likelihood may be preferable to quasi-likelihood when the latter suffers from non-identifiability issues. These properties should be established rigorously on a case by case basis.

The estimating equation associated with the pseudo-likelihood function P(θ) is a member of the class of estimating functions

CPL={i=1nai(θ)Hi(θ)θ,whereai(θ)is non-random},



The estimating function [partial differential]P(θ)/[partial differential]θ does not usually have any optimality properties (unless it coincides with the score function). An optimal estimating equation within An external file that holds a picture, illustration, etc.
Object name is nihms277454ig3.jpgPL is given by i=1nαi(θ)Hi(θ)/θ, where


(Godambe, 1985). It involves moments of order 3 and 4 whenever [partial differential]Di(θ)/[partial differential]θ ≠ 0. When applied to the pure birth process, the optimal exponential pseudo-likelihood approach will lead to maximum likelihood estimation. Optimal conditional pseudo-likelihood estimators can be constructed similarly.

4. Examples and simulations

4.1. The pure birth process

This first illustration considers the pure birth process (for which K = 1) where every cell divides into two new cells upon completion of its lifespan with probability one. Let λ = 1/θ denote the expected lifespan. It is well known that the marginal expectation and covariance function of the process Z(t) are respectively given by m(t; θ) = eθt and v(t, u; θ) = eθt (eθu − 1), for every tu ≥ 0 (Athreya and Ney, 1972). Therefore μi(θ) = (eti1θ, …, etiniθ)′, and


The conditional expectation and covariance function of the process are given by m(t[mid ]u; θ) = Z(u)eθ(tu) and v(t[mid ]u; θ) = Z(u)eθ(tu) (eθ(tu) − 1) respectively. Although the maximum likelihood estimator is easily computed here, it is of interest to investigate the behavior and performance of quasi- and pseudo-maximum likelihood estimators. The associated quasi-score functions are




When the times of observation are equidistant, with Δ = tijti,j−1, it is not difficult to show that the conditional quasi-likelihood estimator solving the equation Q2(θ) = 0 is explicitly given by


which is also the maximum likelihood estimator (Keiding, 1974). The above expression for [theta w/ hat]2 does not hold true when the sampling times are no longer equidistant, but the quasi-likelihood estimator [theta w/ hat]2 will still coincide with the maximum likelihood estimator. To prove this, notice that the log-likelihood function takes the form


where f(·, ·) denotes a function that does not dependent on θ (Kendall, 1949; Keiding, 1974). The maximum likelihood estimator [theta w/ hat]mle is defined as the solution to the score equation [partial differential]L(θ)/[partial differential]θ = 0, where


and it is therefore identical to [theta w/ hat]2 even when the sampling times are not equidistant. Another way of proving this result is by first noticing from equation (14) that the conditional distribution of Z(t) belongs to the linear exponential family, and by concluding using Theorem 2 in Wedderburn (1974) that [theta w/ hat]2=[theta w/ hat]mle. Proposition 1 states another property of [theta w/ hat]1 and [theta w/ hat]2 that holds true whether or not the sampling times are equidistant (See Appendix 1 for a proof.).

Proposition 1

For the pure birth process, we have[theta w/ hat]1 = [theta w/ hat]2 = [theta w/ hat]mle.

Write [theta w/ hat] in place of [theta w/ hat]1 and [theta w/ hat]2. Asymptotic properties for [theta w/ hat] when a single realization of the process is observed at an increasing number of equidistant time points (n = 1, t1j = jΔ, and n1 → ∞) follows directly from work by Dion (1974), Keiding (1974), Yanev (1975) and Heyde (1975). In particular, [theta w/ hat] is strongly consistent and asymptotically Gaussian when normalized by j=1n1Y1j. Specifically, we have that


(e.g., Keiding (1974), Theorem 5.2). When one observes an increasing number of realizations at uniformly bounded numbers of equidistant time points (n → ∞, supi=1,2(...) ni = R < ∞, and ti,j = jΔ), one can show that


where πr, r = 1, …, R, denotes the limiting proportion of realizations of the process observed at exactly ni = r time points (see Appendix 2 for a proof). When R = 1, the asymptotic variance of [theta w/ hat] in (16) reduces to that in (15), but the rates of convergence of the estimator remain different in the two cases.

The conventional pseudo-likelihood function P1(θ) is constructed from the expressions given above for μi(θ) and Ωi(θ). When each realization of the process is observed at a single time point, P1(θ) simplifies to


The conditional Gaussian pseudo-likelihood function using the collection of sigma-algebras F2 is given by


The asymptotic properties of the pseudo-maximum likelihood estimators depend on the sampling schema. This is conveniently illustrated using [theta w/ tilde]2. In the case where n → ∞, supi=1,2(...) ni < ∞ and Δ is fixed, we have that [theta w/ tilde]2 converges to θ0 in probability, and n(θ2θ0) is asymptotically normal, centered about zero, and with variance that, although cumbersome, can be computed explicitly. The proof of the asymptotic normality relies on the independence of the Yi, i = 1, (...), n and a Central Limit Theorem for i.i.d. r.v.s (e.g., see Van der Vaart, 2007). The case where a single realization of the process is observed at an increasing number of equidistant time points (n = 1, t1j = jΔ and n1 → ∞) is treated in the following Proposition.

Proposition 2

Suppose n = 1 and t1j = Δj, j = 1, 2, …, n1. For the pure birth process, the conditional Gaussian pseudo-maximum and quasi-likelihood estimators[theta w/ tilde]2 and[theta w/ hat]2 are asymptotically equivalent when n1 → ∞. Specifically, we have that




and [theta w/ tilde]2 is asymptotically fully efficient.


The pseudo-maximum likelihood estimator [theta w/ tilde]2 is defined as a solution to the equation [partial differential]P2(θ)/[partial differential]θ = 0, where


where g(θ, Δ) = Δ(2eθΔ − 1)/eθΔ(eθΔ − 1)2, and where h(θ, Δ) = Δ + ΔeθΔ/(eθΔ − 1). Let W ~ exp(1) denote the almost sure limit of Y1,n1/en1Δθ0 as n1 → ∞. It is well known that j=1n1Y1j/j=1n1ejΔθ0a.s.W (Heyde, 1970; Dion, 1974). Therefore, we have


Since (Y1jY1,j1eθΔ)/Y1,j1=Op(1) as j → ∞, and j=1n1ejΔθ0=eΔθ0(en1Δθ01)/(eΔθ01)en1Δθ0, we deduce, as n1 → ∞, that


and that


with the convergence holding uniformly in θ. Since θ0 is the unique root of the right-hand side of equation (19), we deduce the convergence in probability of [theta w/ tilde]2 to θ0.

To show that [theta w/ tilde]2 is asymptotically equivalent to [theta w/ tilde]2, we will consider the asymptotic expansion


We will prove that the above expansion is equivalent, up to an op(1), to the asymptotic expansion for [theta w/ hat]2


Notice first that the second order derivative of P2(θ) is given by




It follows from (18) that


We also have




Using similar arguments applied to (17), we can also show that


We deduce from (17), (21) and (22) that


Proposition 2 follows from (15).

We deduce also from Propositions 1 and 2 that [theta w/ tilde]2 is asymptotically equivalent to the maximum likelihood estimator. The conditional pseudo-maximum and quasi-likelihood estimators will no longer be equivalent when supi,j=1,2(...) tij < ∞, even asymptotically when n → ∞. We deduce from equation (7) that the (conventional) exponential pseudo-likelihood function, when independent realizations of the process are observed at a single time point each (ni = 1), takes the form


The associated estimating equation is given by


where hi(θ) = ti1(Yi1eθti1 − 1). Since Eθ{hi(θ)/θ}=ti12 and Varθ{hi(θ)}=ti12eθti1(eθti11), the optimal estimating equation constructed from an exponential pseudo-likelihood function is


which coincides with the quasi-likelihood function Q1(θ) in this particular case.

Using simulations we compared the performance of the above estimators under a longitudinal design where the process was observed daily from day 1 to day 4. The true value of the mean time to division was 20 = 1/θ0. As shown in Table 1, all estimators improved as the sample size increases. Quasi-likelihood estimation outperformed the Gaussian pseudo-likelihood estimation, as expected since quasi-likelihood is equivalent to maximum likelihood here, but moderately only.

Table 1
Means, standard errors (se), and mean squared errors (mse) for three estimators of the mean lifespan for a pure birth process: quasi-likelihood (QLE), Gaussian pseudo-maximum likelihood (PLE), and conditional Gaussian pseudo-maximum likelihood (CPLE) ...

4.2. The linear birth-and-death process

A variety of estimators can be constructed using the methods of quasi- and pseudo-likelihood applied to a linear birth-and-death process. Under this model, every cell, upon completion of its lifespan, either divides into two new cells with probability p = ϕ/(ϕ + ρ) or dies (that is, disappears) with probability 1 − p = ρ/(ϕ + ρ). The length of the lifespan is assumed to follow an exponential distribution with parameter ϕ + ρ. The parameters ϕ and ρ are nonnegative and referred to as the birth and the death rates. Our objective is to estimate θ = (ϕ, ρ)′.

The p.g.f. of the process takes the form


(Athreya and Ney, 1972) where the sub- and superscripts of Section 2.2 have been dropped to ease notation. The maximum likelihood estimator can be constructed using the probability distribution as given by Darwin (1956). Although the likelihood function takes an explicit form, we found that the calculation of the maximum likelihood estimator became numerically challenging as the size of the process got large, thereby warranting alternative approaches to be investigated.

Write ζ = ϕ+ρ and δ = ϕρ. Under this parameterization, the expectation of the process is given by m(t; ζ, δ) = e, while its covariance, for every tu ≥ 0, is


When δ ≠ 0, since [partial differential]m(t; ζ, δ)/[partial differential]ζ = 0 and [partial differential]m(t; ζ, δ)/[partial differential]δ = te, the quasi-score function Q1(ζ, δ) can be expressed as




The solution to Q1(ζ, δ) = 0, therefore, will not be unique, when it exists. Using Q2(ζ, δ) instead of Q1(ζ, δ) does not solve the problem, so that both estimating equations fail to estimate the birth and the death rates. We can however estimate their difference δ. Denoting by [delta with circumflex]1 and [delta with circumflex]2 the corresponding quasi-likelihood estimators, we have:

Proposition 3

For the birth-and-death process, we have that[delta with circumflex]1 = [delta with circumflex]2. Furthermore, when the sampling times are equidistant, we have that[delta with circumflex]1 = [delta with circumflex]2 = [delta with circumflex]mle, where


where Δ = tijti,j−1.


The proof of Proposition 3 follows directly from results of Section 4.1 and by noticing that both quasi-score equations Q1(ζ, δ) and Q2(ζ, δ) can be equivalently expressed in terms of the quasi-score functions given in (11) and (12). The estimator [delta with circumflex]mle in equation (24) is the maximum likelihood estimators for δ (Keiding, 1975).

To remediate the problem of non-estimability of ζ when using Q1(ζ, δ) and Q2(ζ, δ), we propose two alternative solutions. The idea behind the first solution is to construct a quasi-likelihood function where one conditions on sigma-algebras Fij yielding conditional expectations An external file that holds a picture, illustration, etc.
Object name is nihms277454ig2.jpgθ{Z(tij)[mid ]Fij} such that the first order partial derivatives [partial differential] An external file that holds a picture, illustration, etc.
Object name is nihms277454ig2.jpgθ{Z(tij)[mid ]Fij}/[partial differential]ζ and [partial differential] An external file that holds a picture, illustration, etc.
Object name is nihms277454ig2.jpgθ{Z(tij)[mid ]Fij}/[partial differential]δ are not linearly dependent. One such example is provided when Fij is the sigma-algebra generated by the event of non-extinction {Yij ≠ 0}. The probability that the process becomes extinct before time t is given by


Writing Δij = tijti,j−1, we deduce from Lemma 1 that




If δ were known, an optimal estimating equation within An external file that holds a picture, illustration, etc.
Object name is nihms277454ig3.jpg2 for ζ would then take the form


The solution to the quasi-score equation Q3(ζ) = 0 does not appear to have any explicit expression, but can be approximated numerically. Since an optimal estimator for δ is readily available, we propose to replace δ by [delta with circumflex] when solving the above equation.

An alternative remedial measure to estimate both the birth and the death rates is to use an estimating equation that is a quadratic function of the data. Assuming again that δ is known, candidate estimating equations for estimating ζ would have the general form


where αij(ζ, δ) are given weights. Set αij(ζ, δ) = 1. When the sampling times are equidistant, the above equation yields the following estimator for ζ when δ is replaced by [delta with circumflex]:


The optimal weight αij(ζ, δ) is given by




The expression for the conditional variance Varθ[(Z(tij) − An external file that holds a picture, illustration, etc.
Object name is nihms277454ig2.jpgθ{Z(ti,j)[mid ]Z(ti,j−1)})2[mid ]Z(ti,j−1)] is given in Appendix 3. The associated quasi-likelihood estimator does not seem to have any explicit expression, but it is easily computed numerically. When the variance is replaced by 1, the above equation yields the following estimator for ζ when δ is replaced by [delta with circumflex]:


The conventional Gaussian pseudo-maximum likelihood estimator ([phi with tilde]1, [rho with tilde]1) maximizes the pseudo-likelihood function P1(θ). To show that this approach can estimate both ϕ and ρ, assume first that observations are collected at a single time point t. Notice that the pseudo-likelihood can be reparameterized as P(m,σ2)=nlogσ2i=1n(Yim)2/σ2, where m = et(ϕρ) and σ2 = (ϕ + ρ)et(ϕρ){et(ϕρ) −1}/(ϕρ). The function P(m, σ2) admits a unique maximum for P(m, σ2) attained at m¯t=i=1nYi/n and σ¯t2=i=1n(Yim¯t)2/n. A necessary and sufficient condition for ϕ and ρ to have a unique pseudo-maximum likelihood estimator is therefore that the system


admits a single solution. Since mt is always positive in the present context, it is easy to show that the pseudo-maximum likelihood estimators of ϕ and ρ always exist and are given by


This establishes estimability of ϕ and ρ by Gaussian pseudo-likelihood. Notice that, in equation (27), σ¯t2 could be replaced by the unbiased estimator nσ¯t2/(n1). Moreover, since σ¯t2<m¯t(m¯t1) cannotbe ruled out in finite samples, [rho with tilde](t) may be negative, and a proper estimator is [rho with tilde]+(t) = [rho with tilde](t) [logical or] 0.

When the experimental design includes multiple time points, t1, …, tm, both ϕ and ρ can also be estimated by weighted averages of the [phi with tilde](tk) and ρ(tk):ϕπ=k=1mπkϕ(tk) and ρπ=k=1mπkρ(tk), where the πk's are fixed positive weights with k=1mπk=1. In doing so, we avoid maximizing the pseudo-likelihood function using optimization algorithms. A simple choice for these weights is πk = 1/m, k = 1, …, m, yielding


A third estimator for ϕ and ρ is constructed using the maximum likelihood estimator of δ, as given in equation (24), which we plug into P(ϕ, ϕδ), and the birth rate is estimated by maximizing P(ϕ, ϕ[delta with circumflex]1) with respect to ϕ. The conditional Gaussian pseudo-likelihood ([theta w/ tilde]2 and [theta w/ tilde]3) estimators can also be computed as presented previously.

We conducted a simulation study to evaluate the performance of the various estimators constructed above. The design used for these simulations was identical to the one considered for the pure birth process. Table 2 reports a summary of the results. The simulations suggest that the maximum likelihood estimator was the best of all. The second best estimator was the conditional Gaussian pseudo-likelihood estimator, which outperformed all quasi-likelihood estimators. While conditioning on the event of non-extinction did resolve the identifiability issue, the resulting estimator did not perform particularly well. Additional simulations (not shown) were carried out where the realizations of the process was observed over longer time periods. It is worth noting that we found maximum likelihood estimation numerically unstable when the size of the population grew too large, making numerical evaluation of the log-likelihood function challenging. No such problem occurred with the methods of pseudo- and quasi-likelihood, in which case they become particularly attractive.

Table 2
Means, standard errors (se), and mean squared errors (mse) for various estimators of the birth and the death rates of a linear birth-and-death process: quasi-likelihood (QLE), conditional Gaussian pseudo-maximum likelihood (CPLE), conditional least square ...

4.3. A two-type process

We now consider a two-type process (K = 2) that extends the linear birth-and-death process. It begins with a single cell of type 1 and age zero. Upon completion of its lifespan, every type-1 cell either divides into two new cells of type 1 with probability p, or it produces a single type-2 cell, with probability 1 −p. The progeny vector ξ1 of type-1 cells is a 1 × 2 random vector with support An external file that holds a picture, illustration, etc.
Object name is nihms277454ig1.jpg1 = {(2, 0), (0, 1)}. The time to division of any type-1 cell is described by an exponentially distributed random variable with parameter λ1 = λ1,(2, 0), whereas the time to differentiation of any type-1 cell follows an exponential distribution with parameter λ2 = λ1,(0, 1). To simplify the notations, we will denote their respective cumulative distribution functions by G1(t) = F1(t, (2, 0)) and G2(t) = F1(t, (0, 1)). In contrast, any type-2 cell has an infinite lifespan. Write θ = (p, λ1, λ2)′ to denote the vector of unknown model parameters.

Let Z1 (t) and Z2 (t) denote the number of type-1 and the number of type-2 cells, respectively, at time t ≥ 0, and set Z(t) = {Z1(t), Z2(t)′. Let m1(t) and m2(t) denote the expectations of Z1(t) and Z2(t), and v11(t), v22(t) and v12(t) their respective variances and covariance.

It follows from equation (1) that the conditional p.g.f. H(1)(s; t) satisfies


and H(2)(s; t) = s2. Substituting this equation into (28) yields


Differentiating with respect to s at s = 1 yields




By solving the above renewal equations (Athreya and Ney, 1972), we deduce expressions for the expected numbers of type-1 and type-2 cells started from a type-1 cell:




Clearly, G1l(t) is the cumulative distribution function of an Erlang distribution with parameters l and λ1. When λ1λ2, an expression for G1lG2(t) can be computed explicitly


and m1(t) and m2(t) can be computed explicitly.

To derive expressions for the variances and covariance of the process, we first compute the partial second order derivatives of H(1)(s; t) with respect to s at s = 1. It follows from equation (3) that for all i, j = 1, 2,




We deduce that


The expectations m1(tτ) and m2(tτ) in right-hand side of the above identity can be computed explicitly using expressions given before. The integrals, however, have quite tedious expressions, but they can be easily approximated using methods of numerical integration such as the Simpson or the Lobatto methods (Carnahan, Luther, Wilkes, 1969). The variances and covariance can then be deduced from the above expressions together with (2) in Lemma 1.

Define the first order partial derivatives [partial differential]m(t)/[partial differential]θ = {[partial differential]m1(t)/[partial differential]θ, [partial differential]m2(t)/[partial differential]θ}, where


The entries of [partial differential]m(t, θ)/[partial differential]θ are given by




The partial first order derivatives in the above quantities can be computed explicitly: (for l ≥ 1)






if i = j, and it is identical to zero otherwise.

We conducted a simulation study to compare the quasi- and pseudo-maximum likelihood estimators, for the two-type process. The study design was identical to that used in the previous simulation studies, except that every realization of the process was observed at a single time point (ni = 1). Both quasi-likelihood approaches are therefore identical here. The results are shown in Table 3.

Table 3
Means, standard errors (se), and mean squared errors (mse) for two estimators of the parameters for a two-type process: quasi-likelihood (QLE) and Gaussian pseudo-maximum likelihood (PLE) estimators.

Unlike the simulation study on the pure birth process, this one indicates that both estimators perform similarly in the setting under consideration. The standard errors of either estimators for all three parameters decrease as the sample size increases. The estimator of the probability of division is essentially unbiased. The estimators of the mean time to division and of the mean time to differentiation are slightly biased, but these biases remain reasonable and decrease with increasing sample size. Both estimation methods are able to estimate the parameters of the lifespan distributions, even though these distributions are dissimilar.

5. Concluding remarks

We have investigated quasi- and pseudo-maximum likelihood estimation for continuous-time Markov branching processes. These methods offer generally applicable approaches to statistical inference, and they are particularly useful in situations where maximum likelihood estimation is difficult. Both approaches allow conventional and conditional inference, and can lead to optimal estimation by applying the theory of estimating equations (Godambe, 1895). When relying on first and second order moments of the process only, both methods will generally be robust with respect to mispecifications of higher order moments. Nonetheless, they sometimes lead to maximum likelihood estimation. Although the methods of quasi- and pseudo-likelihood are generally distinct, they may result in equivalent estimators under certain conditions (e.g., we have shown that they can be asymptotically equivalent on the set of non-extinction). When using linear combinations of the data only, quasi-likelihood estimators may be more prone to non-identifiability issues than Gaussian pseudo-likelihood estimators. Two mitigating measures were identified, including the use of (i) conditional quasi-likelihood functions where conditioning is done on appropriate sigma-algebras (such as the one generated by the event of non extinction), and (ii) quasi-likelihood functions involving non-linear functions of the data (e.g., quadratic estimating equations). We found that both options resulted in estimators with poor performance. In contrast, the conditional Gaussian pseudo-maximum likelihood estimator offered a better alternative that not only mitigated the non-identifiability issue but also demonstrated better properties in simulation studies. Direct derivation of maximum likelihood estimators for markov branching processes from likelihood function may be difficult. Keiding (1975) derived an expression for the maximum likelihood estimator for the difference of the birth and the death rates of a birth-and-death process indirectly using a modified sampling schema. In contrast, quasi-likelihood made it straightforward. Generally speaking, quasi- and maximum likelihood estimators are identical when the true distribution of the process belongs to the linear exponential family (Wedderburn, 1974). This result applies to the pure birth process, but not necessarily for more complex processes such as the birth-and-death process. Although not optimal, we found that pseudo-likelihood estimators can sometimes perform better than quasi-likelihood estimators.


This research was supported by grants R01 NS39511, R01 CA134839, R01 AI069351, and UL1 RR024160 from the National Institutes of Health (NIH).

Appendix 1

Proof of Proposition 1

Without loss of generality, assume n = 1, and write tj for t1j and Yj for Y1j. Define Y = (Y1, …, Yn1)′ and the diagonal matrix Y* = diag{1, Y1, …, Yn1−1}. Let t = (t1, …, tn1)′ and Δt = (Δt1, …, Δtn1)′, where Δtj = tjtj−1 for j = 1, …,n1. Define the vector M (t; θ) = {m1(t1; θ), …, m1(tn1; θ)}′ and the n1 × n1 diagonal matrix D(θ) = diag {vtj, Δtj; θ)}j=1, …,n1.

The first quasi-score function defined in (5) takes the form


while the second one is given by


Define an n1 × n1 matrix


It is easy to derive that YY*Mt; θ) = Ψ(θ){YM(t; θ)} by direct computation. Therefore, Q2(θ) can be transformed into


In order to prove that Q1(θ) = Q2(θ), it suffices to show that


Computing the i-th entry of the right-hand side of (29) yields


which is the i-th entry of [partial differential]M(t; θ)′/[partial differential]θ. [diamond with plus]

Appendix 2

Proof of Asymptotic normality (16)

For the pure birth process observed at an increasing number of realizations at uniformly bounded numbers of equidistant time points (n → ∞, supini = R < ∞, and tij = jΔ), we have that


where Nr denotes the number of realizations observed at r equidistant time points, such that r=1RNr=n. Since πr = limn→∞ Nr/n, we deduce that


For r = 1, …, R, by invoking the independence of the Yi = (Yi1, …, Yi,ni)′ and a Central Limit Theorem (Chow and Teicher, 2003) we obtain that




Write Cj(θ0) = Y1jY1,j−1m(Δ; θ0). Clearly, Cj(θ0) defines a martingale difference, so that An external file that holds a picture, illustration, etc.
Object name is nihms277454ig2.jpgθ0{Cj(θ0)} = 0 and covθ0 {Cj1 (θ0),Cj2 (θ0)} = 0 for j1j2. Therefore


Since = m(jΔ; θ) = ejΔθ = (eΔθ)j = m(Δ; θ)j, the expression for Vr(θ0) becomes


Therefore, we deduce that


Consider next the first order derivative


Notice that


Using a Law of Large Numbers (Chow and Teicher, 2003), we deduce that


Therefore, we have n(θ^θ0)DN(0,(θ0)), where


Substituting m(Δ; θ) = eΔθ, m′(Δ; θ) = ΔeΔθ, and ν(Δ, Δ; θ) = eΔθ(eΔθ − 1) in the expression for Σ(θ0), we finally obtain that


Appendix 3

Conditional second order moments of the birth-and-death process

Straightforward calculations show that




and where the Mkij), k = 1,2,3,4, are given in Appendix 4.

Appendix 4

Moments of the birth-and-death process

Let M(s; t) = An external file that holds a picture, illustration, etc.
Object name is nihms277454ig2.jpg (esZ(t)) denote the moment generating function of the process Z(t). Put A = A(s, t) = et(ρϕ)(ϕesρ) − ρ(es − 1) and B = B(s, t) = et(ρϕ)(ϕesρ) − ϕ(es − 1), and notice that M(s; t) = A/B when ρϕ (Allen, 2003). We also have the following useful identities: A′ = [partial differential]A(s; t)/[partial differential]s = et(ρϕ)(ϕsρ)es, B′ = [partial differential]B(s; t)/[partial differential]s = et(ρϕ)(ϕsϕ)es, A(k) = [partial differential]kA(s; t)/[partial differential]sk = A′, and B(k) = [partial differential]kB(s; t)/[partial differential]sk = B′. Write M(k)(s, t) = [partial differential]kM(s; t)/[partial differential]sk. A little algebra shows that


from which we deduce the (non-central) moments Mk(t) = An external file that holds a picture, illustration, etc.
Object name is nihms277454ig2.jpg{Zk(t)}, k = 1,2,3,4, by setting s = 0.


Publisher's Disclaimer: This is a PDF file of an unedited manuscript that has been accepted for publication. As a service to our customers we are providing this early version of the manuscript. The manuscript will undergo copyediting, typesetting, and review of the resulting proof before it is published in its final citable form. Please note that during the production process errorsmaybe discovered which could affect the content, and all legal disclaimers that apply to the journal pertain.


  • Allen LJS. An Introduction to Stochastic Processes with Biology Applications. Prentice Hall; 2003.
  • Athreya KB, Ney PE. Branching Processes. Springer-Verlag; Berlin: 1972.
  • Becker NG, Hasofer AM. Estimation in epidemics with incomplete observations. J R Statist Soc B. 1997;59:415–430.
  • Boucher K, Yakovlev AY, Mayer-Pröschel M, Noble M. A stochastic model of temporarily regulated generation of oligodendrocytes in vitro. Math Biosci. 1999;159:47–78. [PubMed]
  • Boucher K, Zorin A, Yakovlev AY, Mayer-Pröschel M, Noble M. An alternative stochastic model of generation of oligodendrocytes in cell culture. J Math Biol. 2001;43:22–36. [PubMed]
  • Carnahan B, Luther HA, Wilkes JO. Applied Numerical Methods. John Wiley and Sons; New York: 1969.
  • Chen R, Hyrien O, Mayer-Proschel M, Noble M. A composite likelihood approach to the analysis of longitudinal clonal data under an age-dependent branching process. Biostatistics. 2010 to appear. [PMC free article] [PubMed]
  • Chow YS, Teicher H. Probability Theory: Independence, Interchangeability, Martingales. Springer; New York: 2003.
  • von Collani E, Tsodikov A, Yakovlev A, Mayer-Pröschel M, Noble M. A random walk model of oligodendrocyte generation in vitro and associated estimation problems. Math Biosci. 1999;159:189. [PubMed]
  • Concordet D, Nuñez O. A simulated pseudo-maximum likelihood estimator for nonlinear mixed models. Computational Statistics and Data Analysis. 2002;39:187–201.
  • Darwin JH. The behavior of an estimator for a simple birth-and-death process. Biometrika. 1956;43:23–31.
  • Dion JP, Keiding N. Statistical inference in branching processes. In: Joffe A, Ney PE, editors. Branching Processes. Dekker; New York: 1978. pp. 105–140.
  • Dion JP, Yanev NM. Statistical inference for branching processes with an increasing random number of ancestors. J Statist Planning Inf. 1994;39:329–359.
  • Godambe VP. The foundations of finite sample estimation in stochastic processes. Biometrika. 1985;72:419–428.
  • Gourieroux C, Monfort A, Trognon A. Pseudo maximum likelihood methods: theory. Econometrica. 1984;52(3):681–700.
  • Guttorp P. Statistical Inference for Branching Processes. John Wiley and Sons; New York: 1991.
  • Haccou P, Jagers P, Vatutin VA. Branching Processes: Variation, Growth and Extinction of Populations. Cambridge University Press; Cambridge: 2005.
  • Heyde CC. Extension of a result of Seneta for the supercritical Galton-Watson process. Ann Math Statist. 1970;41:739–742.
  • Heyde CC. Remarks on efficiency in estimation for branching processes. Biometrika. 1975;62:49–55.
  • Heyde CC. Quasi-Likelihood and Its Applications A General Approach to Optimal Parameter Estimation. Springer-Verlag; New-York.: 1997.
  • Hoel DG, Crump KS. Estimating the generation-time distribution of an age-dependent branching process. Biometrics. 1974;30:125–135. [PubMed]
  • Hyrien O. A pseudo maximum likelihood estimator for discretely observed multi-type Bellman-Harris branching processes. J Statist Planning Inf. 2007;137:1375–1388.
  • Hyrien O, Ambeskovic I, Mayer-Pröschel M, Noble M, Yakovlev A. Stochastic Modeling of oligodendrocyte generation in cell culture: model validation with time-lapse data. Theor Biol Medical Model. 2006;3:21. [PMC free article] [PubMed]
  • Hyrien O, Chen R, Mayer-Pröschel M, Noble M. Saddlepoint approximations to the moments of multi-type age-dependent branching processes, with applications. Biometrics. 2010a;66:567–577. [PMC free article] [PubMed]
  • Hyrien O, Chen R, Zand MS. An age-dependent branching process model for the analysis of CFSE-labeling experiments. Biology Direct. 2010b;5:41. [PMC free article] [PubMed]
  • Hyrien O, Dietrich J, Noble M. Mathematical and experimental approaches to identify and predict the effects of chemotherapy on neuroglial precursors. Cancer Research. 2010c to appear. [PMC free article] [PubMed]
  • Hyrien O, Mayer-Pröschel M, Noble M, Yakovlev A. A stochastic model to analyze clonal data on multi-type cell populations. Biometrics. 2005a;61:199–207. [PubMed]
  • Hyrien O, Mayer-Pröschel M, Noble M, Yakovlev A. Estimating the life-span of oligodendrocytes from clonal data on their development in cell culture. Math Biosci. 2005b;193:255–274. [PubMed]
  • Hyrien O, Zand MS. A mixture model with dependent observations for the analysis of CFSE-labeling experiments. J Am Statist Ass. 2008;103:222–239.
  • Jacob C. Conditional least squares estimation in nonstationary nonlinear stochastic regression models. Ann Statist. 2010;38:566–597.
  • Jagers P. Branching Processes with Biological Applications. John Wiley and Sons; London: 1975.
  • Johnson NI, Kotz S. Continuous Univariate Distributions I. Houghton Miffin Company; Boston: 1970.
  • Keiding N. Estimation in the birth process. Biometrika. 1974;61:71–80.
  • Keiding N. Maximum likelihood Estimation in the birth-and-death process. Ann Statist. 1975;2:363–372.
  • Kendall DG. Stochastic processes and population growth. J R Statist Soc B. 1949;11:230–264. with discussion.
  • Kimmel M, Axelrod DE. Branching Processes in Biology. Springer; New York: 2002.
  • Klimko LA, Nelson PI. On conditional least squares estimation for stochastic processes. Ann Statist. 1978;6:629–642.
  • Nedelman J, Downs H, Pharr P. Inference on an age-dependent, multitype branching-process model of mast cells. J Math Biol. 1985;25:203–226. [PubMed]
  • Oh C, Severo NC, Slivka J. Approximations to the maximum likelihood estimate in some pure birth processes. Biometrika. 1991;78:295–299.
  • Sevast'yanov BA. Branching Processes. MIR; Moscow: 1971. in Russian.
  • Sevast'yanov BA, Chistyakov VP. Multidimensional renewal equations and moments of branching processes. Theor Probab Appl. 1971;16:199–214.
  • Wedderburn RWM. Quasi-likelihood functions, generalized linear models, and the Gauss-Newton method. Biometrika. 1974;61:439–447.
  • Yakovlev AY, Boucher K, Mayer-Pröschel M, Noble M. Quantitative insight into proliferation and differentiation of oligodendrocyte-type 2 astrocyte progenitor cells in vitro. Proc Natl Acad Sci USA. 1998a;95:14164–14167. 4. [PubMed]
  • Yakovlev AY, Mayer-Pröschel M, Noble M. A stochastic model of brain cell differentiation in tissue culture. J Math Biol. 1998b;37:49–60. [PubMed]
  • Yakovlev AY, Mayer-Pröschel M, Noble M. Stochastic formulations of a clock model for temporally regulated generation of oligodendrocytes in vitro. Math Comput Model. 2000;32:125–137.
  • Yakovlev AY, Yanev NM. Transient Processes in Cell Proliferation Kinetics. Springer; Heidelberg: 1989.
  • Yakovlev AY, Yanev NM. Relative frequencies in multitype branching processes. Ann Appl Probab. 2009;19:1–14.
  • Yanev NM. On the statistics of branching processes. Theory Prob Appl. 1975;20:612–622.
  • Yanev NM. Statistical inference for branching processes. In: Ahsanullah M, Yanev GP, editors. Records and Branching Processes. Nova Science Publishers Inc.; New York: 2008.
  • Zorin AA, Yakovlev AY, Mayer-Pröschel M, Noble M. Estimation problems associated with stochastic modeling of proliferation and differentiation of O-2A progenitor cells in vitro. Math Biosci. 2000;167:109–121. [PubMed]