J Stat Plan Inference. Author manuscript; available in PMC 2012 July 1.

Published in final edited form as: J Stat Plan Inference. 2011 July 1; 141(7): 2209–2227. doi: 10.1016/j.jspi.2011.01.016

PMCID: PMC3086408. NIHMSID: NIHMS277454.

This article deals with quasi- and pseudo-likelihood estimation in a class of continuous-time multi-type Markov branching processes observed at discrete points in time. “Conventional” and conditional estimation are discussed for both approaches. We compare their properties and identify situations where they lead to asymptotically equivalent estimators. Both approaches possess robustness properties, and coincide with maximum likelihood estimation in some cases. Quasi-likelihood functions involving only linear combinations of the data may be unable to estimate all model parameters. Remedial measures exist, including resorting either to non-linear functions of the data or to conditioning the moments on appropriate sigma-algebras. The method of pseudo-likelihood may also resolve this issue. We investigate the properties of these approaches in three examples: the pure birth process, the linear birth-and-death process, and a two-type process that generalizes the previous two examples. Simulation studies are conducted to evaluate performance in finite samples.

A vast body of literature is concerned with problems of statistical inference for branching processes. Many publications have investigated the Bienaymé-Galton-Watson process (see Guttorp (1991), Yanev (2008) and references therein, for instance). Far fewer have considered continuous-time branching processes, especially when the process is partially observed.

Kendall (1949), Darwin (1956), and Keiding (1974, 1975) investigated maximum likelihood estimation for the pure birth and the linear birth-and-death processes. Oh, Severo and Slivka (1991) proposed approximate maximum likelihood estimators for a class of pure birth processes. Becker and Hasofer (1997) investigated estimation for the linear birth-and-death process under a sampling scheme slightly different from the one considered herein. Nedelman, Downs and Pharr (1985) constructed maximum likelihood estimators for a multi-type Bellman-Harris branching process model of the proliferation of mast cells. Maximum likelihood estimation remains, however, generally unattainable for multi-type Bellman-Harris branching processes. Hoel and Crump (1974) considered moment-based estimation using asymptotic approximations to the moments of the process. Boucher et al (1999) used a least squares estimator for a multi-type Bellman-Harris branching process describing the generation of oligodendrocytes *in vitro*. More recently, Monte Carlo approaches have been proposed for the multi-type Bellman-Harris process and one of its extensions. Zorin et al (2000) constructed a simulated maximum likelihood estimator using a stochastic approximation algorithm. Hyrien (2007) and Hyrien et al (2005a, 2005b, 2010a, 2010c) considered a pseudo-likelihood approach based solely on the first two moments of the process. Quasi-likelihood estimation was considered by Hyrien et al (2010). Chen et al (2010) investigated composite likelihood estimators. Hyrien and Zand (2008) and Hyrien, Chen and Zand (2010) proposed a method combining a mixture model and composite likelihood for CFSE-labeling data.

This paper deals with the methods of quasi- and pseudo-likelihood for discretely observed continuous-time Markov branching processes. Maximum likelihood estimation will generally outperform these alternative approaches, except when they coincide. These two methods are still worthy of consideration because maximum likelihood estimation may sometimes be difficult to carry out, either because an expression for the likelihood function cannot be obtained in closed form (this is particularly true for age-dependent branching processes) or because evaluation of the likelihood function is numerically challenging. Although Markov processes do not always provide realistic models, investigation in this setting remains of interest for several reasons, including: they provide useful tools for preliminary investigation of complex phenomena; explicit results can sometimes be obtained, thereby allowing potential pitfalls to be more easily identified. The methods of quasi- and pseudo-likelihood have been extensively studied in various contexts. These methods may give strongly consistent and asymptotically normal estimators under appropriate, often mild, regularity conditions. We refer to Heyde (1997) for an overview of quasi-likelihood, and to Gourieroux, Monfort and Trognon (1984) for a presentation of pseudo-likelihood estimation. Heyde and Lin (1992) proposed quasi-likelihood estimators for a Bienaymé-Galton-Watson process with immigration. Klimko and Nelson (1978) and Jacob (2010) considered conditional least squares for general classes of stochastic processes including discrete-time branching processes. Hyrien (2007) investigated a pseudo-maximum likelihood estimator for the multi-type Bellman-Harris branching process (see also Hyrien et al (2005a, 2005b, 2010)).

The objective of this paper is to outline the main properties of quasi- and pseudo-likelihood estimation. Both methods rest solely on moments of the process, which can be computed with relative ease. They allow “conventional” and conditional inference, and offer reasonable alternatives to maximum likelihood estimation. Although quasi- and pseudo-maximum likelihood estimators are generally different, we show that they may be asymptotically equivalent on the set of non-extinction. We discuss possible connections with maximum likelihood estimation. In the case of quasi-likelihood for the pure birth process, this is accomplished by using the fact that the distribution of the process belongs to the linear exponential family. Unlike quasi-likelihood, the method of pseudo-likelihood as originally proposed by Gourieroux, Monfort and Trognon (1984) has not been shown to enjoy any optimality properties. We construct optimal pseudo-maximum likelihood estimators using the theory of estimating equations (Godambe, 1985). We show that the choice of the conditioning sigma-algebra may be important in making model parameters estimable when using the method of conditional quasi-likelihood. Large sample properties (consistency and asymptotic normality) of the proposed estimators are outlined in the general case, and discussed in detail in a few examples. These properties are investigated when an increasing number of independent realizations of the process are observed at bounded numbers of time points, and when a single realization is observed at an increasing number of discrete time points. The finite sample performance of the estimators is investigated and compared using simulations.

This paper is organized as follows. The process is presented in Section 2, along with expressions for its expected value and covariance function. Section 3 describes the methods of quasi- and pseudo-likelihood. Section 4 considers application to three specific examples: the pure birth process, the linear birth-and-death process, and a two-type process generalizing the previous two examples.

The process under consideration describes a population consisting of *K* cell types, *K* ≥ 1. Without loss of generality, it begins at time 0 with a single ancestor cell of age zero and type 1. Upon completion of its lifespan, every type-*k* cell produces *ξ*_{k,1} type-1 cells, …, *ξ*_{k,K} type-*K* cells.

This process is a particular case of a multi-type continuous-time branching process considered by Hyrien et al (2005a), which was built on a two-type process earlier proposed by Jagers (1975). It is also akin to Sevast'yanov (1971)'s process in that both processes allow the progeny vector and the cell lifespan to be dependent. The former process was used to analyze the division of the so-called O-2A progenitor cells and their differentiation into terminally differentiated oligodendrocytes by Hyrien et al (2005a), where dissimilar distributions were used to model the distributions of the time to occurrence of these two events. An analysis of experimental data on this cell system suggested the need for such a feature (Hyrien et al, 2005a), and time-lapse cinematography data provided further evidence in support of the proposed model extension (Hyrien et al, 2006). Of note, the above process reduces to the “usual” multi-type Markov process (Athreya and Ney, 1972) if *λ*_{k,x} = *λ*_{k} for every **x** ∈ 𝒮_{k}.

We assume that *p _{k}*(

For every *k, l* = 1, …, *K*, let
${Z}_{k}^{(l)}(t)$ be an ℕ-valued stochastic process denoting the number of type-*k* cells at any time *t* ≥ 0 arising from a single type-*l* cell born at *t* = 0, and introduce the vector
${Z}^{(l)}(t)=\{{Z}_{1}^{(l)}(t),\dots ,{Z}_{K}^{(l)}(t)\}\prime $. We investigate estimators defined from the marginal and conditional means and variance-covariance functions of *Z*^{(l)}(*t*), and we present below expressions for these moments. Their derivation proceeds from arguments similar to those employed for the multi-type Bellman-Harris branching process (see Sevast'yanov and Chistyakov (1971), for instance). Technical details are therefore omitted.

Let **s** = (*s*_{1}, …, *s*_{K}), and define the probability generating function of *Z*^{(l)}(*t*)

$${H}^{(l)}(\mathbf{\text{s}};t)={H}^{(l)}(\mathbf{\text{s}};t,\theta )={\mathbb{\text{E}}}_{\theta}\phantom{\rule{0.2em}{0ex}}\left\{\prod _{k=1}^{K}{s}_{k}^{{Z}_{k}^{(l)}(t)}\right\}.$$

This function is parameterized by *θ*, but we shall use *H*^{(l)}(**s**; *t*) and *H*^{(l)}(**s**; *t, θ*) interchangeably depending upon the context. The same will be true for the expectation and variance-covariance functions to be defined below. Also, we shall omit unnecessary sub- and superscripts when *K* = 1.

The functions *H*^{(l)}(**s**; *t*) satisfy the set of functional equations

$${H}^{(l)}(\mathbf{\text{s}};t)={s}_{l}\phantom{\rule{0.2em}{0ex}}\left\{1-\sum _{\mathbf{\text{x}}\in {\mathcal{S}}_{l}}{p}_{l}(\mathbf{\text{x}}){F}_{l}(t,\mathbf{\text{x}})\right\}+\sum _{\mathbf{\text{x}}\in {\mathcal{S}}_{l}}{p}_{l}(\mathbf{\text{x}}){\int}_{\mathbf{0}}^{t}{H}^{(1)}{(\mathbf{\text{s}};t-\tau )}^{{x}_{1}}\times \cdots \times {H}^{(K)}{(\mathbf{\text{s}};t-\tau )}^{{x}_{K}}d{F}_{l}(\tau ,\mathbf{\text{x}}).$$

(1)
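For the single-type pure birth (Yule) process, equation (1) has the well-known closed-form solution *H*(*s*; *t*) = *se*^{−θt}/{1 − *s*(1 − *e*^{−θt})}. A minimal Monte Carlo sketch (parameter values here are illustrative, not taken from the paper) can be used to check such a solution against simulated trajectories:

```python
import math
import random

def simulate_yule(theta, t, rng):
    """Return Z(t) for a Yule (pure birth) process started from one cell.

    With k cells alive, the time to the next division is Exp(k * theta)."""
    k, s = 1, 0.0
    while True:
        s += rng.expovariate(k * theta)
        if s > t:
            return k
        k += 1

def pgf_yule(s, t, theta):
    """Closed-form solution of (1) for the Yule process:
    H(s; t) = s e^{-theta t} / {1 - s (1 - e^{-theta t})}."""
    p = math.exp(-theta * t)
    return s * p / (1.0 - s * (1.0 - p))

rng = random.Random(1)
theta, t, s = 0.7, 1.5, 0.4          # illustrative values
n = 20000
mc = sum(s ** simulate_yule(theta, t, rng) for _ in range(n)) / n
print(mc, pgf_yule(s, t, theta))     # the two values should be close
```

The Monte Carlo average of *s*^{Z(t)} agrees with the closed-form generating function up to simulation error.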

Let ${m}_{k}^{(l)}(t)=\mathbb{\text{E}}\{{Z}_{k}^{(l)}(t)\}$, ${m}_{k}^{(l)}(t\mid u)=\mathbb{\text{E}}\{{Z}_{k}^{(l)}(t)\mid {Z}^{(l)}(u)\}$ and ${m}_{k}^{(l)}\{t\mid {Z}^{(l)}(u)\ne \mathbf{0}\}=\mathbb{\text{E}}\{{Z}_{k}^{(l)}(t)\mid {Z}^{(l)}(u)\ne \mathbf{0}\}$ denote marginal and conditional expectations, and write ${m}^{(l)}(t)=\{{m}_{1}^{(l)}(t),\dots ,{m}_{K}^{(l)}(t)\}\phantom{\rule{-0.2em}{0ex}}\prime $, ${m}^{(l)}(t\mid u)=\{{m}_{1}^{(l)}(t\mid u),\dots ,{m}_{K}^{(l)}(t\mid u)\}\phantom{\rule{-0.2em}{0ex}}\prime $, and ${m}^{(l)}(t\mid {Z}^{(l)}(u)\ne \mathbf{0})=\left\{{m}_{1}^{(l)}(t\mid {Z}^{(l)}(u)\ne \mathbf{0}),\dots ,{m}_{K}^{(l)}(t\mid {Z}^{(l)}(u)\ne \mathbf{0})\right\}\phantom{\rule{-0.2em}{0ex}}\prime $. Similarly, let ${v}_{jk}^{(l)}(t,u)=\text{cov}\{{Z}_{j}^{(l)}(t),{Z}_{k}^{(l)}(u)\}$, ${v}_{jk}^{(l)}(t\mid u)=\text{cov}\{{Z}_{j}^{(l)}(t),{Z}_{k}^{(l)}(t)\mid {Z}^{(l)}(u)\}$ and ${v}_{jk}^{(l)}(t\mid {Z}^{(l)}(u)\ne \mathbf{0})=\text{cov}\{{Z}_{j}^{(l)}(t),{Z}_{k}^{(l)}(t)\mid {Z}^{(l)}(u)\ne \mathbf{0}\}$ denote the marginal and conditional covariances, and introduce the matrix notation ${V}^{(l)}(t,u)={\left[{v}_{jk}^{(l)}(t,u)\right]}_{j,k=1,\dots ,K}$, ${V}^{(l)}(t\mid u)={\left[{v}_{jk}^{(l)}(t\mid u)\right]}_{j,k=1,\dots ,K}$ and ${V}^{(l)}(t\mid {Z}^{(l)}(u)\ne \mathbf{0})={\left[{v}_{jk}^{(l)}(t\mid {Z}^{(l)}(u)\ne \mathbf{0})\right]}_{j,k=1,\dots ,K}$. Since ${m}_{k}^{(l)}(t)=\partial {H}^{(l)}(\mathbf{\text{s}};t)/\partial {s}_{k\mid \mathbf{\text{s}}=1}$, the expectations satisfy the system of functional equations

$${m}_{k}^{(l)}(t)=\left\{1-\sum _{\mathbf{\text{x}}\in {\mathcal{S}}_{l}}{p}_{l}(\mathbf{\text{x}}){F}_{l}(t,\mathbf{\text{x}})\right\}{\chi}_{\{l=k\}}+\sum _{\mathbf{\text{x}}\in {\mathcal{S}}_{l}}{p}_{l}(\mathbf{\text{x}})\sum _{\alpha =1}^{K}{x}_{\alpha}{\int}_{\mathbf{0}}^{t}{m}_{k}^{(\alpha )}(t-\tau )d{F}_{l}(\tau ,\mathbf{\text{x}}),\phantom{\rule{1em}{0ex}}k,l=1,\dots ,K,$$

(2)

where *χ*_{{l=k}} denotes the indicator function. The system (2) consists of renewal-type equations, and explicit solutions can be obtained by solving the system iteratively. These solutions are linear combinations of certain functions taking the form

$$C(u;{k}_{1},\dots ,{k}_{m};{\mathbf{\text{x}}}^{({k}_{1})},\dots ,{\mathbf{\text{x}}}^{({k}_{m})})={F}_{{k}_{1}}(\cdot ,{\mathbf{\text{x}}}^{({k}_{1})})\ast \cdots \ast {F}_{{k}_{m}}(u,{\mathbf{\text{x}}}^{({k}_{m})})$$

for some (*k*_{1}, …, *k*_{m}), where

$${F}_{{k}_{1}}(\cdot ,{\mathbf{\text{x}}}^{({k}_{1})})\ast {F}_{{k}_{2}}(u,{\mathbf{\text{x}}}^{({k}_{2})})={\int}_{0}^{u}{F}_{{k}_{1}}(u-v,{\mathbf{\text{x}}}^{({k}_{1})})d{F}_{{k}_{2}}(v,{\mathbf{\text{x}}}^{({k}_{2})})$$

stands for the convolution of *F*_{k_1}(·, **x**^{(k_1)}) and *F*_{k_2}(·, **x**^{(k_2)}). The function *C*(*u*; *k*_{1}, …, *k*_{m};
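The convolution above is straightforward to evaluate numerically. As a sketch (with illustrative exponential lifespan distributions, not the paper's data), the midpoint rule recovers the known closed form of the convolution of two exponential CDFs, namely the Erlang(2) CDF:

```python
import math

def convolve_cdf(F1, f2, u, n=4000):
    """(F1 * F2)(u) = integral_0^u F1(u - v) f2(v) dv, midpoint rule,
    where f2 is the density of F2."""
    h = u / n
    return h * sum(F1(u - (i + 0.5) * h) * f2((i + 0.5) * h) for i in range(n))

theta, u = 1.3, 2.0                              # illustrative values
F_exp = lambda x: 1.0 - math.exp(-theta * x)     # exponential CDF
f_exp = lambda x: theta * math.exp(-theta * x)   # exponential density
num = convolve_cdf(F_exp, f_exp, u)
exact = 1.0 - math.exp(-theta * u) * (1.0 + theta * u)  # Erlang(2, theta) CDF
print(num, exact)
```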

To compute the variance-covariance function of the process, first compute the matrix of second-order moments 𝔼{*Z*^{(l)}(*t*)*Z*^{(l)}(*u*)′}. Let
${U}_{jk}^{(l)}(u)=\mathbb{\text{E}}\{{Z}_{j}^{(l)}(u){Z}_{k}^{(l)}(u)\}$, and write
${U}_{k}^{(l)}(u)=\{{U}_{k1}^{(l)}(u),\dots ,{U}_{kK}^{(l)}(u)\}$. The following lemma is established by using the Markov property:

*Lemma 1*. *For every t* ≥ *u* ≥ 0 *and l* = 1, …, *K, we have*

- ${m}^{(l)}(t\mid u)=\sum _{k=1}^{K}{Z}_{k}^{(l)}(u){m}^{(k)}(t-u)$;
- ${m}^{(l)}(t\mid {Z}^{(l)}(u)\ne \mathbf{0})={m}^{(l)}(t\mid u)/\{1-{H}^{(l)}(\mathbf{0};u)\}$;
- ${V}^{(l)}(t,u)=\sum _{k=1}^{K}{m}^{(k)}(t-u){U}_{k}^{(l)}(u)-{m}^{(l)}(t){m}^{(l)}(u)\prime $;
- ${V}^{(l)}(t\mid u)=\sum _{k=1}^{K}{Z}_{k}^{(l)}(u){V}^{(k)}(t-u,t-u)$;
- ${V}^{(l)}(t\mid {Z}^{(l)}(u)\ne \mathbf{0})={V}^{(l)}(t\mid u)/\{1-{H}^{(l)}(\mathbf{0};u)\}$;

where *H*^{(l)}(**0**; *u*) = ℙ{*Z*^{(l)}(*u*) = **0**}.
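The first conditional-moment identity above, *m*^{(l)}(*t* ∣ *u*) = Σ_{k} *Z*_{k}^{(l)}(*u*) *m*^{(k)}(*t* − *u*), can be checked by simulation. A sketch for the single-type pure birth (Yule) process, for which *m*(*t* − *u*) = *e*^{θ(t−u)} (all numerical values below are illustrative assumptions):

```python
import math
import random

def yule_from(z0, theta, dt, rng):
    """Population after time dt of a Yule process started from z0 cells."""
    k, s = z0, 0.0
    while True:
        s += rng.expovariate(k * theta)  # next division time with k cells alive
        if s > dt:
            return k
        k += 1

rng = random.Random(2)
theta, u, t, z_u = 0.5, 1.0, 2.0, 7   # condition on Z(u) = 7; illustrative values
n = 20000
mc_mean = sum(yule_from(z_u, theta, t - u, rng) for _ in range(n)) / n
predicted = z_u * math.exp(theta * (t - u))  # Z(u) m(t - u), m(s) = e^{theta s}
print(mc_mean, predicted)
```

The simulated conditional mean matches *Z*(*u*)*e*^{θ(t−u)} up to Monte Carlo error.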

To derive an expression for ${U}_{jk}^{(l)}(u)$, note that

$${U}_{jk}^{(l)}(u)=\{\begin{array}{ll}{\frac{{\partial}^{2}{H}^{(l)}(\mathbf{\text{s}};u)}{\partial {s}_{k}\partial {s}_{j}}\mid}_{\mathbf{\text{s}}=\mathbf{1}}+{m}_{k}^{(l)}(u)& \text{if}\phantom{\rule{0.2em}{0ex}}k=j\\ {\frac{{\partial}^{2}{H}^{(l)}(\mathbf{\text{s}};u)}{\partial {s}_{k}\partial {s}_{j}}\mid}_{\mathbf{\text{s}}=\mathbf{1}}& \text{otherwise},\end{array}$$

where the partial second order derivatives are the solutions to the system of integral equations

$${\frac{{\partial}^{2}{H}^{(l)}(\mathbf{\text{s}};u)}{\partial {s}_{k}\partial {s}_{j}}\mid}_{\mathbf{\text{s}}=\mathbf{1}}={L}_{kj}^{(l)}(u)+\sum _{\mathbf{\text{x}}\in {\mathcal{S}}_{l}}{p}_{l}(\mathbf{\text{x}})\sum _{\alpha =1}^{K}{x}_{\alpha}{\int}_{0}^{u}{\frac{{\partial}^{2}{H}^{(\alpha )}(\mathbf{\text{s}};u-\tau )}{\partial {s}_{k}\partial {s}_{j}}\mid}_{\mathbf{\text{s}}=\mathbf{1}}d{F}_{l}(\tau ,x)$$

(3)

for *l* = 1, , *K*, and where

$${L}_{kj}^{(l)}(u)=\sum _{\mathbf{\text{x}}\in {\mathcal{S}}_{l}}{p}_{l}(\mathbf{\text{x}})\sum _{\alpha =1}^{K}\sum _{\beta =1}^{K}{x}_{\alpha}{x}_{\beta}{\int}_{0}^{u}{m}_{k}^{(\alpha )}(u-\tau ){m}_{j}^{(\beta )}(u-\tau )d{F}_{l}(\tau ,\mathbf{\text{x}})-\sum _{\mathbf{\text{x}}\in {\mathcal{S}}_{l}}{p}_{l}(\mathbf{\text{x}})\sum _{\alpha =1}^{K}{x}_{\alpha}{\int}_{0}^{u}{m}_{k}^{(\alpha )}(u-\tau ){m}_{j}^{(\alpha )}(u-\tau )d{F}_{l}(\tau ,\mathbf{\text{x}}).$$

These equations can also be solved explicitly (see the remark above about renewal-type equations).

Let *n* denote the number of independent realizations of the process observed during the experiment. For every *i* = 1, …, *n*, let 0 = *t*_{i0} < *t*_{i1} < ⋯ < *t*_{in_i} denote the time points at which the *i*-th realization of the process is observed.

Another situation that is of practical interest is when the process begins with an arbitrary, possibly large, number of cells. We do not consider this situation here, but the method generalizes to this setting by adapting the expression for the moments of the process. We refer to Yanev (1975), Dion and Yanev (1994), and Yakovlev and Yanev (2009) for related results. The estimators obtained when *n* independent realizations of the process are observed separately are not always identical to those obtained when the process starts with *n* ancestors, even when the times of observations are identical for every realization.

Let the *Kn_i* × 1 vector *μ*_{i}(*θ*) = {*m*^{(1)}(*t*_{i1}; *θ*)′, …, *m*^{(1)}(*t*_{in_i}; *θ*)′}′ and the *Kn_i* × *Kn_i* matrix

$${\Omega}_{i}(\theta )=\left(\begin{array}{ccc}{V}^{(1)}({t}_{i1},{t}_{i1};\theta )& \dots & {V}^{(1)}({t}_{i1},{t}_{i{n}_{i}};\theta )\\ \vdots & \ddots & \vdots \\ {V}^{(1)}({t}_{i{n}_{i}},{t}_{i1};\theta )& \dots & {V}^{(1)}({t}_{i{n}_{i}},{t}_{i{n}_{i}};\theta )\end{array}\right)$$

(4)

denote the expectation and the variance-covariance matrix of the number of cells counted in any realization of the process at the time points *t*_{i1}, …, *t*_{in_i}, and let *Y*_{i} denote the corresponding vector of observed cell counts.

A quasi-likelihood estimator can be computed by solving the quasi-score equation *Q*_{1}(*θ*) = 0, provided such a solution exists, where

$${Q}_{1}(\theta )=\sum _{i=1}^{n}\frac{\partial {\mu}_{i}(\theta )\prime}{\partial \theta}{\Omega}_{i}{(\theta )}^{-1}\{{Y}_{i}-{\mu}_{i}(\theta )\}$$

(5)

denotes the quasi-score function. It is an optimal estimating function within the class of estimating functions

$${\mathcal{C}}_{1}=\left\{\sum _{i=1}^{n}{a}_{i}\phantom{\rule{0.2em}{0ex}}(\theta )\{{Y}_{i}-{\mu}_{i}(\theta )\},\text{where}\phantom{\rule{0.2em}{0ex}}{a}_{i}\phantom{\rule{0.2em}{0ex}}(\theta )\phantom{\rule{0.2em}{0ex}}\text{is non-random}\right\}$$

(Godambe, 1985; Heyde, 1997). We shall denote the corresponding estimator by *θ̂*_{1}, and refer to it as a “conventional” estimator because *Q*_{1}(*θ*) does not involve conditional moments.

As an alternative, one could define conditional quasi-likelihood estimators using the class of estimating functions

$${\mathcal{C}}_{2}=\left\{\sum _{i=1}^{n}\sum _{j=1}^{{n}_{i}}{\alpha}_{ij}(\theta )\{{Y}_{ij}-{\mu}_{ij}^{\mathcal{F}}(\theta )\},\text{where}\phantom{\rule{0.2em}{0ex}}{\alpha}_{ij}(\theta )\phantom{\rule{0.2em}{0ex}}\text{is}\phantom{\rule{0.2em}{0ex}}{\mathcal{F}}_{i,j-1}-\text{measurable}\right\}.$$

Under certain regularity conditions, an optimal estimating function within 𝒞_{2} will be given by

$${Q}^{\mathcal{F}}(\theta )=\sum _{i=1}^{n}\sum _{j=1}^{{n}_{i}}\frac{\partial {\mu}_{ij}^{\mathcal{F}}(\theta )\phantom{\rule{-0.1em}{0ex}}\prime}{\partial \theta}{v}_{ij}^{\mathcal{F}}{(\theta )}^{-1}\{{Y}_{ij}-{\mu}_{ij}^{\mathcal{F}}(\theta )\}$$

(Godambe, 1985; Heyde, 1997). In what follows, we denote the quasi-likelihood function and estimator corresponding to the natural sigma-algebra and the class 𝒞_{2} by *Q*_{2}(*θ*) and *θ̂*_{2}, and the ones corresponding to 𝒞_{3} by *Q*_{3}(*θ*) and *θ̂*_{3}. It is not difficult to see that the quasi-score function *Q*_{2}(*θ*) reduces to

$${Q}_{2}(\theta )=\sum _{i=1}^{n}\sum _{j=1}^{{n}_{i}}{A}_{ij}(\theta ){B}_{ij}{(\theta )}^{-1}{C}_{ij}(\theta ),$$

where

$$\begin{array}{l}{A}_{ij}(\theta )=\sum _{k=1}^{K}{Y}_{i,j-1,k}\frac{\partial {m}^{(k)}({t}_{ij}-{t}_{i,j-1},\theta )\phantom{\rule{-0.2em}{0ex}}\prime}{\partial \theta},\end{array}$$

$$\begin{array}{l}{B}_{ij}(\theta )=\sum _{k=1}^{K}{Y}_{i,j-1,k}{V}^{(k)}({t}_{ij}-{t}_{i,j-1},\theta ),\end{array}$$

and

$${C}_{ij}(\theta )={Y}_{ij}-\sum _{k=1}^{K}{Y}_{i,j-1,k}{m}^{(k)}({t}_{ij}-{t}_{i,j-1},\theta ).$$

Moreover, when *K* = 1, the quasi-score function simplifies to

$${Q}_{2}(\theta )=\sum _{i=1}^{n}\sum _{j=1}^{{n}_{i}}\frac{\partial {m}_{1}^{(1)}({t}_{ij}-{t}_{i,j-1};\theta )}{\partial \theta}\frac{\left\{{Y}_{ij}-{Y}_{i,j-1}{m}_{1}^{(1)}({t}_{ij}-{t}_{i,j-1};\theta )\right\}}{{v}_{11}^{(1)}({t}_{ij}-{t}_{i,j-1};\theta )}.$$

(6)
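Equation (6) can be implemented directly for *K* = 1 and solved for *θ* by bisection. A minimal sketch, using the pure-birth moments of Section 4 as plug-ins and a made-up trajectory (both are illustrative assumptions, not the paper's data):

```python
import math

def q2(theta, data, m, v):
    """Conditional quasi-score (6) for a single-type process (K = 1).

    data: realizations, each a list of (time, count) pairs starting at (0, 1);
    m(dt, theta), v(dt, theta): mean / variance of the count generated after
    time dt by a single ancestor cell."""
    eps = 1e-6
    total = 0.0
    for traj in data:
        for (t0, y0), (t1, y1) in zip(traj, traj[1:]):
            dt = t1 - t0
            # numerical derivative of m with respect to theta
            dm = (m(dt, theta + eps) - m(dt, theta - eps)) / (2 * eps)
            total += dm * (y1 - y0 * m(dt, theta)) / v(dt, theta)
    return total

def bisect_root(f, lo, hi, iters=200):
    """Bisection for a root of f, assuming a sign change on [lo, hi]."""
    flo = f(lo)
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if f(mid) * flo > 0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

# Pure birth moments (see Section 4.1) and a hypothetical trajectory:
m = lambda dt, th: math.exp(th * dt)
v = lambda dt, th: math.exp(th * dt) * (math.exp(th * dt) - 1.0)
data = [[(0.0, 1), (1.0, 2), (2.0, 4), (3.0, 7)]]
theta_hat = bisect_root(lambda th: q2(th, data, m, v), 0.05, 3.0)
print(theta_hat)
```

With equidistant times (Δ = 1), the root equals log(13/7), in agreement with the explicit solution (13) derived later for the pure birth process.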

Since 𝔼_{θ_0}{*Q*_{a}(*θ*_{0})} = **0** for *a* = 1, 2, 3, the corresponding estimators are expected to be consistent under appropriate regularity conditions. When *n* → ∞ and the *n_i* remain bounded, *θ̂*_{1} is asymptotically normal, with asymptotic variance-covariance matrix

$${\sum}_{1}({\theta}_{0})={\left\{\underset{n\to \infty}{lim}\frac{1}{n}\sum _{i=1}^{n}\frac{\partial {\mu}_{i}({\theta}_{0})}{\partial \theta}{\Omega}_{i}{({\theta}_{0})}^{-1}\frac{\partial {\mu}_{i}({\theta}_{0})\phantom{\rule{-0.2em}{0ex}}\prime}{\partial \theta}\right\}}^{-1}$$

(Heyde, 1997). The asymptotic properties of *θ̂*_{1} are more difficult to derive in the general case when sup_{i} *n_i* → ∞, and they depend on properties of the process. Those of the conditional quasi-likelihood estimators may be easier to derive than those of the conventional estimators by exploiting, for instance, the martingale difference property of the summands of *Q*^{ℱ}(*θ*) when ℱ is a filtration.

The classes 𝒞_{1} and 𝒞_{2} consist of estimating functions that are linear combinations of the *Y_{ij}*. In principle, classes of non-linear functions of the data could also be used; for instance, one may consider the class of quadratic estimating functions

$${\mathcal{C}}_{3}=\left\{\sum _{i=1}^{n}\sum _{j=1}^{{n}_{i}}{\alpha}_{ij}(\theta )\phantom{\rule{0.2em}{0ex}}\left[{\{{Y}_{ij}-{\mu}_{ij}^{\mathcal{F}}(\theta )\}}^{2}-{v}_{ij}^{\mathcal{F}}(\theta )\right],\phantom{\rule{0.2em}{0ex}}\text{where}\phantom{\rule{0.2em}{0ex}}{\alpha}_{ij}(\theta )\phantom{\rule{0.2em}{0ex}}\text{is}\phantom{\rule{0.2em}{0ex}}{\mathcal{F}}_{i,j-1}-\text{measurable}\right\}.$$

Alternative classes could easily be constructed, either by using higher-order moments of the data or by combining 𝒞_{2} and 𝒞_{3}, for instance. The optimal quadratic estimating equation within 𝒞_{3} is the one for which the weights are given by

$${\alpha}_{ij}(\theta )=\left[-2\frac{\partial {\mu}_{ij}^{\mathcal{F}}(\theta )\phantom{\rule{-0.2em}{0ex}}\prime}{\partial \theta}\{{Y}_{ij}-{\mu}_{ij}^{\mathcal{F}}(\theta )\}-\frac{\partial {v}_{ij}^{\mathcal{F}}(\theta )\phantom{\rule{-0.2em}{0ex}}\prime}{\partial \theta}\right]\text{Var}{\left[{\{{Y}_{ij}-{\mu}_{ij}^{\mathcal{F}}(\theta )\}}^{2}\mid {\mathcal{F}}_{i,j-1}\right]}^{-1}.$$

These weights require knowledge of the third- and fourth-order conditional moments of the process. This may render estimation less robust and not particularly precise, in addition to being more complex. The approach becomes justified when the classes 𝒞_{1} and 𝒞_{2} do not allow estimation of all parameters (see section 4.2 on the birth-and-death process for an example).

When 𝒞 = 𝒞_{2}, the quasi-likelihood function involves only first- and second-order moments of the process. While robust with respect to misspecification of higher-order moments, the method does not always permit estimation of all parameters. Choosing a different class may help resolve identifiability issues, but possibly at the expense of robustness. For instance, the birth and death rates of the linear birth-and-death process cannot be estimated using 𝒞 = 𝒞_{2}, but they become estimable using 𝒞 = 𝒞_{3} (see section 4.2). As noted in Lemma 1, the conditional expectation of the process depends on the probability of extinction, which the model must specify correctly for the estimating function to remain unbiased.

As an alternative, estimation can proceed from the method of pseudo-likelihood (Gourieroux, Monfort and Trognon, 1984; Concordet and Nunez, 2002; Hyrien et al, 2005a, 2005b, 2010; Hyrien, 2007). A (conventional) pseudo-maximum likelihood estimator can be defined as the vector *θ̃*_{1} that maximizes the Gaussian pseudo-likelihood function

$${P}_{1}(\theta )=-\sum _{i=1}^{n}[\{{Y}_{i}-{\mu}_{i}(\theta )\}\phantom{\rule{-0.2em}{0ex}}\prime {\Omega}_{i}{(\theta )}^{-1}\{{Y}_{i}-{\mu}_{i}(\theta )\}+log|{\Omega}_{i}(\theta )|],$$

where |Ω_{i}(*θ*)| denotes the determinant of Ω_{i}(*θ*). Under appropriate regularity conditions, and when *n* → ∞ with the *n_i* bounded, *θ̃*_{1} is consistent and asymptotically normal:

$$\sqrt{n}({\stackrel{\sim}{\theta}}_{1}-{\theta}_{0})\stackrel{\mathcal{D}}{\to}\mathcal{N}(0,J{({\theta}_{0})}^{-1}I({\theta}_{0})J{({\theta}_{0})}^{-1}),$$

where *I*(*θ*_{0}) = lim_{n→∞} *n*^{−1} Var_{θ_0}{∂*P*_{1}(*θ*_{0})/∂*θ*} and *J*(*θ*_{0}) = lim_{n→∞} *n*^{−1} 𝔼_{θ_0}{∂²*P*_{1}(*θ*_{0})/∂*θ*∂*θ*′} (Hyrien, 2007). The asymptotic behavior of *θ̃*_{1} is more difficult to study in the general case when *n_i* → ∞, and it depends on properties of the process and of the experimental design.

A wider class of pseudo-maximum likelihood estimators can be constructed by using estimating functions that rely on the quadratic exponential family. Gourieroux, Monfort and Trognon (1984) proposed the pseudo-likelihood function $P(\theta )={\sum}_{i=1}^{n}{H}_{i}(\theta )$, where

$${H}_{i}(\theta )={A}_{i}(\theta )+{B}_{i}({Y}_{i})+{C}_{i}(\theta ){Y}_{i}+{Y}_{i}^{\prime}{D}_{i}(\theta ){Y}_{i},$$

for properly chosen functions *A_i*(*θ*), *B_i*(·), *C_i*(*θ*), and *D_i*(*θ*). For instance, one may estimate *θ* by maximizing

$${P}_{exp}(\theta )=-\sum _{i=1}^{n}\left\{\frac{{Y}_{i}}{{\mu}_{i}(\theta )}+log{\mu}_{i}(\theta )\right\},$$

(7)

or solving the associated estimating equation ∂*P*_{exp}(*θ*)/∂*θ* = 0. It is not difficult to show that 𝔼_{θ_0}{∂*P*_{exp}(*θ*_{0})/∂*θ*} = 0, so that the corresponding estimator is expected to be consistent under certain regularity conditions (Heyde, 1997). One necessary condition is that *θ* can be identified based solely on the expectation of the process. Therefore, the choice of a pseudo-likelihood function should also be driven by identifiability considerations.
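For the pure birth process observed once per realization, *μ*_{i}(*θ*) = *e*^{θt_{i1}} (see Section 4), and ∂*P*_{exp}(*θ*)/∂*θ* = 0 reduces to Σ_{i} *t*_{i1}(*Y*_{i1}*e*^{−θt_{i1}} − 1) = 0. A sketch solving this equation by bisection on hypothetical (*t*_{i}, *Y*_{i}) pairs (the data values are illustrative assumptions):

```python
import math

def pexp_score(theta, data):
    """d P_exp / d theta when mu_i(theta) = exp(theta * t_i):
    sum_i t_i (Y_i e^{-theta t_i} - 1)."""
    return sum(t * (y * math.exp(-theta * t) - 1.0) for t, y in data)

def bisect_root(f, lo, hi, iters=200):
    """Bisection for a root of f, assuming a sign change on [lo, hi]."""
    flo = f(lo)
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if f(mid) * flo > 0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

data = [(1.0, 2), (1.0, 3), (2.0, 5), (2.0, 9)]   # hypothetical (t_i, Y_i) pairs
theta_hat = bisect_root(lambda th: pexp_score(th, data), 0.01, 3.0)
print(theta_hat)
```

Here the score is a quadratic in *e*^{−θ}, so the root can be cross-checked in closed form.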

The above class of pseudo-maximum likelihood estimators can be enriched by considering conditional pseudo-maximum likelihood estimators. For instance, a conditional Gaussian pseudo-maximum likelihood estimator maximizes the conditional (Gaussian) pseudo-likelihood function

$${P}^{\mathcal{F}}(\theta )=-\sum _{i=1}^{n}\sum _{j=1}^{{n}_{i}}\phantom{\rule{0.2em}{0ex}}\left[\{{Y}_{ij}-{\mu}_{ij}^{\mathcal{F}}(\theta )\}\phantom{\rule{-0.2em}{0ex}}\prime {v}_{ij}^{\mathcal{F}}{(\theta )}^{-1}\{{Y}_{ij}-{\mu}_{ij}^{\mathcal{F}}(\theta )\}+log|{v}_{ij}^{\mathcal{F}}(\theta )|\right].$$

The corresponding estimating equations are given by ∂*P*^{ℱ}(*θ*)/∂*θ* = 0, where

$$\frac{\partial {P}^{\mathcal{F}}(\theta )}{\partial \theta}=-\sum _{i=1}^{n}\sum _{j=1}^{{n}_{i}}\phantom{\rule{0.2em}{0ex}}\left[-2\frac{\partial {\mu}_{ij}^{\mathcal{F}}(\theta )\phantom{\rule{-0.2em}{0ex}}\prime}{\partial \theta}{v}_{ij}^{\mathcal{F}}{(\theta )}^{-1}\{{Y}_{ij}-{\mu}_{ij}^{\mathcal{F}}(\theta )\}-\{{Y}_{ij}-{\mu}_{ij}^{\mathcal{F}}(\theta )\}\phantom{\rule{-0.2em}{0ex}}\prime {v}_{ij}^{\mathcal{F}}{(\theta )}^{-1}\frac{\partial {v}_{ij}^{\mathcal{F}}(\theta )}{\partial \theta}{v}_{ij}^{\mathcal{F}}{(\theta )}^{-1}\{{Y}_{ij}-{\mu}_{ij}^{\mathcal{F}}(\theta )\}+\text{tr}\phantom{\rule{0.2em}{0ex}}\left\{{v}_{ij}^{\mathcal{F}}{(\theta )}^{-1}\frac{\partial {v}_{ij}^{\mathcal{F}}(\theta )}{\partial \theta}\right\}\right].$$

(8)

It can be shown without much difficulty that ∂*P*^{ℱ}(*θ*)/∂*θ* is centered about zero, thereby also yielding consistent estimators. It also defines a sequence of martingale differences when ℱ is a filtration. Let *P*_{2}(*θ*) and *θ̃*_{2} denote the (Gaussian) pseudo-likelihood function and estimator corresponding to the natural sigma-algebra.

It is worth noting that *P*^{ℱ}(*θ*) and *Q*^{ℱ}(*θ*) are related through

$$\frac{\partial {P}^{\mathcal{F}}(\theta )}{\partial \theta}=2{Q}^{\mathcal{F}}(\theta )+\sum _{i=1}^{n}\sum _{j=1}^{{n}_{i}}\phantom{\rule{0.2em}{0ex}}\left[\{{Y}_{ij}-{\mu}_{ij}^{\mathcal{F}}(\theta )\}\phantom{\rule{-0.2em}{0ex}}\prime {v}_{ij}^{\mathcal{F}}{(\theta )}^{-1}\frac{\partial {v}_{ij}^{\mathcal{F}}(\theta )}{\partial \theta}{v}_{ij}^{\mathcal{F}}{(\theta )}^{-1}\{{Y}_{ij}-{\mu}_{ij}^{\mathcal{F}}(\theta )\}-\text{tr}\phantom{\rule{0.2em}{0ex}}\left\{{v}_{ij}^{\mathcal{F}}{(\theta )}^{-1}\frac{\partial {v}_{ij}^{\mathcal{F}}(\theta )}{\partial \theta}\right\}\right],$$

(9)

so that the corresponding pseudo-maximum and quasi-likelihood estimators will be asymptotically equivalent whenever the sum on the right-hand side of equation (9) is asymptotically negligible compared to *Q*^{ℱ}(*θ*). For instance, when *K* = 1 and ℱ denotes the natural filtration, ∂*P*_{2}(*θ*)/∂*θ* simplifies to

$$\frac{\partial {P}_{2}(\theta )}{\partial \theta}=2{Q}_{2}(\theta )+\sum _{i=1}^{n}\sum _{j=1}^{{n}_{i}}\phantom{\rule{0.2em}{0ex}}\left[\frac{\partial {v}_{11}^{(1)}({t}_{ij}-{t}_{i,j-1};\theta )/\partial \theta}{{Y}_{i,j-1}{v}_{11}^{(1)}{({t}_{ij}-{t}_{i,j-1};\theta )}^{2}}{\{{Y}_{ij}-{Y}_{i,j-1}{m}_{1}^{(1)}({t}_{ij}-{t}_{i,j-1};\theta )\}}^{2}-\frac{\partial {v}_{11}^{(1)}({t}_{ij}-{t}_{i,j-1};\theta )/\partial \theta}{{v}_{11}^{(1)}({t}_{ij}-{t}_{i,j-1};\theta )}\right].$$

(10)
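Identity (10) can be verified numerically by comparing a finite-difference derivative of *P*_{2}(*θ*) with the right-hand side. A sketch, assuming the pure-birth moments of Section 4 and a hypothetical single trajectory (both illustrative, not the paper's data):

```python
import math

# Pure birth (Yule) moments for a single ancestor (see Section 4.1):
m = lambda dt, th: math.exp(th * dt)
v = lambda dt, th: math.exp(th * dt) * (math.exp(th * dt) - 1.0)

traj = [(0.0, 1), (1.0, 2), (2.5, 5), (3.0, 6)]  # hypothetical (t, Y) pairs

def p2(theta):
    """Conditional Gaussian pseudo-likelihood P_2 (K = 1)."""
    out = 0.0
    for (t0, y0), (t1, y1) in zip(traj, traj[1:]):
        dt = t1 - t0
        mu, vc = y0 * m(dt, theta), y0 * v(dt, theta)
        out -= (y1 - mu) ** 2 / vc + math.log(vc)
    return out

def q2(theta):
    """Conditional quasi-score Q_2, as in (6)."""
    eps, out = 1e-6, 0.0
    for (t0, y0), (t1, y1) in zip(traj, traj[1:]):
        dt = t1 - t0
        dm = (m(dt, theta + eps) - m(dt, theta - eps)) / (2 * eps)
        out += dm * (y1 - y0 * m(dt, theta)) / v(dt, theta)
    return out

def correction(theta):
    """Second term on the right-hand side of (10)."""
    eps, out = 1e-6, 0.0
    for (t0, y0), (t1, y1) in zip(traj, traj[1:]):
        dt = t1 - t0
        dv = (v(dt, theta + eps) - v(dt, theta - eps)) / (2 * eps)
        r = y1 - y0 * m(dt, theta)
        out += dv * r ** 2 / (y0 * v(dt, theta) ** 2) - dv / v(dt, theta)
    return out

theta, h = 0.8, 1e-5
dp2 = (p2(theta + h) - p2(theta - h)) / (2 * h)  # numerical dP_2/dtheta
print(dp2, 2 * q2(theta) + correction(theta))    # should agree
```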

Let *n*_{∞} = #{*i* : *Y_{ij}* → ∞ as *j* → ∞} denote the number of realizations on which the process does not become extinct.

The estimating equation associated with the pseudo-likelihood function *P*(*θ*) is a member of the class of estimating functions

$${\mathcal{C}}_{PL}=\left\{\sum _{i=1}^{n}{a}_{i}(\theta )\frac{\partial {H}_{i}(\theta )}{\partial \theta},\text{where}\phantom{\rule{0.2em}{0ex}}{a}_{i}(\theta )\phantom{\rule{0.2em}{0ex}}\text{is non-random}\right\},$$

where

$$\frac{\partial {H}_{i}(\theta )}{\partial \theta}=\frac{\partial {A}_{i}(\theta )}{\partial \theta}+\frac{\partial {C}_{i}(\theta )}{\partial \theta}{Y}_{i}+{Y}_{i}^{\prime}\frac{\partial {D}_{i}(\theta )}{\partial \theta}{Y}_{i}.$$

The estimating function ∂*P*(*θ*)/∂*θ* does not usually have any optimality properties (unless it coincides with the score function). An optimal estimating equation within 𝒞_{PL} is given by ${\sum}_{i=1}^{n}{\alpha}_{i}^{\ast}(\theta )\partial {H}_{i}(\theta )/\partial \theta $, where

$${\alpha}_{i}^{\ast}(\theta )={\mathbb{\text{E}}}_{\theta}\phantom{\rule{0.2em}{0ex}}\left\{\frac{{\partial}^{2}{H}_{i}(\theta )}{\partial \theta \partial \theta \prime}\right\}\phantom{\rule{0.2em}{0ex}}{\text{Var}}_{\theta}\phantom{\rule{0.2em}{0ex}}{\left\{\frac{\partial {H}_{i}(\theta )}{\partial \theta}\right\}}^{-1}$$

(Godambe, 1985). It involves moments of order 3 and 4 whenever *D_i*(*θ*) ≠ 0.

This first illustration considers the pure birth process (for which *K* = 1), where every cell divides into two new cells upon completion of its lifespan with probability one. Let *λ* = 1/*θ* denote the expected lifespan. It is well known that the marginal expectation and covariance function of the process *Z*(*t*) are given by *m*(*t*; *θ*) = *e^{θt}* and *v*(*t*, *u*; *θ*) = *e^{θt}*(*e^{θu}* − 1) for every *t* ≥ *u* ≥ 0, so that

$${\Omega}_{i}(\theta )=\left(\begin{array}{ccc}{e}^{{t}_{i1}\theta}({e}^{{t}_{i1}\theta}-1)& \cdots & {e}^{{t}_{i{n}_{i}}\theta}({e}^{{t}_{i1}\theta}-1)\\ \vdots & \ddots & \vdots \\ {e}^{{t}_{i{n}_{i}}\theta}({e}^{{t}_{i1}\theta}-1)& \cdots & {e}^{{t}_{i{n}_{i}}\theta}({e}^{{t}_{i{n}_{i}}\theta}-1)\end{array}\right).$$

The conditional expectation and covariance function of the process are given by *m*(*t* ∣ *u*; *θ*) = *Z*(*u*)*e*^{θ(t−u)} and *v*(*t* ∣ *u*; *θ*) = *Z*(*u*)*e*^{θ(t−u)}{*e*^{θ(t−u)} − 1} for every *t* ≥ *u* ≥ 0. The quasi-score functions *Q*_{1}(*θ*) and *Q*_{2}(*θ*) are given by

$${Q}_{1}(\theta )=\sum _{i=1}^{n}({t}_{i1}{e}^{\theta {t}_{i1}},\dots ,{t}_{i{n}_{i}}{e}^{\theta {t}_{i{n}_{i}}}){\left(\begin{array}{ccc}{e}^{\theta {t}_{i1}}({e}^{\theta {t}_{i1}}-1)& \cdots & {e}^{\theta {t}_{i{n}_{i}}}({e}^{\theta {t}_{i1}}-1)\\ \vdots & \ddots & \vdots \\ {e}^{\theta {t}_{i{n}_{i}}}({e}^{\theta {t}_{i1}}-1)& \cdots & {e}^{\theta {t}_{i{n}_{i}}}({e}^{\theta {t}_{i{n}_{i}}}-1)\end{array}\right)}^{-1}\left(\begin{array}{c}{Y}_{i1}-{e}^{\theta {t}_{i1}}\\ \vdots \\ {Y}_{i{n}_{i}}-{e}^{\theta {t}_{i{n}_{i}}}\end{array}\right),$$

(11)

and

$${Q}_{2}(\theta )=\sum _{i=1}^{n}\sum _{j=1}^{{n}_{i}}\frac{{t}_{ij}-{t}_{i,j-1}}{{e}^{\theta ({t}_{ij}-{t}_{i,j-1})}-1}\left\{{Y}_{ij}-{Y}_{i,j-1}{e}^{\theta ({t}_{ij}-{t}_{i,j-1})}\right\}.$$

(12)

When the times of observation are equidistant, with Δ = *t_{ij}* − *t*_{i,j−1} for every *i* and *j*, the equation *Q*_{2}(*θ*) = 0 admits the explicit solution

$${\widehat{\theta}}_{2}=log\left(\frac{{\sum}_{i=1}^{n}{\sum}_{j=1}^{{n}_{i}}{Y}_{ij}}{{\sum}_{i=1}^{n}{\sum}_{j=1}^{{n}_{i}}{Y}_{i,j-1}}\right)/\Delta ,$$

(13)

which is also the maximum likelihood estimator (Keiding, 1974). The above expression for *θ̂*_{2} does not hold true when the sampling times are no longer equidistant, but the quasi-likelihood estimator *θ̂*_{2} will still coincide with the maximum likelihood estimator. To prove this, notice that the log-likelihood function takes the form

$$L(\theta )=\sum _{i=1}^{n}\sum _{j=1}^{{n}_{i}}({Y}_{ij}-{Y}_{i,j-1})log\left(1-{e}^{-\theta ({t}_{ij}-{t}_{i,j-1})}\right)-{Y}_{i,j-1}\theta ({t}_{ij}-{t}_{i,j-1})+f({Y}_{ij},{Y}_{i,j-1}),$$

(14)

where *f*(·, ·) denotes a function that does not depend on *θ* (Kendall, 1949; Keiding, 1974). The maximum likelihood estimator *θ̂ _{mle}* is defined as the solution to the score equation

$$\frac{\partial L(\theta )}{\partial \theta}=\sum _{i=1}^{n}\sum _{j=1}^{{n}_{i}}({t}_{ij}-{t}_{i,j-1})\left\{\frac{{Y}_{ij}-{Y}_{i,j-1}}{{e}^{\theta ({t}_{ij}-{t}_{i,j-1})}-1}-{Y}_{i,j-1}\right\}={Q}_{2}(\theta ),$$

and it is therefore identical to *θ̂*_{2} even when the sampling times are not equidistant. Another way of proving this result is by first noticing from equation (14) that the conditional distribution of *Z*(*t*) belongs to the linear exponential family, and then concluding from Theorem 2 in Wedderburn (1974) that *θ̂*_{2} = *θ̂ _{mle}*. Proposition 1 states another property of the quasi-likelihood estimators.

Proposition 1. *For the pure birth process, we have θ̂*_{1} = *θ̂*_{2} = *θ̂ _{mle}*.

Write *θ̂* in place of *θ̂*_{1} and *θ̂*_{2}. Asymptotic properties for *θ̂* when a single realization of the process is observed at an increasing number of equidistant time points (*n* = 1, *t*_{1*j*} = *j*Δ, *n*_{1} → ∞) are well known:

$$\sqrt{\sum _{j=1}^{{n}_{1}}{Y}_{1,j-1}}(\widehat{\theta}-{\theta}_{0})\stackrel{\mathcal{D}}{\to}\mathcal{N}\phantom{\rule{0.3em}{0ex}}\left(0,\frac{{e}^{\Delta {\theta}_{0}}-1}{{\Delta}^{2}{e}^{\Delta {\theta}_{0}}}\right)$$

(15)

(e.g., Keiding (1974), Theorem 5.2). When one observes an increasing number of realizations at uniformly bounded numbers of equidistant time points (*n* → ∞, sup_{i} *n _{i}* = *R* < ∞), we have

$$\sqrt{n}(\widehat{\theta}-{\theta}_{0})\stackrel{\mathcal{D}}{\to}\mathcal{N}\phantom{\rule{0.3em}{0ex}}\left(0,\frac{{({e}^{\Delta {\theta}_{0}}-1)}^{2}}{{\Delta}^{2}{e}^{\Delta {\theta}_{0}}{\sum}_{r=1}^{R}{\pi}_{r}({e}^{r\Delta {\theta}_{0}}-1)}\right),$$

(16)

where *π _{r}*, *r* = 1, …, *R*, denotes the limiting proportion of realizations observed at *r* time points.

The conventional pseudo-likelihood function *P*_{1}(*θ*) is constructed from the expressions given above for *μ _{i}*(*θ*) and Ω* _{i}*(*θ*). When every realization is observed at a single time point (*n _{i}* = 1), it reduces to

$${P}_{1}(\theta )=-\sum _{i=1}^{n}\left\{\frac{{({Y}_{i1}-{e}^{\theta {t}_{i1}})}^{2}}{{e}^{\theta {t}_{i1}}({e}^{\theta {t}_{i1}}-1)}+log({e}^{\theta {t}_{i1}}-1)+\theta {t}_{i1}\right\}.$$

The conditional Gaussian pseudo-likelihood function, obtained by conditioning each observation on the preceding one and assuming equidistant sampling times, is given by

$${P}_{2}(\theta )=-\sum _{i=1}^{n}\sum _{j=1}^{{n}_{i}}\left\{\frac{{({Y}_{ij}-{Y}_{i,j-1}{e}^{\theta \Delta})}^{2}}{{Y}_{i,j-1}{e}^{\theta \Delta}({e}^{\theta \Delta}-1)}+log{Y}_{i,j-1}+\theta \Delta +log({e}^{\theta \Delta}-1)\right\}.$$
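Because *P*_{2}(*θ*) is one-dimensional, it can be maximized by direct search. The sketch below is our own code (function and variable names are ours, not from the paper); it simulates a single pure birth path via its geometric transition law and locates the maximizer of *P*_{2} on a grid, which is accurate enough given the estimator's sampling variability.

```python
import numpy as np

def neg_P2(theta, y, delta):
    """Negative conditional Gaussian pseudo-likelihood -P2 for one path
    y = (Y_0, ..., Y_{n1}) of the pure birth process at spacing delta."""
    prev, curr = y[:-1], y[1:]
    e = np.exp(theta * delta)
    return np.sum((curr - prev * e) ** 2 / (prev * e * (e - 1.0))
                  + np.log(prev) + theta * delta + np.log(e - 1.0))

# simulate one path: given Z(u) = k, the next count is a sum of k i.i.d.
# geometric(exp(-theta0*delta)) variables (a standard pure birth property)
rng = np.random.default_rng(0)
theta0, delta = 0.3, 1.0
y = [1]
for _ in range(25):
    y.append(int(rng.geometric(np.exp(-theta0 * delta), size=y[-1]).sum()))
y = np.array(y, dtype=float)

# one-dimensional grid search for the pseudo-maximum likelihood estimate
grid = np.linspace(0.01, 1.0, 2000)
theta_tilde = grid[int(np.argmin([neg_P2(th, y, delta) for th in grid]))]
```

With 25 equidistant observations the grid maximizer typically lands close to *θ*_{0}; a root-finder applied to ∂*P*_{2}/∂*θ* would serve equally well.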

The asymptotic properties of the pseudo-maximum likelihood estimators depend on the sampling scheme. This is conveniently illustrated using *θ̃*_{2}. In the case where *n* → ∞, sup_{i} *n _{i}* < ∞ and Δ is fixed, consistency and asymptotic normality of *θ̃*_{2} follow from standard estimating-equation arguments. The case of a single realization observed at an increasing number of time points is addressed in the following proposition.

Proposition 2. *Suppose n* = 1 *and t*_{1*j*} = *j*Δ*, j* = 1, …, *n*_{1}. *Then, as n*_{1} → ∞,

$$\sqrt{\sum _{j=1}^{{n}_{1}}{Y}_{1,j-1}}({\stackrel{\sim}{\theta}}_{2}-{\theta}_{0})=\sqrt{\sum _{j=1}^{{n}_{1}}{Y}_{1,j-1}}({\widehat{\theta}}_{2}-{\theta}_{0})+{o}_{p}(1).$$

*Consequently*,

$$\begin{array}{c}{\stackrel{\sim}{\theta}}_{2}={\theta}_{0}+{o}_{p}(1),\\ \sqrt{\sum _{j=1}^{{n}_{1}}{Y}_{1,j-1}}({\stackrel{\sim}{\theta}}_{2}-{\theta}_{0})\stackrel{\mathcal{D}}{\to}\mathcal{N}\phantom{\rule{0.3em}{0ex}}\left(0,\frac{{e}^{\Delta {\theta}_{0}}-1}{{\Delta}^{2}{e}^{\Delta {\theta}_{0}}}\right),\end{array}$$

*and θ̃*_{2} *is asymptotically fully efficient*.

The pseudo-maximum likelihood estimator *θ̃*_{2} is defined as a solution to the equation ∂*P*_{2}(*θ*)/∂*θ* = 0, where

$$\frac{\partial {P}_{2}(\theta )}{\partial \theta}=2{Q}_{2}(\theta )+\sum _{j=1}^{{n}_{1}}\left\{\frac{{({Y}_{1j}-{Y}_{1,j-1}{e}^{\theta \Delta})}^{2}}{{Y}_{1,j-1}}g(\theta ,\Delta )-h(\theta ,\Delta )\right\},$$

(17)

where *g*(*θ*, Δ) = Δ(2*e*^{θΔ} − 1)/{*e*^{θΔ}(*e*^{θΔ} − 1)^{2}} and *h*(*θ*, Δ) = Δ(2*e*^{θΔ} − 1)/(*e*^{θΔ} − 1). To prove the consistency of *θ̃*_{2}, let *W* denote the almost sure limit of *Z*(*t*)*e*^{−*θ*_{0}*t*} as *t* → ∞. Then

$$\begin{array}{ll}\frac{{Q}_{2}(\theta )}{{\sum}_{j=1}^{{n}_{1}}{e}^{j\Delta {\theta}_{0}}}& =\left(\frac{{\sum}_{j=1}^{{n}_{1}}{Y}_{1j}}{{\sum}_{j=1}^{{n}_{1}}{e}^{j\Delta {\theta}_{0}}}-\frac{{\sum}_{j=1}^{{n}_{1}}{Y}_{1,j-1}}{{\sum}_{j=1}^{{n}_{1}}{e}^{j\Delta {\theta}_{0}}}{e}^{\Delta \theta}\right)\frac{\Delta}{{e}^{\theta \Delta}-1}\\ & =W(1-{e}^{(\theta -{\theta}_{0})\Delta})\frac{\Delta}{{e}^{\theta \Delta}-1}+{o}_{p}(1).\end{array}$$

Since
$({Y}_{1j}-{Y}_{1,j-1}{e}^{\theta \Delta})/\sqrt{{Y}_{1,j-1}}={O}_{p}(1)$ as *j* → ∞, and
${\sum}_{j=1}^{{n}_{1}}{e}^{j\Delta {\theta}_{0}}={e}^{\Delta {\theta}_{0}}({e}^{{n}_{1}\Delta {\theta}_{0}}-1)/({e}^{\Delta {\theta}_{0}}-1)\sim {e}^{{n}_{1}\Delta {\theta}_{0}}$, we deduce, as *n*_{1} → ∞, that

$$\frac{1}{{\sum}_{j=1}^{{n}_{1}}{e}^{j\Delta {\theta}_{0}}}\sum _{j=1}^{{n}_{1}}\left\{\frac{{({Y}_{1j}-{Y}_{1,j-1}{e}^{\theta \Delta})}^{2}}{{Y}_{1,j-1}}g(\theta ,\Delta )-h(\theta ,\Delta )\right\}={o}_{p}(1),$$

(18)

and that

$$\underset{{n}_{1}\to \infty}{lim}\frac{1}{{\sum}_{j=1}^{{n}_{1}}{e}^{j\Delta {\theta}_{0}}}\frac{\partial {P}_{2}(\theta )}{\partial \theta}=2W(1-{e}^{(\theta -{\theta}_{0})\Delta})\frac{\Delta}{{e}^{\theta \Delta}-1},$$

(19)

with the convergence holding uniformly in *θ*. Since *θ*_{0} is the unique root of the right-hand side of equation (19), we deduce the convergence in probability of _{2} to *θ*_{0}.

To show that *θ̃*_{2} is asymptotically equivalent to *θ̂*_{2}, we will consider the asymptotic expansion

$$\sqrt{\sum _{j=1}^{{n}_{1}}{Y}_{1,j-1}}({\stackrel{\sim}{\theta}}_{2}-{\theta}_{0})=\frac{1}{\sqrt{{\sum}_{j=1}^{{n}_{1}}{Y}_{1,j-1}}}\frac{\partial {P}_{2}({\theta}_{0})}{\partial \theta}/\left(\frac{1}{{\sum}_{j=1}^{{n}_{1}}{Y}_{1,j-1}}\left\{\frac{{\partial}^{2}{P}_{2}({\theta}_{0})}{\partial {\theta}^{2}}+{o}_{p}(1)\right\}\right).$$

We will prove that the above expansion is equivalent, up to an *o _{p}*(1) term, to the asymptotic expansion for *θ̂*_{2}:

$$\sqrt{\sum _{j=1}^{{n}_{1}}{Y}_{1,j-1}}({\widehat{\theta}}_{2}-{\theta}_{0})=\frac{{Q}_{2}({\theta}_{0})}{\sqrt{{\sum}_{j=1}^{{n}_{1}}{Y}_{1,j-1}}}/\left(\frac{1}{{\sum}_{j=1}^{{n}_{1}}{Y}_{1,j-1}}\left\{\frac{\partial {Q}_{2}({\theta}_{0})}{\partial \theta}+{o}_{p}(1)\right\}\right).$$

Notice first that the second order derivative of *P*_{2}(*θ*) is given by

$$\frac{{\partial}^{2}{P}_{2}({\theta}_{0})}{\partial {\theta}^{2}}=2\frac{\partial {Q}_{2}({\theta}_{0})}{\partial \theta}+\sum _{j=1}^{{n}_{1}}\left\{\frac{{({Y}_{1j}-{Y}_{1,j-1}{e}^{{\theta}_{0}\Delta})}^{2}}{{Y}_{1,j-1}}\frac{\partial g({\theta}_{0},\Delta )}{\partial \theta}-2({Y}_{1j}-{Y}_{1,j-1}{e}^{{\theta}_{0}\Delta})\Delta {e}^{{\theta}_{0}\Delta}g({\theta}_{0},\Delta )-\frac{\partial h({\theta}_{0},\Delta )}{\partial \theta}\right\},$$

(20)

where

$$\frac{\partial {Q}_{2}({\theta}_{0})}{\partial \theta}=\sum _{j=1}^{{n}_{1}}\left\{\frac{{\Delta}^{2}{e}^{{\theta}_{0}\Delta}}{{({e}^{{\theta}_{0}\Delta}-1)}^{2}}({Y}_{1j}-{Y}_{1,j-1}{e}^{{\theta}_{0}\Delta})-\frac{{Y}_{1,j-1}{\Delta}^{2}{e}^{{\theta}_{0}\Delta}}{{e}^{{\theta}_{0}\Delta}-1}\right\}.$$

It follows from (18) that

$$\frac{1}{{\sum}_{j=1}^{{n}_{1}}{Y}_{1,j-1}}\sum _{j=1}^{{n}_{1}}\frac{{({Y}_{1j}-{Y}_{1,j-1}{e}^{{\theta}_{0}\Delta})}^{2}}{{Y}_{1,j-1}}={e}^{\Delta {\theta}_{0}}\frac{{\sum}_{j=1}^{{n}_{1}}{e}^{(j-1)\Delta {\theta}_{0}}}{{\sum}_{j=1}^{{n}_{1}}{Y}_{1,j-1}}\frac{1}{{\sum}_{j=1}^{{n}_{1}}{e}^{j\Delta {\theta}_{0}}}\sum _{j=1}^{{n}_{1}}\frac{{({Y}_{1j}-{Y}_{1,j-1}{e}^{{\theta}_{0}\Delta})}^{2}}{{Y}_{1,j-1}}={o}_{p}(1).$$

We also have

$$\frac{1}{{\sum}_{j=1}^{{n}_{1}}{Y}_{1,j-1}}\sum _{j=1}^{{n}_{1}}({Y}_{1j}-{Y}_{1,j-1}{e}^{{\theta}_{0}\Delta})={e}^{\Delta {\theta}_{0}}\frac{{\sum}_{j=1}^{{n}_{1}}{e}^{(j-1)\Delta {\theta}_{0}}}{{\sum}_{j=1}^{{n}_{1}}{Y}_{1,j-1}}\frac{{\sum}_{j=1}^{{n}_{1}}({Y}_{1j}-{Y}_{1,j-1}{e}^{{\theta}_{0}\Delta})}{{\sum}_{j=1}^{{n}_{1}}{e}^{j\Delta {\theta}_{0}}}={o}_{p}(1).$$

Therefore

$$\frac{1}{{\sum}_{j=1}^{{n}_{1}}{Y}_{1,j-1}}\frac{{\partial}^{2}{P}_{2}({\theta}_{0})}{\partial {\theta}^{2}}=\frac{2}{{\sum}_{j=1}^{{n}_{1}}{Y}_{1,j-1}}\frac{\partial {Q}_{2}({\theta}_{0})}{\partial \theta}+{o}_{p}(1).$$

(21)

Using similar arguments applied to (17), we can also show that

$$\frac{1}{\sqrt{{\sum}_{j=1}^{{n}_{1}}{Y}_{1,j-1}}}\frac{\partial {P}_{2}({\theta}_{0})}{\partial \theta}=\frac{2{Q}_{2}({\theta}_{0})}{\sqrt{{\sum}_{j=1}^{{n}_{1}}{Y}_{1,j-1}}}+{o}_{p}(1).$$

(22)

We deduce from (17), (21) and (22) that

$$\sqrt{\sum _{j=1}^{{n}_{1}}{Y}_{1,j-1}}({\stackrel{\sim}{\theta}}_{2}-{\theta}_{0})=\frac{{Q}_{2}({\theta}_{0})}{\sqrt{{\sum}_{j=1}^{{n}_{1}}{Y}_{1,j-1}}}/\left(\frac{1}{{\sum}_{j=1}^{{n}_{1}}{Y}_{1,j-1}}\left\{\frac{\partial {Q}_{2}({\theta}_{0})}{\partial \theta}+{o}_{p}(1)\right\}\right)=\sqrt{\sum _{j=1}^{{n}_{1}}{Y}_{1,j-1}}({\widehat{\theta}}_{2}-{\theta}_{0})+{o}_{p}(1).$$

Proposition 2 follows from (15).

We deduce also from Propositions 1 and 2 that *θ̃*_{2} is asymptotically equivalent to the maximum likelihood estimator. The conditional pseudo-maximum and quasi-likelihood estimators will no longer be equivalent when sup_{i,j} *t _{ij}* < ∞, even asymptotically as *n* → ∞. The Gaussian density is not the only possible basis for a pseudo-likelihood function. When every realization is observed at a single time point, the pseudo-likelihood function constructed from an exponential density with mean *e*^{θ*t*_{i1}} is

$${P}_{exp}(\theta )=-\sum _{i=1}^{n}({Y}_{i1}{e}^{-\theta {t}_{i1}}+\theta {t}_{i1}).$$

The associated estimating equation is given by

$$\frac{\partial {P}_{exp}(\theta )}{\partial \theta}=\sum _{i=1}^{n}{h}_{i}(\theta )$$

where *h _{i}*(*θ*) denotes the contribution of the *i*-th realization. Written out, the estimating equation is

$$\frac{\partial {P}_{exp}(\theta )}{\partial \theta}=\sum _{i=1}^{n}\frac{{t}_{i1}{e}^{\theta {t}_{i1}}}{{e}^{\theta {t}_{i1}}-1}({Y}_{i1}{e}^{-\theta {t}_{i1}}-1),$$

which coincides with the quasi-score function *Q*_{1}(*θ*) in this particular case.

Using simulations, we compared the performance of the above estimators under a longitudinal design where the process was observed daily from day 1 to day 4. The true value of the mean time to division was 1/*θ*_{0} = 20. As shown in Table 1, all estimators improved as the sample size increased. Quasi-likelihood estimation outperformed Gaussian pseudo-likelihood estimation, as expected since quasi-likelihood is equivalent to maximum likelihood here, though only moderately.
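The simulation just summarized is easy to reproduce. The sketch below is our own code (names ours): it generates pure birth paths using the fact that, given *Z*(*u*) = *k*, the count one observation later is a sum of *k* independent geometric variables with success probability *e*^{−θΔ}, and then evaluates the estimator (13).

```python
import numpy as np

def simulate_yule(theta, delta, n_obs, rng):
    """One realization of a pure birth process started from a single cell,
    observed at times delta, 2*delta, ..., n_obs*delta.  Given Z(u) = k,
    Z(u + delta) is a sum of k i.i.d. geometric(exp(-theta*delta)) counts."""
    p = np.exp(-theta * delta)
    z = [1]
    for _ in range(n_obs):
        z.append(int(rng.geometric(p, size=z[-1]).sum()))
    return np.array(z)  # Z(0), Z(delta), ..., Z(n_obs*delta)

def theta_hat_2(paths, delta):
    """Quasi-likelihood estimator (13): log-ratio of summed counts."""
    num = sum(p[1:].sum() for p in paths)
    den = sum(p[:-1].sum() for p in paths)
    return np.log(num / den) / delta

rng = np.random.default_rng(1)
theta0, delta = 0.05, 1.0          # mean lifespan 1/theta0 = 20, daily sampling
paths = [simulate_yule(theta0, delta, 4, rng) for _ in range(200)]
est = theta_hat_2(paths, delta)    # close to theta0
```

With 200 simulated realizations, the estimate typically falls within a few percent of *θ*_{0}.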

A variety of estimators can be constructed using the methods of quasi- and pseudo-likelihood applied to a linear birth-and-death process. Under this model, every cell, upon completion of its lifespan, either divides into two new cells with probability *p* = *ϕ*/(*ϕ* + *ρ*) or dies (that is, disappears) with probability 1 − *p* = *ρ*/(*ϕ* + *ρ*). The length of the lifespan is assumed to follow an exponential distribution with parameter *ϕ* + *ρ*. The parameters *ϕ* and *ρ* are nonnegative and referred to as the birth and the death rates. Our objective is to estimate *θ* = (*ϕ, ρ*)′.

The p.g.f. of the process takes the form

$$H(\mathbf{\text{s}};t)=\{\begin{array}{cc}\frac{\rho (s-1)-{e}^{t(\rho -\varphi )}(\varphi s-\rho )}{\varphi (s-1)-{e}^{t(\rho -\varphi )}(\varphi s-\rho )}& \text{if}\phantom{\rule{0.2em}{0ex}}\varphi \ne \rho \\ \frac{s-\varphi st+\varphi t}{1-\varphi st+\varphi t}& \text{if}\phantom{\rule{0.2em}{0ex}}\varphi =\rho \end{array}$$

(Athreya and Ney, 1972), where the sub- and superscripts of Section 2.2 have been dropped to ease notation. The maximum likelihood estimator can be constructed using the probability distribution as given by Darwin (1956). Although the likelihood function takes an explicit form, we found that the calculation of the maximum likelihood estimator became numerically challenging as the size of the process became large, thereby warranting the investigation of alternative approaches.

Write *ζ* = *ϕ* + *ρ* and *δ* = *ϕ* − *ρ*. Under this parameterization, the expectation of the process is given by *m*(*t; ζ, δ*) = *e ^{tδ}*, while its covariance function, for every 0 ≤ *u* ≤ *t*, is given by

$$v(t,u;\zeta ,\delta )=\{\begin{array}{cc}\frac{\zeta}{\delta}{e}^{t\delta}({e}^{u\delta}-1)& \text{if}\phantom{\rule{0.2em}{0ex}}\delta \ne 0\\ u(\zeta +\delta )& \text{if}\phantom{\rule{0.2em}{0ex}}\delta =0.\end{array}$$

When *δ* ≠ 0, since ∂*m*(*t; ζ, δ*)/∂*ζ* = 0 and ∂*m*(*t; ζ, δ*)/∂*δ* = *te ^{tδ}*, the quasi-score function *Q*_{1}(*ζ, δ*) takes the form

$${Q}_{1}(\zeta ,\delta )=\frac{\delta}{\zeta}\sum _{i=1}^{n}\left(\begin{array}{ccc}0& \dots & 0\\ -{t}_{i1}{e}^{{t}_{i1}\delta}& \dots & -{t}_{i{n}_{i}}{e}^{{t}_{i{n}_{i}}\delta}\end{array}\right){\Gamma}_{i}(\delta ),$$

(23)

where

$${\Gamma}_{i}(\delta )={\left(\begin{array}{ccc}{e}^{{t}_{i1}\delta}({e}^{{t}_{i1}\delta}-1)& \cdots & {e}^{{t}_{i{n}_{i}}\delta}({e}^{{t}_{i1}\delta}-1)\\ \vdots & \ddots & \vdots \\ {e}^{{t}_{i{n}_{i}}\delta}({e}^{{t}_{i1}\delta}-1)& \cdots & {e}^{{t}_{i{n}_{i}}\delta}({e}^{{t}_{i{n}_{i}}\delta}-1)\end{array}\right)}^{-1}\left(\begin{array}{c}{Y}_{i1}-{e}^{{t}_{i1}\delta}\\ \vdots \\ {Y}_{i{n}_{i}}-{e}^{{t}_{i{n}_{i}}\delta}\end{array}\right).$$

The solution to *Q*_{1}(*ζ, δ*) = 0, when it exists, will therefore not be unique. Using *Q*_{2}(*ζ, δ*) instead of *Q*_{1}(*ζ, δ*) does not solve the problem, so that both estimating equations fail to estimate the birth and the death rates. We can, however, estimate their difference *δ*. Denoting by *δ̂*_{1} and *δ̂*_{2} the corresponding quasi-likelihood estimators of *δ*, we have:

Proposition 3. *For the birth-and-death process, we have that δ̂*_{1} = *δ̂*_{2}. *Furthermore, when the sampling times are equidistant, we have that δ̂*_{1} = *δ̂*_{2} = *δ̂ _{mle}*, *where*

$${\widehat{\delta}}_{\mathit{\text{mle}}}=log\phantom{\rule{0.2em}{0ex}}\left(\frac{{\sum}_{i=1}^{n}{\sum}_{j=1}^{{n}_{i}}{Y}_{ij}}{{\sum}_{i=1}^{n}{\sum}_{j=1}^{{n}_{i}}{Y}_{i,j-1}}\right)\phantom{\rule{-0.2em}{0ex}}/\phantom{\rule{-0.2em}{0ex}}\Delta ,$$

(24)

*where* Δ = *t _{ij}* − *t _{i,j−1}* *for all i and j*.

The proof of Proposition 3 follows directly from the results of Section 4.1 and by noticing that both quasi-score equations *Q*_{1}(*ζ, δ*) and *Q*_{2}(*ζ, δ*) can be equivalently expressed in terms of the quasi-score functions given in (11) and (12). The estimator *δ̂ _{mle}* in equation (24) is the maximum likelihood estimator for *δ* when the sampling times are equidistant.

To remedy the problem of non-estimability of *ζ* when using *Q*_{1}(*ζ, δ*) and *Q*_{2}(*ζ, δ*), we propose two alternative solutions. The idea behind the first solution is to construct a quasi-likelihood function where one conditions on the events of non-extinction {*Z*(*t _{i,j−1}*) ≠ 0}, yielding conditional moments that depend on both parameters. For the birth-and-death process, the probability of extinction by time *t* is given by

$${\mathbb{P}}_{\theta}\{Z(t)=0\}=H(0;t)=\frac{(\zeta -\delta ){e}^{t\delta}-(\zeta -\delta )}{(\zeta +\delta ){e}^{t\delta}-(\zeta -\delta )}=1-\frac{2\delta {e}^{t\delta}}{(\zeta +\delta ){e}^{t\delta}-(\zeta -\delta )}.$$

Writing Δ* _{ij}* = *t _{ij}* − *t _{i,j−1}*, the conditional expectation and variance are given by

$$\begin{array}{c}{\mathbb{\text{E}}}_{\theta}\{Z({t}_{ij})\phantom{\rule{-0.4em}{0ex}}\mid \phantom{\rule{-0.4em}{0ex}}Z({t}_{i,j-1})\ne 0\}=\frac{(\zeta +\delta ){e}^{{t}_{ij}\delta}-(\zeta -\delta ){e}^{{\Delta}_{ij}\delta}}{2\delta}\\ \phantom{\rule{6.0em}{0ex}}{\text{Var}}_{\theta}\{Z({t}_{ij})\phantom{\rule{-0.4em}{0ex}}\mid \phantom{\rule{-0.4em}{0ex}}Z({t}_{i,j-1})\ne 0\}=\frac{\zeta}{2{\delta}^{2}}\{{e}^{{t}_{ij}\delta}-1\}\phantom{\rule{0.2em}{0ex}}\{(\zeta +\delta ){e}^{{t}_{ij}\delta}-(\zeta -\delta ){e}^{{\Delta}_{ij}\delta}\},\end{array}$$

and

$$\frac{\partial {\mathbb{\text{E}}}_{\theta}\{Z({t}_{ij})\phantom{\rule{-0.4em}{0ex}}\mid \phantom{\rule{-0.4em}{0ex}}Z({t}_{i,j-1})\ne 0\}}{\partial \zeta}=\frac{{e}^{{t}_{ij}\delta}-{e}^{{\Delta}_{ij}\delta}}{2\delta}.$$

If *δ* were known, an optimal estimating equation for *ζ* based on these conditional moments would then take the form

$${Q}_{3}(\zeta )=\sum _{i=1}^{n}\sum _{j=1}^{{n}_{i}}\frac{\partial {\mathbb{\text{E}}}_{\theta}\{Z({t}_{ij})\phantom{\rule{-0.4em}{0ex}}\mid \phantom{\rule{-0.4em}{0ex}}Z({t}_{i,j-1})\ne 0\}}{\partial \zeta}\frac{\{{Y}_{ij}-{\mathbb{\text{E}}}_{\theta}\{Z({t}_{ij})\phantom{\rule{-0.4em}{0ex}}\mid \phantom{\rule{-0.4em}{0ex}}Z({t}_{i,j-1})\ne 0\}\}}{{\text{Var}}_{\theta}\{Z({t}_{ij})\phantom{\rule{-0.4em}{0ex}}\mid \phantom{\rule{-0.4em}{0ex}}Z({t}_{i,j-1})\ne 0\}}.$$

The solution to the quasi-score equation *Q*_{3}(*ζ*) = 0 does not appear to have any explicit expression, but it can be approximated numerically. Since an optimal estimator for *δ* is readily available, we propose to replace *δ* by *δ̂* when solving the above equation.
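The numerical step is straightforward. The sketch below is our own code (names ours): it evaluates *Q*_{3} from the conditional moments displayed above, holds *δ* fixed at a given value as suggested, and locates the root by bisection; the data are synthetic, placed exactly at the conditional mean so that the root is known.

```python
import numpy as np

def q3(zeta, delta, y, t, t_prev):
    """Quasi-score Q3 for zeta (delta held fixed), built from the
    conditional-on-non-extinction mean and variance displayed above;
    y are pooled observations at times t with previous sampling times t_prev."""
    d = t - t_prev                                   # the Delta_ij of the text
    a = (zeta + delta) * np.exp(t * delta) - (zeta - delta) * np.exp(d * delta)
    mean = a / (2.0 * delta)
    var = zeta / (2.0 * delta ** 2) * (np.exp(t * delta) - 1.0) * a
    dmean = (np.exp(t * delta) - np.exp(d * delta)) / (2.0 * delta)
    return np.sum(dmean * (y - mean) / var)

# synthetic observations placed exactly at the conditional mean for zeta = 0.14
t = np.array([1.0, 2.0, 3.0, 4.0])
t_prev = np.array([0.0, 1.0, 2.0, 3.0])
delta_hat, zeta_true = 0.06, 0.14
y = ((zeta_true + delta_hat) * np.exp(t * delta_hat)
     - (zeta_true - delta_hat) * np.exp((t - t_prev) * delta_hat)) / (2.0 * delta_hat)

# bisection: since the conditional mean increases in zeta, Q3 is positive
# below the root and negative above it
lo, hi = 0.07, 0.5
for _ in range(60):
    mid = 0.5 * (lo + hi)
    if q3(mid, delta_hat, y, t, t_prev) > 0.0:
        lo = mid
    else:
        hi = mid
zeta_hat = 0.5 * (lo + hi)   # recovers zeta_true
```

On real data the root would of course not be exact; any standard scalar root-finder can replace the hand-rolled bisection.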

An alternative remedial measure to estimate both the birth and the death rates is to use an estimating equation that is a quadratic function of the data. Assuming again that *δ* is known, candidate estimating equations for estimating *ζ* would have the general form

$${Q}_{4}(\zeta ;\delta )=\sum _{i=1}^{n}\sum _{j=1}^{{n}_{i}}{\alpha}_{ij}(\zeta ,\delta )\phantom{\rule{0.2em}{0ex}}\left[{\left\{{Y}_{ij}-{\mathbb{\text{E}}}_{\theta}\{Z({t}_{ij})\phantom{\rule{-0.4em}{0ex}}\mid \phantom{\rule{-0.4em}{0ex}}Z({t}_{i,j-1})\}\right\}}^{2}-{\text{Var}}_{\theta}\{Z({t}_{ij})\phantom{\rule{-0.4em}{0ex}}\mid \phantom{\rule{-0.4em}{0ex}}Z({t}_{i,j-1})\}\right],$$

where the *α _{ij}*(*ζ, δ*) are weights to be specified. Setting *α _{ij}*(*ζ, δ*) = 1, assuming equidistant sampling times, and substituting the estimator *δ̂* for *δ* yields the explicit solution

$$\widehat{\zeta}=\frac{{\sum}_{i=1}^{n}{\sum}_{j=1}^{{n}_{i}}{({Y}_{ij}-{Y}_{i,j-1}{e}^{\widehat{\delta}\Delta})}^{2}}{\left({\sum}_{i=1}^{n}{\sum}_{j=1}^{{n}_{i}}{Y}_{i,j-1}\right){e}^{\widehat{\delta}\Delta}({e}^{\widehat{\delta}\Delta}-1)}\widehat{\delta}.$$

(25)

The optimal weight *α _{ij}*(*ζ, δ*) is given by

$${\alpha}_{ij}(\zeta ,\delta )=\frac{\partial \left[{\left\{{Y}_{ij}-{\mathbb{\text{E}}}_{\theta}\{Z({t}_{ij})\phantom{\rule{-0.4em}{0ex}}\mid \phantom{\rule{-0.4em}{0ex}}Z({t}_{i,j-1})\}\right\}}^{2}-{\text{Var}}_{\theta}\{Z({t}_{ij})\phantom{\rule{-0.4em}{0ex}}\mid \phantom{\rule{-0.4em}{0ex}}Z({t}_{i,j-1})\}\right]}{\partial \zeta}{\text{Var}}_{\theta}{\{{(Z({t}_{ij})-{\mathbb{\text{E}}}_{\theta}\{Z({t}_{ij})\phantom{\rule{-0.4em}{0ex}}\mid \phantom{\rule{-0.4em}{0ex}}Z({t}_{i,j-1})\})}^{2}\phantom{\rule{-0.4em}{0ex}}\mid \phantom{\rule{-0.4em}{0ex}}Z({t}_{i,j-1})\}}^{-1},$$

where

$$\frac{\partial \left[{\left\{{Y}_{ij}-{\mathbb{\text{E}}}_{\theta}\{Z({t}_{ij})\phantom{\rule{-0.4em}{0ex}}\mid \phantom{\rule{-0.4em}{0ex}}Z({t}_{i,j-1})\}\right\}}^{2}-{\text{Var}}_{\theta}\{Z({t}_{ij})\phantom{\rule{-0.4em}{0ex}}\mid \phantom{\rule{-0.4em}{0ex}}Z({t}_{i,j-1})\}\right]}{\partial \zeta}=\frac{\partial {\text{Var}}_{\theta}\{Z({t}_{ij})\phantom{\rule{-0.4em}{0ex}}\mid \phantom{\rule{-0.4em}{0ex}}Z({t}_{i,j-1})\}}{\partial \zeta}=\frac{{Y}_{i,j-1}}{\delta}{e}^{\delta {\Delta}_{ij}}({e}^{\delta {\Delta}_{ij}}-1).$$

The expression for the conditional variance Var* _{θ}*{(*Z*(*t _{ij}*) − E* _{θ}*{*Z*(*t _{ij}*) ∣ *Z*(*t _{i,j−1}*)})^{2} ∣ *Z*(*t _{i,j−1}*)} can be deduced from the first four conditional moments of the process; we omit it here. Using these optimal weights and assuming equidistant sampling times, we obtain

$$\widehat{\zeta}=\frac{{\sum}_{i=1}^{n}{\sum}_{j=1}^{{n}_{i}}{Y}_{i,j-1}{({Y}_{ij}-{Y}_{i,j-1}{e}^{\widehat{\delta}\Delta})}^{2}}{\left({\sum}_{i=1}^{n}{\sum}_{j=1}^{{n}_{i}}{Y}_{i,j-1}^{2}\right){e}^{\widehat{\delta}\Delta}({e}^{\widehat{\delta}\Delta}-1)}\widehat{\delta}.$$

(26)
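Estimators (24) and (25) are straightforward to put to work. The sketch below is our own code (names ours): it simulates linear birth-and-death paths with the Gillespie algorithm and recovers the birth and death rates via *ϕ̂* = (*ζ̂* + *δ̂*)/2 and *ρ̂* = (*ζ̂* − *δ̂*)/2.

```python
import numpy as np

def simulate_bd(phi, rho, obs_times, rng):
    """Gillespie simulation of a linear birth-and-death process started
    from one cell, recorded at the given (increasing) observation times.
    A waiting time overshooting the next observation time is discarded and
    redrawn afterwards, which is valid by memorylessness."""
    z, t, out = 1, 0.0, [1]          # include Z(0) = 1
    for t_obs in obs_times:
        while z > 0:
            wait = rng.exponential(1.0 / (z * (phi + rho)))
            if t + wait > t_obs:
                break
            t += wait
            z += 1 if rng.random() < phi / (phi + rho) else -1
        t = t_obs
        out.append(z)
    return np.array(out, dtype=float)

def estimate_bd(paths, dt):
    """delta_hat from (24), zeta_hat from (25); returns (phi_hat, rho_hat)."""
    prev = np.concatenate([p[:-1] for p in paths])
    curr = np.concatenate([p[1:] for p in paths])
    d_hat = np.log(curr.sum() / prev.sum()) / dt
    e = np.exp(d_hat * dt)
    z_hat = d_hat * np.sum((curr - prev * e) ** 2) / (prev.sum() * e * (e - 1.0))
    return (z_hat + d_hat) / 2.0, (z_hat - d_hat) / 2.0

rng = np.random.default_rng(2)
phi0, rho0 = 0.10, 0.04
paths = [simulate_bd(phi0, rho0, [1.0, 2.0, 3.0, 4.0], rng) for _ in range(400)]
phi_hat, rho_hat = estimate_bd(paths, 1.0)   # close to (phi0, rho0)
```

Extinct paths contribute zero residuals and are handled automatically by the pooled sums.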

The conventional Gaussian pseudo-maximum likelihood estimator (*ϕ̃*_{1}, *ρ̃*_{1}) maximizes the pseudo-likelihood function *P*_{1}(*θ*). To show that this approach can estimate both *ϕ* and *ρ*, assume first that observations are collected at a single time point *t*. Notice that the pseudo-likelihood can be reparameterized as
$\stackrel{\sim}{P}(m,{\sigma}^{2})=-nlog{\sigma}^{2}-{\sum}_{i=1}^{n}{({Y}_{i}-m)}^{2}\phantom{\rule{-0.2em}{0ex}}/\phantom{\rule{-0.2em}{0ex}}{\sigma}^{2}$, where *m* = *e*^{t(ϕ−ρ)} and *σ*^{2} denote the mean and the variance of the process at time *t*. Writing *m̄ _{t}* and *σ̄*^{2}_{t} for the maximizers of *P̃*(*m*, *σ*^{2}) (the sample mean and variance), the system of equations

$$\{\begin{array}{ll}{\overline{m}}_{t}& ={e}^{t(\varphi -\rho )}\\ {\overline{\sigma}}_{t}^{2}& =\frac{\varphi +\rho}{\varphi -\rho}{e}^{t(\varphi -\rho )}({e}^{t(\varphi -\rho )}-1)\end{array}$$

admits a single solution. Since *m̄ _{t}* is always positive in the present context, it is easy to show that the pseudo-maximum likelihood estimators of *ϕ* and *ρ* take the form

$$\stackrel{\sim}{\varphi}(t)=\frac{log{\overline{m}}_{t}}{2t}\phantom{\rule{0.2em}{0ex}}\left\{\frac{{\overline{\sigma}}_{t}^{2}}{{\overline{m}}_{t}({\overline{m}}_{t}-1)}+1\right\}\phantom{\rule{0.2em}{0ex}}\text{and}\phantom{\rule{0.2em}{0ex}}\stackrel{\sim}{\rho}(t)=\frac{log{\overline{m}}_{t}}{2t}\phantom{\rule{0.2em}{0ex}}\left\{\frac{{\overline{\sigma}}_{t}^{2}}{{\overline{m}}_{t}({\overline{m}}_{t}-1)}-1\right\}.$$

(27)

This establishes estimability of *ϕ* and *ρ* by Gaussian pseudo-likelihood. Notice that, in equation (27),
${\overline{\sigma}}_{t}^{2}$ could be replaced by the unbiased estimator
$n{\overline{\sigma}}_{t}^{2}\phantom{\rule{-0.2em}{0ex}}/\phantom{\rule{-0.2em}{0ex}}(n-1)$. Moreover, since
${\overline{\sigma}}_{t}^{2}<{\overline{m}}_{t}({\overline{m}}_{t}-1)$ cannot be ruled out in finite samples, *ρ̃*(*t*) may be negative, and a proper estimator is *ρ̃*^{+}(*t*) = *ρ̃*(*t*) ∨ 0.
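The closed-form estimators (27) translate directly into code. The sketch below is our own code (names ours); it applies the truncation *ρ̃*^{+}(*t*) = *ρ̃*(*t*) ∨ 0 discussed above and uses the population variance, for which the unbiased version can be substituted. The check at the end feeds in noiseless input whose mean and variance equal their theoretical values.

```python
import numpy as np

def phi_rho_tilde(y, t):
    """Pseudo-maximum likelihood estimators (27) from observations y of a
    linear birth-and-death process, all collected at a single time t."""
    m = y.mean()
    s2 = y.var()                       # population variance sigma-bar_t^2
    ratio = s2 / (m * (m - 1.0))
    phi = np.log(m) / (2.0 * t) * (ratio + 1.0)
    rho = np.log(m) / (2.0 * t) * (ratio - 1.0)
    return phi, max(rho, 0.0)          # truncated estimator rho-tilde^+

# sanity check: mean and variance set exactly to their theoretical values
# for phi0 = 0.05, rho0 = 0.02, t = 3
phi0, rho0, t = 0.05, 0.02, 3.0
m = np.exp(t * (phi0 - rho0))
s2 = (phi0 + rho0) / (phi0 - rho0) * m * (m - 1.0)
y = np.array([m - np.sqrt(s2), m + np.sqrt(s2)])   # mean m, variance s2
phi_t, rho_t = phi_rho_tilde(y, t)                 # recovers (phi0, rho0)
```

The two-point array is only a device to hit the moments exactly; real data would of course be the observed counts.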

When the experimental design includes multiple time points *t*_{1}, …, *t _{m}*, both rates can be estimated by averaging these single-time-point estimators:

$${\stackrel{\sim}{\varphi}}_{1\phantom{\rule{-0.2em}{0ex}}/\phantom{\rule{-0.2em}{0ex}}m}=\frac{1}{m}\sum _{k=1}^{m}\stackrel{\sim}{\varphi}({t}_{k})\phantom{\rule{0.2em}{0ex}}\text{and}\phantom{\rule{0.2em}{0ex}}{\stackrel{\sim}{\rho}}_{1\phantom{\rule{-0.2em}{0ex}}/\phantom{\rule{-0.2em}{0ex}}m}=\frac{1}{m}\sum _{k=1}^{m}\stackrel{\sim}{\rho}({t}_{k}).$$

A third estimator for *ϕ* and *ρ* is constructed using the maximum likelihood estimator *δ̂* of *δ*, as given in equation (24), which we plug into the pseudo-likelihood function; the birth rate is then estimated by maximizing *P*_{1}(*ϕ*, *ϕ* − *δ̂*) with respect to *ϕ*. The conditional Gaussian pseudo-likelihood estimators can also be computed as presented previously.

We conducted a simulation study to evaluate the performance of the various estimators constructed above. The design used for these simulations was identical to the one considered for the pure birth process. Table 2 reports a summary of the results. The simulations suggest that the maximum likelihood estimator was the best of all. The second best estimator was the conditional Gaussian pseudo-likelihood estimator, which outperformed all quasi-likelihood estimators. While conditioning on the event of non-extinction did resolve the identifiability issue, the resulting estimator did not perform particularly well. Additional simulations (not shown) were carried out where the realizations of the process were observed over longer time periods. It is worth noting that we found maximum likelihood estimation numerically unstable when the size of the population grew too large, making numerical evaluation of the log-likelihood function challenging. No such problem occurred with the methods of pseudo- and quasi-likelihood, which makes them particularly attractive in such settings.

We now consider a two-type process (*K* = 2) that extends the linear birth-and-death process. It begins with a single cell of type 1 and age zero. Upon completion of its lifespan, every type-1 cell either divides into two new cells of type 1 with probability *p*, or it produces a single type-2 cell with probability 1 − *p*. The progeny vector *ξ*_{1} of type-1 cells is a 1 × 2 random vector with support {(2, 0), (0, 1)}. The time to division of any type-1 cell is described by an exponentially distributed random variable with parameter *λ*_{1} = *λ*_{1,(2, 0)}, whereas the time to differentiation of any type-1 cell follows an exponential distribution with parameter *λ*_{2} = *λ*_{1,(0, 1)}. To simplify the notation, we will denote their respective cumulative distribution functions by *G*_{1}(*t*) = *F*_{1}(*t*, (2, 0)) and *G*_{2}(*t*) = *F*_{1}(*t*, (0, 1)). In contrast, any type-2 cell has an infinite lifespan. Write *θ* = (*p, λ*_{1}, *λ*_{2})′ to denote the vector of unknown model parameters.
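The branching mechanism just described is easy to simulate recursively, which is convenient for checking the moment formulas derived below. The sketch is our own code (names ours); consistent with the simulation study later in this section, we treat *λ*_{1} and *λ*_{2} as mean lifespans (scale parameters) of the exponential distributions, which is an assumption on our part.

```python
import numpy as np

def simulate_two_type(p, lam1, lam2, t_max, rng):
    """Z(t_max) = (Z1, Z2) for the two-type process: a type-1 cell divides
    into two type-1 cells (prob p, exponential lifespan with mean lam1) or
    produces a single immortal type-2 cell (prob 1 - p, exponential lifespan
    with mean lam2).  The process starts from one type-1 cell of age zero."""
    z1, z2 = 0, 0
    stack = [0.0]                    # birth times of unresolved type-1 cells
    while stack:
        birth = stack.pop()
        if rng.random() < p:         # division pathway
            event = birth + rng.exponential(lam1)
            if event < t_max:
                stack += [event, event]   # two new type-1 cells
            else:
                z1 += 1              # still a type-1 cell at t_max
        else:                        # differentiation pathway
            event = birth + rng.exponential(lam2)
            if event < t_max:
                z2 += 1
            else:
                z1 += 1
    return z1, z2

rng = np.random.default_rng(3)
z1, z2 = simulate_two_type(0.6, 2.0, 5.0, 4.0, rng)
```

Averaging (*Z*_{1}, *Z*_{2}) over many runs provides a Monte Carlo check of *m*_{1}(*t*) and *m*_{2}(*t*).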

Let *Z*_{1}(*t*) and *Z*_{2}(*t*) denote the number of type-1 and the number of type-2 cells, respectively, at time *t* ≥ 0, and set *Z*(*t*) = {*Z*_{1}(*t*), *Z*_{2}(*t*)}′. Let *m*_{1}(*t*) and *m*_{2}(*t*) denote the expectations of *Z*_{1}(*t*) and *Z*_{2}(*t*), and *v*_{11}(*t*), *v*_{22}(*t*) and *v*_{12}(*t*) their respective variances and covariance.

It follows from equation (1) that the conditional p.g.f. *H*^{(1)}(**s**; *t*) satisfies

$${H}^{(1)}(\mathbf{\text{s}};t)={s}_{1}\{1-p{G}_{1}(t)-(1-p){G}_{2}(t)\}+p{\int}_{0}^{t}{H}^{(1)}{(\mathbf{\text{s}};t-\tau )}^{2}d{G}_{1}(\tau )+(1-p){\int}_{0}^{t}{H}^{(2)}(\mathbf{\text{s}};t-\tau )d{G}_{2}(\tau ),$$

(28)

and *H*^{(2)}(**s**; *t*) = *s*_{2}. Substituting this equation into (28) yields

$${H}^{(1)}(\mathbf{\text{s}};t)={s}_{1}\{1-p{G}_{1}(t)-(1-p){G}_{2}(t)\}+{s}_{2}(1-p){G}_{2}(t)+p{\int}_{0}^{t}{H}^{(1)}{(\mathbf{\text{s}};t-\tau )}^{2}d{G}_{1}(\tau ).$$

Differentiating with respect to **s** at **s** = **1** yields

$$\frac{\partial {H}^{(1)}(\mathbf{1};t)}{\partial {s}_{1}}=1-p{G}_{1}(t)-(1-p){G}_{2}(t)+2p{\int}_{0}^{t}\frac{\partial {H}^{(1)}(\mathbf{1};t-\tau )}{\partial {s}_{1}}d{G}_{1}(\tau ),$$

and

$$\frac{\partial {H}^{(1)}(\mathbf{1};t)}{\partial {s}_{2}}=(1-p){G}_{2}(t)+2p{\int}_{0}^{t}\frac{\partial {H}^{(1)}(\mathbf{1};t-\tau )}{\partial {s}_{2}}d{G}_{1}(\tau ).$$

By solving the above renewal equations (Athreya and Ney, 1972), we deduce expressions for the expected numbers of type-1 and type-2 cells started from a type-1 cell:

$${m}_{1}(t)=[1-p{G}_{1}-(1-p){G}_{2}]\ast \sum _{l=0}^{\infty}{(2p)}^{l}{G}_{1}^{\ast l}(t)=\frac{1}{2}+\frac{1}{2}\sum _{l=0}^{\infty}{(2p)}^{l}{G}_{1}^{\ast l}(t)-{m}_{2}(t),$$

and

$${m}_{2}(t)=(1-p)\sum _{l=0}^{\infty}{(2p)}^{l}{G}_{1}^{\ast l}\ast {G}_{2}(t).$$

Clearly,
${G}_{1}^{\ast l}(t)$ is the cumulative distribution function of an Erlang distribution with parameters *l* and *λ*_{1}. When *λ*_{1} ≠ *λ*_{2}, an expression for
${G}_{1}^{\ast l}\ast {G}_{2}(t)$ can be computed explicitly:

$${G}_{1}^{\ast l}\ast {G}_{2}(t)=\frac{{\lambda}_{2}^{l}}{{({\lambda}_{2}-{\lambda}_{1})}^{l}}{G}_{2}(t)-\sum _{i=0}^{l-1}\frac{{\lambda}_{1}{\lambda}_{2}^{i}}{{({\lambda}_{2}-{\lambda}_{1})}^{i+1}}{G}_{1}^{\ast (l-i)}(t),$$

and *m*_{1}(*t*) and *m*_{2}(*t*) can be computed explicitly.
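These Erlang and convolution formulas can be evaluated directly, and the truncated series then gives *m*_{2}(*t*). The sketch is our own code (names ours), again treating the *λ*'s as scale parameters; the truncation level is our choice and is adequate for moderate values of 2*p* and *t*.

```python
import math

def erlang_cdf(t, l, lam):
    """CDF of the l-fold convolution of an exponential with mean lam
    (Erlang with shape l and scale lam); l = 0 is a point mass at zero."""
    if l == 0:
        return 1.0
    x = t / lam
    return 1.0 - sum(math.exp(-x) * x**k / math.factorial(k) for k in range(l))

def conv_g1l_g2(t, l, lam1, lam2):
    """Closed-form G1^{*l} * G2(t), valid for lam1 != lam2."""
    g2 = 1.0 - math.exp(-t / lam2)
    out = (lam2 / (lam2 - lam1)) ** l * g2
    for i in range(l):
        out -= lam1 * lam2**i / (lam2 - lam1) ** (i + 1) * erlang_cdf(t, l - i, lam1)
    return out

def m2(t, p, lam1, lam2, lmax=60):
    """Truncated series for the expected number of type-2 cells."""
    return (1.0 - p) * sum((2.0 * p) ** l * conv_g1l_g2(t, l, lam1, lam2)
                           for l in range(lmax + 1))
```

A useful check: for *l* = 1 the convolution must agree with the two-exponential (hypoexponential) distribution function, and for large *t* with 2*p* < 1 the series approaches (1 − *p*)/(1 − 2*p*).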

To derive expressions for the variances and covariance of the process, we first compute the partial second order derivatives of *H*^{(1)}(**s**; *t*) with respect to **s** at **s** = 1. It follows from equation (3) that for all *i, j* = 1, 2,

$$\frac{{\partial}^{2}{H}^{(1)}(\mathbf{1};t)}{\partial {s}_{i}\partial {s}_{j}}={L}_{ij}^{(1)}(t)+2p{\int}_{0}^{t}\frac{{\partial}^{2}{H}^{(1)}(\mathbf{1};t-\tau )}{\partial {s}_{i}\partial {s}_{j}}d{G}_{1}(\tau ),$$

where

$${L}_{ij}^{(1)}(t)=2p{\int}_{0}^{t}{m}_{i}(t-\tau ){m}_{j}(t-\tau )d{G}_{1}(\tau ).$$

We deduce that

$$\frac{{\partial}^{2}{H}^{(1)}(\mathbf{1};t)}{\partial {s}_{i}\partial {s}_{j}}={\int}_{0}^{t}{m}_{i}(t-\tau ){m}_{j}(t-\tau )\sum _{k=1}^{\infty}{(2p)}^{k}d{G}_{1}^{\ast k}(\tau ),i,j=1,2.$$

The expectations *m*_{1}(*t* − *τ*) and *m*_{2}(*t* − *τ*) in the right-hand side of the above identity can be computed explicitly using the expressions given before. The integrals, however, have quite tedious expressions, but they can be easily approximated using methods of numerical integration such as the Simpson or the Lobatto methods (Carnahan, Luther and Wilkes, 1969). The variances and covariance can then be deduced from the above expressions together with (2) in Lemma 1.

Define the first order partial derivatives ∂*m*(*t*)/∂*θ* = {∂*m*_{1}(*t*)/∂*θ*, ∂*m*_{2}(*t*)/∂*θ*}, where

$$\frac{\partial {m}_{k}(t)}{\partial \theta}={\left\{\frac{\partial {m}_{k}(t)}{\partial p},\frac{\partial {m}_{k}(t)}{\partial {\lambda}_{1}},\frac{\partial {m}_{k}(t)}{\partial {\lambda}_{2}}\right\}}^{\prime},k=1,2\phantom{\rule{-0.2em}{0ex}}.$$

The entries of ∂*m*(*t; θ*)/∂*θ* are given by

$$\begin{array}{l}\frac{\partial {m}_{1}(t)}{\partial p}=-{G}_{1}(t)+\sum _{l=1}^{\infty}\left(2l{(2p)}^{l-1}{G}_{1}^{\ast l}(t)-(l+1){(2p)}^{l}{G}_{1}^{\ast (l+1)}(t)\right)-\frac{\partial {m}_{2}(t)}{\partial p},\\ \frac{\partial {m}_{1}(t)}{\partial {\lambda}_{1}}=\sum _{l=0}^{\infty}{(2p)}^{l}\phantom{\rule{0.2em}{0ex}}\left(\frac{\partial {G}_{1}^{\ast l}(t)}{\partial {\lambda}_{1}}-p\frac{\partial {G}_{1}^{\ast (l+1)}(t)}{\partial {\lambda}_{1}}\right)-\frac{\partial {m}_{2}(t)}{\partial {\lambda}_{1}},\\ \frac{\partial {m}_{1}(t)}{\partial {\lambda}_{2}}=\frac{\partial {m}_{2}(t)}{\partial {\lambda}_{2}},\end{array}$$

and

$$\begin{array}{l}\frac{\partial {m}_{2}(t)}{\partial p}=-{G}_{2}(t)+\sum _{l=1}^{\infty}\left(2l{(2p)}^{l-1}-(l+1){(2p)}^{l}\right){G}_{1}^{\ast l}\ast {G}_{2}(t),\\ \frac{\partial {m}_{2}(t)}{\partial {\lambda}_{1}}=(1-p)\sum _{l=0}^{\infty}{(2p)}^{l}\frac{\partial {G}_{1}^{\ast l}\ast {G}_{2}(t)}{\partial {\lambda}_{1}},\\ \frac{\partial {m}_{2}(t)}{\partial {\lambda}_{2}}=(1-p)\sum _{l=0}^{\infty}{(2p)}^{l}\frac{\partial {G}_{1}^{\ast l}\ast {G}_{2}(t)}{\partial {\lambda}_{2}}.\end{array}$$

The first order partial derivatives in the above quantities can be computed explicitly: for *l* ≥ 1,

$$\frac{\partial {G}_{1}^{\ast l}\ast {G}_{2}(t)}{\partial {\lambda}_{1}}=\frac{l{\lambda}_{2}^{l}}{{({\lambda}_{2}-{\lambda}_{1})}^{l+1}}{G}_{2}(t)-\sum _{i=0}^{l-1}\phantom{\rule{0.2em}{0ex}}\left(\frac{{\lambda}_{2}^{i}(i{\lambda}_{1}+{\lambda}_{2})}{{({\lambda}_{2}-{\lambda}_{1})}^{i+2}}{G}_{1}^{\ast (l-i)}(t)+\frac{{\lambda}_{1}{\lambda}_{2}^{i}}{{({\lambda}_{2}-{\lambda}_{1})}^{i+1}}\frac{\partial {G}_{1}^{\ast (l-i)}(t)}{\partial {\lambda}_{1}}\right),$$

and

$$\frac{\partial {G}_{1}^{\ast l}\ast {G}_{2}(t)}{\partial {\lambda}_{2}}=\frac{-l{\lambda}_{1}{\lambda}_{2}^{l-1}}{{({\lambda}_{2}-{\lambda}_{1})}^{l+1}}{G}_{2}(t)+\frac{{\lambda}_{2}^{l}}{{({\lambda}_{2}-{\lambda}_{1})}^{l}}\frac{\partial {G}_{2}(t)}{\partial {\lambda}_{2}}+\sum _{i=0}^{l-1}\frac{{\lambda}_{1}{\lambda}_{2}^{i-1}(i{\lambda}_{1}+{\lambda}_{2})}{{({\lambda}_{2}-{\lambda}_{1})}^{i+2}}{G}_{1}^{\ast (l-i)}(t),$$

where

$$\frac{\partial {G}_{i}^{\ast l}(t)}{\partial {\lambda}_{j}}=-{e}^{-t\phantom{\rule{-0.2em}{0ex}}/\phantom{\rule{-0.2em}{0ex}}{\lambda}_{i}}\frac{{t}^{l}}{{\lambda}_{i}^{l+1}(l-1)!}$$

if *i* = *j*; the derivative is identically zero otherwise.

We conducted a simulation study to compare the quasi- and pseudo-maximum likelihood estimators for the two-type process. The study design was identical to that used in the previous simulation studies, except that every realization of the process was observed at a single time point (*n _{i}* = 1). Both quasi-likelihood approaches are therefore identical here. The results are shown in Table 3.

Means, standard errors (se), and mean squared errors (mse) for two estimators of the parameters for a two-type process: quasi-likelihood (QLE) and Gaussian pseudo-maximum likelihood (PLE) estimators.

Unlike the simulation study on the pure birth process, this one indicates that both estimators perform similarly in the setting under consideration. The standard errors of both estimators decrease for all three parameters as the sample size increases. The estimator of the probability of division is essentially unbiased. The estimators of the mean time to division and of the mean time to differentiation are slightly biased, but these biases remain reasonable and decrease with increasing sample size. Both estimation methods are able to estimate the parameters of the lifespan distributions, even though these distributions are dissimilar.

We have investigated quasi- and pseudo-maximum likelihood estimation for continuous-time Markov branching processes. These methods offer generally applicable approaches to statistical inference, and they are particularly useful in situations where maximum likelihood estimation is difficult. Both approaches allow conventional and conditional inference, and can lead to optimal estimation by applying the theory of estimating equations (Godambe, 1985). When relying only on first and second order moments of the process, both methods will generally be robust with respect to misspecifications of higher order moments. Nonetheless, they sometimes coincide with maximum likelihood estimation. Although the methods of quasi- and pseudo-likelihood are generally distinct, they may result in equivalent estimators under certain conditions (e.g., we have shown that they can be asymptotically equivalent on the set of non-extinction). When using linear combinations of the data only, quasi-likelihood estimators may be more prone to non-identifiability issues than Gaussian pseudo-likelihood estimators. Two mitigating measures were identified: the use of (*i*) conditional quasi-likelihood functions where conditioning is done on appropriate sigma-algebras (such as the one generated by the event of non-extinction), and (*ii*) quasi-likelihood functions involving non-linear functions of the data (e.g., quadratic estimating equations). We found that both options resulted in estimators with poor performance. In contrast, the conditional Gaussian pseudo-maximum likelihood estimator offered a better alternative that not only mitigated the non-identifiability issue but also demonstrated better properties in simulation studies. Direct derivation of maximum likelihood estimators for Markov branching processes from the likelihood function may be difficult.
Keiding (1975) derived an expression for the maximum likelihood estimator of the difference between the birth and death rates of a birth-and-death process indirectly, using a modified sampling scheme; quasi-likelihood yields this estimator directly. Generally speaking, quasi- and maximum likelihood estimators are identical when the true distribution of the process belongs to the linear exponential family (Wedderburn, 1974). This result applies to the pure birth process, but not necessarily to more complex processes such as the birth-and-death process. Although not optimal, we found that pseudo-likelihood estimators can sometimes perform better than quasi-likelihood estimators.
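For the pure birth process observed at equidistant times, the quasi-score *Q*_{2} derived in the Appendix sets a weighted sum of the increments *Y _{ij}* − *Y _{i,j−1}m*(Δ; *θ*) to zero; because the weight *m*′/ν does not depend on the data, the root is available in closed form, θ̂ = Δ^{−1} log(Σ*Y _{ij}*/Σ*Y _{i,j−1}*). A minimal simulation sketch (the rate, spacing, and sample sizes below are arbitrary illustrative choices; the geometric transition law is the standard discrete-time embedding of the Yule process):

```python
import numpy as np

# Quasi-likelihood estimation for the pure birth (Yule) process observed at
# equidistant times t_j = jΔ.  Given Y(t) = k, Y(t+Δ) is a sum of k iid
# Geometric(e^{-θΔ}) variables, with conditional mean k e^{θΔ} = k m(Δ;θ).
rng = np.random.default_rng(2024)
theta0, delta = 0.7, 0.5          # true rate and sampling interval (assumed values)
n_real, n_times = 200, 5          # realizations and observations per realization

num = den = 0.0                   # accumulate Σ Y_ij and Σ Y_{i,j-1} over all i, j
for _ in range(n_real):
    y = 1                         # each realization starts from a single ancestor
    for _ in range(n_times):
        y_next = rng.geometric(np.exp(-theta0 * delta), size=y).sum()
        num += y_next
        den += y
        y = y_next

# Root of Q_2(θ) = 0: m(Δ; θ̂) = num/den, hence θ̂ = log(num/den)/Δ.
theta_hat = np.log(num / den) / delta
```

For the Yule process this root is also the maximum likelihood estimator based on the same discrete observations, consistent with the linear exponential family result cited above.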

This research was supported by grants R01 NS39511, R01 CA134839, R01 AI069351, and UL1 RR024160 from the National Institutes of Health (NIH).

Without loss of generality, assume *n* = 1, and write *t _{j}* for

The first quasi-score function defined in (5) takes the form

$$Q_1(\theta)=\frac{\partial M(t;\theta)'}{\partial\theta}\,\Omega_1(\theta)^{-1}\{Y-M(t;\theta)\},$$

while the second one is given by

$$Q_2(\theta)=\frac{\partial M(\Delta t;\theta)'}{\partial\theta}\,D(\theta)^{-1}\{Y-Y\ast M(\Delta t;\theta)\}.$$

Define an *n*_{1} × *n*_{1} matrix

$$\Psi(\theta)=\left(\begin{array}{cccc}1&0&\cdots&0\\ -e^{\Delta t_2\theta}&1&\ddots&\vdots\\ \vdots&\ddots&\ddots&0\\ 0&\cdots&-e^{\Delta t_{n_1}\theta}&1\end{array}\right).$$

Direct computation shows that *Y* − *Y*∗*M*(Δ*t; θ*) = Ψ(*θ*){*Y* − *M*(*t; θ*)}. Therefore, *Q*_{2}(*θ*) can be rewritten as

$$Q_2(\theta)=\frac{\partial M(\Delta t;\theta)'}{\partial\theta}\,D(\theta)^{-1}\Psi(\theta)\{Y-M(t;\theta)\}.$$
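The identity *Y* − *Y*∗*M*(Δ*t; θ*) = Ψ(*θ*){*Y* − *M*(*t; θ*)} can be checked numerically. In the sketch below, the *j*-th entry of *Y* − *Y*∗*M*(Δ*t; θ*) is *Y _{j}* − *Y _{j−1}e*^{Δ*t _{j}θ*} (with *Y*_{0} = 1 ancestor); the rate, observation times, and data values are arbitrary illustrative choices:

```python
import numpy as np

# Numerical check of Y - Y*M(Δt;θ) = Ψ(θ){Y - M(t;θ)} for the pure birth process.
theta = 0.4
t = np.array([1.0, 2.5, 3.0, 4.5])            # observation times t_1 < ... < t_n
dt = np.diff(t, prepend=0.0)                   # Δt_j = t_j - t_{j-1}
Y = np.array([3.0, 7.0, 9.0, 20.0])            # one realization, started from Y_0 = 1
Ylag = np.concatenate(([1.0], Y[:-1]))

# Entry j of Y - Y*M(Δt;θ) is Y_j - Y_{j-1} e^{Δt_j θ}.
lhs = Y - Ylag * np.exp(dt * theta)

# Ψ(θ): identity matrix with -e^{Δt_j θ} on the subdiagonal; M(t;θ)_j = e^{t_j θ}.
Psi = np.eye(len(t)) - np.diag(np.exp(dt[1:] * theta), k=-1)
rhs = Psi @ (Y - np.exp(t * theta))

assert np.allclose(lhs, rhs)
```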

In order to prove that *Q*_{1}(*θ*) = *Q*_{2}(*θ*), it suffices to show that

$$\frac{\partial M(t;\theta)'}{\partial\theta}=\frac{\partial M(\Delta t;\theta)'}{\partial\theta}\,D(\theta)^{-1}\Psi(\theta)\,\Omega_1(\theta).$$

(29)

Computing the *i*-th entry of the right-hand side of (29) yields

$$\begin{array}{lll}
\left\{\dfrac{\partial M(\Delta t;\theta)'}{\partial\theta}D(\theta)^{-1}\Psi(\theta)\right\}\Omega_1(\theta)_i
&=&\left(\begin{array}{c}\dfrac{\Delta t_1}{e^{\Delta t_1\theta}-1}-\dfrac{\Delta t_2 e^{\Delta t_2\theta}}{e^{\Delta t_2\theta}-1}\\ \vdots\\ \dfrac{\Delta t_{n-1}}{e^{\Delta t_{n-1}\theta}-1}-\dfrac{\Delta t_n e^{\Delta t_n\theta}}{e^{\Delta t_n\theta}-1}\\ \dfrac{\Delta t_n}{e^{\Delta t_n\theta}-1}\end{array}\right)'\left(\begin{array}{c}e^{t_i\theta}(e^{t_1\theta}-1)\\ \vdots\\ e^{t_i\theta}(e^{t_i\theta}-1)\\ \vdots\\ e^{t_n\theta}(e^{t_i\theta}-1)\end{array}\right)\\
&=&(\Delta t_1+\cdots+\Delta t_i)e^{t_i\theta}-(e^{t_i\theta}-1)\left(\dfrac{\Delta t_{i+1}e^{t_{i+1}\theta}}{e^{\Delta t_{i+1}\theta}-1}-\dfrac{\Delta t_{i+1}e^{t_{i+1}\theta}}{e^{\Delta t_{i+1}\theta}-1}+\cdots+\dfrac{\Delta t_n e^{t_n\theta}}{e^{\Delta t_n\theta}-1}-\dfrac{\Delta t_n e^{t_n\theta}}{e^{\Delta t_n\theta}-1}\right)\\
&=&t_i e^{t_i\theta},
\end{array}$$

which is the *i*-th entry of ∂*M*(*t; θ*)′/∂*θ*.
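Each entry of this telescoping computation can be verified numerically. The sketch below forms the row vector on the right-hand side of (29) and the *i*-th column of Ω_{1}(*θ*), whose entries are cov_{*θ*}(*Y _{j}*, *Y _{i}*) = *e*^{*t*_max*θ*}(*e*^{*t*_min*θ*} − 1) for the pure birth process, and checks that their product equals *t _{i}e*^{*t _{i}θ*}; the rate and time points are arbitrary:

```python
import numpy as np

# Numerical check of equation (29): the row vector ∂M(Δt;θ)'/∂θ D(θ)^{-1} Ψ(θ)
# times the i-th column of Ω_1(θ) equals t_i e^{t_i θ}.
theta = 0.3
t = np.array([0.5, 1.2, 2.0, 3.5])            # arbitrary observation times
dt = np.diff(t, prepend=0.0)                   # Δt_j = t_j - t_{j-1}

# Entries of the row vector as displayed in the proof.
a = dt / (np.exp(dt * theta) - 1.0)
a[:-1] -= dt[1:] * np.exp(dt[1:] * theta) / (np.exp(dt[1:] * theta) - 1.0)

# i-th column of Ω_1(θ): cov_θ(Y_j, Y_i) = e^{t_max θ}(e^{t_min θ} - 1).
products = np.array([
    a @ (np.exp(np.maximum(t, ti) * theta) * (np.exp(np.minimum(t, ti) * theta) - 1.0))
    for ti in t
])
assert np.allclose(products, t * np.exp(t * theta))
```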

For the pure birth process observed at an increasing number of realizations at uniformly bounded numbers of equidistant time points (*n* → ∞, sup* _{i}n_{i}* =

$$Q_2(\theta)=\sum_{i=1}^{n}\sum_{j=1}^{n_i}\frac{m'(\Delta;\theta)}{\nu(\Delta,\Delta;\theta)}\{Y_{ij}-Y_{i,j-1}m(\Delta;\theta)\}=\sum_{r=1}^{R}\sum_{i=1}^{N_r}\sum_{j=1}^{r}\frac{m'(\Delta;\theta)}{\nu(\Delta,\Delta;\theta)}\{Y_{ij}-Y_{i,j-1}m(\Delta;\theta)\},$$

where *N _{r}* denotes the number of realizations observed at

$$\frac{Q_2(\theta)}{\sqrt{n}}=\sum_{r=1}^{R}\frac{\sqrt{N_r}}{\sqrt{n}}\frac{1}{\sqrt{N_r}}\sum_{i=1}^{N_r}\sum_{j=1}^{r}\frac{m'(\Delta;\theta)}{\nu(\Delta,\Delta;\theta)}\{Y_{ij}-Y_{i,j-1}m(\Delta;\theta)\}.$$

For *r* = 1, …, *R*, by invoking the independence of the *Y _{i}* = (

$$\frac{\sqrt{N_r}}{\sqrt{n}}\frac{1}{\sqrt{N_r}}\sum_{i=1}^{N_r}\left(\sum_{j=1}^{r}\frac{m'(\Delta;\theta_0)}{\nu(\Delta,\Delta;\theta_0)}\{Y_{ij}-Y_{i,j-1}m(\Delta;\theta_0)\}\right)\stackrel{\mathcal{D}}{\to}\mathcal{N}(0,\pi_r V_r(\theta_0)),\quad\text{as } N_r\to\infty,$$

where

$$V_r(\theta_0)=\mathrm{var}_{\theta_0}\left(\sum_{j=1}^{r}\frac{m'(\Delta;\theta_0)}{\nu(\Delta,\Delta;\theta_0)}\{Y_{1j}-Y_{1,j-1}m(\Delta;\theta_0)\}\right).$$

Write *C _{j}*(

$$\begin{array}{lll}
V_r(\theta_0)&=&\dfrac{m'(\Delta;\theta_0)^2}{\nu(\Delta,\Delta;\theta_0)^2}\sum_{j=1}^{r}\mathrm{var}_{\theta_0}\left(Y_{1j}-Y_{1,j-1}m(\Delta;\theta_0)\right)\\
&=&\dfrac{m'(\Delta;\theta_0)^2}{\nu(\Delta,\Delta;\theta_0)^2}\sum_{j=1}^{r}\left[\mathbb{E}_{\theta_0}\{\mathrm{var}_{\theta_0}(Y_{1j}-Y_{1,j-1}m(\Delta;\theta_0)\mid Y_{1,j-1})\}+\mathrm{var}_{\theta_0}(\mathbb{E}_{\theta_0}\{Y_{1j}-Y_{1,j-1}m(\Delta;\theta_0)\mid Y_{1,j-1}\})\right]\\
&=&\dfrac{m'(\Delta;\theta_0)^2}{\nu(\Delta,\Delta;\theta_0)^2}\sum_{j=1}^{r}\mathbb{E}_{\theta_0}\{\mathrm{var}_{\theta_0}(Y_{1j}\mid Y_{1,j-1})\}\\
&=&\dfrac{m'(\Delta;\theta_0)^2}{\nu(\Delta,\Delta;\theta_0)^2}\sum_{j=1}^{r}\mathbb{E}_{\theta_0}\{Y_{1,j-1}\,\nu(\Delta,\Delta;\theta_0)\}\\
&=&\dfrac{m'(\Delta;\theta_0)^2}{\nu(\Delta,\Delta;\theta_0)}\sum_{j=1}^{r}m((j-1)\Delta;\theta_0).
\end{array}$$

Since *m*(*j*Δ; *θ*_{0}) = *e*^{*j*Δ*θ*_{0}} = *m*(Δ; *θ*_{0})^{*j*}, summing the geometric series gives

$$V_r(\theta_0)=\frac{m'(\Delta;\theta_0)^2}{\nu(\Delta,\Delta;\theta_0)}\,\frac{1-m(\Delta;\theta_0)^r}{1-m(\Delta;\theta_0)}.$$

Therefore, we deduce that

$$\frac{Q_2(\theta_0)}{\sqrt{n}}\stackrel{\mathcal{D}}{\to}\mathcal{N}\left(0,\frac{m'(\Delta;\theta_0)^2}{\nu(\Delta,\Delta;\theta_0)}\sum_{r=1}^{R}\pi_r\frac{1-m(\Delta;\theta_0)^r}{1-m(\Delta;\theta_0)}\right).$$

Consider next the first order derivative

$$Q_2'(\theta)=\sum_{r=1}^{R}\sum_{i=1}^{N_r}\sum_{j=1}^{r}\left[\left(\frac{m'(\Delta;\theta)}{\nu(\Delta,\Delta;\theta)}\right)'\{Y_{ij}-Y_{i,j-1}m(\Delta;\theta)\}-Y_{i,j-1}\frac{m'(\Delta;\theta)^2}{\nu(\Delta,\Delta;\theta)}\right].$$

Notice that

$$\begin{array}{ll}
\mathbb{E}_{\theta_0}\left\{\left(\dfrac{m'(\Delta;\theta_0)}{\nu(\Delta,\Delta;\theta_0)}\right)'\{Y_{ij}-Y_{i,j-1}m(\Delta;\theta_0)\}-Y_{i,j-1}\dfrac{m'(\Delta;\theta_0)^2}{\nu(\Delta,\Delta;\theta_0)}\right\}
&=\mathbb{E}_{\theta_0}\left\{-Y_{i,j-1}\dfrac{m'(\Delta;\theta_0)^2}{\nu(\Delta,\Delta;\theta_0)}\right\}\\
&=-\dfrac{m'(\Delta;\theta_0)^2}{\nu(\Delta,\Delta;\theta_0)}\,m(\Delta;\theta_0)^{j-1}.
\end{array}$$

Using a Law of Large Numbers (Chow and Teicher, 2003), we deduce that

$$\begin{array}{lll}
\dfrac{Q_2'(\theta_0)}{n}&=&\displaystyle\sum_{r=1}^{R}\frac{N_r}{n}\frac{1}{N_r}\sum_{i=1}^{N_r}\sum_{j=1}^{r}\left[\left(\frac{m'(\Delta;\theta_0)}{\nu(\Delta,\Delta;\theta_0)}\right)'\{Y_{ij}-Y_{i,j-1}m(\Delta;\theta_0)\}-Y_{i,j-1}\frac{m'(\Delta;\theta_0)^2}{\nu(\Delta,\Delta;\theta_0)}\right]\\
&\stackrel{\mathbb{P}}{\to}&-\displaystyle\sum_{r=1}^{R}\pi_r\sum_{j=1}^{r}\frac{m'(\Delta;\theta_0)^2}{\nu(\Delta,\Delta;\theta_0)}m(\Delta;\theta_0)^{j-1}\\
&=&-\dfrac{m'(\Delta;\theta_0)^2}{\nu(\Delta,\Delta;\theta_0)}\displaystyle\sum_{r=1}^{R}\pi_r\frac{1-m(\Delta;\theta_0)^r}{1-m(\Delta;\theta_0)}.
\end{array}$$

Therefore, we have $\sqrt{n}(\widehat{\theta}-\theta_0)\stackrel{\mathcal{D}}{\to}\mathcal{N}(0,\Sigma(\theta_0))$, where

$$\begin{array}{ll}
\Sigma(\theta_0)&=\left[-\dfrac{m'(\Delta;\theta_0)^2}{\nu(\Delta,\Delta;\theta_0)}\displaystyle\sum_{r=1}^{R}\pi_r\frac{1-m(\Delta;\theta_0)^r}{1-m(\Delta;\theta_0)}\right]^{-2}\dfrac{m'(\Delta;\theta_0)^2}{\nu(\Delta,\Delta;\theta_0)}\displaystyle\sum_{r=1}^{R}\pi_r\frac{1-m(\Delta;\theta_0)^r}{1-m(\Delta;\theta_0)}\\
&=\dfrac{\nu(\Delta,\Delta;\theta_0)\,(1-m(\Delta;\theta_0))}{m'(\Delta;\theta_0)^2\sum_{r=1}^{R}\pi_r(1-m(\Delta;\theta_0)^r)}.
\end{array}$$

Substituting *m*(Δ; *θ*) = *e*^{Δ*θ*}, *m*′(Δ; *θ*) = Δ*e*^{Δ*θ*}, and ν(Δ, Δ; *θ*) = *e*^{Δ*θ*}(*e*^{Δ*θ*} − 1) yields

$$\sqrt{n}(\widehat{\theta}-\theta_0)\stackrel{\mathcal{D}}{\to}\mathcal{N}\left(0,\frac{(e^{\Delta\theta_0}-1)^2}{\Delta^2 e^{\Delta\theta_0}\sum_{r=1}^{R}\pi_r(e^{r\Delta\theta_0}-1)}\right).$$
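This limit can be checked by Monte Carlo. The sketch below uses a single number of observation times *r* = *R* (so π_{*R*} = 1), simulates many independent experiments, and compares the empirical variance of √*n*(θ̂ − *θ*_{0}) with the displayed expression; the parameter values and sample sizes are arbitrary illustrative choices:

```python
import numpy as np

# Monte Carlo check of the asymptotic variance of the quasi-likelihood
# estimator for the pure birth process, with all realizations observed at
# R equidistant times (π_R = 1).
rng = np.random.default_rng(7)
theta0, delta, R = 0.5, 0.5, 4     # assumed rate, spacing, times per realization
n, n_rep = 100, 400                # realizations per experiment, MC replicates
p = np.exp(-theta0 * delta)        # Yule transition: sums of iid Geometric(p)

stats = []
for _ in range(n_rep):
    num = den = 0.0
    for _ in range(n):
        y = 1
        for _ in range(R):
            y_next = rng.geometric(p, size=y).sum()
            num += y_next
            den += y
            y = y_next
    theta_hat = np.log(num / den) / delta      # closed-form root of Q_2(θ) = 0
    stats.append(np.sqrt(n) * (theta_hat - theta0))

emp_var = np.var(stats)
# Asymptotic variance from the display above, with π_R = 1:
asy_var = (np.exp(delta * theta0) - 1) ** 2 / (
    delta ** 2 * np.exp(delta * theta0) * (np.exp(R * delta * theta0) - 1)
)
assert 0.7 < emp_var / asy_var < 1.4           # agreement up to Monte Carlo error
```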

Straightforward calculations show that

$$\begin{array}{l}
\mathrm{Var}_\theta[(Z(t_{ij})-\mathbb{E}\{Z(t_{ij})\mid Z(t_{i,j-1})\})^2\mid Z(t_{i,j-1})]\\
\quad=\mathrm{Var}_\theta(Z(t_{ij})^2\mid Z(t_{i,j-1}))+4\,\mathbb{E}_\theta(Z(t_{ij})\mid Z(t_{i,j-1}))^2\,\mathrm{Var}_\theta(Z(t_{ij})\mid Z(t_{i,j-1}))\\
\qquad-4\,\mathbb{E}_\theta(Z(t_{ij})\mid Z(t_{i,j-1}))\,\mathrm{Cov}_\theta(Z(t_{ij})^2,Z(t_{ij})\mid Z(t_{i,j-1})),
\end{array}$$

where

$$\begin{array}{lll}
\mathrm{Var}_\theta(Z(t_{ij})^2\mid Z(t_{i,j-1}))&=&Z(t_{i,j-1})\{M_4(\Delta_{ij})-M_2^2(\Delta_{ij})\}+2Z(t_{i,j-1})(Z(t_{i,j-1})-1)\{M_2^2(\Delta_{ij})-M_1^4(\Delta_{ij})\}\\
&&+\,4Z(t_{i,j-1})(Z(t_{i,j-1})-1)\{M_3(\Delta_{ij})M_1(\Delta_{ij})-M_2(\Delta_{ij})M_1^2(\Delta_{ij})\},\\
\mathbb{E}_\theta(Z(t_{ij})\mid Z(t_{i,j-1}))&=&Z(t_{i,j-1})M_1(\Delta_{ij}),\\
\mathrm{Var}_\theta(Z(t_{ij})\mid Z(t_{i,j-1}))&=&Z(t_{i,j-1})\{M_2(\Delta_{ij})-M_1^2(\Delta_{ij})\},\\
\mathrm{Cov}_\theta(Z(t_{ij})^2,Z(t_{ij})\mid Z(t_{i,j-1}))&=&Z(t_{i,j-1})M_3(\Delta_{ij})+3Z(t_{i,j-1})(Z(t_{i,j-1})-1)M_1(\Delta_{ij})M_2(\Delta_{ij})\\
&&+\,6Z(t_{i,j-1})(Z(t_{i,j-1})-1)(Z(t_{i,j-1})-2)M_1^3(\Delta_{ij})-Z(t_{i,j-1})^3M_1(\Delta_{ij})M_2(\Delta_{ij}),
\end{array}$$

and where the *M _{k}*(Δ

Let *M*(*s*; *t*) = 𝔼(*e*^{*sZ*(*t*)}) denote the moment generating function of the process *Z*(*t*). Put *A* = *A*(*s, t*) = *e ^{t}*

$$\begin{array}{lll}
M^{(1)}(s,t)&=&\dfrac{A'}{B}-\dfrac{AB'}{B^2},\\
M^{(2)}(s,t)&=&M^{(1)}(s,t)\left(1-\dfrac{2B'}{B}\right),\\
M^{(3)}(s,t)&=&M^{(1)}(s,t)\left(1-\dfrac{6B'}{B}+\dfrac{6B'^2}{B^2}\right),\\
M^{(4)}(s,t)&=&M^{(1)}(s,t)\left(1-\dfrac{14B'}{B}+\dfrac{36B'^2}{B^2}-\dfrac{24B'^3}{B^3}\right),
\end{array}$$

from which we deduce the (non-central) moments *M _{k}*(
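These recursions use only the fact that *A* and *B* are linear in *e^{s}* (so that *A*″ = *A*′ and *B*″ = *B*′), as is the case for the fractional-linear moment generating function of the birth-and-death process. Under that assumption they can be verified symbolically; in the sketch below, the coefficients *a*, *b*, *c*, *d* are generic symbols, not the process-specific values of *A* and *B*:

```python
import sympy as sp

# Symbolic check of the M^(k) recursions for M = A/B with A, B linear in e^s.
s, a, b, c, d = sp.symbols('s a b c d')
A = a + b * sp.exp(s)          # assumed generic linear-in-e^s form of A
B = c + d * sp.exp(s)          # assumed generic linear-in-e^s form of B
M = A / B
M1 = sp.diff(M, s)             # M^(1)(s,t) = A'/B - AB'/B^2
Bp = sp.diff(B, s)

# Each difference should simplify to zero identically.
assert sp.simplify(sp.diff(M, s, 2) - M1 * (1 - 2 * Bp / B)) == 0
assert sp.simplify(sp.diff(M, s, 3) - M1 * (1 - 6 * Bp / B + 6 * Bp**2 / B**2)) == 0
assert sp.simplify(sp.diff(M, s, 4)
                   - M1 * (1 - 14 * Bp / B + 36 * Bp**2 / B**2 - 24 * Bp**3 / B**3)) == 0
```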


- Allen LJS. An Introduction to Stochastic Processes with Applications to Biology. Prentice Hall; 2003.
- Athreya KB, Ney PE. Branching Processes. Springer-Verlag; Berlin: 1972.
- Becker NG, Hasofer AM. Estimation in epidemics with incomplete observations. J R Statist Soc B. 1997;59:415–430.
- Boucher K, Yakovlev AY, Mayer-Pröschel M, Noble M. A stochastic model of temporarily regulated generation of oligodendrocytes in vitro. Math Biosci. 1999;159:47–78. [PubMed]
- Boucher K, Zorin A, Yakovlev AY, Mayer-Pröschel M, Noble M. An alternative stochastic model of generation of oligodendrocytes in cell culture. J Math Biol. 2001;43:22–36. [PubMed]
- Carnahan B, Luther HA, Wilkes JO. Applied Numerical Methods. John Wiley and Sons; New York: 1969.
- Chen R, Hyrien O, Mayer-Proschel M, Noble M. A composite likelihood approach to the analysis of longitudinal clonal data under an age-dependent branching process. Biostatistics. 2010 to appear. [PMC free article] [PubMed]
- Chow YS, Teicher H. Probability Theory: Independence, Interchangeability, Martingales. Springer; New York: 2003.
- von Collani E, Tsodikov A, Yakovlev A, Mayer-Pröschel M, Noble M. A random walk model of oligodendrocyte generation in vitro and associated estimation problems. Math Biosci. 1999;159:189. [PubMed]
- Concordet D, Nuñez O. A simulated pseudo-maximum likelihood estimator for nonlinear mixed models. Computational Statistics and Data Analysis. 2002;39:187–201.
- Darwin JH. The behavior of an estimator for a simple birth-and-death process. Biometrika. 1956;43:23–31.
- Dion JP, Keiding N. Statistical inference in branching processes. In: Joffe A, Ney PE, editors. Branching Processes. Dekker; New York: 1978. pp. 105–140.
- Dion JP, Yanev NM. Statistical inference for branching processes with an increasing random number of ancestors. J Statist Planning Inf. 1994;39:329–359.
- Godambe VP. The foundations of finite sample estimation in stochastic processes. Biometrika. 1985;72:419–428.
- Gourieroux C, Monfort A, Trognon A. Pseudo maximum likelihood methods: theory. Econometrica. 1984;52(3):681–700.
- Guttorp P. Statistical Inference for Branching Processes. John Wiley and Sons; New York: 1991.
- Haccou P, Jagers P, Vatutin VA. Branching Processes: Variation, Growth and Extinction of Populations. Cambridge University Press; Cambridge: 2005.
- Heyde CC. Extension of a result of Seneta for the supercritical Galton-Watson process. Ann Math Statist. 1970;41:739–742.
- Heyde CC. Remarks on efficiency in estimation for branching processes. Biometrika. 1975;62:49–55.
- Heyde CC. Quasi-Likelihood and Its Applications A General Approach to Optimal Parameter Estimation. Springer-Verlag; New-York.: 1997.
- Hoel DG, Crump KS. Estimating the generation-time distribution of an age-dependent branching process. Biometrics. 1974;30:125–135. [PubMed]
- Hyrien O. A pseudo maximum likelihood estimator for discretely observed multi-type Bellman-Harris branching processes. J Statist Planning Inf. 2007;137:1375–1388.
- Hyrien O, Ambeskovic I, Mayer-Pröschel M, Noble M, Yakovlev A. Stochastic Modeling of oligodendrocyte generation in cell culture: model validation with time-lapse data. Theor Biol Medical Model. 2006;3:21. [PMC free article] [PubMed]
- Hyrien O, Chen R, Mayer-Pröschel M, Noble M. Saddlepoint approximations to the moments of multi-type age-dependent branching processes, with applications. Biometrics. 2010a;66:567–577. [PMC free article] [PubMed]
- Hyrien O, Chen R, Zand MS. An age-dependent branching process model for the analysis of CFSE-labeling experiments. Biology Direct. 2010b;5:41. [PMC free article] [PubMed]
- Hyrien O, Dietrich J, Noble M. Mathematical and experimental approaches to identify and predict the effects of chemotherapy on neuroglial precursors. Cancer Research. 2010c to appear. [PMC free article] [PubMed]
- Hyrien O, Mayer-Pröschel M, Noble M, Yakovlev A. A stochastic model to analyze clonal data on multi-type cell populations. Biometrics. 2005a;61:199–207. [PubMed]
- Hyrien O, Mayer-Pröschel M, Noble M, Yakovlev A. Estimating the life-span of oligodendrocytes from clonal data on their development in cell culture. Math Biosci. 2005b;193:255–274. [PubMed]
- Hyrien O, Zand MS. A mixture model with dependent observations for the analysis of CFSE-labeling experiments. J Am Statist Ass. 2008;103:222–239.
- Jacob C. Conditional least squares estimation in nonstationary nonlinear stochastic regression models. Ann Statist. 2010;38:566–597.
- Jagers P. Branching Processes with Biological Applications. John Wiley and Sons; London: 1975.
- Johnson NL, Kotz S. Continuous Univariate Distributions I. Houghton Mifflin Company; Boston: 1970.
- Keiding N. Estimation in the birth process. Biometrika. 1974;61:71–80.
- Keiding N. Maximum likelihood Estimation in the birth-and-death process. Ann Statist. 1975;2:363–372.
- Kendall DG. Stochastic processes and population growth. J R Statist Soc B. 1949;11:230–264. with discussion.
- Kimmel M, Axelrod DE. Branching Processes in Biology. Springer; New York: 2002.
- Klimko LA, Nelson PI. On conditional least squares estimation for stochastic processes. Ann Statist. 1978;6:629–642.
- Nedelman J, Downs H, Pharr P. Inference on an age-dependent, multitype branching-process model of mast cells. J Math Biol. 1985;25:203–226. [PubMed]
- Oh C, Severo NC, Slivka J. Approximations to the maximum likelihood estimate in some pure birth processes. Biometrika. 1991;78:295–299.
- Sevast'yanov BA. Branching Processes. MIR; Moscow: 1971 (in Russian).
- Sevast'yanov BA, Chistyakov VP. Multidimensional renewal equations and moments of branching processes. Theor Probab Appl. 1971;16:199–214.
- Wedderburn RWM. Quasi-likelihood functions, generalized linear models, and the Gauss-Newton method. Biometrika. 1974;61:439–447.
- Yakovlev AY, Boucher K, Mayer-Pröschel M, Noble M. Quantitative insight into proliferation and differentiation of oligodendrocyte-type 2 astrocyte progenitor cells in vitro. Proc Natl Acad Sci USA. 1998a;95:14164–14167. [PubMed]
- Yakovlev AY, Mayer-Pröschel M, Noble M. A stochastic model of brain cell differentiation in tissue culture. J Math Biol. 1998b;37:49–60. [PubMed]
- Yakovlev AY, Mayer-Pröschel M, Noble M. Stochastic formulations of a clock model for temporally regulated generation of oligodendrocytes in vitro. Math Comput Model. 2000;32:125–137.
- Yakovlev AY, Yanev NM. Transient Processes in Cell Proliferation Kinetics. Springer; Heidelberg: 1989.
- Yakovlev AY, Yanev NM. Relative frequencies in multitype branching processes. Ann Appl Probab. 2009;19:1–14.
- Yanev NM. On the statistics of branching processes. Theory Prob Appl. 1975;20:612–622.
- Yanev NM. Statistical inference for branching processes. In: Ahsanullah M, Yanev GP, editors. Records and Branching Processes. Nova Science Publishers Inc.; New York: 2008.
- Zorin AA, Yakovlev AY, Mayer-Pröschel M, Noble M. Estimation problems associated with stochastic modeling of proliferation and differentiation of O-2A progenitor cells in vitro. Math Biosci. 2000;167:109–121. [PubMed]
