Home | About | Journals | Submit | Contact Us | Français |

**|**HHS Author Manuscripts**|**PMC2888915

Formats

Article sections

- Summary
- 1. Introduction
- 2. A multitype age-dependent branching process
- 3. Statistical inference from clonal data
- 4. Analysis of experimental clonal data
- 5. Approximating the distribution of the generation
- 6. Discussion
- References

Authors

Related links

Biometrics. Author manuscript; available in PMC 2010 June 29.

Published in final edited form as:

Published online 2009 June 9. doi: 10.1111/j.1541-0420.2009.01281.x

PMCID: PMC2888915

NIHMSID: NIHMS117734

Department of Biomedical Genetics, University of Rochester Medical Center 601 Elmwood Avenue, Rochester, NY 14642, USA

The publisher's final edited version of this article is available at Biometrics

See other articles in PMC that cite the published article.

This article proposes saddlepoint approximations to the expectation and variance-covariance function of multitype age-dependent branching processes. The proposed approximations are found accurate, easy to implement, and much faster to compute than by simulating of the process. Multiple applications are presented, including the analyses of clonal data on the generation of oligodendrocytes from their immediate progenitor cells, and on the proliferation of Hela cells. New estimators are also constructed to analyze clonal data. The proposed methods are finally used to approximate the distribution of the generation, which has recently found several applications in cell biology.

Age-dependent branching processes have become well-established means to study the dynamics of complex biological systems. Recently, they have found successful applications in such fields as stem cell biology to study the processes of division, differentiation and death of progenitor cells (Yakovlev et al, 1998; Boucher et al, 1999, 2001; Zorin et al, 2000; Hyrien et al, 2005a, 2005b), or to investigate the propagation of prions in yeast (Morgan, Ridout, and Ruddock, 2003). Additional examples can be found in relevant monographs (Jagers, 1975; Yakovlev and Yanev, 1989; Kimmel and Axelrod, 2002; Haccou, Vatutin and Jagers, 2005).

The practical implementation of age-dependent branching processes, however, remains hindered by the fact that most characteristics of the process, including the moments of the population size at any given time, do not have closed-form expressions. Three approaches are available to mitigate this limitation. A first approach, which is rarely satisfactory, consists in using a simplified version of the model (such as discrete- or continuous-time Markov branching processes) that allows computations to be carried out explicitly. A second approach is to use asymptotic approximations of the characteristics of interest as time goes to infinity. For instance, Hoel and Crump (1974) used the asymptotic form of the expected population size to estimate parameters of the distribution of the cell cycle time. More recently, Ridout et al (2006) and Cole et al (2007) used a similar approach to approximate the malthusian parameter and the expected generation number of age-dependent branching processes. However, the asymptotic behavior of the process may be diffcult to study for multitype processes with complex branching structures, and the accuracy of asymptotic approximations may not always prove satisfactory for transient properties. A third approach is to resort to computer simulations of the process to generate the desired approximations. Zorin et al (2000), Boucher et al (2001), Hyrien et al (2005a, 2005b), Hanin et al (2006), Golinelli et al. (2006), and Hyrien (2007) resorted to such a technique to estimate cell kinetics parameters when the process is observed at discrete points in time (a situation sometimes referred to as clonal experiments in cell biology). The simulation-based approach does not have the limitations of the above two methods, but it may break down for several reasons.

Firstly, it is time-consuming when a large number of cells have to be simulated, either because the number of initial ancestors is large, or because simulations have to be conducted on an extended period of time, or because the process is supercritical (that is, the expected number of offsprings per cell is greater than one). Supercritical processes provide reasonable models of the earliest stages of cell proliferation where the population size may grow exponentially (Jagers, 1975). Consider for instance the branching process for which every cell divides into two cells upon completion of its mitotic cycle, and assume that the duration of the cycle follows a gamma distribution with mean 36 hours and standard deviation 45 hours. To approximate moments of the population size at day 8, starting with a single cell at time 0, we could consider simulating 10, 000 realizations of the process (in our past work we have often used 100,000 simulations; Hyrien et al, 2005a, 2005b), and compute relevant empirical moments of cell counts. These simulations would take us about 1/2 hour on average. When decreasing the mean mitotic cycle duration to 34 or 32 hours, we would need 1.6 and 6.1 hours instead to perform the same task. The computing time is thus very sensitive to how fast the population grows. If such approximations were used to estimate model parameters (as is the case with the simulated pseudo-likelihood estimator (Hyrien, 2007)), the above 10,000 simulations would have to be repeated thousands of times to optimize the estimating function, which would make the approach very time consuming in that particular case.

The second limitation has to do with the discrete nature of branching processes which makes simulation-based approximations non-differentiable (Hyrien, 2007). This precludes the use of a variety of estimation techniques, including quasi-likelihood estimators (Hyrien, 2007) which have been proven to enjoy optimality properties (Heyde, 1997).

The objective of the present paper is to propose a novel approach to approximate moments of multitype age-dependent branching processes, which does not suffer from the above-mentioned limitations. The proposed approach rests on the technique of saddlepoint approximations, which is applied to suitable decompositions of the moments. Our investigations demonstrated that this approach can be thousand times faster than the method using direct computer simulations of the process, while still providing very accurate approximations (see Supplementary Materials). In addition, the use of saddlepoint approximations makes it available a wider range of estimation techniques for discretely observed branching processes, such as the methods of quasi-likelihood and generalized estimating equation.

One cellular system that has been extensively studied using age-dependent branching processes is that of oligodendrocytes (the cells of the central nervous system (CNS) that enwrap axons with myelin sheaths) generated from their immediate ancestral progenitor cells (Raff et al, 1983). The oligodendrocyte-type 2 astrocyte progenitor cell (also referred to as an oligodendrocyte precursor cell and here abbreviated as O2A/OPC) can be isolated from multiple regions of the CNS and studied *in vitro*. The O2A/OPC lineage has proven to be exceptionally suitable for studying differentiation at the clonal level. Dividing O2A/OPCs can be readily distinguished from oligodendrocytes on the basis of cell morphology (the former being bipolar with small elliptical cell bodies and the latter being multipolar with large round cell bodies), enabling unambiguous identification of every cell in a clone (e.g, Noble et al, 1988). Moreover, O2A/OPCs readily establish clones from single founder cells, and the subsequent development of these clones can be followed over time. (see Figure 1 for an illustration of the development of a clone of O2A/OPCs). Clonal analyses have been of enormous value in investigating the control of differentiation at the single cell level. Despite the power of clonal analyses in the study of differentiation, they also have an important limitation. The complete histories of the cellular trees are not fully observed, but instead the composition of the clones at the times of observation are used to provide information on the proliferative patterns of the cells. The quantitative analysis of such experiments calls for specific estimation methods. One objective of this article is to propose new methods for such data built on the proposed saddlepoint approximations.

The remainder of this article is organized as follows. Section 2 describes the process under consideration along with the proposed saddlepoint approximations. New estimation methods for clonal data using branching processes are presented in section 3. These methods are used in section 4.1 to analyze the generation of oligodendrocytes from cultured O2A/OPCs. The subsequent sections present two additional applications. In section 4.2 we analyze a second set of clonal data on the proliferation of cultured Hela cells, where we investigate the assumption that the distribution of the time to division does not depend on the generation, an assumption that is not often assessed in cell kinetic studies. Finally, our last illustration (section 5) considers the problem of approximating the distribution of the generation, which has found recent applications in the analysis of cell proliferation kinetics (Cole et al, 2006; Hyrien and Zand, 2008).

We model cell kinetics using a multitype age-dependent branching process presented by Hyrien et al (2005a), which generalizes the classical multitype Bellman-Harris process. The process under consideration is akin to a two-type process earlier proposed by Jagers (1975), and it can also be viewed as a multitype extension of the Sevastyanov’s process.

The process under consideration describes a population consisting of *K* cell types (1 ≤ *K* ≤ ∞). Without loss of generality, this process begins at time 0 with a single initiator cell of type 1 and age zero. Upon completing its lifespan (that is, the time to transformation of the cell, such as division, differentiation or death, computed from its birth), every type-*k* cell, *k* = 1, …, *K*, generates *ξ*_{k,1} type-1 cells, …, *ξ _{k,K}* type-

Let *τ _{k}* denote the lifespan of any type-

For every *a, j, k* = 1, …, *K*, let ${\mathrm{Z}}^{\left(a\right)}\left(t\right)={\left\{{Z}_{1}^{\left(a\right)}\right(t),\cdots ,{Z}_{K}^{\left(a\right)}(t\left)\right\}}^{\prime}$, where ${Z}_{k}^{\left(a\right)}\left(t\right)$ denotes the number of type-*k* cells at time *t* ≥ 0 arising from a single ancestor cell of type-*a* (in subsequent sections, the superscript *a* shall be dropped whenever unecessary). Let ${m}_{k}^{\left(a\right)}\left(t\right)=\mathbb{E}\left\{{Z}_{k}^{\left(a\right)}\right(t\left)\right\}$ and ${v}_{j,k}^{\left(a\right)}\left(t\right)=\mathrm{Cov}\left\{{Z}_{j}^{\left(a\right)}\right(t),{Z}_{k}^{\left(a\right)}(t\left)\right\}$ denote the expectation and covariance function of the process. Write ${{\mathrm{m}}^{\left(a\right)\left(t\right)=\left\{{m}_{1}^{\left(a\right)}\right(t),\cdots ,{m}_{K}^{\left(a\right)}(t\left)\right\}}}^{\prime}$ and ${\mathrm{V}}^{\left(a\right)}\left(t\right)={\left[{v}_{j,k}^{\left(a\right)}\left(t\right)\right]}_{j,k}$.

The expectations satisfy the system of functional equations

$${m}_{k}^{\left(a\right)}\left(t\right)=\left\{1-\sum _{\mathrm{x}\in {S}_{a}}{p}_{a}\left(\mathrm{x}\right){F}_{a,\mathrm{x}}\left(t\right)\right\}{1}_{\{a=k\}}+\sum _{\mathrm{x}\in {S}_{a}}{p}_{a}\left(\mathrm{x}\right)\sum _{\alpha =1}^{K}{x}_{\alpha}{\int}_{0}^{t}{m}_{k}^{\left(\alpha \right)}(t-\tau )d{F}_{a,\mathrm{x}}\left(\tau \right),$$

where 1_{{a=k}} is the indicator function. The variance-covariance function can be written as

$${v}_{jk}^{\left(a\right)}\left(t\right)={r}_{jk}^{\left(a\right)}\left(t\right)-{m}_{j}^{\left(a\right)}\left(t\right){m}_{k}^{\left(a\right)}\left(t\right)+{m}_{j}^{\left(a\right)}\left(t\right){1}_{\{j=k\}},$$

where

$${r}_{jk}^{\left(a\right)}\left(t\right)={b}_{jk}^{\left(a\right)}\left(t\right)+\sum _{\mathrm{x}\in {S}_{a}}{p}_{a}\left(\mathrm{x}\right)\sum _{\alpha =1}^{K}{x}_{\alpha}{\int}_{0}^{t}{r}_{jk}^{\left(\alpha \right)}(t-\tau )d{F}_{a,\mathrm{x}}\left(\tau \right),$$

and where

$$\begin{array}{cc}\hfill {b}_{jk}^{\left(a\right)}\left(t\right)=& \sum _{\mathrm{x}\in {S}_{a}}{p}_{a}\left(\mathrm{x}\right)\sum _{\alpha =1}^{K}{x}_{\alpha}\left({x}_{\alpha -1}\right){\int}_{0}^{t}{m}_{j}^{\left(\alpha \right)}(t-\tau ){m}_{k}^{\left(\alpha \right)}(t-\tau )d{F}_{a,\mathrm{x}}\left(\tau \right)\hfill \\ \hfill & +\sum _{\mathrm{x}\in {S}_{a}}{p}_{a}\left(\mathrm{x}\right)\sum _{\alpha =1}^{K}\sum _{{\scriptstyle \begin{array}{c}\hfill \beta =1\hfill \\ \hfill \beta \ne \alpha \hfill \end{array}}}^{K}{x}_{\alpha}{x}_{\beta}{\int}_{0}^{t}{m}_{j}^{\left(\alpha \right)}(t-\tau ){m}_{k}^{\left(\beta \right)}(t-\tau )d{F}_{a,\mathrm{x}}\left(\tau \right).\hfill \end{array}$$

These equations can be solved using renewal theory arguments (Athreya and Ney, 1972). For instance, for the binary splitting process, where *K* = 1 and where *ξ*_{1} = 2 with probability one, the expressions for the expectation and variance reduce to

$${m}_{1}^{\left(1\right)}\left(t\right)=\sum _{l=0}^{\infty}{2}^{l}\left\{{F}_{1}^{\star l}\right(t)-{F}_{1}^{\star (l+1)}(t\left)\right\},$$

and

$${v}_{11}^{\left(1\right)}\left(t\right)=\sum _{l=1}^{\infty}{2}^{l}{\int}_{0}^{t}{m}_{1}^{\left(1\right)}{(t-\tau )}^{2}d{F}_{1}^{\star l}\left(\tau \right),$$

where *F*^{l}(·) denotes the *l*-fold convolution of *F* (·), and where, by convention, *F*^{0}(·) = 1. In the most general case, these moments take more complicated expressions. However, just as in the above example, the expected values can be decomposed in terms of linear combinations of functions taking the form

$$C\left(t\right)\u2254C(t,{\mathrm{x}}_{k1},\dots ,{\mathrm{x}}_{{k}_{M}},{k}_{1},\dots ,{k}_{M})={F}_{{k}_{1},{\mathrm{x}}_{{k}_{1}}}\star \cdots \star {F}_{{k}_{M},{\mathrm{x}}_{{k}_{M}}}\left(t\right),$$

(1)

for some x_{kl} *S _{kl}*,

$$I\left(t\right)={\int}_{0}^{t}g\left(s\right)c\left(s\right)ds,$$

(2)

for some given function *g*(*t*), where *c*(*t*) = *dC*(*t*)/*dt*.

The coeffcients associated with the functions *C*(*t*) and *I*(*t*) in the expressions for ${m}_{k}^{\left(a\right)}\left(t\right)$ and ${v}_{jk}^{\left(a\right)}\left(t\right)$ are solely defined through the offspring distributions, and are computed explicitly. The integral in (2) is easily handled by numerical integration, which does not present any real diffculty here since integration is over a one-dimensional space. The functions *c*(*t*) and *C*(*t*), however, have generally no closed-form expressions, and they remain the sole obstacle to the explicit computation of the expectation and variance-covariance function of the process. Hyrien et al. (2005a, 2005b) and Hyrien (2007) considered simulating the branching process to compute the needed approximations. We present an alternative method that takes advantage of the above decompositions of the moments.

The functions *c*(·) and *C*(·) are the p.d.f. and c.d.f. of a r.v. *S* = *τ*_{k1} + + *τ _{kM}*, where

Denoting by ${K}_{X}\left(u\right)=\mathrm{log}\phantom{\rule{thinmathspace}{0ex}}\mathbb{E}\left[\mathrm{exp}\right(uX\left)\right]$ the cumulant generating function of any r.v. *X*, and by ${K}_{X}^{\left(r\right)}\left(u\right)$ its *r*-th derivative (providing that these quantities exist), we have ${K}_{S}\left(u\right)={\sum}_{l=1}^{M}{K}_{{\tau}_{{k}_{l}}}\left(u\right)$ and ${K}_{S}^{\left(r\right)}\left(u\right)={\sum}_{l=1}^{M}{K}_{{\tau}_{{k}_{l}}}^{\left(r\right)}\left(u\right)$ Following Daniels (1987) we approximate *c*(*s*) by

$${c}^{\mathrm{SA}}\left(s\right)=\frac{1}{{\left\{2\pi {K}_{S}^{\left(2\right)}\right({\stackrel{~}{u}}_{s}\left)\right\}}^{1\u22152}}\mathrm{exp}\left\{{K}_{S}\right({\stackrel{~}{u}}_{s})-{\stackrel{~}{u}}_{s}s\}\left[1+\frac{1}{8}{\lambda}_{4}\left({\stackrel{~}{u}}_{s}\right)-\frac{5}{24}{\lambda}_{3}{\left({\stackrel{~}{u}}_{s}\right)}^{2}\right],$$

(3)

where ${\lambda}_{r}\left(u\right)={K}_{S}^{\left(r\right)}\left(u\right)\u2215{\left\{{K}_{S}^{\left(2\right)}\right(u\left)\right\}}^{r\u22152}$, and where ${\stackrel{~}{u}}_{s}$ denotes the root to the equation ${K}_{S}^{\left(1\right)}\left(u\right)=s$. The root ${\stackrel{~}{u}}_{s}$ always exists. Using the method proposed by Lugannani and Rice (1980), we approximate *C*(*s*) by

$${C}^{\mathrm{SA}}\left(s\right)=\Phi \left({\zeta}_{s}\right)-\varphi \left({\zeta}_{s}\right)\left[\frac{1}{{\beta}_{s}}-\frac{1}{{\zeta}_{s}}\right],$$

(4)

where (·) and Φ(·) denote the p.d.f. and c.d.f. of the standard normal distribution, and where ${\zeta}_{s}={\left[2\right\{{\stackrel{~}{u}}_{s}s-{K}_{S}\left({\stackrel{~}{u}}_{s}\right)\left\}\right]}^{1\u22152}\mathrm{sgn}\left({\stackrel{~}{u}}_{s}\right)$ and ${\stackrel{~}{u}}_{s}$. When $s=\mathbb{E}\left(S\right)$, (4) is replaced by

$${C}^{\mathrm{SA}}\left(s\right)=\frac{1}{2}+\frac{1}{6}\frac{{\lambda}_{3}\left(0\right)}{{\left(2\pi \right)}^{1\u22152}}.$$

(5)

In our first approximation to the moments of *Z ^{(a)}(s)*, the functions

Clonal experiment is an experimental setup that is commonly used to study the processes of cell division, cell differentiation and cell death under *in vitro* conditions. In such experiments, isolated founding initiator cells are placed in a culture medium with predefined growing conditions. Each founder cell initiates its own population (or clone), the composition of which will change over time in accordance with the processes of division, differentiation and death that govern the cellular system under consideration. The resultant clone is examined at a given point in time, and the number of cells for each observable cell type are reported. This is done for multiple cell clones and at different time points so one can observe what happens over time. For instance, in the context of cultured oligodendrocytes, two cell types can be visually distinguished based on their morphology (O2A/OPCs and oligodendrocytes); clonal experiments would provide the numbers of cells at the sampling time for each of these two cell types in individual clones. Thus in the clone depicted in Figure 1, the experimentalist would report 2 O2A/OPCs and 2 oligodendrocytes at *t* = 36 hrs. In subsequent experiments, each clone is only to be examined at a single time point.

Let *Y _{il}* denote the number of cells of the

In order to construct realistic models of the temporal development of any given cell population, observable cell types sometimes need to be described by multiple types in the branching process (so that *K* > *L*). For instance, the proliferative characteristics of O2A/OPCs are suspected to change with the generation, and these cells will be represented by 3 types in the branching process defined in section 4.1. In contrast, the pool of oligodendrocytes is more homogeneous, and it will be described by a single type. Thus, define a stochastic process ${Y}_{l}\left(t\right)={\sum}_{k\in {I}_{l}}{Z}_{k}^{\left(1\right)}\left(t\right)$, where *I _{l}*,

Several methods have been proposed to analyze clonal data (Nedelman et al., 1985; Yakovlev et al., 1998a, 1998b, 2000; Boucher et al., 1999, 2001; von Collani et al., 1999 Zorin et al., 2000; Hyrien et al., 2005a, 2005b; Hyrien, 2007). All these publications resorted to branching processes to model clonal expansion. Diffculties in the statistical analysis of such data arise mainly because generally neither the distribution nor the moments of the process have tractable explicit expressions in the non-Markovian case. Alternatives to the Monte Carlo estimators proposed by Zorin et al. (2000), Hyrien et al. (2005a, 2005b) and Hyrien (2007) can be constructed using the proposed saddlepoint approximations.

A pseudo likelihood (PL) estimator will maximize

$$P\left(\theta \right)=-\sum _{i=1}^{n}{\{{\mathrm{Y}}_{i}-{\mathrm{m}}_{Y}^{\mathit{sa}}({t}_{i},\theta \left)\right\}}^{\prime}{\Omega}_{Y}^{\mathit{sa}}{({t}_{i},\theta )}^{-1}\{{\mathrm{Y}}_{i}-{\mathrm{m}}_{Y}^{\mathit{sa}}({t}_{i},\theta \left)\right\}+\mathrm{log}\mid {\Omega}_{Y}^{\mathit{sa}}({t}_{i},\theta )\mid .$$

Altought the PL function takes the form of the log-likelihood of a Gaussian sample, the normality assumption is not required for this approach to be justified (Hyrien, 2007). In the absence of normality, PL estimators are not generally effcient however. More effcient estimators, also defined from the expectation and variance-covariance matrix of the process, can be constructed using the quasi-likelihood (QL) approach. We thus propose a QL estimator that solves the quasi-score equation *Q*(*θ*) = 0, where

$$Q\left(\theta \right)=\sum _{i=1}^{n}\frac{\partial {\mathrm{m}}_{Y}^{\mathit{sa}}{({t}_{i},\theta )}^{\prime}}{\partial \theta}{\Omega}_{Y}^{\mathit{sa}}{({t}_{i},\theta )}^{-1}\{{\mathrm{Y}}_{i}-{\mathrm{m}}_{Y}^{\mathit{sa}}({t}_{i},\theta \left)\right\}.$$

QL estimators require to differentiate expected values of the process with respect to *θ*. When the moments are approximated through simulations, the resultant approximations are not differentiable. This is due to the discrete nature of Y(*t*) (Hyrien, 2007). Saddlepoint approximations to the moments do not suffer from this limitation.

When the branching process contains infinitely many types, the model-based variance-covariance matrix Ω_{Y}(*t*, *θ*) may have a cumbersome expression, making it diffcult to compute the above PL and QL estimators. The method of generalized estimating equation (GEE) provides an alternative approach in such cases (Liang and Zeger, 1986). The GEE estimator solves the equation *Q _{GEE}*(

$${Q}_{\mathit{GEE}}\left(\theta \right)=\sum _{i=1}^{n}\frac{\partial {\mathrm{m}}_{Y}^{\mathit{sa}}{({t}_{i},\theta )}^{\prime}}{\partial \theta}{\widehat{\Omega}}_{Y}{\left({t}_{i}\right)}^{-1}\{{\mathrm{Y}}_{i}-{\mathrm{m}}_{Y}^{\mathit{sa}}({t}_{i},\theta \left)\right\},$$

and where ${\widehat{\Omega}}_{Y}\left({t}_{i}\right)$ denotes a working correlation matrix that does not depend on *θ*. When the experimental design provides independent replicated observations in suffcient number at every time point, ${\widehat{\Omega}}_{Y}\left({t}_{i}\right)$ can be taken as the empirical estimator of the variance-covariance matrix of Y_{i}. In other situations the working “independence” correlation matrix ${\widehat{\Omega}}_{Y}\left({t}_{i}\right)={\mathrm{Id}}_{L}$ (the *L* × *L* identity matrix) would also be acceptable, althought it is generally suspected to decrease effciency of estimators when observations are moderately or strongly dependent (Liang and Zeger, 1986).

When computed from exact moments, PL, QL and GEE estimators will generally be consistent and asymptotically Gaussian under mild regularity conditions. We refer to Hyrien (2007) for asymptotic properties of the PL estimator in the context of discretely observed age-dependent branching processes. When moments are approximated using saddlepoint approximations, these asymptotic properties will not generally hold true unless the order of the approximations increases with the sample size (which would happen if the sampling times increased with the sample size or if higher order terms were included in the saddlepoint approximations of section 2.3). If the approximations to the moments are in close agreement with their exact values, however, the resultant estimators are expected to be close to the “exact” estimators. The performance of the proposed approximate estimators can be assessed in simulation studies. One such studies is reported in Supplementary Materials, which suggested that the proposed estimators are essentially unaffected by approximation of the moments (e.g., the bias of the estimators is negligible), and that they remain virtually identical to their exact counterparts. Analysis of experimental data often involve the construction of confidence intervals for model parameters and the test of hypotheses (e.g., for assessing the effects of different culture conditions on cell proliferation or for comparing nested models). These can be accomplished using resampling and Monte Carlo techniques. We refer to Hyrien et al (2005) for a Monte Carlo Wald test, which can be easily adapted to the above estimators, and to Diccicio and Efron (1996) for confidence intervals.

To illustrate the proposed methods we analyzed a set of clonal data on the generation of oligodendrocytes from O2A/OPCs cultured *in vitro*. In this experiment, O2A/OPCs isolated from the optical nerve of 7-day old rats were plated in cell culture in the presence of thyroid hormone. Cells were fed every other day with fresh PDGF. Clones were observed 1, 2, 3, 4, 5, 7 and 8 days after plating. At each time of observation, the composition of either 50, 100 or 150 clones (depending on the time point; total sample size *n* = 850) was individually examined to obtain the number of O2A/OPCs and the number of oligodendrocytes in each of them (as illustrated in Figure 1). The clones were discarded after examination. The mean number of O2A/OPCs and the mean number of oligodendrocytes versus time are showed in the left and right panels (respectively) of Figure 2. Each clone started with a single O2A/OPC and zero oligodendrocyte. The mean number of O2A/OPCs appears to decrease at days 1 and 2, increase at day 3, and decrease again after day 3. In contrast, the mean number of oligodendrocytes increases steadily in an almost linear fashion from the start of the experiment. On day 8, any clone would contain on average 0.7 O2A/OPC and 2.3 oligodendrocytes. The largest clone contained a total of 12 cells.

The objective of our analyses was to gain a quantitative insight into how cell clones develop over time and to estimate cell kinetics parameters. The temporal development of clones of O2A/OPCs was modeled using a modified version of the branching process proposed by Boucher et al. (1999). This model assumes a random, clone-specific “critical” number, *N*, that identifies the generation up to which differentiation of O2A/OPCs into oligodendrocytes cannot occur. Differentiation may only happen among cells of generation greater than *N*. The number of modeling types is *K* = *N* + 2. The process was defined from the following assumptions (see Figure 1, right panel, for a schematic representation), where, again, lifespan is used as a generic term for the time to division or to differentiation computed from birth.

A1- Without loss of generality, the process begins with a single cell of generation one (a founder O2A/OPC). This cell is said to be of type 1.

A2- In clones such that *N* > 0, every type-*k* cell, *k* = 1, …, *N*, divides into two type-(*k* + 1) cells (O2A/OPCs) when it completes its lifespan.

A3- Upon completion of its lifespan, every type-(*N* + 1) cell either divides into 2 new type-(*N* + 1) cells with probability *p*, or it differentiates into one type-0 cell (an oligodendrocyte) with probability 1 − *p*.

A4- The time to division of any type-*k* cell, *k* = 1, …, *N* +1, is represented by a non-negative r.v. *τ*_{1} with c.d.f. ${G}_{1}\left(t\right)=\mathbb{P}({\tau}_{1}\le t)$.

A5- The time to differentiation of any type-(*N* + 1) cell is represented by a non-negative r.v. *τ*_{2} with c.d.f. ${G}_{2}\left(t\right)=\mathbb{P}({\tau}_{2}\le t)$.

A6- Type-0 cells (oligodendrocytes) have an infinite lifespan.

In the above model, the distributions *G*_{1}(·) does not depend on *k*, the type of the cell. Boucher et al. (1999) further assumed that *G*_{1}(·) = *G*_{2}(·) (a Bellman-Harris process). Based on past investigations (Hyrien et al., 2005a; 2006), we relaxed this constraint to allow *G*_{1}(·) and *G*_{2}(·) to be dissimilar gamma distributions with respective means and variances *m _{l}* and ${\sigma}_{l}^{2}$,

Boucher et al. (1999) derived expressions for the expectation and variance of the number of O2A/OPCs and of the number of oligodendrocytes as a function of time in the case where *G*_{1}(*t*) = *G*_{2}(*t*). Using similar techniques, it is easy to obtain expressions for these moments in the full model. For instance the expected number of O2A/OPCs at any time *t* ≥ 0 is given by ${m}_{O2A}(t,\theta )={\sum}_{c=0}^{2}{\pi}_{c}{m}_{O2A}(t,c,\theta )$, where

$$\begin{array}{cc}\hfill {m}_{O2A}(t,c,\theta )=& \sum _{j=1}^{c-1}{2}^{j}\left\{{G}_{1}^{\star j}\right(t)-{G}_{1}^{\star (j+1)}(t\left)\right\}{1}_{\{c>0\}}\hfill \\ \hfill & +{2}^{c}\sum _{j=0}^{\infty}{\left(2p\right)}^{j}\left\{{G}_{1}^{\star (c+j)}\right(t)-p{G}_{1}^{\star (c+j+1)}(t)-(1-p){G}_{1}^{\star (c+j)}\star {G}_{2}(t\left)\right\}.\hfill \end{array}$$

The expression for the covariance between the numbers of O2A/OPCs and the number of oligodendrocytes is also derived in a similar fashion. These moments involve convolutions of the form ${G}_{1}^{\star k}\star {G}_{2}\left(t\right)$, *k* = 0, 1 …, where, for *k* ≥ 1, ${G}_{1}^{\star k}(\cdot )$ is the c.d.f. of a gamma distribution with mean and variance *km*_{1} and $k{\sigma}_{1}^{2}$. The associated cumulant generating function takes the form *K _{k}*(

$${K}_{k}^{\left(r\right)}\left(u\right)=\frac{(r-1)!k{\alpha}_{1}{\beta}_{1}^{r}}{{(1-{\beta}_{1}u)}^{r}}+\frac{(r-1)!{\alpha}_{2}{\beta}_{2}^{r}}{{(1-{\beta}_{2}u)}^{r}},$$

and the solution to the equation $k{\sigma}_{1}^{2}$ is

$${\stackrel{~}{u}}_{t,k}=\frac{({\beta}_{1}+{\beta}_{2})t-(k{\alpha}_{1}+{\alpha}_{2}){\beta}_{1}{\beta}_{2}-\sqrt{{\left\{\right({\beta}_{1}-{\beta}_{2})t+(k{\alpha}_{1}-{\alpha}_{2}\left){\beta}_{1}{\beta}_{2}\right\}}^{2}+4k{\alpha}_{1}{\alpha}_{2}{\beta}_{1}^{2}{\beta}_{2}^{2}}}{2{\beta}_{1}{\beta}_{2}t}.$$

(6)

Saddlepoint approximations to the moments can be easily constructed using the above quantities and equations (3-5).

While O2A/OPCs and oligodendrocytes are the only two distinguishible cell types, we enriched cell classification by defining *L* = 4 “observable” cell types: O2A/OPCs of first generation (*type 1*), O2A/OPCs of generation greater than one (*type 2*), oligodendrocytes that arise from an O2A/OPC of first generation (*type 3*), and all other oligodendrocytes (*type 4*). Based on the composition of the clones, the number of cells of each type can be easily determined. For instance, a clone containing a single O2A/OPCs will yield one cell of type 1 and 0 cells of types 2, 3, and 4. The expressions for the mean, variance and covariance of the number of cells of these four types can be determined as explained above. This enriched classifiation is expected to improve statistical inference since estimation is conducted conditional on whether the lifespan of the initiator cell has ended and on whether this cell ended up dividing or differentiating. In a simulation study, we compared the performance of the pseudo-likelihood estimator based on the original two cell types with the enriched classification. The study indicated that the variance of the estimator is substantially decreased in the latter case.

PL and QL estimates are reported in Table 1. The fitted model is shown in Figure 2. The full model provided a better fit than the reduced model. PL and QL estimates were relatively close. The full model suggested that the mean time to division of O2A/OPCs was substantially larger than their mean time to differentiation (23-30 hrs vs. 45-50 hrs). This is consistent with the results of a past analysis reported by Hyrien et al. (2005a) using a different model. Standard errors were computed for the parameter estimates of the full model fitted to the four cell types using bootstrap. The QL estimates of the mean and variance of the distribution of the time to events were smaller than their PL counterparts; the standard errors of the PL estimates for the parameters of the probability of division were smaller than those for the QL estimates. To assess the accuracy of saddlepoint approximations, we computed simulation-based approximations to the moments. The two approximations were almost perfectly superimposed (Figure 2).

In the quantitative analysis of dividing cell populations it is commonly assumed that the distribution of the mitotic cycle (MC) duration remains identical across generations, with the occasional exception of that of initiator cells (e.g., Hanin et al, 2006). There exists a number of biological reasons for questioning this assumption, however, since cell proliferation may slow down due to more limited access to nutrients or to cell signaling. In this second application the proposed method is used to investigate this assumption through analyzing a set of clonal data on the proliferation of Hela cells cultured *in vitro*. In contrast to the previous example, clonal cultures of these cells contained a single morphologically distinguishible cell type (*L* = 1), and every cell can be assumed to divide into two new cells upon completion of its MC with probability one. In the experiment under consideration, the number of cells was counted in multiple clones at the following time points: 10, 16, 18, 20, 24, 30, 35, 40, 48, 55, and 60 hours. About one hundred clones were examined at each of these time points, totaling *n* = 1077 clones in the entire experiment (See Hanin et al (2006) for further experimental details).

We modeled cell proliferation using a branching process with infinitely many types (*K* = 1), with each type representing a specific generation. This process begins with a single initiator cell of generation 1. The MC duration of any cell of generation *g* follows a gamma distribution with mean *μ _{g}* and variance ${\sigma}_{g}^{2}$,

We first challenged *Model 1* against a simpler model (*Model 0*) that assumes that the mean and the variance of the MC duration do not change with generation; that is, we tested the null hypothesis ${H}_{0}^{\left(1\right)}:\u2018\u2018{\mu}_{1}={\mu}_{2}=\cdots \phantom{\rule{thickmathspace}{0ex}}\&\phantom{\rule{thickmathspace}{0ex}}{\sigma}_{1}^{2}={\sigma}_{2}^{2}=\cdots "$ vs. ${H}_{1}^{\left(1\right)}:\u2018\u2018{\mu}_{1}\ne {\mu}_{2},{\mu}_{2}={\mu}_{3}=\cdots \phantom{\rule{thickmathspace}{0ex}}\&\phantom{\rule{thickmathspace}{0ex}}{\sigma}_{1}^{2}\ne {\sigma}_{2}^{2},{\sigma}_{2}^{2}={\sigma}_{3}^{2}=\cdots "$. This was done using a Monte Carlo Wald test as described by Hyrien et al (2005), with simulated data generated using fitted *Model 0*. Our analysis strongly supported *Model 1* (*P* < 0.001), and suggested that the mean MC duration was much shorter for cells of first generation (4.4 hrs vs. 20.1 hrs; see Table 2). This can be explained by the fact that founder cells are sampled at some point in their MC (not at the beginning), and *μ*_{1} would rather represent the expected residual MC duration.

To further assess *Model 1*, we tested it against an extended model (*Model 2*), which allowed the means and variances for the MC duration of cells of all generations to be different. To perform this analysis we set *μ _{g}* =

To perform the above analyses, we proceeded as follows. Similarly to the analysis of O2A/OPCs, we distinguished cells of first and of subsequent generations, and refer to them as type-1 and type-2 cells, respectively. Let Y_{i} = (*Y*_{i1}, *Y*_{i2})’ denote the number of type-1 and the number of type-2 cells counted in the *i*th clone at time *t _{i}*. Let

$${K}_{g}^{\left(r\right)}\left(u\right)=\sum _{l=1}^{g}\frac{(r-1)!{\mu}_{l}{({\sigma}_{l}^{2}\u2215{\mu}_{l})}^{r-1}}{{(1-{\sigma}_{l}^{2}\u2215{\mu}_{l}u)}^{r}}.$$

The saddlepoint ${\stackrel{~}{u}}_{g,t}$ is the root to the equation ${K}_{g}^{\left(1\right)}\left(u\right)=t$. We deduced ${\stackrel{~}{u}}_{2,t}$ from equation (6) where we set *k* = 1; for *g* ≥ 3, the equation is solved numerically. The saddlepoint approximations to the expectation follows from equations (3-5). Figure 3 shows the experimental and fitted (*Models 0-2*) mean number of type-1 and type-2 cells versus time. Simulations were used to assess the accuracy of the saddlepoint approximations to the expected number of cells.

The distribution of the generation (that is, the number of divisions undergone since time 0) has found multiple uses in cell biology. For instance, Hyrien and Zand (2008) used it in the analysis of CFSE (carboxy-fluorescein succinimidyl ester)-labeling experiments, and Cole et al (2006) considered the expectation of this distribution to compare cell proliferation kinetics across a set of experimental conditions. This distribution does not generally have an explicit expression, which motivated Cole et al (2006) to approximate its expectation using asymptotic arguments. Asymptotic formulae such as the one used by Cole et al (2006) have not been derived for general models, and saddlepoint approximations offer a good alternative to approximate the distribution of the generation or its expectation.

Let *κ*(*t*) denote the generation of any cell randomly sampled in the population at time *t*, given the population has not become extinct by time *t*, and let *Z _{l}*(

We tested this approach using a two-type process that begins at time 0 with a large number of type-1 cells. Upon completion of its lifespan, every cell of generation *g* (*g* = 1, 2 ) either divides into two cells of generation (*g* + 1) with probability *p*, or it dies (that is, disappears from the population) with probability *q* = 1 − *p*. The duration of the lifespan of any cell of first generation is represented by the sum of two r.v.s *τ*_{0} + *τ*_{1}, where *τ*_{0} and *τ*_{1} follow Inverse Gaussian distributions *G*_{0} and *G* with respective means *μ*_{0} and *μ*, variances ${\sigma}_{0}^{2}$ and *σ*^{2}, and c.d.f.s *G*_{0} and *G*, while, for every *g* = 2, 3 , the duration of the lifespan of any cell of generation *g* is described by a r.v. *τ _{g}* that follows the Inverse Gaussian distribution with c.d.f.

$${K}_{g}^{\left(r\right)}\left(u\right)=\frac{(2r-3)!!{\sigma}_{0}^{2r-2}}{{\mu}_{0}^{r-2}}{(1-2{\sigma}_{0}^{2}u\u2215{\mu}_{0})}^{-(2r-1)\u22152}+\frac{(2r-3)!!{g}^{r}{\sigma}^{2r-2}}{{\mu}^{r-2}}{(1-2g{\sigma}^{2}u\u2215\mu )}^{-(2r-1)\u22152},$$

where the double factorial *r*!! is defined as

$$r!!=\{\begin{array}{cc}r\cdot (r-2)\cdot (r-4)\dots 6\cdot 4\cdot 2\hfill & \text{if}\phantom{\rule{thickmathspace}{0ex}}r\phantom{\rule{thickmathspace}{0ex}}\text{is even}\hfill \\ r\cdot (r-2)\cdot (r-4)\dots 5\cdot 3\cdot 1\hfill & \text{if}\phantom{\rule{thickmathspace}{0ex}}r\phantom{\rule{thickmathspace}{0ex}}\text{is odd}.\hfill \end{array}\phantom{\}}$$

The saddlepoint ${\stackrel{~}{u}}_{t,g}$ which solves the equation ${K}_{g}^{\left(r\right)}\left(u\right)=t$ is approximated numerically. Plugging the above expressions into equations (3-5) yields the desired quantities. Depicted in Figure 4 are the saddlepoint approximations to *p _{g}*(

Age-dependent branching processes, and their multitype extensions, have become important tools in the analysis of cell population dynamics. To get around the diffculties that arise in their practical use, we have proposed saddlepoint approximations to their expectation and variance-covariance function. Our work suggests that this approach provides quite accurate, easy to implement approximations to these moments. The only requirement is existence of the cumulant generating function for the lifespan distributions, as well as the differentiability of these functions. This method also proved to be much faster than the method based on simulations of the process, and it offers the possibility to resort to a wider range of estimation techniques (such as QL and GEE estimators). The proposed approach allows complex branching processes to be fitted quite easily to experimental data. However, when fitting such models one should keep in mind that the fit will never worsen as the structure of the model is enriched, and that any improvement in the fit will not necessarily reflect that a better description of the biological process has been achieved. Whenever possible, the findings should be verified in independent experiments. In this spirit, Hyrien et al (2006) used time-lapse experiments to validate the results of a previous analysis of clonal data that suggested that the distributions of the time to division and of the time to differentiation of O2A/OPCs could be dissimilar (Hyrien et al, 2005).

This research was supported by NIH grants 2R01 NS039511, R01 CA134839, 1R01 AI069351, 2P30 ES001247, and NIH/NIAID grant N01-AI-050020.

Supplementary Materials The Web Appendix is available under the Paper Information link at the *Biometrics* website http://www.biometrics.tibs.org.

- Athreya KB, Ney PE. Branching Processes. Springer-Verlag; Berlin: 1972.
- Barndorff-Nielsen O, Cox DR. Edgeworth and saddle-point approximations with statistical applications. J. R. Statist. Soc. B. 1979;41:279–312.
- Boucher K, Yakovlev AY, Mayer-Pröschel M, Noble M. A stochastic model of temporarily regulated generation of oligodendrocytes in vitro. Math. Biosci. 1999;159:47–78. [PubMed]
- Boucher K, Zorin A, Yakovlev AY, Mayer-Pröschel M, Noble M. An alternative stochastic model of generation of oligodendrocytes in cell culture. J. Math. Biol. 2001;43:22–36. [PubMed]
- Cole DJ, Morgan BJT, Ridout MS, Byrne LJ, Tuite MF. Approximations for Expected Generation Number. Biometrics. 2007;63:1023–1030. [PubMed]
- von Collani E, Tsodikov A, Yakovlev A, Mayer-Pröschel M, Noble M. A random walk model of oligodendrocyte generation in vitro and associated estimation problems. Math. Biosci. 1999;159:189. [PubMed]
- Daniels HE. Saddlepoint approximations in statistics. Ann. Math. Statist. 1954;25:631–650.
- DiCiccio TJ, Efron B. Bootstrap confidence intervals. Statistical Science. 1996;3:189–228.
- Golinelli D, Guttorp P, Abkowitz JA. Bayesian inference in a hidden stochastic two-compartment model for feline hematopoiesis. Mathematical Medicine and Biology. 2006;23:153–172. [PubMed]
- Haccou P, Jagers P, Vatutin VA. Branching Processes: Variation, Growth and Extinction of Populations. Cambridge University Press; Cambridge: 2005.
- Hanin L, Hyrien O, Bedford J, Yakovlev A. A comprehensive stochastic model of irradiated cell population in cell culture. Journal of Theoretical Biology. 2006;239:401–416. [PubMed]
- Heyde CC. Quasi-Likelihood and Its Applications. A General Approach to Optimal Parameter Estimation. Springer-Verlag; New-York: 1997.
- Hoel DG, Crump KS. Estimating the generation-time distribution of an age-dependent branching process. Biometrics. 1974;30:125–135. [PubMed]
- Hyrien O. A pseudo maximum likelihood estimator for discretely observed multi-type Bellman-Harris branching processes. J. Statist. Plann. Inference. 2007;137:1375–1388.
- Hyrien O, Ambeskovic I, Mayer-Pröschel M, Noble M, Yakovlev A. Stochastic modeling of oligodendrocyte generation in cell culture: model validation with time-lapse data. Theor. Biol. Medical Model. 2006;3 Art. 21. [PMC free article] [PubMed]
- Hyrien O, Mayer-Pröschel M, Noble M, Yakovlev A. A stochastic model to analyze clonal data on multi-type cell populations. Biometrics. 2005a;61:199–207. [PubMed]
- Hyrien O, Mayer-Pröschel M, Noble M, Yakovlev A. Estimating the life-span of oligodendrocytes from clonal data on their development in cell culture. Math. Biosci. 2005b;193:255–274. [PubMed]
- Hyrien O, Zand MS. A mixture model with dependent observations for the analysis of CFSE-labeling experiments. J. Am. Statist. Assoc. 2008;103:222–239.
- Jagers P. Branching Processes with Biological Applications. John Wiley and Sons; London: 1975.
- Kimmel M, Axelrod DE. Branching Processes in Biology. Springer; New York: 2002.
- Liang K-Y, Zeger SL. Longitudinal data analysis using generalized linear models. Biometrika. 1986;73:13–22.
- Lugannani R, Rice S. Saddlepoint approximations for the distribution of the sum of independent random variables. Adv. Appl. Proba. 1980;12:475–490.
- Morgan BJT, Ridout MS, Ruddock L. Models for yeast prions. Biometrics. 2003;59:562–569. [PubMed]
- Nedelman J, Downs H, Pharr P. Inference on an age-dependent, multitype branching-process model of mast cells. J. Math. Biol. 1985;25:203–226. [PubMed]
- Noble M, Murray K, Stroobant P, Waterfield MD, Riddle P. Platelet-derived growth factor promotes division and motility and inhibits premature differentiation of the oligodendrocyte/type-2 astrocyte progenitor cell. Nature. 1988;333:560–562. [PubMed]
- Raff MC, Miller RH, Noble M. A glial progenitor cell that develops in vitro into an astrocyte or an oligodendrocyte depending on the culture medium. Nature. 1983;303:390–396. [PubMed]
- Renshaw E. Applying the saddlepoint approximation to bivariate stochastic processes. Math. Biosci. 2000;168:57–75. [PubMed]
- Ridout MS, Cole DJ, Morgan BJT, Byrne LJ, Tuite MF. New Approximations to the Malthusian parameter. Biometrics. 2006;62:1216–1223. [PubMed]
- Yakovlev AY, Boucher K, Mayer-Pröschel M, Noble M. Quantitative insight into proliferation and differentiation of oligodendrocyte-type 2 astrocyte progenitor cells in vitro. Proc. Natl. Acad. Sci. USA. 1998a;95(4):14164–14167. [PubMed]
- Yakovlev AY, Mayer-Pröschel M, Noble M. A stochastic model of brain cell differentiation in tissue culture. J. Math. Biol. 1998b;37:49–60. [PubMed]
- Yakovlev AY, Mayer-Pröschel M, Noble M. Stochastic formulations of a clock model for temporally regulated generation of oligodendrocytes
*in vitro*. Math. Comput. Model. 2000;32:125–137. - Yakovlev AY, Yanev N. Transient Processes in Cell Proliferation Kinetics. Springer; Heidelberg: 1989.
- Zorin AA, Yakovlev AY, Mayer-Pröschel M, Noble M. Estimation problems associated with stochastic modeling of proliferation and differentiation of O-2A progenitor cells in vitro. Math. Biosci. 2000;167:109–121. [PubMed]

PubMed Central Canada is a service of the Canadian Institutes of Health Research (CIHR) working in partnership with the National Research Council's national science library in cooperation with the National Center for Biotechnology Information at the U.S. National Library of Medicine(NCBI/NLM). It includes content provided to the PubMed Central International archive by participating publishers. |