


J Math Biol. Author manuscript; available in PMC 2010 August 1.

Published in final edited form as:

Published online 2008 October 10. doi: 10.1007/s00285-008-0225-8

PMCID: PMC2692649

NIHMSID: NIHMS79365

The publisher's final edited version of this article is available at J Math Biol


In this paper we propose a stochastic model based on the branching process for estimating and comparing mutation rates in the proliferation of cells or microbes. The model assumes that cells or microbes (the elements of a population) reproduce by generations, so it is best suited to situations in which new elements are produced by older elements from the previous generation rather than by newly created elements of the current generation. Cells and bacteria proliferate by binary replication, whereas RNA viruses proliferate by multiple replication; the model is formulated for multiple replication and includes binary replication as a special case. We propose statistical procedures for estimating and comparing mutation rates from data on multiple cultures with divergent culture sizes. The mutation rate is defined as the probability of mutation per replication per genome and can therefore be assumed constant throughout the proliferation process. We derive the number of cultures needed when planning experiments to achieve a desired accuracy of estimation or a desired statistical power for comparing the mutation rates of two strains of microbes. Finally, we demonstrate the efficiency of the proposed method by showing how the estimation of mutation rates is affected when culture sizes are assumed similar but actually diverge.

An important parameter in the evolution of populations of cells, viruses, bacteria, or other microbes is the speed at which individuals mutate into individuals with altered biological properties. The rate of mutation can be measured with respect to continuous time or with respect to the generations of a pedigree lineage. In this paper we call the former the *mutation rate in time* and the latter the *mutation rate in replication*. The mathematical models of mutation rates initiated by Luria and Delbruck [10] and further developed by other authors (e.g., [8], [1], [9], and [14]) were formulated in terms of mutation with respect to continuous time. Because mutation occurs during replication of the genome, the mutation rate is better defined as the probability that an error is made during replication so that the genome of a daughter element differs from that of its parent, rather than as the probability that an element changes its nature per unit time at any moment of its lifetime. Mathematically, however, the *mutation rate in time* and the *mutation rate in replication* are about the same if the time unit of the former is the mean duration of an element's life cycle. The *mutation rate in time* can be meaningfully defined and used when the proliferation rate in culture is constant in time and uniform across cultures. This assumption may not always hold; for example, initially similar cultures can grow to very different colony sizes over the same period. If the proliferation rate varies substantially among cultures, the *mutation rate in time* cannot be estimated reliably, nor can the estimate be meaningfully interpreted or applied. The *mutation rate in replication*, defined as the probability of mutation in each replication, remains constant even when the proliferation rate changes, and hence can be estimated and applied reliably in different situations.

In this paper we propose a stochastic model based on a branching process in which the time variable of the stochastic process is the logarithm of the population size *n* rather than the real time *t*; hence the estimation of the mutation rate is not affected by temporal variation in proliferation among cultures. The traditional Luria-Delbruck type models assume that every element, whether recently born or having existed for some time, is at any moment equally likely to give birth to new elements, or equivalently that the lifetime of each element from its birth is exponentially distributed. The proposed model differs from the Luria-Delbruck type models in the following respects: 1) elements can be born simultaneously (rather than only one after another); 2) new elements can be produced only by part of the current population, namely the elements that have existed for a certain time (rather than by the whole current population); 3) births and mutations are independent among newborn elements (rather than the birth and mutation of the *n*th element depending on the status of the (*n*-1)th element). For easier mathematical derivation, we simplify the model by assuming an equal life span for all elements (equivalently, synchronous replication) and a fixed number of offspring per element. The model is therefore suited to situations in which new elements are unlikely to be produced by recently born elements of the current generation and are instead produced by elements that have existed for some time (the previous generation). The fixed life span assumed in the model is interpreted as the mean life span of elements, and the fixed number of offspring as the mean number of offspring.

We formulate the model in terms of multiple replication (e.g., RNA viruses proliferate by multiple replication), which includes binary replication as a special case (e.g., cells and bacteria proliferate by binary replication). We also propose statistical procedures for estimating and testing the mutation rate, defined as the probability of mutation per replication per genome, from data on multiple cultures with divergent culture sizes. The number of cultures is formulated for planning experiments that achieve a desired accuracy of estimation or a desired statistical power for comparing the mutation rates of two strains of cells. We derive an estimator for the *mutation rate in time* from the estimator for the *mutation rate in replication* under the proposed model. Finally, we show how the estimation of the mutation rate can be affected by assuming a common culture size when culture sizes actually diverge.

The size of a growing population in an experimental culture increases exponentially under good conditions. As an idealization, the growth process can be imagined as the result of repeated multiplication of individuals from *N*_{0} = *N*_{0}*b*^{0} at the 0th generation to *N*_{g} = *N*_{0}*b*^{g} at the *g*th generation.

In this paper, we denote the mutation rate per replication by μ, the number of mutants in the population by *M*, the population size by *N*, and the frequency of mutants in the population by *f* = *M*/*N*. Their initial values at generation 0 are *M*_{0}, *N*_{0}, and *f*_{0} = *M*_{0}/*N*_{0}, respectively. We denote the base of the multiple-replication proliferation by *b*, and a coefficient related to *b* by *r*_{b} = ln(*b*)/(1 - 1/*b*).

We propose a stochastic model for the number of mutants in the proliferation process of a culture, based on a branching process of the kind described in classic mathematical texts such as Karlin & Taylor [5] or in recent books on mathematical biology such as Haccou, Jager & Vatutin [3] or Kimmel & Axelrod [7]. Let *X*_{j} be the number of unmutated individuals in the *j*th generation.

$$\begin{aligned} E(M) &= N - E(X_g) = N\left\{1-(1-f_0)\bigl(1-(1-1/b)\mu\bigr)^{\log_b(N/N_0)}\right\},\\ \mathrm{Var}(M) &= \frac{(1-f_0)\mu N^2}{bN_0}\bigl(1-(1-1/b)\mu\bigr)^{\log_b(N/N_0)-1}\left\{\bigl(1-(1-1/b)\mu\bigr)^{\log_b(N/N_0)}-\frac{N_0}{N}\right\}, \end{aligned}$$

(1)

where $f_0 = M_0/N_0$. The derivation of (1) is sketched in Appendix A1. Assume that *E*(*M*)/*N* and *f*_{0} are close to 0, so that ln(1 - *E*(*M*)/*N*) ≈ -*E*(*M*)/*N* and ln(1 - *f*_{0}) ≈ -*f*_{0}. Solving the equation for *E*(*M*) in (1) for μ then gives $\mu \approx \frac{r_b\{E(M)/N - f_0\}}{\ln(N/N_0)}$; the derivation is sketched in Appendix A2. Consequently, an estimator for μ is

$$\widehat{\mu}=\frac{{r}_{b}\left({\scriptstyle \frac{M}{N}}-{f}_{0}\right)}{\mathrm{ln}\left({\scriptstyle \frac{N}{{N}_{0}}}\right)},$$

(2)

where $r_b = \frac{\ln(b)}{1-1/b}$. Methods for multiple cultures developed later in this paper are based on the $\widehat{\mu}$ in (2), which is approximately unbiased if $E(M)/N$ and *f*_{0} are close to 0, as shown in Appendix A2. Under this assumption, the variance of $\widehat{\mu}$ is

$$\mathrm{Var}\left(\widehat{\mu}\right)\approx \frac{{r}_{b}^{2}(1-{f}_{0})\mu}{b{N}_{0}{\left\{\mathrm{ln}\left({\scriptstyle \frac{N}{{N}_{0}}}\right)\right\}}^{2}},$$

(3)

and its derivation is sketched in Appendix A3. Replacing μ by $\widehat{\mu}$ in (3), we obtain an estimator for $\mathrm{Var}(\widehat{\mu})$ as

$${\widehat{V}}_{\widehat{\mu}}=\frac{{r}_{b}^{3}(1-{f}_{0})({\scriptstyle \frac{M}{N}}-{f}_{0})}{b{N}_{0}{\left\{\mathrm{ln}\left({\scriptstyle \frac{N}{{N}_{0}}}\right)\right\}}^{3}}.$$

(4)
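As a numerical sanity check on (1), (2) and (4), the Python sketch below computes *E*(*M*) from (1), plugs it into the estimator (2), and evaluates the variance estimate (4). It assumes a single culture with known *N*, *N*_{0}, *b* and *f*_{0}; the function names and parameter values are illustrative choices, not taken from the paper.

```python
import math


def r_b(b: float) -> float:
    """Coefficient r_b = ln(b) / (1 - 1/b)."""
    return math.log(b) / (1.0 - 1.0 / b)


def expected_mutants(mu: float, n: float, n0: float, b: float, f0: float = 0.0) -> float:
    """E(M) from equation (1)."""
    generations = math.log(n / n0) / math.log(b)  # log_b(N / N0)
    # expected unmutated fraction among descendants of unmutated founders
    unmutated_share = (1.0 - (1.0 - 1.0 / b) * mu) ** generations
    return n * (1.0 - (1.0 - f0) * unmutated_share)


def mu_hat(m: float, n: float, n0: float, b: float, f0: float = 0.0) -> float:
    """Single-culture estimator, equation (2)."""
    return r_b(b) * (m / n - f0) / math.log(n / n0)


def var_hat(m: float, n: float, n0: float, b: float, f0: float = 0.0) -> float:
    """Plug-in estimate of Var(mu_hat), equation (4)."""
    return (r_b(b) ** 3 * (1.0 - f0) * (m / n - f0)
            / (b * n0 * math.log(n / n0) ** 3))


# Illustrative (assumed) values: mu = 1e-7, binary replication from one founder to 1e9.
mu_true, n, n0, b = 1e-7, 1e9, 1.0, 2
m_expected = expected_mutants(mu_true, n, n0, b)
print(mu_hat(m_expected, n, n0, b))              # very close to mu_true
print(math.sqrt(var_hat(m_expected, n, n0, b)))  # standard error, far larger than mu_true
```

With *M* set to its expectation, (2) returns μ almost exactly, while the estimated standard error is much larger than μ itself, which anticipates the need for multiple cultures discussed below.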

If the assumption that *E*(*M*)/*N* and *f*_{0} are close to 0 does not hold, then the approximation of μ is, as shown in Appendix A2,

$$\mu \approx \frac{{r}_{b}\left\{\mathrm{ln}(1-{f}_{0})-\mathrm{ln}\left(1-{\scriptstyle \frac{E\left(M\right)}{N}}\right)\right\}}{\mathrm{ln}\left({\scriptstyle \frac{N}{{N}_{0}}}\right)}$$

(5)

and an estimator $\widehat{\mu}$ with improved accuracy is obtained by replacing *E*(*M*) in (5) with *M*. The variance of this $\widehat{\mu}$ is, as shown in Appendix A3,

$$\mathrm{Var}\left(\widehat{\mu}\right)\approx \frac{{r}_{b}(1-{f}_{0})}{b{N}_{0}}\left[\frac{{r}_{b}\mu}{{\left\{\mathrm{ln}\left({\scriptstyle \frac{N}{{N}_{0}}}\right)\right\}}^{2}}-\frac{2{\mu}^{2}}{\mathrm{ln}\left({\scriptstyle \frac{N}{{N}_{0}}}\right)}\right],$$

(6)

which can be estimated by replacing μ in the equation (6) with this $\widehat{\mu}$.
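The gain from the log form (5) over (2) appears when *M*/*N* is no longer small. The Python sketch below uses an assumed μ = 10^{-2} with *b* = 2 and 30 generations, computes *E*(*M*)/*N* from (1), compares the two estimators, and evaluates (6). Names and values are hypothetical.

```python
import math


def r_b(b: float) -> float:
    """Coefficient r_b = ln(b) / (1 - 1/b)."""
    return math.log(b) / (1.0 - 1.0 / b)


def mu_hat_simple(f: float, n: float, n0: float, b: float, f0: float = 0.0) -> float:
    """Equation (2); accurate only when f = M/N and f0 are close to 0."""
    return r_b(b) * (f - f0) / math.log(n / n0)


def mu_hat_log(f: float, n: float, n0: float, b: float, f0: float = 0.0) -> float:
    """Equation (5) with E(M)/N replaced by the observed f = M/N."""
    return r_b(b) * (math.log1p(-f0) - math.log1p(-f)) / math.log(n / n0)


def var_mu_hat_log(mu: float, n: float, n0: float, b: float, f0: float = 0.0) -> float:
    """Approximate variance of the log-form estimator, equation (6)."""
    lr = math.log(n / n0)
    rb = r_b(b)
    return rb * (1.0 - f0) / (b * n0) * (rb * mu / lr ** 2 - 2.0 * mu ** 2 / lr)


# Assumed setting where M/N is not small: mu = 1e-2, b = 2, one founder, 30 generations.
mu_true, n0, b, g = 1e-2, 1.0, 2, 30
n = n0 * b ** g
f = 1.0 - (1.0 - (1.0 - 1.0 / b) * mu_true) ** g  # E(M)/N from (1) with f0 = 0
print(mu_hat_simple(f, n, n0, b))  # roughly 7% below mu_true
print(mu_hat_log(f, n, n0, b))     # within about 0.3% of mu_true
print(var_mu_hat_log(mu_true, n, n0, b))
```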

For example, the special case *b* = 2 is the most common in biology; here ξ is distributed with *P*(ξ = 1) = μ and *P*(ξ = 2) = 1 - μ, so that its expectation is *E*(ξ) = 2 - μ and its variance is Var(ξ) = μ(1 - μ). With *r*_{b} = ln(4) for *b* = 2, the estimator $\widehat{\mu}$ in (2) becomes

$$\widehat{\mu}=\frac{\mathrm{ln}\left(4\right)({\scriptstyle \frac{M}{N}}-{f}_{0})}{\mathrm{ln}\left({\scriptstyle \frac{N}{{N}_{0}}}\right)},$$

(7)

and $\mathrm{Var}(\widehat{\mu})$ in (3) becomes

$$\mathrm{Var}\left(\widehat{\mu}\right)\approx \frac{{\left\{\mathrm{ln}\left(4\right)\right\}}^{2}(1-{f}_{0})\mu}{2{N}_{0}{\left\{\mathrm{ln}\left({\scriptstyle \frac{N}{{N}_{0}}}\right)\right\}}^{2}}.$$

(8)

If *f*_{0} = 0, then $\widehat{\mu}$ and $\mathrm{Var}(\widehat{\mu})$ given by (7) and (8) coincide with the results of Rossman et al. [12], which were expressed in a different form and obtained by a different approach.
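The *b* = 2 model can also be simulated directly, to see both the near-unbiasedness of (7) and the large spread predicted by (8). The Python sketch below assumes synchronous generations, *N*_{0} = 1, *f*_{0} = 0, no back-mutation, and that each replication of an unmutated element yields one mutant daughter with probability μ (matching *P*(ξ = 1) = μ, *P*(ξ = 2) = 1 - μ). The helper names, the seed, and the parameter values are illustrative choices.

```python
import math
import random


def binomial(n: int, p: float, rng: random.Random) -> int:
    """Binomial(n, p) draw obtained by jumping over geometric gaps between successes."""
    log_q = math.log1p(-p)
    count, position = 0, 0
    while True:
        # number of failures before the next success follows Geometric(p)
        gap = int(math.log(1.0 - rng.random()) / log_q)
        position += gap + 1
        if position > n:
            return count
        count += 1


def simulate_mutants(mu: float, generations: int, rng: random.Random) -> int:
    """Mutants M after binary replication from one unmutated founder (no back-mutation)."""
    unmutated = 1
    for _ in range(generations):
        mutant_births = binomial(unmutated, mu, rng)  # replications giving one mutant daughter
        unmutated = 2 * unmutated - mutant_births
    return 2 ** generations - unmutated


# Illustrative (assumed) parameters: E(M)/N is about 0.007, so (7) applies.
mu, g, cultures = 1e-3, 14, 20000
rng = random.Random(12345)
n = 2 ** g
estimates = [math.log(4) * simulate_mutants(mu, g, rng) / n / math.log(n)
             for _ in range(cultures)]

mean = sum(estimates) / cultures
var = sum((x - mean) ** 2 for x in estimates) / (cultures - 1)
var_eq8 = math.log(4) ** 2 * mu / (2 * math.log(n) ** 2)  # equation (8), N0 = 1, f0 = 0
print(mean, var, var_eq8)
```

The sample mean lands near μ, while the per-culture standard deviation is roughly three times μ, driven mostly by rare early mutations.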

In most applications, the mutation rate at the phenotypic level can be estimated more easily than that at the genomic level, because mutants can be distinguished from nonmutants with much less effort at the phenotypic level than at the genomic level. However, most biological investigators are interested in mutation rates at the genomic level rather than at the phenotypic level, because comparing mutation rates at the phenotypic level can be misleading if different changes in the genome lead to the same phenotypic change. For example, in virology, influenza virions are resistant to the antiviral drug amantadine if there is a point mutation at any one of *k* (= 4) known specific sites in the nucleotide sequence, as shown by Hay et al. [4]. We denote the mutation rate at the phenotypic level by μ_{p} and the mutation rates at the genomic level by μ_{gi} for genomic sites

$$\widehat{\mu}_g = \frac{r_b\left(\frac{M}{N}-f_0\right)}{k\,\ln\left(\frac{N}{N_0}\right)},$$

(9)

and its variance is

$$\mathrm{Var}\left({\widehat{\mu}}_{g}\right)\approx \frac{{r}_{b}^{2}(1-{f}_{0}){\mu}_{g}}{kb{N}_{0}{\left\{\mathrm{ln}\left({\scriptstyle \frac{N}{{N}_{0}}}\right)\right\}}^{2}}.$$

(10)

The derivation of (10) is sketched in Appendix A4. The culture size *N* and the number of mutants *M* are obtained at the phenotypic level (e.g., the virus level), for example by counting viruses in the culture before and after adding amantadine, respectively. Replacing μ_{g} by $\widehat{\mu}_g$ in (10), we obtain an estimator for $\mathrm{Var}(\widehat{\mu}_g)$,

$${\widehat{V}}_{{\widehat{\mu}}_{g}}=\frac{{r}_{b}^{3}(1-{f}_{0})({\scriptstyle \frac{M}{N}}-{f}_{0})}{{k}^{2}b{N}_{0}{\left\{\mathrm{ln}\left({\scriptstyle \frac{N}{{N}_{0}}}\right)\right\}}^{3}}.$$

(11)
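Equations (9) and (11) rescale the phenotypic estimator by the number of sites *k*. The minimal Python sketch below, with hypothetical function names and made-up counts, makes the scaling explicit and can be used to check that (11) is (10) with μ_{g} replaced by its estimate.

```python
import math


def r_b(b: float) -> float:
    """Coefficient r_b = ln(b) / (1 - 1/b)."""
    return math.log(b) / (1.0 - 1.0 / b)


def mu_hat_genomic(m: float, n: float, n0: float, b: float, k: int, f0: float = 0.0) -> float:
    """Equation (9): the phenotypic estimate (2) divided by the number of sites k."""
    return r_b(b) * (m / n - f0) / (k * math.log(n / n0))


def var_hat_genomic(m: float, n: float, n0: float, b: float, k: int, f0: float = 0.0) -> float:
    """Equation (11): plug-in estimate of Var(mu_hat_g)."""
    return (r_b(b) ** 3 * (1.0 - f0) * (m / n - f0)
            / (k ** 2 * b * n0 * math.log(n / n0) ** 3))


# Hypothetical culture: 50 resistant viruses out of 1e8, one founder, b = 2, k = 4 sites.
print(mu_hat_genomic(50, 1e8, 1.0, 2, 4), var_hat_genomic(50, 1e8, 1.0, 2, 4))
```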

The methods for multiple cultures developed later in this paper are based on equations (9) and (11). In any equation at the genotypic level developed below, statistical inference on the mutation rate at the phenotypic level can be obtained by setting μ_{p} = μ

Drake [2] proposed a model in which the *mutation rate in replication* was used instead of the *mutation rate in time*, and a differential equation $dM = \left(\mu + \frac{M}{N}\right)dN$ linked the number of mutants in a culture to the population size of the culture rather than to the real time since the start of the culture. Solving this equation gives an estimator for the *mutation rate in replication* μ, namely $\widehat{\mu} = \frac{M/N - f_0}{\ln(N/N_0)}$. The increment of mutants *dM* actually has two components: 1) the increment from proliferation of existing mutants, $\frac{M}{N}dN$; and 2) mutation within the increment of existing nonmutants, $\mu\left(1-\frac{M}{N}\right)dN$. Taking this into account, we can modify the above model to $dM = \left\{\mu\left(1-\frac{M}{N}\right) + \frac{M}{N}\right\}dN$, which yields the estimator $\widehat{\mu} = \frac{\ln(1-f_0) - \ln(1-M/N)}{\ln(N/N_0)}$. The latter estimator improves on the former when *M* is large enough that the condition $M/N \approx 0$ barely holds. As functions of *M*, *N*, *N*_{0}, and *f*_{0}, these two $\widehat{\mu}$'s are similar to the two $\widehat{\mu}$'s in Section 2.1 derived from the proposed stochastic model for *b* = 2, but differ by the coefficient *r*_{b} = ln(4) ≈ 1.386. The similarity suggests that the differences in the assumptions of the proposed stochastic model and the Luria-Delbruck type model do not lead to much different estimators for
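The closed-form estimator of the modified model can be checked by integrating its differential equation numerically. The Python sketch below rewrites $dM = \{\mu(1 - M/N) + M/N\}dN$ in the variable *s* = ln(*N*/*N*_{0}), integrates it with a classical fourth-order Runge-Kutta scheme (our choice of integrator, not Drake's [2] or the paper's), and applies both estimators to the final *M*/*N*. The parameter values are arbitrary illustrations.

```python
import math


def rk4(rhs, y0: float, s_end: float, steps: int) -> float:
    """Classical fourth-order Runge-Kutta for dy/ds = rhs(s, y) on [0, s_end]."""
    h = s_end / steps
    s, y = 0.0, y0
    for _ in range(steps):
        k1 = rhs(s, y)
        k2 = rhs(s + h / 2, y + h * k1 / 2)
        k3 = rhs(s + h / 2, y + h * k2 / 2)
        k4 = rhs(s + h, y + h * k3)
        y += h * (k1 + 2 * k2 + 2 * k3 + k4) / 6
        s += h
    return y


mu, n0, m0, n_final = 0.05, 100.0, 0.0, 1e6  # illustrative values
s_end = math.log(n_final / n0)               # s = ln(N / N0)

# dM/dN = mu (1 - M/N) + M/N becomes dM/ds = N dM/dN = mu (N - M) + M with N = N0 e^s
m_final = rk4(lambda s, m: mu * (n0 * math.exp(s) - m) + m, m0, s_end, 10000)

f, f0 = m_final / n_final, m0 / n0
mu_drake = (f - f0) / s_end                                 # Drake's original estimator
mu_modified = (math.log(1 - f0) - math.log(1 - f)) / s_end  # modified-model estimator
print(mu_drake, mu_modified)
```

The modified-model estimator recovers the μ used in the integration, whereas Drake's original estimator falls short because *M*/*N* here is far from 0.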

For simplicity, from now on we use μ instead of μ_{g} to denote the mutation rate at the genotypic level. Because μ is very small in most applications, we measure the accuracy of $\widehat{\mu}$ on the relative scale $\gamma \equiv \sigma_{\widehat{\mu}}/\mu$ rather than on the absolute scale of $\sigma_{\widehat{\mu}}$, where $\sigma_{\widehat{\mu}}^2 = \mathrm{Var}(\widehat{\mu})$. The smaller γ is, the more accurate the estimate of μ. Combining the definition of γ with the expression for $\mathrm{Var}(\widehat{\mu}_g)$ in (10) gives $\gamma = \frac{r_b\sqrt{1-f_0}}{\ln(N/N_0)\sqrt{\mu k b N_0}}$. Using this expression, we calculated γ for several typical parameter settings for estimating the mutation rates of influenza viruses; the results indicated that γ from a single culture would be too large for practical purposes. Multiple cultures are therefore needed to obtain estimates with reasonable confidence.
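To make the expression for γ concrete, the Python sketch below evaluates it for one hypothetical influenza-like setting (μ = 10^{-5} per site, *k* = 4, *b* = 2, *N*_{0} = 1, *N* = 10^{9}, *f*_{0} = 0). These numbers are assumptions for illustration, not the settings behind the calculations described above.

```python
import math


def relative_se(mu: float, n: float, n0: float, b: float, k: int, f0: float = 0.0) -> float:
    """gamma = sigma(mu_hat) / mu for a single culture, from equation (10)."""
    rb = math.log(b) / (1.0 - 1.0 / b)
    return rb * math.sqrt(1.0 - f0) / (math.log(n / n0) * math.sqrt(mu * k * b * n0))


# Hypothetical setting: mu = 1e-5 per site, k = 4 sites, b = 2, one founder, N = 1e9.
gamma_one = relative_se(1e-5, 1e9, 1.0, 2, 4)
print(round(gamma_one, 2))
```

A γ of roughly 7.5 means the standard error of a single-culture estimate is several times μ itself. Averaging *C* cultures of equal size divides γ by √*C*.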

For *C* cultures, let *N*_{i} be the culture size, *M*_{i} the number of mutants, and $\widehat{\mu}_i$ the estimator of μ for the *i*th culture, *i* = 1, …, *C*, where μ is the mutation rate common to all cultures. We treat the *N*_{i}'s as known constants (i.e., we condition on them) and the *M*_{i}'s as random variables. There are two ways to pool the $\widehat{\mu}_i$'s: the

Define $\widehat{\mu}_a = \frac{1}{C}\sum_{i=1}^{C}\widehat{\mu}_i$; by (9) we have

$${\widehat{\mu}}_{a}=\frac{{r}_{b}}{kC}\sum _{i=1}^{C}\frac{{\scriptstyle \frac{{M}_{i}}{{N}_{i}}}-{f}_{{0}_{i}}}{\mathrm{ln}\left({\scriptstyle \frac{{N}_{i}}{{N}_{{0}_{i}}}}\right)},$$

(12)

where *N*_{0i} and *f*_{0i} are the initial culture size and frequency of mutants in the *i*th culture. The variance of ${\widehat{\mu}}_{a}$ is

$$\mathrm{Var}\left({\widehat{\mu}}_{a}\right)=\frac{{r}_{b}^{2}\mu}{kb{C}^{2}}\sum _{i=1}^{C}\frac{1-{f}_{{0}_{i}}}{{N}_{{0}_{i}}{\left\{\mathrm{ln}\left({\scriptstyle \frac{{N}_{i}}{{N}_{{0}_{i}}}}\right)\right\}}^{2}},$$

(13)

which can be estimated by replacing the μ in (13) by the ${\widehat{\mu}}_{a}$ in (12).
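The simple average (12) and its variance (13) translate directly into code. The Python sketch below (hypothetical function names; per-culture inputs passed as equal-length sequences) returns both; passing $\widehat{\mu}_a$ in place of μ in (13) gives the plug-in variance estimate.

```python
import math
from typing import Sequence


def r_b(b: float) -> float:
    """Coefficient r_b = ln(b) / (1 - 1/b)."""
    return math.log(b) / (1.0 - 1.0 / b)


def simple_average(m: Sequence[float], n: Sequence[float], n0: Sequence[float],
                   f0: Sequence[float], b: float, k: int) -> float:
    """mu_hat_a from equation (12): the unweighted mean of per-culture estimates (9)."""
    terms = [(mi / ni - f0i) / math.log(ni / n0i) for mi, ni, n0i, f0i in zip(m, n, n0, f0)]
    return r_b(b) * sum(terms) / (k * len(terms))


def var_simple_average(mu: float, n: Sequence[float], n0: Sequence[float],
                       f0: Sequence[float], b: float, k: int) -> float:
    """Var(mu_hat_a) from equation (13); pass mu_hat_a as mu for a plug-in estimate."""
    terms = [(1.0 - f0i) / (n0i * math.log(ni / n0i) ** 2) for ni, n0i, f0i in zip(n, n0, f0)]
    c = len(terms)
    return r_b(b) ** 2 * mu * sum(terms) / (k * b * c ** 2)


# Hypothetical data for three cultures, each started from one wild-type founder.
m, n = [12, 40, 7], [2e8, 5e8, 1e8]
n0, f0 = [1.0] * 3, [0.0] * 3
mu_a = simple_average(m, n, n0, f0, b=2, k=4)
print(mu_a, var_simple_average(mu_a, n, n0, f0, b=2, k=4))
```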

Define $\widehat{\mu}_w = \sum_{i=1}^{C} w_i\widehat{\mu}_i$, where the *w*_{i}'s are weights ($\sum_{i=1}^{C} w_i = 1$ and $w_i \ge 0$). The variance of $\widehat{\mu}_w$ is minimized by choosing *w*_{i} proportional to the inverse of the variance of $\widehat{\mu}_i$ (see, e.g., p. 308 of Rao [11]); the resulting optimal weighted-average estimator is

$$\widehat{\mu}_w = \frac{r_b}{kT}\sum_{i=1}^{C}\frac{N_{0_i}\left(\frac{M_i}{N_i}-f_{0_i}\right)\ln\left(\frac{N_i}{N_{0_i}}\right)}{1-f_{0_i}},$$

(14)

where $T = \sum_{j=1}^{C} N_{0_j}\left\{\ln\left(\frac{N_j}{N_{0_j}}\right)\right\}^2/(1-f_{0_j})$. The variance of $\widehat{\mu}_w$ is $\mathrm{Var}(\widehat{\mu}_w) = \frac{r_b^2\mu}{kbT}$, which can be estimated by

$$\widehat{V}_{\widehat{\mu}_w} = \frac{r_b^3}{b(kT)^2}\sum_{i=1}^{C}\frac{N_{0_i}\left(\frac{M_i}{N_i}-f_{0_i}\right)\ln\left(\frac{N_i}{N_{0_i}}\right)}{1-f_{0_i}}.$$

(15)

In general, the weighted-average estimator $\widehat{\mu}_w$ has a smaller variance and should therefore be preferred. If all cultures are of similar size, $\widehat{\mu}_w$ and $\widehat{\mu}_a$ are approximately equal, and some users may favor the simple-average estimator $\widehat{\mu}_a$ for its intuitiveness and simplicity.
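The inverse-variance weighted estimator (14), the normalizing sum *T*, and the plug-in variance (15) can likewise be computed in a few lines. This Python sketch uses hypothetical names and assumes all *f*_{0i} < 1.

```python
import math
from typing import Sequence, Tuple


def r_b(b: float) -> float:
    """Coefficient r_b = ln(b) / (1 - 1/b)."""
    return math.log(b) / (1.0 - 1.0 / b)


def weighted_average(m: Sequence[float], n: Sequence[float], n0: Sequence[float],
                     f0: Sequence[float], b: float, k: int) -> Tuple[float, float]:
    """Return (mu_hat_w, V_hat) from equations (14) and (15)."""
    logs = [math.log(ni / n0i) for ni, n0i in zip(n, n0)]
    # T = sum_j N0_j {ln(N_j / N0_j)}^2 / (1 - f0_j)
    t = sum(n0i * li ** 2 / (1.0 - f0i) for n0i, li, f0i in zip(n0, logs, f0))
    # the sum shared by (14) and (15)
    s = sum(n0i * (mi / ni - f0i) * li / (1.0 - f0i)
            for mi, ni, n0i, f0i, li in zip(m, n, n0, f0, logs))
    mu_w = r_b(b) * s / (k * t)
    v_hat = r_b(b) ** 3 * s / (b * (k * t) ** 2)
    return mu_w, v_hat


# Hypothetical data for three cultures, each started from one wild-type founder.
print(weighted_average([12, 40, 7], [2e8, 5e8, 1e8], [1.0] * 3, [0.0] * 3, b=2, k=4))
```

With equal culture sizes the weights are equal and the result coincides with (12).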

To compare the mutation rates of two different strains of microbes, we test *H*_{0} : μ_{1} ≤ μ_{2} versus *H*_{a} : μ_{1} > μ_{2}.

In 1979, an H1N1 avian influenza virus crossed the species barrier and established a new lineage in European swine. To determine whether a high mutation rate (in this study, a higher rate of point mutations) had contributed to interspecies transmission, Stech et al. [13] compared the mutation rate of A/Swine/Germany/2/81(H1N1), a well-characterized early isolate of this lineage, with that of A/Mallard/New York/6750/78(H2N2), a virus well established in avian hosts. Each culture was started with a single wild-type virus, hence *N*_{0} = 1 and *f*_{0} = 0. The final culture size *N* and the number of mutants *M* were obtained by counting viruses in the absence and presence of amantadine. Since a virus survives in the presence of amantadine if and only if its genome carries at least one mutated nucleotide at any of 4 specific sites, *k* = 4. There were 17 cultures for the Swine virus and 10 cultures for the Mallard virus. Let μ_{sw} be the mutation rate of the Swine virus and μ_{mal} that of the Mallard virus. The aim of the study is to test *H*_{0} : μ_{sw} ≤ μ_{mal} vs. *H*_{a} : μ_{sw} > μ_{mal}. We applied the weighted-average method for estimation and testing, and did the calculation for two cases, *b* = 2 and *b* = 5. Assuming *b* = 2, for the Swine virus ${\widehat{\mu}}_{sw}=6.67\times {10}^{-6}$ and ${\widehat{\sigma}}_{{\widehat{\mu}}_{sw}}=1.46\times {10}^{-5}$; for the Mallard virus ${\widehat{\mu}}_{mal}=3.23\times {10}^{-5}$ and ${\widehat{\sigma}}_{{\widehat{\mu}}_{mal}}=4.58\times {10}^{-5}$. Assuming *b* = 5, for the Swine virus ${\widehat{\mu}}_{sw}=9.68\times {10}^{-6}$ and ${\widehat{\sigma}}_{{\widehat{\mu}}_{sw}}=1.59\times {10}^{-5}$; for the Mallard virus ${\widehat{\mu}}_{mal}=4.69\times {10}^{-5}$ and ${\widehat{\sigma}}_{{\widehat{\mu}}_{mal}}=5.06\times {10}^{-5}$. For *b* = 2, *Z* = -0.534 and p-value = 0.703; for *b* = 5, *Z* = -0.701 and p-value = 0.758.
Both results indicate that the mutation rate of Swine virus is not significantly larger than that of Mallard virus.
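The reported test statistics can be reproduced, up to rounding of the published inputs, by assuming the usual one-sided two-sample statistic $Z = (\widehat{\mu}_{sw} - \widehat{\mu}_{mal})/\sqrt{\widehat{\sigma}^2_{\widehat{\mu}_{sw}} + \widehat{\sigma}^2_{\widehat{\mu}_{mal}}}$ with a standard normal reference distribution. That form of the statistic is an assumption on our part; the Python sketch below applies it to the estimates listed above for *b* = 2 and *b* = 5.

```python
import math


def one_sided_z_test(mu1: float, se1: float, mu2: float, se2: float):
    """Z = (mu1 - mu2) / sqrt(se1^2 + se2^2) and the p-value for H_a: mu1 > mu2."""
    z = (mu1 - mu2) / math.sqrt(se1 ** 2 + se2 ** 2)
    p_value = 0.5 * math.erfc(z / math.sqrt(2.0))  # P(standard normal > z)
    return z, p_value


# (mu_sw, se_sw, mu_mal, se_mal) as reported above for each assumed b
reported = {
    2: (6.67e-6, 1.46e-5, 3.23e-5, 4.58e-5),
    5: (9.68e-6, 1.59e-5, 4.69e-5, 5.06e-5),
}
results = {b: one_sided_z_test(*vals) for b, vals in reported.items()}
for b, (z, p) in results.items():
    print(f"b = {b}: Z = {z:.3f}, p-value = {p:.3f}")
```

Differences in the third decimal place from the values quoted above come from the rounding of the published estimates and standard errors.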

To plan an experiment with a desired accuracy for estimating a mutation rate, or a desired power for detecting a given difference between two mutation rates, we need a sufficient number of cultures. At the planning stage, we may assume *N*_{0i} = *N*_{0}, *f*_{0i} = *f*_{0}, and *N*_{i} = *N* for all cultures.

The number of cultures for estimation with a given γ is

$$C = \frac{r_b^2(1-f_0)}{\mu k b N_0\left\{\gamma\,\ln(N/N_0)\right\}^2},$$

(16)

where $\gamma\ \left(= \frac{\sigma_{\widehat{\mu}_a}}{\mu} = \frac{\sigma_{\widehat{\mu}_w}}{\mu}\right)$ measures the accuracy of the estimators $\widehat{\mu}_a$ and $\widehat{\mu}_w$ relative to the magnitude of μ. A smaller γ means a better estimate but demands a larger sample size *C*, as shown in Tables 1 and 2.
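Equation (16) is straightforward to evaluate when planning an experiment. The Python sketch below rounds the result up to a whole number of cultures (our convention) and uses hypothetical inputs in the usage line.

```python
import math


def cultures_for_estimation(gamma: float, mu: float, n: float, n0: float,
                            b: float, k: int, f0: float = 0.0) -> int:
    """Equation (16), rounded up to a whole number of cultures."""
    rb = math.log(b) / (1.0 - 1.0 / b)
    c = rb ** 2 * (1.0 - f0) / (mu * k * b * n0 * (gamma * math.log(n / n0)) ** 2)
    return math.ceil(c)


# Hypothetical plan: gamma = 0.5, mu = 1e-5, k = 4, b = 2, one founder, N = 1e9.
print(cultures_for_estimation(0.5, 1e-5, 1e9, 1.0, 2, 4))  # 224
```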

To test the hypotheses *H*_{0} : μ_{1} ≤ μ_{2} versus *H*_{a} : μ

$$C = \left\{\frac{r_b(z_\alpha + z_\beta)}{d\,\ln\left(\frac{N}{N_0}\right)}\right\}^2\frac{(d+2)(1-f_0)}{\mu_2 k b N_0},$$

(17)

where $d = \frac{\mu_1 - \mu_2}{\mu_2}$, which is the difference of μ_{1} and μ

The numbers of cultures for the estimation and the comparison of mutation rates are calculated in Tables 1 and 2, respectively. For example, to test *H*_{0} : μ_{1} ≤ μ_{2} versus *H*_{a} : μ
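Equation (17) can be evaluated the same way. The Python sketch below assumes a one-sided test at level α with power 1 - β, takes *z*_{α} and *z*_{β} as upper standard normal quantiles from Python's `statistics.NormalDist`, assumes each strain contributes *C* cultures of common size, and rounds up; the inputs in the usage line are hypothetical.

```python
import math
from statistics import NormalDist


def cultures_for_comparison(d: float, mu2: float, n: float, n0: float, b: float, k: int,
                            alpha: float = 0.05, power: float = 0.8, f0: float = 0.0) -> int:
    """Equation (17): cultures per strain to detect mu1 = (1 + d) mu2 with a one-sided test."""
    rb = math.log(b) / (1.0 - 1.0 / b)
    z_alpha = NormalDist().inv_cdf(1.0 - alpha)  # upper alpha quantile
    z_beta = NormalDist().inv_cdf(power)         # upper beta quantile, beta = 1 - power
    c = ((rb * (z_alpha + z_beta) / (d * math.log(n / n0))) ** 2
         * (d + 2.0) * (1.0 - f0) / (mu2 * k * b * n0))
    return math.ceil(c)


# Hypothetical plan: detect a doubling (d = 1) of mu2 = 1e-5, k = 4, b = 2, N0 = 1, N = 1e9.
print(cultures_for_comparison(1.0, 1e-5, 1e9, 1.0, 2, 4))
```

Larger relative differences *d* need far fewer cultures, while higher power or smaller α needs more.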

The number of elements in the population *N* increases from *N*_{0} at the beginning to its final value at the end of the experiment. During this process, the number of mutants *M* as a function of *N* can be viewed as a stochastic process whose time domain is *N* or log_{b}(*N*). In real experiments, however, *M* and *N* are unobservable during the process except for their final values at the end of the experiment, so μ can be estimated only from the final values of *N* and *M*. Moreover, the times at which mutations actually occur during the process cause a very large variation in the final *M*, and hence a large variation in $\widehat{\mu}$. For example, one mutation at the first generation produces mutant descendants amounting to 1/*b* of the population at the end of the experiment, whereas one mutation at the second-to-last generation produces only *b* mutant descendants. The times of mutations are unobservable and hence unknown. Therefore, the variance of $\widehat{\mu}$ is very large, which makes statistical inference with the required accuracy very difficult because the number of cultures required could be impractically large. To reduce the variability of estimation, Luria and Delbruck [10], by analogy with hitting a jackpot, proposed an estimator that assumes no mutation until the time at which one mutation is expected in the population. In effect, this lets the culture size increase from *N*_{0} to ${N}_{0}^{\ast}$ by that time, assumes that none of the ${N}_{0}^{\ast}$ elements is a mutant, and treats them as new initial starters. The accuracy of this practice is not controllable, because ${N}_{0}^{\ast}$ is large and hence there is a good chance that mutants already exist among the ${N}_{0}^{\ast}$ presumed new starters. As pointed out by Armitage [1], this method overestimates the mutation rate with probability 0.632.

We propose another approach to improving the efficiency of statistical inference. As the trends in Tables 1 and 2 show, the number of cultures *C* needed to achieve the same accuracy of estimation or power of testing can be reduced by increasing the initial culture size *N*_{0} and/or the final culture size *N*. The difficulties are that *N* is limited in real experiments, and that if *N*_{0} is greater than 1, the number of mutants *M*_{0} among the *N*_{0} starters is unknown (so *f*_{0} is unknown). The latter difficulty, however, can be circumvented as follows. The number of mutants *M* consists of two parts, *M* = *M*_{1} + *M*_{2}, where *M*_{1} is the number of descendants of the *M*_{0} initial mutants, *M*_{1} = *M*_{0}*b*^{g} with *g* the current generation, and *M*_{2} is the number of mutated descendants of the *N*_{0} - *M*_{0} initially unmutated elements. It is always true that *f* ≥ *f*_{0}, because *f* = *M*/*N* ≥ *M*_{1}/*N* = *M*_{0}*b*^{g}/

The *mutation rate in time* (denoted *a*) is the probability that a microbe mutates in a unit of time. Luria and Delbruck [10] proposed two methods for estimating this rate: the *P*_{0} method and the mean method. The *P*_{0} method is simple to use, but it does not fully use the information in the data, because only the presence or absence of mutants is recorded while the numbers of mutants in the cultures are ignored; hence it requires a large number of cultures. The mean method of the Luria and Delbruck model was further developed by Lea and Coulson [8], who derived the probability distribution of the number of mutants and proposed the median method and the maximum likelihood method based on that distribution. A more general model was proposed by Armitage [1], which extends the results of Luria and Delbruck [10] and Lea and Coulson [8] by allowing mutants and nonmutants in a culture to grow exponentially at two different rates and by including back-mutation. However, inference on mutation rates remains basically the same, because the parameters of these more elaborate models are not estimable from the only two values available in the data, *M* and *N*. In this section, we derive an estimator for the *mutation rate in time* based on the proposed stochastic model. The estimator is similar to that of the Luria and Delbruck type models, but has a substantially larger variance.

Let *N*_{t} denote the

$$M_t^{\ast} = N_t\left\{1-\bigl(1-(1-1/b)\mu\bigr)^{\log_b(N_t/N_0)}\right\},$$

(18)

where μ is the *mutation rate in replication*. As a function of *t*, ${M}_{t}^{\ast}$ is known once *N*_{t} is given. From the equation for Var(

$$\begin{aligned} \mathrm{Var}(M_t) = {} & \frac{\mu N_t^2}{bN_0}\bigl(1-(1-1/b)\mu\bigr)^{\log_b(N_t/N_0)-1}\\ & \cdot\left\{\bigl(1-(1-1/b)\mu\bigr)^{\log_b(N_t/N_0)}-N_0/N_t\right\}. \end{aligned}$$

(19)

Assume *N*_{t} =

$$M_t^{\ast} = N_0 e^{\rho t}\left[1-\left\{1-(1-1/b)\frac{a\ln(b)}{\rho}\right\}^{\frac{\rho t}{\ln(b)}}\right]$$

(20)

and $\mathrm{Var}(M_t) = \frac{a\ln(b)}{\rho b N_0}e^{2\rho t}\left\{1-(1-1/b)\frac{a\ln(b)}{\rho}\right\}^{\frac{\rho t}{\ln(b)}-1}\left[\left\{1-(1-1/b)\frac{a\ln(b)}{\rho}\right\}^{\frac{\rho t}{\ln(b)}}-e^{-\rho t}\right]$. If *at* ≈ 0 (i.e., μ ln(*N*) ≈ 0), then we have

$$M_t^{\ast} \approx (1-1/b)\,a t N_0 e^{\rho t} \quad\text{and}\quad \mathrm{Var}(M_t) \approx \frac{r_b a}{(b-1)\rho N_0}e^{2\rho t}.$$

(21)

At time *t*, the expected number of mutants in the culture, ${M}_{t}^{\ast}$ in (21), is similar to that of Luria and Delbruck [10], differing by the coefficient (1 - 1/*b*). The variance of the number of mutants Var(*M*_{t}) in (21) as a function of
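The step from (20) to the first approximation in (21) can be checked numerically. The Python sketch below evaluates ${M}_{t}^{\ast}$ from (20), which amounts to substituting μ = *a* ln(*b*)/ρ and *N*_{t} = *N*_{0}e^{ρt} into (18) as (20) implies, and compares it with (1 - 1/*b*)*at* *N*_{0}e^{ρt}. Parameter values are illustrative assumptions.

```python
import math


def mutants_in_time_exact(a: float, rho: float, t: float, b: float, n0: float = 1.0) -> float:
    """M*_t from equation (20), with mu = a ln(b) / rho and N_t = N0 exp(rho t)."""
    mu = a * math.log(b) / rho
    generations = rho * t / math.log(b)
    # 1 - (1 - (1 - 1/b) mu)^generations, computed stably for tiny mu
    fraction = -math.expm1(generations * math.log1p(-(1.0 - 1.0 / b) * mu))
    return n0 * math.exp(rho * t) * fraction


def mutants_in_time_approx(a: float, rho: float, t: float, b: float, n0: float = 1.0) -> float:
    """First-order approximation of M*_t in (21), valid when a t is close to 0."""
    return (1.0 - 1.0 / b) * a * t * n0 * math.exp(rho * t)


# Hypothetical values: a = 1e-8 per unit time, rho = 0.7, t = 20, b = 2, N0 = 1.
print(mutants_in_time_exact(1e-8, 0.7, 20.0, 2), mutants_in_time_approx(1e-8, 0.7, 20.0, 2))
```

The two agree closely when *at* is tiny and separate noticeably once *at* is of order 0.1, which is where the approximation in (21) stops being adequate.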

The traditional fluctuation analysis assumes that all cultures in an experiment grow to approximately one common culture size. In this section, we investigate how the estimation of the mutation rate could be affected by assuming a common culture size when culture sizes actually diverge. The analysis is based on the proposed stochastic model; however, its conclusion should also carry over to fluctuation analysis using the traditional Luria-Delbruck type models.

Let $\bar{N}$ be the sample mean of the *N*_{i}'s and $\bar{M}$ be the sample mean of the *M*_{i}'s

$$E(\widehat{\mu}_{cs}) = \mu\,\frac{\frac{1}{C}\sum_{i=1}^{C} N_i\ln(N_i/N_0)}{\bar{N}\ln(\bar{N}/N_0)} \quad\text{and}\quad \mathrm{Var}(\widehat{\mu}_{cs}) = \frac{r_b^2\mu}{kbCN_0}\,\frac{\frac{1}{C}\sum_{i=1}^{C}N_i^2}{\left\{\bar{N}\ln(\bar{N}/N_0)\right\}^2}$$

The first equation above indicates that $\widehat{\mu}_{cs}$ is unbiased ($E(\widehat{\mu}_{cs}) = \mu$) if the *N*_{i}'s are all equal, and may be biased otherwise. By contrast, $\widehat{\mu}_a$ in (12) is unbiased ($E(\widehat{\mu}_a) = \mu$) even for unequal

For the population size $N_t = N_0 e^{\rho t}$, we assume that ρ is constant within one culture throughout the proliferation process, but ρ may differ among cultures and can thus be treated as a random variable across parallel cultures. Let *m*(τ) ≡ *E*(*e*^{ρτ}) be the moment generating function of ρ, let $\phi(\tau) \equiv \ln(\ln(m(\tau)))$, and let φ′(τ) be the derivative of φ(τ). Then it is straightforward to show that $r_{\mu_{cs}} = t\phi'(t)$ for the $r_{\mu_{cs}}$ defined above. If ρ is normally distributed with mean ρ* and variance $\sigma_\rho^2$, then $m(\tau) = \exp(\rho^{\ast}\tau + \sigma_\rho^2\tau^2/2)$, and with some algebra, $b_{\widehat{\mu}_{cs}} = \left(\frac{2\rho^{\ast}}{\sigma_\rho^2 t} + 1\right)^{-1}$ for the relative bias $b_{\widehat{\mu}_{cs}}$ defined above. This equation implies that the bias of $\widehat{\mu}_{cs}$ equals 0 if $\sigma_\rho^2 = 0$: if the proliferation rates are identical across the parallel cultures, the culture sizes are the same at the end of the experiment, and consequently the estimate of the mutation rate is unbiased. The equation also indicates that if the proliferation rates are not identical ($\sigma_\rho^2 > 0$), the bias increases with *t*; that is, the longer the experiment lasts, the larger the bias. Fortunately, this relative bias is asymptotically bounded by 1. We may assume ρ > *c* for some *c* > 0, in practice by excluding cultures that barely grow. Then $r_{\sigma^2_{cs},\sigma^2_a} \ge e^{\sigma_\rho^2 t^2}\big/\left(\frac{2\rho^{\ast}}{c^2 t} + \frac{\sigma_\rho^2}{c^2}\right)$, which implies that the ratio of variances increases in *t* without bound.
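The closed form for the relative bias under normally distributed ρ can be verified numerically. The Python sketch below differentiates φ(τ) = ln(ln(*m*(τ))) by central differences, using ln *m*(τ) = ρ*τ + σ_{ρ}²τ²/2 directly to avoid overflow, and compares *t*φ′(*t*) - 1 with $(2\rho^{\ast}/(\sigma_\rho^2 t)+1)^{-1}$. It assumes the relative bias equals $r_{\mu_{cs}} - 1$, consistent with the figures 1.037 and 0.037 reported for the example from Section 3.3; the parameter values are hypothetical.

```python
import math


def relative_bias_numeric(rho_mean: float, sigma2: float, t: float, h: float = 1e-5) -> float:
    """t * phi'(t) - 1 with phi(tau) = ln(ln m(tau)), differentiated by central differences."""
    def phi(tau: float) -> float:
        log_m = rho_mean * tau + sigma2 * tau ** 2 / 2.0  # ln m(tau) for normally distributed rho
        return math.log(log_m)

    step = h * t
    derivative = (phi(t + step) - phi(t - step)) / (2.0 * step)
    return t * derivative - 1.0


def relative_bias_closed_form(rho_mean: float, sigma2: float, t: float) -> float:
    """b_mu_cs = (2 rho* / (sigma_rho^2 t) + 1)^(-1)."""
    return 1.0 / (2.0 * rho_mean / (sigma2 * t) + 1.0)


# Hypothetical growth-rate distribution: mean 0.7, variance 0.01, duration t = 24 (same time unit).
print(relative_bias_numeric(0.7, 0.01, 24.0), relative_bias_closed_form(0.7, 0.01, 24.0))
```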
The loss of efficiency can be substantial when the variance of ${\widehat{\mu}}_{cs}$ is much larger than that of ${\widehat{\mu}}_{a}$, because the required sample sizes are proportional to the variances of the estimators of μ. In the example of Section 3.3, the culture sizes range from $3.4\times {10}^{8}$ to $1.2\times {10}^{10}$. We have ${\widehat{r}}_{{\mu}_{cs}}=1.037$ and ${\widehat{r}}_{{\sigma}_{cs}^{2},{\sigma}_{a}^{2}}=3.32$. The relative bias of ${\widehat{\mu}}_{cs}$ for μ is ${\widehat{b}}_{{\widehat{\mu}}_{cs}}=0.037$, which is practically negligible. The ratio of the variance of ${\widehat{\mu}}_{cs}$ to that of ${\widehat{\mu}}_{a}$ is 3.32, which implies that ${\widehat{\mu}}_{cs}$ requires 3.32 times as many cultures as ${\widehat{\mu}}_{a}$ to achieve the same accuracy of estimation or the same power of comparison. This analysis indicates that an estimator assuming equal culture sizes may lose substantial efficiency when the culture sizes actually diverge.
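Because the required number of cultures scales with the estimator's variance, the practical cost of the variance ratio is simple arithmetic. The sketch below, with hypothetical function names and a hypothetical baseline of 20 cultures for ${\widehat{\mu}}_{a}$, recovers the relative bias from $\widehat{r}$ (using $b=r-1$, as in the example) and scales a culture count by the variance ratio, rounding up to a whole culture.

```python
import math


def relative_bias_from_ratio(r_mu_cs):
    """Relative bias b = r - 1, as in the worked example (r-hat 1.037 -> b-hat 0.037)."""
    return r_mu_cs - 1.0


def cultures_needed_cs(n_a, variance_ratio):
    """Cultures needed by mu_cs to match what mu_a achieves with n_a cultures.

    Required sample size is proportional to the estimator's variance, so the
    count scales by the variance ratio. The product is rounded to 9 decimals
    before ceil() so floating-point noise (e.g. 83.00000000000001) does not add
    a spurious extra culture.
    """
    return math.ceil(round(n_a * variance_ratio, 9))


if __name__ == "__main__":
    print(f"relative bias: {relative_bias_from_ratio(1.037):.3f}")  # 0.037
    # Hypothetical baseline: 20 cultures suffice for mu_a.
    print(cultures_needed_cs(20, 3.32))  # 67
```

Rounding up is a conservative choice: a fractional culture cannot be run, and rounding down would fall short of the target accuracy or power.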

Estimation and comparison of mutation rates in cells and microbes are important measurements in biological studies. However, the results can be misleading if the variability of the estimates or the errors of the comparisons are ignored. Large variability in estimates has been observed by biological researchers (e.g., Kendal and Frost [6]) and could not be adequately explained by the traditional models. In this paper we proposed a model based on the branching process that better represents some realistic aspects of the proliferation and mutation of cells and microbes. The large variability is inherent in the number of mutants in a culture, and hence in the estimators of mutation rates. For biological researchers who want reliable estimates and comparisons of mutation rates, we propose statistical planning of experiments: specifying adequate initial and final culture sizes and calculating the number of cultures required to achieve the desired accuracy of estimation or statistical power of comparison. Traditional fluctuation analysis assumes a common culture size for all cultures. In practice, culture sizes can differ substantially, especially when cultures grow to large sizes. We demonstrated that the efficiency of estimation can be much lower if culture sizes assumed to be homogeneous actually diverge. Under the traditional models, keeping culture sizes homogeneous means that cultures cannot be allowed to grow to very large sizes, a restriction that lowers the efficiency of estimation and comparison. In contrast, the proposed model accounts for different culture sizes and therefore allows cultures to grow to large sizes. Under this model, the larger the cultures are allowed to grow, the fewer cultures are required to achieve the same accuracy of estimation or power of testing. One implication is that the laboratory work in experiments could be reduced by letting cultures grow for a longer time.
The proposed model should be applicable to mutation during the proliferation of cells or the asexual proliferation of eukaryotic and prokaryotic organisms, in which new elements are unlikely to be produced by newly created members of the current generation and the mutation rate per replication is constant throughout the proliferation process.

This work was supported in part by Cancer Center Support grant P30 CA-21765 from the National Cancer Institute, Bethesda, MD, and by the American Lebanese Syrian Associated Charities (ALSAC), Memphis, Tennessee, USA. The authors wish to thank the two anonymous referees for their comments, especially the first referee for very helpful suggestions.

Derivation of the equations in (1): Since $E\xi =1-(1-1/b)\mu$, $N={N}_{0}{b}^{g}$ and

Derivation of equation (2): Solving the equation for *E*(*M*) in (1) for μ gives the identity $\mu =\frac{b}{b-1}\left[1-\exp\left\{\frac{\ln(1-E(M)/N)-\ln(1-{f}_{0})}{{\log}_{b}(N/{N}_{0})}\right\}\right]\approx \frac{b}{b-1}\,\frac{-\ln(1-E(M)/N)+\ln(1-{f}_{0})}{{\log}_{b}(N/{N}_{0})}$, where the approximation uses $1-{e}^{x}\approx -x$ for small *x*. Assuming *E*(*M*)/*N* and *f*_{0} are close to 0, we have ln(1 - *E*(*M*)/*N*) ≈ -*E*(*M*)/*N* and ln(1 - *f*_{0}) ≈ -*f*_{0}; consequently, $\mu \approx \frac{b}{b-1}\,\frac{E(M)/N-{f}_{0}}{{\log}_{b}(N/{N}_{0})}$, which leads to equation (2) when combined with log_{b}(
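The quality of these approximations can be checked numerically. The sketch below, with hypothetical function names and illustrative parameter values (not data from this paper), evaluates the exact identity with exponent $[\ln(1-E(M)/N)-\ln(1-f_0)]/\log_b(N/N_0)$, the sign under which $1-e^{x}\approx -x$ reproduces the first-order form, and compares it with the final approximation $\frac{b}{b-1}(E(M)/N-f_0)/\log_b(N/N_0)$. It uses `math.log1p` and `math.expm1` to avoid cancellation when *E*(*M*)/*N*, *f*_{0} and μ are tiny, and inverts the identity to recover *E*(*M*)/*N* from μ as a round-trip check.

```python
import math


def generations(n, n0, b):
    """g = log_b(N/N0): generations needed to grow from N0 to N with b-fold replication."""
    return math.log(n / n0, b)


def mu_exact(mutant_fraction, f0, n, n0, b):
    """Exact identity for mu in terms of E(M)/N (assumes b > 1)."""
    g = generations(n, n0, b)
    exponent = (math.log1p(-mutant_fraction) - math.log1p(-f0)) / g
    # 1 - exp(exponent), computed as -expm1 to keep precision for tiny exponents
    return b / (b - 1) * -math.expm1(exponent)


def mu_approx(mutant_fraction, f0, n, n0, b):
    """First-order approximation: mu ~ b/(b-1) * (E(M)/N - f0) / log_b(N/N0)."""
    return b / (b - 1) * (mutant_fraction - f0) / generations(n, n0, b)


def mutant_fraction_from_mu(mu, f0, n, n0, b):
    """Invert the exact identity: E(M)/N = 1 - (1 - f0) * (1 - (1 - 1/b) mu)^g."""
    g = generations(n, n0, b)
    log_nonmutant = math.log1p(-f0) + g * math.log1p(-(1 - 1 / b) * mu)
    return -math.expm1(log_nonmutant)


if __name__ == "__main__":
    # Illustrative values only: binary replication (b = 2), growth from 1e3 to 1e9 cells.
    args = (5e-7, 1e-8, 1e9, 1e3, 2.0)
    exact, approx = mu_exact(*args), mu_approx(*args)
    print(f"exact={exact:.6e} approx={approx:.6e} rel.diff={abs(exact - approx) / exact:.2e}")
    print(f"round trip E(M)/N = {mutant_fraction_from_mu(exact, *args[1:]):.6e}")
```

For mutant fractions and initial mutant frequencies of this order, the exact and approximate values agree to well under 0.1%, which is the regime in which equation (2) is intended to be used.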

Derivation of equation (3): Assume *N*_{0}/*N* ≈ 0, which holds in most applications. Let *g* = log_{b}(

Derivation of equation (10): For ${\widehat{\mu}}_{g}$ in (9) and $\widehat{\mu}$ in (2), we have ${\widehat{\mu}}_{g}=\widehat{\mu}/k$ and thus $\mathrm{Var}\left({\widehat{\mu}}_{g}\right)=\mathrm{Var}\left(\widehat{\mu}\right)/{k}^{2}$, which leads to (10) when combined with equation (3) and the identity μ_{g} = μ/

[1] Armitage PJ. The Statistical Theory of Bacterial Populations Subject to Mutation. Journal of the Royal Statistical Society, Series B. 1952;14:1–40.

[2] Drake JW. The Molecular Basis of Mutation. Holden-Day; San Francisco: 1970.

[3] Haccou P, Jagers P, Vatutin V. Branching Processes: Variation, Growth, and Extinction of Populations. Cambridge University Press; Cambridge, UK: 2005.

[4] Hay AJ, Wolstenholme AJ, Skehel JJ, Smith MH. The molecular basis of the specific anti-influenza action of amantadine. European Molecular Biology Organization Journal. 1985;4:3021–3024.

[5] Karlin S, Taylor HM. A First Course in Stochastic Processes. Academic Press; New York: 1975.

[6] Kendal WS, Frost P. Pitfalls and Practice of Luria-Delbruck Fluctuation Analysis: A Review. Cancer Research. 1988;48:1060–1065.

[7] Kimmel M, Axelrod D. Branching Processes in Biology. Springer-Verlag; New York: 2002.

[8] Lea DE, Coulson CA. The Distribution of the Numbers of Mutants in Bacterial Populations. Journal of Genetics. 1949;49:264–285.

[9] Li IC, Fu J, Hung YT, Chu EHY. Estimation of Mutation Rates in Cultured Mammalian Cells. Mutation Research. 1983;111:253–262.

[10] Luria SE, Delbruck M. Mutation of Bacteria from Virus Sensitivity to Virus Resistance. Genetics. 1943;28:492–511.

[11] Rao CR. Linear Statistical Inference and Its Applications. 2nd Ed. Wiley; New York: 1973.

[12] Rossman TG, Goncharova EI, Nadas A. Modeling and Measurement of the Spontaneous Mutation Rate in Mammalian Cells. Mutation Research. 1995;328:21–30.

[13] Stech J, Xiong X, Scholtissek C, Webster RG. Independence of Evolutionary and Mutational Rates after Transmission of Avian Influenza Viruses to Swine. Journal of Virology. 1999;73(3):1878–1884.

[14] Stewart FM, Gordon DM, Levin BR. Fluctuation Analysis: The Probability Distribution of the Number of Mutants Under Different Conditions. Genetics. 1990;124:175–185.
