Enter Your Search:Search tips Search criteria Articles Journal titles Advanced

J Math Biol. Author manuscript; available in PMC 2010 August 1.
Published in final edited form as:
Published online 2008 October 10.
PMCID: PMC2692649
NIHMSID: NIHMS79365

# A Stochastic Model for Estimation of Mutation Rates in Multiple-replication Proliferation Processes

## Abstract

In this paper we propose a stochastic model based on the branching process for estimation and comparison of the mutation rates in proliferation processes of cells or microbes. We assume in this model that cells or microbes (the elements of a population) are reproduced by generations and thus the model is more suitably applicable to situations in which the new elements in a population are produced by older elements from the previous generation rather than by newly created elements from the same current generation. Cells and bacteria proliferate by binary replication, whereas the RNA viruses proliferate by multiple replication. The model is in terms of multiple replications, which includes the special case of binary replication. We propose statistical procedures for estimation and comparison of the mutation rates from data of multiple cultures with divergent culture sizes. The mutation rate is defined as the probability of mutation per replication per genome and thus can be assumed constant in the entire proliferation process. We derive the number of cultures for planning experiments to achieve desired accuracy for estimation or desired statistical power for comparing the mutation rates of two strains of microbes. We establish the efficiency of the proposed method by demonstrating how the estimation of mutation rates would be affected when the culture sizes were assumed similar but actually diverge.

Keywords: Mathematical Modeling, Genetic Mutation, Cell, Virus, Bacteria

## 1 Introduction

An important parameter in the evolution of populations, consisting of cells, viruses, bacteria, or other microbes, is the speed at which individuals mutate to individuals with altered biological properties. The rate of mutation can be measured with respect to the continuous time or to the generation of pedigree lineage. In this paper we simply call the former the mutation rate in time and the latter the mutation rate in replication. The mathematical models on mutation rates initiated by Luria and Delbruck [10] and further developed by other authors (e.g., [8], [1], [9], and [14]) were in terms of the mutation with respect to the continuous time. As the mutation occurs during replication of the genome, the mutation rate should be better defined as a probability that an error was made during replication so that the genome of a daughter element differs from that of the parent, rather than as a probability that an element changes its nature in an unit time at any moment of its lifetime. However, mathematically the mutation rate in time and the mutation rate in replication are about the same if the unit-time of the former is the mean time-span of life cycle of an element. The mutation rate in time can be meaningfully defined and used when the proliferation rate in culture is constant in time and uniform among multiple cultures. The assumption of constant and uniform proliferation rate may sometimes not hold, for example, initially similar cultures could grow into very different colony sizes in the same period of time. If the proliferation rate varies much among cultures, the mutation rate in time cannot be estimated reliably and nor can the estimated mutation rate be meaningfully interpreted or applied. The mutation rate in replication, defined as the probability of mutation in each replication, remains constant even when the proliferation rate changes, and hence can be estimated and applied reliably in different situations.

In this paper we propose a stochastic model based on a branching process, in which the variable for the time domain of the stochastic process is the log of population size n rather than the real time t, and hence the estimation of mutation rate would not be affected by probable temporal variations of proliferation among cultures during the process. The traditional Luria-Delbruck type models assume that any element, recently born or had been existed for certain time, at any moment are equally likely to give birth to new elements, or equivalently assume that the life time of each element since its birth is exponentially distributed. The model we proposed differs from the Luria-Delbruck type models with the following conditions: 1) elements can be born simultaneously (as compared to being born only one after another); 2) new elements can be produced only from a part of the current population, i.e., the elements having existed for certain time (as compared to that they can be produced from the whole current population); 3) the births and mutations of elements are independent among new born elements (as compared to that the birth and mutation of the nth element depends on the status of the (n-1)th element). For easier mathematical derivation, we simplify the model by assuming equal life span of elements (equivalently the synchronous replications of elements) and assuming a fixed number of offsprings produced by each element. This model is suitable for application situations in which the new elements are unlikely to be produced by recently born elements in the same current generation but by elements that had been existed for certain times (the previous generation). The fixed life span of elements assumed in the model is understood as the mean life spans of elements, and the fixed number of offsprings assumed in the model is understood as the mean numbers of offsprings.

We propose the model in terms of multiple replication (e.g., the RNA viruses proliferate by multiple replication) which includes the binary replication as a special case (e.g., cells and bacteria proliferate by binary replication). We also propose statistical procedures for estimating and testing the mutation rate as probability of mutation per replication per genome, based on data from multiple cultures with divergent culture sizes. The number of cultures is formulated for planning experiments to achieve desired accuracy for estimation or desired statistical power for comparing the mutation rates of two strains of cells. We derive an estimator for the mutation rate in time from the estimator for the mutation rate in reprelication based on the proposed model. Finally, we show how the estimation of mutation rate can be affected under the assumption of a common culture size when culture sizes actually diverge.

## 2 Mutation Rates in Replication

The size of a growing population in an experimental culture increases exponentially under good conditions. As an idealization, the growth process can be imagined as a result of repeated multiplication of individuals from N0 = N0b0 at the 0th generation to Ng = N0bg at the gth generation, assuming each individual produces b daughters. The development of a population can be easily described as a tree starting with N0 original individuals at generation 0 and branching b-tuple at each node. For easier formulation, it is assumed that an individual with its genome turns into exact b progenies as its next generation; the mutants do not have altered growth properties, neither advantageous nor disadvantageous. A mutant is an individual that has undergone a mutation and differs from its parent by a change in the genome like a point mutation (nucleotide substitution), insertion, or deletion. A mutation rate, calculated from mutants with known changes in their genomes, is denoted as the mutation rate at the genomic level. Any mutation can (but does not need to) lead to altered phenotypic traits of the individual; when distinguishing only by phenotypic traits, the mutation rate is denoted as the mutation rate at the phenotypic level. For example, for mutants with altered phenotypes requiring only one point mutation at one of k sites in the genome, the mutation rate at phenotypic level can be converted to the mutation rate at genomic level by a factor of 1/k if k is not large. This applies to the example in Section 3 studying influenza viruses.

In this paper, we denote the mutation rate per replication as μ, the number of mutants in the population as M, the population size as N, the frequency of mutants in the population as f = M/N. Their initial values at generation 0 are M0, N0, and f0 = M0/N0, respectively. We denote the base for the multiple-replication proliferation as b, and denote a coefficient related to b as rb = ln(b)/(1 - 1/b). For cells and bacteria, b = 2; for RNA viruses, b ≥ 2, and b could be different for different types of viruses.

### 2.1 A Stochastic Model

We propose a stochastic model for the number of mutants in the proliferation process of a culture based on a branching process which can be found in classic mathematical books such as Karlin & Taylor [5] or in recently published books on mathematical biology such as Haccou, Jager & Vatutin [3] or Kimmel & David [7]. Let Xj be the number of unmutated individuals in the jth generation of the population, then $Xj+1=∑i=1Xjξi$, for j = 1, , g, where ξi is the number of the (j+1)th generation unmutated descendants by the ith individual of the jth generation. The sequence X0,X1, ,Xg, forms a branching process. Assume a microbe gives birth to b daughters (b is the base of replication), in which, one daughter definitely does not mutate and for the rest b - 1 daughters, each mutates with probability μ. Let ξ be the number of unmutated daughters, then ξ takes possible values of 1, , b. The number of mutated daughters b - ξ ~ B(b - 1, μ the binomial distribution with parameters b - 1 and μ. Then E (ξ) = b - (b - 1)μ and Var(ξ) = (b - 1)μ(1 - μ). Assume a culture (population) starts with N0 microbes (generation 0) which include M0 mutants. Then X0 = N0 - M0 which counts the unmutated microbes at generation 0. Let Nj = N0bj be the population size and Mj the number of mutants at the jth generation, we have Mj = Nj - Xj, in which the Mj's are random numbers and the Nj's are deterministic numbers. Assume a culture at the end of the experiment has a size N = Ng = X0bg and a number of mutants M = Mg = Ng - Xg (for the gth generation). The mean and variance of M are, respectively,

$E(M)=N−E(Xg)=N{1−(1−f0)(1−(1−1∕b)μ)logb(NN0)},Var(M)=(1−f0)μN2bN0(1−(1−1∕b)μ)logb(NN0)−1{(1−(1−1∕b)μ)logb(NN0)−N0N},$
(1)

where $f0=M0N0$. The derivation of equations in (1) is sketched in A1 in Appendix. Assume E(M)/N and f0 are close to 0, by which ln(1 - E(M)/N) ≈ -E(M)/N and ln(1 - f0) ≈ -f0. Then we have $μ≈rb{E(M)N−f0}ln(NN0)$ by solving μ from the equation for E(M) in (1), and the derivation is sketched in A2 in Appendix. Consequently, an estimator for μ is

$μ^=rb(MN−f0)ln(NN0),$
(2)

where $rb=ln(b)1−1∕b$. The estimator $μ^$ in (2) is approximately unbiased if $E(M)N$ and f0 are close to 0, as shown in A2. Methods for multiple cultures developed later in this paper are based on the $μ^$ in (2). Under this assumption, the variance of $μ^$ is

$Var(μ^)≈rb2(1−f0)μbN0{ln(NN0)}2,$
(3)

and the derivation of this equation is sketched in A3 in Appendix. Replacing μ by $μ^$ in the equation above, we obtain an estimator for $Var(μ^)$ as

$V^μ^=rb3(1−f0)(MN−f0)bN0{ln(NN0)}3.$
(4)

If the assumption that E(M)/N and f0 are close to 0 does not hold, then the approximation of μ is, as shown in A2 in Appendix,

$μ≈rb{ln(1−f0)−ln(1−E(M)N)}ln(NN0)$
(5)

and its estimator $μ^$ with improved accuracy can be obtained by replacing E(M) in the equation (5) by M. The variance of this $μ^$ is, as shown in A3 in Appendix,

$Var(μ^)≈rb(1−f0)bN0[rbμ{ln(NN0)}2−2μ2ln(NN0)],$
(6)

which can be estimated by replacing μ in the equation (6) with this $μ^$.

Taking for example, the special case b = 2 is the most common in biology, for which the ξ is distributed with P(ξ = 1) = μ and P (ξ = 2) = 1 - μ; its expectation E(ξ) = 2 - μ and variance Var(ξ) = μ(1 - μ). With rb = ln(4) for b = 2, the $μ^$ in (2) became

$μ^=ln(4)(MN−f0)ln(NN0),$
(7)

and $Var(μ^)$ in (3) became

$Var(μ^)≈{ln(4)}2(1−f0)μ2N0{ln(NN0)}2.$
(8)

If f0 = 0, then $μ^$ and $Var(μ^)$ by (7) and (8) are the same as the results by Rossman et al. [12], which were in a different form and obtained through a different approach.

### 2.2 Mutation Rate at Genome Level

In most of applications, the mutation rate at the phenotypic level of mutation can be estimated more easily than that at the genomic leve, because the mutants can be distinguished from nonmutants with much less efforts at the phenotypical level than that at the genomic level. However, in biological studies, most investigators are not interested in mutation rates at phenotypic level but at genomic level, because the comparison of mutation rates at phenotypic level can be misleading if the different changes in the genome lead to the same phenotypic change. For example, in virology, in the case of influenza viruses, virions are resistant to amantadine, an antiviral drug, if there is a point mutation at one of k (=4) known specific sites in the nucleotide sequence, shown by Hay et al. [4]. We denote the mutation rate at the phenotypic level as μp, the mutation rates at genomic level as μgi for genomic sites i = 1, ,k. Within one virus, let X be the number of mutations at the genomic level, e.g., the number of mutated nucleotides at the k sites. If genome mutations are mutually independent, then $μp=P(X>0)=1−Πi=1k(1−μgi)$ where $μgi=P$(genome mutation on site i). If μgi = μg for i = 1,, k, then $μp≈kμg$ and hence $μg≈μpk$. Assuming genotypic mutations causing the phenotypic mutation equally likely and independently, we have the estimator for the mutation rate at genomic level

$μ^g=rb(MN−f0)kln(NN0),$
(9)

and its variance is

$Var(μ^g)≈rb2(1−f0)μgkbN0{ln(NN0)}2.$
(10)

The derivation of the equation in (10) is sketched in A4 in Appendix. The culture size N and the number of mutants M are obtained at the phenotypic level (e.g., virus level), for example, by counting viruses in the culture before and after adding amantadine, respectively. Replacing μg by $μ^g$ in the equation for $Var(μ^g)$ above, we obtain an estimator for $Var(μ^g)$,

$V^μ^g=rb3(1−f0)(MN−f0)k2bN0{ln(NN0)}3.$
(11)

The methods for multiple cultures developed later in this paper are based on equations (9) and (11). In any equation for the genotypical level that we will develop below, statistical inference on mutation rate at the phenotypic level can be obtained by setting μp = μg and k = 1.

### 2.3 A Luria-Delbruck Type Model for the Mutation Rate in Replication

Drake [2] proposed a model in which the mutation rate in replication was used instead of the mutation rate in time and a differential equation $dM=(μ+MN)dN$ was used to link the number of mutants in a culture with the population size of the culture rather than with the real time since the start of the culture. Solving this equation, an estimator for the mutation rate in replicationμ was obtained as $μ^=MN−f0ln(NN0)$. The increment of mutants dM actually has two components: 1) increment by proliferation of existing mutants, or $MNdN$; 2) mutation of the increment of existing nonmutants, or $μ(1−MN)dN$. Taking into account this fact we can modify the above model to $dM={μ(1−MN)+MN}dN$, and which yields an estimator $μ^=ln(1−f0)−ln(1−MN)ln(NN0)$. The latter estimator improves the former when M is large such that the condition $MN≈0$ barely holds. The two $μ^’s$ above derived from the model above as functions of M, N, N0, and f0 are similar to the two $μ^’s$ in Section 2.1 derived from the proposed stochastic model for b = 2, but differ with a coeffcient rb = 1.386. The similarity suggests that the differences in the assumptions of the proposed stochastic model and the Luria-Delbruck type model do not lead to much different estimators for mutation rates. We will show in Section 4.2 that the main difference between the proposed model and the Luria-Delbruck type models is in the variances of the estimators for the mutation rate.

## 3 Statistical Inferences

For simplicity, from now on we will use μ instead of μg to denote the mutation rate at the genotypic level. The μ is very small in most applications, hence for statistical inference we measure the accuracy of $μ^$ with the relative scale of $γ≡σμ^∕μ$ rather than with the scale of $σμ^$, where $σμ^2=Var(μ^)$. The smaller is the γ, the more accurate is the estimation of μ. The relationship between γ and the other parameters of the proposed model is $γ=rb1−f0ln(NN0)μkbN0$, which was obtained easily by joining the definition of γ and the equation for $Var(μ^g)$ in (10). Using the equation for γ above, we calculated its value for several usual settings of parameters for estimating mutation rates of influenza viruses, and the results indicated that if an estimation is obtained from a single culture, then the γ would be too large for application purposes. Multiple cultures are needed for applications if one wants to obtain estimations with reasonably good confidence levels.

### 3.1 Estimation from Multiple Cultures

For C cultures, let Ni be the culture size, Mi the number of mutants, and $μ^i$ the estimator for μ for the ith culture, i = 1, , C, where μ is the mutation rate common for all cultures. We assume Ni's are known and constants (for being conditioned on Ni's), Mi's are random variables. There are two ways to pool estimators $μ^i’s$: the Simple Average and the Weighted Average.

#### Simple Average

Define $μ^a=1C∑i=1Cμ^i$, by (9) we have

$μ^a=rbkC∑i=1CMiNi−f0iln(NiN0i),$
(12)

where N0i and f0i are the initial culture size and frequency of mutants in the ith culture. The variance of $μ^a$ is

$Var(μ^a)=rb2μkbC2∑i=1C1−f0iN0i{ln(NiN0i)}2,$
(13)

which can be estimated by replacing the μ in (13) by the $μ^a$ in (12).

#### Weighted Average

Define $μ^w=∑i=1Cwiμ^i$, where wi's are weights ($∑i=1Cwi=1$ and $wi≥0$). The variance of $μ^w$ can be minimized by allocating wi according to the inverse of variance of $μ^i$ (e.g., see p308 in Rao [11]), by which the optimal weighted-average estimator is

$μ^w=rbkT∑i=1CN0i(MiNi−f0i)ln(NiN0i)1−f0i,$
(14)

where $T=∑j=1CN0j{ln(NjN0j)}2∕(1−f0j)$. The variance of $μ^w$ is $Var(μ^w)=rb2μkbT$, Which can be estimated by

$V^μ^w=rb3b(kT)2∑i=1CN0i(MiNi−f0i)ln(NiN0i)1−f0i.$
(15)

In general, the weighted-average estimator $μ^w$ has a smaller variance and hence should be prefered. If the sizes of all cultures are similar, then $μ^w$ and $μ^a$ are approximately the same. Some users may favor the simple-average estimator $μ^a$ for its intuitiveness and simplicity.

### 3.2 Comparing Mutation Rates

To compare the mutation rates of two different strains of microbes is to test H0 : μ12 versus Ha: μ12, where μ1 and μ2 are the two mutation rates. Let $μ^1$ and $μ^2$ be the estimators for μ1 and μ2, and let $V^μ^1$ and $V^μ^2$ be the estimated variances of $μ^1$ and $μ^2$. Because $μ^1$ and $μ^2$ are linear sums of independent estimators, the test statistic $Z=μ^1−μ^2V^μ^1+V^μ^2$ is distributed approximately normal if both C1 and C2 are large. The two mutation rates are significantly different if Z > zα, where zα is the upper α-quantile of standard normal distribution. For smaller C1 and C2, the zα should be replaced by tα (C1 + C2 - 2) which is the upper α-quantile of t-distribution with degree freedom of C1 + C2 - 2.

### 3.3 Examples

In 1979, an H1N1 avian influenza virus crossed the species barrier and established a new lineage in European swine. To determine whether a high mutation rate (in this study a higher rate of point mutations) had contributed to interspecies transmission, Stech et al. [13] compared the mutation rate of A/Swine/Germany/2/81(H1N1), a well-characterized early isolate of the above-mentioned lineage, with that of A/Mallard/New York/6750/78(H2N2), a virus that was well-established in avian hosts. Each culture was started with a single wild-type virus, hence N0 = 1 and f0 = 0. The final culture size N and the number of mutants M were obtained by counting viruses in absence and presence of amantadine. Since a virus survives in the presence of amantadine if and only if there is at least one mutated nucleotide at any of 4 specific sites in its genome, hence k = 4. There were 17 cultures for the Swine virus and 10 cultures for the Mallard virus. Let μsw be the mutation rate of Swine virus and μmal be that of the Mallard virus. The interest of the study is to test H0 : μsw ≤μmal vs. Ha : μswmal. We applied the method of weighted average for estimations and tests, and did the calculation for two cases: b = 2 and b = 5. Assuming b = 2, for the Swine virus: $μ^sw=6.67×10−6$ and $σ^μ^sw=1.46×10−5$; for the Mallard virus: $μ^mal=3.23×10−5$ and $σ^μ^mal=4.58×10−5$. Assumming b = 5, for the Swine virus: $μ^sw=9.68×10−6$ and $σ^μ^sw=1.59×10−5$; for the mallerd virus: $μ^mal=4.69×10−5$ and $σ^μ^mal=5.06×10−5$ for b = 2, Z = -0.534 and p-value = 0.703. for b = 5, Z = -0.701 and p-value = 0.758. Both results indicate that the mutation rate of Swine virus is not significantly larger than that of Mallard virus.

### 3.4 Number of Cultures: Sample Size Calculation

To plan an experiment with desired accuracy for estimation of a mutation rate or desired power for detecting a given difference for comparison of two mutation rates, we need a sufficient number of cultures. At the planning stage, we may assume N0i = N0, f0i = f0, and Ni = N for all i's. Then $Vμ^a=Vμ^w=rb2(1−f0)μCkbN0{ln(NN0)}2$

#### Estimation

The number of cultures for estimation with a given γ is

$C=rb2(1−f0)μkbN0{γln(N∕N0)}2,$
(16)

where $γ(=σμ^aμ=σμ^wμ)$ measures the accuracy of estimators $μ^a$ and $μ^w$ relative to the magnitude of μ. A smaller γ leads to a better estimation, but demands a larger sample size C, as shown in Tables 1 and and22.

Number of Cultures for Estimating Mutation Rate1
Number of Cultures for Comparing Mutation Rates12

#### Comparison

To test the hypotheses H0 : μ1 ≤ μ2 versus Ha : μ1 > μ2 with significance level α and power 1 - β to detect a given relative difference d, the number of cultures in each strain group is

$C={rb(zα+zβ)dln(NN0)}2(d+2)(1−f0)μ2kbN0,$
(17)

where $d=μ1−μ2μ2$ which is the difference of μ1 and μ2 relative to the magnitude of μ2. For a two-sided test, C is obtained by replacing zα by zα/2 in (17). C1 and C2 can be different, but must satisfy equation $(zα+zβdμ2)2=1Vμ^1+Vμ^2$, where $Vμ^1=rb2(1+d)μ2kbC11−f0N0(ln(N∕N0))2$ and $Vμ^2=rb2μ2kbC21−f0N0(ln(N∕N0))2$.

The numbers of cultures are calculated in Tables 1 and and22 for the estimation and the comparison of mutation rates, respectively. For example, to test H0 : μ1 ≤ μ2 versus Hα : μ1 > μ2 with α = 0.05 and power 1 - β = 0.8 for detecting a relative difference of d = 3. Assume k = 4, N0 = 1, f0 = 0; Ni = 1010 for all i's; μ2 = 3 × 10-5 and b = 5, then we have C = 43.7 from Table 2. If the initial number of viruses per culture is increased to N0 = 10 and there are no initial mutants (f0 = 0), then the number of cultures per strain can be reduced to C = 5.4.

### 3.5 Effciency of Estimation and Comparison

The number of elements in population N increases from N0 at the beginning to its final value at the end of experiment. During this process the number of mutants M as a function of N can be viewed as a stochastic process with the time domain in terms of N or logb(N). In real experiments, however, M and N during this process are unobservable except their last values at the end of the experiment. The μ hence can only be estimated from the last values of N and M. On the other hand, the times at which mutations really happen during the process cause a very large variation in the M at the end of experiment, and hence causes a large variation in $μ^$. For example, one mutation at the first generation will produce mutant descendents as many as 1/b of population at the end of the experiment, whereas one mutation at the second to last generation produces b mutant descendents. The times of mutations during the process are unobservable and hence unknown. Therefore, the variance of $μ^$ is very large which makes the statistical inference with required accuracy very difficult because the number of cultures required could be impractically large. To reduce the variability of estimation, by contending an analogy of hitting-jackpot, Luria and Delbruck [10] proposed an estimator by assuming no mutation until a time when one mutation is expected in the population. This proposition actually is to let the culture size increase from N0 to $N0∗$ until the time mentioned above and assume no mutants in the $N0∗$ elements and take them as new initial starters. The accuracy of estimation by this practice is not controllable because the $N0∗$ is large and hence there is a good chance that mutants could have existed among the $N0∗$ pretended new starters. The mutation rate is overestimated by this method with a probability of 0.632, as pointed out by Armitage [1].

We propose another approach for improving the efficiency of statistical inference. As the trends shown in Tables 1 and and2,2, the number of cultures C can be reduced by increasing initial culture size N0 and/or final culture size N, for achieving the same accuracy in estimation or power in hypotheses testing. The difficulties are that the N is limited in real experiments, and that if N0 is greater than 1, then the number of mutants M0 among the N0 starters is unknown (f0 is hence unknown). However, the latter difficulty can be get around in a following way. The number of mutants M consists of two parts, M = M1 + M2. M1 is the number of descendants from M0 initial mutants, or M1 = M0bg where g is the number for current generation; M2 is the number of mutated descendants from the N0-M0 initial unmutated elements. It is always true that ff0, because f = M/NM1/N = M0bg/N0bg = f0. In a culture, if M/N < 1/N0 is observed, then it implies f0 = 0 (or M0 = 0). The N0 initiators for the generation 0 should have been selected before culture starts, and must not be retrospectively claimed after observing M and N. It is incorrect to retrospectively choose an arbitrary N0 that satisfies M/N < 1/N0 and claim it as the initial population size, because N0 should be a constant that does not depend on M and N. For multiple parallel cultures, assuming N0i initiators for the ith culture, we may identify cultures that have M0i = 0 by checking whether Mi/Ni < 1/N0i. We then estimate μ only from the cultures with M0i = 0 and using $μ^a$ in (12) or $μ^w$ in (14) and for which setting f0i = 0 for all i's.

## 4 Mutation Rate in Time

The mutation rate in time (denoted as a) is the probability that a microbe mutates in an unit time. Luria and Delbruck [10] proposed two methods for estimating this mutation rate: the P0 method and the mean method. The P0 method is simple to use, however, information from data is not fully utilized because only the presence and absence of mutants are counted whereas the numbers of mutants in cultures are ignored, and hence it requires a large number of cultures. The mean method of Luria and Delbruck model was further developed by Lea and Coulson [8] by deriving the probability distribution of the number of mutants and proposing the median method and the maximum likelihood method based on the distribution. A more general model was proposed by Armitage [1], which extends the results by Luria and Delbruck [10] and Lea and Coulson [8] in the way that mutants and nonmutants in a culture may grow with two different rates exponentially and the model includes the back-mutation. However, the inferences of mutation rates basically remain the same because the parameters of the more delicate models are not estimable from only two available values, M and N, from the data. In this section, we derive an estimator for the mutation rate in time based on the proposed stochastic model. The estimator is similar to that by the Luria and Delbruck type models, but with a substantially larger variance.

### 4.1 A Stochastic Process: Number of MutantsMt

Let Nt denote the N at time t and assume Nt a deterministic function (e.g., Nt = N0eρt) because the variation of Nt is usually unsubstantial for a given culture. Let Mt denote the M at time t, then Mt is a stochastic process with t in time domain. Mt has a large variability and hence can not be approximated by a deterministic function. Let $Mt∗≡E(Mt)$ be the expectation of Mt. In the proposed model, $Mt∗$ is a deterministic function of t for being a function of Nt. Theoretically, the Nt is allowed to be any positive nondecreasing function of t in a certain period of time since the start of the experiment. Assuming f0 = 0 and k = 1, then from the equation (1),

$Mt∗=Nt{1−(1−(1−1∕b)μ)logb(NtN0)},$
(18)

where μ is the mutation rate in replication. As a function of t, the $Mt∗$ would be known if Nt is given. From the equation for Var(M) in (1), the variance of number of mutants Mt at time t is

$Var(Mt)=μNt2bN0(1−(1−1∕b)μ)logb(Nt∕N0)−1.⋅{(1−(1−1∕b)μ)logb(Nt∕N0)−N0∕Nt}.$
(19)

### 4.2 Exponential Growth

Assume Nt = N0eρt, where ρ is the proliferation rate, then $ρ=Nt′Nt$ which implies that ρ is the individual growth rate, or the growth per element per unit time. A culture grows from the generation 0 to the generation g in the time interval [0,t], hence g and t are related by g = λt where 1/λ is the mean life-span of one generation. Then Nt = N0eρt can be obtained from Ng = N0bg by letting ρ = λln(b). Let a be the mutation rate in time, the probability of mutation per element per unit time. Assume a is obtained by allocating probability mass μ (the probability of mutation per element per replication) uniformly on an element's life-span, hence $a=μ1∕λ=μλ$. Then $μ=aln(b)ρ$ and by (1) we have

$Mt∗=N0eρt[1−{1−(1−1∕b)aln(b)ρ}ρtln(b)]$
(20)

and $Var(Mt)=aln(b)ρbN0e2ρt(1−(1−1∕b)aln(b)ρ)ρtln(b)−1{(1−(1−1∕b)aln(b)ρ)ρtln(b)−e−ρt}$. If at ≈ 0 (i.e., μln(N)≈ 0), then we have

$Mt∗≈(1−1∕b)atN0eρtandVar(Mt)≈rba(b−1)ρN0e2ρt.$
(21)

At the time t, the expected number of mutants in culture, $Mt∗$ in (21), is similar to that of Luria and Delbruck [10], differing with a coefficient (1-1/b). The variance of the number of mutants Var(Mt) in (21) as a function of t is et with a constant coefficient and thus is similar to $a2N02e2ρt$, the one by Luria and Delbruck [10], which is also et with a constant coefficient. However, the ratio of two coefficients is $rb∕aN03(b−1)ρ$, which implies that the coefficient of former would be substantially larger than that of the latter when a is very small as in most applications. Hence the variance of the number of mutants in a population by the proposed model could be substantially larger than that by the Luria and Delbruck type model, even though the estimates for the number of mutants by the two models are very similar. For multiple parallel cultures, the proliferation rate ρ may vary substantially among cultures because parallel cultures can grow into quite diffierent sizes in a same period of time. For the ith culture with the initial size Ni0 and the size Nit at time t, the proliferation rate ρi can be estimated by $ρ^i=ln(NitNi0)∕t$. The mutation rate in time a varies among cultures if ρ does, however, on the contrary the mutation rate in replicationμ is a constant across multiple cultures. The mutation rate in time for the ith culture $ai(=μρiln(b))$ can be estimated by $a^i=Mit(1−1∕b)tNit$ from the ith culture alone; or by $a^i=μ^ln(NitNi0)∕tln(b)$, where $μ^$ is the estimator for μ from all cultures.

## 5 Divergent Culture Sizes

All cultures in an experiment are assumed to grow into around one common culture size in the traditional fluctuation analysis. In this section, we investigate how the estimation of mutation rate could be affected under the assumption of a common culture size when culture sizes actually diverge. The following analysis is based on the proposed stochastic model, however, its conclusion should be implicative to the fluctuation analysis using the traditional Luria-Delbruck type models.

Let $N‒$ be the sample mean of Ni's and $M‒$ be the sample mean of Mi's for C cultures for which all cultures have the same N0 and f0 and assuming f0 = 0. Then $μ^cs=rbM‒N‒kln(N‒∕N0)$ by (9) is an estimator for γ obtained by assuming equal culture sizes, where the subscript “cs” stands for “common size”. We can rewrite $μ^cs=1C∑i=1Cμ^i$, where $μ^i=rbMiN‒kln(N‒∕N0)$. Then we have

$E(μ^cs)=μ1C∑i=1CNiln(Ni∕N0)N‒ln(N‒∕N0)andVar(μ^cs)=rb2μkbCN01C∑i=1CNi2{N‒ln(N‒∕N0)}2$

. The first equation above indicates that $μ^cs$ is unbiased $(E(μ^cs)=μ)$ if Ni's are all equal, and could be biased if otherwise. On the contrary, the $μ^a$ by (12) is unbiased $(E(μ^a)=μ)$ even for unequal Ni's. The culture sizes Ni's of the parallel cultures can be regarded as samples of a random variable Nt. Now we consider the situation when the number of parallel cultures, C, becoming large. Let $rμcs≡limC→∞E(μ^cs)∕μ$, then $rμcs=E[Ntln(Nt∕N0)]E(Nt)ln(E(Nt)∕N0)$. Asymptotically, $μ^cs$ is unbiased to μ if $rμcs=1$, and $μ^cs$ is biased if $rμcs≠1$. Let $bμ^cs≡[E(μ^cs)−μ]∕μ$, which is the bias of $μ^cs$ relative to μ, then $rμcs=1+bμ^cs$. Let $rσcs2,σa2≡limC→∞Var(μ^cs)∕Var(μ^a)$, then $rσcs2,σa2=E(Nt2)[E(Nt)]2[ln(E(Nt∕N0))]2E[(ln(Nt∕N0))−2]$. Asymptotically, $μ^cs$ is less efficient than $μ^a$ if $rσcs2,σa2>1$, and $μ^cs$ is at least as efficient as $μ^a$ if $rσcs2,σa2≤1$.

For the population size $Nt=N0eρt$, we assume ρ is constant for one culture in the entire proliferation process, but ρ could be different for different cultures and thus can be considered as a random number for the multiple parallel cultures. Let m(τ) be the moment generating function of ρ, then $m(τ)≡E(eρτ)$. Let $φ(τ)≡ln(ln(m(τ)))$ and let ′(τ) be the derivative of (τ). Then it is straightforward to show that $rμcs=tφ′(t)$ for the $rμcs$ defined above. If ρ is normally distributed with mean ρ* and variance $σρ2$, then $m(τ)=exp(ρ∗τ+σρ2τ2∕2)$, and with some algebra, $bμ^cs=(2ρ∗σρ2t+1)−1$ for the relative bias $bμ^cs$ defined above. This equation implies that the bias of $μ^cs$ equals 0 if $σρ2=0$. That is, if the proliferation rates are identical across the parallel cultures, then culture sizes would be the same at the end of the experiment, and consequently the estimation of mutation rate is unbiased. The equation also indicates that if the proliferation rates are not identical $(σρ2>0)$, then the bias increases in t, that is, the bias is getting larger if the experiment lasts longer time. Fortunately, this relative bias is asymptotically bounded by 1. We may assume ρ>c for somec> 0, practically by excluding cultures which almost do not grow. Then $rσcs2,σa2≥eσρ2t2∕(2ρ∗c2t+σρ2c2)$, which implies that the ratio of variances increases in t unboundedly. The loss of efficiency could be substantial if the variance of $μ^cs$ is much larger than that of $μ^a$, because the sample sizes required are proportional to the variances of the estimators for μ. For the example in Section 3.3, the culture sizes range from 3.4×108 to 1.2×1010. We have $r^μcs=1.037$ and $r^σcs2,σa2=3.32$. The relative bias of $μ^cs$ for μ is $b^μ^cs=0.037$ and is practically negligible. The ratio of variances for $r^μcs$ to $r^μa$ is 3.32 which implies that the number of cultures for $μ^cs$ must be 3.32 times as large as that for $μ^a$ in order to achieve a same accuracy for estimation or a same power for comparison. The analysis above indicates that an estimation assuming equal culture sizes may lose efficiency substantially when culture sizes actually diverge.

## 6 Conclusion

Estimation and comparison of mutation rates in cells and microbes are important measurements in biological studies. However, the results could be misleading if the variation of the estimation or the errors of the comparison are ignored. Large variability in estimations had been observed by biological researchers (e.g., Kendal and Frost [6]), and could not be adequately explained by the traditional models. In this paper we proposed a model based on the branching process such that some realistic aspects of proliferation and mutation of cells and microbes are better represented in the model. The large variability is inherent to the number of mutants in a culture, and hence is inherent to the estimators for mutation rates. For biological researchers who want to obtain reliable estimations and comparisons of mutation rates, we propose statistical planning of experiments by assuming adequate initial and final culture sizes and calculating number of cultures required to achieve desired accuracy in estimation and statistical power for comparison. The traditional fluctuation analysis assume a common culture size among all cultures. In practice, culture sizes could differ substantially, especially when culture grow into large sizes. We demonstrated that the efficiency of estimation could be much lowered if assumed homogenous culture sizes actually diverge. For the traditional models, to make culture sizes homogenous, the cultures cannot be allowed to grow into very large sizes, which is a restriction that lowers the efficiency of estimation and comparison. On the contrary, the proposed model allows cultures grow into large sizes by taking account of different culture sizes. By this model, if the cultures can grow into larger sizes, then a smaller number of cultures is required to achieve a same accuracy of estimation or testing. An implication of this fact is that the laboratory work in experiments could be reduced by letting cultures grow for a longer time. The proposed model should be applicable to the mutation in the proliferation of cells or the asexual proliferation of eukaryotic and prokaryotic organisms, in which the new births are unlikely being given by the new members of the population and the mutation rate in replication is constant during the proliferation process.

## Acknowledgements

This work was supported in part by Cancer Center Support grant P30 CA-21765 from the National Cancer Institute, Bethesda, MD, and by the American Lebanese Syrian Associated Charities (ALSAC), Memphis, Tennessee, USA. The authors wish to thank the two anonymous referees for their comments, especially the first referee for very helpful suggestions.

## 7 Appendix

#### A1

Derivation of equations in (1): Since Eξ = 1 - (1 - 1/b)μ, N = N0bg and X0 = N0(1 - f0), we have $EXg=X0(Eξ)g=X0[b−(b−1)μ]g=N(1−f0)[1−(1−1∕b)μ]g$, which leads to the equation for E(M) in (1) by joining with E(M) = N - EXg and g = logb (N/N0). For branching process, $Var(Xg)=X0Var(ξ)(Eξ)g−1(Eξ)g−1Eξ−1$, which leads to the equation for Var(M) in (1) by joining with identities Var(M) = Var(N - M) = Var(Xg), = b - (b - 1)μ, and Var(ξ) = (b - 1)(1 - μ)μ.

#### A2

Derivation of the equation (2): Solving μ in the equation for E(M) in (1), we have identity $μ=bb−1[1−exp{−ln(1−E(M)∕N)+ln(1−f0)logb(N∕N0)}]≈bb−1−ln(1−E(M)∕N)+ln(1−f0)logb(N∕N0)$. Assume E(M)/N and f0 are close to 0, then ln(1 - E(M)/N) ≈ -E(M)/N and ln(1 - f0) ≈ -f0; consequently, $μ≈b1−bE(M)∕N−f0logb(N∕N0)$ which leads to the equation (2) by joining with logb(N/N0) = ln(N/N0)/ ln(b) and rb = ln(b)/(1 - 1/b) and replacing E(M)/N by its unbiased estimator M/N.

#### A3

Derivation of the equation (3): Assume N0/N ≈ 0, which is true for most of applications. Let g = logb(N/N0), then the equation for Var(M) in (1) leads to $Var(M)≈(1−f0)μN2bN0[1−(1−∕b)μ]2g−1≈(1−f0)μN2bN0[1−(2g−1)(1−1∕b)μ+O(μ2)]$. Since M is the only random number in the right side of the equation (2), we have $Var(μ^)=rb2{ln(N∕N0)}2N2Var(M)$. Then we get $Var(μ^)≈rb(1−f0)bN0[rbμ{ln(N∕N0)}2−2μ2ln(N∕N0)]$ by joining two equations above. If μ is much smaller than $1ln(N∕N0)$, then the second term of $Var(μ^)$ above can be ignored and thus the equation (3) holds.

#### A4

Derivation of the equation (10): For $μ^g$ in (9) and $μ^$ in (2), we have $μ^g=μ^∕k$ and thus $Var(μ^g)=Var(μ^)∕k2$, which leads to (10) by joining with the equation (3) and the identity μg = μ/k.

## References

[1] Armitage PJ. The Statistical Theory of Bacterial Populations Subject to Mutation. Journal of The Royal Statistics Society. 1952;B14:1–40.
[2] Drake JW. The Molecular Basis of Mutation. Holden-Day; San Francisco: 1970.
[3] Haccou P, Jagers P, Vatutin V. Branching Processes: Variation, Growth, and Extinction of Populations. Cambridge University Press; Cambridge, UK: 2005.
[4] Hay AJ, Wolstenholme AJ, Skehel JJ, Smith MH. The molecular basis of the specific antiinfluenza action of amantadine. European Molecular Biology Organization Journal. 1985;4:3021–3024. [PubMed]
[5] Karlin S, Taylor HM. A First Course in Stochastic Process. Academic Press; New York: 1975.
[6] Kendal WS, Frost P. Pitfalls and Practice of Luria-Delbruck Fluctuation Analysis: A Review. Cancer Research. 1988;48:1060–1065. [PubMed]
[7] Kimmel M, Axelrod D. Branching Processes in Biology. Spring-Verlag; New York: 2002.
[8] Lea DE, Coulson CA. The Distribution of the Numbers of Mutants in Bacterial Populations. Journal of Genetics. 1949;49:264–285. [PubMed]
[9] Li IC, Fu J, Hung YT, Chu EHY. Estimation of Mutation Rates in Cultured Mammalian Cells. Mutation Research. 1983;111:253–262. [PubMed]
[10] Luria SE, Delbruck M. Mutation of Bacteria from Virus Sensitivity to Virus Resistance. Genetics. 1943;28:492–511. [PubMed]
[11] Rao CR. Linear Statistical Inference and Its Applications. 2nd Ed. Wiley; New York: 1973.
[12] Rossman TG, Goncharova EI, Nadas A. Modeling and Measurement of the Spontaneous Mutation Rate in Mammalian Cells. Mutation Research. 1995;328:21–30. [PubMed]
[13] Stech J, Xiong X, Scholtissek C, Webster RG. Independence of Evolutionary and Mutational Rates after Transmission of Avian Influenza Viruses to Swine. Journal of Virology. 1999;73(3):1878–1884. [PubMed]
[14] Stewart FM, Gordon DM, Levin BR. Fluctuation Analysis: The Probability Distribution of the Number of Mutants Under Different Conditions. Genetics. 1990;124:175–185. [PubMed]

 PubMed Central Canada is a service of the Canadian Institutes of Health Research (CIHR) working in partnership with the National Research Council's national science library in cooperation with the National Center for Biotechnology Information at the U.S. National Library of Medicine(NCBI/NLM). It includes content provided to the PubMed Central International archive by participating publishers.