Our model focuses on an individual site of the mitochondrial genomes in the maternal ancestry of one individual. When a heteroplasmy (comprising two different nucleotides, generically X and Y) is observed at a site, we presume that it is the consequence of a somatic substitution X→Y or Y→X at that site in a maternal ancestor some $g$ generations earlier, which was inherited by a daughter and has persisted for $g$ generations.
Consider the maternal ancestry of an individual $A_0$, with $A_1$ its mother and $A_2$ its maternal grandmother, etc. Suppose a nucleotide substitution X→Y has occurred at a site in one genome of a maternal line ancestor $A_g$ ($g \ge 1$), which is inherited by $A_{g-1}$. Our model shows that the probability of an additional substitution inherited at that site among $A_{g-1}, \ldots, A_0$ is less than $10^{-2}$, so we will neglect this possibility. Suppose this site heteroplasmy persists for (exactly) $k \ge 1$ generations, inherited by $A_{g-1}, \ldots, A_{g-k}$, but not by $A_{g-k-1}$. If $k \ge g$, then $A_0$ will be heteroplasmic at that site, although the heteroplasmy is not necessarily observable. If $k < g$, then $A_{g-k-1}, \ldots, A_0$ are not heteroplasmic at that site, with their genomes containing either all X or all Y.
We propose a model where each oocyte recruits $N_x$ mitochondrial genomes independently from the population of its mother's genomes. (The number $N_x$ is sometimes referred to as the number of segregating units. For the Adélie penguins, we estimated $N_x$ to have a 95% confidence interval (CI) of $25 < N_x < 69$.)
The observed levels of most site heteroplasmies from the blood samples of a mother and her chicks closely agree, which reflects a close agreement in the germ line. We will assume that the variation, after accounting for measurement uncertainty, is due to sampling at the recruiting bottleneck and that the proportions in the blood sample estimate the inherited proportions.
If $A_1$ (the mother of $A_0$) contains a site heteroplasmy with the nucleotides Y and X appearing in proportions $\phi$ and $1-\phi$, then the probability (using binomial selection with replacement) that $A_0$ inherits $i$ genomes with allele Y and $N_x - i$ genomes with allele X is
$$\binom{N_x}{i}\,\phi^{i}\,(1-\phi)^{N_x-i}.$$
If $A_1$ had inherited $j$ copies of allele Y and $N_x - j$ of allele X from her mother $A_2$, we assume she exhibits the proportions $\phi = j/N_x$ and $1-\phi = 1 - j/N_x$ of the alleles in the genomes available for inheritance. Hence,
$$p_{ij} = \binom{N_x}{i}\left(\frac{j}{N_x}\right)^{i}\left(1-\frac{j}{N_x}\right)^{N_x-i}$$
is the probability that $A_0$ inherits $i$ copies of allele Y and $N_x - i$ copies of allele X, given that her mother had inherited $j$ and $N_x - j$ corresponding copies. Let $P_{N_x} = (p_{ij})$ be the matrix of these probabilities, where the $N_x - 1$ rows and columns are indexed by $i, j \in \{1, 2, \ldots, N_x - 1\}$.
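The matrix of inheritance probabilities can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the code used in the study: the function name `transition_matrix` and the choice of 20 segregating units are ours, and entry `[i-1][j-1]` holds the binomial probability that a daughter inherits `i` copies of Y given that her mother inherited `j`.

```python
from math import comb

def transition_matrix(n_x):
    """Binomial inheritance probabilities between heteroplasmic states 1..n_x-1.

    Entry [i-1][j-1] is Pr(daughter inherits i copies | mother inherited j).
    (Hypothetical helper for illustration.)
    """
    states = range(1, n_x)
    matrix = []
    for i in states:
        row = []
        for j in states:
            phi = j / n_x  # proportion of Y available for inheritance
            row.append(comb(n_x, i) * phi**i * (1 - phi)**(n_x - i))
        matrix.append(row)
    return matrix

P = transition_matrix(20)  # 20 segregating units, chosen only as an example

# Each column sums to less than 1; the shortfall is the probability that
# the heteroplasmy is lost (0 copies) or fixed (n_x copies) in one step.
col_1 = sum(P[i][0] for i in range(19))
```

Only the heteroplasmic states are kept, so the matrix is substochastic: the probability mass leaving each column corresponds to the heteroplasmy being lost or fixed in that generation.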
Given that $A_g$ has a somatic mutation in a descendant of one of her $N_x$ founding genomes, the probability that $A_{g-1}$ inherits more than one mutated genome is very small. Hence, we will assume $A_{g-1}$ is heteroplasmic at that site, with proportion $1/N_x$ of its genomes containing Y. If the heteroplasmy has been lost $g$ generations later, then $A_0$ has either all X or all Y at that site. Let $h_{X,Y}$ be the proportion of cases where the heteroplasmy persists, and $h_X$ and $h_Y$ be the proportions where the mutation is lost or fixed, respectively. The neutral model predicts that as $g$ increases
$$h_{X,Y} \to 0, \qquad h_X \to \frac{N_x - 1}{N_x}, \qquad h_Y \to \frac{1}{N_x}.$$
In a simulation study of $10^7$ heteroplasmic site histories, we followed the introduction of one mutation until the site heteroplasmy was lost, for $N_x = 20$ and for $N_x = 40$. We found (table 1), for $\theta = 0.23$, that $h_X \approx (N_x - 1)/N_x$ and $h_Y \approx 1/N_x$. Table 1 also gives the average numbers of generations that the site heteroplasmies persist and are observable. We note that, over all histories, the average numbers of generations a site heteroplasmy was observable were almost identical for $N_x = 20$ and 40, but the average variation in the levels of a heteroplasmy at a site between a mother and her chick differed significantly.
Table 1. Site heteroplasmy histories: results from $10^7$ simulations for $N_x = 20$ and 40 segregating units. (In more than 99.9% of the histories, the introduced mutation is either lost or fixed within 200 generations, and no site heteroplasmy survived more than 520 ...)
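A simulation of this kind can be sketched as follows. This is an illustrative reimplementation under our own assumptions, not the code behind the reported results: the function names, the seed and the much smaller number of histories (5000 rather than the number used in the study) are hypothetical choices that keep the run short.

```python
import random

def simulate_history(n_x, rng):
    """Follow one new mutation (a single Y copy among n_x) until lost or fixed.

    Returns (fixed, generations), where generations counts the individuals
    in the line of descent that carried the heteroplasmy.
    """
    copies = 1          # one mutated founding genome in the first heir
    generations = 0
    while 0 < copies < n_x:
        generations += 1
        phi = copies / n_x
        # Binomial sampling with replacement of n_x genomes for the next oocyte.
        copies = sum(rng.random() < phi for _ in range(n_x))
    return copies == n_x, generations

def summarise(n_x, histories, seed=1):
    """Proportion of histories ending in fixation, and mean persistence."""
    rng = random.Random(seed)
    fixed = 0
    total_generations = 0
    for _ in range(histories):
        was_fixed, generations = simulate_history(n_x, rng)
        fixed += was_fixed
        total_generations += generations
    return fixed / histories, total_generations / histories

h_y, mean_persistence = summarise(20, 5000)
```

Under neutrality the proportion of fixed histories should fluctuate around $1/N_x$, with the remainder lost; a larger number of histories narrows that fluctuation.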
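For each $A_k$ in the ancestry, let $n_k$ be the number of its founding genomes with nucleotide Y at that site. We have assumed that $n_{g-1} = 1$. If $n_k = 0$ or $N_x$ for some $k < g-1$, the heteroplasmy is lost. Suppose $1 \le n_{k+1} = j < N_x$; then the probability that $n_k = i$ ($1 \le i < N_x$) is
$$\Pr(n_k = i \mid n_{k+1} = j) = \binom{N_x}{i}\left(\frac{j}{N_x}\right)^{i}\left(1-\frac{j}{N_x}\right)^{N_x-i}.$$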
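In lemma 1 of the electronic supplementary material, we show that
$$\Pr(n_0 = i \mid n_{g-1} = 1) = \left(P_{N_x}^{\,g-1}\right)_{i,1},$$
which is the $i$th entry of the leading column of $P_{N_x}^{\,g-1}$, the $(g-1)$th power of the matrix $P_{N_x}$ of inheritance probabilities. Assuming that the probability that $A_g$ introduces a somatic mutation into the germ line at a selected site is $\alpha$, then summing over all generations $g \ge 1$, the probability that $A_0$ has a site heteroplasmy with $n_0 = i$ ($1 \le i < N_x$) is
$$\alpha \sum_{g \ge 1} \left(P_{N_x}^{\,g-1}\right)_{i,1} = \alpha\,(Q_{N_x})_{i,1},$$
where $(Q_{N_x})_{i,1}$ is the first entry in the $i$th row of
$$Q_{N_x} = \sum_{g \ge 1} P_{N_x}^{\,g-1} = (I - P_{N_x})^{-1}.$$
As $(Q_{N_x})_{i,1}$ is the expected number of generations that a new site heteroplasmy persists with $i$ copies, the expected number of generations a heteroplasmy is observable at that site is
$$\sum_{\theta N_x \le i \le (1-\theta) N_x} (Q_{N_x})_{i,1}.$$
Most site heteroplasmies never reach the detection threshold; those that do usually persist for many more than this expected number of generations.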
In the accompanying figure, we plot the values of $(Q_{20})_{i,1}$ and $(Q_{40})_{i,1}$, noting that these two distributions are almost identical, and that $(Q_{N_x})_{i,1}$ is closely approximated by $2/i$ within the observed region. (The limit $(Q_{N_x})_{i,1} \to 2/i$ as $N_x \to \infty$ was noted by Fisher and Wright (Ewens 2004, eqn (1.56)).)
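The comparison with $2/i$ can be checked numerically. The sketch below assumes that the plotted quantity is the first column of the sum of all powers of the matrix of binomial inheritance probabilities, starting from a single mutated copy; the function name `sojourn_times` and the stopping tolerance are our own illustrative choices.

```python
from math import comb

def sojourn_times(n_x, tol=1e-12):
    """Expected generations spent with i copies (i = 1..n_x-1), from one copy.

    Accumulates v, Pv, P^2 v, ... until the surviving probability mass is
    below tol, where P holds the binomial inheritance probabilities.
    """
    states = range(1, n_x)
    # p[i-1][j-1] = Pr(i copies in daughter | j copies in mother)
    p = [[comb(n_x, i) * (j / n_x)**i * (1 - j / n_x)**(n_x - i) for j in states]
         for i in states]
    v = [1.0] + [0.0] * (n_x - 2)   # the first heir carries exactly one copy
    q = [0.0] * (n_x - 1)
    while sum(v) > tol:
        q = [a + b for a, b in zip(q, v)]
        v = [sum(p_ij * v_j for p_ij, v_j in zip(row, v)) for row in p]
    return q

q20 = sojourn_times(20)
q40 = sojourn_times(40)

# Print a few entries next to 2/i for visual comparison.
for i in (5, 10, 15):
    print(i, round(q20[i - 1], 3), round(q40[i - 1], 3), round(2 / i, 3))
```

Repeated multiplication is used instead of a matrix inverse so that the sketch needs only the standard library. Weighting the result by the one-step fixation probabilities $(j/N_x)^{N_x}$ and summing gives the total fixation probability, which under neutrality is $1/N_x$.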
We show in lemma 2 of the electronic supplementary material that, assuming $(Q_{N_x})_{i,1} \approx 2/i$, the probability $\beta$ that a site has an observable heteroplasmy is closely approximated by $2\alpha \ln(\theta^{-1} - 1)$. Assuming neutral evolution, $1/N_x$ of the substitutions entering the germ line will become fixed in the maternal line of descent, so that the mutation rate can be estimated as
$$\mu = \frac{\alpha}{N_x t} \approx \frac{\beta}{2 N_x t \ln(\theta^{-1} - 1)},$$
where $t$ is the generation time.
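The arithmetic of this last step can be sketched as follows. The estimator is an assumption on our part: it takes the approximation $\beta \approx 2\alpha \ln(\theta^{-1} - 1)$ stated above, solves it for $\alpha$, and multiplies by the fixed fraction $1/N_x$ per generation of length $t$. The function names and all numerical inputs are hypothetical placeholders, not estimates from the study.

```python
from math import log

def observable_fraction(alpha, theta):
    """Approximate probability that a site shows an observable heteroplasmy."""
    return 2 * alpha * log(1 / theta - 1)

def mutation_rate(beta, theta, n_x, t):
    """Assumed estimator: recover alpha from beta, then divide by n_x * t."""
    alpha = beta / (2 * log(1 / theta - 1))
    return alpha / (n_x * t)

# Placeholder inputs, for illustration only.
alpha_example = 1e-4      # hypothetical per-generation germ-line mutation probability
theta_example = 0.23      # detection threshold used in the text
beta_example = observable_fraction(alpha_example, theta_example)
rate_example = mutation_rate(beta_example, theta_example, n_x=40, t=5.0)
```

The result is a rate per site per unit of time in which $t$ is measured. Since $\ln(\theta^{-1} - 1)$ vanishes at $\theta = 0.5$, the inversion is only meaningful for detection thresholds below one half.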