(a) Current *N*_{e} *estimated from heterozygote excess*

In the absence of selection, mutation and migration, an infinitely large population with discrete generations will attain Hardy–Weinberg equilibrium by one generation of random mating. At the equilibrium, gene frequencies and genotype frequencies are constant from generation to generation, and there is a simple relationship between genotype frequencies and gene frequencies called the Hardy–Weinberg law. For a finite population, however, the Hardy–Weinberg law is violated because genetic drift generates both chance deviations of genotype frequencies from the expected Hardy–Weinberg proportions (HWP) and a systematic bias due to the discreteness of the possible numbers of different genotypes (

Kimura & Crow 1963). The bias is towards a heterozygote excess and homozygote deficiency compared with HWP, by an amount of

in a Wright–Fisher population with size

*N* (

Crow & Kimura 1970). This bias can be regarded as due to the statistical sampling of a finite number of offspring, and applies to a random sample of offspring when

*N* refers to the sample size.

In a finite diploid population with separate sexes, there are additional chance deviations of genotype frequencies from the expected HWP and a systematic heterozygote excess in the offspring, caused by the genetic drift that occurred in the parental generation. This systematic bias can be regarded as being caused by the process of genetic sampling of finite numbers of male and female parents, resulting in a stochastic difference in gene frequency between male and female parents.

Robertson (1965) showed that in an idealized population but with separate sexes with

*N*_{m} male and

*N*_{f} female parents, the heterozygote excess in the progeny is

where

is the effective size of the parental population (

Caballero 1994). Note that

*α*_{p} is independent of

*α*_{o} in a population with separate sexes. The former is caused by the drift in the parental population and is thus determined by the

*N*_{e} of the parental population, while the latter is caused by the drift (or sampling) in offspring and is thus determined by the

*N*_{e} (or sample size) of offspring.

Equations

(2.1) and

(2.2) suggest an estimator of effective size using the heterozygote excess observed at some marker loci from a single sample. The estimator derived by

Pudovkin *et al*. (1996) for a single biallelic locus is

where

,

*H*_{exp}=2

*p*(1−

*p*) is the expected heterozygosity calculated from the gene frequency (

*p*) observed in a sample of

*N* offspring, and

*H*_{obs} is the observed heterozygosity in the sample. For mutiallelic loci,

*D* is calculated as the average across alleles per locus, and across loci (

Pudovkin *et al*. 1996;

Luikart & Cornuet 1999).

Computer simulation studies indicate that this estimator is little biased, but has very low precision (

Pudovkin *et al*. 1996;

Luikart & Cornuet 1999). Applying estimator

(2.3) to 10 datasets with known small parental population sizes,

Luikart & Cornuet (1999) obtained five

*N*_{e} estimates that are infinity. Until now, there have been few applications of this estimator to the analysis of real datasets. Because the average heterozygosity excess is reciprocally proportional to

*N*_{e}, the precision of estimator

(2.3) decreases with an increasing true effective size. The estimator is useful only for very small random mating populations when many markers are genotyped from a large sample.

The estimation of

*N*_{e} based on heterozygosity excess can be improved in several aspects. First, more general equations for heterozygosity excess at diploid autosomal loci and haplodiploid loci (or species) owing to sampling/drift in parents (

*α*_{p}) and offspring (

*α*_{o}) were derived for populations with an arbitrary distribution of reproductive success (

Wang 1996). These formulae yield a general and simple relationship between

*N*_{e} and heterozygosity excess when the covariance between the numbers of male and female offspring per parent is zero, which seems to be plausible for most natural populations. A more general estimator should be developed based on these formulae, noting that the heterozygosity excess observed in a sample is partitioned into

*α*_{o} and

*α*_{p} owing to statistical sampling and drift, respectively. Second, the current estimator makes no weighting among the

*D* values calculated from different alleles and loci, resulting in a potential loss of precision. More appropriately, weights should be applied to different alleles and loci depending on their sampling variances determined by allele frequencies and sample sizes. Third, genetic drift generates not only a heterozygosity excess on average, but also chance deviations of genotype frequencies from the expected HWP. The variance of the deviations should have a simple relationship with

*N*_{e} and thus can be used for estimating

*N*_{e} as well. Using information on both the mean and variance of the deviations may potentially improve the precision of

*N*_{e} estimates substantially.

The assumptions of no mutation and no selection in the heterozygosity excess method are valid in general, because only one generation is concerned and markers are ‘neutral’. The assumption of a single isolated population without immigration is violated in some natural populations. When immigration exists, but is ignored,

*N*_{e} would be underestimated because in addition to drift, immigration also produces heterozygote excess. The strongest assumption made by the heterozygosity excess method seems to be random mating. When mating is not at random, then the heterozygosity excess generated by drift can easily be overwhelmed by that generated by non-random mating. Although the heterozygosity excess has been quantified in a finite population with non-random mating such as partial selfing or full-sib mating (

Wang 1996) and can be used to infer

*N*_{e}, the proportions of selfing or full-sib mating must be known

*a priori*. In reality, such proportions are at best estimated and their sampling errors could further affect the precision and accuracy of

*N*_{e} estimated from heterozygosity excess.

(b) Short- or long-term *N*_{e} *estimated from linkage disequilibrium*

Linkage disequilibrium (LD) is the non-random association between alleles at different loci in gametes. It can be produced in principle by a number of factors such as migration, direct or indirect (e.g. hitchhiking) selection, and genetic drift in finite populations. For neutral loci unlinked with selected loci in an isolated population with random mating, LD would come exclusively from genetic drift and can be used to estimate

*N*_{e} (

Hill 1981).

The LD for alleles A and B at two loci is defined as the difference between the frequency of gametes (chromosomes) bearing both alleles (

*p*_{AB}) and the product of the allele frequencies (

*p*_{A}*p*_{B}),

*D*_{AB}=

*p*_{AB}−

*p*_{A}*p*_{B}. The correlation between

*p*_{A} and

*p*_{B} is

(

Hill & Robertson 1968). If the two loci are neutral and the population has been in isolation with a constant effective size of

*N*_{e} for sufficiently long time, then

*r*_{AB} will be drawn from an equilibrium distribution determined by

*N*_{e} and the recombination rate between loci,

*c*. At this equilibrium, we have

*E*(

*r*_{AB})=0 and

approximately (

Weir & Hill 1980). When a sample of

*n* chromosomes or individuals are used to estimate allele frequencies and LD, an additional part of

*V*(

*r*_{AB}) owing to sampling, 1/

*n*, should also be included.

Hill (1974) showed that the contribution from sampling is the same whether

*n* chromosomes are extracted and identified or

*n* diploid individuals with unknown linkage phase are analysed.

For a number of

*L* loci, there are

*k*=

*L*(

*L*−1)/2 pairs of loci. Denote the correlation, recombination fraction and sample size for the

*i*th pair as

*r*_{i},

*c*_{i}, and

*n*_{i} (

*i*=1, 2,…,

*k*).

Hill (1981) derived a multilocus

*N*_{e} estimator using the formulation for a single locus above and an approximate variance–covariance matrix of

*r*_{i}^{2}among pairs of loci,

where

, and

*r*_{i} can be estimated by a variety of methods (

Weir 1979). The variance of the estimate is

An analysis based on equation

(2.6) showed that precise estimates of

*N*_{e} can be obtained only when the sample size

*n* is large relative to the ratio

*N*_{e}/

*γ*~4

*N*_{e}*c* (

Hill 1981) and/or

*k* is large.

Hill (1981) applied his method to the analysis of two datasets on

*Drosophila melanogaster*. The first is a sample of 198 flies collected from a wild population in North Carolina (

Langley *et al*. 1977). For each sampled individual, the second and third chromosomes were extracted and analysed for six and five enzyme loci, respectively. The estimate of

*N*_{e} from equation

(2.5) was negative, indicating that the observed LD is less than that expected from sampling alone and thus the best estimate of

*N*_{e} is infinitely large. The second dataset is from the Maine cage population of

Langley *et al*. (1978). Three and four enzyme loci on the second and third chromosomes, respectively, were analysed for a sample of 635 to 756 flies, depending on the pair of loci. Applying equations

(2.5) and

(2.6) yields a

*N*_{e} estimate of 363 with a standard deviation of 170. The

*N*_{e} estimate of 363 is below the census size of this cage population, 1000. Applying the LD estimator of

*N*_{e} to several other empirical datasets yielded plausible results (e.g.

Bartley *et al*. 1992;

Ardren & Kapuscinski 2003).

Several issues associated with the LD estimator of

*N*_{e} need further consideration. First, the estimator was derived using a number of approximations. For example, equation

(2.4) is not a good approximation when gene frequencies are very close to zero or one (

Hill 1981;

Hudson 1985). A simulation study is necessary to investigate how reliable these approximations are, whether the estimator is biased or not, and how accurate the estimate of variance as formulated in equation

(2.6) is. Second, the estimator assumed an isolated equilibrium population with a constant

*N*_{e}. The assumption may not be tenable for some natural populations, especially in the long run.

Hill (1981) showed that more information comes from pairs of more tightly linked loci, and therefore one may choose closely linked markers to better estimate

*N*_{e} from LD. However, the tighter the linkage between markers, the longer the time it requires the LD to reach an equilibrium distribution. In other words, the LD for more tightly linked markers observed in the current population would reflect (remember) a longer period of the past demographic history of the population, and thus is more probably affected by factors such as founder event, migration and fluctuation in population size (

Hill 1981). On the positive side, one may estimate

*N*_{e} separately from loosely linked loci and from tightly linked loci to obtain some information on the demographic history of the population (

Hill 1981). On the negative side, it is difficult to interpret the exact meaning of the

*N*_{e} estimate obtained. For example, a smaller estimate of

*N*_{e} from tightly linked markers than that from loosely linked markers may be a result of a founder event, a bottleneck in population size in the remote past, a gradual growth in population size, an immigration or a hybridization event in the past or any combination of these events. Third, the LD estimator uses information on pairs of loci, and additional information could be exploited from groups of three, four or more loci. Although the LD among three or more loci generated by drift in a finite random mating population was investigated by

Hill (1976), the results are less precise than those for pairs of loci and thus are considered of no practical value in estimating

*N*_{e} (

Hill 1981). Fourth, the estimator was derived assuming biallelic loci. At present, highly polymorphic markers that may have scores of alleles per locus, such as microsatellites, are widely available. One possible way to use multiallelic markers in the LD estimator is to consider, in turn, each allele and bin all the other alleles at a locus, as suggested by

Waples (1991). This may be plausible for unlinked loci. How this approximate treatment affects the accuracy and precision of the LD estimator needs further investigation.

Recently,

Hayes *et al*. (2003) developed a method to use the LD information from multiple densely spaced markers on a chromosome segment for inferring the

*N*_{e} at different time points in the past. They proposed a novel multilocus measure of LD, the chromosome segment homozygosity (CSH), which is defined as the probability that two homologous chromosome segments drawn at random from the population are from a common ancestor without intervening recombination. The CSH cannot be observed directly from marker data but can be inferred from marker haplotype (for the segment) frequencies and marker frequencies, both being observable from a sample of individuals. The expectation of CSH in a population with a constant

*N*_{e} is the same as that of

*r*^{2} in equation

(2.4), which is approximately 1/(4

*N*_{e}*c*+1) for small values of

*c*, the recombination fraction or the length of the chromosome segment in Morgans (

Hayes *et al*. 2003). When

*N*_{e} changes linearly over time, then the expectation of CSH is

*ca*. 1/(4

*N*_{e,t}*c*+1), where

*N*_{e,t} is the effective population size at

*t*=1/2

*c* generations ago in the past (

Hayes *et al*. 2003). Therefore, CSHs for chromosome segments of different lengths (

*c*) can be used to estimate the

*N*_{e}s at different generations in the past. Applying this method to a human haplotype dataset including 24 single nucleotide polymorphisms (SNPs) and 2 microsatellites in a 1

cM region, Hayes

*et al*. obtained an estimate of

*N*_{e} of about 5000 at about 2000 generations ago, using short lengths of haplotypes, and of about 15

000 at about 182 generations ago, using long lengths of haplotypes. The result suggests an exponential growth of the human population in the past, which seems to be plausible. Hayes

*et al*. also applied their method to a dataset comprising 16 microsatellites in a 65

cM segment on chromosome 20, sampled from 264 Australian Holstein–Friesian cows. The estimated

*N*_{e}s are 250 and 1000 using CSH at large and small lengths, respectively. This suggests a decline in

*N*_{e} for the population, which again seems to be compatible with the known breeding history of this breed.

There is still room to refine the CSH method. For example, the recent

*N*_{e} using large chromosome segments tends to be overestimated, suggesting that equation

(2.4) may not be a good approximation for larger segments in estimating the

*N*_{e} in the more recent past. Further, sampling effects due to small sample sizes need to be accounted for; otherwise, like LD for a pair of loci, the effect of

*N*_{e,t} on CSH will probably be swamped by sampling effects. Mutations, which are ignored by

Hayes *et al*. (2003), may not be a problem for estimating the recent

*N*_{e} using large chromosome segments, but could have a measurable effect on estimating the

*N*_{e} hundreds or thousands generations into the past using very small segments.

The above LD methods using either pairs of loci or segments of chromosomes assume an isolated population without immigration. The assumption may not be tenable for some natural populations, especially over long time-intervals. Immigration could lead to underestimation of

*N*_{e} from the LD methods, because the LD generated by migration is falsely regarded as that produced by drift.

Vitalis & Couvet (2001) proposed an estimator that can disentangle migration from drift as sources of LD and thus can estimate both simultaneously. Under the infinite island model (

Wright 1951), the

*F*_{st} for a focal population can be estimated from single locus genotype frequencies and gene frequencies observed from a sample. At equilibrium under migration and drift, the expected value of

*F*_{st} is

*ca* 1/(1+4

*N*_{e}*m*), where

*N*_{e} is the effective size and

*m* is the immigration rate for the focal population. To quantify the non-random association of alleles between a pair of loci,

Vitalis & Couvet (2001) defined the parameter of ‘within-subpopulation identity disequilibrium’ as the excess of two-locus identity probabilities over the product of single-locus identity probabilities among individuals within subpopulations. This parameter (denoted by

), when standardized in a way similar to

*r*_{AB} in equation

(2.4), can be estimated from two-locus genotype frequencies and gene frequencies observed in a sample. The expected value of

is a known function of one- and two-locus identity probabilities, which, in turn, are determined by parameters

*m* and

*N*_{e} (

Vitalis & Couvet 2001). From these known relations and the estimated values of

*F*_{st} and

, one can obtain separate estimates of

*m* and

*N*_{e}. Simulations show that in general the estimator could return reasonably good estimates of

*m* and

*N*_{e} when 50 individuals are sampled and 8 or more microsatellites are genotyped, if

*N*_{e} is not large (<100).

The simulations also indicate that

*N*_{e} is generally underestimated, especially when mutation is assumed to follow the

*k*-allele model (

Vitalis & Couvet 2001). The method relies on the infinite island model under drift and migration equilibrium, which requires a constant population size and migration rate over many generations. When

*N*_{e} is large, then the one- and two-locus identity disequilibria generated by drift would be too small to allow an accurate estimate of

*N*_{e}. In such a case, use of closely linked markers can increase the power, but a much larger number of generations would be necessary for the equilibrium to be reached.

(c) Short-term *N*_{e} *estimated from temporal samples*

In general, the allele frequency of a population changes over time owing to either the systematic forces of mutation, selection and migration or the stochastic force of genetic drift, or both. When all the systematic forces are excluded, the observed change in allele frequency comes solely from genetic drift and can thus be used to infer how strong the drift is or how large the

*N*_{e} of the population is. The so-called ‘temporal methods’ for estimating

*N*_{e} are based on this logic, and were proposed by

Krimbas & Tsakas (1971) and subsequently developed by many others (e.g.

Nei & Tajima 1981;

Pollak 1983;

Waples 1989).

The basic protocol of the approach is as follows. Suppose we have an isolated random mating population with discrete generations and we wish to measure its (average)

*N*_{e} during a certain period of time. Two samples of individuals can be taken at random from the population, the first sample at the beginning (generation 0) and the second sample at the end (generation

*t*) of the period of time. The two samples are then genotyped for a number of neutral markers to estimate allele frequencies, which are used to estimate the standardized variance in the temporal changes of allele frequency,

*F*. The estimated

*F*,

, is thus contributed by both sampling and genetic drift. When the sampling effect is accounted for,

reflects the strength of genetic drift and, in expectation, is reciprocally proportional to

*N*_{e}. An estimate of

*N*_{e} can then be obtained from

.

In the above protocol, the main difficulty comes from estimating

*F* and finding its expectation. Several estimators of

*F* were available, among which a widely used one was developed by

Nei & Tajima (1981),

where

*x*_{i} and

*y*_{i} are the observed frequencies of allele

*i* at a locus with

*k* alleles in the first and second samples, respectively. For multiple loci,

is calculated as the average of single locus estimates. The expectation of

depends on the sampling scheme that is used to sample genes from the population.

Nei & Tajima (1981) distinguished two sampling schemes.

Scheme 1 assumes that the population's census size (when samples are taken),

*N*, is equal to its

*N*_{e}, that allele frequencies are determined by sampling

*S*_{j} (

*j*=0 and

*t* for the first and second sample, respectively) individuals out of

*N*, and that sampling at any generation does not affect population allele frequencies and

*N*_{e}. The latter assumption holds when individuals are sampled after reproduction or when they are returned to the population after examination of genotypes. The exact expectation of

under this sampling scheme is difficult to obtain, but an approximation was derived by Nei & Tajima as

. Using

as its expectation, we can therefore obtain

Scheme 2 assumes that

*N* (when samples are taken) is much larger than

*N*_{e}, and that individuals for determining allele frequencies and those for generating the next generation are sampled separately from the population of

*N* individuals. Under this sampling scheme,

approximately (

Nei & Tajima 1981). When

*N**N*_{e}, the estimator of

*N*_{e} is

The uncertainty of

can be assessed from that of

. It was shown that

follows roughly the

*χ*^{2} distribution (

Lewontin & Krakauer 1973;

Nei & Tajima 1981). When

*n* loci are used in the estimation,

follows roughly a

*χ*^{2} distribution with

*k*_{I} degree of freedom, where

and

*k*_{i} is the number of alleles at locus

*i*. The

values that give the 2.5 and 97.5% cumulative probabilities can be obtained from this

*χ*^{2} distribution, which are then used to determine the 95% confidence limits of

*N*_{e}. In practice, the confidence interval determined by this procedure is a slightly overestimate.

Recently, probability methods have been developed to improve the estimates of

*N*_{e} from temporal samples.

Williamson & Slatkin (1999) proposed a likelihood framework to estimate

*N*_{e} and its change over time, using two or more samples of biallelic markers. The work was extended by

Anderson *et al*. (2000), who described a Monte Carlo approach to computing the likelihood with data on multiallelic markers. Their algorithm, using importance sampling, is highly computationally demanding. Although it produced reasonably good estimates of

*N*_{e} with small Monte Carlo variance when applied to small problems, it failed to converge when applied to data involving loci with many alleles (

Anderson *et al*. 2000).

Wang (2001) proposed a method to calculate the likelihood for multiallelic markers, which is computationally very efficient and applies to any number of alleles per locus. This method transforms a

*k*-allele locus into

*k* biallelic ‘loci’, each having one of the

*k* alleles with all the other alleles pooled. The overall log-likelihood is approximated by the sum of the log-likelihoods across the biallelic ‘loci’ multiplied by the factor of (

*k*−1)/

*k* to account for the dependence of the

*k* ‘loci’. This treatment reduces to the exact likelihood given by Williamson & Slatkin when markers are biallelic (

*k*=2), and yields indistinguishable results for

in terms of accuracy and precision from those from the exact likelihood method when markers are tri-allelic. Note that the strength of dependence of allele frequencies at a locus decreases with

*k*, and therefore

*k*=2 and 3 are the worst possible cases for this approximate treatment of multiallelic loci.

The changes in allele frequency in a population can be modelled by a forward probabilistic approach, as adopted by the likelihood methods above, or by a backward coalescent approach (

Berthier *et al*. 2002;

Beaumont 2003*b*;

Laval *et al*. 2003). These coalescent-based methods are Bayesian, usually resorting to Markov chain Monte Carlo (MCMC) to approximate the posterior distribution of

*N*_{e}. One potential advantage of the coalescent approach is its computational efficiency when sample size is small but

*N*_{e} is very large.

The probabilistic approaches have several advantages over moment estimators. First, they generally have higher accuracy and precision than moment methods, as verified by several extensive simulation studies (e.g.

Wang 2001;

Berthier *et al*. 2002;

Tallmon *et al*. 2004). Moment estimators tend to overestimate

*N*_{e} when genetic drift is strong (measured by

*t*/2

*N*_{e}) and when markers with high allelic diversity are used. In such cases, some low-frequency alleles observed in the first sample are absent from the second sample. Moment estimators implicitly assume that these alleles are lost from the population exactly at the

*t*th generation, while in reality they may be lost one or more generations before the

*t*th generation. Because of this, the variance of the change in allele frequency is underestimated and

*N*_{e} is overestimated. Furthermore, several approximations were made in deriving equations

(2.8 and

2.9) or similar moment estimators, resulting in further bias and imprecision, especially when the sampling interval is short (

*t*<3;

Nei & Tajima 1981). Second, likelihood methods naturally weigh information optimally. When there are many temporal samples, for example, the relative information content in different sampling intervals will depend on the relative sample sizes and the number of generations between the samples. This difference in information content is automatically taken into account in likelihood or Bayesian methods but is difficult to incorporate into the moment estimators. Third, the underlying demographic model of likelihood methods is flexible and can be easily modified to allow for the estimation of other interesting parameters, such as population growth rate (

Williamson & Slatkin 1999;

Wang 2001;

Beaumont 2003*b*). The disadvantage of probabilistic approaches is that they require much more computation than moment estimators, and as a result, some of them have not been tested extensively and have difficulty in being extended to more complicated demographic models.

All the above-mentioned temporal methods assume an isolated population without immigration. However, this assumption may not be valid for most populations in the real world, which are connected through gene flow in the forms of gametes and/or individuals. Furthermore, migration, even occurring at an extremely low rate, can substantially alter the genetic makeup of a population and its changes over time. Therefore, current temporal methods can considerably bias estimates of

*N*_{e} for populations with immigration. Recently,

Wang & Whitlock (2003) extend previous moment and maximum-likelihood methods to allow the joint estimation of

*N*_{e} and migration rate (

*m*) using genetic samples taken from different populations and time. It is shown that, compared with genetic drift acting alone, migration results in changes in allele frequency that are greater in the short term and smaller in the long term, leading to under- and overestimation of

*N*_{e}, respectively, if it is ignored.

Temporal methods have been developed mainly for populations with discrete generations. However, long-lived species typically have overlapping generations and samples from them generally contain individuals of different age/sex groups. In principle, the basic methods for discrete generations should apply approximately to populations with overlapping generations, when individuals are sampled representatively (proportionately) from all age/sex classes of the population with an interval of one or preferably more generations (

Nei & Tajima 1981). Although prolonging the sampling interval is possible, representative sampling is perhaps not realistic in many cases. First, there might be little information about the composition (distributions of individual age/sex) of the population in question before samples are taken. Second, individuals of different ages/sexes may have different behaviour and habitat preferences, and may not even coexist in the same region, making identification of the appropriate biological population and obtaining a representative sample difficult or virtually impossible (

Jorde & Ryman 1995). Even if representative sampling and a long sampling interval have been achieved, the methods assuming discrete generations do not use the information fully in samples from an age-structured population, and result in low-precision estimates. This is because an age-structured population does not constitute a homogeneous breeding unit and different age/sex classes are genetically correlated (

Hill 1979). Realizing the above problems,

Jorde & Ryman (1995) developed a moment estimator of

*N*_{e} applicable to populations with overlapping generations, based on the ideal model of a fixed age/sex structure and a constant number of individuals of each age/sex class (

Felsenstein 1971;

Hill 1972,

1979). The main difficulty in applying their moment estimator to populations with overlapping generations is how to group the data into discrete classes appropriately. A sample now contains individuals from several different age/sex classes, so the temporal changes in allele frequencies can be measured in many different ways through grouping the data differently. For example, allele frequencies can be compared between sampling years within the same age class or between age classes within a single sample, or between cohorts (individuals born in the same year) regardless of the ages and samples the individuals come from. Indeed, there are many different ways data can be grouped, and the particular grouping that is the best and how to combine estimates of

*N*_{e} obtained from different groupings are not immediately apparent. A likelihood method could solve this problem, so that the available data from different age/sex classes in multiple samples can be optimally used to give a single best estimate of

*N*_{e}.

(e) Ancient *N*_{e} *estimated from current genetic variation*

The polymorphisms of markers observed in a sample from the current population can be used to infer the *θ* or the long-term *N*_{e} of the population over the past of the order of *N*_{e} generations. For an ancestral species that became extinct in the remote past, it is usually impossible to apply the same approach to estimating its *N*_{e}, because genetic polymorphism is generally not observable. However, the genetic polymorphism of an ancestral species (ancestral polymorphism) can be inferred indirectly from that of its two or more descendant species, which can then be used to estimate the ancestral *θ* or *N*_{e}.

A simple method uses orthologous DNA sequences from three closely related species with a known phylogeny (

Nei 1987;

Wu 1991;

Hudson 1992), and is thus called the ‘trichotomy method’. A classical example is the trio consisting of humans, chimpanzees and gorillas, with the first speciation event leading to gorilla lineage and the second speciation event leading to humans and chimpanzees. The principle of the trichotomy method can be illustrated using the above classical example. The genealogy of orthologous DNA sequences sampled from such a trio may or may not be identical to the species phylogeny because of the presence of ancestral polymorphism common to the descendant species. The extent of inconsistency between gene trees and the species tree depends on the amount of ancestral polymorphism and the time-interval (

*T*, in generations) between the two successive speciation events. The larger the

*N*_{e} of the ancestral species common to humans and chimpanzees relative to

*T*, the greater the proportion,

*P*_{dis}, of gene trees discordant with the species tree. The two orthologous genes from humans and chimpanzees may either be derived from a common ancestor in the ancestral species during the time-interval

*T* or remain distinct throughout interval

*T*. The probability of the second event is

*e*^{−t}, where

*t*=

*T*/2

*N*_{e} for autosomal diploid loci,

*t*=

*T*/(

*N*_{e}/2) for

*Y*-linked or mitochondrial loci assuming a sex ratio of 1, and

*t*=

*T*/(3

*N*_{e}/2) for

*X*-linked loci. Given the second event, there must exist three distinct ancestral gene lineages before the first speciation event, and three equally probable gene genealogies are possible. One genealogy is consistent with the species tree, and the other two genealogies are discordant with the species tree, resulting in a total discordant proportion of (2/3)

*e*^{−t}. Equating this expected proportion to the observed proportion of discordance among many unlinked autosomal loci,

, we therefore obtain

The trichotomy method was applied to human and great ape sequence data, yielding estimates for the

*N*_{e} of the human–chimpanzee ancestral population in the range of 50

000 to 150

000, depending on the datasets used, gene tree reconstruction methods applied and the generation intervals (15 or 20 years) assumed (

Ruvolo 1997;

Chen & Li 2001).

The trichotomy method is quite simple, but has several drawbacks (

Takahata & Satta 2002;

Yang 2002). First, the method assumes that gene trees are correctly constructed, so that the inconstancy between gene and species trees comes solely from ancestral polymorphism. In reality, however, gene trees are estimated from sequence data, and thus suffer from sampling errors owing to limited information in the sequence data. The sampling error in reconstructed gene trees is usually quite high, because closely related species are considered and sequences from them are highly similar. The sampling errors in gene tree reconstruction would result in the overestimation of

, and thus overestimation of

*N*_{e}. Second, the trichotomy method is inefficient because only a fraction of the information available from the sequence data is used, while other information (such as branch lengths in gene trees) that is useful for

*N*_{e} inference is ignored. Third, the interval between the two speciation events,

*T*, was assumed to be known, but in reality it may be unknown or at best is estimated with large sampling errors.

It is obviously desirable to estimate

*N*_{e} using information on both the topologies and branch lengths of gene trees after accounting for their uncertainties in reconstruction.

Takahata (1986) suggested such a method for estimating the

*N*_{e} of the common ancestors of two closely related species, using a pair of orthologous genes with one from each of the two species. The coalescence time of the two orthologous genes consists of two parts, the species divergence time

*y* and the time

*x* that the two genes coexist in the ancestral species before they coalesce. For any pair of orthologous genes,

*y* is unknown but fixed, while

*x* is variable among pairs of orthologous genes. Both the mean and variance of

*x* among pairs of orthologous genes depend on the

*N*_{e} of the ancestral species. Using many pairs of orthologous genes, therefore, it is possible to extract information on

*x* and

*y* and, thus, to estimate the

*N*_{e} of the ancestral species. Such a moment estimator was developed by

Takahata (1986), and was later on extended to a full likelihood method and to the case of three extant species, where the

*N*_{e}s of the two extinct ancestral species as well as the two speciation dates were estimated jointly (

Takahata *et al*. 1995;

Takahata & Satta 2002). Yang (

1997,

2002) extended the method to account for variation in mutation rate among loci, and used the finite-sites model of

Jukes & Cantor (1969) to correct for multiple substitutions at the same site. Applying his likelihood and Bayesian methods to the data of

Chen & Li (2001),

Yang (2002) obtained an estimate of the

*N*_{e} of the human–chimpanzee ancestral population. The estimate is in the range of 12

000 to 21

000, much smaller than those of previous studies.

Rannala & Yang (2003) further extended Yang's Bayesian method, allowing the use of multiple sequences from a species and an arbitrary number of species with a known topology of the species tree.

Another class of methods are suitable for an ancestral species that recently split and diverged into two closely related species.

Wakeley & Hey (1997) proposed a moment estimator of the effective size of an ancestral species assumed to have recently been split into two isolated sister species evolving independently. The segregating sites in a sample of orthologous DNA sequences from the two sister species can be partitioned into four mutually exclusive categories. Category one comprises sites that are polymorphic in species 1 but monomorphic in species 2. Category 2 comprises sites that are polymorphic in species 2 but monomorphic in species 1. Category 3 comprises sites that are polymorphic in both species. Category 4 comprises sites showing fixed differences between the two species. Under the neutral infinite-sites model, and assuming a molecular clock and constant effective sizes of the ancestral (

*N*_{A}) and the two descendent species (

*N*_{1},

*N*_{2}),

Wakeley & Hey (1997) derived the expected numbers of sites in the four categories. The four expected numbers turn out to be functions of the four parameters,

*θ*_{i}=4

*N*_{i}*u* (for

*i*=1, 2, A) and

*τ*=2

*ut*, where

*u* is the mutation rate per sequence per generation and

*t* is the divergence time in generations. The four parameters can be obtained by equating the expected to the observed numbers of sites in the four categories and solving the four equations. Simulations show that all the four parameters can be estimated reasonably well with little bias when data on many unlinked loci are available. In the case of few non-recombinant loci, however, the four expected numbers are highly negatively correlated, resulting in a failure of the estimation procedure. Recently,

Nielsen & Wakeley (2001) extended the above isolation model to allow asymmetrical migration between the two descendant species and proposed full likelihood and Bayesian methods to estimate all of the six parameters (the above four parameters and two migration rates) jointly. Their methods apply to a single locus without recombination. More recently, the methods were generalized to allow the use of multiple unlinked loci with the same or different modes of inheritance (

Hey & Nielsen 2004).

Compared with the long-term

*N*_{e} of an extant population (species), the

*N*_{e} of an ancestral extinct species is more difficult to estimate, because the genetic variation observed in current samples has less information about the more remote past. Furthermore, methods for estimating ancestral

*N*_{e} require more assumptions, and some of them are more likely to be violated than those for estimating the long-term

*N*_{e} of an extant species. For example, the molecular clock hypothesis typically assumed in methods for estimating ancestral

*N*_{e} may be a good approximation when the species involved are tightly related and the total divergence time is not very long. Otherwise, different mutation (or substitution) rates must be assumed on different lineages. Because of the long evolutionary history involved, mutation rate heterogeneity both among nucleotide sites within a locus and among different loci might play an important role in interpreting the data. Failure to account for the heterogeneity could lead to an overestimation of

*x* and thus to an overestimation of the ancestral

*N*_{e} (

Takahata & Satta 2002). Currently, mutation rate heterogeneity was taken partially into account by some methods (e.g.

Yang 1997,

2002) but was ignored by most others. All of the above methods assume no intragenic recombination during the long period in which the sampled sequences coalesce into their common ancestor. Recombination reduces the variance in coalescence times (

*x*) across loci, resulting in an underestimation of the ancestral

*N*_{e} when ignored (

Takahata & Satta 2002).

Wall (2003) proposed a likelihood method based on summary statistics of data, in which he incorporated intragenic recombination but assumed the infinite-sites mutation model with a constant mutation rate among nucleotide sites within a locus and among loci. The method uses a single DNA sequence from each of three or more species, and is computationally very intensive.