In 2004, Hebert

*et al*. published a paper that they claimed demonstrated the power of DNA barcoding to clarify the taxonomy of a group of butterflies,

*Astraptes fulgerator*, that are morphologically similar as adults. The single partial mtDNA fragment they used—part of the cytochrome oxidase 1 gene—indicated at least 10 genetically distinct clades on a tree of 466 individuals.

Hebert *et al*. (2004) argued a case for recognizing these clades as morphologically cryptic species.

However, a collection of sequences sampled randomly from a population in which individuals can interbreed freely (i.e. a panmictic population) can frequently appear to have a degree of cladistic structure. Even if the individuals are morphologically identical and there is no *a priori* reason to believe that the group can be divided further, it is still possible to encounter a cladistic pattern and within- and between-clade genetic distances that appear to reinforce the view that the group is an assemblage of genetically distinct ‘species’. How then can we distinguish between a sound hypothesis of cryptic speciation and a spurious one?

Obviously, one approach is to obtain sequences from other, unlinked, loci: if phylogenies are congruent across loci, and this congruence is unlikely to be due to chance, then the hypothesis that cryptic species are present will be warranted. However, it is useful to know, given a single locus, whether it is worth further time and effort to test the hypothesis of cryptic speciation.

Here, we propose a simple, single-locus test that is based on the coalescent. The coalescent is a mathematical description of the genealogy of a sample of sequences from a Wright–Fisher population.

Kingman (1982*a*,

*b*) showed that the times to common ancestry of pairs of lineages, measured from present to past, can be approximated by exponential random variables with the expected time proportional to 2

*N*/

*i*(

*i*−1), where

*N* is the effective size of the population and

*i* is the number of lineages that have yet to coalesce as we move from the tips to the root of the tree. The method we have developed tests the null hypothesis that the apparent ‘distinctiveness’ of any specified clade is a consequence of the coalescent process acting on a single, panmictic population of constant size. More formally, for any given node on a genealogy that defines a putative (cryptic) species or other taxonomic unit, we calculate the ratio,

*M*, of the sum of the intervals spanning the node to the tips and the sum of intervals between the node and the root (or its most recent ancestor; ), and we compute the probability of obtaining that value of

*M* under a standard coalescent model. This statistic represents an intuitive measure of what people believe is an indication of taxonomic distinctiveness—indeed, those who advocate the use of DNA barcoding set such a threshold to assign sequences to species and higher taxonomic units (

Ratnasingham & Hebert 2007).

(a) The probability density function of M

To compute the distribution of

*M*, we need the probability density function (PDF) of the sum,

*s*_{k}, of

*k* independent exponential random variables, each with a unique mean

, and this is given by (

Khuong & Kong 2006)

*M* is the ratio of two sums, representing the distances of the ‘species-defining’ node,

*x*, to the tips of the tree (

*s*_{t}), and to its ancestor,

*a* (

*s*_{a}). Note that under the coalescent, we assume that time can be measured according to the ticking of the molecular clock; consequently, under the null hypothesis of a single panmictic, constant-sized population, these distances are composed of (

*n*−1−

*a*) and

*a* exponential random variables, respectively, where

*n* is the number of sequences in the sample. Each of these intervals has an expected time (in substitutions) equal to

, where

(

*μ* is the mutation rate, and the proportionality constant depends on whether the population is haploid or diploid) and

*i* is the number of lineages that have yet to coalesce. The probability of having a value of

*M*=

*s*_{t}/

*s*_{a} less than the observed value,

*m*, is

where

. In fact,

*p*(

*M*≤

*m*) is invariant under

*Θ* because the terms of

*Θ* in the numerator cancel the terms in the denominator. Hence, we can rewrite

*λ*_{r}=

*r*(

*r*−1).

For a specified node, equation

(3.2) gives a closed-form solution to the probability, under the coalescent, of observing a ratio of node-to-tip distance to node-to-ancestor distance smaller than

*m*. If this probability is sufficiently high (say, greater than 0.05), one would provisionally accept the null hypothesis that the observed ratio is a consequence of the conditions of the Wright–Fisher population model, these conditions being panmixis, and the absence of selection, changes in population size or population subdivision. In other words, the hypothesis that the specified node identifies a cryptic species would not be statistically significant. There is a technical issue with the computation of equation

(3.2) that relates to computational precision. We have found, using several programming languages (e.g. C++,

Java and

VisualBasic.Net), that the value of

*p*(

*M*≤

*m*) cannot be calculated reliably when

*a*+

*t*>45. With the matrix computing language

Matlab, it is possible to calculate

*p*(

*M*≤

*m*) for higher values, using its ‘variable precision arithmetic’ function, but there is a significant increase in computational time. However, since

*p*(

*M*≤

*m*) decreases monotonically as

*a*+

*t* increases, we have taken the approach that if

*p*(

*M*≤

*m*)<

*α* at

*a*+

*t*=40, we say that the

*M* ratio is significantly different from that expected under the coalescent.

A reasonable objection to this test is that it relies on only one of many possible coalescent models. For instance, one can imagine a different model of a single panmictic species with a population size that has decreased from some time in the past. The genealogical consequence of this type of dynamic is a tree with short coalescent intervals towards the tips and long coalescent intervals closer to the root. This pattern is likely to show up as statistically significant with the test we propose here, and one would be fooled into thinking that cryptic species are present when, in fact, they are not. Similarly, since most hypotheses of cryptic species are proposed after the tree is constructed, there needs to be some *a posteriori* correction of the level of significance. An uncorrected *p*-value will be too liberal. Our response to both of these criticisms is to admit that the test is liberal. However, it means that if we cannot reject the null hypothesis, then it makes it even harder to believe that cryptic species are present based on the phylogeny alone.

We applied our test to the 466 sequences obtained by

Hebert *et al*. (2004). We began by building a neighbour-joining tree in PAUP

^{*} (

Swofford 1999) using K2P distances. We next optimized the branch lengths of the tree using a clock-constrained maximum-likelihood procedure, also with PAUP

^{*}. A molecular clock allows us to recover the order of the coalescent nodes on the tree. For each putative cryptic species within

*A. fulgerator* (labelled using the nomenclature of

Hebert *et al*. (2004)), we measured

*M* as the ratio of the distance between the putative species-defining node and the tips of the tree to the distance between that node and its most recent common ancestor. For example, for the clade SENNOV, the species-defining node is the fourteenth coalescent node from the base of the tree, and the ancestral node is the fifth coalescent node from the base of the tree (). The results () indicate that of the 12 groups proposed by Hebert

*et al*., TRIGO, FABOV and INGCUP have

*p*-values greater than 5 per cent (although the last is right on the cusp of statistical significance and, given the problem with precision noted above, should probably be considered a statistically supported group). MYST, NUMT and BYTTNER have identical sequences and they return a

*p*-value equal to 0, although one would be justifiably cautious in quoting this value. This leaves 6 of the 12 species for which the null hypothesis has been conclusively rejected.

It is important to understand what it means when we reject the null hypothesis constructed here. Each test is an independent test of a specified clade. The ratio of intra- and extra-clade distances is compared against a null distribution of ratios that is not conditioned on the existing coalescent intervals of the reconstructed phylogeny. This *p*-value is simply the probability of obtaining the observed ratio from the space of all possible topologies and all possible coalescent intervals (under a constant-sized Wright–Fisher population model).

One could argue that such a test is inappropriate because significant results for some clades necessarily change the form of the null hypothesis as it is applied to other clades. In our example, for instance, both HIHAMP and YESSEN have rejected the null hypothesis of panmixis. This should mean that, logically, FABOV should also be treated as a separate species, but it does not reject the null hypothesis. Similarly, if all species except one closer to the tips are not significantly supported by our test, then this surely means that the taxonomic validity of the significant clade is at least logically in doubt. These are problems that we have yet to solve.

Hebert *et al*. (2004) provided other evidence for the presence of several cryptic species within the group, including the colour variation in caterpillars, and the different preferences for food plants. The question that needs to be asked is whether such corroborative evidence is obtained after the groups are ‘identified’, in which case there is cause to wonder how easy it would be to obtain equally corroborative evidence for any random grouping of individuals. It is difficult to know how to design appropriate

*a posteriori* corrections for corroborative evidence.

(c) The challenge

There are several issues that we believe make the identification of cryptic species a particularly difficult problem. First, as with any hypothesis test, failure to reject the null hypothesis does not mean that the alternative is false, only that there is insufficient evidence of its truth. As we accumulate more genetic information, we are likely to see more genetic variation among morphologically identical individuals. Similarly, we need to work out how corroborating evidence (collected after the genetic analyses) should be handled. The challenge is to develop a suite of methods that tests the hypothesis of cryptic speciation rigorously and critically.

Rosenberg (2007) has developed a statistical method to test whether the observed monophyly of a specified group could have occurred by chance. The test takes account of the number of sequences in the specified group.

Pons *et al*. (2006) have developed an explicit model that combines a Yule branching process to describe a species tree and the coalescent for intraspecific genealogies to locate the ‘switch points’ on a phylogeny that correspond to the transitions from Yule to coalescent processes.

Fontaneto *et al*. (2007), building on the analysis by

Pons *et al*. (2006), designed a test to compare the likelihoods of a phylogeny fitted assuming a coalescent process against that fitted with an estimated number of independently evolving clusters. Together with the method outlined here, we may have a near-complete toolbox of such tests.

A second challenge is to determine the best approach to deal with the false discovery rate (FDR) that is inherent in any procedure that constructs a hypothesis (in this case, of cryptic species) *a posteriori*. A good FDR correction recovers the frequency with which patterns that fool us into thinking they are biologically significant emerge by chance alone. How are we to do this with the identification of cryptic species? The test we have developed here is, strictly speaking, an *a priori* test applied to an *a posteriori* diagnosis. We apply the test assuming that the particular group we are testing is one we had intended to test all along—it is only with this assumption that we are able to test each node in isolation. However, in reality, we test nodes that appear after the phylogeny has been constructed.

Biologists use both cladistic (i.e. monophyly) and phenetic (i.e. ‘genetic distinctiveness’) criteria to identify the taxa hidden within the trees. To correct for the FDR, it would seem that we need to merge

Rosenberg's (2007) test and the test we have developed here. Alternatively, the tests by Pons

*et al*. and Fontaneto

*et al*. have the advantage of treating all clades simultaneously, and so may be less prone to false discovery.