Home | About | Journals | Submit | Contact Us | Français |

**|**HHS Author Manuscripts**|**PMC3597466

Formats

Article sections

- Abstract
- 1. Introduction
- 2. The neighbor-joining algorithm
- 3. An admixture scenario
- 4. The neighbor-joining algorithm in an admixture scenario
- 5. Properties
- 6. Discussion
- References

Authors

Related links

Pac Symp Biocomput. Author manuscript; available in PMC 2013 March 14.

Published in final edited form as:

PMCID: PMC3597466

NIHMSID: NIHMS441952

NAAMA M. KOPELMAN, Porter School of Environmental Studies, Department of Zoology, Tel Aviv University, Ramat Aviv, Israel;

The publisher's final edited version of this article is available free at Pac Symp Biocomput

See other articles in PMC that cite the published article.

Neighbor-joining is one of the most widely used methods for constructing evolutionary trees. This approach from phylogenetics is often employed in population genetics, where distance matrices obtained from allele frequencies are used to produce a representation of population relationships in the form of a tree. In phylogenetics, the utility of neighbor-joining derives partly from a result that for a class of distance matrices including those that are *additive* or tree-like—generated by summing weights over the edges connecting pairs of taxa in a tree to obtain pairwise distances—application of neighbor-joining recovers exactly the underlying tree. For populations within a species, however, migration and admixture can produce distance matrices that reflect more complex processes than those obtained from the bifurcating trees typical in the multispecies context. Admixed populations—populations descended from recent mixture of groups that have long been separated—have been observed to be located centrally in inferred neighbor-joining trees, with short external branches incident to the path connecting their source populations. Here, using a simple model, we explore mathematically the behavior of an admixed population under neighbor-joining. We show that with an additive distance matrix, a population admixed among two source populations necessarily lies on the path between the sources. Relaxing the additivity requirement, we examine the smallest nontrivial case—four populations, one of which is admixed between two of the other three—showing that the two source populations never merge with each other before one of them merges with the admixed population. Furthermore, the distance on the constructed tree between the admixed population and either source population is always smaller than the distance between the source populations, and the external branch for the admixed population is always incident to the path connecting the sources. We define three properties that hold for four taxa and that we hypothesize are satisfied under more general conditions: *antecedence of clustering*, *intermediacy of distances*, and *intermediacy of path lengths*. Our findings can inform interpretations of neighbor-joining trees with admixed groups, and they provide an explanation for patterns observed in trees of human populations.

Distance matrix methods in phylogenetics construct trees of taxa using algorithms applied to matrices that tabulate pairwise evolutionary distances between the taxa.^{1,2} Among these methods, neighbor-joining^{3,4} is one of the most popular.^{5–7} One of its key features is its consistency: if the distance matrix is *additive*, such that a tree of taxa exists that generates the distances in the matrix, then neighbor-joining recovers this exact tree.^{5,8,9} Further, neighbor-joining is robust in that theoretical and simulation-based studies have found it to infer sensible trees under a broad range of mathematical and biological conditions.^{7,9–13}

As trees have long been used in population genetics to describe relationships among populations,^{14,15} the neighbor-joining algorithm has been applied extensively as a population clustering tool, using distance matrices calculated from population-level allele frequencies. In humans, neighbor-joining trees have been and continue to be a regular feature of studies of population relationships.^{16–20} In population-genetic studies, because migration and admixture sometimes generate evolutionary histories that cannot easily be described by a bifurcating tree of populations, a neighbor-joining tree is treated as a type of population clustering diagram rather than a precise representation of the evolutionary history of the populations.

When neighbor-joining has been used with admixed populations—populations recently descended from two or more source groups that have long been separated—particular characteristics of the inferred trees have often been observed (Fig. 1). For example, one simulation study based on human data identified a reduction in the external branch length leading to an admixed population as the strength of gene flow with other populations was increased.^{21} It has also been suggested on the basis of observed human population trees that a short external branch for a population on a constructed neighbor-joining tree can imply recent admixture of the population, and that admixed populations often appear in the “middle” of a neighbor-joining tree, on branches incident to paths connecting possible source populations.^{21–25} This pattern is evident in Fig. 2, in which admixed Mestizo populations from Latin America lie on branches incident to the path connecting Native American and European populations. Here, we seek to understand these results on the behavior of admixed populations in the application of the neighbor-joining algorithm. We therefore apply neighbor-joining to populations that satisfy a simple admixture model, first considering the case in which the distance matrix is additive. Next, for the case of *n* = 4 taxa, we use a mechanistic mathematical investigation to examine three specific properties of neighbor-joining trees involving an admixed population.

Properties observed for admixed taxa in neighbor-joining trees. Taxon *t*_{8} represents an admixture of source populations *t*_{1} and *t*_{2}. The admixed taxon appears on a short external branch incident to the path connecting the source populations. Denoting distances **...**

We briefly review the neighbor-joining algorithm.^{3,4} Consider a set of *n* taxa, together with a distance function *d* computed for each pair of taxa, such that the distance between taxa *i* and *j* is denoted *d _{ij}*. The algorithm takes as input the distance matrix

As in other agglomerative algorithms that construct bifurcating trees,^{2,32} at each of a series of steps, the two nearest taxa according to a selection criterion are connected to a new interior node, becoming “neighbors” on the constructed tree. Branch lengths from the new node to the nodes it agglomerates, as well as the distances to all remaining nodes, are then calculated, and a new distance matrix is obtained. This procedure is repeated iteratively until the last three nodes remain, and these three nodes are then connected to a final interior node. Because the last three nodes are always joined, the number of taxa must exceed three for neighbor-joining to have a nontrivial decision at the first step.

At each step, the key decision is the choice of the pair of taxa that are agglomerated. Neighbor-joining uses an *n* × *n* matrix *Q*, containing entries *q _{ij}* for pairs of taxa (

$${q}_{ij}=(n-2){d}_{ij}-\sum _{k=1}^{n}{d}_{ik}-\sum _{k=1}^{n}{d}_{jk}.$$

(1)

The two taxa that are agglomerated are those with the minimal value of *q _{ij}* (choosing randomly in case of ties). If taxa

$${d}_{iu}=\frac{1}{2}{d}_{ij}+\frac{1}{2(n-2)}\left(\sum _{k=1}^{n}{d}_{ik}-\sum _{k=1}^{n}{d}_{jk}\right)$$

(2)

$${d}_{ju}=\frac{1}{2}{d}_{ij}+\frac{1}{2(n-2)}\left(\sum _{k=1}^{n}{d}_{jk}-\sum _{k=1}^{n}{d}_{ik}\right).$$

(3)

The distances of all remaining nodes *k* to node *u* are computed as

$${d}_{ku}=({d}_{ik}+{d}_{jk}-{d}_{ij})/2.$$

(4)

The next agglomeration then proceeds from an (*n* − 1) × (*n* − 1) distance matrix that replaces distances involving nodes *i* and *j* with those involving the single node *u*.

We examine a scenario in which one of the taxa is admixed among two of the others. This taxon can be viewed as having been formed from its two source taxa, such that individual members of the taxon have ancestors in both source groups. We label the taxa *t*_{1}, *t*_{2}, …, *t _{n}*. Without loss of generality, let taxon

In a standard statistical model of admixture used in population genetics, allele frequencies in an admixed taxon are given by linear combinations of the allele frequencies of the source taxa.^{33–37} We denote by *λ* the proportion of the ancestry of taxon *t _{n}* arising from

$${p}_{{t}_{n}}=\lambda {p}_{{t}_{1}}+(1-\lambda ){p}_{{t}_{2}}.$$

(5)

It follows that if for each of the taxa in a pair, a distance function *d* is linear in each component of the allele frequency vector at a locus, then the distances between the admixed taxon and other taxa are obtained as linear combinations of corresponding distances involving taxa *t*_{1} and *t*_{2}. Therefore, for 1 ≤ *i* ≤ *n* − 1,

$${d}_{{t}_{n},{t}_{i}}=\lambda {d}_{{t}_{1},{t}_{i}}+(1-\lambda ){d}_{{t}_{2},{t}_{i}}.$$

(6)

Eq. 6 continues to hold if for a series of loci, the distance function *d* is linear for each taxon in each component of the allele frequency vector *at each locus*, as would occur if the distance between a pair of taxa at a set of loci were computed as the mean of locus-wise distances that were each linear in the components of the allele frequency vector at the specified locus.

We assume that the distance function supplied to neighbor-joining satisfies eq. 6, and that it is symmetric, nonnegative, and zero if and only if it is computed between a taxon and itself; we otherwise do not concern ourselves with the form of the function. While typical population-genetic distance functions often involve nonlinear relationships with allele frequencies and do not necessarily follow eq. 6—consider the nonlinear graphs in Figure 3 of Boca & Rosenberg,^{38} which illustrate that for the *F _{ST}* measure and an admixed population

Our goal is to construct a distance matrix according to the admixture rule in eq. 6, mechanistically apply neighbor-joining to the matrix, and characterize the properties of the inference process and the resulting inferred tree. We examine two settings. In the first, arbitrarily many taxa are considered, and their distances produce an additive distance matrix (and therefore satisfy a *tree metric*^{39}). In the second, a general matrix is investigated, with distances that do not necessarily follow a tree metric, but the matrix includes only four taxa.

We first assume that the distance matrix is additive. In this case, by the consistency property of the neighbor-joining algorithm,^{5,8,9} distances between taxa on the constructed neighbor-joining tree exactly equal those of the input matrix. Denote by the distance function computed for pairs of nodes in the inferred neighbor-joining tree, such that for taxa *t _{i}* and

$${\widehat{d}}_{{t}_{i},{t}_{j}}={d}_{{t}_{i},{t}_{j}}.$$

(7)

Because *d*_{t1,tn} = (1 − *λ*)*d*_{t1,t2} and *d*_{t2,tn} = *λd*_{t1,t2} by eq. 6,

$${\widehat{d}}_{{t}_{1},{t}_{n}}=(1-\lambda ){\widehat{d}}_{{t}_{1},{t}_{2}}$$

(8)

$${\widehat{d}}_{{t}_{2},{t}_{n}}=\lambda {\widehat{d}}_{{t}_{1},{t}_{2}}.$$

(9)

It follows that _{t1,tn} + _{t2,tn} = _{t1,t2}, from which we can infer that taxa *t*_{1}, *t*_{2}, and *t _{n}* are collinear in the inferred neighbor-joining tree, with

We can obtain an even stronger result. Consider a case with at least four taxa: *t*_{1}, *t*_{2}, *t _{n}*, and, without loss of generality,

The case of an additive distance matrix for *n* taxa. (A) Illustration of distances in the model. In the text it is shown that *y* = 0. (B) The structure required for a tree. Taxa *t*_{1} and *t*_{2} lie at multifurcating nodes, with *t*_{n} on the line connecting them. **...**

By eq. 7, _{t1,tn} = *y* + *z* = *d*_{t1,tn} = (1 − *λ*)*x*. By eqs. 6 and 7,

$${d}_{{t}_{3},{t}_{n}}=\lambda {d}_{{t}_{3},{t}_{1}}+(1-\lambda ){d}_{{t}_{3},{t}_{2}}$$

(10)

$${\widehat{d}}_{{t}_{3},{t}_{n}}=\lambda {\widehat{d}}_{{t}_{3},{t}_{1}}+(1-\lambda ){\widehat{d}}_{{t}_{3},{t}_{2}}.$$

(11)

In other words, *c* + *z* = *λ*(*c* + *y*) + (1 − *λ*)(*c* + *x* − *y*). Together with the relationship *y* + *z* = (1 − *λ*)*x* and the assumption that *λ* > 0, eq. 11 implies that *y* = 0. It then follows that taxon *t*_{3} lies on a line with taxa *t*_{1}, *t*_{2}, and *t _{n}*. Further, taxon

We have therefore shown that for an additive tree with taxon *t _{n}* admixed between

The additive case can assist in explaining phenomena observed empirically with admixed populations in the application of neighbor-joining:^{21–25} in the additive case, *t _{n}* has external branch length 0, a result compatible with the short external branches detected for admixed taxa. Further,

The additive case is restrictive and atypical of the population-genetic context, in which migration and admixture generate non-tree-like evolution. We can then consider the more general setting of arbitrary genetic distance matrices with positive entries, examining the smallest nontrivial case, with *n* = 4 taxa. In this case, the admixed taxon is *t*_{4}, with source taxa *t*_{1} and *t*_{2}. We set the distances among taxa *t*_{1}, *t*_{2}, and *t*_{3} to be *d*_{t1,t2} = *x*, *d*_{t1,t3} = *y* and *d*_{t2,t3} = *z*, for some positive *x*, *y*, and *z*. Employing eq. 6, the distance matrix *D* has the form:

$$D=\left(\begin{array}{cccc}0& x& y& (1-\lambda )x\\ x& 0& z& \lambda x\\ y& z& 0& \lambda y+(1-\lambda )z\\ (1-\lambda )x& \lambda x& \lambda y+(1-\lambda )z& 0\end{array}\right).$$

(12)

Using eq. 1 to calculate the matrix *Q* used in deciding which taxa will agglomerate, we obtain

$$Q=\left(\begin{array}{cccc}0& {q}_{1}& {q}_{2}& {q}_{3}\\ {q}_{1}& 0& {q}_{3}& {q}_{2}\\ {q}_{2}& {q}_{3}& 0& {q}_{1}\\ {q}_{3}& {q}_{2}& {q}_{1}& 0\end{array}\right),$$

(13)

where

$${q}_{1}=-(x+y+z)$$

(14)

$${q}_{2}=-(2-\lambda )x-\lambda y-(2-\lambda )z$$

(15)

$${q}_{3}=-(1+\lambda )x-(1+\lambda )y-(1-\lambda )z.$$

(16)

Examining the relationships among *q*_{1}, *q*_{2}, and *q*_{3}, we have that

$${q}_{1}<{q}_{2}\iff x+z<y$$

(17)

$${q}_{1}<{q}_{3}\iff x+y<z$$

(18)

$${q}_{2}<{q}_{3}\iff y<(1-2\lambda )x+z.$$

(19)

As in the work of Eickmeyer & Yoshida,^{40} we partition the four-dimensional space of possible values of (*λ*, *q*_{1}, *q*_{2}, *q*_{3}) according to the tree topologies produced by neighbor-joining.

Three tree topologies are possible with the four taxa (Fig. 4). In Fig. 4A, taxon *t*_{4} is separated by three edges from taxa *t*_{1} and *t*_{2}, which themselves are separated by only two edges. In Fig. 4B, *t*_{4} is separated by two edges from *t*_{2} and by three edges from *t*_{1}; *t*_{1} and *t*_{2} are separated by three edges. Taxa *t*_{1} and *t*_{2} are also separated by three edges in Fig. 4C, but *t*_{4} is instead separated by two edges from *t*_{1} and by three edges from *t*_{2}.

The three possible topologies for *n* = 4 taxa. Node *u* is the unique node that places *t*_{1}, *t*_{2}, and *t*_{4} in different subtrees.

Seven possibilities exist for the smallest entry of *Q*: (1) *q*_{1}, (2) *q*_{2}, (3) *q*_{3}, (4) *q*_{1} and *q*_{2} (tied), (5) *q*_{1} and *q*_{3} (tied), (6) *q*_{2} and *q*_{3} (tied), and (7) *q*_{1}, *q*_{2}, and *q*_{3} (all tied). Each choice leads to a particular outcome among the three tree topologies in Fig. 4, with two or more topologies being possible outcomes in cases that involve ties. For each value among *q*_{1}, *q*_{2}, and *q*_{3}, two pairs of taxa produce the same value in the matrix *Q*. It can be shown that in each case, either choice of which pair is first to agglomerate leads to the same inferred tree. Without loss of generality, we choose the pair that does not include taxon *t*_{3}.

Four of the seven cases are not possible. In case 1, summing *x* + *z* < *y* and *x* + *y* < *z* in eqs. 17 and 18, we obtain *x* < 0. In case 4, setting *q*_{1} = *q*_{2} in eq. 17, *x* + *z* = *y*, from which we obtain *x* < 0 using *x* + *y* < *z* in eq. 18. Similarly, in case 5, *q*_{1} = *q*_{3} in eq. 18 produces *x* < 0 using *x* + *z* < *y* in eq. 17. In case 7, eqs. 17–19 become equalities, leading to *x* = 0 when *x* + *z* = *y* is substituted into eq. 19. All of these cases contradict the assumption that *x* > 0.

We consider the three allowable cases (cases 2, 3, and 6). For each of the possible inferred neighbor-joining trees, denote by *u* the unique interior node that places taxa *t*_{1}, *t*_{2}, and *t*_{4} in distinct subtrees (Fig. 4). Denote by the distance between nodes on the inferred tree. In case 2, *q*_{2} is smallest, taxa *t*_{2} and *t*_{4} agglomerate first, and using eqs. 2–4, we obtain

$${\widehat{d}}_{u,{t}_{2}}=(\lambda /4)(3x-y+z)$$

(20)

$${\widehat{d}}_{u,{t}_{4}}=(\lambda /4)(x+y-z)$$

(21)

$${\widehat{d}}_{u,{t}_{1}}=(1-\lambda )x.$$

(22)

We can show that _{u,t4} < _{u,t2} and _{u,t4} < _{u,t1}. The first of these two inequalities is equivalent to *λy* < *λ*(*x* + *z*), which holds because *λ* > 0, and because *y* < *x* + *z* by eq. 17. For the second inequality, note first that *y* < (1 − 2*λ*)*x* + *z* by eq. 19. Substituting the right-hand side in place of *y* in eq. 21, _{u,t4} is less than [2*λ*(1 − *λ*)]*x*/4, which in turn is less than _{u,t1} because 0 < *λ* < 1.

In case 3, *q*_{3} is smallest, taxa *t*_{1} and *t*_{4} agglomerate first, and using eqs. 2–4, we obtain

$${\widehat{d}}_{u,{t}_{1}}=[(1-\lambda )/4](3x+y-z)$$

(23)

$${\widehat{d}}_{u,{t}_{4}}=[(1-\lambda )/4](x-y+z)$$

(24)

$${\widehat{d}}_{u,{t}_{2}}=\lambda x.$$

(25)

Similarly to case 2, we show _{u,t4} < _{u,t1} and _{u,t4} < _{u,t2}. The first inequality is equivalent to (1 − *λ*)*z* < (1 − *λ*)(*x* + *y*), which holds because *λ* < 1, and because *z* < *x* + *y* by eq. 18. For the second equality, (1 − 2*λ*)*x* + *z* < *y* by eq. 19. Substituting the left-hand side in place of *y* in eq. 24, _{u,t4} is less than [(2*λ*(1 − *λ*)]*x*/4, which in turn is smaller than _{u,t2} because 0 < *λ* < 1.

Finally, in case 6, *q*_{2} and *q*_{3} are tied with the smallest values, and either *t*_{2} and *t*_{4} agglomerate first as in case 2, or *t*_{1} and *t*_{4} agglomerate first as in case 3. Neighbor-joining produces the tree in Fig. 4C with probability 1/2, and the tree in Fig. 4B with probability 1/2. With either choice, the same arguments used to demonstrate _{u,t4} < _{u,t1} and _{u,t4} < _{u,t2} in cases 2 and 3 apply, except that *y* is equal to (instead of greater than or less than) (1 − 2*λ*)*x* + *z*.

This collection of results demonstrates three phenomena for four-taxon trees built from distance matrices formed according to our admixture model. (1) The admixed taxon agglomerates with one of its two source taxa before the sources agglomerate with each other. Cases 2, 3, and 6 are the only ones allowable, and in these cases, the first neighbor-joining step agglomerates the admixed taxon *t*_{4} with one of the sources. (2) Denoting by *u* the unique node for which the admixed taxon and its source taxa all lie in different subtrees, the distance on the neighbor-joining tree of the admixed taxon to *u* is smaller than the distances to *u* of both source taxa. We demonstrated this result in each of the allowed cases, and it therefore holds in general. (3) The number of edges separating the source taxa on the inferred neighbor-joining tree, for each source taxon, is greater than or equal to the number of edges separating the admixed taxon from the source taxon. Only the trees in Figs. 4B and 4C are possible outcomes of neighbor-joining in our model, and the result holds for each of these trees.

Using the four-taxon results, we can formally define three properties of a distance matrix and its resulting neighbor-joining tree. The properties are well-defined for arbitrary *n*, and it is possible to evaluate whether a given *n*-taxon distance matrix satisfies them when neighbor-joining is applied. All three properties are possessed by all matrices generated by the four-taxon case of our admixture model.

The admixed taxon clusters with one of its source taxa before the source taxa cluster together. Stated precisely, some clade containing *t*_{1} but not *t*_{2} or *t _{n}* merges with some clade containing

Here we allow a clade to have any size, and potentially only a single taxon. In identifying the steps at which *t*_{1}, *t*_{2}, and *t _{n}* merge into the neighbor-joining tree, as in our four-taxon case, to ensure that these taxa do not all merge simultaneously at the final stage, we adopt the convention that if a four-taxon stage is reached in which

The distance on the constructed neighbor-joining tree between the admixed taxon and either of its source taxa is smaller than the corresponding distance between the two source taxa. That is, _{t1,tn} < _{t1,t2} and _{t2,tn} < _{t1,t2}. Equivalently, if *u* is the unique node in the constructed neighbor-joining tree for which *t*_{1}, *t*_{2}, and *t _{n}* lie in different subtrees, then

The number of edges separating the source taxa in the constructed neighbor-joining tree is greater than or equal to the number of edges separating the admixed taxon and either source taxon. If we define * _{ij}* as the number of edges in the path separating nodes

We have already demonstrated that in our admixture model, Properties 2 and 3 hold for all distance matrices in the *n*-taxon additive case; for Property 2, using eqs. 8 and 9 and 0 < *λ* < 1, _{t1,tn} < _{t1,t2} and _{t2,tn} < _{t1,t2}. For Property 3, we have shown that for an *n*-taxon additive distance matrix, taxon *t _{n}* lies on the interior of the path connecting

We have examined neighbor-joining in a model in which an admixed taxon is produced from two source taxa, finding that for a four-taxon scenario, distance matrices and their resulting trees possess three properties: *antecedence of clustering*, in which the admixed population clusters with one of the sources before the sources cluster with each other; *intermediacy of distances*, in which the distance on the constructed tree between the admixed taxon and either source taxon is less than the distance between the sources; and *intermediacy of path lengths*, in which the number of edges separating the admixed taxon and either source taxon is no larger than the number of edges separating the sources. We have further shown that for an arbitrary number of taxa, the latter two properties hold when the distance matrix is additive.

By a mechanistic examination, we have found that our model has features seen in empirical observations of neighbor-joining trees that involve admixed populations. In particular, the placement of admixed populations on short external branches incident to the paths connecting their source populations^{21–25} matches the demonstration in the additive and four-taxon cases of the *intermediacy of distances* and *intermediacy of path lengths* properties. The theoretical approach validates the view that populations that are centrally located on neighbor-joining trees and that possess short external branches might be recently admixed.

Our results suggest a broader investigation of the extent to which the three properties hold with an arbitrary number of taxa. We have not reported a result regarding *antecedence of clustering* in the *n*-taxon additive case, nor have we commented on any of the properties for general *n*-taxon distance matrices that are not necessarily additive. However, we expect that Properties 1–3 will be satisfied by our admixture model considerably more often than in a model in which no special constraints are imposed on distances that involve the *n*th taxon. As our model also involves an *n*th taxon with special features, the general analysis of the model might benefit from the “rogue taxon” framework of Cueto & Matsen,^{41} in which the addition of an *n*th taxon alters the tree produced for an initial group of *n* − 1 taxa.

An additional direction is to study alternative admixture models. Distance methods are most sensible when a distance is nearly additive; however, eq. 6 severely restricts the distance matrix, as it forces a structure with two multifurcating nodes. This aspect of the model can be relaxed by assuming that the distance is additive for taxa *t*_{1}, *t*_{2}, …, *t _{n}*

$${d}_{{t}_{n},{t}_{i}}=\{\begin{array}{ccc}{d}_{{t}_{1},{t}_{i}}-(1-\lambda ){d}_{{t}_{1},{t}_{2}}& \text{if}& (1-2\lambda ){d}_{{t}_{1},{t}_{2}}\le {d}_{{t}_{1},{t}_{i}}\\ {d}_{{t}_{2},{t}_{i}}-\lambda {d}_{{t}_{1},{t}_{2}}& \text{if}& (1-2\lambda ){d}_{{t}_{1},{t}_{2}}\ge {d}_{{t}_{1},{t}_{i}}.\end{array}$$

(26)

With this distance function, *t _{n}* is simply placed on the path from

Two possible placements of *t*_{n} with respect to *t*_{1}, *t*_{2}, and *t*_{i}, illustrating how *t*_{1}, *t*_{2}, …, *t*_{n} can have an additive distance matrix when *t*_{1}, *t*_{2}, …, *t*_{n}_{−1} have an additive distance matrix. For convenience, we illustrate only one representative **...**

To obtain eq. 26, we first suppose that *t*_{1}, *t*_{2}, …, *t _{n}*

$${\widehat{d}}_{{t}_{1},u}=({\widehat{d}}_{{t}_{1},{t}_{2}}+{\widehat{d}}_{{t}_{1},{t}_{i}}-{\widehat{d}}_{{t}_{2},{t}_{i}})/2$$

(27)

$${\widehat{d}}_{{t}_{2},u}=({\widehat{d}}_{{t}_{1},{t}_{2}}+{\widehat{d}}_{{t}_{2},{t}_{i}}-{\widehat{d}}_{{t}_{1},{t}_{i}})/2$$

(28)

$${\widehat{d}}_{{t}_{i},u}=({\widehat{d}}_{{t}_{1},{t}_{i}}+{\widehat{d}}_{{t}_{2},{t}_{i}}-{\widehat{d}}_{{t}_{1},{t}_{2}})/2.$$

(29)

If _{t1,tn} ≤ _{t1,u}, then *t _{n}* lies on the path from

$${\widehat{d}}_{{t}_{n},{t}_{i}}={\widehat{d}}_{u,{t}_{i}}+{\widehat{d}}_{{t}_{1},{t}_{2}}-{\widehat{d}}_{{t}_{1},{t}_{n}}.$$

(30)

If, on the other hand, _{t1,tn} ≥ _{t1,u}, then *t _{n}* lies on the path from

$${\widehat{d}}_{{t}_{n},{t}_{i}}={\widehat{d}}_{u,{t}_{i}}+{\widehat{d}}_{{t}_{1},{t}_{2}}-{\widehat{d}}_{{t}_{2},{t}_{n}}.$$

(31)

Applying eqs. 8, 9, and 27–29 together with the fact that = *d* for additive distance matrices, we produce the relationship in eq. 26.

Analysis of the three properties using this modified form for the admixture model, or more generally using specific distance functions commonly employed in population genetics, will further illuminate the features of neighbor-joining in admixed populations. Such analyses might also facilitate investigations of the behavior with admixed populations of other tree-building methods, or of phylogenetic network methods^{42} that are more directly designed to accommodate taxa with non-tree-like evolutionary histories.

We thank M. Feldman and four reviewers for comments. This work has been supported by National Institutes of Health grant R01 GM081441, by National Science Foundation grants BCS-1024627 and DBI-1146722, and by the Burroughs Wellcome Fund.

NAAMA M. KOPELMAN, Porter School of Environmental Studies, Department of Zoology, Tel Aviv University, Ramat Aviv, Israel.

LEWI STONE, Porter School of Environmental Studies, Department of Zoology, Tel Aviv University, Ramat Aviv, Israel.

OLIVIER GASCUEL, Méthodes et Algorithmes pour la Bioinformatique, LIRMM-CNRS, Montpellier, France.

NOAH A. ROSENBERG, Department of Biology, Stanford University, Stanford, California, USA.

1. Swofford DL, Olsen GJ, Waddell PJ, Hillis DM. Phylogenetic inference. In: Hillis DM, Moritz C, Mable BK, editors. Molecular Systematics. Sinauer; Sunderland, MA: 1996. pp. 407–514.

2. Felsenstein J. Inferring Phylogenies. Sinauer; Sunderland, MA: 2004.

3. Saitou N, Nei M. Mol Biol Evol. 1987;4:406. [PubMed]

4. Studier JA, Keppler KJ. Mol Biol Evol. 1988;5:729. [PubMed]

5. Bryant D. J Classif. 2005;22:3.

6. Gascuel O, Steel M. Mol Biol Evol. 2006;23:1997. [PubMed]

7. Mihaescu R, Levy D, Pachter L. Algorithmica. 2009;54:1.

8. Gascuel O. Concerning the NJ algorithm and its unweighted version, UNJ. In: Mirkin B, McMorris FR, Roberts FS, Rzhetsky A, editors. Mathematical Hierarchies and Biology. American Mathematical Society; Providence: 1997. pp. 149–170.

9. Atteson K. Algorithmica. 1999;25:251.

10. Saitou N, Imanishi T. Mol Biol Evol. 1989;6:514.

11. Kuhner MK, Felsenstein J. Mol Biol Evol. 1994;11:459. [PubMed]

12. Russo CAM, Takezaki N, Nei M. Mol Biol Evol. 1996;13:525. [PubMed]

13. Kalinowski ST. Heredity. 2009;102:506. [PubMed]

14. Edwards AWF, Cavalli-Sforza LL. Reconstruction of evolutionary trees. In: Heywood VH, McNeill J, editors. Phenetic and Phylogenetic Classification. Systematics Association; London: 1964. pp. 67–76.

15. Cavalli-Sforza LL, Edwards AWF. Evolution. 1967;21:550.

16. Bowcock AM, Ruiz-Linares A, Tomfohrde J, Minch E, Kidd JR, Cavalli-Sforza LL. Nature. 1994;368:455. [PubMed]

17. Pritchard JK, Stephens M, Donnelly P. Genetics. 2000;155:945. [PubMed]

18. Jakobsson M, Scholz SW, Scheet P, Gibbs JR, VanLiere JM, Fung H-C, Szpiech ZA, Degnan JH, Wang K, Guerreiro R, Bras JM, Schymick JC, Hernandez DG, Traynor BJ, Simon-Sanchez J, Matarin M, Britton A, van de Leemput J, Rafferty I, Bucan M, Cann HM, Hardy JA, Rosenberg NA, Singleton AB. Nature. 2008;451:998. [PubMed]

19. Atzmon G, Hao L, Pe’er I, Velez C, Pearlman A, Palamara PF, Morrow B, Friedman E, Oddoux C, Burns E, Ostrer H. Am J Hum Genet. 2010;86:850. [PubMed]

20. Hunley K, Healy M. Am J Phys Anthropol. 2011;146:530. [PubMed]

21. Ruiz-Linares A, Minch E, Meyer D, Cavalli-Sforza LL. Analysis of classical and DNA markers for reconstructing human population history. In: Brenner S, Hanihara K, editors. The Origin and Past of Modern Humans as Viewed from DNA. World Scientific; Singapore: 1995. pp. 123–148.

22. Bowcock AM, Kidd JR, Mountain JL, Hebert JM, Carotenuto L, Kidd KK, Cavalli-Sforza LL. Proc Natl Acad Sci USA. 1991;88:839. [PubMed]

23. Mountain JL, Lin AA, Bowcock AM, Cavalli-Sforza LL. Phil Trans R Soc Lond B Biol Sci. 1992;337:159. [PubMed]

24. Lin AA, Hebert JM, Mountain JL, Cavalli-Sforza LL. Gene Geog. 1994;8:191. [PubMed]

25. Mountain JL, Cavalli-Sforza LL. Proc Natl Acad Sci USA. 1994;91:6515. [PubMed]

26. Felsenstein J. PHYLIP (Phylogeny Inference Package) version 3.6 (Department of Genome Sciences. University of Washington; Seattle: 2005.

27. Rosenberg NA, Mahajan S, Ramachandran S, Zhao C, Pritchard JK, Feldman MW. PLoS Genet. 2005;1:660. [PMC free article] [PubMed]

28. Wang S, Lewis CM, Jr, Jakobsson M, Ramachandran S, Ray N, Bedoya G, Rojas W, Parra MV, Molina JA, Gallo C, Mazzotti G, Poletti G, Hill K, Hurtado AM, Labuda D, Klitz W, Barrantes R, Bortolini MC, Salzano FM, Petzl-Erler ML, Tsuneto LT, Llop E, Rothhammer F, Excoffier L, Feldman MW, Rosenberg NA, Ruiz-Linares A. PLoS Genet. 2007;3:2049. [PMC free article] [PubMed]

29. Wang S, Ray N, Rojas W, Parra MV, Bedoya G, Gallo C, Poletti G, Mazzotti G, Hill K, Hurtado AM, Camrena B, Nicolini H, Klitz W, Barrantes R, Molina JA, Freimer NB, Bortolini MC, Salzano FM, Petzl-Erler ML, Tsuneto LT, Dipierri JE, Alfaro EL, Bailliet G, Bianchi NO, Llop E, Rothhammer F, Excoffier L, Ruiz-Linares A. PLoS Genet. 2008;4:e1000037. [PMC free article] [PubMed]

30. Minch E, Ruiz Linares A, Goldstein DB, Feldman MW, Cavalli-Sforza LL. MI-CROSAT (version 1.5d): a program for calculating statistics on microsatellite data. Department of Genetics, Stanford University; Stanford, CA: 1998.

31. Mountain JL, Cavalli-Sforza LL. Am J Hum Genet. 1997;61:705. [PubMed]

32. Gascuel O. Mol Biol Evol. 1994;11:961. [PubMed]

33. Long JC, Smouse PE. Am J Phys Anthropol. 1983;61:411. [PubMed]

34. Fournier DA, Beacham TD, Riddell BE, Busack CA. Can J Fish Aquat Sci. 1984;41:400.

35. Rosenberg NA, Li LM, Ward R, Pritchard JK. Am J Hum Genet. 2003;73:1402. [PubMed]

36. Tang H, Peng J, Wang P, Risch NJ. Genet Epidemiol. 2005;28:289. [PubMed]

37. Alexander DH, Novembre J, Lange K. Genome Res. 2009;19:1655. [PubMed]

38. Boca SM, Rosenberg NA. Theor Pop Biol. 2011;80:208. [PMC free article] [PubMed]

39. Semple C, Steel M. Phylogenetics. Oxford University Press; Oxford: 2003.

40. Eickmeyer K, Yoshida R. Lect Notes Comp Sci. 2008;5147:81.

41. Cueto MA, Matsen FA. Bull Math Biol. 2011;73:1202. [PubMed]

42. Huson DH, Rupp R, Scornavacca C. Phylogenetic Networks: Concepts, Algorithms and Applications. Cambridge University Press; Cambridge: 2010.

PubMed Central Canada is a service of the Canadian Institutes of Health Research (CIHR) working in partnership with the National Research Council's national science library in cooperation with the National Center for Biotechnology Information at the U.S. National Library of Medicine(NCBI/NLM). It includes content provided to the PubMed Central International archive by participating publishers. |