Home | About | Journals | Submit | Contact Us | Français |

**|**HHS Author Manuscripts**|**PMC2758697

Formats

Article sections

Authors

Related links

Appl Math Lett. Author manuscript; available in PMC 2010 November 1.

Published in final edited form as:

Appl Math Lett. 2009 November 1; 22(11): 1778–1780.

doi: 10.1016/j.aml.2009.06.019PMCID: PMC2758697

NIHMSID: NIHMS130229

Mike Steel, Biomathematics Research Centre, University of Canterbury, New Zealand, Email: zn.ca.yrubretnac.htam@leetS.M;.

We improve the lower bound on the extremal version of the Maximum Agreement Subtree problem. Namely we prove that two binary trees on the same *n* leaves have subtrees with the same ≥ *c* log log *n* leaves which are homeomorphic, such that homeomorphism is identity on the leaves.

A *phylogenetic X-tree* is a binary tree in which the leaves are labelled bijectively with labels from a set *X* (usually {1, 2, …, *n*}) and internal vertices are unlabelled. Two phylogenetic X-trees are considered the same, if there is a label-preserving graph isomorphism between them.

If *T* is phylogenetic X-tree and *Y* *X* is a set of labels, then the *induced binary subtree* *T| _{Y}* is defined as follows: (a) take the subtree induced by

For *X* = {1, 2, 3, 4, 5, 6} and the two phylogenetic *X*–trees shown (*T* and *F*), a maximum agreement subtree is the phylogenetic tree *G = T|*_{Y} = F|_{Y} shown, where *Y* = {2, 3, 4, 6}.

If |*Y*| = 4, the induced binary subtree is often identified with an unordered partition of *Y* into two two-element sets, obtained by removing the (unique) internal edge of *T| _{Y}*. This partition is known as

An important algorithmic problem, known as the *Maximum Agreement Subtree Problem*, is the following: given two phylogenetic *X*–trees, find a common induced binary subtree of the largest possible size.

This problem has a history that spans more than 25 years, from papers in the early 1980s by Gordon [6], and Finden and Gordon [4]; to its implementation in the late 1990s in the widely-used phylogenetic software PAUP [12]. Somewhat surprisingly, this problem can be solved in polynomial time [11] (see also [5] and [8]).

Here we focus on the extremal version of the problem, where we ask how small a maximum agreement subtree can possibly be. This is motivated by a recent approach to tree comparison [3] based on the distribution of the maximum agreement subtree for pairs of randomly selected trees – thus we are interested here in the range of this distribution.

Let mast(*n*) denote the smallest order (number of leaves, or vertices) of the maximum agreement subtree of two phylogenetic X-trees with |*X*| = *n*. In 1992, Kubicka, Kubicki, and McMorris [7] showed that *c*_{1}(log log *n*)^{1/2} < mast(*n*) < *c*_{2} log *n* with some explicit constants.

The purpose of our note is to remove the squareroot sign from the lower bound. This is achieved by changing the order of two combinatorial steps, one resulting in taking logarithm twice, the other taking a squareroot. Of course, the squareroot sign after the log log is no longer visible.

First we would like to exhibit a direct connection to Ramsey theory, which might explain the large gap between the lower and upper bounds for mast(*n*). Let ${R}_{2}^{k}$ (*n*, ) denote the smallest integer *m* such that for any coloration of the *k*-element subsets of any *m*-element set with colors Red and Blue, there exists an *n*-element subset of the *m*-element set, such that every *k*-element subset of the *n*-element set is colored Red, or there exists an -element subset of the *m*-element set, such that every *k*-element subset of the -element set is colored Blue (see Chapter 14 in [9]).

**Claim.** mast[${R}_{2}^{4}$ (*n*, 6)] ≥ *n*.

**Proof.** We first recall an observation from [1] that for |*X*| = 6, any two phylogenetic X-trees share a quartet split. Given *T* and *F* arbitrary phylogenetic *X*—trees with |*X*| = ${R}_{2}^{4}$ (*n*, 6), color 4-subsets of *X* Red, if they define the same quartet split, otherwise Blue. No six elements of *X* can have all 4-subsets Blue by the previous reference, so there are *n* elements from X such that all their 4-subsets are colored Red. As the binary tree is determined by its quartet splits, these *n* elements span a size *n* agreement subtree, thereby establishing the Claim.

This approach would give an explicit lower bound for mast(*n*) in the form of a multiply-iterated logarithm, much weaker than *c*_{1}(log log *n*)^{1/2}.

Before proving our result, we quickly show *c*_{1}(log log *n*)^{1/2} < mast(*n*) following the approach in the 1992 paper by Kubicka, Kubicki, and McMorris [7]. Recall that a *caterpillar* is a tree, which has a path such that every leaf has a neighbor on the path (for example, the tree *F* in Fig. 1). Let us be given two phylogenetic X-trees *T* and *F* with |*X*| = *n*. As our trees are binary, the diameter of *T* is at least *c*_{3} log *n*. Therefore *T* must have an induced binary caterpillar subtree with leaf set *Y*, such that |*Y*| ≥ *c*_{3} log *n*. Consider the induced binary subtree *F| _{Y}*, which must have diameter ≥

Before turning to our main result, we need some definitions. We say that a phylogenetic X-tree *T* is *drawn on the plane* if it is drawn as a plane graph. The *circumference of phylogenetic tree drawn on the plane* is the cyclic permutation of *X*, the leaf set, as we walk around *T* clockwise. This concept has been been a useful combinatorial tool elsewhere (see, for example, [10]) and we illustrate it here in Fig. 1 by noting that the circumference of this drawing of *T* is the cyclic permutation (1, 4, 3, 6, 5, 2).

Note that for *Y* *X* the induced binary subtree of *T* (by *Y*) has a natural drawing following steps (a) and (b) by deleting edges and vertices from the plane drawing, and then removing the vertex designation of vertices of degree 2, but keeping the curve representing the path for representing the new edge. For this natural drawing of *T| _{Y}*, the circumference is the circumference of the drawing of

**Theorem 1.** For a constant *c* > 0, we have:

$$\text{mast}(n)>c\phantom{\rule{thinmathspace}{0ex}}\text{log log}\phantom{\rule{thinmathspace}{0ex}}n.$$

**Proof.** Take two arbitrary phylogenetic X-trees, *T* and *F*, with |*X*| = *n* and draw them in the plane. Cut the resulting circumferences anywhere to obtain two (linear) permutations of *X*. By the Erdős-Szekeres Theorem, there is subset *U* *X*, such that the two permutations either put *U* into the same linear order, or into opposite linear order, and |*U*| ≥ *c*_{5}*n*^{1/2}. Like in the proof explained before the theorem, *T| _{U}* has diameter ≥

**Remark.** It would be interesting to see whether Theorem 1 can be tightened. In particular, it is conceivable that the much stronger bound *c*′ log(*n*) < mast(*n*) holds, which would be best possible, up to the constant factor. Also, with slightly more care Theorem 1 can be expressed in a more explicit form mast(*n*) > $\frac{1}{4}$ log_{2} log_{2}(*n* − 1).

**Publisher's Disclaimer: **This is a PDF file of an unedited manuscript that has been accepted for publication. As a service to our customers we are providing this early version of the manuscript. The manuscript will undergo copyediting, typesetting, and review of the resulting proof before it is published in its final citable form. Please note that during the production process errors may be discovered which could affect the content, and all legal disclaimers that apply to the journal pertain.

^{*}This author was supported in part by the New Zealand Marsden Fund and the Allan Wilson Centre for Molecular Ecology and Evolution.

^{†}This author was supported in part by the NIH NIGMS contract 1 R01 GM078991-01, by the NSF DMS contract 0701111, by a Marie Curie Fellowship HUBI MTKD-CT-2006-042794, and by the 2007 Phylogeny program of the Isaac Newton Institute, Cambridge, where this work started.

Mike Steel, Biomathematics Research Centre, University of Canterbury, New Zealand, Email: zn.ca.yrubretnac.htam@leetS.M..

László A. Székely, Department of Mathematics, University of South Carolina, Email: ude.cs.htam@ylekezs..

1. Bandelt H-J, Dress AWM. Reconstructing the shape of a tree from observed dissimilarity data. Advances in Applied Mathematics. 1986;7:309–343.

2. Colonius H, Schulze HH. Tree structures for proximity data. British Journal of Mathematical and Statistical Psychology. 1981;34:167–180.

3. de Vienne DM, et al. A congruence test for testing topological similarity between trees. Bioinformatics. 2007;23:3119–3124. [PubMed]

4. Finden CR, Gordon AD. Obtaining common pruned trees. J. Classification. 1985;2 255-116.

5. Goddard W, Kubicka E, Kubicki G, McMorris FR. The agreement metric for labeled binary trees. Mathematical Biosciences. 1994;123:215–226. [PubMed]

6. Gordon AD. Consensus supetrees: the synthesis of rooted trees containing overlapping sets of labelled leaves. J. Classification. 1986;3:335–348.

7. Kubicka E, Kubicki G, McMorris FR. On agreement subtrees of two binary trees. Congressus Numerantium. 1992;88:217–224.

8. Kubicka E, Kubicki G, McMorris FR. An algorithm to find agreement subtrees. J. Classification. 1995;12:91–99.

9. Lovász L. Combinatorial Problems and Exercises. 2nd ed. North-Holland: 1993.

10. Semple C, Steel M. Cyclic permutations and evolutionary trees. Advances in Applied Mathematics. 2004;32(4):669–680.

11. Steel M, Warnow T. Kaikoura tree theorems: computing the maximum agreement subtree. Information Processing Letters. 1993;48:77–82.

12. Swofford DL. PAUP*. Phylogenetic Analysis Using Parsimony (*and Other Methods). Version 4. Sunderland, Massachusetts: Sinauer Associates; 2003.

PubMed Central Canada is a service of the Canadian Institutes of Health Research (CIHR) working in partnership with the National Research Council's national science library in cooperation with the National Center for Biotechnology Information at the U.S. National Library of Medicine(NCBI/NLM). It includes content provided to the PubMed Central International archive by participating publishers. |