
Related Articles

1.  Improved Heuristics for Minimum-Flip Supertree Construction 
The utility of the matrix representation with flipping (MRF) supertree method has been limited by the speed of its heuristic algorithms. We describe a new heuristic algorithm for MRF supertree construction that improves upon the speed of the previous heuristic by a factor of n (the number of taxa in the supertree). This new heuristic makes MRF tractable for large-scale supertree analyses and allows the first comparisons of MRF with other supertree methods using large empirical data sets. Analyses of three published supertree data sets with 267 to 571 taxa indicate that MRF supertrees are, on average, as similar as or more similar to the input trees than matrix representation with parsimony (MRP) and modified min-cut supertrees. The results also show that large differences may exist between MRF and MRP supertrees and demonstrate that the MRF supertree method is a practical and potentially more accurate alternative to the nearly ubiquitous MRP supertree method.
PMCID: PMC2674677  PMID: 19455229
Supertree; phylogenetic trees; matrix representation with flipping; matrix representation with parsimony; tree search heuristics
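The flip criterion behind MRF can be illustrated by scoring one candidate supertree. This is a minimal sketch, not the heuristic from the paper: it assumes rooted trees written as nested Python tuples, an MRP-style matrix given as a dict of strings over `0`, `1` and `?`, and it counts, for each column, the fewest 0/1 flips (ignoring `?` entries) needed for the column's 1-set to coincide with some cluster of the tree. The helper names `all_clusters` and `flip_score` are hypothetical.

```python
def all_clusters(tree):
    """Every cluster of a rooted tree: the empty set, singletons, internal clades, full leaf set."""
    result = {frozenset()}

    def walk(node):
        if isinstance(node, str):  # a leaf label
            leaf = frozenset([node])
            result.add(leaf)
            return leaf
        below = frozenset().union(*(walk(child) for child in node))
        result.add(below)
        return below

    walk(tree)
    return result


def flip_score(matrix, tree):
    """Sum over columns of the fewest 0<->1 flips making the column's 1-set a cluster of `tree`."""
    candidates = all_clusters(tree)
    n_cols = len(next(iter(matrix.values())))
    total = 0
    for j in range(n_cols):
        ones = {t for t, row in matrix.items() if row[j] == "1"}
        zeros = {t for t, row in matrix.items() if row[j] == "0"}
        # cost of matching cluster c: ones outside c become 0, zeros inside c become 1
        total += min(len(ones - c) + len(zeros & c) for c in candidates)
    return total
```

A supertree search under this sketch would simply prefer trees with a lower `flip_score`; the speed-up described above concerns how quickly such scores can be updated during search, which this brute-force version does not attempt.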
2.  A simulation study comparing supertree and combined analysis methods using SMIDGen 
Supertree methods comprise one approach to reconstructing large molecular phylogenies given multi-marker datasets: trees are estimated on each marker and then combined into a tree (the "supertree") on the entire set of taxa. Supertrees can be constructed using various algorithmic techniques, with the most common being matrix representation with parsimony (MRP). When the data allow, the competing approach is a combined analysis (also known as a "supermatrix" or "total evidence" approach) whereby the different sequence data matrices for each of the different subsets of taxa are concatenated into a single supermatrix, and a tree is estimated on that supermatrix.
In this paper, we describe an extensive simulation study we performed comparing two supertree methods, MRP and weighted MRP, to combined analysis methods on large model trees. A key contribution of this study is our novel simulation methodology (Super-Method Input Data Generator, or SMIDGen) that better reflects biological processes and the practices of systematists than earlier simulations. We show that combined analysis based upon maximum likelihood outperforms MRP and weighted MRP, giving especially large improvements when the largest subtree does not contain most of the taxa.
This study demonstrates that MRP and weighted MRP produce distinctly less accurate trees than combined analyses for a given base method (maximum parsimony or maximum likelihood). Since there are situations in which combined analyses are not feasible, there is a clear need for better supertree methods. The source tree and combined datasets used in this study can be used to test other supertree and combined analysis methods.
PMCID: PMC2837663  PMID: 20047664
3.  An experimental study of Quartets MaxCut and other supertree methods 
Supertree methods represent one of the major ways by which the Tree of Life can be estimated, but despite many recent algorithmic innovations, matrix representation with parsimony (MRP) remains the main algorithmic supertree method.
We evaluated the performance of several supertree methods based upon the Quartets MaxCut (QMC) method of Snir and Rao and showed that two of these methods usually outperform MRP and five other supertree methods that we studied, under many realistic model conditions. However, the QMC-based methods have scalability issues that may limit their utility on large datasets. We also observed that taxon sampling impacted supertree accuracy, with poor results obtained when all of the source trees were only sparsely sampled. Finally, we showed that the popular optimality criterion of minimizing the total topological distance of the supertree to the source trees is only weakly correlated with supertree topological accuracy. Therefore, evaluating supertree methods on biological datasets is problematic.
Our results show that supertree methods that improve upon MRP are possible, and that an effort should be made to produce scalable and robust implementations of the most accurate supertree methods. Also, because topological accuracy depends upon taxon sampling strategies, attempts to construct very large phylogenetic trees using supertree methods should consider the selection of source tree datasets, as well as supertree methods. Finally, since supertree topological error is only weakly correlated with the supertree's topological distance to its source trees, development and testing of supertree methods presents methodological challenges.
PMCID: PMC3101644  PMID: 21504600
4.  Robinson-Foulds Supertrees 
Supertree methods synthesize collections of small phylogenetic trees with incomplete taxon overlap into comprehensive trees, or supertrees, that include all taxa found in the input trees. Supertree methods based on the well established Robinson-Foulds (RF) distance have the potential to build supertrees that retain much information from the input trees. Specifically, the RF supertree problem seeks a binary supertree that minimizes the sum of the RF distances from the supertree to the input trees. Thus, an RF supertree is a supertree that is consistent with the largest number of clusters (or clades) from the input trees.
We introduce efficient, local search based, hill-climbing heuristics for the intrinsically hard RF supertree problem on rooted trees. These heuristics use novel non-trivial algorithms for the SPR and TBR local search problems which improve on the time complexity of the best known (naïve) solutions by a factor of Θ(n) and Θ(n²) respectively (where n is the number of taxa, or leaves, in the supertree). We use an implementation of our new algorithms to examine the performance of the RF supertree method and compare it to matrix representation with parsimony (MRP) and the triplet supertree method using four supertree data sets. Not only did our RF heuristic provide fast estimates of RF supertrees in all data sets, but the RF supertrees also retained more of the information from the input trees (based on the RF distance) than the other supertree methods.
Our heuristics for the RF supertree problem, based on our new local search algorithms, make it possible for the first time to estimate large supertrees by directly optimizing the RF distance from rooted input trees to the supertrees. This provides a new and fast method to build accurate supertrees. RF supertrees may also be useful for estimating majority-rule(-) supertrees, which are a generalization of majority-rule consensus trees.
PMCID: PMC2846952  PMID: 20181274
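The RF supertree objective, the sum of RF distances from the supertree to each input tree, can be sketched as follows. This assumes rooted trees as nested tuples, restricts the supertree to each input tree's taxa by intersecting its clusters with that taxon set, and counts the RF distance as the size of the symmetric difference of non-trivial clusters (some authors halve this value). The function names are hypothetical, and the sketch computes only the objective, not the SPR/TBR local search described above.

```python
def clades(tree):
    """Leaf sets of all internal nodes of a rooted tree written as nested tuples."""
    found = set()

    def walk(node):
        if isinstance(node, str):  # a leaf label
            return frozenset([node])
        below = frozenset().union(*(walk(child) for child in node))
        found.add(below)
        return below

    walk(tree)
    return found


def nontrivial(cluster_set, taxa):
    """Drop singletons and the full taxon set, which every tree on `taxa` contains."""
    return {c for c in cluster_set if 1 < len(c) < len(taxa)}


def rf_distance(supertree, source):
    """RF distance between `source` and `supertree` restricted to the source's taxa."""
    taxa = max(clades(source), key=len)  # the root cluster holds every leaf
    restricted = {c & taxa for c in clades(supertree)}
    return len(nontrivial(restricted, taxa) ^ nontrivial(clades(source), taxa))


def rf_supertree_score(supertree, sources):
    """The quantity an RF supertree minimizes: summed RF distance to all input trees."""
    return sum(rf_distance(supertree, source) for source in sources)
```

Restriction by intersection works here because the clusters of a rooted tree restricted to a taxon subset are exactly the non-empty intersections of its clusters with that subset.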
5.  MRL and SuperFine+MRL: new supertree methods 
Supertree methods combine trees on subsets of the full taxon set together to produce a tree on the entire set of taxa. Of the many supertree methods, the most popular is MRP (Matrix Representation with Parsimony), a method that operates by first encoding the input set of source trees by a large matrix (the "MRP matrix") over {0, 1, ?}, and then running maximum parsimony heuristics on the MRP matrix. Experimental studies evaluating MRP in comparison to other supertree methods have established that for large datasets, MRP generally produces trees of equal or greater accuracy than other methods, and can run on larger datasets. A recent development in supertree methods is SuperFine+MRP, a method that combines MRP with a divide-and-conquer approach, and produces more accurate trees in less time than MRP. In this paper we consider a new approach for supertree estimation, called MRL (Matrix Representation with Likelihood). MRL begins with the same MRP matrix, but then analyzes the MRP matrix using heuristics (such as RAxML) for 2-state Maximum Likelihood.
We compared MRP and SuperFine+MRP with MRL and SuperFine+MRL on simulated and biological datasets. We examined the MRP and MRL scores of each method on a wide range of datasets, as well as the resulting topological accuracy of the trees. Our experimental results show that MRL, coupled with a very good ML heuristic such as RAxML, produced more accurate trees than MRP, and MRL scores were more strongly correlated with topological accuracy than MRP scores.
SuperFine+MRP, when based upon a good MP heuristic, such as TNT, produces among the best scores for both MRP and MRL, and is generally faster and more topologically accurate than other supertree methods we tested.
PMCID: PMC3308190  PMID: 22280525
MRP; MRL; supertrees; phylogenetics
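The MRP matrix that both MRP and MRL start from can be built with the standard Baum-Ragan coding: one binary column per clade of each source tree, with `1` for taxa inside the clade, `0` for the other taxa of that source tree, and `?` for taxa the source tree lacks. A minimal sketch, assuming rooted source trees written as nested tuples; real pipelines usually also add an all-zero outgroup row, which is omitted here. The function names are hypothetical.

```python
def leaves(tree):
    """Set of leaf labels below a node of a nested-tuple tree."""
    if isinstance(tree, str):
        return {tree}
    return set().union(*(leaves(child) for child in tree))


def clusters(tree):
    """Leaf sets of all non-root internal nodes (the informative clades)."""
    found = []

    def walk(node, is_root):
        if isinstance(node, str):
            return {node}
        below = set()
        for child in node:
            below |= walk(child, False)
        if not is_root:
            found.append(frozenset(below))
        return below

    walk(tree, True)
    return found


def mrp_matrix(source_trees):
    """Baum-Ragan coding: one column per non-root clade of each source tree."""
    taxa = sorted(set().union(*(leaves(t) for t in source_trees)))
    rows = {taxon: [] for taxon in taxa}
    for tree in source_trees:
        present = leaves(tree)
        for clade in clusters(tree):
            for taxon in taxa:
                if taxon not in present:
                    rows[taxon].append("?")  # taxon missing from this source tree
                elif taxon in clade:
                    rows[taxon].append("1")
                else:
                    rows[taxon].append("0")
    return {taxon: "".join(chars) for taxon, chars in rows.items()}
```

MRP would then search for a maximum-parsimony tree on these rows, while MRL, as described above, analyzes the same rows under a 2-state likelihood model.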
6.  Phylogeny and divergence of the pinnipeds (Carnivora: Mammalia) assessed using a multigene dataset 
Phylogenetic comparative methods are often improved by complete phylogenies with meaningful branch lengths (e.g., divergence dates). This study presents a dated molecular supertree for all 34 world pinniped species derived from a weighted matrix representation with parsimony (MRP) supertree analysis of 50 gene trees, each determined under a maximum likelihood (ML) framework. Divergence times were determined by mapping the same sequence data (plus two additional genes) on to the supertree topology and calibrating the ML branch lengths against a range of fossil calibrations. We assessed the sensitivity of our supertree topology in two ways: 1) a second supertree with all mtDNA genes combined into a single source tree, and 2) likelihood-based supermatrix analyses. Divergence dates were also calculated using a Bayesian relaxed molecular clock with rate autocorrelation to test the sensitivity of our supertree results further.
The resulting phylogenies all agreed broadly with recent molecular studies, in particular supporting the monophyly of Phocidae, Otariidae, and the two phocid subfamilies, as well as an Odobenidae + Otariidae sister relationship; areas of disagreement were limited to four more poorly supported regions. Neither the supertree nor supermatrix analyses supported the monophyly of the two traditional otariid subfamilies, supporting suggestions for the need for taxonomic revision in this group. Phocid relationships were similar to other recent studies and deeper branches were generally well-resolved. Halichoerus grypus was nested within a paraphyletic Pusa, although relationships within Phocina tend to be poorly supported. Divergence date estimates for the supertree were in good agreement with other studies and the available fossil record; however, the Bayesian relaxed molecular clock divergence date estimates were significantly older.
Our results join other recent studies and highlight the need for a re-evaluation of pinniped taxonomy, especially as regards the subfamilial classification of otariids and the generic nomenclature of Phocina. Even with the recent publication of new sequence data, the available genetic sequence information for several species, particularly those in Arctocephalus, remains very limited, especially for nuclear markers. However, resolution of parts of the tree will probably remain difficult, even with additional data, due to apparent rapid radiations. Our study addresses the lack of a recent pinniped phylogeny that includes all species and robust divergence dates for all nodes, and will therefore prove indispensable to comparative and macroevolutionary studies of this group of carnivores.
PMCID: PMC2245807  PMID: 17996107
7.  Split-based computation of majority-rule supertrees 
Supertree methods combine overlapping input trees into a larger supertree. Here, I consider split-based supertree methods that first extract the split information of the input trees and subsequently combine this split information into a phylogeny. Well known split-based supertree methods are matrix representation with parsimony and matrix representation with compatibility. Combining input trees on the same taxon set, as in the consensus setting, is a well-studied task and it is thus desirable to generalize consensus methods to supertree methods.
Here, three variants of majority-rule (MR) supertrees that generalize majority-rule consensus trees are investigated. I provide simple formulas for computing the respective score for bifurcating input- and supertrees. These score computations, together with a heuristic tree search minimizing the scores, were implemented in the Python program PluMiST (Plus- and Minus SuperTrees), available from . The different MR methods were tested by simulation and on real data sets. The search heuristic was successful in combining compatible input trees. When combining incompatible input trees, especially one variant, MR(-) supertrees, performed well.
The presented framework allows for an efficient score computation of three majority-rule supertree variants and input trees. I combined the score computation with a heuristic search over the supertree space. The implementation was tested by simulation and on real data sets and showed promising results. Especially the MR(-) variant seems to be a reasonable score for supertree reconstruction. Generalizing these computations to multifurcating trees is an open problem, which may be tackled using this framework.
PMCID: PMC3169514  PMID: 21752249
8.  Supertrees Based on the Subtree Prune-and-Regraft Distance 
Systematic Biology  2014;63(4):566-581.
Supertree methods reconcile a set of phylogenetic trees into a single structure that is often interpreted as a branching history of species. A key challenge is combining conflicting evolutionary histories that are due to artifacts of phylogenetic reconstruction and phenomena such as lateral gene transfer (LGT). Many supertree approaches use optimality criteria that do not reflect underlying processes, have known biases, and may be unduly influenced by LGT. We present the first method to construct supertrees by using the subtree prune-and-regraft (SPR) distance as an optimality criterion. Although calculating the rooted SPR distance between a pair of trees is NP-hard, our new maximum agreement forest-based methods can reconcile trees with hundreds of taxa and >50 transfers in fractions of a second, which enables repeated calculations during the course of an iterative search. Our approach can accommodate trees in which uncertain relationships have been collapsed to multifurcating nodes. Using a series of benchmark datasets simulated under plausible rates of LGT, we show that SPR supertrees are more similar to correct species histories than supertrees based on parsimony or Robinson–Foulds distance criteria. We successfully constructed an SPR supertree from a phylogenomic dataset of 40,631 gene trees that covered 244 genomes representing several major bacterial phyla. Our SPR-based approach also allowed direct inference of highways of gene transfer between bacterial classes and genera. A small number of these highways connect genera in different phyla and can highlight specific genes implicated in long-distance LGT. [Lateral gene transfer; matrix representation with parsimony; phylogenomics; prokaryotic phylogeny; Robinson–Foulds; subtree prune-and-regraft; supertrees.]
PMCID: PMC4055872  PMID: 24695589
9.  Constructing majority-rule supertrees 
Supertree methods combine the phylogenetic information from multiple partially-overlapping trees into a larger phylogenetic tree called a supertree. Several supertree construction methods have been proposed to date, but most of these are not designed with any specific properties in mind. Recently, Cotton and Wilkinson proposed extensions of the majority-rule consensus tree method to the supertree setting that inherit many of the appealing properties of the former.
We study a variant of one of Cotton and Wilkinson's methods, called majority-rule (+) supertrees. After proving that a key underlying problem for constructing majority-rule (+) supertrees is NP-hard, we develop a polynomial-size exact integer linear programming formulation of the problem. We then present a data reduction heuristic that identifies smaller subproblems that can be solved independently. While this technique is not guaranteed to produce optimal solutions, it can achieve substantial problem-size reduction. Finally, we report on a computational study of our approach on various real data sets, including the 121-taxon, 7-tree Seabirds data set of Kennedy and Page.
The results indicate that our exact method is computationally feasible for moderately large inputs. For larger inputs, our data reduction heuristic makes it feasible to tackle problems that are well beyond the range of the basic integer programming approach. Comparisons between the results obtained by our heuristic and exact solutions indicate that the heuristic produces good answers. Our results also suggest that the majority-rule (+) approach, in both its basic form and with data reduction, yields biologically meaningful phylogenies.
PMCID: PMC2826330  PMID: 20047658
10.  Triplet supertree heuristics for the tree of life 
BMC Bioinformatics  2009;10(Suppl 1):S8.
There is much interest in developing fast and accurate supertree methods to infer the tree of life. Supertree methods combine smaller input trees with overlapping sets of taxa to make a comprehensive phylogenetic tree that contains all of the taxa in the input trees. The intrinsically hard triplet supertree problem takes a collection of input species trees and seeks a species tree (supertree) that maximizes the number of triplet subtrees that it shares with the input trees. However, the utility of this supertree problem has been limited by a lack of efficient and effective heuristics.
We introduce fast hill-climbing heuristics for the triplet supertree problem that perform a step-wise search of the tree space, where each step is guided by an exact solution to an instance of a local search problem. To realize time-efficient heuristics we designed the first nontrivial algorithms for two standard search problems, which greatly improve on the time complexity of the best known (naïve) solutions, by factors of n and n² respectively (where n is the number of taxa in the supertree). These algorithms enable large-scale supertree analyses based on the triplet supertree problem that were previously not possible. We implemented hill-climbing heuristics that are based on our new algorithms, and in analyses of two published supertree data sets, we demonstrate that our new heuristics outperform other standard supertree methods in maximizing the number of triplets shared with the input trees.
With our new heuristics, the triplet supertree problem is now computationally more tractable for large-scale supertree analyses, and it provides a potentially more accurate alternative to existing supertree methods.
PMCID: PMC2648750  PMID: 19208181
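The triplet objective, the number of rooted triplets a candidate supertree shares with the input trees, can be sketched by brute force. This assumes rooted trees as nested tuples and represents a triplet ab|c as `(frozenset({a, b}), c)`; the names are hypothetical, and the cubic enumeration here is only for illustration, not the fast local-search algorithms described above.

```python
from itertools import combinations


def clades(tree):
    """Leaf sets of all internal nodes of a rooted tree written as nested tuples."""
    found = set()

    def walk(node):
        if isinstance(node, str):  # a leaf label
            return frozenset([node])
        below = frozenset().union(*(walk(child) for child in node))
        found.add(below)
        return below

    walk(tree)
    return found


def triplets(tree):
    """All rooted triplets ab|c displayed by a tree, as (frozenset({a, b}), c)."""
    cl = clades(tree)
    taxa = max(cl, key=len)  # the root cluster holds every leaf
    found = set()
    for trio in combinations(sorted(taxa), 3):
        for pair in combinations(trio, 2):
            (outgroup,) = set(trio) - set(pair)
            # ab|c holds when some clade contains a and b but not c
            if any(set(pair) <= c and outgroup not in c for c in cl):
                found.add((frozenset(pair), outgroup))
    return found


def shared_triplets(supertree, sources):
    """Triplet supertree objective: triplets of each input tree also displayed by the supertree."""
    displayed = triplets(supertree)
    return sum(len(triplets(source) & displayed) for source in sources)
```

A hill-climbing search would keep the neighboring tree with the highest `shared_triplets` value at each step.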
11.  An Application of Supertree Methods to Mammalian Mitogenomic Sequences 
Two different approaches can be used in phylogenomics: combined or separate analysis. In the first approach, different datasets are combined in a concatenated supermatrix. In the second, datasets are analyzed separately and the phylogenetic trees are then combined in a supertree. The supertree method is an interesting alternative to avoid missing data, since datasets that are analyzed separately do not need to represent identical taxa. However, the supertree approach and the corresponding consensus methods have been highly criticized for not providing valid phylogenetic hypotheses. In this study, the congruence of trees estimated by consensus and supertree approaches was compared with model trees obtained from a combined analysis of complete mitochondrial sequences of 102 species representing 93 mammal families. The consensus methods produced poorly resolved consensus trees and did not perform well, except for the majority rule consensus with compatible groupings. The weighted supertree and matrix representation with parsimony methods performed equally well and were highly congruent with the model trees. The "most similar supertree" method was the least congruent with the model trees. We conclude that some of the methods tested are worth considering in a phylogenomic context.
PMCID: PMC2880846  PMID: 20535231
combined analysis; consensus; DNA sequences; phylogenomics; separate analysis; supermatrix
12.  A fungal phylogeny based on 42 complete genomes derived from supertree and combined gene analysis 
To date, most fungal phylogenies have been derived from single gene comparisons, or from concatenated alignments of a small number of genes. The increase in fungal genome sequencing presents an opportunity to reconstruct evolutionary events using entire genomes. As a tool for future comparative, phylogenomic and phylogenetic studies, we used both supertrees and concatenated alignments to infer relationships between 42 species of fungi for which complete genome sequences are available.
A dataset of 345,829 genes was extracted from 42 publicly available fungal genomes. Supertree methods were employed to derive phylogenies from 4,805 single gene families. We found that the average consensus supertree method may suffer from long-branch attraction artifacts, while matrix representation with parsimony (MRP) appears to be immune to these. A genome phylogeny was also reconstructed from a concatenated alignment of 153 universally distributed orthologs. Our MRP supertree and concatenated phylogeny are highly congruent. Within the Ascomycota, the sub-phyla Pezizomycotina and Saccharomycotina were resolved. Both phylogenies infer that the Leotiomycetes are the closest sister group to the Sordariomycetes. There is some ambiguity regarding the placement of Stagonospora nodorum, the sole member of the class Dothideomycetes present in the dataset.
Within the Saccharomycotina, a monophyletic clade containing organisms that translate CTG as serine instead of leucine is evident. There is also strong support for two groups within the CTG clade, one containing the fully sexual species Candida lusitaniae, Candida guilliermondii and Debaryomyces hansenii, and the second group containing Candida albicans, Candida dubliniensis, Candida tropicalis, Candida parapsilosis and Lodderomyces elongisporus. The second major clade within the Saccharomycotina contains species whose genomes have undergone a whole genome duplication (WGD), and their close relatives. We could not confidently resolve whether Candida glabrata or Saccharomyces castellii lies at the base of the WGD clade.
We have constructed robust phylogenies for fungi based on whole genome analysis. Overall, our phylogenies provide strong support for the classification of phyla, sub-phyla, classes and orders. We have resolved the relationship of the classes Leotiomycetes and Sordariomycetes, and have identified two classes within the CTG clade of the Saccharomycotina that may correlate with sexual status.
PMCID: PMC1679813  PMID: 17121679
13.  PhySIC_IST: cleaning source trees to infer more informative supertrees 
BMC Bioinformatics  2008;9:413.
Supertree methods combine phylogenies with overlapping sets of taxa into a larger one. Topological conflicts frequently arise among source trees for methodological or biological reasons, such as long branch attraction, lateral gene transfers, gene duplication/loss or deep gene coalescence. When topological conflicts occur among source trees, liberal methods infer supertrees containing the most frequent alternative, while veto methods infer supertrees not contradicting any source tree, i.e. discard all conflicting resolutions. When the source trees host a significant number of topological conflicts or have a small taxon overlap, supertree methods of both kinds can propose poorly resolved, hence uninformative, supertrees.
To overcome this problem, we propose to infer non-plenary supertrees, i.e. supertrees that do not necessarily contain all the taxa present in the source trees, discarding those whose position greatly differs among source trees or for which insufficient information is provided. We detail a variant of the PhySIC veto method called PhySIC_IST that can infer non-plenary supertrees. PhySIC_IST aims at inferring supertrees that satisfy the same appealing theoretical properties as with PhySIC, while being as informative as possible under this constraint. The informativeness of a supertree is estimated using a variation of the CIC (Cladistic Information Content) criterion, that takes into account both the presence of multifurcations and the absence of some taxa. Additionally, we propose a statistical preprocessing step called STC (Source Trees Correction) to correct the source trees prior to the supertree inference. STC is a liberal step that removes the parts of each source tree that significantly conflict with other source trees. Combining STC with a veto method allows an explicit trade-off between veto and liberal approaches, tuned by a single parameter.
Performing large-scale simulations, we observe that STC+PhySIC_IST infers much more informative supertrees than PhySIC, while preserving low type I error compared to the well-known MRP method. Two biological case studies on animals confirm that the STC preprocess successfully detects anomalies in the source trees while STC+PhySIC_IST provides well-resolved supertrees agreeing with current knowledge in systematics.
The paper introduces and tests two new methodologies, PhySIC_IST and STC, that demonstrate the interest in inferring non-plenary supertrees as well as preprocessing the source trees. An implementation of the methods is available at: .
PMCID: PMC2576265  PMID: 18834542
14.  Fast and Consistent Estimation of Species Trees Using Supermatrix Rooted Triples 
Molecular Biology and Evolution  2009;27(3):552-569.
Concatenated sequence alignments are often used to infer species-level relationships. Previous studies have shown that analysis of concatenated data using maximum likelihood (ML) can produce misleading results when loci have differing gene tree topologies due to incomplete lineage sorting. Here, we develop a polynomial time method that utilizes the modified mincut supertree algorithm to construct an estimated species tree from inferred rooted triples of concatenated alignments. We term this method SuperMatrix Rooted Triple (SMRT) and use the notation SMRT-ML when rooted triples are inferred by ML. We use simulations to investigate the performance of SMRT-ML under Jukes–Cantor and general time-reversible substitution models for four- and five-taxon species trees and also apply the method to an empirical data set of yeast genes. We find that SMRT-ML converges to the correct species tree in many cases in which ML on the full concatenated data set fails to do so. SMRT-ML can be conservative in that its output tree is often partially unresolved for problematic clades. We show analytically that when the species tree is clocklike and mutations occur under the Cavender–Farris–Neyman substitution model, as the number of genes increases, SMRT-ML is increasingly likely to infer the correct species tree even when the most likely gene tree does not match the species tree. SMRT-ML is therefore a computationally efficient and statistically consistent estimator of the species tree when gene trees are distributed according to the multispecies coalescent model.
PMCID: PMC2877557  PMID: 19833741
phylogenetics; phylogenomics; anomaly zone; anomalous gene tree; statistical consistency; lineage sorting
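The modified mincut supertree step used by SMRT builds on Aho et al.'s BUILD algorithm, which assembles a rooted tree from rooted triples whenever they are mutually compatible. The sketch below shows only BUILD, not the SMRT implementation, and assumes triples are given as `(a, b, c)` meaning ab|c. BUILD links a and b for every triple ab|c inside the current taxon set, splits the taxa into the connected components of that graph, and recurses; where it returns `None` (a connected graph, meaning conflicting triples), min-cut variants instead remove a minimum-weight set of edges so the graph splits. All names are hypothetical.

```python
def build(taxa, triples):
    """Aho et al.'s BUILD: a nested-tuple tree displaying all triples, or None on conflict."""
    taxa = sorted(taxa)
    if len(taxa) == 1:
        return taxa[0]
    parent = {t: t for t in taxa}  # union-find over the current taxa

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    present = set(taxa)
    relevant = [(a, b, c) for a, b, c in triples if {a, b, c} <= present]
    for a, b, _ in relevant:
        parent[find(a)] = find(b)  # edge a-b in the Aho graph
    groups = {}
    for t in taxa:
        groups.setdefault(find(t), []).append(t)
    if len(groups) == 1:
        return None  # graph is connected: the triples cannot all be displayed
    children = []
    for group in groups.values():
        subtree = build(group, relevant)
        if subtree is None:
            return None
        children.append(subtree)
    return tuple(children)
```

For example, the triples ab|c and cd|a yield a tree in which {a, b} and {c, d} are sister clades, whereas ab|c together with bc|a connect all three taxa and make BUILD fail.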
15.  A supertree of Temnospondyli: cladogenetic patterns in the most species-rich group of early tetrapods 
As the most diverse group of early tetrapods, temnospondyls provide a unique opportunity to investigate cladogenetic patterns among basal limbed vertebrates. We present five species-level supertrees for temnospondyls, built using a variety of methods. The standard MRP majority rule consensus including minority components shows slightly greater resolution than other supertrees, and its shape closely matches several currently accepted hypotheses of higher-level phylogeny for temnospondyls as a whole. Also, its node support is higher than that of other supertrees (except the combined standard plus Purvis MRP supertree). We explore the distribution of significant as well as informative changes (shifts) in branch splitting employing the standard MRP supertree as a reference, and discuss the temporal distribution of changes in time-sliced, pruned trees derived from this supertree. Also, we analyse those shifts that are most relevant to the end-Permian mass extinction. For the Palaeozoic, shifts occur almost invariably along branches that connect major Palaeozoic groups. By contrast, shifts in the Mesozoic occur predominantly within major groups. Numerous shifts bracket narrowly the end-Permian extinction, indicating not only rapid recovery and extensive diversification of temnospondyls over a short time period after the extinction event (possibly less than half a million years), but also the role of intense cladogenesis in the late part of the Permian (although this was counteracted by numerous ‘background’ extinctions).
PMCID: PMC2293949  PMID: 17925278
Temnospondyli; supertree; diversification; Stereospondyli; Permian; Triassic
16.  Total Evidence, Average Consensus and Matrix Representation with Parsimony: What a Difference Distances Make 
Matrix representation with parsimony (MRP) can be used to combine trees in the supertree or the consensus settings. However, despite its popularity, it is still unclear whether MRP is really a consensus method or whether it behaves more like the total evidence approach. Previous simulations have shown that it approximates total evidence trees, whereas other studies have depicted similarities with average consensus trees. In this paper, we assess the hypothesis that MRP is equally related to both approaches. We conducted a simulation study to compare the accuracy of total evidence with that of various consensus methods, including MRP. Our results show that the total evidence trees are not significantly more accurate than average consensus trees (which take branch lengths into account), but that both perform better than MRP trees in the consensus setting. The accuracy rate of all methods was similarly affected by the number of taxa, the number of partitions, and the heterogeneity of the data.
PMCID: PMC2674675  PMID: 19455197
Average consensus; character congruence; taxonomic congruence; matrix representation with parsimony; total evidence; simulation study
17.  The origins of species richness in the Hymenoptera: insights from a family-level supertree 
The order Hymenoptera (bees, ants, wasps, sawflies) contains about eight percent of all described species, but no analytical studies have addressed the origins of this richness at family-level or above. To investigate which major subtaxa experienced significant shifts in diversification, we assembled a family-level phylogeny of the Hymenoptera using supertree methods. We used sister-group species-richness comparisons to infer the phylogenetic position of shifts in diversification.
The supertrees most supported by the underlying input trees are produced using matrix representation with compatibility (MRC) (from an all-in and a compartmentalised analysis). Whilst relationships at the tips of the tree tend to be well supported, those along the backbone of the tree (e.g. between Parasitica superfamilies) are generally not. Ten significant shifts in diversification (six positive and four negative) are found common to both MRC supertrees. The Apocrita (wasps, ants, bees) experienced a positive shift at their origin accounting for approximately 4,000 species. Within Apocrita other positive shifts include the Vespoidea (vespoid wasps/ants containing 24,000 spp.), Anthophila + Sphecidae (bees/thread-waisted wasps; 22,000 spp.), Bethylidae + Chrysididae (bethylid/cuckoo wasps; 5,200 spp.), Dryinidae (dryinid wasps; 1,100 spp.), and Proctotrupidae (proctotrupid wasps; 310 spp.). Four relatively species-poor families (Stenotritidae, Anaxyelidae, Blasticotomidae, Xyelidae) have undergone negative shifts. There are some two-way shifts in diversification where sister taxa have undergone shifts in opposite directions.
Our results suggest that numerous phylogenetically distinctive radiations contribute to the richness of large clades. They also suggest that evolutionary events restricting the subsequent richness of large clades are common. Problematic phylogenetic issues in the Hymenoptera are identified, relating especially to superfamily validity (e.g. "Proctotrupoidea", "Mymarommatoidea"), and deeper apocritan relationships. Our results should stimulate new functional studies on the causes of the diversification shifts we have identified. Possible drivers highlighted for specific adaptive radiations include key anatomical innovations, the exploitation of rich host groups, and associations with angiosperms. Low richness may have evolved as a result of geographical isolation, specialised ecological niches, and habitat loss or competition.
PMCID: PMC2873417  PMID: 20423463
18.  A supertree approach to shorebird phylogeny 
Order Charadriiformes (shorebirds) is an ideal model group in which to study a wide range of behavioural, ecological and macroevolutionary processes across species. However, comparative studies depend on phylogeny to control for the effects of shared evolutionary history. Although numerous hypotheses have been presented for subsets of the Charadriiformes none to date include all recognised species. Here we use the matrix representation with parsimony method to produce the first fully inclusive supertree of Charadriiformes. We also provide preliminary estimates of ages for all nodes in the tree.
Three main lineages are revealed: i) the plovers and allies; ii) the gulls and allies; and iii) the sandpipers and allies. The relative position of these clades is unresolved in the strict consensus tree but a 50% majority-rule consensus tree indicates that the sandpiper clade is sister group to the gulls and allies whilst the plover group is placed at the base of the tree. The overall topology is highly consistent with recent molecular hypotheses of shorebird phylogeny.
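The difference between the strict and 50% majority-rule consensus trees mentioned above comes down to a counting threshold on clades. The Python sketch below assumes each tree is supplied as a list of its non-trivial clades (sets of taxon names); the function names, taxa and trees in the example are hypothetical. Any two clades that each occur in more than half of the trees must co-occur in at least one tree. They are therefore mutually compatible and can be assembled into a single consensus tree.

```python
from collections import Counter
from typing import FrozenSet, Iterable, List, Set

Clade = FrozenSet[str]


def clade_counts(trees: List[Iterable[Set[str]]]) -> Counter:
    """Count in how many trees each clade appears (once per tree)."""
    return Counter(clade for tree in trees for clade in {frozenset(c) for c in tree})


def strict_consensus(trees: List[Iterable[Set[str]]]) -> Set[Clade]:
    """Clades present in every input tree."""
    return {c for c, k in clade_counts(trees).items() if k == len(trees)}


def majority_rule_consensus(trees: List[Iterable[Set[str]]]) -> Set[Clade]:
    """Clades present in more than half of the input trees."""
    return {c for c, k in clade_counts(trees).items() if k > len(trees) / 2}


# Three hypothetical trees on taxa A-D, each given by its non-trivial clades.
trees = [
    [{"A", "B"}, {"A", "B", "C"}],
    [{"A", "B"}, {"A", "B", "D"}],
    [{"A", "B"}, {"A", "B", "C"}],
]
print(sorted(sorted(c) for c in strict_consensus(trees)))         # [['A', 'B']]
print(sorted(sorted(c) for c in majority_rule_consensus(trees)))  # [['A', 'B'], ['A', 'B', 'C']]
```

As in the shorebird result, the strict consensus leaves the position of C and D unresolved, while the majority-rule tree resolves it.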
The supertree hypothesis presented herein is (to our knowledge) the only complete phylogenetic hypothesis of all extant shorebirds. Despite concerns over the robustness of supertrees (see Discussion), we believe that it provides a valuable framework for testing numerous evolutionary hypotheses relating to the diversity of behaviour, ecology and life-history of the Charadriiformes.
PMCID: PMC515296  PMID: 15329156
19.  The Supertree Tool Kit 
BMC Research Notes  2010;3:95.
Large phylogenies are crucial for many areas of biological research. One method of creating such large phylogenies is the supertree method, but creating supertrees containing thousands of taxa, and hence providing a comprehensive phylogeny, requires hundreds or even thousands of source input trees. Managing and processing these data in a systematic and error-free manner is challenging and will become even more so as supertrees contain ever increasing numbers of taxa. Protocols for processing input source phylogenies have been proposed to ensure data quality, but no robust software implementations of these protocols yet exist.
The aim of the Supertree Tool Kit (STK) is to aid in the collection, storage and processing of input source trees for use in supertree analysis. It is therefore invaluable when creating supertrees containing thousands of taxa and hundreds of source trees. The STK is a Perl module with executable scripts to carry out various steps in the processing protocols. To aid processing, we have added metadata, via XML, to each tree; this metadata records information such as the bibliographic source of the tree and how the data were derived, for instance the character data used to carry out the original analysis. These data are essential parts of previously proposed protocols.
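The STK itself is a Perl module. Purely to illustrate the idea of attaching XML metadata to each source tree, here is a short Python sketch using the standard `xml.etree.ElementTree` module. The element and attribute names, the function names, the Newick string and the bibliographic details are all hypothetical and do not reflect the STK's actual format.

```python
import xml.etree.ElementTree as ET
from typing import List


def tree_record(newick: str, author: str, year: int, title: str,
                characters: List[str]) -> ET.Element:
    """Wrap one source tree with bibliographic and character metadata.

    The element names are hypothetical and are not the STK's actual schema.
    """
    record = ET.Element("source_tree")
    bib = ET.SubElement(record, "bibliography")
    ET.SubElement(bib, "author").text = author
    ET.SubElement(bib, "year").text = str(year)
    ET.SubElement(bib, "title").text = title
    chars = ET.SubElement(record, "characters")
    for kind in characters:
        ET.SubElement(chars, "character", type=kind)  # e.g. "molecular"
    ET.SubElement(record, "tree").text = newick
    return record


def uses_character(record: ET.Element, kind: str) -> bool:
    """True if the source tree was built from characters of the given kind."""
    return any(c.get("type") == kind for c in record.iter("character"))


rec = tree_record("((A,B),C);", "Doe, J.", 2001, "A hypothetical phylogeny", ["molecular"])
print(ET.tostring(rec, encoding="unicode"))
print(uses_character(rec, "morphological"))  # False
```

Keeping the metadata alongside each tree lets later protocol steps filter source trees, for example by character type or publication year, without going back to the original publications.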
The STK is a bioinformatics tool designed to make it easier to process source phylogenies for inclusion in supertree analysis from hundreds or thousands of input source trees, whilst reducing potential errors and enabling easy sharing of such datasets. It has been successfully used to create the largest known supertree to date containing over 5000 taxa from over 700 source phylogenies.
PMCID: PMC2872655  PMID: 20377857
20.  Many hexapod groups originated earlier and withstood extinction events better than previously realized: inferences from supertrees 
Comprising over half of all described species, the hexapods are central to understanding the evolution of global biodiversity. Direct fossil evidence suggests that new hexapod orders continued to originate from the Jurassic onwards, and diversity is presently higher than ever. Previous studies also suggest that several shifts in net diversification rate have occurred at higher taxonomic levels. However, their inferred timing is phylogeny dependent. We re-examine these issues using the supertree approach to provide, to our knowledge, the first composite estimates of hexapod order-level phylogeny. The Purvis matrix representation with parsimony method provides the optimal supertree, but alternative methods are considered. Inferring ghost ranges shows richness of terminal lineages in the order-level phylogeny to peak just before the end-Permian extinction, rather than the present day, indicating that at least 11 more lineages survived this extinction than implied by fossils alone. The major upshift in diversification is associated with the origin of wings/wing folding and for the first time, to our knowledge, significant downshifts are shown associated with the origin of species-poor taxa (e.g. Neuropterida, Zoraptera). Polyneopteran phylogeny, especially the position of Zoraptera, remains important to resolve because this influences findings regarding shifts in diversification. Our study shows how combining fossil with phylogenetic information can improve macroevolutionary inferences.
PMCID: PMC2871844  PMID: 20129983
fossil record; key innovations; insect diversity; macroevolution; mass extinction; supertree
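Inferring ghost ranges, as described above, rests on the fact that sister lineages originate at the same time. The Python sketch below shows that calculation on a toy tree. The nested-tuple tree format, the function names and the dates are hypothetical. Real analyses must also handle dating uncertainty and the ends of stratigraphic ranges, both of which this sketch ignores.

```python
from typing import Dict, Optional, Tuple, Union

# Hypothetical rooted tree as nested tuples of taxon names.
Node = Union[str, tuple]


def oldest_record(node: Node, fad: Dict[str, float]) -> float:
    """Oldest first-appearance date (Ma) among the taxa below a node."""
    if isinstance(node, str):
        return fad[node]
    return max(oldest_record(child, fad) for child in node)


def ghost_ranges(node: Node, fad: Dict[str, float],
                 split_age: Optional[float] = None,
                 out: Optional[Dict[str, Tuple[float, float]]] = None
                 ) -> Dict[str, Tuple[float, float]]:
    """Map each taxon to (inferred origin, ghost range) in Ma.

    Sister lineages diverge at the same node, so each lineage is at least
    as old as the oldest fossil found anywhere below its parent node.
    """
    if out is None:
        out = {}
    if isinstance(node, str):
        origin = fad[node] if split_age is None else split_age
        out[node] = (origin, origin - fad[node])
        return out
    age = oldest_record(node, fad)
    for child in node:
        ghost_ranges(child, fad, age, out)
    return out


# Hypothetical first appearances (Ma) for taxa A, B and C.
fad = {"A": 250.0, "B": 150.0, "C": 300.0}
print(ghost_ranges((("A", "B"), "C"), fad))
```

Here B's first fossil is 150 Ma, but its sister A is recorded at 250 Ma, so B is assigned an inferred origin of 250 Ma and a 100 Myr ghost range.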
21.  Reweaving the Tapestry: a Supertree of Birds 
PLoS Currents  2014;6:ecurrents.tol.c1af68dda7c999ed9f1e4b2d2df7a08e.
Our knowledge of the avian tree of life remains uncertain, particularly at deeper levels, owing to the rapid diversification early in birds' evolutionary history. Birds are the most abundant land vertebrate on the planet and have been of great historical interest to systematists. They are also economically and ecologically important and, as a result, are intensively studied; yet despite their importance and interest to humans, around 13% of taxa are currently on the endangered species list, perhaps as a result of human activity. Despite all this, no comprehensive phylogeny that includes both extinct and extant species currently exists. Here we present a species-level supertree of Aves, constructed using the Matrix Representation with Parsimony method, containing approximately two thirds of all species from nearly 1000 source phylogenies with a broad taxonomic coverage. The source data for the tree were collected and processed according to a strict protocol to ensure robust and accurate data handling. The resulting tree topology is largely consistent with molecular hypotheses of avian phylogeny. We identify areas that are in broad agreement with current views on avian systematics and also those that require further work. We also highlight the need for leaf-based support measures to enable the identification of rogue taxa in supertrees. This is a first attempt at a supertree of both extinct and extant birds; it is not intended to be utilised in an overhaul of avian systematics or as a basis for taxonomic re-classification, but it provides a strong basis for further studies on macroevolution, conservation, biodiversity, comparative biology and character evolution. In particular, the inclusion of fossils will allow the study of bird evolution and diversification throughout deep time.
PMCID: PMC4055607  PMID: 24944845
22.  A higher-level MRP supertree of placental mammals 
The higher-level phylogeny of placental mammals has long been a phylogenetic Gordian knot, with disagreement about both the precise contents of, and relationships between, the extant orders. A recent MRP supertree that favoured 'outdated' hypotheses (notably, monophyly of both Artiodactyla and Lipotyphla) has been heavily criticised for including low-quality and redundant data. We apply a stringent data selection protocol designed to minimise these problems to a much-expanded data set of morphological, molecular and combined source trees, to produce a supertree that includes every family of extant placental mammals.
The supertree is well-resolved and supports both polyphyly of Lipotyphla and paraphyly of Artiodactyla with respect to Cetacea. The existence of four 'superorders' – Afrotheria, Xenarthra, Laurasiatheria and Euarchontoglires – is also supported. The topology is highly congruent with recent (molecular) phylogenetic analyses of placental mammals, but is considerably more comprehensive, being the first phylogeny to include all 113 extant families without making a priori assumptions of suprafamilial monophyly. Subsidiary analyses reveal that the data selection protocol played a key role in the major changes relative to a previously published higher-level supertree of placentals.
The supertree should provide a useful framework for hypothesis testing in phylogenetic comparative biology, and supports the idea that biogeography has played a crucial role in the evolution of placental mammals. Our results demonstrate the importance of minimising poor and redundant data when constructing supertrees.
PMCID: PMC1654192  PMID: 17101039
23.  Fossil gaps inferred from phylogenies alter the apparent nature of diversification in dragonflies and their relatives 
The fossil record has suggested that clade growth may differ in marine and terrestrial taxa, supporting equilibrial models in the former and expansionist models in the latter. However, incomplete sampling may bias findings based on fossil data alone. To attempt to correct for such bias, we assemble phylogenetic supertrees on one of the oldest clades of insects, the Odonatoidea (dragonflies, damselflies and their extinct relatives), using MRP and MRC. We use the trees to determine when, and in what clades, changes in taxonomic richness have occurred. We then test whether equilibrial or expansionist models are supported by fossil data alone, and whether findings differ when phylogenetic information is used to infer gaps in the fossil record.
There is broad agreement in family-level relationships between both supertrees, though with some uncertainty along the backbone of the tree regarding dragonflies (Anisoptera). "Anisozygoptera" are shown to be paraphyletic when fossil information is taken into account. In both trees, decreases in net diversification are associated with species-poor extant families (Neopetaliidae, Hemiphlebiidae), and an upshift is associated with Calopterygidae + Polythoridae. When ghost ranges are inferred from the fossil record, many families are shown to have much earlier origination dates. In a phylogenetic context, the number of family-level lineages is shown to be up to twice as high as the fossil record alone suggests through the Cretaceous and Cenozoic, and a logistic increase in richness is detected in contrast to an exponential increase indicated by fossils alone.
Our analysis supports the notion that taxa, which appear to have diversified exponentially using fossil data, may in fact have diversified more logistically. This in turn suggests that one of the major apparent differences between the marine and terrestrial fossil record may simply be an artifact of incomplete sampling. Our results also support previous notions that adult colouration plays an important role in odonate radiation, and that Anisozygoptera should be grouped in a single inclusive taxon with Anisoptera, separate from Zygoptera.
PMCID: PMC3179963  PMID: 21917167
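The comparison between fossil-only and phylogeny-corrected richness in the dragonfly study amounts to counting the lineages whose range spans each point in time. The count is made once with observed first appearances and once with ranges extended back by ghost lineages. The Python sketch below assumes the ranges are already known; the function name, family labels and dates are hypothetical. Fitting logistic versus exponential curves to such counts would be a separate step and is not shown.

```python
from typing import Dict, List, Tuple


def lineage_counts(ranges: Dict[str, Tuple[float, float]],
                   times: List[float]) -> List[int]:
    """Number of lineages whose range (start Ma, end Ma) spans each time."""
    return [sum(1 for start, end in ranges.values() if start >= t >= end)
            for t in times]


# Hypothetical families, all assumed extant (ranges end at 0 Ma).
fossil_only = {"A": (250.0, 0.0), "B": (150.0, 0.0), "C": (300.0, 0.0)}
with_ghosts = {"A": (250.0, 0.0), "B": (250.0, 0.0), "C": (300.0, 0.0)}
times = [300.0, 200.0, 100.0]
print(lineage_counts(fossil_only, times))  # [1, 2, 3]
print(lineage_counts(with_ghosts, times))  # [1, 3, 3]
```

Extending B's range back to its inferred origin raises the count at 200 Ma from two lineages to three. This is the kind of gap between the fossil-only and phylogeny-corrected curves that the abstract describes.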
24.  A genus-level supertree of the Dinosauria. 
One of the ultimate aims of systematics is the reconstruction of the tree of life. This is a huge undertaking that is inhibited by the existence of a computational limit to the inclusiveness of phylogenetic analyses. Supertree methods have been developed to overcome, or at least to circumvent, this problem by combining smaller, partially overlapping cladograms. Here, we present a very inclusive generic-level supertree of Dinosauria (covering a total of 277 genera), which is remarkably well resolved and provides some clarity in many contentious areas of dinosaur systematics.
PMCID: PMC1690971  PMID: 12028774
25.  Characterizing compatibility and agreement of unrooted trees via cuts in graphs 
Deciding whether there is a single tree (a supertree) that summarizes the evolutionary information in a collection of unrooted trees is a fundamental problem in phylogenetics. We consider two versions of this question: agreement and compatibility. In the first, the supertree is required to reflect precisely the relationships among the species exhibited by the input trees. In the second, the supertree can be more refined than the input trees.
Testing for compatibility is an NP-complete problem; however, the problem is solvable in polynomial time when the number of input trees is fixed. Testing for agreement is also NP-complete, but it is not known whether it is fixed-parameter tractable. Compatibility can be characterized in terms of the existence of a specific kind of triangulation in a structure known as the display graph. Alternatively, it can be characterized as a chordal graph sandwich problem in a structure known as the edge label intersection graph. No characterization of agreement was known.
We present a simple and natural characterization of compatibility in terms of minimal cuts in the display graph, which is closely related to compatibility of splits. We then derive a characterization for agreement.
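The characterization above builds on compatibility of splits. In the simplest case, where every split is defined on the same full taxon set, compatibility has a direct test, sketched below in Python. Each split is given by one of its sides, and the function names are hypothetical. A collection of pairwise-compatible splits on a common taxon set can always be displayed by a single unrooted tree. This shortcut does not extend to input trees with only partially overlapping leaf sets, where the problem is NP-complete as noted above.

```python
from itertools import combinations
from typing import Iterable


def splits_compatible(side1: Iterable[str], side2: Iterable[str],
                      taxa: Iterable[str]) -> bool:
    """Splits A|B and C|D of the same taxon set are compatible iff
    at least one of A & C, A & D, B & C or B & D is empty."""
    everything = frozenset(taxa)
    a = frozenset(side1)
    b = everything - a
    c = frozenset(side2)
    d = everything - c
    return any(not (x & y) for x in (a, b) for y in (c, d))


def all_pairwise_compatible(sides: Iterable[Iterable[str]],
                            taxa: Iterable[str]) -> bool:
    """True if every pair of splits (given by one side each) is compatible."""
    taxa = frozenset(taxa)
    return all(splits_compatible(s, t, taxa) for s, t in combinations(sides, 2))


taxa = "ABCDE"
print(splits_compatible({"A", "B"}, {"A", "B", "C"}, taxa))  # True
print(splits_compatible({"A", "B"}, {"A", "C"}, taxa))       # False
```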
Explicit characterizations of tree compatibility and agreement are essential to finding practical algorithms for these problems. The simplicity of the characterizations presented here could help to achieve this goal.
PMCID: PMC4013835  PMID: 24742332
Phylogenies; Supertrees; Compatibility; Agreement; Cuts in graphs; Chordal graphs
