We analyzed six nuclear genes (AATS, CAD, TPI, SNF, PGD, and RNA POL II), comprising 5736 base pairs (bp), to infer the phylogeny of 29 species representing all 11 holometabolous orders and two hemimetabolous insect outgroups (see Table ). Maximum-likelihood (ML) and Bayesian inference (BI) analyses yielded congruent trees with high posterior probabilities and mixed bootstrap values (Figures and ). All orders were found to be monophyletic, including Mecoptera with Siphonaptera as its sister group. Hymenoptera are the basal-most branching lineage, concordant with the phylogenomic findings of Savard et al. The enigmatic Strepsiptera are unequivocally placed as the sister group to Coleoptera, providing additional evidence for the traditional morphological placement of the twisted-wing parasites. In accordance with previous morphological and molecular hypotheses, our study finds Holometabola to be divided into two major lineages, Neuropteroidea and Mecopterida. Within these two lineages, the traditional respective supra-ordinal groupings are recovered: Neuropteroidea includes Coleoptera, Strepsiptera, and Neuropterida (Neuroptera, Megaloptera, and Raphidioptera), and Mecopterida includes Amphiesmenoptera (Lepidoptera and Trichoptera) + Antliophora (Diptera, Mecoptera, and Siphonaptera).
Genes sampled for Holometabola and out-groups.
Figure 1. The phylogeny of holometabolous insects with divergence time estimates. Posterior probabilities/maximum-likelihood bootstrap values are shown at each node. Error bars reflect the 95% confidence interval surrounding each date of divergence. NEU = Neuropterida; ...
Figure 2. The congruent maximum-likelihood and Bayesian topology. Maximum-likelihood branch lengths are shown; posterior probabilities are shown above and maximum-likelihood bootstrap values below. Although one strepsipteran in the family Halictophagidae has an exceptionally ...
It appears that the use of nuclear protein-coding genes, six in our study and 185 in Savard et al., has brought decisive and robust results to the previously obscured phylogenetic placement of Hymenoptera. Most previous morphological hypotheses favored a sister group relationship between Hymenoptera and Mecopterida, although strong supporting evidence was lacking [7]. Mitochondrial genomes also favor the Hymenoptera + Mecopterida relationship, although not definitively, as the authors suggest that another 'plausible alternative placement is at the base of Holometabola' [13]. 18S rDNA paradoxically supports both previously mentioned and novel Hymenoptera hypotheses depending on alignment strategy and taxon sampling [5]. Our results constitute the tipping point of the compounding evidence (extensive sample of nuclear genes, fossil evidence, wing characters, and introns of elongation factor 1-alpha) that Hymenoptera are the earliest branching lineage of the holometabolan radiation [14].
Currently, the hypothesis that fleas are actually members of the scorpionfly order Mecoptera has gained wide acceptance [5]. Analyses based on morphology, ribosomal and mitochondrial DNA have strongly supported the collapse of the Siphonaptera and their inclusion within the Mecoptera as the sister group to the wingless family of snow scorpionflies, Boreidae [5]. Our data provide no indication of a close relationship between fleas and boreids. We found the traditional grouping of Mecoptera, with the exclusion of the fleas, to be highly supported in our analyses. No variation of taxon sampling, character inclusion, or methodology resulted in the placement of the fleas within Mecoptera. Our results suggest that the morphological characters grouping the fleas and the boreids, such as wing reduction and characters of oogenesis, should be further investigated [7].
The controversial placement of Strepsiptera has been the subject of much debate, particularly in regard to whether strepsipterans are affected by a methodological artifact known as long-branch attraction (LBA). LBA is an analytical phenomenon in phylogenetic studies in which rapidly evolving sequences cluster counter to their true evolutionary history due to non-inherited similarity of rapidly accumulating mutations in independent lineages. Theoretical demonstrations of LBA identify it as a particularly difficult problem for parsimony analyses [42], in which the interpretation of shared derived features is maximized as the basis for explaining common ancestry [44]. Model-based approaches, such as ML and BI methods, correct for the increased chance of spurious grouping in these lineages by incorporating into the analysis information about the probability of specific changes along a branch of the tree. However, molecular models are still widely considered to be under-developed, and model-based methods can still be subject to long-branch grouping errors due to the unpredictability of evolutionary rates [46].
Halteria, as supported by 18S rDNA, is often cited as the first empirical evidence for LBA and initiated the development and use of parametric simulation as a statistical test for detecting LBA [28]. Both flies and strepsipterans have exhibited 'long' branches in previous 18S analyses. Similarly, in our current study one strepsipteran has a uniquely long branch, and the taxon with the next longest branch is the coleopteran Tribolium. To address the possibility that in our analyses the Strepsiptera + Coleoptera relationship is a spurious artifact due to LBA, we thoroughly examined our data and modified our analyses to detect and potentially rectify effects of LBA.
Although LBA is a well-documented phenomenon, its precise detection is a challenge [28]. Currently, the retrieval of conflicting results from maximum parsimony (MP) and ML, parametric simulation, and the visualization of conflict in a dataset can all provide suggestive evidence that LBA may be affecting an analysis [48]. Our parsimony trees agree with the topology generated by both ML and BI, a finding not suggestive of LBA.
Parametric simulation, a method developed by Huelsenbeck [29] to test the rDNA-based Halteria findings for LBA, can provide statistical support that branches are long enough to attract. In a procedure similar to a parametric bootstrap, simulated datasets are generated according to a tree in which taxa with elevated rates of evolution are separated in the topology; in this case, the strepsipterans are separated from the coleopterans and constrained to the base of Holometabola. The simulated datasets are then analyzed to determine whether the putative long-branched taxa will cluster counter to their placement in the tree on which the data were simulated. If Strepsiptera and Coleoptera consistently formed a clade in analyses of the simulated datasets, we would conclude that grouping to be the result of LBA. None of our 100 ML analyses of the simulated data resulted in the attraction of long-branched strepsipterans and coleopterans to each other. This finding indicates that in our dataset, in contrast to the original rDNA data, there is no statistical evidence that the rates of evolution in the strepsipteran and coleopteran branches are sufficiently elevated for the branches to attract each other, counter to their known (simulated) evolutionary placement.
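The tallying step of the parametric simulation can be sketched as follows. This is a minimal illustration, not the procedure or software used in this study: it assumes the replicate trees have already been estimated, rooted on the outgroup, and converted to nested tuples of taxon names. The function names and the toy taxon labels are hypothetical.

```python
from typing import Iterable, Union

# Assumption: a tree is a leaf name (str) or a tuple of subtrees,
# rooted on the outgroup so that "clade" has a defined meaning.
Tree = Union[str, tuple]


def leaf_set(tree: Tree) -> frozenset:
    """Return the set of leaf names below a node."""
    if isinstance(tree, str):
        return frozenset([tree])
    return frozenset().union(*(leaf_set(child) for child in tree))


def has_clade(tree: Tree, taxa: frozenset) -> bool:
    """True if some node's leaf set is exactly `taxa`."""
    if leaf_set(tree) == taxa:
        return True
    if isinstance(tree, str):
        return False
    return any(has_clade(child, taxa) for child in tree)


def attraction_frequency(trees: Iterable[Tree], taxa: Iterable[str]) -> float:
    """Fraction of replicate trees in which `taxa` form a clade."""
    target = frozenset(taxa)
    replicates = list(trees)
    hits = sum(has_clade(t, target) for t in replicates)
    return hits / len(replicates)


# Toy replicates: the model tree keeps the long branches apart,
# the "attracted" tree groups them against the simulated history.
model = ("Out", ("Strep", ("Hym", ("Coleo", "Dipt"))))
attracted = ("Out", ("Hym", (("Strep", "Coleo"), "Dipt")))
frequency = attraction_frequency([model] * 3 + [attracted], ["Strep", "Coleo"])
```

A frequency near zero across replicates, as in the 100 analyses reported above, would give no indication that the two long branches attract each other under the simulated conditions.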
In contrast to other methods that are implemented post-analysis, visualizing conflict in a dataset can be used to identify the potential for LBA prior to analysis [50]. A dataset likely to be affected by LBA should exhibit conflicting signal supporting both the artifactual relationship and the actual evolutionary relationship. We utilized two visualization methods, likelihood mapping and neighbor-nets, and our results were not definitive. Likelihood mapping, a quartet puzzling method, showed little conflict (only 0.4% of quartets were unresolved, whereas 10% to 15% is considered high) (Figure ). However, our neighbor-net analysis, a network showing all compatible and incompatible splits, did show conflicting signal throughout our dataset (Figure ). The conflicting splits exist across many regions of the tree, not just regarding Strepsiptera, indicating that there is no reason to suspect LBA with regard to Strepsiptera more than other clades. Yet when a network including Strepsiptera is directly compared with a network with Strepsiptera excluded, it is evident that the conflict in this dataset is substantially alleviated by the absence of the strepsipterans, particularly with respect to the reticulation at the base of Diptera. This is not a clear sign of LBA, but it does suggest that there is conflicting support for the placement of Strepsiptera and their relationship to Diptera.
Figure 3. Conflict visualization using likelihood mapping in Tree Puzzle. (a) The tips of the triangle are considered 'basins of attraction' that contain the likelihoods of the percentage of quartets that are fully resolved. The center of the triangle represents ...
Figure 4. Neighbor-nets showing conflicting splits when all taxa are included compared with when Strepsiptera are excluded. The decreased level of conflict in the dataset exhibited when the fast-evolving Strepsiptera are excluded may be considered indicative of ...
To explore further the potential for LBA identified by the neighbor-net, we utilized a four-cluster likelihood mapping analysis to again visualize the degree of conflicting signal regarding the placement of Strepsiptera. We divided the taxa into four clusters: (1) Neuropteroidea (which includes Coleoptera); (2) Mecopterida (which includes Diptera); (3) Hymenoptera; (4) Strepsiptera. The possible relationships between these four clades generate three possible topologies, each represented by a tip of the triangle. This quartet puzzling method plots each quartet according to the probabilities of the three topologies, so that a quartet falls closest to the tip representing the topology that it favors. Each region of the triangle or 'basin of attraction' contains a percentage of quartets that support a particular topology. This analysis again reveals the conflicting signal in our dataset and shows that we have signal supporting all three hypotheses regarding the placement of Strepsiptera, with slightly more support in this analysis for a close relationship of Strepsiptera and Mecopterida (including the flies) (Figure ).
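The assignment of quartets to basins of attraction can be sketched as below. This is an illustrative reconstruction, not Tree Puzzle's implementation: it assumes that each quartet comes with the log-likelihoods of its three possible topologies, that these are converted to normalized weights, and that the triangle is partitioned into seven regions (three tips, three edges, one center) by nearest attractor point. The labels T1, T2 and T3 are placeholders for the three topologies.

```python
import math
from collections import Counter

# Assumed attractor points in barycentric coordinates (p1, p2, p3):
# tips = one topology favored, edges = two topologies tied, center = unresolved.
ATTRACTORS = {
    "T1": (1.0, 0.0, 0.0),
    "T2": (0.0, 1.0, 0.0),
    "T3": (0.0, 0.0, 1.0),
    "T1|T2": (0.5, 0.5, 0.0),
    "T1|T3": (0.5, 0.0, 0.5),
    "T2|T3": (0.0, 0.5, 0.5),
    "unresolved": (1 / 3, 1 / 3, 1 / 3),
}


def topology_weights(log_likelihoods):
    """Normalize three log-likelihoods into weights that sum to 1."""
    best = max(log_likelihoods)
    raw = [math.exp(value - best) for value in log_likelihoods]  # avoids underflow
    total = sum(raw)
    return tuple(value / total for value in raw)


def basin(weights):
    """Name of the attractor point closest to a quartet's weights."""
    def squared_distance(name):
        return sum((w - a) ** 2 for w, a in zip(weights, ATTRACTORS[name]))
    return min(ATTRACTORS, key=squared_distance)


def basin_percentages(quartet_log_likelihoods):
    """Percentage of quartets falling into each of the seven regions."""
    quartets = list(quartet_log_likelihoods)
    counts = Counter(basin(topology_weights(q)) for q in quartets)
    return {name: 100 * counts[name] / len(quartets) for name in ATTRACTORS}


# Made-up log-likelihoods for four quartets: three favor T1, one is unresolved.
example = [(-100.0, -110.0, -110.0)] * 3 + [(-100.0, -100.0, -100.0)]
percentages = basin_percentages(example)
```

Under this scheme, a low percentage in the central region corresponds to few unresolved quartets, while substantial percentages at more than one tip correspond to conflicting signal of the kind described above.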
Though our concatenated dataset clearly results in the placement of Strepsiptera with Coleoptera in MP, ML, and BI, there is evidence that some signal supports a closer relationship between Strepsiptera and Diptera. To determine the source of this conflicting signal, we examined ML analyses of the six individual gene trees. Data contributing phylogenetic information for the placement of Strepsiptera are available for five out of six genes, and three out of those five genes place Strepsiptera within the close vicinity of Coleoptera or Neuropterida. The gene tree for CAD, however, recovers Halteria, with Strepsiptera as the sister group to Diptera. At 2000 bp, CAD is the longest gene in the dataset and in recent years has become a staple for resolving Mesozoic-age divergences among flies. The topology of the CAD ML tree reveals that Diptera and Strepsiptera have the longest branches in the tree, similar to the initial 18S findings, suggesting the possibility that LBA may play a role in the CAD recovery of Halteria. It has been hypothesized that Diptera have experienced accelerated evolution in comparison to other insects [52], and by observing their long branches in various datasets we can surmise that Strepsiptera may have as well. Rapid evolution in specific loci, such as 18S and CAD, could lead to LBA and the erroneous grouping of Diptera and Strepsiptera. The reliance on a single locus for phylogenetic resolution, though useful in some circumstances, can clearly result in inaccurate conclusions. No single gene in our dataset recovers our well-supported phylogeny that is congruent with morphological hypotheses. Our phylogeny relies on the concatenation of all six genes to overcome the misleading signal in CAD placing Strepsiptera as the sister group to Diptera.
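Concatenation of per-gene alignments into a single matrix can be sketched as follows. This is a minimal illustration under stated assumptions, not the pipeline used in this study: each gene is assumed to be an already aligned mapping from taxon name to sequence, and a taxon lacking a gene is padded with a missing-data symbol. The function name, the "?" symbol, and the toy sequences are hypothetical.

```python
def concatenate_alignments(gene_alignments, gene_order, missing="?"):
    """Join per-gene alignments into one sequence per taxon.

    gene_alignments: {gene name: {taxon name: aligned sequence}}
    gene_order: order in which genes are joined
    Taxa absent from a gene receive a run of `missing` of that gene's length.
    """
    taxa = sorted({taxon for aln in gene_alignments.values() for taxon in aln})
    parts = {taxon: [] for taxon in taxa}
    for gene in gene_order:
        alignment = gene_alignments[gene]
        lengths = {len(seq) for seq in alignment.values()}
        if len(lengths) != 1:
            raise ValueError(f"sequences for {gene} are not aligned to one length")
        (length,) = lengths
        for taxon in taxa:
            parts[taxon].append(alignment.get(taxon, missing * length))
    return {taxon: "".join(chunks) for taxon, chunks in parts.items()}


# Toy data: taxon "B" has no sequence for the second gene.
toy = {
    "gene1": {"A": "ATGC", "B": "ATGA"},
    "gene2": {"A": "GGT"},
}
matrix = concatenate_alignments(toy, ["gene1", "gene2"])
```

Every taxon ends up with a sequence of the same total length, so a gene with no data for a given taxon contributes no signal for that taxon while the remaining genes still do.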
Our findings are robust over multiple phylogenetic methods intended to counter LBA, including the removal of third positions, RY coding of first and third positions, the removal of out-groups and long branches, and the use of a conservative alignment (with fast-evolving positions removed by the program Gblocks [49]) (Table ). Because our many attempts to identify or ameliorate LBA did not result in a positive detection of LBA or a change in our results, we concluded that the Strepsiptera + Coleoptera relationship is not a clear case of systematic error due to LBA. Our study is the first to rely on multiple genes to re-address the placement of Strepsiptera, and our robust findings should reignite the debate regarding the morphologically dissimilar orders Strepsiptera and Diptera as sister groups. In light of our findings, upcoming work involving much larger genomic datasets (S. Longhorn, pers. comm.), and the re-examination of existing morphological characters shared by strepsipterans and beetles [7], we anticipate that the phylogenetic placement of Strepsiptera will cease to be considered the most controversial issue in holometabolan phylogenetics.
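Two of the character treatments listed above, removal of third codon positions and RY coding of first and third positions, can be sketched as simple sequence transformations. This is an illustration only, assuming an in-frame coding alignment whose first column is a first codon position; the function names are hypothetical and do not correspond to any particular program.

```python
# RY coding collapses purines (A, G) to R and pyrimidines (C, T) to Y,
# so that only transversions remain visible at the recoded positions.
RY_TABLE = str.maketrans({"A": "R", "G": "R", "C": "Y", "T": "Y"})


def drop_third_positions(sequence: str) -> str:
    """Remove every third codon position (assumes the sequence is in frame)."""
    return "".join(base for i, base in enumerate(sequence) if i % 3 != 2)


def ry_code(sequence: str, codon_positions=(0, 2)) -> str:
    """RY-code the chosen codon positions (0-based; default first and third).

    Gaps and ambiguity symbols are left unchanged.
    """
    return "".join(
        base.translate(RY_TABLE) if i % 3 in codon_positions else base
        for i, base in enumerate(sequence.upper())
    )


recoded = ry_code("ATGCCA")                 # first and third positions recoded
trimmed = drop_third_positions("ATGCCA")    # first and second positions kept
```

Both treatments discard the fastest-changing signal (third positions, or transitions at first and third positions), which is why they are used as checks on whether a grouping depends on saturated sites.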
Clade recovery results from maximum-likelihood analyses with varied taxon and character inclusion used to counter long-branch attraction.
We used the concatenated nucleotide sequence data for all six genes, the BI phylogenetic tree topology of Figure , and several fossil-based minimum age constraints to estimate divergence times for major holometabolous lineages using the relaxed clock BI method implemented in the programs Estbranches and Multidivtime [53]. Congruent with the findings of Gaunt and Miles [35], multiple nuclear genes place the origin of Holometabola around 355 Ma, within the Carboniferous, but substantially earlier than traditional estimates and older than any clearly assignable holometabolan fossil (Figure ).
As the earliest branching lineage in the phylogeny, the Hymenoptera originate just after the mean estimated age for Holometabola. This date is considerably older than existing fossil estimates (an increasingly common feature of most molecule-based divergence time estimates), a pattern suggesting an incomplete fossil record, biases in parameter choice, model mis-specification, or some combination of these [54]. The split between the two major sub-clades Neuropteroidea and Mecopterida took place just within the Permian (300 Ma), with the Amphiesmenoptera/Antliophora diverging at 284 Ma. The origins of the extant holometabolous orders (excluding the Hymenoptera) appear to have occurred in relatively rapid succession, with dates of origin falling in the range of approximately 274 Ma to 213 Ma; the earliest divergence was that of the Coleoptera/Strepsiptera (274 Ma), while the most recent was that of the Raphidioptera and Megaloptera, which split at 213 Ma. According to current evidence, Diptera and Mecoptera + Siphonaptera last shared a common ancestor approximately 256 Ma. Though some of our findings for the mean age of origin do not precisely correspond with traditional ages based on fossils, most of the published fossil-based values do fall within the 95% posterior probability interval for our molecule-based estimates. Additionally, the insect fossil record is currently expanding dramatically [1], and thus better fossil calibrations, coupled with larger samples of genes and taxa as well as improved analytical methods, should continue to sharpen divergence time estimates for the major holometabolous clades.
Molecular divergence time estimates and fossils agree that Holometabola had its origins within the Paleozoic, most likely in the late Carboniferous. The subsequent origins of the extant orders (excluding Hymenoptera) took place primarily within the Triassic, with primary splits occurring at the end of the Permian, and with the crown group diversification of many orders beginning in the early Jurassic. Most explanations for the enormous species diversity of holometabolous insect clades that have dominated the earth's terrestrial ecosystems since the Jurassic feature 'key innovations', such as adaptations associated with feeding on vascular plants, separation of adult, larval, and pupal stages, or morphological developments like the 'wasp waist', beetle elytron, and fly puparium [8]. Ultimately, these disparate adaptations seem to have had similar macro-evolutionary effects repeated widely across holometabolous groups; they allowed specific clades to exploit the resources provided by an increasingly complex environment and rapidly speciate in an expanding arena of biological interactions, undoubtedly propelled in many cases by flowering plant diversification [57]. The testing of key innovation hypotheses to explain the prodigious diversity of holometabolous insects remains a major task of insect phylogenetic research [33], but extreme diversity has made it difficult to resolve phylogenetic relationships among the major lineages. Consequently, conflicting lines of evidence will continue to make holometabolan phylogeny one of the most important and revisited questions in insect phylogenetics.