The 1,000 plants (1KP) project is an international multi-disciplinary consortium that has generated transcriptome data from over 1,000 plant species, with exemplars for all of the major lineages across the Viridiplantae (green plants) clade. Here, we describe how to access the data used in a phylogenomics analysis of the first 85 species, and how to visualize our gene and species trees. Users can develop computational pipelines to analyse these data, in conjunction with data of their own that they can upload. Computationally estimated protein-protein interactions and biochemical pathways can be visualized at another site. Finally, we comment on our future plans and how they fit within this scalable system for the dissemination, visualization, and analysis of large multi-species data sets.
Viridiplantae; Biodiversity; Transcriptomes; Phylogenomics; Interactions; Pathways
Xenoturbellida and Acoelomorpha are marine worms with contentious ancestry. Both were originally associated with the flatworms (Platyhelminthes), but molecular data have revised their phylogenetic positions, generally linking Xenoturbellida to the deuterostomes1,2 and positioning the Acoelomorpha as the most basally branching bilaterian group(s)3–6. Recent phylogenomic data suggested that Xenoturbellida and Acoelomorpha are sister taxa and together constitute an early branch of Bilateria7. Here we assemble three independent data sets: mitochondrial genes, a phylogenomic data set of 38,330 amino-acid positions and new microRNA (miRNA) complements. With these we show that the position of Acoelomorpha is strongly affected by a long-branch attraction (LBA) artefact. When we minimize LBA we find consistent support for a position of both acoelomorphs and Xenoturbella within the deuterostomes. The most likely phylogeny links Xenoturbella and Acoelomorpha in a clade we call Xenacoelomorpha. The Xenacoelomorpha is the sister group of the Ambulacraria (hemichordates and echinoderms). We show that analyses of miRNA complements8 have been affected by character loss in the acoels and that both groups possess one miRNA and the gene Rsb66 otherwise specific to deuterostomes. In addition, Xenoturbella shares one miRNA with the ambulacrarians, and two with the acoels. This phylogeny makes sense of the shared characteristics of Xenoturbellida and Acoelomorpha, such as ciliary ultrastructure and diffuse nervous system, and implies the loss of various deuterostome characters in the Xenacoelomorpha including coelomic cavities, through gut and gill slits.
It was a zoological sensation when a living specimen of the coelacanth was first discovered in 1938, as this lineage of lobe-finned fish was thought to have gone extinct 70 million years ago. The modern coelacanth looks remarkably similar to many of its ancient relatives, and its evolutionary proximity to our own fish ancestors provides a glimpse of the fish that first walked on land. Here we report the genome sequence of the African coelacanth, Latimeria chalumnae. Through a phylogenomic analysis, we conclude that the lungfish, and not the coelacanth, is the closest living relative of tetrapods. Coelacanth protein-coding genes are significantly more slowly evolving than those of tetrapods, unlike other genomic features. Analyses of changes in genes and regulatory elements during the vertebrate adaptation to land highlight genes involved in immunity, nitrogen excretion and the development of fins, tail, ear, eye, brain, and olfaction. Functional assays of enhancers involved in the fin-to-limb transition and in the emergence of extra-embryonic tissues demonstrate the importance of the coelacanth genome as a blueprint for understanding tetrapod evolution.
Genomes of animals as different as sponges and humans show conservation of global architecture. Here we show that multiple genomic features including transposon diversity, developmental gene repertoire, physical gene order, and intron-exon organization are shattered in the tunicate Oikopleura, belonging to the sister group of vertebrates and retaining chordate morphology. Ancestral architecture of animal genomes can be deeply modified and may therefore be largely nonadaptive. This rapidly evolving animal lineage thus offers unique perspectives on the level of genome plasticity. It also illuminates issues as fundamental as the mechanisms of intron gain.
The intention of this editorial is to steer researchers through methodological choices in molecular evolution, drawing on the combined expertise of the authors. Our aim is not to review the most advanced methods for a specific task. Rather, we define several general guidelines to help with methodology choices at different stages of a typical phylogenetic ‘pipeline’. We are not able to provide exhaustive citation of a literature that is vast and plentiful, but we point the reader to a set of classical textbooks that reflect the state-of-the-art. We do not wish to appear overly critical of outdated methodology but rather provide some practical guidance on the sort of issues which should be considered. We stress that a reported study should be well-motivated and evaluate a specific hypothesis or scientific question. However, a publishable study should not be merely a compilation of available sequences for a protein family of interest followed by some standard analyses, unless it specifically addresses a scientific hypothesis or question. The rapid pace at which sequence data accumulate quickly outdates such publications. Clearly, though, discoveries stemming from data mining, reports of new tools and databases, and review papers are also desirable.
Cnidaria (corals, sea anemones, hydroids, jellyfish) is a phylum of relatively simple aquatic animals characterized by the presence of the cnidocyst: a cell containing a giant capsular organelle with an eversible tubule (cnida). Species within Cnidaria have life cycles that involve one or both of the two distinct body forms, a typically benthic polyp, which may or may not be colonial, and a typically pelagic mostly solitary medusa. The currently accepted taxonomic scheme subdivides Cnidaria into two main assemblages: Anthozoa (Hexacorallia + Octocorallia) – cnidarians with a reproductive polyp and the absence of a medusa stage – and Medusozoa (Cubozoa, Hydrozoa, Scyphozoa, Staurozoa) – cnidarians that usually possess a reproductive medusa stage. Hypothesized relationships among these taxa greatly impact interpretations of cnidarian character evolution.
We expanded the sampling of cnidarian mitochondrial genomes, particularly from Medusozoa, to reevaluate phylogenetic relationships within Cnidaria. Our phylogenetic analyses based on a mitogenomic dataset support many prior hypotheses, including monophyly of Hexacorallia, Octocorallia, Medusozoa, Cubozoa, Staurozoa, Hydrozoa, Carybdeida, Chirodropida, and Hydroidolina, but reject the monophyly of Anthozoa, indicating that the Octocorallia + Medusozoa relationship is not the result of sampling bias, as proposed earlier. Further, our analyses contradict Scyphozoa [Discomedusae + Coronatae], Acraspeda [Cubozoa + Scyphozoa], as well as the hypothesis that Staurozoa is the sister group to all the other medusozoans.
Cnidarian mitochondrial genomic data contain phylogenetic signal informative for understanding the evolutionary history of this phylum. Mitogenome-based phylogenies, which reject the monophyly of Anthozoa, provide further evidence for the polyp-first hypothesis. By rejecting the traditional Acraspeda and Scyphozoa hypotheses, these analyses suggest that the shared morphological characters in these groups are plesiomorphies that originated in the branch leading to Medusozoa. The expansion of mitogenomic data, along with improvements in phylogenetic inference methods and the use of additional nuclear markers, will further enhance our understanding of the phylogenetic relationships and character evolution within Cnidaria.
Cnidaria; Medusozoa; Acraspeda; Anthozoa; mito-phylogenomics
While a unique origin of the euarthropods is well established, relationships between the four euarthropod classes—chelicerates, myriapods, crustaceans and hexapods—are less clear. Unsolved questions include the position of myriapods, the monophyletic origin of chelicerates, and the validity of the close relationship of euarthropods to tardigrades and onychophorans. Morphology predicts that myriapods, insects and crustaceans form a monophyletic group, the Mandibulata, which has been contradicted by many molecular studies that support an alternative Myriochelata hypothesis (Myriapoda plus Chelicerata). Because of the conflicting insights from published molecular datasets, evidence from nuclear-coding genes needs corroboration from independent data to define the relationships among major nodes in the euarthropod tree. Here, we address this issue by analysing two independent molecular datasets: a phylogenomic dataset of 198 protein-coding genes including new sequences for myriapods, and novel microRNA complements sampled from all major arthropod lineages. Our phylogenomic analyses strongly support Mandibulata, and show that Myriochelata is a tree-reconstruction artefact caused by saturation and long-branch attraction. The analysis of the microRNA dataset corroborates the Mandibulata, showing that the microRNAs miR-965 and miR-282 are present and expressed in all mandibulate species sampled, but not in the chelicerates. Mandibulata is further supported by the phylogenetic analysis of a comprehensive morphological dataset covering living and fossil arthropods, and including recently proposed, putative apomorphies of Myriochelata. 
Our phylogenomic analyses also provide strong support for the inclusion of pycnogonids in a monophyletic Chelicerata, a paraphyletic Cycloneuralia, and a common origin of Arthropoda (tardigrades, onychophorans and arthropods), suggesting that previous phylogenies grouping tardigrades and nematodes may also have been subject to tree-reconstruction artefacts.
arthropod; phylogeny; Mandibulata; microRNA
Contradicting the prejudice that endosymbiosis is a rare phenomenon, Husník and co-workers show in BMC Biology that bacterial endosymbiosis has occurred several times independently during insect evolution. Rigorous phylogenetic analyses, in particular using complex models of sequence evolution and an original site removal procedure, allow this conclusion to be established after eschewing inference artefacts that usually plague the positioning of highly divergent endosymbiont genomic sequences.
See research article http://www.biomedcentral.com/1741-7007/9/87
In the eight years since phylogenomics was introduced as the intersection of genomics and phylogenetics, the field has provided fundamental insights into gene function, genome history and organismal relationships. The utility of phylogenomics is growing with the increase in the number and diversity of taxa for which whole genome and large transcriptome sequence sets are being generated. We assert that the synergy between genomic and phylogenetic perspectives in comparative biology would be enhanced by the development and refinement of minimal reporting standards for phylogenetic analyses. Encouraged by the development of the Minimum Information About a Microarray Experiment (MIAME) standard, we propose a similar roadmap for the development of a Minimal Information About a Phylogenetic Analysis (MIAPA) standard. Key in the successful development and implementation of such a standard will be broad participation by developers of phylogenetic analysis software, phylogenetic database developers, practitioners of phylogenomics, and journal editors.
The terrestrial habitat was colonized by the ancestors of modern land plants about 500 to 470 million years ago. Today it is widely accepted that land plants (embryophytes) evolved from streptophyte algae, also referred to as charophycean algae. The streptophyte algae are a paraphyletic group of green algae, ranging from unicellular flagellates to morphologically complex forms such as the stoneworts (Charales). For a better understanding of the evolution of land plants, it is of prime importance to identify the streptophyte algae that are the sister-group to the embryophytes. The Charales, the Coleochaetales or more recently the Zygnematales have been considered to be the sister group of the embryophytes. However, despite many years of phylogenetic studies, this question has not been resolved and remains controversial.
Here, we use a large data set of nuclear-encoded genes (129 proteins) from 40 green plant taxa (Viridiplantae) including 21 embryophytes and six streptophyte algae, representing all major streptophyte algal lineages, to investigate the phylogenetic relationships of streptophyte algae and embryophytes. Our phylogenetic analyses indicate that either the Zygnematales or a clade consisting of the Zygnematales and the Coleochaetales are the sister group to embryophytes.
Our analyses support the notion that the Charales are not the closest living relatives of embryophytes. Instead, the Zygnematales or a clade consisting of Zygnematales and Coleochaetales are most likely the sister group of embryophytes. Although this result is in agreement with a previously published phylogenetic study of chloroplast genomes, additional data are needed to confirm this conclusion. A Zygnematales/embryophyte sister group relationship has important implications for early land plant evolution. If substantiated, it should allow us to address important questions regarding the primary adaptations of viridiplants during the conquest of land. Clearly, the biology of the Zygnematales will receive renewed interest in the future.
Model violations constitute the major limitation in inferring accurate phylogenies. Characterizing properties of the data that are not being correctly handled by current models is therefore of prime importance. One of the properties of protein evolution is the variation of the relative rate of substitutions across sites and over time; the latter phenomenon is called heterotachy. Its effect on phylogenetic inference has recently received considerable attention, which has led to the development of new models of sequence evolution. However, the focus thus far has been on the quantitative heterogeneity of the evolutionary process, thereby overlooking more qualitative variations.
We studied the importance of variation of the site-specific amino-acid substitution process over time and its possible impact on phylogenetic inference. We used the CAT model to define an infinite mixture of substitution processes characterized by equilibrium frequencies over the twenty amino acids, a useful proxy for qualitatively estimating the evolutionary process. Using two large datasets, we show that qualitative changes in site-specific substitution properties over time occurred significantly. To test whether this unaccounted qualitative variation can lead to an erroneous phylogenetic tree, we analyzed a concatenation of mitochondrial proteins in which Cnidaria and Porifera were erroneously grouped. The progressive removal of the sites with the most heterogeneous CAT profiles across clades led to the recovery of the monophyly of Eumetazoa (Cnidaria+Bilateria), suggesting that this heterogeneity can negatively influence phylogenetic inference.
The time-heterogeneity of the amino-acid replacement process is therefore an important evolutionary aspect that should be incorporated in future models of sequence change.
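The progressive site-removal step described above can be illustrated with a minimal Python sketch. It assumes that a per-site heterogeneity score has already been computed by some other means; the abstract does not specify how the heterogeneity of CAT profiles across clades is quantified, so `site_scores`, `remove_top_sites` and the toy alignment below are hypothetical names and placeholder data, not the authors' implementation.

```python
def remove_top_sites(alignment: dict[str, str],
                     site_scores: list[float],
                     fraction: float) -> dict[str, str]:
    """Drop the given fraction of alignment columns with the highest scores.

    alignment   : taxon name -> aligned sequence (all of equal length)
    site_scores : one heterogeneity score per column (assumed precomputed)
    fraction    : share of columns to remove, between 0 and 1
    """
    n_sites = len(site_scores)
    if any(len(seq) != n_sites for seq in alignment.values()):
        raise ValueError("every sequence must have one score per column")
    n_remove = int(n_sites * fraction)
    # Rank column indices from most to least heterogeneous.
    ranked = sorted(range(n_sites), key=lambda i: site_scores[i], reverse=True)
    dropped = set(ranked[:n_remove])
    return {
        taxon: "".join(c for i, c in enumerate(seq) if i not in dropped)
        for taxon, seq in alignment.items()
    }


# Placeholder data: four columns, the two highest-scoring ones are removed.
toy_alignment = {"taxon_a": "ABCD", "taxon_b": "EFGH"}
toy_scores = [0.1, 0.9, 0.2, 0.8]
filtered = remove_top_sites(toy_alignment, toy_scores, fraction=0.5)
```

In a progressive analysis, one would call such a function with increasing values of `fraction` and re-infer the tree from each reduced alignment.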
Resolving the evolutionary relationships among Fungi remains challenging because of their highly variable evolutionary rates, and lack of a close phylogenetic outgroup. Nucleariida, an enigmatic group of amoeboids, have been proposed to emerge close to the fungal-metazoan divergence and might fulfill this role. Yet, published phylogenies with up to five genes are without compelling statistical support, and genome-level data should be used to resolve this question with confidence.
Our analyses with nuclear (118 proteins) and mitochondrial (13 proteins) data now robustly associate Nucleariida and Fungi as neighbors, an assemblage that we term 'Holomycota'. With Nucleariida as an outgroup, we revisit unresolved deep fungal relationships.
Our phylogenomic analysis provides significant support for the paraphyly of the traditional taxon Zygomycota, and contradicts a recent proposal to include Mortierella in a phylum Mucoromycotina. We further question the introduction of separate phyla for Glomeromycota and Blastocladiomycota, whose phylogenetic positions relative to other phyla remain unresolved even with genome-level datasets. Our results motivate broad sampling of additional genome sequences from these phyla.
Inferring the relationships among Bilateria has been an active and controversial research area since Haeckel. The lack of a sufficient number of phylogenetically reliable characters was the main limitation of traditional phylogenies based on morphology. With the advent of molecular data, this problem has been replaced by another one, statistical inconsistency, which stems from an erroneous interpretation of convergences induced by multiple changes. The analysis of alignments rich in both genes and species, combined with a probabilistic method (maximum likelihood or Bayesian) using sophisticated models of sequence evolution, should alleviate these two major limitations. We applied this approach to a dataset of 94 genes and 79 species using CAT, a previously developed model accounting for site-specific amino acid replacement patterns. The resulting tree is in good agreement with current knowledge: the monophyly of most major groups (e.g. Chordata, Arthropoda, Lophotrochozoa, Ecdysozoa, Protostomia) was recovered with high support. Two results are surprising and are discussed in an evo–devo framework: the sister-group relationship of Platyhelminthes and Annelida to the exclusion of Mollusca, contradicting the Neotrochozoa hypothesis, and, with a lower statistical support, the paraphyly of Deuterostomia. These results, in particular the status of deuterostomes, need further confirmation, both through increased taxonomic sampling, and future improvements of probabilistic models.
phylogenomics; deuterostomes; systematic error; taxon sampling
The evolutionary rate at a given homologous position varies across time. When sufficiently pronounced, this phenomenon – called heterotachy – may produce artefactual phylogenetic reconstructions under the commonly used models of sequence evolution. These observations have motivated the development of models that explicitly recognize heterotachy, with research directions proposed along two main axes: 1) the covarion approach, where sites switch from variable to invariable states; and 2) the mixture of branch lengths (MBL) approach, where alignment patterns are assumed to arise from one of several sets of branch lengths, under a given phylogeny.
Here, we report the first statistical comparisons contrasting the performance of covarion and MBL modeling strategies. Using simulations under heterotachous conditions, we explore the properties of three model comparison methods: the Akaike information criterion, the Bayesian information criterion, and cross validation. Although more time consuming, cross validation appears more reliable than AIC and BIC as it directly measures the predictive power of a model on 'future' data. We also analyze three large datasets (nuclear proteins of animals, mitochondrial proteins of mammals, and plastid proteins of plants), and find the optimal number of components of the MBL model to be two for all datasets, indicating that this model is preferred over the standard homogeneous model. However, the covarion model is always favored over the optimal MBL model.
We demonstrated, using three large datasets, that the covarion model is more efficient at handling heterotachy than the MBL model. This is probably because the MBL model requires a substantial increase in the number of parameters, compared with the two supplementary parameters of the covarion approach. Further improvements of both the mixture and the covarion approaches might be obtained by modeling heterogeneous behavior both along time and across sites.
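Two of the model comparison criteria mentioned above have simple closed forms: AIC = 2k - 2 ln L and BIC = k ln(n) - 2 ln L, where k is the number of free parameters, n the number of observations (here assumed to be alignment sites) and ln L the maximized log-likelihood. The Python sketch below only illustrates these standard formulas; the model names, log-likelihoods and parameter counts are invented placeholders and are not results from the study.

```python
import math


def aic(log_likelihood: float, n_params: int) -> float:
    """Akaike information criterion (lower is better)."""
    return 2 * n_params - 2 * log_likelihood


def bic(log_likelihood: float, n_params: int, n_sites: int) -> float:
    """Bayesian information criterion (lower is better)."""
    return n_params * math.log(n_sites) - 2 * log_likelihood


# Placeholder inputs: (maximized log-likelihood, number of free parameters).
candidate_models = {
    "model_a": (-10500.0, 10),
    "model_b": (-10400.0, 12),
    "model_c": (-10420.0, 20),
}
n_sites = 1000  # assumed alignment length

scores = {
    name: (aic(lnl, k), bic(lnl, k, n_sites))
    for name, (lnl, k) in candidate_models.items()
}
best_by_aic = min(scores, key=lambda name: scores[name][0])
best_by_bic = min(scores, key=lambda name: scores[name][1])
```

Cross validation, the third criterion, needs the model to be refitted on training subsets and evaluated on held-out sites, so it cannot be reduced to a formula of this kind.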
Acoel flatworms are small marine worms traditionally considered to belong to the phylum Platyhelminthes. However, molecular phylogenetic analyses suggest that acoels are not members of Platyhelminthes, but are rather extant members of the earliest diverging Bilateria. This result has been called into question, under suspicions of a long branch attraction (LBA) artefact. Here we re-examine this problem through a phylogenomic approach using 68 different protein-coding genes from the acoel Convoluta pulchra and 51 metazoan species belonging to 15 different phyla. We employ a mixture model, named CAT, previously found to overcome LBA artefacts where classical models fail. Our results unequivocally show that acoels are not part of the classically defined Platyhelminthes, making the latter polyphyletic. Moreover, they indicate a deuterostome affinity for acoels, potentially as a sister group to all deuterostomes, to Xenoturbellida, to Ambulacraria, or even to chordates. However, the weak support found for most deuterostome nodes, together with the very fast evolutionary rate of the acoel Convoluta pulchra, calls for more data from slowly evolving acoels (or from their sister-group, the Nemertodermatida) to solve this challenging phylogenetic problem.
Hedgehog proteins are important cell–cell signalling proteins utilized during the development of multicellular animals. Members of the hedgehog gene family have not been detected outside the Metazoa, raising unanswered questions about their evolutionary origin. Here we report a highly unusual hedgehog-related gene from a choanoflagellate, a close unicellular relative of the animals. The deduced C-terminal domain, Hoglet-C, is homologous to the autocatalytic domain of Hedgehog proteins and is predicted to function in autocatalytic cleavage of the precursor peptide. In contrast, the N-terminal Hoglet-N peptide has no similarity to the signalling peptide of Hedgehog (Hh-N). Instead, Hoglet-N is deduced to be a secreted protein with an enormous threonine-rich domain of unprecedented size and purity (over 200 threonine residues) and two polysaccharide-binding domains. Structural modelling reveals that these domains have a novel combination of features found in cellulose-binding domains (CBD) of types IIa and IIb, and are expected to bind cellulose. We propose that the two CBD domains enable Hoglet-N to bind to plant matter, tethering an amorphous nucleophilic anchor, facilitating transient adhesion of the choanoflagellate cell. Since Hh-C and Hoglet-C are homologous, but Hh-N and Hoglet-N are not, we argue that metazoan hedgehog genes evolved by fusion of two distinct genes.
Monosiga; multicellularity; adhesion; hoglet; cellulose-binding domains
The First Phylogenomics Conference was held in Ste-Adèle (Québec, Canada) in March 2006. Selected papers appear in this special issue of BMC Evolutionary Biology. Here, we give an introduction to the field and provide an overview of the articles presented in this issue.
Thanks to the large amount of signal contained in genome-wide sequence alignments, phylogenomic analyses are converging towards highly supported trees. However, high statistical support does not imply that the tree is accurate. Systematic errors, such as the Long Branch Attraction (LBA) artefact, can be misleading, in particular when the taxon sampling is poor, or the outgroup is distant. In an otherwise consistent probabilistic framework, systematic errors in genome-wide analyses can be traced back to model mis-specification problems, which suggests that better models of sequence evolution should be devised, that would be more robust to tree reconstruction artefacts, even under the most challenging conditions.
We focus on a well characterized LBA artefact analyzed in a previous phylogenomic study of the metazoan tree, in which two fast-evolving animal phyla, nematodes and platyhelminths, emerge either at the base of all other Bilateria, or within protostomes, depending on the outgroup. We use this artefactual result as a case study for comparing the robustness of two alternative models: a standard, site-homogeneous model, based on an empirical matrix of amino-acid replacement (WAG), and a site-heterogeneous mixture model (CAT). In parallel, we propose a posterior predictive test, allowing one to measure how well a model acknowledges sequence saturation.
Adopting a Bayesian framework, we show that the LBA artefact observed under WAG disappears when the site-heterogeneous model CAT is used. Using cross-validation, we further demonstrate that CAT has a better statistical fit than WAG on this data set. Finally, using our statistical goodness-of-fit test, we show that CAT, but not WAG, correctly accounts for the overall level of saturation, and that this is due to a better estimation of site-specific amino-acid preferences.
The CAT model appears to be more robust than WAG against LBA artefacts, essentially because it correctly anticipates the high probability of convergences and reversions implied by the small effective size of the amino-acid alphabet at each site of the alignment. More generally, our results provide strong evidence that site-specificities in the substitution process need be accounted for in order to obtain more reliable phylogenetic trees.
Phylogenetic analyses based on datasets rich in both genes and species (phylogenomics) are becoming a standard approach to resolve evolutionary questions. However, several difficulties are associated with the assembly of large datasets, such as multiple copies of a gene per species (paralogous or xenologous genes), lack of some genes for a given species, or partial sequences. The use of undetected paralogous or xenologous genes in phylogenetic inference can lead to inaccurate results, and the use of partial sequences to a lack of resolution. A tool that selects sequences, species, and genes, while dealing with these issues, is needed in a phylogenomics context.
Here, we present SCaFoS, a tool that quickly assembles phylogenomic datasets containing maximal phylogenetic information while adjusting the amount of missing data in the selection of species, sequences and genes. Starting from individual sequence alignments, and using monophyletic groups defined by the user, SCaFoS creates chimeras with partial sequences, or selects, among multiple sequences, the orthologous and/or slowest evolving sequences. Once sequences representing each predefined monophyletic group have been selected, SCaFoS retains genes according to the user's allowed level of missing data and generates files for super-matrix and super-tree analyses in several formats compatible with standard phylogenetic inference software. Because no clear-cut criteria exist for the sequence selection, a semi-automatic mode is available to accommodate the user's expertise.
SCaFoS is able to deal with datasets of hundreds of species and genes, at either the amino acid or the nucleotide level. It has a graphical interface and can be integrated in an automatic workflow. Moreover, SCaFoS is the first tool that integrates the user's knowledge to select orthologous sequences, creates chimerical sequences to reduce missing data and selects genes according to their level of missing data. Finally, applying SCaFoS to different datasets, we show that the judicious selection of genes, species and sequences reduces tree reconstruction artefacts, especially if the dataset includes fast evolving species.
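The idea of retaining genes according to an allowed level of missing data can be sketched in a few lines of Python. This is not SCaFoS code and does not reflect its actual interface or file formats; the function names, the in-memory data layout and the rule that a species counts as missing when its sequence is absent or consists only of gap or unknown characters are all assumptions made for illustration.

```python
def missing_fraction(gene_alignment: dict[str, str],
                     all_species: list[str]) -> float:
    """Fraction of species with no usable sequence for one gene.

    A species is treated as missing (an assumption of this sketch) when it is
    absent from the alignment or its sequence contains only '-' or '?'.
    """
    missing = sum(
        1 for species in all_species
        if all(char in "-?" for char in gene_alignment.get(species, ""))
    )
    return missing / len(all_species)


def select_genes(genes: dict[str, dict[str, str]],
                 all_species: list[str],
                 max_missing: float) -> list[str]:
    """Names of genes whose missing-species fraction is at most max_missing."""
    return [
        name for name, alignment in genes.items()
        if missing_fraction(alignment, all_species) <= max_missing
    ]


# Placeholder data: gene_1 lacks a usable sequence for species_b.
species = ["species_a", "species_b"]
genes = {
    "gene_1": {"species_a": "MK", "species_b": "--"},
    "gene_2": {"species_a": "MK", "species_b": "ML"},
}
kept = select_genes(genes, species, max_missing=0.25)
```

A stricter variant could measure missing data per character rather than per species, which would also penalize partial sequences.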
The aim of protein design is to predict amino-acid sequences compatible with a given target structure. Traditionally envisioned as a purely thermodynamic question, this problem can also be understood in a wider context, where additional constraints are captured by learning the sequence patterns displayed by natural proteins of known conformation. In this latter perspective, however, we still need a theoretical formalization of the question, leading to general and efficient learning methods, and allowing for the selection of fast and accurate objective functions quantifying sequence/structure compatibility.
We propose a formulation of the protein design problem in terms of model-based statistical inference. Our framework uses the maximum likelihood principle to optimize the unknown parameters of a statistical potential, which we call an inverse potential to contrast with classical potentials used for structure prediction. We propose an implementation based on Markov chain Monte Carlo, in which the likelihood is maximized by gradient descent and is numerically estimated by thermodynamic integration. The fit of the models is evaluated by cross-validation. We apply this to a simple pairwise contact potential, supplemented with a solvent-accessibility term, and show that the resulting models have a better predictive power than currently available pairwise potentials. Furthermore, the model comparison method presented here allows one to measure the relative contribution of each component of the potential, and to choose the optimal number of accessibility classes, which turns out to be much higher than classically considered.
Altogether, this reformulation makes it possible to test a wide diversity of models, using different forms of potentials, or accounting for other factors than just the constraint of thermodynamic stability. Ultimately, such model-based statistical analyses may help to understand the forces shaping protein sequences, and driving their evolution.
Probabilistic methods have progressively supplanted the Maximum Parsimony (MP) method for inferring phylogenetic trees. One of the major reasons for this shift was that MP is much more sensitive to the Long Branch Attraction (LBA) artefact than is Maximum Likelihood (ML). However, recent work by Kolaczkowski and Thornton suggested, on the basis of simulations, that MP is less sensitive than ML to tree reconstruction artefacts generated by heterotachy, a phenomenon that corresponds to shifts in site-specific evolutionary rates over time. These results led these authors to recommend that the results of ML and MP analyses should be both reported and interpreted with the same caution. This specific conclusion revived the debate on the choice of the most accurate phylogenetic method for analysing real data in which various types of heterogeneities occur. However, variation of evolutionary rates across species was not explicitly incorporated in the original study of Kolaczkowski and Thornton, and in most of the subsequent heterotachous simulations published to date, where all terminal branch lengths were kept equal, an assumption that is biologically unrealistic.
In this report, we performed more realistic simulations to evaluate the relative performance of MP and ML methods when two kinds of heterogeneities are considered: (i) within-site rate variation (heterotachy), and (ii) rate variation across lineages. Using a protocol similar to that of Kolaczkowski and Thornton to generate heterotachous datasets, we found that heterotachy, which constitutes a serious violation of existing models, decreases the accuracy of ML whatever the level of rate variation across lineages. In contrast, the accuracy of MP can either increase or decrease when the level of heterotachy increases, depending on the relative branch lengths. This result demonstrates that MP is not insensitive to heterotachy, contrary to the report of Kolaczkowski and Thornton. Finally, in the case of LBA (i.e. when two non-sister lineages evolved faster than the others), ML outperforms MP over a wide range of conditions, except for unrealistic levels of heterotachy.
For realistic combinations of both heterotachy and variation of evolutionary rates across lineages, ML is always more accurate than MP. Therefore, ML should be preferred over MP for analysing real data, all the more so since parametric methods also allow one to handle other types of biological heterogeneities much better, such as among-site rate variation. The confounding effects of heterotachy on tree reconstruction methods do exist, but can be eschewed by the development of mixture models in a probabilistic framework, as proposed by Kolaczkowski and Thornton themselves.
The genomes of three Pyrococcus species, P.abyssi, P.furiosus and P.horikoshii, were compared at the DNA level, taking advantage of our identification of their replication origins. Three types of rearrangements have been identified: (i) inversion and translocation across the replication axis (origin/terminus), (ii) inversion and translocation restricted to a replichore (the half chromosome divided by the replication axis) and (iii) apparent mobility of long clusters of repeated sequences. Rearrangements restricted within a replichore were more common between P.furiosus and the two other Pyrococcus species than between P.horikoshii and P.abyssi. A strong correlation was found between 23 homologous insertion sequence elements, present only in P.furiosus, and recombined segment boundaries, suggesting that transposition events have been a major cause of genomic disruption in this species. Moreover, gene orientation bias was much more disrupted than strand composition biases in fragments that switched their orientation within a replichore upon recombination. This allowed us to conclude that one reversion and one translocation occurred in P.abyssi after its divergence from P.horikoshii, and that a smaller segment has specifically recombined in P.furiosus. Whereas a majority of genes are transcribed in the same direction as DNA replication in P.horikoshii and P.abyssi, the colinearity of transcription and replication is only maintained for highly transcribed genes in P.furiosus. We discuss the implications of genomic rearrangements on gene orientation and composition biases, and their consequences on sequence evolution.
We have previously identified in the human genome a family of 200 endogenous retrovirus-like elements, the HERV-L elements, disclosing similarities with the foamy retroviruses and which might be the evolutionary intermediate between classical intracellular retrotransposons and infectious retroviruses. Southern blot analysis of a large series of mammalian genomic DNAs shows that HERV-L-related elements—so-called ERV-L—are present among all placental mammals, suggesting that ERV-L elements were already present at least 70 million years ago. Most species exhibit a low copy number of ERV-L elements (from 10 to 30), while simians (not prosimians) and mice (not rats) have been subjected to bursts resulting in increases in the number of copies up to 200. The burst of copy number in primates can be dated to shortly after the prosimian and simian branchpoint, 45 to 65 million years ago, whereas murine species have been subjected to two much more recent bursts (less than 10 million years ago), occurring after the Mus/Rattus split. We have amplified and sequenced 360-bp ERV-L internal fragments of the highly conserved pol gene from a series of 22 mammalian species. These sequences exhibit high percentages of identity (57 to 99%) with the murine fully coding MuERV-L element. Phylogenetic analyses allowed the establishment of a plausible evolutionary scheme for ERV-L elements, which accounts for the high level of sequence conservation and the widespread dispersion among mammals.
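Percentages of identity such as those reported above are computed from pairwise aligned sequences. The Python sketch below shows one common definition, which is an assumption here since the abstract does not state which was used: identical positions divided by the number of aligned columns in which neither sequence has a gap. The function name and the example sequences are placeholders.

```python
def percent_identity(seq_a: str, seq_b: str) -> float:
    """Percent identity between two aligned sequences of equal length.

    Columns where either sequence has a gap ('-') are ignored, which is only
    one of several definitions in use.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    compared = 0
    matches = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a == "-" or b == "-":
            continue  # skip gapped columns
        compared += 1
        if a == b:
            matches += 1
    if compared == 0:
        raise ValueError("no ungapped columns to compare")
    return 100.0 * matches / compared


# Placeholder sequences, not ERV-L data.
identity = percent_identity("ACGT", "ACGA")
```

Other definitions divide by the full alignment length or by the length of the shorter sequence, so reported values are only comparable when the same denominator is used.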