Related Articles

1.  CoPAP: Coevolution of Presence–Absence Patterns 
Nucleic Acids Research  2013;41(Web Server issue):W232-W237.
Evolutionary analysis of phyletic patterns (phylogenetic profiles) is widely used in biology, representing presence or absence of characters such as genes, restriction sites, introns, indels and methylation sites. The phyletic pattern observed in extant genomes is the result of ancestral gain and loss events along the phylogenetic tree. Here we present CoPAP (coevolution of presence–absence patterns), a user-friendly web server, which performs accurate inference of coevolving characters as manifested by co-occurring gains and losses. CoPAP uses state-of-the-art probabilistic methodologies to infer coevolution and allows for advanced network analysis and visualization. We developed a platform for comparing different algorithms that detect coevolution, which includes simulated data with pairs of coevolving sites and independent sites. Using these simulated data we demonstrate that CoPAP performance is higher than alternative methods. We exemplify CoPAP utility by analyzing coevolution among thousands of bacterial genes across 681 genomes. Clusters of coevolving genes that were detected using our method largely coincide with known biosynthesis pathways and cellular modules, thus exhibiting the capability of CoPAP to infer biologically meaningful interactions. CoPAP is freely available for use at
PMCID: PMC3692100  PMID: 23748951
2.  Models of gene gain and gene loss for probabilistic reconstruction of gene content in the last universal common ancestor of life 
Biology Direct  2013;8:32.
The problem of probabilistic inference of gene content in the last common ancestor of several extant species with completely sequenced genomes is: for each gene that is conserved in all or some of the genomes, assign the probability that its ancestral gene was present in the genome of their last common ancestor.
We have developed a family of models of gene gain and gene loss in evolution, and applied the maximum-likelihood approach that uses phylogenetic tree of prokaryotes and the record of orthologous relationships between their genes to infer the gene content of LUCA, the Last Universal Common Ancestor of all currently living cellular organisms. The crucial parameter, the ratio of gene losses and gene gains, was estimated from the data and was higher in models that take account of the number of in-paralogs in genomes than in models that treat gene presences and absences as a binary trait.
While the numbers of genes that are placed confidently into LUCA are similar in the ML methods and in previously published methods that use various parsimony-based approaches, the identities of genes themselves are different. Most of the models of either kind treat the genes found in many existing genomes in a similar way, assigning to them high probabilities of being ancestral (“high ancestrality”). The ML models are more likely than others to assign high ancestrality to the genes that are relatively rare in the present-day genomes.
This article was reviewed by Martijn A Huynen, Toni Gabaldón and Fyodor Kondrashov.
PMCID: PMC3892064  PMID: 24354654
3.  A Maximum Likelihood Method for Reconstruction of the Evolution of Eukaryotic Gene Structure 
Spliceosomal introns are one of the principal distinctive features of eukaryotes. Nevertheless, different large-scale studies disagree about even the most basic features of their evolution. In order to come up with a more reliable reconstruction of intron evolution, we developed a model that is far more comprehensive than previous ones. This model is rich in parameters, and estimating them accurately is infeasible by straightforward likelihood maximization. Thus, we have developed an expectation-maximization algorithm that allows for efficient maximization. Here, we outline the model and describe the expectation-maximization algorithm in detail. Since the method works with intron presence–absence maps, it is expected to be instrumental for the analysis of the evolution of other binary characters as well.
PMCID: PMC3410445  PMID: 19381540
Maximum likelihood; expectation-maximization; intron evolution; ancestral reconstruction; eukaryotic gene structure
4.  Optimized ancestral state reconstruction using Sankoff parsimony 
BMC Bioinformatics  2009;10:51.
Parsimony methods are widely used in molecular evolution to estimate the most plausible phylogeny for a set of characters. Sankoff parsimony determines the minimum number of changes required in a given phylogeny when a cost is associated to transitions between character states. Although optimizations exist to reduce the computations in the number of taxa, the original algorithm takes time O(n²) in the number of states, making it impractical for large values of n.
In this study we introduce an optimization of Sankoff parsimony for the reconstruction of ancestral states when ultrametric or additive cost matrices are used. We analyzed its performance for randomly generated matrices, Jukes-Cantor and Kimura's two-parameter models of DNA evolution, and in the reconstruction of elongation factor-1α and ancestral metabolic states of a group of eukaryotes, showing that in all cases the execution time is significantly less than with the original implementation.
The algorithms here presented provide a fast computation of Sankoff parsimony for a given phylogeny. Problems where the number of states is large, such as reconstruction of ancestral metabolism, are particularly adequate for this optimization. Since we are reducing the computations required to calculate the parsimony cost of a single tree, our method can be combined with optimizations in the number of taxa that aim at finding the most parsimonious tree.
PMCID: PMC2677398  PMID: 19200389
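For orientation, the original (unoptimized) Sankoff recursion that the abstract above takes as its baseline can be sketched as follows. This is a minimal illustration and not the authors' optimized method; the function name `sankoff`, the dictionary-based tree encoding and the node labels are assumptions made for the example. The nested loop over state pairs at each internal node is the O(n²) step mentioned in the abstract.

```python
import math


def sankoff(tree, leaf_states, cost, root="root"):
    """Return (minimum total cost, per-node cost vectors) for one character.

    tree: dict mapping each internal node to a list of its children
          (hypothetical encoding chosen for this sketch).
    leaf_states: dict mapping each leaf name to its observed state index.
    cost: square matrix, cost[i][j] = cost of changing state i into state j.
    """
    n = len(cost)
    vectors = {}

    def visit(node):
        if node not in tree:
            # Leaf: zero cost for the observed state, infinite cost otherwise.
            vec = [math.inf] * n
            vec[leaf_states[node]] = 0.0
            vectors[node] = vec
            return vec
        vec = [0.0] * n
        for child in tree[node]:
            child_vec = visit(child)
            for i in range(n):
                # Cheapest way to explain the child's subtree given state i here.
                vec[i] += min(cost[i][j] + child_vec[j] for j in range(n))
        vectors[node] = vec
        return vec

    root_vec = visit(root)
    return min(root_vec), vectors


# Usage: three leaves, two states, unit cost for any change.
tree = {"root": ["x", "C"], "x": ["A", "B"]}
unit_cost = [[0, 1], [1, 0]]
best, vectors = sankoff(tree, {"A": 0, "B": 1, "C": 0}, unit_cost)
print(best)  # 1.0: a single change explains the data
```

The recursion makes no use of any structure in the cost matrix, which is what leaves room for an optimization when the matrix is ultrametric or additive.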
5.  Inference and Characterization of Horizontally Transferred Gene Families Using Stochastic Mapping 
Molecular Biology and Evolution  2009;27(3):703-713.
Macrogenomic events, in which genes are gained and lost, play a pivotal evolutionary role in microbial evolution. Nevertheless, probabilistic-evolutionary models describing such events and methods for their robust inference are considerably less developed than existing methodologies for analyzing site-specific sequence evolution. Here, we present a novel method for the inference of gains and losses of gene families. First, we develop probabilistic-evolutionary models describing the dynamics of gene-family content, which are more biologically realistic than previously suggested models. In our likelihood-based models, gains and losses are represented by transitions between presence and absence, given an underlying phylogeny. We employ a mixture-model approach in which we allow both the gain rate and the loss rate to vary among gene families. Second, we use these models together with the analytic implementation of stochastic mapping to infer branch-specific events. Our novel methodology allows us to infer and quantify horizontal gene transfer (HGT) events. This enables us to rank various gene families and lineages according to their propensity to undergo gains and losses. Applying our methodology to 4,873 gene families shows that: 1) the novel mixture models describe the observed variability in gene-family content among microbes significantly better than previous models; 2) The stochastic mapping approach enables accurate inference of gain and loss events based on simulations; 3) At least 34% of the gene families analyzed are inferred to have experienced HGT at least once during their evolution; and 4) Gene families that were inferred to experience HGT are both enriched and depleted with respect to specific functional categories.
PMCID: PMC2822287  PMID: 19808865
phyletic pattern; probabilistic-evolutionary models; mixture models; genome evolution; horizontal gene transfer; gene-family content
6.  Analysis on the reconstruction accuracy of the Fitch method for inferring ancestral states 
BMC Bioinformatics  2011;12:18.
As one of the most widely used parsimony methods for ancestral reconstruction, the Fitch method minimizes the total number of hypothetical substitutions along all branches of a tree to explain the evolution of a character. Due to the extensive usage of this method, it has become a scientific endeavor in recent years to study the reconstruction accuracies of the Fitch method. However, most studies are restricted to 2-state evolutionary models and a study for higher-state models is needed since DNA sequences take the format of 4-state series and protein sequences even have 20 states.
In this paper, the ambiguous and unambiguous reconstruction accuracies of the Fitch method are studied for N-state evolutionary models. Given an arbitrary phylogenetic tree, a recurrence system is first presented to calculate iteratively the two accuracies. As the complete binary tree and the comb-shaped tree are the two extremal evolutionary tree topologies according to balance, we focus on the reconstruction accuracies on these two topologies and analyze their asymptotic properties. Then, 1000 Yule trees with 1024 leaves are generated and analyzed to simulate real evolutionary scenarios. It is known that more taxa do not necessarily increase the reconstruction accuracies under 2-state models. The result under N-state models is also tested.
In a large tree with many leaves, the reconstruction accuracies of using all taxa are sometimes less than those of using a leaf subset under N-state models. For complete binary trees, there always exists an equilibrium interval [a, b] of conservation probability, in which the limiting ambiguous reconstruction accuracy equals the probability of randomly picking a state. The value b decreases with the increase of the number of states, and it seems to converge. When the conservation probability is greater than b, the reconstruction accuracies of the Fitch method increase rapidly. The reconstruction accuracies on 1000 simulated Yule trees also exhibit similar behaviors. For comb-shaped trees, the limiting reconstruction accuracies of using all taxa are always less than or equal to those of using the nearest root-to-leaf path when the conservation probability is not less than 1/N. As a result, more taxa are suggested for ancestral reconstruction when the tree topology is balanced and the sequences are highly similar, and a few taxa close to the root are recommended otherwise.
PMCID: PMC3030536  PMID: 21226965
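The bottom-up pass of the Fitch method analyzed in the abstract above can be sketched in a few lines. This is an illustrative sketch for a rooted binary tree; the function name `fitch`, the tree encoding and the node labels are assumptions, and the accuracy recurrences studied in the paper are not reproduced here. A root set with a single state corresponds to an unambiguous reconstruction, a larger set to an ambiguous one.

```python
def fitch(tree, leaf_states, root="root"):
    """Return (number of changes, state set per node) from Fitch's bottom-up pass.

    tree: dict mapping each internal node to exactly two children
          (hypothetical encoding chosen for this sketch).
    leaf_states: dict mapping each leaf name to its observed state.
    """
    sets = {}
    changes = 0

    def visit(node):
        nonlocal changes
        if node not in tree:
            sets[node] = {leaf_states[node]}
            return sets[node]
        left, right = (visit(child) for child in tree[node])
        common = left & right
        if common:
            # The children agree on at least one state: keep the intersection.
            sets[node] = common
        else:
            # The children disagree: take the union and count one change.
            sets[node] = left | right
            changes += 1
        return sets[node]

    visit(root)
    return changes, sets


# Usage: a 4-state (DNA) character on three leaves.
tree = {"root": ["x", "C"], "x": ["A", "B"]}
changes, sets = fitch(tree, {"A": "A", "B": "C", "C": "A"})
print(changes, sets["root"])  # 1 {'A'}: the root state is unambiguous
```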
7.  Malin: maximum likelihood analysis of intron evolution in eukaryotes 
Bioinformatics  2008;24(13):1538-1539.
Summary: Malin is a software package for the analysis of eukaryotic gene structure evolution. It provides a graphical user interface for various tasks commonly used to infer the evolution of exon–intron structure in protein-coding orthologs. Implemented tasks include the identification of conserved homologous intron sites in protein alignments, as well as the estimation of ancestral intron content, lineage-specific intron losses and gains. Estimates are computed either with parsimony, or with a probabilistic model that incorporates rate variation across lineages and intron sites.
Availability: Malin is available as a stand-alone Java application, as well as an application bundle for MacOS X, at the website The software is distributed under a BSD-style license.
PMCID: PMC2718671  PMID: 18474506
8.  Understanding Phenotypical Character Evolution in Parmelioid Lichenized Fungi (Parmeliaceae, Ascomycota) 
PLoS ONE  2013;8(11):e83115.
Parmelioid lichens form a species-rich group of predominantly foliose and fruticose lichenized fungi encompassing a broad range of morphological and chemical diversity. Using a multilocus approach, we reconstructed a phylogeny including 323 OTUs of parmelioid lichens and employed ancestral character reconstruction methods to understand the phenotypical evolution within this speciose group of lichen-forming fungi. Specifically, we were interested in the evolution of growth form, epicortex structure, and cortical chemistry. Since previous studies have shown that results may differ depending on the reconstruction method used, here we employed both maximum-parsimony and maximum-likelihood approaches to reconstruct ancestral character states. We have also implemented binary and multistate coding of characters and performed parallel analyses with both coding types to assess for potential coding-based biases. We reconstructed the ancestral states for nine well-supported major clades in the parmelioid group, two higher-level sister groups and the ancestral character state for all parmelioid lichens. We found that different methods for coding phenotypical characters and different ancestral character state reconstruction methods mostly resulted in identical reconstructions but yield conflicting inferences of ancestral states, in some cases. However, we found support for the ancestor of parmelioid lichens having been a foliose lichen with a non-pored epicortex and pseudocyphellae. Our data suggest that some traits exhibit patterns of evolution consistent with adaptive radiation.
PMCID: PMC3843734  PMID: 24312438
9.  Bootstrapping phylogenies inferred from rearrangement data 
Large-scale sequencing of genomes has enabled the inference of phylogenies based on the evolution of genomic architecture, under such events as rearrangements, duplications, and losses. Many evolutionary models and associated algorithms have been designed over the last few years and have found use in comparative genomics and phylogenetic inference. However, the assessment of phylogenies built from such data has not been properly addressed to date. The standard method used in sequence-based phylogenetic inference is the bootstrap, but it relies on a large number of homologous characters that can be resampled; yet in the case of rearrangements, the entire genome is a single character. Alternatives such as the jackknife suffer from the same problem, while likelihood tests cannot be applied in the absence of well established probabilistic models.
We present a new approach to the assessment of distance-based phylogenetic inference from whole-genome data; our approach combines features of the jackknife and the bootstrap and remains nonparametric. For each feature of our method, we give an equivalent feature in the sequence-based framework; we also present the results of extensive experimental testing, in both sequence-based and genome-based frameworks. Through the feature-by-feature comparison and the experimental results, we show that our bootstrapping approach is on par with the classic phylogenetic bootstrap used in sequence-based reconstruction, and we establish the clear superiority of the classic bootstrap for sequence data and of our corresponding new approach for rearrangement data over proposed variants. Finally, we test our approach on a small dataset of mammalian genomes, verifying that the support values match current thinking about the respective branches.
Our method is the first to provide a standard of assessment to match that of the classic phylogenetic bootstrap for aligned sequences. Its support values follow a similar scale and its receiver-operating characteristics are nearly identical, indicating that it provides similar levels of sensitivity and specificity. Thus our assessment method makes it possible to conduct phylogenetic analyses on whole genomes with the same degree of confidence as for analyses on aligned sequences. Extensions to search-based inference methods such as maximum parsimony and maximum likelihood are possible, but remain to be thoroughly tested.
PMCID: PMC3487984  PMID: 22931958
Bootstrap; Jackknife; Phylogenetic reconstruction; Rearrangement; Gene order; Comparative genomics
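The classic sequence-based bootstrap that the abstract above uses as its point of comparison resamples alignment columns with replacement. A minimal sketch of one replicate is given below; it illustrates only the classic procedure, not the authors' approach for rearrangement data, and the function name `bootstrap_replicate` and the dictionary encoding of the alignment are assumptions made for the example.

```python
import random


def bootstrap_replicate(alignment, rng):
    """Return one bootstrap replicate of an alignment.

    alignment: dict mapping taxon name to an aligned sequence (equal lengths).
    rng: a random.Random instance, passed in so that runs are reproducible.
    """
    length = len(next(iter(alignment.values())))
    # Draw as many column indices as there are columns, with replacement.
    columns = [rng.randrange(length) for _ in range(length)]
    return {
        taxon: "".join(seq[i] for i in columns)
        for taxon, seq in alignment.items()
    }


# Usage: each replicate would be passed to the same tree-building method,
# and branch support is the fraction of replicates recovering that branch.
alignment = {"human": "ACGTAC", "mouse": "ACGTTC", "chicken": "ATGTTC"}
replicate = bootstrap_replicate(alignment, random.Random(0))
```

The sketch also shows why the procedure does not carry over directly to rearrangement data: with the entire genome acting as a single character, there is only one "column" to resample.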
10.  Prevalence of intron gain over intron loss in the evolution of paralogous gene families 
Nucleic Acids Research  2004;32(12):3724-3733.
The mechanisms and evolutionary dynamics of intron insertion and loss in eukaryotic genes remain poorly understood. Reconstruction of parsimonious scenarios of gene structure evolution in paralogous gene families in animals and plants revealed numerous gains and losses of introns. In all analyzed lineages, the number of acquired new introns was substantially greater than the number of lost ancestral introns. This trend held even for lineages in which vertical evolution of genes involved more intron losses than gains, suggesting that gene duplication boosts intron insertion. However, dating gene duplications and the associated intron gains and losses based on the molecular clock assumption showed that very few, if any, introns were gained during the last ∼100 million years of animal and plant evolution, in agreement with previous conclusions reached through analysis of orthologous gene sets. These results are generally compatible with the emerging notion of intensive insertion and loss of introns during transitional epochs in contrast to the relative quiet of the intervening evolutionary spans.
PMCID: PMC484173  PMID: 15254274
11.  Inferring duplications, losses, transfers and incomplete lineage sorting with nonbinary species trees 
Bioinformatics  2012;28(18):i409-i415.
Motivation: Gene duplication (D), transfer (T), loss (L) and incomplete lineage sorting (I) are crucial to the evolution of gene families and the emergence of novel functions. The history of these events can be inferred via comparison of gene and species trees, a process called reconciliation, yet current reconciliation algorithms model only a subset of these evolutionary processes.
Results: We present an algorithm to reconcile a binary gene tree with a nonbinary species tree under a DTLI parsimony criterion. This is the first reconciliation algorithm to capture all four evolutionary processes driving tree incongruence and the first to reconcile non-binary species trees with a transfer model. Our algorithm infers all optimal solutions and reports complete, temporally feasible event histories, giving the gene and species lineages in which each event occurred. It is fixed-parameter tractable, with polytime complexity when the maximum species outdegree is fixed. Application of our algorithms to prokaryotic and eukaryotic data shows that use of an incomplete event model has a substantial impact on the events inferred and the resulting biological conclusions.
Availability: Our algorithms have been implemented in Notung, a freely available phylogenetic reconciliation software package, available at
PMCID: PMC3436813  PMID: 22962460
12.  A phylogenetic approach to gene expression data: evidence for the evolutionary origin of mammalian leukocyte phenotypes 
Evolution & development  2009;11(4):382-390.
The evolution of multicellular organisms involved the evolution of specialized cell types performing distinct functions; and specialized cell types presumably arose from more generalized ancestral cell types as a result of mutational events, such as gene duplication and changes in gene expression. We used characters based on gene expression data to reconstruct evolutionary relationships among 11 types of lymphocytes by the maximum parsimony method. The resulting phylogenetic tree showed expected patterns including separation of the lymphoid and myeloid lineages; clustering together of granulocyte types; and pairing of phenotypically similar cell types such as T-helper cells type 1 and T-helper cells type 2 (Th1 and Th2). We used phylogenetic analyses of sequence data to determine the time of origin of genes showing significant expression difference between Th1 and Th2 cells. Many such genes, particularly those involved in the regulation of gene expression or activation of proteins, were of ancient origin, having arisen by gene duplication before the most recent common ancestor (MRCA) of tetrapods and teleosts. However, certain other genes with significant expression difference between Th1 and Th2 arose after the tetrapod–teleost MRCA, and some of the latter were specific to eutherian (placental) mammals. This evolutionary pattern is consistent with previous evidence that, while bony fishes possess Th1 and Th2 cells, the latter differ phenotypically in important respects from the corresponding cells of mammals. Our results support a gradualistic model of the evolution of distinctive cellular phenotypes whereby the unique characteristics of a given cell type arise as a result of numerous independent mutational changes over hundreds of millions of years.
PMCID: PMC2810715  PMID: 19601972
13.  A likelihood framework to analyse phyletic patterns 
Probabilistic evolutionary models revolutionized our capability to extract biological insights from sequence data. While these models accurately describe the stochastic processes of site-specific substitutions, single-base substitutions represent only a fraction of all the events that shape genomes. Specifically, in microbes, events in which entire genes are gained (e.g. via horizontal gene transfer) and lost play a pivotal evolutionary role. In this research, we present a novel likelihood-based evolutionary model for gene gains and losses, and use it to analyse genome-wide patterns of the presence and absence of gene families. The model assumes a Markovian stochastic process, where gains and losses are represented by the transition between presence and absence, respectively, given an underlying phylogenetic tree. To account for differences in the rates of gain and loss of different gene families, we assume among-gene family rate variability, thus allowing for more accurate description of the data. Using the Bayesian approach, we estimated an evolutionary rate for each gene family. Simulation studies demonstrated that our methodology accurately infers these rates. Our methodology was applied to analyse a large corpus of data, consisting of 4873 gene families spanning 63 species and revealed novel insights regarding the evolutionary nature of genome-wide gain and loss dynamics.
PMCID: PMC2607420  PMID: 18852099
phyletic pattern; probabilistic evolutionary models; genome evolution; gene gain and loss; horizontal gene transfer; gene content
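A likelihood of the kind described in the abstract above can be illustrated with a two-state Markov model and Felsenstein's pruning recursion. The sketch below rests on several assumptions that are not taken from the abstract: a continuous-time process with a single gain rate and a single loss rate, a root drawn from the stationary distribution, and hypothetical names (`transition_matrix`, `pattern_likelihood`) and tree encoding. The among-family rate variability mentioned in the abstract is not modeled.

```python
import math


def transition_matrix(gain, loss, t):
    """Two-state transition probabilities after time t (0 = absent, 1 = present)."""
    total = gain + loss
    decay = math.exp(-total * t)
    p01 = gain / total * (1.0 - decay)  # absent -> present (gain)
    p10 = loss / total * (1.0 - decay)  # present -> absent (loss)
    return [[1.0 - p01, p01], [p10, 1.0 - p10]]


def pattern_likelihood(tree, pattern, gain, loss, root="root"):
    """Likelihood of one presence/absence pattern on a fixed tree.

    tree: dict mapping each internal node to a list of (child, branch_length).
    pattern: dict mapping each leaf name to 0 (absent) or 1 (present).
    """

    def partial(node):
        if node not in tree:
            # Leaf: probability 1 for the observed state, 0 for the other.
            return [1.0 if state == pattern[node] else 0.0 for state in (0, 1)]
        vec = [1.0, 1.0]
        for child, length in tree[node]:
            p = transition_matrix(gain, loss, length)
            child_vec = partial(child)
            for state in (0, 1):
                vec[state] *= sum(p[state][c] * child_vec[c] for c in (0, 1))
        return vec

    root_vec = partial(root)
    pi_present = gain / (gain + loss)  # stationary frequency, assumed at the root
    return (1.0 - pi_present) * root_vec[0] + pi_present * root_vec[1]


# Usage: a gene family present in A and B but absent in C.
tree = {"root": [("x", 0.3), ("C", 0.5)], "x": [("A", 0.1), ("B", 0.2)]}
likelihood = pattern_likelihood(tree, {"A": 1, "B": 1, "C": 0}, gain=0.4, loss=1.2)
```

Summing `pattern_likelihood` over all possible patterns for a tree gives 1, which is a convenient sanity check when experimenting with the rates.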
14.  Floral Evolution of Philodendron Subgenus Meconostigma (Araceae) 
PLoS ONE  2014;9(2):e89701.
Elucidating the evolutionary patterns of flower and inflorescence structure is pivotal to understanding the phylogenetic relationships of Angiosperms as a whole. The inflorescence morphology and anatomy of Philodendron subgenus Meconostigma, belonging to the monocot family Araceae, has been widely studied but the evolutionary relationships of subgenus Meconostigma and the evolution of its flower characters have hitherto remained unclear. This study examines gynoecium evolution in subgenus Meconostigma in the context of an estimated molecular phylogeny for all extant species of subgenus Meconostigma and analysis of ancestral character reconstructions of some gynoecial structures. The phylogenetic reconstructions of all extant Meconostigma species were conducted under a maximum likelihood approach based on the sequences of two chloroplast (trnk and matK) and two nuclear (ETS and 18S) markers. This topology was used to reconstruct the ancestral states of seven floral characters and to elucidate their evolutionary pattern in the Meconostigma lineage. Our phylogeny shows that Meconostigma is composed of two major clades, one comprising two Amazonian species and the other all the species from the Atlantic Forest and Cerrado biomes with one Amazonian species. The common ancestor of the species of subgenus Meconostigma probably possessed short stylar lobes, long stylar canals, a stylar body, a vascular plexus in the gynoecium and druses in the stylar parenchyma but it is uncertain whether raphide inclusions were present in the parenchyma. The ancestral lineage also probably possessed up to 10 ovary locules. The evolution of these characters seems to have occurred independently in some lineages. We propose that the morphological and anatomical diversity observed in the gynoecial structures of subgenus Meconostigma is the result of an ongoing process of fusion of floral structures leading to a reduction of energy wastage and increase in stigmatic surface.
PMCID: PMC3935929  PMID: 24586972
15.  Inference of Gain and Loss Events from Phyletic Patterns Using Stochastic Mapping and Maximum Parsimony—A Simulation Study 
Genome Biology and Evolution  2011;3:1265-1275.
Bacterial evolution is characterized by frequent gain and loss events of gene families. These events can be inferred from phyletic pattern data—a compact representation of gene family repertoire across multiple genomes. The maximum parsimony paradigm is a classical and prevalent approach for the detection of gene family gains and losses mapped on specific branches. We and others have previously developed probabilistic models that aim to account for the gain and loss stochastic dynamics. These models are a critical component of a methodology termed stochastic mapping, in which probabilities and expectations of gain and loss events are estimated for each branch of an underlying phylogenetic tree. In this work, we present a phyletic pattern simulator in which the gain and loss dynamics are assumed to follow a continuous-time Markov chain along the tree. Various models and options are implemented to make the simulation software useful for a large number of studies in which binary (presence/absence) data are analyzed. Using this simulation software, we compared the ability of the maximum parsimony and the stochastic mapping approaches to accurately detect gain and loss events along the tree. Our simulations cover a large array of evolutionary scenarios in terms of the propensities for gene family gains and losses and the variability of these propensities among gene families. Although in all simulation schemes, both methods obtain relatively low levels of false positive rates, stochastic mapping outperforms maximum parsimony in terms of true positive rates. We further studied the factors that influence the performance of both methods. We find, for example, that the accuracy of maximum parsimony inference is substantially reduced when the goal is to map gain and loss events along internal branches of the phylogenetic tree. 
Furthermore, the accuracy of stochastic mapping is reduced with smaller data sets (limited number of gene families) due to unreliable estimation of branch lengths. Our simulator and simulation results are additionally relevant for the analysis of other types of binary-coded data, such as the existence of homologous restriction sites, gaps, and introns, to name a few. Both the simulation software and the inference methodology are freely available at a user-friendly server:
PMCID: PMC3215202  PMID: 21971516
phyletic pattern; stochastic mapping; maximum parsimony; evolutionary models
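The kind of simulation described in the abstract above, in which gain and loss follow a continuous-time Markov chain along the tree, can be sketched as follows. This is a simplified illustration and not the authors' simulator: it assumes one gain rate and one loss rate (both positive) shared by all branches, a root state drawn from the stationary distribution, and a hypothetical function name (`simulate_pattern`) and tree encoding. The true gain and loss counts per branch are recorded so that an inference method could be scored against them.

```python
import random


def simulate_pattern(tree, gain, loss, rng, root="root"):
    """Simulate one gene family down a tree under a two-state Markov chain.

    tree: dict mapping each internal node to a list of (child, branch_length).
    Returns (pattern, states, events): leaf states, all node states, and the
    true (gains, losses) counts on the branch leading to each non-root node.
    """
    states, events = {}, {}

    def evolve(state, length):
        gains = losses = 0
        elapsed = 0.0
        while True:
            # Waiting time to the next event is exponential in the current rate.
            elapsed += rng.expovariate(gain if state == 0 else loss)
            if elapsed >= length:
                return state, gains, losses
            if state == 0:
                gains += 1
            else:
                losses += 1
            state = 1 - state

    def descend(node, state):
        states[node] = state
        for child, length in tree.get(node, []):
            child_state, gains, losses = evolve(state, length)
            events[child] = (gains, losses)
            descend(child, child_state)

    root_state = 1 if rng.random() < gain / (gain + loss) else 0
    descend(root, root_state)
    pattern = {node: s for node, s in states.items() if node not in tree}
    return pattern, states, events


# Usage: one simulated family on a three-leaf tree, with a fixed seed.
tree = {"root": [("x", 0.3), ("C", 0.5)], "x": [("A", 0.1), ("B", 0.2)]}
pattern, states, events = simulate_pattern(tree, 0.5, 1.0, random.Random(1))
```

Repeating the call for many families yields a phyletic pattern matrix together with the true per-branch events, which is the kind of ground truth needed to compare true and false positive rates of maximum parsimony and stochastic mapping.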
16.  The historical biogeography of Mammalia 
Palaeobiogeographic reconstructions are underpinned by phylogenies, divergence times and ancestral area reconstructions, which together yield ancestral area chronograms that provide a basis for proposing and testing hypotheses of dispersal and vicariance. Methods for area coding include multi-state coding with a single character, binary coding with multiple characters and string coding. Ancestral reconstruction methods are divided into parsimony versus Bayesian/likelihood approaches. We compared nine methods for reconstructing ancestral areas for placental mammals. Ambiguous reconstructions were a problem for all methods. Important differences resulted from coding areas based on the geographical ranges of extant species versus the geographical provenance of the oldest fossil for each lineage. Africa and South America were reconstructed as the ancestral areas for Afrotheria and Xenarthra, respectively. Most methods reconstructed Eurasia as the ancestral area for Boreoeutheria, Euarchontoglires and Laurasiatheria. The coincidence of molecular dates for the separation of Afrotheria and Xenarthra at approximately 100 Ma with the plate tectonic sundering of Africa and South America hints at the importance of vicariance in the early history of Placentalia. Dispersal has also been important including the origins of Madagascar's endemic mammal fauna. Further studies will benefit from increased taxon sampling and the application of new ancestral area reconstruction methods.
PMCID: PMC3138613  PMID: 21807730
ancestral areas; dispersal; historical biogeography; Mammalia; vicariance
17.  Patterns of intron gain and conservation in eukaryotic genes 
The presence of introns in protein-coding genes is a universal feature of eukaryotic genome organization, and the genes of multicellular eukaryotes, typically, contain multiple introns, a substantial fraction of which share position in distant taxa, such as plants and animals. Depending on the methods and data sets used, researchers have reached opposite conclusions on the causes of the high fraction of shared introns in orthologous genes from distant eukaryotes. Some studies conclude that shared intron positions reflect, almost entirely, a remarkable evolutionary conservation, whereas others attribute it to parallel gain of introns. To resolve these contradictions, it is crucial to analyze the evolution of introns by using a model that minimally relies on arbitrary assumptions.
We developed a probabilistic model of evolution that allows for variability of intron gain and loss rates over branches of the phylogenetic tree, individual genes, and individual sites. Applying this model to an extended set of conserved eukaryotic genes, we find that parallel gain, on average, accounts for only ~8% of the shared intron positions. However, the distribution of parallel gains over the phylogenetic tree of eukaryotes is highly non-uniform. There are, practically, no parallel gains in closely related lineages, whereas for distant lineages, such as animals and plants, parallel gains appear to contribute up to 20% of the shared intron positions. In accord with these findings, we estimated that ancestral introns have a high probability to be retained in extant genomes, and conversely, that a substantial fraction of extant introns have retained their positions since the early stages of eukaryotic evolution. In addition, the density of sites that are available for intron insertion is estimated to be, approximately, one in seven basepairs.
We obtained robust estimates of the contribution of parallel gain to the observed sharing of intron positions between eukaryotic species separated by different evolutionary distances. The results indicate that, although the contribution of parallel gains varies across the phylogenetic tree, the high level of intron position sharing is due, primarily, to evolutionary conservation. Accordingly, numerous introns appear to persist in the same position over hundreds of millions of years of evolution. This is compatible with recent observations of a negative correlation between the rate of intron gain and coding sequence evolution rate of a gene, suggesting that at least some of the introns are functionally relevant.
PMCID: PMC2151770  PMID: 17935625
18.  SIMMAP: Stochastic character mapping of discrete traits on phylogenies 
BMC Bioinformatics  2006;7:88.
Character mapping on phylogenies has played an important, if not critical role, in our understanding of molecular, morphological, and behavioral evolution. Until very recently we have relied on parsimony to infer character changes. Parsimony has a number of serious limitations that are drawbacks to our understanding. Recent statistical methods have been developed that free us from these limitations enabling us to overcome the problems of parsimony by accommodating uncertainty in evolutionary time, ancestral states, and the phylogeny.
SIMMAP has been developed to implement stochastic character mapping that is useful to molecular evolutionists, systematists, and bioinformaticians alike. Researchers can address questions about positive selection, patterns of amino acid substitution, character association, and patterns of morphological evolution.
Stochastic character mapping, as implemented in the SIMMAP software, enables users to address questions that require mapping characters onto phylogenies using a probabilistic approach that does not rely on parsimony. Analyses can be performed using a fully Bayesian approach that is not reliant on considering a single topology, set of substitution model parameters, or reconstruction of ancestral states. Uncertainty in these quantities is accommodated by using MCMC samples from their respective posterior distributions.
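The idea of mapping a character history probabilistically can be illustrated on a single branch. The sketch below is not SIMMAP's algorithm: it assumes a two-state (0/1) character, fixed hypothetical rates, and plain rejection sampling to obtain a history consistent with known states at both ends of the branch. The function names and the rate dictionary are illustrative assumptions.

```python
import random

def simulate_branch(start, rates, length, rng):
    """Simulate a two-state continuous-time Markov history along one branch.

    rates[s] is the (assumed, positive) rate of leaving state s.
    Returns (events, end_state), where events is a list of (time, new_state).
    """
    t, state, events = 0.0, start, []
    while True:
        # Waiting time to the next change is exponential in the current state.
        t += rng.expovariate(rates[state])
        if t >= length:
            return events, state
        state = 1 - state
        events.append((t, state))

def sample_history(start, end, rates, length, rng, max_tries=100000):
    """Rejection-sample one history that begins in `start` and finishes in `end`."""
    for _ in range(max_tries):
        events, final = simulate_branch(start, rates, length, rng)
        if final == end:
            return events
    raise RuntimeError("no history consistent with the end state was found")

rng = random.Random(1)
history = sample_history(0, 1, {0: 1.0, 1: 0.5}, 2.0, rng)
```

Repeating the draw many times gives a sample of histories from which quantities such as the number of changes or the time spent in each state could be summarised. A full analysis would also average over trees and model parameters, as the abstract describes.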
PMCID: PMC1403802  PMID: 16504105
19.  Plausibility of inferred ancestral phenotypes and the evaluation of alternative models of limb evolution in scincid lizards 
Biology Letters  2009;6(3):354-358.
Phylogenetic approaches to inferring ancestral character states are becoming increasingly sophisticated; however, the potential remains for available methods to yield strongly supported but inaccurate ancestral state estimates. The consistency of ancestral states inferred for two or more characters affords a useful criterion for evaluating ancestral trait reconstructions. Ancestral state estimates for multiple characters that entail plausible phenotypes when considered together may reasonably be assumed to be reliable. However, the accuracy of inferred ancestral states for one or more characters may be questionable where combined reconstructions imply implausible phenotypes for a proportion of internal nodes. This criterion for assessing reconstructed ancestral states is applied here in evaluating inferences of ancestral limb morphology in the scincid lizard clade Lerista. Ancestral numbers of digits for the manus and pes inferred assuming the models that best fit the data entail ancestral digit configurations for many nodes that differ fundamentally from configurations observed among known species. However, when an alternative model is assumed for the pes, inferred ancestral digit configurations are invariably represented among observed phenotypes. This indicates that a suboptimal model for the pes (and not the model providing the best fit to the data) yields accurate ancestral state estimates.
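The consistency criterion described above amounts to a simple lookup: combine the states inferred separately for each character at every internal node and flag the nodes whose combination is not seen in any known species. The following is a minimal sketch with made-up node labels and digit counts; none of the values are data from the study, and the function name is hypothetical.

```python
def implausible_nodes(manus, pes, observed):
    """Return internal nodes whose combined (manus, pes) digit counts
    match no configuration observed among known species."""
    return sorted(
        node for node in manus
        if (manus[node], pes[node]) not in observed
    )

# Hypothetical observed digit configurations (manus, pes); not real data.
observed = {(5, 5), (4, 4), (2, 3), (0, 2)}
# Hypothetical ancestral estimates, one dictionary per character.
manus = {"n1": 5, "n2": 3, "n3": 0}
pes = {"n1": 5, "n2": 5, "n3": 2}

flagged = implausible_nodes(manus, pes, observed)
```

A model under which the flagged list is empty would satisfy the criterion, while a long list would suggest that at least one of the single-character reconstructions is questionable.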
PMCID: PMC2880043  PMID: 20007166
ancestral state; Bayesian inference; Lerista; limb reduction; Squamata
20.  New insights into Trimezieae (Iridaceae) phylogeny: what do molecular data tell us? 
Annals of Botany  2012;110(3):689-702.
Background and Aims
The Neotropical tribe Trimezieae are taxonomically difficult. They are generally characterized by the absence of the features used to delimit their sister group Tigridieae. Delimiting the four genera that make up Trimezieae is also problematic. Previous family-level phylogenetic analyses have not examined the monophyly of the tribe or relationships within it. Reconstructing the phylogeny of Trimezieae will allow us to evaluate the status of the tribe and genera and to examine the suitability of characters traditionally used in their taxonomy.
Methods
Maximum parsimony and Bayesian phylogenetic analyses are presented for 37 species representing all four genera of Trimezieae. Analyses were based on nrITS sequences and a combined plastid dataset. Ancestral character state reconstructions were used to investigate the evolution of ten morphological characters previously considered taxonomically useful.
Key Results
Analyses of nrITS and plastid datasets strongly support the monophyly of Trimezieae and recover four principal clades with varying levels of support; these clades do not correspond to the currently recognized genera. Relationships within the four clades are not consistently resolved, although the conflicting resolutions are not strongly supported in individual analyses. Ancestral character state reconstructions suggest considerable homoplasy, especially in the floral characters used to delimit Pseudotrimezia.
Conclusions
The results strongly support recognition of Trimezieae as a tribe but suggest that both generic- and species-level taxonomy need revision. Further molecular analyses, with increased sampling of taxa and markers, are needed to support any revision. Such analyses will help determine the causes of discordance between the plastid and nuclear data and provide a framework for identifying potential morphological synapomorphies for infra-tribal groups. The results also suggest Trimezieae provide a promising model for evolutionary research.
PMCID: PMC3400455  PMID: 22711695
DNA sequences; Iridaceae; Iridoideae; morphology; Neomarica; Neotropics; phylogenetic analysis; Pseudiris; Pseudotrimezia; Trimezia; Trimezieae
21.  Evolutionary models for insertions and deletions in a probabilistic modeling framework 
BMC Bioinformatics  2005;6:63.
Probabilistic models for sequence comparison (such as hidden Markov models and pair hidden Markov models for proteins and mRNAs, or their context-free grammar counterparts for structural RNAs) often assume a fixed degree of divergence. Ideally we would like these models to be conditional on evolutionary divergence time.
Probabilistic models of substitution events are well established, but there has not been a completely satisfactory theoretical framework for modeling insertion and deletion events.
I have developed a method for extending standard Markov substitution models to include gap characters, and another method for the evolution of state transition probabilities in a probabilistic model. These methods use instantaneous rate matrices in a way that is more general than those used for substitution processes, and are sufficient to provide time-dependent models for standard linear and affine gap penalties, respectively.
Given a probabilistic model, we can make all of its emission probabilities (including gap characters) and all its transition probabilities conditional on a chosen divergence time. To do this, we only need to know the parameters of the model at one particular divergence time instance, as well as the parameters of the model at the two extremes of zero and infinite divergence.
I have implemented these methods in a new generation of the RNA genefinder QRNA (eQRNA).
These methods can be applied to incorporate evolutionary models of insertions and deletions into any hidden Markov model or stochastic context-free grammar, in a pair or profile form, for sequence modeling.
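The abstract states that a parameter can be made conditional on divergence time from its value at one calibration time and at the two extremes of zero and infinite divergence. One simple way this could look for a single scalar probability is sketched below. It assumes exponential relaxation between the two limits, which is an illustrative assumption and not necessarily the form used in the paper; the function and argument names are hypothetical.

```python
def prob_at_time(t, p_zero, p_inf, p_star, t_star):
    """Interpolate a probability between its t=0 and t=infinity limits.

    Assumes exponential relaxation, calibrated so that the curve passes
    through p_star at divergence time t_star (illustrative assumption).
    """
    if t < 0 or t_star <= 0:
        raise ValueError("t must be >= 0 and t_star must be > 0")
    if p_zero == p_inf:
        return p_zero  # nothing to interpolate
    ratio = (p_star - p_inf) / (p_zero - p_inf)
    if not 0.0 < ratio < 1.0:
        raise ValueError("p_star must lie strictly between p_zero and p_inf")
    # ratio ** (t / t_star) equals exp(-r * t) with r = -ln(ratio) / t_star.
    return p_inf + (p_zero - p_inf) * ratio ** (t / t_star)

# Hypothetical values: a match probability decaying from 1.0 toward 0.25.
p_half = prob_at_time(1.0, p_zero=1.0, p_inf=0.25, p_star=0.7, t_star=0.5)
```

The three inputs pin down the curve completely under this assumption: the limits fix its endpoints and the calibration point fixes its rate.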
PMCID: PMC1087829  PMID: 15780137
22.  Novel methodology for construction and pruning of quasi-median networks 
BMC Bioinformatics  2008;9:115.
Visualising the evolutionary history of a set of sequences is a challenge for molecular phylogenetics. One approach is to use undirected graphs, such as median networks, to visualise phylogenies where reticulate relationships such as recombination or homoplasy are displayed as cycles. Median networks contain binary representations of sequences as nodes, with edges connecting those sequences differing at one character; hypothetical ancestral nodes are invoked to generate a connected network which contains all most parsimonious trees. Quasi-median networks are a generalisation of median networks which are not restricted to binary data, although phylogenetic information contained within the multistate positions can be lost during the preprocessing of data. Where the history of a set of samples contains frequent homoplasies or recombination events, quasi-median networks will have a complex topology. Graph reduction or pruning methods have been used to reduce network complexity, but some of these methods are inapplicable to datasets in which recombination has occurred and others are procedurally complex and/or result in disconnected networks.
We address the problems inherent in construction and reduction of quasi-median networks. We describe a novel method of generating quasi-median networks that uses all characters, both binary and multistate, without imposing an arbitrary ordering of the multistate partitions. We also describe a pruning mechanism which maintains at least one shortest path between observed sequences, displaying the underlying relations between all pairs of sequences while maintaining a connected graph.
Application of this approach to 5S rDNA sequence data from sea beet produced a pruned network within which genetic isolation between populations by distance was evident, demonstrating the value of this approach for exploration of evolutionary relationships.
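The basic building block of a median network, as described in this abstract, is an edge between two binary sequences that differ at exactly one character. The sketch below builds only those edges for a set of observed, equal-length sequences. It does not generate the hypothetical ancestral nodes needed to connect the network, and it is not the construction or pruning method of the paper; the function names are illustrative.

```python
from itertools import combinations

def hamming(a, b):
    """Number of positions at which two equal-length sequences differ."""
    return sum(x != y for x, y in zip(a, b))

def one_step_edges(sequences):
    """Edges between distinct sequences differing at exactly one character."""
    nodes = sorted(set(sequences))
    return [(a, b) for a, b in combinations(nodes, 2) if hamming(a, b) == 1]

edges = one_step_edges(["000", "001", "011", "110"])
```

In this toy input the sequence "110" has no one-step neighbour, which is the situation in which a median network would add an inferred intermediate node.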
PMCID: PMC2267707  PMID: 18298812
23.  FastML: a web server for probabilistic reconstruction of ancestral sequences 
Nucleic Acids Research  2012;40(Web Server issue):W580-W584.
Ancestral sequence reconstruction is essential to a variety of evolutionary studies. Here, we present the FastML web server, a user-friendly tool for the reconstruction of ancestral sequences. FastML implements various novel features that differentiate it from existing tools: (i) FastML uses an indel-coding method, in which each gap, possibly spanning multiple sites, is coded as binary data. FastML then reconstructs ancestral indel states assuming a continuous time Markov process. FastML provides the most likely ancestral sequences, integrating both indels and characters; (ii) FastML accounts for uncertainty in ancestral states: it provides not only the posterior probabilities for each character and indel at each sequence position, but also a sample of ancestral sequences from this posterior distribution, and a list of the k-most likely ancestral sequences; (iii) FastML implements a large array of evolutionary models, which makes it generic and applicable for nucleotide, protein and codon sequences; and (iv) a graphical representation of the results is provided, including, for example, a graphical logo of the inferred ancestral sequences. The utility of FastML is demonstrated by reconstructing ancestral sequences of the Env protein from various HIV-1 subtypes. FastML is freely available for all academic users and is available online at
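Coding each gap as a binary character can be sketched as follows. This is a simplified illustration and not FastML's implementation: it assumes that a gap is identified by its exact start and end columns, so overlapping gaps with different boundaries become separate characters. The function names and the toy alignment are hypothetical.

```python
def gap_intervals(seq, gap="-"):
    """Return half-open (start, end) intervals of maximal gap runs in one aligned sequence."""
    runs, start = [], None
    for i, ch in enumerate(seq):
        if ch == gap and start is None:
            start = i
        elif ch != gap and start is not None:
            runs.append((start, i))
            start = None
    if start is not None:
        runs.append((start, len(seq)))
    return runs

def code_indels(alignment):
    """Code every distinct gap interval as a presence (1) / absence (0) character."""
    per_seq = {name: set(gap_intervals(s)) for name, s in alignment.items()}
    characters = sorted(set().union(*per_seq.values()))
    matrix = {
        name: [int(interval in gaps) for interval in characters]
        for name, gaps in per_seq.items()
    }
    return characters, matrix

# Toy alignment, not real data.
characters, matrix = code_indels({"a": "AC--GT", "b": "ACTTGT", "c": "AC--G-"})
```

The resulting 0/1 matrix is the kind of presence-absence data on which a two-state continuous-time Markov model could then be fitted along the tree.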
PMCID: PMC3394241  PMID: 22661579
24.  Why do snails have hairs? A Bayesian inference of character evolution 
Costly structures need to represent an adaptive advantage in order to be maintained over evolutionary times. Contrary to many other conspicuous shell ornamentations of gastropods, the haired shells of several Stylommatophoran land snails still lack a convincing adaptive explanation. In the present study, we analysed the correlation between the presence/absence of hairs and habitat conditions in the genus Trochulus in a Bayesian framework of character evolution.
Haired shells appeared to be the ancestral character state, a feature most probably lost three times independently. These losses were correlated with a shift from humid to dry habitats, indicating an adaptive function of hairs in moist environments. It had been previously hypothesised that these costly protein structures of the outer shell layer facilitate the locomotion in moist habitats. Our experiments, on the contrary, showed an increased adherence of haired shells to wet surfaces.
We propose the hypothesis that the possession of hairs facilitates the adherence of the snails to their herbaceous food plants during foraging when humidity levels are high. The absence of hairs in some Trochulus species could thus be explained as a loss of the potential adaptive function linked to habitat shifts.
PMCID: PMC1310604  PMID: 16271138
25.  Accommodating natural and sexual selection in butterfly wing pattern evolution 
Visual patterns in animals may serve different functions, such as attracting mates and deceiving predators. If a signal is used for multiple functions, the opportunity arises for conflict among the different functions, preventing optimization for any one visual signal. Here we investigate the hypothesis that spatial separation of different visual signal functions has occurred in Bicyclus butterflies. Using phylogenetic reconstructions of character evolution and comparisons of evolutionary rates, we found dorsal surface characters to evolve at higher rates than ventral characters. Dorsal characters also displayed sex-based differences in evolutionary rates more often than did ventral characters. Thus, dorsal characters corresponded to our predictions of mate signalling while ventral characters appear to play an important role in predator avoidance. Forewing characters also fit a model of mate signalling, and displayed higher rates of evolution than hindwing characters. Our results, as well as the behavioural and developmental data from previous studies of Bicyclus species, support the hypothesis that spatial separation of visual signal functions has occurred in Bicyclus butterflies. This study is the first to demonstrate, in a phylogenetic framework, that spatial separation of signals used for mate signalling and those used for predator avoidance is a viable strategy to accommodate multiple signal functions. This signalling strategy has important ramifications on the developmental evolution of wing pattern elements and diversification of butterfly species.
PMCID: PMC2690465  PMID: 19364741
eyespot; modularity; Nymphalidae; likelihood; Bicyclus; wing patterns
