1.  EGN: a wizard for construction of gene and genome similarity networks 
Background
Increasingly, similarity networks are being used for evolutionary analyses of molecular datasets. These networks are very useful, in particular for the analysis of gene sharing, lateral gene transfer and for the detection of distant homologs. Currently, such analyses require some computer programming skills due to the limited availability of user-friendly freely distributed software. Consequently, although appealing, the construction and analyses of these networks remain less familiar to biologists than do phylogenetic approaches.
Results
To ease the use of similarity networks among evolutionary biologists, we introduce a software program, EGN, that runs under Linux or MacOSX. EGN automates the reconstruction of gene and genome networks from nucleic acid and protein sequences. EGN also implements statistics describing genetic diversity in these samples, for various user-defined similarity thresholds. To study the complexity of the processes affecting microbial evolution, we applied EGN to a dataset of 571,044 protein sequences from the three domains of life and from mobile elements. We observed that, in Borrelia, plasmids play a different role than in most other eubacteria. Rather than being genetic couriers involved in lateral gene transfer, Borrelia’s plasmids and their genes act as private genetic goods that contribute to the creation of genetic diversity within their parasitic hosts.
Conclusion
EGN can be used for constructing, analyzing, and mining molecular datasets in evolutionary studies. The program can help increase our knowledge of the processes through which genes from distinct sources and/or from multiple genomes co-evolve in lineages of cellular organisms.
doi:10.1186/1471-2148-13-146
PMCID: PMC3727994  PMID: 23841456
Similarity network; Evolution; Borrelia; Plasmid; Genomics; Graph; Pathogen; Lateral gene transfer; Introgressive descent
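The network construction summarized in the EGN abstract can be illustrated with a minimal Python sketch. It is not EGN's code: the function names, the `(seq_a, seq_b, percent_identity)` input format, and the example hits are all assumptions made for illustration. It keeps an edge only when a pairwise similarity meets a user-defined threshold, then reports the connected components, which is one simple way to see how the network changes as the threshold moves.

```python
from collections import defaultdict


def build_similarity_network(hits, min_identity):
    """Build an undirected graph from pairwise hits.

    hits: iterable of (seq_a, seq_b, percent_identity) tuples
          (a hypothetical input format, not EGN's own).
    """
    graph = defaultdict(set)
    for a, b, identity in hits:
        if a != b and identity >= min_identity:
            graph[a].add(b)
            graph[b].add(a)
    return graph


def connected_components(graph, nodes):
    """Return the connected components as a list of sets (iterative DFS)."""
    seen = set()
    components = []
    for start in nodes:
        if start in seen:
            continue
        seen.add(start)
        stack = [start]
        component = set()
        while stack:
            node = stack.pop()
            component.add(node)
            for neighbour in graph.get(node, ()):
                if neighbour not in seen:
                    seen.add(neighbour)
                    stack.append(neighbour)
        components.append(component)
    return components


# Made-up example data: five sequences and three pairwise hits.
example_nodes = ["g1", "g2", "g3", "g4", "g5"]
example_hits = [("g1", "g2", 92.0), ("g2", "g3", 55.0), ("g4", "g5", 81.0)]

strict = connected_components(
    build_similarity_network(example_hits, 80.0), example_nodes)
relaxed = connected_components(
    build_similarity_network(example_hits, 50.0), example_nodes)
```

With the stricter threshold the weak g2-g3 hit is dropped, so g3 becomes a singleton; relaxing the threshold merges it into the g1-g2 component.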
2.  Simulating gene trees under the multispecies coalescent and time-dependent migration 
Background
The multispecies coalescent model has become popular in recent years as a framework to infer a species phylogeny from multilocus genetic data collected from multiple individuals. The model assumes that speciation occurs at a specific point in time, after which the two sub-species evolve in total isolation. However, in reality speciation may occur over an extended period of time, during which sister lineages remain in partial contact. Inference of multispecies phylogenies under those conditions is difficult. Indeed, even designing simulators that correctly sample gene histories under these conditions is non-trivial.
Results
In this paper we present a method and software which simulates gene trees under both the multispecies coalescent and migration. Our approach allows for both population sizes and migration rates to change over the species lifetime. Also, migration rates are specified in units of fraction of emigrants per time unit, which makes them easier to interpret. Overall this setup covers a wide range of migration scenarios. The software can be used to investigate properties of gene trees under different migration settings and to generate test cases for programs which infer species trees and/or migration from sequence data. Using simulated data we investigate the effect of migrations between sister lineages on the inference of multispecies phylogenies and on post analysis detection.
Conclusions
Our results indicate that while estimation of species tree topology can be quite robust to the presence of gene flow, the inference and detection of migration is problematic, even with methods based on full likelihood models.
doi:10.1186/1471-2148-13-44
PMCID: PMC3646706  PMID: 23418788
3.  LineageSpecificSeqgen: generating sequence data with lineage-specific variation in the proportion of variable sites 
Background
Commonly used phylogenetic models assume a homogeneous evolutionary process throughout the tree. It is known that these homogeneous models are often too simplistic, and that with time some properties of the evolutionary process can change (due to selection or drift). In particular, as constraints on sequences evolve, the proportion of variable sites can vary between lineages. This affects the ability of phylogenetic methods to correctly estimate phylogenetic trees, especially for long timescales. To date there is no phylogenetic model that allows for change in the proportion of variable sites, and the degree to which this affects phylogenetic reconstruction is unknown.
Results
We present LineageSpecificSeqgen, an extension to the seq-gen program that allows generation of sequences with both changes in the proportion of variable sites and changes in the rate at which sites switch between being variable and invariable. In contrast to seq-gen and its derivatives to date, we interpret branch lengths as the mean number of substitutions per variable site, as opposed to the mean number of substitutions per site (which is averaged over all sites, including invariable sites). This allows specification of the substitution rates of variable sites, independently of the proportion of invariable sites.
Conclusion
LineageSpecificSeqgen allows simulation of DNA and amino acid sequence alignments under a lineage-specific evolutionary process. The program can be used to test current models of evolution on sequences that have undergone lineage-specific evolution. It facilitates the development of both new methods to identify such processes in real data, and means to account for such processes. The program is available at: http://awcmee.massey.ac.nz/downloads.htm.
doi:10.1186/1471-2148-8-317
PMCID: PMC2613921  PMID: 19021917
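The two branch-length conventions contrasted in the LineageSpecificSeqgen abstract differ by a simple factor: invariable sites contribute no substitutions, so the per-site mean is the per-variable-site mean scaled by the proportion of variable sites. The Python sketch below shows that conversion; the function name and parameter names are assumptions for illustration and are not taken from the program.

```python
def per_site_branch_length(per_variable_site_length, proportion_invariable):
    """Convert substitutions per variable site to substitutions per site.

    Invariable sites contribute zero substitutions, so averaging over all
    sites scales the length by the proportion of variable sites.
    """
    if not 0.0 <= proportion_invariable < 1.0:
        raise ValueError("proportion_invariable must be in [0, 1)")
    return per_variable_site_length * (1.0 - proportion_invariable)


# The same per-variable-site length gives different per-site lengths
# in lineages with different proportions of invariable sites.
lineage_a = per_site_branch_length(0.5, 0.2)
lineage_b = per_site_branch_length(0.5, 0.6)
```

This is why fixing the per-variable-site length lets the substitution rate of variable sites be specified independently of how many sites are invariable in a given lineage.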
4.  Non-homogeneous models of sequence evolution in the Bio++ suite of libraries and programs 
Background
Accurately modeling the sequence substitution process is required for the correct estimation of evolutionary parameters, be they phylogenetic relationships, substitution rates or ancestral states; it is also crucial to simulate realistic data sets. Such simulation procedures are needed to estimate the null-distribution of complex statistics, an approach referred to as parametric bootstrapping, and are also used to test the quality of phylogenetic reconstruction programs. It has often been observed that homologous sequences can vary widely in their nucleotide or amino-acid compositions, revealing that sequence evolution has changed importantly among lineages, and may therefore be most appropriately approached through non-homogeneous models. Several programs implementing such models have been developed, but they are limited in their possibilities: only a few particular models are available for likelihood optimization, and data sets cannot be easily generated using the resulting estimated parameters.
Results
We hereby present a general implementation of non-homogeneous models of substitutions. It is available as dedicated classes in the Bio++ libraries and can hence be used in any C++ program. Two programs that use these classes are also presented. The first one, Bio++ Maximum Likelihood (BppML), estimates parameters of any non-homogeneous model and the second one, Bio++ Sequence Generator (BppSeqGen), simulates the evolution of sequences from these models. These programs allow the user to describe non-homogeneous models through a property file with a simple yet powerful syntax, without any programming required.
Conclusion
We show that the general implementation introduced here can accommodate virtually any type of non-homogeneous models of sequence evolution, including heterotachous ones, while being computer efficient. We furthermore illustrate the use of such general models for parametric bootstrapping, using tests of non-homogeneity applied to an already published ribosomal RNA data set.
doi:10.1186/1471-2148-8-255
PMCID: PMC2559849  PMID: 18808672
5.  Exploring historical trends using taxonomic name metadata 
Background
Authority and year information have been attached to taxonomic names since Linnaean times. The systematic structure of taxonomic nomenclature facilitates the ability to develop tools that can be used to explore historical trends that may be associated with taxonomy.
Results
From the over 10.7 million taxonomic names that are part of the uBio system [4], approximately 3 million names were identified to have taxonomic authority information from the years 1750 to 2004. A pipe-delimited file was then generated, organized according to a Linnaean hierarchy and by years from 1750 to 2004, and imported into an Excel workbook. A series of macros were developed to create an Excel-based tool and a complementary Web site to explore the taxonomic data. A cursory and speculative analysis of the data reveals observable trends that may be attributable to significant events that are of both taxonomic (e.g., publishing of key monographs) and societal importance (e.g., world wars). The findings also help quantify the number of taxonomic descriptions that may be made available through digitization initiatives.
Conclusion
Temporal organization of taxonomic data can be used to identify interesting biological epochs relative to historically significant events and ongoing efforts. We have developed an Excel workbook and complementary Web site that enables one to explore taxonomic trends for Linnaean taxonomic groupings, from Kingdoms to Families.
doi:10.1186/1471-2148-8-144
PMCID: PMC2408592  PMID: 18477399
6.  ReXSpecies – a tool for the analysis of the evolution of gene regulation across species 
Background
Annotated phylogenetic trees that display the evolution of transcription factor binding in regulatory regions are useful for e.g. 1) narrowing down true positive predicted binding sites, providing predictions for binding sites that can be tested experimentally, and 2) giving insight into the evolution of gene regulation and regulatory networks.
Results
We describe ReXSpecies, a web-server that processes the sequence information of a regulatory region for multiple species and associated (predicted) transcription factor binding sites into two figures: a) An annotated alignment of sequence and binding sites, consolidated and filtered for ease of use, and b) an annotated tree labeled by the gain and loss of binding sites, where the tree can be calculated from the data or taken from a trusted taxonomy, and the labels are calculated based on standard or Dollo parsimony. For genes involved in mammalian pluripotency, ReXSpecies trees highlight useful patterns of transcription factor binding site gain and loss, e.g. for the Oct and Sox group of factors in the 3' untranslated region of the cystic fibrosis transmembrane conductance regulator gene, which closely match experimental data.
Conclusion
ReXSpecies post-processes the information provided by transcription factor binding site prediction tools, in order to compare data from many species. The tool eases visualization and successive interpretation of transcription factor binding data in an evolutionary context. The ReXSpecies URL can be found in the Availability and requirements section.
doi:10.1186/1471-2148-8-111
PMCID: PMC2346469  PMID: 18410675
7.  PhyloFinder: An intelligent search engine for phylogenetic tree databases 
Background
Bioinformatic tools are needed to store and access the rapidly growing phylogenetic data. These tools should enable users to identify existing phylogenetic trees containing a specified taxon or set of taxa and to compare a specified phylogenetic hypothesis to existing phylogenetic trees.
Results
PhyloFinder is an intelligent search engine for phylogenetic databases that we have implemented using trees from TreeBASE. It enables taxonomic queries, in which it identifies trees in the database containing the exact name of the query taxon and/or any synonymous taxon names, and it provides spelling suggestions for the query when there is no match. Additionally, PhyloFinder can identify trees containing descendants or direct ancestors of the query taxon. PhyloFinder also performs phylogenetic queries, in which it identifies trees that contain the query tree or topologies that are similar to the query tree.
Conclusion
PhyloFinder can enhance the utility of any tree database by providing tools for both taxonomic and phylogenetic queries as well as visualization tools that highlight the query results and provide links to NCBI and TBMap. An implementation of PhyloFinder using trees from TreeBASE is available from the web client application found in the availability and requirements section.
doi:10.1186/1471-2148-8-90
PMCID: PMC2362120  PMID: 18366717
8.  PhyloSort: a user-friendly phylogenetic sorting tool and its application to estimating the cyanobacterial contribution to the nuclear genome of Chlamydomonas 
Background
Phylogenomic pipelines generate a large collection of phylogenetic trees that require manual inspection to answer questions about gene or genome evolution. A notable application of phylogenomics is to photosynthetic organelle (plastid) endosymbiosis. In the case of primary endosymbiosis, a heterotrophic protist engulfed a cyanobacterium, giving rise to the first photosynthetic eukaryote. Plastid establishment precipitated extensive gene transfer from the endosymbiont to the nuclear genome of the 'host'. Estimating the magnitude of this endosymbiotic gene transfer (EGT) and determining the functions of the prokaryotic genes remain controversial issues. We used phylogenomics to study EGT in the model green alga Chlamydomonas reinhardtii. To facilitate this procedure, we developed PhyloSort to rapidly search large collections of trees for monophyletic relationships. Here we present PhyloSort and its application to estimating EGT in Chlamydomonas.
Results
PhyloSort is an open-source tool to sort phylogenetic trees by searching for user specified subtrees that contain a monophyletic group of interest defined by operational taxonomic units in a phylogenomic context. Using PhyloSort, we identified 897 Chlamydomonas genes of putative cyanobacterial origin, of which 531 had bootstrap support values ≥ 50% for the grouping of the algal and cyanobacterial homologs.
Conclusion
PhyloSort can be applied to quantify the number of genes that support different evolutionary hypotheses such as a taxonomic classification or endosymbiotic or horizontal gene transfer events. In our application, we demonstrate that cyanobacteria account for 3.5–6% of the protein-coding genes in the nuclear genome of Chlamydomonas.
doi:10.1186/1471-2148-8-6
PMCID: PMC2254586  PMID: 18194581
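The kind of query described in the PhyloSort abstract, sorting trees by whether a group of taxa is monophyletic, can be sketched in Python as follows. This is not PhyloSort's implementation: it assumes rooted trees written as nested tuples with string leaves, ignores bootstrap support, and uses made-up taxon labels. A group is treated as monophyletic when some subtree contains exactly those taxa and no others.

```python
def leaf_set(tree):
    """Return the set of leaf labels below a node (leaves are strings)."""
    if isinstance(tree, str):
        return frozenset([tree])
    leaves = frozenset()
    for child in tree:
        leaves |= leaf_set(child)
    return leaves


def is_monophyletic(tree, group):
    """True if some subtree of the rooted tree has exactly `group` as leaves."""
    group = frozenset(group)
    if leaf_set(tree) == group:
        return True
    if isinstance(tree, str):
        return False
    return any(is_monophyletic(child, group) for child in tree)


def sort_trees(trees, group):
    """Split a collection of trees into those that do and do not support the group."""
    supporting = [t for t in trees if is_monophyletic(t, group)]
    other = [t for t in trees if not is_monophyletic(t, group)]
    return supporting, other


# Hypothetical gene trees: in the first, the algal gene groups with cyanobacteria.
tree_1 = ((("alga", "cyano1"), "cyano2"), ("plant", ("fungus", "animal")))
tree_2 = (("alga", "plant"), (("cyano1", "cyano2"), ("fungus", "animal")))

supporting, other = sort_trees([tree_1, tree_2], {"alga", "cyano1", "cyano2"})
```

The sketch recomputes leaf sets at every node, which is fine for small examples but would need caching for the large tree collections the abstract has in mind.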
9.  Visualization and Exploration of Conserved Regulatory Modules Using ReXSpecies 2 
Background
The prediction of transcription factor binding sites is difficult for many reasons. Thus, filtering methods are needed to enrich for biologically relevant (true positive) matches in the large amount of computational predictions that are frequently generated from promoter sequences.
Results
ReXSpecies 2 filters predictions of transcription factor binding sites and generates a set of figures displaying them in evolutionary context. More specifically, it uses position specific scoring matrices to search for motifs that specify transcription factor binding sites. It removes redundant matches and filters the remaining matches by the phylogenetic group that the matrices belong to. It then identifies potential transcriptional modules, and generates figures that highlight such modules, taking evolution into consideration. Module formation, scoring by evolutionary criteria and visual clues reduce the amount of predictions to a manageable scale. Identification of transcription factor binding sites of particular functional importance is left to expert filtering. ReXSpecies 2 interacts with genome browsers to enable scientists to filter predictions together with other sequence-related data.
Conclusions
Based on ReXSpecies 2, we derive plausible hypotheses about the regulation of pluripotency. Our tool is designed to analyze transcription factor binding site predictions considering their common pattern of occurrence, highlighting their evolutionary history.
doi:10.1186/1471-2148-11-267
PMCID: PMC3203875  PMID: 21942985
10.  PhyDesign: an online application for profiling phylogenetic informativeness 
Background
The rapid increase in the number of sequenced genomes for species across the tree of life is revealing a diverse suite of orthologous genes that could potentially be employed to inform molecular phylogenetic studies that encompass broader taxonomic sampling. Optimal usage of this diversity of loci requires user-friendly tools to facilitate widespread cost-effective locus prioritization for phylogenetic sampling. The Townsend (2007) phylogenetic informativeness provides a unique empirical metric for guiding marker selection. However, no software or automated methodology to evaluate sequence alignments and estimate the phylogenetic informativeness metric has been available.
Results
Here, we present PhyDesign, a platform-independent online application that implements the Townsend (2007) phylogenetic informativeness analysis, providing a quantitative prediction of the utility of loci to solve specific phylogenetic questions. An easy-to-use interface facilitates uploading of alignments and ultrametric trees to calculate and depict profiles of informativeness over specified time ranges, and provides rankings of locus prioritization for epochs of interest.
Conclusions
By providing these profiles, PhyDesign facilitates locus prioritization increasing the efficiency of sequencing for phylogenetic purposes compared to traditional studies with more laborious and low capacity screening methods, as well as increasing the accuracy of phylogenetic studies. Together with a manual and sample files, the application is freely accessible at http://phydesign.townsend.yale.edu.
doi:10.1186/1471-2148-11-152
PMCID: PMC3124428  PMID: 21627831
11.  A novel web-based TinT application and the chronology of the Primate Alu retroposon activity 
Background
DNA sequences afford access to the evolutionary pathways of life. Particularly mobile elements that constantly co-evolve in genomes encrypt recent and ancient information of their host's history. In mammals there is an extraordinarily abundant activity of mobile elements that occurs in a dynamic succession of active families, subfamilies, types, and subtypes of retroposed elements. The high frequency of retroposons in mammals implies that, by chance, such elements also insert into each other. While inactive elements are no longer able to retropose, active elements retropose by chance into other active and inactive elements. Thousands of such directional, element-in-element insertions are found in present-day genomes. To help analyze these events, we developed a computational algorithm (Transpositions in Transpositions, or TinT) that examines the different frequencies of nested transpositions and reconstructs the chronological order of retroposon activities.
Results
By examining the different frequencies of such nested transpositions, the TinT application reconstructs the chronological order of retroposon activities. We use such activity patterns as a comparative tool to (1) delineate the historical rise and fall of retroposons and their relations to each other, (2) understand the retroposon-induced complexity of recent genomes, and (3) find selective informative homoplasy-free markers of phylogeny. The efficiency of the new application is demonstrated by applying it to dimeric Alu Short INterspersed Elements (SINE) to derive a complete chronology of such elements in primates.
Conclusion
The user-friendly, web-based TinT interface presented here affords an easy, automated screening for nested transpositions from genome assemblies or trace data, assembles them in a frequency-matrix, and schematically displays their chronological activity history.
doi:10.1186/1471-2148-10-376
PMCID: PMC3014933  PMID: 21126360
12.  BMGE (Block Mapping and Gathering with Entropy): a new software for selection of phylogenetic informative regions from multiple sequence alignments 
Background
The quality of multiple sequence alignments plays an important role in the accuracy of phylogenetic inference. It has been shown that removing ambiguously aligned regions, but also other sources of bias such as highly variable (saturated) characters, can improve the overall performance of many phylogenetic reconstruction methods. A current scientific trend is to build phylogenetic trees from a large number of sequence datasets (semi-)automatically extracted from numerous complete genomes. Because these approaches do not allow a precise manual curation of each dataset, there exists a real need for efficient bioinformatic tools dedicated to this alignment character trimming step.
Results
We present new software, BMGE (Block Mapping and Gathering with Entropy), designed to select regions in a multiple sequence alignment that are suited for phylogenetic inference. For each character, BMGE computes a score closely related to an entropy value. Calculation of these entropy-like scores is weighted with BLOSUM or PAM similarity matrices in order to distinguish between biologically expected and unexpected variability for each aligned character. Sets of contiguous characters with a score above a given threshold are considered unsuited for phylogenetic inference and are removed. Simulation analyses show that the character trimming performed by BMGE produces datasets leading to accurate trees, especially with alignments including distantly related sequences. BMGE also implements trimming and recoding methods aimed at minimizing phylogeny reconstruction artefacts due to compositional heterogeneity.
Conclusions
BMGE is able to perform biologically relevant trimming on a multiple alignment of DNA, codon or amino acid sequences. Java source code and executable are freely available at ftp://ftp.pasteur.fr/pub/GenSoft/projects/BMGE/.
doi:10.1186/1471-2148-10-210
PMCID: PMC3017758  PMID: 20626897
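The trimming idea in the BMGE abstract, scoring each aligned character and discarding those above a threshold, can be illustrated with a deliberately simplified Python sketch. It uses plain Shannon entropy, whereas BMGE weights its scores with BLOSUM or PAM matrices and works on contiguous blocks, so this is a stand-in technique and not the program's method. The function names and the toy alignment are assumptions.

```python
import math
from collections import Counter


def column_entropy(column):
    """Shannon entropy (bits) of one alignment column, ignoring gaps.

    All-gap columns get infinite entropy so that they are always trimmed.
    """
    residues = [c for c in column if c != "-"]
    if not residues:
        return float("inf")
    total = len(residues)
    counts = Counter(residues)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


def trim_alignment(sequences, max_entropy):
    """Keep only the columns whose entropy does not exceed max_entropy."""
    columns = list(zip(*sequences))
    keep = [i for i, col in enumerate(columns)
            if column_entropy(col) <= max_entropy]
    return ["".join(seq[i] for i in keep) for seq in sequences]


# Toy alignment: columns 0 and 3 are constant, columns 1 and 2 are variable.
alignment = ["ACGT", "ACCT", "ATAT", "AGTT"]
trimmed = trim_alignment(alignment, max_entropy=1.0)
```

Plain entropy treats every substitution as equally surprising; weighting by a similarity matrix, as the abstract describes, is what lets a method tolerate conservative replacements while still flagging saturated positions.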
13.  WAMI: a web server for the analysis of minisatellite maps 
Background
Minisatellites are genomic loci composed of tandem arrays of short repetitive DNA segments. A minisatellite map is a sequence of symbols that represents the tandem repeat array such that the set of symbols is in one-to-one correspondence with the set of distinct repeats. Due to variations in repeat type and organization as well as copy number, the minisatellite maps have been widely used in forensic and population studies. In either domain, researchers need to compare the set of maps to each other, to build phylogenetic trees, to spot structural variations, and to study duplication dynamics. Efficient algorithms for these tasks are required to carry them out reliably and in reasonable time.
Results
In this paper we present WAMI, a web-server for the analysis of minisatellite maps. It performs the above mentioned computational tasks using efficient algorithms that take the model of map evolution into account. The WAMI interface is easy to use and the results of each analysis task are visualized.
Conclusions
To the best of our knowledge, WAMI is the first server providing all these computational facilities to the minisatellite community. The WAMI web-interface and the source code of the underlying programs are available at http://www.nubios.nileu.edu.eg/tools/wami.
doi:10.1186/1471-2148-10-167
PMCID: PMC2897807  PMID: 20525398
14.  SPRIT: Identifying horizontal gene transfer in rooted phylogenetic trees 
Background
Phylogenetic trees based on sequences from a set of taxa can be incongruent due to horizontal gene transfer (HGT). By identifying the HGT events, we can reconcile the gene trees and derive a taxon tree that adequately represents the species' evolutionary history. One HGT can be represented by a rooted Subtree Prune and Regraft (RSPR) operation and the number of RSPRs separating two trees corresponds to the minimum number of HGT events. Identifying the minimum number of RSPRs separating two trees is NP-hard, but the problem is fixed-parameter tractable. A number of heuristic and two exact approaches to identifying the minimum number of RSPRs have been proposed. This is the first implementation delivering an exact solution as well as the intermediate trees connecting the input trees.
Results
We present the SPR Identification Tool (SPRIT), a novel algorithm that solves the fixed-parameter tractable minimum RSPR problem, and its GPL-licensed Java implementation. The algorithm can be used in two ways: an exhaustive search that guarantees the minimum RSPR distance, and a heuristic approach that guarantees finding a solution, but not necessarily the minimum one. We benchmarked SPRIT against other software in two different settings: small to medium-sized trees, i.e. five to one hundred taxa, and large trees, i.e. thousands of taxa. In the small to medium tree size setting with random artificial incongruence, SPRIT's heuristic mode outperforms the other software by always delivering a solution with a low overestimation of the RSPR distance. In the large tree setting SPRIT compares well to the alternatives when benchmarked on finding a minimum solution within a reasonable time. SPRIT presents both the minimum RSPR distance and the intermediate trees.
Conclusions
When used in exhaustive search mode, SPRIT identifies the minimum number of RSPRs needed to reconcile two incongruent rooted trees. SPRIT also performs quick approximations of the minimum RSPR distance, which are comparable to, and often better than, purely heuristic solutions. Put together, SPRIT is an excellent tool for identification of HGT events and pinpointing which taxa have been involved in HGT.
doi:10.1186/1471-2148-10-42
PMCID: PMC2829038  PMID: 20152048
15.  BLAST-EXPLORER helps you building datasets for phylogenetic analysis 
Background
The right sampling of homologous sequences for phylogenetic or molecular evolution analyses is a crucial step, the quality of which can have a significant impact on the final interpretation of the study. There is no single way of constructing datasets suitable for phylogenetic analysis, because this task depends closely on the scientific question being addressed. Moreover, database mining software such as BLAST, which is routinely used for searching homologous sequences, is not specifically optimized for this task.
Results
To fill this gap, we designed BLAST-Explorer, an original and friendly web-based application that combines a BLAST search with a suite of tools that allows interactive, phylogenetic-oriented exploration of the BLAST results and flexible selection of homologous sequences among the BLAST hits. Once the selection of the BLAST hits is done using BLAST-Explorer, the corresponding sequence can be imported locally for external analysis or passed to the phylogenetic tree reconstruction pipelines available on the Phylogeny.fr platform.
Conclusions
BLAST-Explorer provides a simple, intuitive and interactive graphical representation of the BLAST results and allows selection and retrieval of the BLAST hit sequences based on a wide range of criteria. Although BLAST-Explorer primarily aims at helping the construction of sequence datasets for further phylogenetic study, it can also be used as a standard BLAST server with enriched output. BLAST-Explorer is available at http://www.phylogeny.fr
doi:10.1186/1471-2148-10-8
PMCID: PMC2821324  PMID: 20067610
16.  PROCOV: maximum likelihood estimation of protein phylogeny under covarion models and site-specific covarion pattern analysis 
Background
The covarion hypothesis of molecular evolution holds that selective pressures on a given amino acid or nucleotide site are dependent on the identity of other sites in the molecule that change throughout time, resulting in changes of evolutionary rates of sites along the branches of a phylogenetic tree. At the sequence level, covarion-like evolution at a site manifests as conservation of nucleotide or amino acid states among some homologs where the states are not conserved in other homologs (or groups of homologs). Covarion-like evolution has been shown to relate to changes in functions at sites in different clades, and, if ignored, can adversely affect the accuracy of phylogenetic inference.
Results
PROCOV (protein covarion analysis) is a software tool that implements a number of previously proposed covarion models of protein evolution for phylogenetic inference in a maximum likelihood framework. Several algorithmic and implementation improvements in this tool over previous versions make computationally expensive tree searches with covarion models more efficient and analyses of large phylogenomic data sets tractable. PROCOV can be used to identify covarion sites by comparing the site likelihoods under the covarion process to the corresponding site likelihoods under a rates-across-sites (RAS) process. Those sites with the greatest log-likelihood difference between a 'covarion' and an RAS process were found to be of functional or structural significance in a dataset of bacterial and eukaryotic elongation factors.
Conclusion
Covarion models implemented in PROCOV may be especially useful for phylogenetic estimation when ancient divergences between sequences have occurred and rates of evolution at sites are likely to have changed over the tree. It can also be used to study lineage-specific functional shifts in protein families that result in changes in the patterns of site variability among subtrees.
doi:10.1186/1471-2148-9-225
PMCID: PMC2758850  PMID: 19737395
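The site-ranking step described in the PROCOV abstract, comparing per-site log-likelihoods under a covarion process with those under a rates-across-sites (RAS) process, reduces to a difference and a sort once the site likelihoods are available. The Python sketch below assumes the two lists of per-site log-likelihoods have already been produced by some external program; the function name and the numbers are made up for illustration and do not reflect PROCOV's interface or output.

```python
def rank_covarion_sites(lnl_covarion, lnl_ras, top=5):
    """Rank sites by log-likelihood gain of the covarion model over RAS.

    Returns up to `top` (site_index, difference) pairs, largest gain first.
    Site indices are 0-based positions in the input lists.
    """
    if len(lnl_covarion) != len(lnl_ras):
        raise ValueError("both models must report the same number of sites")
    differences = [
        (site, cov - ras)
        for site, (cov, ras) in enumerate(zip(lnl_covarion, lnl_ras))
    ]
    differences.sort(key=lambda pair: pair[1], reverse=True)
    return differences[:top]


# Made-up per-site log-likelihoods for three sites.
covarion_lnl = [-2.0, -1.0, -3.0]
ras_lnl = [-2.5, -4.0, -3.0]
ranking = rank_covarion_sites(covarion_lnl, ras_lnl)
```

Sites at the top of such a ranking are candidates for covarion-like behaviour; whether a given difference is meaningful is a separate statistical question that this sketch does not address.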
17.  PhyloExplorer: a web server to validate, explore and query phylogenetic trees 
Background
Many important problems in evolutionary biology require molecular phylogenies to be reconstructed. Phylogenetic trees must then be manipulated for subsequent inclusion in publications or analyses such as supertree inference and tree comparisons. However, no tool is currently available to facilitate the management of tree collections providing, for instance: standardisation of taxon names among trees with respect to a reference taxonomy; selection of relevant subsets of trees or sub-trees according to a taxonomic query; or simply computation of descriptive statistics on the collection. Moreover, although several databases of phylogenetic trees exist, there is currently no easy way to find trees that are both relevant and complementary to a given collection of trees.
Results
We propose a tool to facilitate assessment and management of phylogenetic tree collections. Given an input collection of rooted trees, PhyloExplorer provides facilities for obtaining statistics describing the collection, correcting invalid taxon names, extracting taxonomically relevant parts of the collection using a dedicated query language, and identifying related trees in the TreeBASE database.
Conclusion
PhyloExplorer is a simple and interactive website implemented through underlying Python libraries and MySQL databases. It is available at: and the source code can be downloaded from: .
doi:10.1186/1471-2148-9-108
PMCID: PMC2695458  PMID: 19450253
18.  BEAST: Bayesian evolutionary analysis by sampling trees 
Background
The evolutionary analysis of molecular sequence variation is a statistical enterprise. This is reflected in the increased use of probabilistic models for phylogenetic inference, multiple sequence alignment, and molecular population genetics. Here we present BEAST: a fast, flexible software architecture for Bayesian analysis of molecular sequences related by an evolutionary tree. A large number of popular stochastic models of sequence evolution are provided and tree-based models suitable for both within- and between-species sequence data are implemented.
Results
BEAST version 1.4.6 consists of 81000 lines of Java source code, 779 classes and 81 packages. It provides models for DNA and protein sequence evolution, highly parametric coalescent analysis, relaxed clock phylogenetics, non-contemporaneous sequence data, statistical alignment and a wide range of options for prior distributions. BEAST source code is object-oriented, modular in design and freely available at under the GNU LGPL license.
Conclusion
BEAST is a powerful and flexible evolutionary analysis package for molecular sequence variation. It also provides a resource for the further development of new models and statistical methods of evolutionary analysis.
doi:10.1186/1471-2148-7-214
PMCID: PMC2247476  PMID: 17996036
19.  SCaFoS: a tool for Selection, Concatenation and Fusion of Sequences for phylogenomics 
BMC Evolutionary Biology  2007;7(Suppl 1):S2.
Background
Phylogenetic analyses based on datasets rich in both genes and species (phylogenomics) are becoming a standard approach to resolve evolutionary questions. However, several difficulties are associated with the assembly of large datasets, such as multiple copies of a gene per species (paralogous or xenologous genes), lack of some genes for a given species, or partial sequences. The use of undetected paralogous or xenologous genes in phylogenetic inference can lead to inaccurate results, and the use of partial sequences to a lack of resolution. A tool that selects sequences, species, and genes, while dealing with these issues, is needed in a phylogenomics context.
Results
Here, we present SCaFoS, a tool that quickly assembles phylogenomic datasets containing maximal phylogenetic information while adjusting the amount of missing data in the selection of species, sequences and genes. Starting from individual sequence alignments, and using monophyletic groups defined by the user, SCaFoS creates chimeras with partial sequences, or selects, among multiple sequences, the orthologous and/or slowest evolving sequences. Once sequences representing each predefined monophyletic group have been selected, SCaFoS retains genes according to the user's allowed level of missing data and generates files for super-matrix and super-tree analyses in several formats compatible with standard phylogenetic inference software. Because no clear-cut criteria exist for the sequence selection, a semi-automatic mode is available to accommodate the user's expertise.
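The two operations described above, building a chimera from partial sequences and retaining genes by their level of missing data, can be illustrated with a minimal Python sketch. This is not SCaFoS code: the function names, the gap character, the 25% threshold and the toy data are all assumptions made for illustration, and the real tool's selection criteria are richer than this.

```python
def make_chimera(seq_a, seq_b, gap="-"):
    """Merge two aligned partial sequences, taking seq_a wherever it has a residue.

    Hypothetical helper: assumes both sequences come from the same alignment
    and therefore have equal length.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("aligned sequences must have the same length")
    return "".join(a if a != gap else b for a, b in zip(seq_a, seq_b))


def missing_fraction(alignment, groups):
    """Fraction of predefined groups with no sequence in this gene alignment."""
    absent = [g for g in groups if not alignment.get(g)]
    return len(absent) / len(groups)


def retain_genes(alignments, groups, max_missing=0.25):
    """Keep genes whose share of missing groups does not exceed max_missing."""
    return sorted(
        gene
        for gene, aln in alignments.items()
        if missing_fraction(aln, groups) <= max_missing
    )


# Toy data (made up): four groups, three genes with decreasing coverage.
GROUPS = ["sp1", "sp2", "sp3", "sp4"]
ALIGNMENTS = {
    "geneA": {"sp1": "ACGT", "sp2": "ACGA", "sp3": "ACGG", "sp4": "ACGC"},
    "geneB": {"sp1": "TTGA", "sp2": "TTGC", "sp3": "TTGG"},
    "geneC": {"sp1": "GGCA", "sp2": "GGCC"},
}

chimera = make_chimera("AC--", "--GT")
kept = retain_genes(ALIGNMENTS, GROUPS)
```

With these toy inputs, geneC is missing for half of the groups and is dropped, while geneA and geneB pass the assumed threshold.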
Conclusion
SCaFoS is able to deal with datasets of hundreds of species and genes, at both the amino acid and nucleotide levels. It has a graphical interface and can be integrated in an automatic workflow. Moreover, SCaFoS is the first tool that integrates the user's knowledge to select orthologous sequences, creates chimerical sequences to reduce missing data and selects genes according to their level of missing data. Finally, applying SCaFoS to different datasets, we show that the judicious selection of genes, species and sequences reduces tree reconstruction artefacts, especially if the dataset includes fast evolving species.
doi:10.1186/1471-2148-7-S1-S2
PMCID: PMC1796611  PMID: 17288575
20.  PHOG-BLAST – a new generation tool for fast similarity search of protein families 
Background
The need to compare protein profiles frequently arises in various protein research areas: comparison of protein families, domain searches, resolution of orthology and paralogy. The existing fast algorithms can only compare a protein sequence with a protein sequence and a profile with a sequence. Algorithms to compare profiles use dynamic programming and complex scoring functions.
Results
We developed a new algorithm called PHOG-BLAST for fast similarity search of profiles. This algorithm uses profile discretization to convert a profile to a finite alphabet and utilizes hashing for fast search. To determine the optimal alphabet, we analyzed columns in reliable multiple alignments and obtained column clusters in the 20-dimensional profile space by applying a special clustering procedure. We show that the clustering procedure works best if its parameters are chosen so that 20 profile clusters are obtained, which can be interpreted as ancestral amino acid residues. With these clusters, fewer than 2% of columns in multiple alignments fall outside the clusters. We tested the performance of PHOG-BLAST vs. PSI-BLAST on three well-known databases of multiple alignments: COG, PFAM and BALIBASE. On the COG database both algorithms showed the same performance, whereas on PFAM and BALIBASE PHOG-BLAST was much superior to PSI-BLAST. PHOG-BLAST required 10–20 times less computer memory and computation time than PSI-BLAST.
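The general idea of discretizing profile columns into a finite alphabet and then hashing short words for fast lookup can be sketched in Python as follows. This is an illustrative sketch only, not the PHOG-BLAST implementation: the three toy centroids (the abstract describes 20 clusters in a 20-dimensional space), the squared Euclidean distance, the word length k=2 and all names are assumptions.

```python
from collections import defaultdict

# Hypothetical cluster centroids over a toy 3-dimensional profile space.
CENTROIDS = {
    "a": (0.8, 0.1, 0.1),
    "b": (0.1, 0.8, 0.1),
    "c": (0.1, 0.1, 0.8),
}


def nearest_letter(column):
    """Return the letter of the centroid closest to a profile column."""
    return min(
        CENTROIDS,
        key=lambda k: sum((x - y) ** 2 for x, y in zip(column, CENTROIDS[k])),
    )


def discretize(profile):
    """Convert a profile (list of frequency columns) into a string of letters."""
    return "".join(nearest_letter(col) for col in profile)


def build_index(strings, k=2):
    """Hash every k-letter word to the (profile id, offset) pairs where it occurs."""
    index = defaultdict(list)
    for pid, s in strings.items():
        for i in range(len(s) - k + 1):
            index[s[i:i + k]].append((pid, i))
    return index


def seed_hits(query, index, k=2):
    """Count, per database profile, the k-letter words shared with the query."""
    hits = defaultdict(int)
    for i in range(len(query) - k + 1):
        for pid, _ in index.get(query[i:i + k], []):
            hits[pid] += 1
    return dict(hits)


# Toy database of two profiles and one query profile (made-up frequencies).
DATABASE = {
    "fam1": [(0.9, 0.05, 0.05), (0.1, 0.7, 0.2), (0.0, 0.1, 0.9)],
    "fam2": [(0.2, 0.2, 0.6), (0.1, 0.1, 0.8), (0.7, 0.2, 0.1)],
}
QUERY = [(0.7, 0.2, 0.1), (0.2, 0.7, 0.1), (0.1, 0.2, 0.7)]

db_strings = {pid: discretize(p) for pid, p in DATABASE.items()}
index = build_index(db_strings)
hits = seed_hits(discretize(QUERY), index)
```

Once profiles are reduced to strings, word lookup is a dictionary access rather than a dynamic-programming comparison, which is the kind of trade-off between speed and precision the abstract describes.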
Conclusion
Since PHOG-BLAST can compare multiple alignments of protein families, it can be used in different areas of comparative proteomics and protein evolution. For example, PHOG-BLAST helped to build the PHOG database of phylogenetic orthologous groups. An essential step in building this database was comparing protein complements of different species and orthologous groups of different taxa on a personal computer in reasonable time. When it is applied to detect weak similarity between protein families, PHOG-BLAST is less precise than rigorous profile-profile comparison methods, though it runs much faster and can be used as a hit pre-selection tool.
doi:10.1186/1471-2148-6-51
PMCID: PMC1522020  PMID: 16792802
21.  JMATING: a software for the analysis of sexual selection and sexual isolation effects from mating frequency data 
Background
Many different sexual isolation and sexual selection statistics have been proposed in the past. However, there is no available software that implements all these statistical estimators and their corresponding tests for the study of mating behaviour.
Results
JMATING is an easy-to-use program developed in Java for the analysis of mating frequency data to study sexual selection and sexual isolation effects in laboratory experiments as well as in descriptive studies carried out in the wild. The software allows the data to be reorganized prior to the analysis, computes the most important estimators, and provides a battery of complementary statistical tests.
Conclusion
JMATING is the first complete and versatile software for the analyses of mating frequency data. It is available at and requires the Java runtime environment.
doi:10.1186/1471-2148-6-40
PMCID: PMC1479385  PMID: 16684352
22.  PATRISTIC: a program for calculating patristic distances and graphically comparing the components of genetic change 
Background
Phylogenies are commonly used to analyse the differences between genes, genomes and species. Patristic distances calculated from tree branch lengths describe the amount of genetic change represented by a tree and are commonly compared with other measures of mutation to investigate the substitutional processes or the goodness of fit of a tree to the raw data. Until now, no universal tool has been available for calculating patristic distances and correlating them with other genetic distance measures.
Results
PATRISTIC v1.0 is a Java program that calculates patristic distances from large trees in a range of file formats and allows graphical and statistical interpretation of distance matrices calculated by other programs.
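A patristic distance between two tips is the sum of the branch lengths along the path connecting them in the tree. The following minimal Python sketch shows that calculation on a toy rooted tree. It is not PATRISTIC's code: the parent-map representation, node names and branch lengths are assumptions chosen for illustration, and branch lengths are assumed to be non-negative.

```python
from itertools import combinations

# Hypothetical toy tree: node -> (parent, length of the branch to the parent).
# The root has parent None. All names and lengths are made up.
TREE = {
    "root": (None, 0.0),
    "n1": ("root", 0.1),
    "A": ("n1", 0.2),
    "B": ("n1", 0.3),
    "C": ("root", 0.5),
}


def distances_to_ancestors(tree, node):
    """Return {ancestor: path length from node to it}, including node itself."""
    out = {node: 0.0}
    dist = 0.0
    while tree[node][0] is not None:
        parent, length = tree[node]
        dist += length
        out[parent] = dist
        node = parent
    return out


def patristic_distance(tree, a, b):
    """Sum of branch lengths on the path between tips a and b.

    The path goes through the most recent common ancestor, which is the
    shared ancestor minimising the summed distance (non-negative lengths).
    """
    up_a = distances_to_ancestors(tree, a)
    up_b = distances_to_ancestors(tree, b)
    return min(up_a[n] + up_b[n] for n in up_a if n in up_b)


def patristic_matrix(tree):
    """Patristic distance for every pair of tips, keyed by (tip, tip)."""
    parents = {parent for parent, _ in tree.values()}
    tips = sorted(n for n in tree if n not in parents)
    return {(a, b): patristic_distance(tree, a, b) for a, b in combinations(tips, 2)}


matrix = patristic_matrix(TREE)
```

A matrix of this kind is what would then be compared with other genetic distance measures, for example by plotting one against the other.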
Conclusion
The software overcomes some logistical barriers to analysing signals in sequences. In addition to calculating patristic distances, it provides plots for any combination of matrices, calculates commonly used statistics, allows data such as isolation dates to be entered and reorders matrices with matching species or gene labels. It will be used to analyse rates of mutation and substitutional saturation and the evolution of viruses. It is available at and requires the Java runtime environment.
doi:10.1186/1471-2148-6-1
PMCID: PMC1352388  PMID: 16388682
23.  Deduction of probable events of lateral gene transfer through comparison of phylogenetic trees by recursive consolidation and rearrangement 
Background
When organismal phylogenies based on sequences of single marker genes are poorly resolved, a logical approach is to add more markers, on the assumption that weak but congruent phylogenetic signal will be reinforced in such multigene trees. Such approaches are valid only when the several markers indeed have identical phylogenies, an issue which many multigene methods (such as the use of concatenated gene sequences or the assembly of supertrees) do not directly address. Indeed, even when the true history is a mixture of vertical descent for some genes and lateral gene transfer (LGT) for others, such methods produce unique topologies.
Results
We have developed software that aims to extract evidence for vertical and lateral inheritance from a set of gene trees compared against an arbitrary reference tree. This evidence is then displayed as a synthesis showing support over the tree for vertical inheritance, overlaid with explicit lateral gene transfer (LGT) events inferred to have occurred over the history of the tree. Like splits-tree methods, one can thus identify nodes at which conflict occurs. Additionally one can make reasonable inferences about vertical and lateral signal, assigning putative donors and recipients.
Conclusion
A tool such as ours can serve to explore the reticulated dimensionality of molecular evolution, by dissecting vertical and lateral inheritance at high resolution. By this, we mean that individual nodes can be examined not only for congruence, but also for coherence in light of LGT. We assert that our tools will facilitate the comparison of phylogenetic trees, and the interpretation of conflicting data.
doi:10.1186/1471-2148-5-27
PMCID: PMC1087482  PMID: 15819979
24.  TREEFINDER: a powerful graphical analysis environment for molecular phylogenetics 
Background
Most analysis programs for inferring molecular phylogenies are difficult to use, in particular for researchers with little programming experience.
Results
TREEFINDER is an easy-to-use, integrative, platform-independent analysis environment for molecular phylogenetics. In this paper the main features of TREEFINDER (version of April 2004) are described. TREEFINDER is written in ANSI C and Java and implements powerful statistical approaches for inferring gene trees and for related analyses. In addition, it provides a user-friendly graphical interface and a phylogenetic programming language.
Conclusions
TREEFINDER is a versatile framework for analyzing phylogenetic data across different platforms that is suited to both exploratory and advanced studies.
doi:10.1186/1471-2148-4-18
PMCID: PMC459214  PMID: 15222900
