1.  EREM: Parameter Estimation and Ancestral Reconstruction by Expectation-Maximization Algorithm for a Probabilistic Model of Genomic Binary Characters Evolution 
Advances in Bioinformatics  2010;2010:167408.
Evolutionary binary characters are features of species or genes, indicating the absence (value zero) or presence (value one) of some property. Examples include eukaryotic gene architecture (the presence or absence of an intron in a particular locus), gene content, and morphological characters. In many studies, the acquisition of such binary characters is assumed to represent a rare evolutionary event, and consequently, their evolution is analyzed using various flavors of parsimony. However, when gain and loss of the character are not rare enough, a probabilistic analysis becomes essential. Here, we present a comprehensive probabilistic model to describe the evolution of binary characters on a bifurcating phylogenetic tree. A fast software tool, EREM, is provided, using maximum likelihood to estimate the parameters of the model and to reconstruct ancestral states (presence and absence in internal nodes) and events (gain and loss events along branches).
PMCID: PMC2866244  PMID: 20467467
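To make the idea in the abstract above concrete, here is a minimal sketch of how the likelihood of one binary character can be computed on a bifurcating tree under a two-state continuous-time Markov model. This is not the EREM model or its EM procedure: the single gain and loss rates, the stationary root prior, the tuple-based tree encoding, and every function name are assumptions made for the illustration.

```python
import math

def transition_matrix(gain, loss, t):
    """2x2 transition probabilities of a two-state Markov chain over time t.

    State 0 = absent, state 1 = present (assumed encoding).
    """
    total = gain + loss
    decay = math.exp(-total * t)
    p01 = gain / total * (1.0 - decay)  # absent -> present (gain)
    p10 = loss / total * (1.0 - decay)  # present -> absent (loss)
    return [[1.0 - p01, p01], [p10, 1.0 - p10]]

def partial_likelihoods(node, gain, loss):
    """Return [L(state 0), L(state 1)] for the subtree rooted at node.

    A leaf is an int (0 or 1); an internal node is a tuple
    (left_child, left_branch_length, right_child, right_branch_length).
    """
    if isinstance(node, int):
        return [1.0 if node == s else 0.0 for s in (0, 1)]
    left, t_left, right, t_right = node
    result = [1.0, 1.0]
    for child, t in ((left, t_left), (right, t_right)):
        p = transition_matrix(gain, loss, t)
        lc = partial_likelihoods(child, gain, loss)
        for s in (0, 1):
            # Sum over the child's possible states (pruning recursion).
            result[s] *= p[s][0] * lc[0] + p[s][1] * lc[1]
    return result

def tree_likelihood(tree, gain, loss):
    """Likelihood of the observed leaf states, with a stationary root prior."""
    total = gain + loss
    prior = [loss / total, gain / total]
    part = partial_likelihoods(tree, gain, loss)
    return prior[0] * part[0] + prior[1] * part[1]

# Hypothetical three-leaf tree ((1, 0), 1) with arbitrary branch lengths.
example = ((1, 0.3, 0, 0.5), 0.2, 1, 0.7)
print(tree_likelihood(example, gain=0.4, loss=1.1))
```

A maximum-likelihood estimate would then be obtained by maximizing the product of such likelihoods over all characters with respect to the rates, for example with a numerical optimizer or an EM scheme.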
2.  Evolution of diverse cell division and vesicle formation systems in Archaea 
Nature Reviews. Microbiology  2010;8(10):731-741.
Recently a novel cell division system comprised of homologues of eukaryotic ESCRT-III (endosomal sorting complex required for transport III) proteins was discovered in the hyperthermophilic crenarchaeote Sulfolobus acidocaldarius. On the basis of this discovery, we undertook a comparative genomic analysis of the machineries for cell division and vesicle formation in Archaea. Archaea possess at least three distinct membrane remodelling systems: the FtsZ-based bacterial-type system, the ESCRT-III-based eukaryote-like system and a putative novel system that uses an archaeal actin-related protein. Many archaeal genomes encode assortments of components from different systems. Evolutionary reconstruction from these findings suggests that the last common ancestor of the extant Archaea possessed a complex membrane remodelling apparatus, different components of which were lost during subsequent evolution of archaeal lineages. By contrast, eukaryotes seem to have inherited all three ancestral systems.
PMCID: PMC3293450  PMID: 20818414
3.  The Incredible Expanding Ancestor of Eukaryotes 
Cell  2010;140(5):606-608.
Comparing the genome sequences of free-living organisms in the five eukaryotic supergroups enables predictions to be made about the genome of the last common ancestor of eukaryotes. The genome sequence of the amoeboflagellate Naegleria gruberi reported by Fritz-Laylin et al. (2010) reveals the surprising complexity of this unicellular organism and, by inference, of the last common eukaryotic ancestor.
PMCID: PMC3293451  PMID: 20211127
4.  The wonder world of microbial viruses 
The first congress on Viruses of Microbes took place at the Institut Pasteur in Paris, France, on 21–25 June 2010. The advances in genomics and metagenomics reported at this meeting reveal striking and unexpected complexity of the virus world. Viruses, in particular viruses that infect prokaryotes and unicellular eukaryotes, are emerging as the most abundant class of biological entities on earth and a major evolutionary and geochemical force.
PMCID: PMC3293457  PMID: 20954874
5.  Constraints and plasticity in genome and molecular-phenome evolution 
Nature Reviews. Genetics  2010;11(7):487-498.
Multiple constraints variously affect different parts of the genomes of diverse life forms. The selective pressures that shape the evolution of viral, archaeal, bacterial and eukaryotic genomes differ markedly, even among relatively closely related animal and bacterial lineages; by contrast, constraints affecting protein evolution seem to be more universal. The constraints that shape the evolution of genomes and phenomes are complemented by the plasticity and robustness of genome architecture, expression and regulation. Taken together, these findings are starting to reveal complex networks of evolutionary processes that must be integrated to attain a new synthesis of evolutionary biology.
PMCID: PMC3273317  PMID: 20548290
6.  Evolution of AANAT: expansion of the gene family in the cephalochordate amphioxus 
The arylalkylamine N-acetyltransferase (AANAT) family is divided into structurally distinct vertebrate and non-vertebrate groups. Expression of vertebrate AANATs is limited primarily to the pineal gland and retina, where it plays a role in controlling the circadian rhythm in melatonin synthesis. Based on the role melatonin plays in biological timing, AANAT has been given the moniker "the Timezyme". Non-vertebrate AANATs, which occur in fungi and protists, are thought to play a role in detoxification and are not known to be associated with a specific tissue.
We have found that the amphioxus genome contains seven AANATs, all having non-vertebrate type features. This and the absence of AANATs from the genomes of Hemichordates and Urochordates support the view that a major transition in the evolution of the AANATs may have occurred at the onset of vertebrate evolution. Analysis of the expression pattern of the two most structurally divergent AANATs in Branchiostoma lanceolatum (bl) revealed that they are expressed early in development and also in the adult at low levels throughout the body, possibly associated with the neural tube. Expression is clearly not exclusively associated with the proposed analogs of the pineal gland and retina. blAANAT activity is influenced by environmental lighting, but light/dark differences do not persist under constant light or constant dark conditions, indicating they are not circadian in nature. bfAANATα and bfAANATδ' have unusually alkaline (> 9.0) optimal pH, more than two pH units higher than that of vertebrate AANATs.
The substrate selectivity profiles of bfAANATα and δ' are relatively broad, including alkylamines, arylalkylamines and diamines, in contrast to vertebrate forms, which selectively acetylate serotonin and other arylalkylamines. Based on these features, it appears that amphioxus AANATs could play several roles, including detoxification and biogenic amine inactivation. The presence of seven AANATs in the amphioxus genome supports the view that arylalkylamine and polyamine acetylation is important to the biology of this organism and that these genes evolved in response to specific pressures related to requirements for amine acetylation.
PMCID: PMC2897805  PMID: 20500864
7.  Two new families of the FtsZ-tubulin protein superfamily implicated in membrane remodeling in diverse bacteria and archaea 
Biology Direct  2010;5:33.
Several recent discoveries reveal unexpected versatility of the bacterial and archaeal cytoskeleton systems that are involved in cell division and other processes based on membrane remodeling. Here we apply methods for distant protein sequence similarity detection, phylogenetic approaches, and genome context analysis to describe two previously unnoticed families of the FtsZ-tubulin superfamily. One of these families is limited in its spread to Proteobacteria whereas the other is represented in diverse bacteria and archaea, and might be the key component of a novel, multicomponent membrane remodeling system that also includes a Von Willebrand A domain-containing protein, a distinct GTPase and membrane transport proteins of the OmpA family.
This article was reviewed by Purificación López-García and Gáspár Jékely; for complete reviews, see the Reviewers Reports section.
PMCID: PMC2875224  PMID: 20459678
8.  A low-polynomial algorithm for assembling clusters of orthologous groups from intergenomic symmetric best matches 
Bioinformatics  2010;26(12):1481-1487.
Motivation: Identifying orthologous genes in multiple genomes is a fundamental task in comparative genomics. Construction of intergenomic symmetrical best matches (SymBets) and joining them into clusters is a popular method of ortholog definition, embodied in several software programs. Despite their wide use, the computational complexity of these programs has not been thoroughly examined.
Results: In this work, we show that in the standard approach of iteration through all triangles of SymBets, the memory scales with at least the number of these triangles, O(g^3) (where g = number of genomes), and construction time scales with the iteration through each pair, i.e. O(g^6). We propose the EdgeSearch algorithm that iterates over edges in the SymBet graph rather than triangles of SymBets, and as a result has a worst-case complexity of only O(g^3 log g). Several optimizations reduce the run-time even further in realistically sparse graphs. In two real-world datasets of genomes from bacteriophages (POGs) and Mollicutes (MOGs), an implementation of the EdgeSearch algorithm runs about an order of magnitude faster than the original algorithm and scales much better with increasing number of genomes, with only minor differences in the final results, and up to 60 times faster than the popular OrthoMCL program with a 90% overlap between the identified groups of orthologs.
Availability and implementation: C++ source code freely available for download at
Supplementary information: Supplementary materials are available at Bioinformatics online.
PMCID: PMC2881409  PMID: 20439257
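The notions of symmetric best matches and triangles in the abstract above can be illustrated with a small sketch. It builds SymBet edges from a table of best hits, enumerates triangles by walking over edges, and merges triangles that share a side. This is a naive illustration and not the EdgeSearch algorithm or its implementation: the data layout (genes as `(genome, name)` tuples, a `best_hit` dictionary) and all function names are assumptions.

```python
from itertools import combinations

def symmetric_best_matches(best_hit):
    """best_hit maps (gene, other_genome) -> best-scoring gene in other_genome.

    Returns a set of frozenset edges {a, b} where each gene is the
    other's best hit in the partner genome.
    """
    edges = set()
    for (gene, genome), hit in best_hit.items():
        if gene[0] != genome and hit[0] == genome \
                and best_hit.get((hit, gene[0])) == gene:
            edges.add(frozenset((gene, hit)))
    return edges

def triangles(edges):
    """Enumerate triangles by intersecting neighbor sets of each edge's ends."""
    neighbors = {}
    for edge in edges:
        a, b = tuple(edge)
        neighbors.setdefault(a, set()).add(b)
        neighbors.setdefault(b, set()).add(a)
    found = set()
    for edge in edges:
        a, b = tuple(edge)
        for c in neighbors[a] & neighbors[b]:
            found.add(frozenset((a, b, c)))
    return found

def merge_triangles(tris):
    """Merge triangles that share a side into clusters (union-find on sides)."""
    parent = {}

    def find(x):
        while parent.setdefault(x, x) != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for tri in tris:
        sides = [frozenset(p) for p in combinations(sorted(tri), 2)]
        for side in sides[1:]:
            parent[find(side)] = find(sides[0])
    clusters = {}
    for side in list(parent):
        clusters.setdefault(find(side), set()).update(side)
    return sorted(sorted(c) for c in clusters.values())

# Hypothetical toy data: four genomes, two triangles sharing the side b1-c1.
A1, A2 = ("A", "a1"), ("A", "a2")
B1, B2 = ("B", "b1"), ("B", "b2")
C1, D1 = ("C", "c1"), ("D", "d1")
best_hit = {
    (A1, "B"): B1, (B1, "A"): A1, (A1, "C"): C1, (C1, "A"): A1,
    (B1, "C"): C1, (C1, "B"): B1, (B1, "D"): D1, (D1, "B"): B1,
    (C1, "D"): D1, (D1, "C"): C1,
    (A1, "D"): D1, (D1, "A"): A2,  # not symmetric, so no edge
    (A2, "B"): B2, (B2, "A"): A2,  # symmetric pair outside any triangle
}
edges = symmetric_best_matches(best_hit)
print(merge_triangles(triangles(edges)))
```

In this toy input the pair a2-b2 is a symmetric best match but belongs to no triangle, so it does not appear in any cluster.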
9.  Non-homologous isofunctional enzymes: A systematic analysis of alternative solutions in enzyme evolution 
Biology Direct  2010;5:31.
Evolutionarily unrelated proteins that catalyze the same biochemical reactions are often referred to as analogous - as opposed to homologous - enzymes. The existence of numerous alternative, non-homologous enzyme isoforms presents an interesting evolutionary problem; it also complicates genome-based reconstruction of the metabolic pathways in a variety of organisms. In 1998, a systematic search for analogous enzymes resulted in the identification of 105 Enzyme Commission (EC) numbers that included two or more proteins without detectable sequence similarity to each other, including 34 EC nodes where proteins were known (or predicted) to have distinct structural folds, indicating independent evolutionary origins. In the past 12 years, many putative non-homologous isofunctional enzymes were identified in newly sequenced genomes. In addition, efforts in structural genomics resulted in a vastly improved structural coverage of proteomes, providing for definitive assessment of (non)homologous relationships between proteins.
We report the results of a comprehensive search for non-homologous isofunctional enzymes (NISE) that yielded 185 EC nodes with two or more experimentally characterized - or predicted - structurally unrelated proteins. Of these NISE sets, only 74 were from the original 1998 list. Structural assignments of the NISE show over-representation of proteins with the TIM barrel fold and the nucleotide-binding Rossmann fold. From the functional perspective, the set of NISE is enriched in hydrolases, particularly carbohydrate hydrolases, and in enzymes involved in defense against oxidative stress.
These results indicate that at least some of the non-homologous isofunctional enzymes were recruited relatively recently from enzyme families that are active against related substrates and are sufficiently flexible to accommodate changes in substrate specificity.
This article was reviewed by Andrei Osterman, Keith F. Tipton (nominated by Martijn Huynen) and Igor B. Zhulin. For the full reviews, go to the Reviewers' comments section.
PMCID: PMC2876114  PMID: 20433725
10.  Distinct Patterns of Expression and Evolution of Intronless and Intron-Containing Mammalian Genes 
Molecular Biology and Evolution  2010;27(8):1745-1749.
Comparison of expression levels and breadth and evolutionary rates of intronless and intron-containing mammalian genes shows that intronless genes are expressed at lower levels, tend to be tissue specific, and evolve significantly faster than spliced genes. By contrast, monomorphic spliced genes that are not subject to detectable alternative splicing and polymorphic alternatively spliced genes show similar, statistically indistinguishable patterns of expression and evolution. Alternative splicing is most common in ancient genes, whereas intronless genes appear to have relatively recent origins. These results imply tight coupling between different stages of gene expression, in particular, transcription, splicing, and nucleocytosolic transport of transcripts, and suggest that formation of intronless genes is an important route of evolution of novel tissue-specific functions in animals.
PMCID: PMC2908711  PMID: 20360214
alternative splicing; intronless genes; monomorphic genes; polymorphic genes; mammalian gene evolution
11.  Abundance of type I toxin–antitoxin systems in bacteria: searches for new candidates and discovery of novel families 
Nucleic Acids Research  2010;38(11):3743-3759.
Small, hydrophobic proteins whose synthesis is repressed by small RNAs (sRNAs), denoted type I toxin–antitoxin modules, were first discovered on plasmids where they regulate plasmid stability, but were subsequently found on a few bacterial chromosomes. We used exhaustive PSI-BLAST and TBLASTN searches across 774 bacterial genomes to identify homologs of known type I toxins. These searches substantially expanded the collection of predicted type I toxins, revealed homology of the Ldr and Fst toxins, and suggested that type I toxin–antitoxin loci are not spread by horizontal gene transfer. To discover novel type I toxin–antitoxin systems, we developed a set of search parameters based on characteristics of known loci including the presence of tandem repeats and clusters of charged and bulky amino acids at the C-termini of short proteins containing predicted transmembrane regions. We detected sRNAs for three predicted toxins from enterohemorrhagic Escherichia coli and Bacillus subtilis, and showed that two of the respective proteins indeed are toxic when overexpressed. We also demonstrated that the local free-energy minima of RNA folding can be used to detect the positions of the sRNA genes. Our results suggest that type I toxin–antitoxin modules are much more widely distributed among bacteria than previously appreciated.
PMCID: PMC2887945  PMID: 20156992
12.  Taming of the shrewd: novel eukaryotic genes from RNA viruses 
BMC Biology  2010;8:2.
Genomes of several yeast species contain integrated DNA copies of complete genomes or individual genes of non-retroviral double-strand RNA viruses as reported in a recent BMC Biology article by Taylor and Bruenn. The integrated virus-specific sequences are at least partially expressed and seem to evolve under pressure of purifying selection, indicating that these are functional genes. Together with similar reports on integrated copies of some animal RNA viruses, these results suggest that integration of DNA copies of non-reverse-transcribing RNA viruses might be much more common than previously thought. The integrated copies could contribute to acquired immunity to the respective viruses.
PMCID: PMC2823675  PMID: 20067611

