Results 1-12 (12)

1.  Universal Pacemaker of Genome Evolution in Animals and Fungi and Variation of Evolutionary Rates in Diverse Organisms 
Genome Biology and Evolution  2014;6(6):1268-1278.
Gene evolution is traditionally considered within the framework of the molecular clock (MC) model whereby each gene is characterized by an approximately constant rate of evolution. Recent comparative analysis of numerous phylogenies of prokaryotic genes has shown that a different model of evolution, denoted the Universal PaceMaker (UPM), which postulates conservation of relative, rather than absolute evolutionary rates, yields a better fit to the phylogenetic data. Here, we show that the UPM model is a better fit than the MC for genome wide sets of phylogenetic trees from six species of Drosophila and nine species of yeast, with extremely high statistical significance. Unlike the prokaryotic phylogenies that include distant organisms and multiple horizontal gene transfers, these are simple data sets that cover groups of closely related organisms and consist of gene trees with the same topology as the species tree. The results indicate that both lineage-specific and gene-specific rates are important in genome evolution but the lineage-specific contribution is greater. Similar to the MC, the gene evolution rates under the UPM are strongly overdispersed, approximately 2-fold compared with the expectation from sampling error alone. However, we show that neither Drosophila nor yeast genes form distinct clusters in the tree space. Thus, the gene-specific deviations from the UPM, although substantial, are uncorrelated and most likely depend on selective factors that are largely unique to individual genes. Thus, the UPM appears to be a key feature of genome evolution across the history of cellular life.
PMCID: PMC4079209  PMID: 24812293
molecular clock; genome evolution; phylogenetic trees; relative evolution rates
2.  Stability along with Extreme Variability in Core Genome Evolution 
Genome Biology and Evolution  2013;5(7):1393-1402.
The shape of the distribution of evolutionary distances between orthologous genes in pairs of closely related genomes is universal throughout the entire range of cellular life forms. The near invariance of this distribution across billions of years of evolution can be accounted for by the Universal Pace Maker (UPM) model of genome evolution that yields a significantly better fit to the phylogenetic data than the Molecular Clock (MC) model. Unlike the MC, the UPM model does not assume constant gene-specific evolutionary rates but rather postulates that, in each evolving lineage, the evolutionary rates of all genes change (approximately) in unison although the pacemakers of different lineages are not necessarily synchronized. Here, we dissect the nearly constant evolutionary rate distribution by comparing the genome-wide relative rates of evolution of individual genes in pairs or triplets of closely related genomes from diverse bacterial and archaeal taxa. We show that, although the gene-specific relative rate is an important feature of genome evolution that explains more than half of the variance of the evolutionary distances, the ranges of relative rate variability are extremely broad even for universal genes. Because of this high variance, the gene-specific rate is a poor predictor of the conservation rank for any gene in any particular lineage.
PMCID: PMC3730350  PMID: 23821522
evolutionary rate; universal genes; molecular clock; universal pacemaker of genome evolution
3.  Gene Frequency Distributions Reject a Neutral Model of Genome Evolution 
Genome Biology and Evolution  2013;5(1):233-242.
Evolution of prokaryotes involves extensive loss and gain of genes, which lead to substantial differences in the gene repertoires even among closely related organisms. Through a wide range of phylogenetic depths, gene frequency distributions in prokaryotic pangenomes bear a characteristic, asymmetrical U-shape, with a core of (nearly) universal genes, a “shell” of moderately common genes, and a “cloud” of rare genes. We employ mathematical modeling to investigate evolutionary processes that might underlie this universal pattern. Gene frequency distributions for almost 400 groups of 10 bacterial or archaeal species each over a broad range of evolutionary distances were fit to steady-state, infinite allele models based on the distribution of gene replacement rates and the phylogenetic tree relating the species in each group. The fits of the theoretical frequency distributions to the empirical ones yield model parameters and estimates of the goodness of fit. Using the Akaike Information Criterion, we show that the neutral model of genome evolution, with the same replacement rate for all genes, can be confidently rejected. Of the three tested models with purifying selection, the one in which the distribution of replacement rates is derived from a stochastic population model with additive per-gene fitness yields the best fits to the data. The selection strength estimated from the fits declines with evolutionary divergence while staying well outside the neutral regime. These findings indicate that, unlike some other universal distributions of genomic variables, for example, the distribution of paralogous gene family membership, the gene frequency distribution is substantially affected by selection.
PMCID: PMC3595032  PMID: 23315380
gene frequency distribution; steady genome model; goodness of fit; evolution mechanisms
4.  Related Giant Viruses in Distant Locations and Different Habitats: Acanthamoeba polyphaga moumouvirus Represents a Third Lineage of the Mimiviridae That Is Close to the Megavirus Lineage 
Genome Biology and Evolution  2012;4(12):1324-1330.
The 1,021,348 base pair genome sequence of the Acanthamoeba polyphaga moumouvirus, a new member of the Mimiviridae family infecting Acanthamoeba polyphaga, is reported. The moumouvirus represents a third lineage beside mimivirus and megavirus. Thereby, it is a new member of the recently proposed Megavirales order. This giant virus was isolated from cooling tower water in southeastern France but is most closely related to Megavirus chiliensis, which was isolated from ocean water off the coast of Chile. The moumouvirus is predicted to encode 930 proteins, of which 879 have detectable homologs. Among these predicted proteins, for 702 the closest homolog was detected in Megavirus chiliensis, with a median amino acid sequence identity of 62%. The evolutionary affinity of moumouvirus and megavirus was further supported by phylogenetic tree analysis of conserved genes. The moumouvirus and megavirus genomes share near perfect orthologous gene collinearity in the central part of the genome, with the variations concentrated in the terminal regions. In addition, genomic comparisons of the Mimiviridae reveal substantial gene loss in the moumouvirus lineage. The majority of the remaining moumouvirus proteins are most similar to homologs from other Mimiviridae members, and for 27 genes the closest homolog was found in bacteria. Phylogenetic analysis of these genes supported gene acquisition from diverse bacteria after the separation of the moumouvirus and megavirus lineages. Comparative genome analysis of the three lineages of the Mimiviridae revealed significant mobility of Group I self-splicing introns, with the highest intron content observed in the moumouvirus genome.
PMCID: PMC3542560  PMID: 23221609
moumouvirus; mimivirus; giant virus; megavirus; Mimiviridae; Megavirales; horizontal gene transfer; viral genome; nucleo-cytoplasmic large DNA viruses
5.  A Tight Link between Orthologs and Bidirectional Best Hits in Bacterial and Archaeal Genomes 
Genome Biology and Evolution  2012;4(12):1286-1294.
Orthologous relationships between genes are routinely inferred from bidirectional best hits (BBH) in pairwise genome comparisons. However, to our knowledge, it has never been quantitatively demonstrated that orthologs form BBH. To test this “BBH-orthology conjecture,” we take advantage of the operon organization of bacterial and archaeal genomes and assume that, when two genes in compared genomes are flanked by two BBH and show statistically significant sequence similarity to one another, these genes are bona fide orthologs. Under this assumption, we tested whether middle genes in “syntenic orthologous gene triplets” form BBH. We found that this was the case in more than 95% of the syntenic gene triplets in all genome comparisons. A detailed examination of the exceptions to this pattern, including maximum likelihood phylogenetic tree analysis, showed that some of these deviations involved artifacts of genome annotation, whereas very small fractions represented random assignment of the best hit to one of the closely related in-paralogs, paralogous displacement in situ, or even less frequent genuine violations of the BBH–orthology conjecture caused by acceleration of evolution in one of the orthologs. We conclude that, at least in prokaryotes, genes for which independent evidence of orthology is available typically form BBH and, conversely, BBH can serve as a strong indication of gene orthology.
PMCID: PMC3542571  PMID: 23160176
orthology; bidirectional best hit; genome comparison; synteny
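
The bidirectional best hit criterion described in the abstract above can be sketched in a few lines. This is only an illustration, not the authors' pipeline: the score dictionaries, the gene names, and the functions `best_hits` and `bidirectional_best_hits` are hypothetical, and the scores are assumed to come from some pairwise similarity search (higher is better). Ties are resolved in favor of the hit seen first, which is an arbitrary choice made for this sketch.

```python
from typing import Dict, List, Tuple

# Hypothetical input type: (query gene, subject gene) -> similarity score.
Scores = Dict[Tuple[str, str], float]


def best_hits(scores: Scores) -> Dict[str, str]:
    """Map each query gene to its highest-scoring subject gene."""
    best: Dict[str, Tuple[str, float]] = {}
    for (query, subject), score in scores.items():
        # Strict comparison: on a tie, the first hit encountered is kept.
        if query not in best or score > best[query][1]:
            best[query] = (subject, score)
    return {query: subject for query, (subject, _) in best.items()}


def bidirectional_best_hits(a_vs_b: Scores, b_vs_a: Scores) -> List[Tuple[str, str]]:
    """Return sorted pairs (a, b) where b is a's best hit and a is b's best hit."""
    forward = best_hits(a_vs_b)
    reverse = best_hits(b_vs_a)
    return sorted((a, b) for a, b in forward.items() if reverse.get(b) == a)


# Made-up scores for illustration only.
a_vs_b = {("a1", "b1"): 200.0, ("a1", "b2"): 50.0,
          ("a2", "b2"): 120.0, ("a3", "b2"): 150.0}
b_vs_a = {("b1", "a1"): 198.0, ("b2", "a3"): 149.0, ("b2", "a2"): 118.0}
pairs = bidirectional_best_hits(a_vs_b, b_vs_a)
```

In this toy input, gene `a2` has `b2` as its best hit, but `b2` prefers `a3`, so `a2` is left without a BBH partner. This is the kind of case the abstract attributes to closely related in-paralogs.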
6.  Negative Correlation between Expression Level and Evolutionary Rate of Long Intergenic Noncoding RNAs 
Genome Biology and Evolution  2011;3:1390-1404.
Mammalian genomes contain numerous genes for long noncoding RNAs (lncRNAs). The functions of the lncRNAs remain largely unknown but their evolution appears to be constrained by purifying selection, albeit relatively weakly. To gain insights into the mode of evolution and the functional range of the lncRNAs, they can be compared with much better characterized protein-coding genes. The evolutionary rate of the protein-coding genes shows a universal negative correlation with expression: highly expressed genes are on average more conserved during evolution than the genes with lower expression levels. This correlation was conceptualized in the misfolding-driven protein evolution hypothesis according to which misfolding is the principal cost incurred by protein expression. We sought to determine whether long intergenic ncRNAs (lincRNAs) follow the same evolutionary trend and indeed detected a moderate but statistically significant negative correlation between the evolutionary rate and expression level of human and mouse lincRNA genes. The magnitude of the correlation for the lincRNAs is similar to that for equal-sized sets of protein-coding genes with similar levels of sequence conservation. Additionally, the expression level of the lincRNAs is significantly and positively correlated with the predicted extent of lincRNA molecule folding (base-pairing); however, the contributions of evolutionary rates and folding to the expression level are independent. Thus, the anticorrelation between evolutionary rate and expression level appears to be a general feature of gene evolution that might be caused by similar deleterious effects of protein and RNA misfolding and/or other factors, for example, the number of interacting partners of the gene product.
PMCID: PMC3242500  PMID: 22071789
long noncoding RNA; ncRNA; RNA expression; genomic alignments; introns; RNA folding
7.  Viruses with More Than 1,000 Genes: Mamavirus, a New Acanthamoeba polyphaga mimivirus Strain, and Reannotation of Mimivirus Genes 
The genome sequence of the Mamavirus, a new Acanthamoeba polyphaga mimivirus strain, is reported. With 1,191,693 nt in length and 1,023 predicted protein-coding genes, the Mamavirus has the largest genome among the known viruses. The genomes of the Mamavirus and the previously described Mimivirus are highly similar in both the protein-coding genes and the intergenic regions. However, the Mamavirus contains an extra 5′-terminal segment that encompasses primarily disrupted duplicates of genes present elsewhere in the genome. The Mamavirus also has several unique genes including a small regulatory polyA polymerase subunit that is shared with poxviruses. Detailed analysis of the protein sequences of the two Mimiviruses led to a substantial amendment of the functional annotation of the viral genomes.
PMCID: PMC3163472  PMID: 21705471
Mimivirus; viral genome; nucleocytoplasmic large DNA viruses
8.  Connections between Alternative Transcription and Alternative Splicing in Mammals 
The majority of mammalian genes produce multiple transcripts resulting from alternative splicing (AS) and/or alternative transcription initiation (ATI) and alternative transcription termination (ATT). Comparative analysis of the number of alternative nucleotides, isoforms, and introns per locus in genes with different types of alternative events suggests that ATI and ATT contribute to the diversity of human and mouse transcriptome even more than AS. There is a strong negative correlation between AS and ATI in 5′ untranslated regions (UTRs) and AS in coding sequences (CDSs) but an even stronger positive correlation between AS in CDSs and ATT in 3′ UTRs. These observations could reflect preferential regulation of distinct, large groups of genes by different mechanisms: 1) regulation at the level of transcription initiation and initiation of translation resulting from ATI and AS in 5′ UTRs and 2) posttranslational regulation by different protein isoforms. The tight linkage between AS in CDSs and ATT in 3′ UTRs suggests that variability of 3′ UTRs mediates differential translational regulation of alternative protein forms. Together, the results imply coordinate evolution of AS and alternative transcription, processes that occur concomitantly within gene expression factories.
PMCID: PMC2975443  PMID: 20889654
alternative splicing; alternative transcription initiation; alternative transcription termination; gene expression factories
9.  The Tree and Net Components of Prokaryote Evolution 
Phylogenetic trees of individual genes of prokaryotes (archaea and bacteria) generally have different topologies, largely owing to extensive horizontal gene transfer (HGT), suggesting that the Tree of Life (TOL) should be replaced by a “net of life” as the paradigm of prokaryote evolution. However, trees remain the natural representation of the histories of individual genes given the fundamentally bifurcating process of gene replication. Therefore, although no single tree can fully represent the evolution of prokaryote genomes, the complete picture of evolution will necessarily combine trees and nets. A quantitative measure of the signals of tree and net evolution is derived from an analysis of all quartets of species in all trees of the “Forest of Life” (FOL), which consists of approximately 7,000 phylogenetic trees for prokaryote genes including approximately 100 nearly universal trees (NUTs). Although diverse routes of net-like evolution collectively dominate the FOL, the pattern of tree-like evolution that reflects the consistent topologies of the NUTs is the most prominent coherent trend. We show that the contributions of tree-like and net-like evolutionary processes substantially differ across bacterial and archaeal lineages and between functional classes of genes. Evolutionary simulations indicate that the central tree-like signal cannot be realistically explained by a self-reinforcing pattern of biased HGT.
PMCID: PMC2997564  PMID: 20889655
phylogenetic tree; horizontal gene transfer; species quartets; computer simulation
10.  Relative Contributions of Intrinsic Structural–Functional Constraints and Translation Rate to the Evolution of Protein-Coding Genes 
A long-standing assumption in evolutionary biology is that the evolution rate of protein-coding genes depends, largely, on specific constraints that affect the function of the given protein. However, recent research in evolutionary systems biology revealed unexpected, significant correlations between evolution rate and characteristics of genes or proteins that are not directly related to specific protein functions, such as expression level and protein–protein interactions. The strongest connections were consistently detected between protein sequence evolution rate and the expression level of the respective gene. A recent genome-wide proteomic study revealed an extremely strong correlation between the abundances of orthologous proteins in distantly related animals, the nematode Caenorhabditis elegans and the fruit fly Drosophila melanogaster. We used the extensive protein abundance data from this study along with short-term evolutionary rates (ERs) of orthologous genes in nematodes and flies to estimate the relative contributions of structural–functional constraints and the translation rate to the evolution rate of protein-coding genes. Together the intrinsic constraints and translation rate account for approximately 50% of the variance of the ERs. The contribution of constraints is estimated to be 3- to 5-fold greater than the contribution of translation rate.
PMCID: PMC2940324  PMID: 20624725
protein evolution; structural–functional constraints; misfolding; protein abundance
11.  A Universal Nonmonotonic Relationship between Gene Compactness and Expression Levels in Multicellular Eukaryotes 
Analysis of gene architecture and expression levels of four organisms, Homo sapiens, Caenorhabditis elegans, Drosophila melanogaster, and Arabidopsis thaliana, reveals a surprising, nonmonotonic, universal relationship between expression level and gene compactness. With increasing expression level, the genes tend at first to become longer but, from a certain level of expression, they become more and more compact, resulting in an approximate bell-shaped dependence. There are two leading hypotheses to explain the compactness of highly expressed genes. The selection hypothesis predicts that gene compactness is predominantly driven by the level of expression, whereas the genomic design hypothesis predicts that expression breadth across tissues is the driving force. We observed the connection between gene expression breadth in humans and gene compactness to be significantly weaker than the connection between expression level and compactness, a result that is compatible with the selection hypothesis but not the genomic design hypothesis. The initial gene elongation with increasing expression level could be explained, at least in part, by accumulation of regulatory elements enhancing expression, in particular, in introns. This explanation is compatible with the observed positive correlation between intron density and expression level of a gene. Conversely, the trend toward increasing compactness for highly expressed genes could be caused by selection for minimization of energy and time expenditure during transcription and splicing and for increased fidelity of transcription, splicing, and/or translation that is likely to be particularly critical for highly expressed genes. Regardless of the exact nature of the forces that shape the gene architecture, we present evidence that, at least in animals, coding and noncoding parts of genes show similar architectonic trends.
PMCID: PMC2817431  PMID: 20333206
eukaryotic gene structure; eukaryotic gene architecture; selection on gene compactness; genomic design; intron functionality; intron density
12.  Analysis of Rare Genomic Changes Does Not Support the Unikont–Bikont Phylogeny and Suggests Cyanobacterial Symbiosis as the Point of Primary Radiation of Eukaryotes 
The deep phylogeny of eukaryotes is an important but extremely difficult problem of evolutionary biology. Five eukaryotic supergroups are relatively well established but the relationship between these supergroups remains elusive, and their divergence seems to best fit a “Big Bang” model. Attempts were made to root the tree of eukaryotes by using potential derived shared characters such as unique fusions of conserved genes. One popular model of eukaryotic evolution that emerged from this type of analysis is the unikont–bikont phylogeny: The unikont branch consists of Metazoa, Choanozoa, Fungi, and Amoebozoa, whereas bikonts include the rest of eukaryotes, namely, Plantae (green plants, Chlorophyta, and Rhodophyta), Chromalveolata, excavates, and Rhizaria. We reexamine the relationships between the eukaryotic supergroups using a genome-wide analysis of rare genomic changes (RGCs) associated with multiple, conserved amino acids (RGC_CAMs and RGC_CAs), to resolve trifurcations of major eukaryotic lineages. The results do not support the basal position of Chromalveolata with respect to Plantae and unikonts or the monophyly of the bikont group and appear to be best compatible with the monophyly of unikonts and Chromalveolata. Chromalveolata show a distinct, additional signal of affinity with Plantae, conceivably, owing to genes transferred from the secondary, red algal symbiont. Excavates are derived forms, with extremely long branches that complicate phylogenetic inference; nevertheless, the RGC analysis suggests that they are significantly more likely to cluster with the unikont–Chromalveolata assemblage than with the Plantae. Thus, the first split in eukaryotic evolution might lie between photosynthetic and nonphotosynthetic forms and so could have been triggered by the endosymbiosis between an ancestral unicellular eukaryote and a cyanobacterium that gave rise to the chloroplast.
PMCID: PMC2817406  PMID: 20333181
eukaryotic phylogeny; rare genomic changes; parsimony; substitutions; insertions; deletions
