A stochastic, agent-based mathematical model of the coevolution of the archaeal and bacterial adaptive immunity system, CRISPR-Cas, and lytic viruses shows that CRISPR-Cas immunity can stabilize the virus-host coexistence rather than leading to the extinction of the virus. In the model, CRISPR-Cas immunity does not specifically promote viral diversity, presumably because the selection pressure on each single proto-spacer is too weak. However, the overall virus diversity in the presence of CRISPR-Cas grows due to the increase of the host and, accordingly, the virus population size. Above a threshold value of total viral diversity, which is proportional to the viral mutation rate and population size, the CRISPR-Cas system becomes ineffective and is lost due to the associated fitness cost. Our previous modeling study has suggested that the ubiquity of CRISPR-Cas in hyperthermophiles, which contrasts its comparative low prevalence in mesophiles, is due to lower rates of mutation fixation in thermal habitats. The present findings offer a complementary, simpler perspective on this contrast through the larger population sizes of mesophiles compared to hyperthermophiles, because of which CRISPR-Cas can become ineffective in mesophiles. The efficacy of CRISPR-Cas sharply increases with the number of proto-spacers per viral genome, potentially explaining the low information content of the proto-spacer-associated motif (PAM) that is required for spacer acquisition by CRISPR-Cas because a higher specificity would restrict the number of spacers available to CRISPR-Cas, thus hampering immunity. The very existence of the PAM might reflect the tradeoff between the requirement of diverse spacers for efficient immunity and avoidance of autoimmunity.
The question whether adaptation follows a deterministic route largely prescribed by the environment or can proceed along a large number of alternative trajectories has engaged extensive research over the recent years. Experimental evolution studies enabled by advances in high throughput techniques for genome sequencing and manipulation, along with increasingly detailed mathematical modeling of fitness landscapes, are beginning to allow quantitative exploration of the repeatability of evolutionary trajectories. It is becoming clear that evolutionary trajectories in static correlated fitness landscapes are substantially non-random but the relative contributions of determinism and stochasticity in the evolution of specific phenotypes strongly depend on the specific conditions, particularly the magnitude of the selective pressure and the number of available beneficial mutations.
evolutionary trajectory; predictability of evolution; fitness landscape; divergence of trajectories
Accurate inference of orthologous genes is a pre-requisite for most comparative genomics studies, and is also important for functional annotation of new genomes. Identification of orthologous gene sets typically involves phylogenetic tree analysis, heuristic algorithms based on sequence conservation, synteny analysis, or some combination of these approaches. The most direct tree-based methods typically rely on the comparison of an individual gene tree with a species tree. Once the two trees are accurately constructed, orthologs are straightforwardly identified by the definition of orthology as those homologs that are related by speciation, rather than gene duplication, at their most recent point of origin. Although ideal for the purpose of orthology identification in principle, phylogenetic trees are computationally expensive to construct for large numbers of genes and genomes, and they often contain errors, especially at large evolutionary distances. Moreover, in many organisms, in particular prokaryotes and viruses, evolution does not appear to have followed a simple ‘tree-like’ mode, which makes conventional tree reconciliation inapplicable. Other, heuristic methods identify probable orthologs as the closest homologous pairs or groups of genes in a set of organisms. These approaches are faster and easier to automate than tree-based methods, with efficient implementations provided by graph-theoretical algorithms enabling comparisons of thousands of genomes. Comparisons of these two approaches show that, despite conceptual differences, they produce similar sets of orthologs, especially at short evolutionary distances. Synteny also can aid in identification of orthologs. Often, tree-based, sequence similarity- and synteny-based approaches can be combined into flexible hybrid methods.
homolog; ortholog; paralog; xenolog; orthologous groups; tree reconciliation; comparative genomics
In 2009, we are celebrating the 200th anniversary of Charles Darwin and the 150th jubilee of his masterpiece, the Origin of Species. Darwin developed the first coherent and compelling narrative of biological evolution and thus founded evolutionary biology—and modern biology in general, remembering the famous dictum of Dobzhansky. It is, however, counter-productive, and ultimately, a disservice to Darwin’s legacy, to define modern evolutionary biology as neo-Darwinism. The current picture of evolution, informed, in particular, by results of comparative genomics and systems biology, is by far more complex than that presented in the Origin of Species, so that Darwinian principles, including natural selection, are incorporated into the evolving new synthesis as important but certainly not all-embracing tenets. This expansion of evolutionary biology does not denigrate Darwin in the least but rather emphasizes the fertility of his ideas.
Darwin’s anniversary; Darwinism; modern synthesis; genome evolution; systems biology; horizontal gene transfer; Tree of Life
Planctomycetes, Verrucomicrobia and Chlamydia are prokaryotic phyla that are sometimes grouped together as the PVC superphylum of eubacteria. Some PVC species possess interesting attributes, in particular, internal membranes that superficially resemble eukaryotic endomembranes. Some biologists now claim that PVC bacteria are nucleus-bearing prokaryotes and that they are evolutionary intermediates in the transition from prokaryote to eukaryote. PVC prokaryotes do not possess a nucleus and are not intermediates in the prokaryote-to-eukaryote transition. All of the PVC traits that are currently cited as evidence for aspiring eukaryoticity are either analogous (the result of convergent evolution), not homologous, to eukaryotic traits; or else they are the result of lateral gene transfers. Here we summarize the evidence that shows why most of the purported similarities between the PVC bacteria and eukaryotes are analogous and the rest are consequence of lateral gene acquisition.
The widespread exchange of genes among prokaryotes, known as horizontal gene transfer (HGT), is often considered to “uproot” the Tree of Life (TOL). Indeed, it is by now fully clear that genes in general possess different evolutionary histories. However, the possibility remains that the TOL concept can be reformulated and remain valid as a statistical central trend in the phylogenetic “Forest of Life” (FOL). This article describes a computational pipeline developed to chart the FOL by comparative analysis of thousands of phylogenetic trees. This analysis reveals a distinct, consistent phylogenetic signal that is particularly strong among the Nearly Universal Trees (NUTs), which correspond to genes represented in all or most of the analyzed organisms. Despite the substantial amount of apparent HGT seen even among the NUTs, these gene transfers appear to be distributed randomly and do not obscure the central tree-like trend.
The recently discovered CRISPR-Cas adaptive immune system is present in almost all archaea and many bacteria. It consists of cassettes of CRISPR repeats that incorporate spacers homologous to fragments of viral or plasmid genomes that are employed as guide RNAs in the immune response, along with numerous CRISPR-associated (cas) genes that encode proteins possessing diverse, only partially characterized activities required for the action of the system. Here, we investigate the evolution of the cas genes and show that they evolve under purifying selection that is typically much weaker than the median strength of purifying selection affecting genes in the respective genomes. The exceptions are the cas1 and cas2 genes that typically evolve at levels of purifying selection close to the genomic median. Thus, although these genes are implicated in the acquisition of spacers from alien genomes, they do not appear to be directly involved in an arms race between bacterial and archaeal hosts and infectious agents. These genes might possess functions distinct from and additional to their role in the CRISPR-Cas-mediated immune response. Taken together with evidence of the frequent horizontal transfer of cas genes reported previously and with the wide-spread microscale recombination within these genes detected in this work, these findings reveal the highly dynamic evolution of cas genes. This conclusion is in line with the involvement of CRISPR-Cas in antiviral immunity that is likely to entail a coevolutionary arms race with rapidly evolving viruses. However, we failed to detect evidence of strong positive selection in any of the cas genes.
When Charles Darwin formulated the central principles of evolutionary biology in the Origin of Species in 1859 and the architects of the Modern Synthesis integrated these principles with population genetics almost a century later, the principal if not the sole objects of evolutionary biology were multicellular eukaryotes, primarily animals and plants. Before the advent of efficient gene sequencing, all attempts to extend evolutionary studies to bacteria have been futile. Sequencing of the rRNA genes in thousands of microbes allowed the construction of the three- domain “ribosomal Tree of Life” that was widely thought to have resolved the evolutionary relationships between the cellular life forms. However, subsequent massive sequencing of numerous, complete microbial genomes revealed novel evolutionary phenomena, the most fundamental of these being: (1) pervasive horizontal gene transfer (HGT), in large part mediated by viruses and plasmids, that shapes the genomes of archaea and bacteria and call for a radical revision (if not abandonment) of the Tree of Life concept, (2) Lamarckian-type inheritance that appears to be critical for antivirus defense and other forms of adaptation in prokaryotes, and (3) evolution of evolvability, i.e., dedicated mechanisms for evolution such as vehicles for HGT and stress-induced mutagenesis systems. In the non-cellular part of the microbial world, phylogenomics and metagenomics of viruses and related selfish genetic elements revealed enormous genetic and molecular diversity and extremely high abundance of viruses that come across as the dominant biological entities on earth. Furthermore, the perennial arms race between viruses and their hosts is one of the defining factors of evolution. Thus, microbial phylogenomics adds new dimensions to the fundamental picture of evolution even as the principle of descent with modification discovered by Darwin and the laws of population genetics remain at the core of evolutionary biology.
Darwin; modern synthesis; comparative genomics; tree of life; horizontal gene transfer
The nucleo-cytoplasmic large DNA viruses (NCLDV) constitute an apparently monophyletic group that consists of 6 families of viruses infecting a broad variety of eukaryotes. A comprehensive genome comparison and maximum-likelihood reconstruction of NCLDV evolution reveal a set of approximately 50 conserved genes that can be tentatively mapped to the genome of the common ancestor of this class of eukaryotic viruses. We address the origins and evolution of NCLDV.
Phylogenetic analysis indicates that some of the major clades of NCLDV infect diverse animals and protists, suggestive of early radiation of the NCLDV, possibly concomitant with eukaryogenesis. The core NCLDV genes seem to have originated from different sources including homologous genes of bacteriophages, bacteria and eukaryotes. These observations are compatible with a scenario of the origin of the NCLDV at an early stage of the evolution of eukaryotes through extensive mixing of genes from widely different genomes.
The common ancestor of the NCLDV probably evolved from a bacteriophage as a result of recruitment of numerous eukaryotic and some bacterial genes, and concomitant loss of the majority of phage genes except for a small core of genes coding for proteins essential for virus genome replication and virion formation.
Bacteriophage; Eukaryogenesis; Nucleo-cytoplasmic large DNA viruses, evolution; Phylogenetic analysis
The shape of the distribution of evolutionary distances between orthologous genes in pairs of closely related genomes is universal throughout the entire range of cellular life forms. The near invariance of this distribution across billions of years of evolution can be accounted for by the Universal Pace Maker (UPM) model of genome evolution that yields a significantly better fit to the phylogenetic data than the Molecular Clock (MC) model. Unlike the MC, the UPM model does not assume constant gene-specific evolutionary rates but rather postulates that, in each evolving lineage, the evolutionary rates of all genes change (approximately) in unison although the pacemakers of different lineages are not necessarily synchronized. Here, we dissect the nearly constant evolutionary rate distribution by comparing the genome-wide relative rates of evolution of individual genes in pairs or triplets of closely related genomes from diverse bacterial and archaeal taxa. We show that, although the gene-specific relative rate is an important feature of genome evolution that explains more than half of the variance of the evolutionary distances, the ranges of relative rate variability are extremely broad even for universal genes. Because of this high variance, the gene-specific rate is a poor predictor of the conservation rank for any gene in any particular lineage.
evolutionary rate; universal genes; molecular clock; universal pacemaker of genome evolution
The recent discovery of protein modification by SAMPs, ubiquitin-like (Ubl) proteins from the archaeon Haloferax volcanii, prompted a comprehensive comparative-genomic analysis of archaeal Ubl protein genes and the genes for enzymes thought to be functionally associated with Ubl proteins. This analysis showed that most archaea encode members of two major groups of Ubl proteins with the β-grasp fold, the ThiS and MoaD families, and indicated that the ThiS family genes are rarely linked to genes for thiamine or Mo/W cofactor metabolism enzymes but instead are most often associated with genes for enzymes of tRNA modification. Therefore it is hypothesized that the ancestral function of the archaeal Ubl proteins is sulfur insertion into modified nucleotides in tRNAs, an activity analogous to that of the URM1 protein in eukaryotes. Together with additional, previously described genomic associations, these findings indicate that systems for protein quality control operating at different levels, including tRNA modification that controls translation fidelity, protein ubiquitination that regulates protein degradation, and, possibly, mRNA degradation by the exosome, are functionally and evolutionarily linked.
Evolutionary reconstructions using maximum likelihood methods point to unexpectedly high densities of introns in protein-coding genes of ancestral eukaryotic forms including the last common ancestor of all extant eukaryotes. Combined with the evidence of the origin of spliceosomal introns from invading Group II self-splicing introns, these results suggest that early ancestral eukaryotic genomes consisted of up to 80% sequences derived from Group II introns, a much greater contribution of introns than that seen in any extant genome. An organism with such an unusual genome architecture could survive only under conditions of a severe population bottleneck.
effective population size; endosymbiosis; group II self-splicing introns; origin of eukaryotes; spliceosomal introns
Evolutionary binary characters are features of species or genes, indicating the absence (value zero) or presence (value one) of some property. Examples include eukaryotic gene architecture (the presence or absence of an intron in a particular locus), gene content, and morphological characters. In many studies, the acquisition of such binary characters is assumed to represent a rare evolutionary event, and consequently, their evolution is analyzed using various flavors of parsimony. However, when gain and loss of the character are not rare enough, a probabilistic analysis becomes essential. Here, we present a comprehensive probabilistic model to describe the evolution of binary characters on a bifurcating phylogenetic tree. A fast software tool, EREM, is provided, using maximum likelihood to estimate the parameters of the model and to reconstruct ancestral states (presence and absence in internal nodes) and events (gain and loss events along branches).
Important progress in understanding replication, interactions with host plants, and evolution of closteroviruses enabled engineering of several vectors for gene expression and virus-induced gene silencing. Due to the broad host range of closteroviruses, these vectors expanded vector applicability to include important woody plants such as citrus and grapevine. Furthermore, large closterovirus genomes offer genetic capacity and stability unrivaled by other plant viral vectors. These features provided immense opportunities for using closterovirus vectors for the functional genomics studies and pathogen control in economically valuable crops. This review briefly summarizes advances in closterovirus research during the last decade, explores the relationships between virus biology and vector design, and outlines the most promising directions for future application of closterovirus vectors.
viral vector; closteroviruses; RNAi; Beet yellows virus; Citrus tristeza virus; Grapevine leafroll-associated virus-2
Our knowledge of prokaryotic defense systems has vastly expanded as the result of comparative genomic analysis, followed by experimental validation. This expansion is both quantitative, including the discovery of diverse new examples of known types of defense systems, such as restriction-modification or toxin-antitoxin systems, and qualitative, including the discovery of fundamentally new defense mechanisms, such as the CRISPR-Cas immunity system. Large-scale statistical analysis reveals that the distribution of different defense systems in bacterial and archaeal taxa is non-uniform, with four groups of organisms distinguishable with respect to the overall abundance and the balance between specific types of defense systems. The genes encoding defense system components in bacterial and archaea typically cluster in defense islands. In addition to genes encoding known defense systems, these islands contain numerous uncharacterized genes, which are candidates for new types of defense systems. The tight association of the genes encoding immunity systems and dormancy- or cell death-inducing defense systems in prokaryotic genomes suggests that these two major types of defense are functionally coupled, providing for effective protection at the population level.
Viruses are the most abundant biological entities on earth and encompass a vast amount of genetic diversity. The recent rapid increase in the number of sequenced viral genomes has created unprecedented opportunities for gaining new insight into the structure and evolution of the virosphere. Here, we present an update of the phage orthologous groups (POGs), a collection of 4,542 clusters of orthologous genes from bacteriophages that now also includes viruses infecting archaea and encompasses more than 1,000 distinct virus genomes. Analysis of this expanded data set shows that the number of POGs keeps growing without saturation and that a substantial majority of the POGs remain specific to viruses, lacking homologues in prokaryotic cells, outside known proviruses. Thus, the great majority of virus genes apparently remains to be discovered. A complementary observation is that numerous viral genomes remain poorly, if at all, covered by POGs. The genome coverage by POGs is expected to increase as more genomes are sequenced. Taxon-specific, single-copy signature genes that are not observed in prokaryotic genomes outside detected proviruses were identified for two-thirds of the 57 taxa (those with genomes available from at least 3 distinct viruses), with half of these present in all members of the respective taxon. These signatures can be used to specifically identify the presence and quantify the abundance of viruses from particular taxa in metagenomic samples and thus gain new insights into the ecology and evolution of viruses in relation to their hosts.
We compare the sets of experimentally validated long intergenic non-coding (linc)RNAs from human and mouse and apply a maximum likelihood approach to estimate the total number of lincRNA genes as well as the size of the conserved part of the lincRNome. Under the assumption that the sets of experimentally validated lincRNAs are random samples of the lincRNomes of the corresponding species, we estimate the total lincRNome size at approximately 40,000 to 50,000 species, at least twice the number of protein-coding genes. We further estimate that the fraction of the human and mouse euchromatic genomes encoding lincRNAs is more than twofold greater than the fraction of protein-coding sequences. Although the sequences of most lincRNAs are much less strongly conserved than protein sequences, the extent of orthology between the lincRNomes is unexpectedly high, with 60 to 70% of the lincRNA genes shared between human and mouse. The orthologous mammalian lincRNAs can be predicted to perform equivalent functions; accordingly, it appears likely that thousands of evolutionarily conserved functional roles of lincRNAs remain to be characterized.
Genome analysis of humans and other mammals reveals a surprisingly small number of protein-coding genes, only slightly over 20,000 (although the diversity of actual proteins is substantially augmented by alternative transcription and alternative splicing). Recent analysis of the mammalian genomes and transcriptomes, in particular, using the RNAseq technology, shows that, in addition to protein-coding genes, mammalian genomes encode many long non-coding RNAs. For some of these transcripts, various regulatory functions have been demonstrated, but on the whole the repertoire of long non-coding RNAs remains poorly characterized. We compared the identified long intergenic non-coding (linc)RNAs from human and mouse, and employed a specially developed statistical technique to estimate the size and evolutionary conservation of the human and mouse lincRNomes. The estimates show that there are at least twice as many human and mouse lincRNAs than there are protein-coding genes. Moreover, about two third of the lincRNA genes appear to be conserved between human and mouse, implying thousands of conserved but still uncharacterized functions.
Evolution of prokaryotes involves extensive loss and gain of genes, which lead to substantial differences in the gene repertoires even among closely related organisms. Through a wide range of phylogenetic depths, gene frequency distributions in prokaryotic pangenomes bear a characteristic, asymmetrical U-shape, with a core of (nearly) universal genes, a “shell” of moderately common genes, and a “cloud” of rare genes. We employ mathematical modeling to investigate evolutionary processes that might underlie this universal pattern. Gene frequency distributions for almost 400 groups of 10 bacterial or archaeal species each over a broad range of evolutionary distances were fit to steady-state, infinite allele models based on the distribution of gene replacement rates and the phylogenetic tree relating the species in each group. The fits of the theoretical frequency distributions to the empirical ones yield model parameters and estimates of the goodness of fit. Using the Akaike Information Criterion, we show that the neutral model of genome evolution, with the same replacement rate for all genes, can be confidently rejected. Of the three tested models with purifying selection, the one in which the distribution of replacement rates is derived from a stochastic population model with additive per-gene fitness yields the best fits to the data. The selection strength estimated from the fits declines with evolutionary divergence while staying well outside the neutral regime. These findings indicate that, unlike some other universal distributions of genomic variables, for example, the distribution of paralogous gene family membership, the gene frequency distribution is substantially affected by selection.
gene frequency distribution; steady genome model; goodness of fit; evolution mechanisms
Bacteria and archaea face continual onslaughts of rapidly diversifying viruses and plasmids. Many prokaryotes maintain adaptive immune systems known as clustered regularly interspaced short palindromic repeats (CRISPR) and CRISPR-associated genes (Cas). CRISPR-Cas systems are genomic sensors that serially acquire viral and plasmid DNA fragments (spacers) that are utilized to target and cleave matching viral and plasmid DNA in subsequent genomic invasions, offering critical immunological memory. Only 50% of sequenced bacteria possess CRISPR-Cas immunity, in contrast to over 90% of sequenced archaea. To probe why half of bacteria lack CRISPR-Cas immunity, we combined comparative genomics and mathematical modeling. Analysis of hundreds of diverse prokaryotic genomes shows that CRISPR-Cas systems are substantially more prevalent in thermophiles than in mesophiles. With sequenced bacteria disproportionately mesophilic and sequenced archaea mostly thermophilic, the presence of CRISPR-Cas appears to depend more on environmental temperature than on bacterial-archaeal taxonomy. Mutation rates are typically severalfold higher in mesophilic prokaryotes than in thermophilic prokaryotes. To quantitatively test whether accelerated viral mutation leads microbes to lose CRISPR-Cas systems, we developed a stochastic model of virus-CRISPR coevolution. The model competes CRISPR-Cas-positive (CRISPR-Cas+) prokaryotes against CRISPR-Cas-negative (CRISPR-Cas−) prokaryotes, continually weighing the antiviral benefits conferred by CRISPR-Cas immunity against its fitness costs. Tracking this cost-benefit analysis across parameter space reveals viral mutation rate thresholds beyond which CRISPR-Cas cannot provide sufficient immunity and is purged from host populations. These results offer a simple, testable viral diversity hypothesis to explain why mesophilic bacteria disproportionately lack CRISPR-Cas immunity. More generally, fundamental limits on the adaptability of biological sensors (Lamarckian evolution) are predicted.
A remarkable recent discovery in microbiology is that bacteria and archaea possess systems conferring immunological memory and adaptive immunity. Clustered regularly interspaced short palindromic repeats (CRISPR) and CRISPR-associated genes (CRISPR-Cas) are genomic sensors that allow prokaryotes to acquire DNA fragments from invading viruses and plasmids. Providing immunological memory, these stored fragments destroy matching DNA in future viral and plasmid invasions. CRISPR-Cas systems also provide adaptive immunity, keeping up with mutating viruses and plasmids by continually acquiring new DNA fragments. Surprisingly, less than 50% of mesophilic bacteria, in contrast to almost 90% of thermophilic bacteria and Archaea, maintain CRISPR-Cas immunity. Using mathematical modeling, we probe this dichotomy, showing how increased viral mutation rates can explain the reduced prevalence of CRISPR-Cas systems in mesophiles. Rapidly mutating viruses outrun CRISPR-Cas immune systems, likely decreasing their prevalence in bacterial populations. Thus, viral adaptability may select against, rather than for, immune adaptability in prokaryotes.
The 1,021,348 base pair genome sequence of the Acanthamoeba polyphaga moumouvirus, a new member of the Mimiviridae family infecting Acanthamoeba polyphaga, is reported. The moumouvirus represents a third lineage beside mimivirus and megavirus. Thereby, it is a new member of the recently proposed Megavirales order. This giant virus was isolated from a cooling tower water in southeastern France but is most closely related to Megavirus chiliensis, which was isolated from ocean water off the coast of Chile. The moumouvirus is predicted to encode 930 proteins, of which 879 have detectable homologs. Among these predicted proteins, for 702 the closest homolog was detected in Megavirus chiliensis, with the median amino acid sequence identity of 62%. The evolutionary affinity of moumouvirus and megavirus was further supported by phylogenetic tree analysis of conserved genes. The moumouvirus and megavirus genomes share near perfect orthologous gene collinearity in the central part of the genome, with the variations concentrated in the terminal regions. In addition, genomic comparisons of the Mimiviridae reveal substantial gene loss in the moumouvirus lineage. The majority of the remaining moumouvirus proteins are most similar to homologs from other Mimiviridae members, and for 27 genes the closest homolog was found in bacteria. Phylogenetic analysis of these genes supported gene acquisition from diverse bacteria after the separation of the moumouvirus and megavirus lineages. Comparative genome analysis of the three lineages of the Mimiviridae revealed significant mobility of Group I self-splicing introns, with the highest intron content observed in the moumouvirus genome.
moumouvirus; mimivirus; giant virus; megavirus; Mimiviridae; Megavirales; horizontal gene transfer; viral genome; nucleo-cytoplasmic large DNA viruses
A fundamental observation of comparative genomics is that the distribution of evolution rates across the complete sets of orthologous genes in pairs of related genomes remains virtually unchanged throughout the evolution of life, from bacteria to mammals. The most straightforward explanation for the conservation of this distribution appears to be that the relative evolution rates of all genes remain nearly constant, or in other words, that evolutionary rates of different genes are strongly correlated within each evolving genome. This correlation could be explained by a model that we denoted Universal PaceMaker (UPM) of genome evolution. The UPM model posits that the rate of evolution changes synchronously across genome-wide sets of genes in all evolving lineages. Alternatively, however, the correlation between the evolutionary rates of genes could be a simple consequence of molecular clock (MC). We sought to differentiate between the MC and UPM models by fitting thousands of phylogenetic trees for bacterial and archaeal genes to supertrees that reflect the dominant trend of vertical descent in the evolution of archaea and bacteria and that were constrained according to the two models. The goodness of fit for the UPM model was better than the fit for the MC model, with overwhelming statistical significance, although similarly to the MC, the UPM is strongly overdispersed. Thus, the results of this analysis reveal a universal, genome-wide pacemaker of evolution that could have been in operation throughout the history of life.
A central concept of evolution is Molecular Clock according to which each gene evolves at a characteristic, near constant rate. Numerous studies support the Molecular Clock hypothesis in principle but also show that the clock is indeed very approximate. Genome-wide comparative analysis of phylogenetic trees described here reveals a distinct, more general feature of genome evolution that we called Universal Pacemaker. Under this model, when the rate of evolution changes, the change occurs synchronously in many if not all genes in the evolving genome. In other words, the relative rates of gene evolution remain constant across long evolutionary spans: if a gene is slow relative to the rest of the genes in the given lineage, it is always slow, and if it evolves fast, it is always fast. We show here that the Universal Pacemaker model fits the available data much better than the traditional Molecular Clock model. These findings are compatible with the previously observed accelerations and decelerations of evolution in individual lineages but we show that synchronous, genome-wide change of evolutionary rates is a global feature of genome evolution that appears to pervade the entire history of life.
Orthologous relationships between genes are routinely inferred from bidirectional best hits (BBH) in pairwise genome comparisons. However, to our knowledge, it has never been quantitatively demonstrated that orthologs form BBH. To test this “BBH-orthology conjecture,” we take advantage of the operon organization of bacterial and archaeal genomes and assume that, when two genes in compared genomes are flanked by two BBH show statistically significant sequence similarity to one another, these genes are bona fide orthologs. Under this assumption, we tested whether middle genes in “syntenic orthologous gene triplets” form BBH. We found that this was the case in more than 95% of the syntenic gene triplets in all genome comparisons. A detailed examination of the exceptions to this pattern, including maximum likelihood phylogenetic tree analysis, showed that some of these deviations involved artifacts of genome annotation, whereas very small fractions represented random assignment of the best hit to one of closely related in-paralogs, paralogous displacement in situ, or even less frequent genuine violations of the BBH–orthology conjecture caused by acceleration of evolution in one of the orthologs. We conclude that, at least in prokaryotes, genes for which independent evidence of orthology is available typically form BBH and, conversely, BBH can serve as a strong indication of gene orthology.
orthology; bidirectional best hit; genome comparison; synteny
Many viruses infecting animals and plants share common cores of homologous genes involved in the key processes of viral replication. In contrast, genes that mediate virus – host interactions including in many cases capsid protein genes are markedly different. There are three distinct scenarios for the origin of related viruses of plants and animals: i) evolution from a common ancestral virus predating the divergence of plants and animals; ii) horizontal transfer of viruses, for example, through insect vectors; iii) parallel origin from related genetic elements. We present evidence that each of these scenarios contributed, to a varying extent, to the evolution of different groups of viruses.
There are no known RNA viruses that infect Archaea. Filling this gap in our knowledge of viruses will enhance our understanding of the relationships between RNA viruses from the three domains of cellular life and, in particular, could shed light on the origin of the enormous diversity of RNA viruses infecting eukaryotes. We describe here the identification of novel RNA viral genome segments from high-temperature acidic hot springs in Yellowstone National Park in the United States. These hot springs harbor low-complexity cellular communities dominated by several species of hyperthermophilic Archaea. A viral metagenomics approach was taken to assemble segments of these RNA virus genomes from viral populations isolated directly from hot spring samples. Analysis of these RNA metagenomes demonstrated unique gene content that is not generally related to known RNA viruses of Bacteria and Eukarya. However, genes for RNA-dependent RNA polymerase (RdRp), a hallmark of positive-strand RNA viruses, were identified in two contigs. One of these contigs is approximately 5,600 nucleotides in length and encodes a polyprotein that also contains a region homologous to the capsid protein of nodaviruses, tetraviruses, and birnaviruses. Phylogenetic analyses of the RdRps encoded in these contigs indicate that the putative archaeal viruses form a unique group that is distinct from the RdRps of RNA viruses of Eukarya and Bacteria. Collectively, our findings suggest the existence of novel positive-strand RNA viruses that probably replicate in hyperthermophilic archaeal hosts and are highly divergent from RNA viruses that infect eukaryotes and even more distant from known bacterial RNA viruses. These positive-strand RNA viruses might be direct ancestors of RNA viruses of eukaryotes.