Virus genomes are prone to extensive gene loss, gain, and exchange and share no universal genes. Therefore, in a broad-scale study of virus evolution, gene and genome network analyses can complement traditional phylogenetics. We performed an exhaustive comparative analysis of the genomes of double-stranded DNA (dsDNA) viruses by using the bipartite network approach and found a robust hierarchical modularity in the dsDNA virosphere. Bipartite networks consist of two classes of nodes, with nodes in one class, in this case genomes, being connected via nodes of the second class, in this case genes. Such a network can be partitioned into modules that combine nodes from both classes. The bipartite network of dsDNA viruses includes 19 modules that form 5 major and 3 minor supermodules. Of these modules, 11 include tailed bacteriophages, reflecting the diversity of this largest group of viruses. The module analysis quantitatively validates and refines previously proposed nontrivial evolutionary relationships. An expansive supermodule combines the large and giant viruses of the putative order “Megavirales” with diverse moderate-sized viruses and related mobile elements. All viruses in this supermodule share a distinct morphogenetic tool kit with a double jelly roll major capsid protein. Herpesviruses and tailed bacteriophages comprise another supermodule, held together by a distinct set of morphogenetic proteins centered on the HK97-like major capsid protein. Together, these two supermodules cover the great majority of currently known dsDNA viruses. We formally identify a set of 14 viral hallmark genes that comprise the hubs of the network and account for most of the intermodule connections.
Viruses and related mobile genetic elements are the dominant biological entities on earth, but their evolution is not sufficiently understood and their classification is not adequately developed. The key reason is the characteristic high rate of virus evolution that involves not only sequence change but also extensive gene loss, gain, and exchange. Therefore, in the study of virus evolution on a large scale, traditional phylogenetic approaches have limited applicability and have to be complemented by gene and genome network analyses. We applied state-of-the-art methods of such analysis to reveal robust hierarchical modularity in the genomes of double-stranded DNA viruses. Some of the identified modules combine highly diverse viruses infecting bacteria, archaea, and eukaryotes, in support of previous hypotheses on direct evolutionary relationships between viruses from the three domains of cellular life. We formally identify a set of 14 viral hallmark genes that hold together the genomic network.
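The bipartite genome-gene network described above can be illustrated with a toy data structure. The sketch below is not the authors' pipeline: the genome names, gene-family labels, and degree threshold are hypothetical, and it only counts how many genomes each gene connects (a crude proxy for a "hub"); it does not perform module detection.

```python
from collections import defaultdict

# Hypothetical toy data: genome -> set of gene families it encodes.
GENOMES = {
    "phage_A": {"HK97_MCP", "terminase", "portal"},
    "phage_B": {"HK97_MCP", "terminase", "integrase"},
    "herpes_C": {"HK97_MCP", "terminase", "polB"},
    "giant_D": {"DJR_MCP", "polB", "A32_ATPase"},
    "polinton_E": {"DJR_MCP", "polB", "integrase"},
}


def gene_adjacency(genome_to_genes):
    """Return gene -> set of genomes, i.e. the gene side of the bipartite graph."""
    adjacency = defaultdict(set)
    for genome, genes in genome_to_genes.items():
        for gene in genes:
            adjacency[gene].add(genome)
    return dict(adjacency)


def hub_genes(genome_to_genes, min_degree):
    """Genes connected to at least min_degree genomes (threshold is an assumption)."""
    adjacency = gene_adjacency(genome_to_genes)
    return sorted(g for g, members in adjacency.items() if len(members) >= min_degree)


def shared_gene_count(genome_to_genes, genome_a, genome_b):
    """Number of genes through which two genomes are connected."""
    return len(genome_to_genes[genome_a] & genome_to_genes[genome_b])


if __name__ == "__main__":
    print(hub_genes(GENOMES, min_degree=3))
    print(shared_gene_count(GENOMES, "phage_A", "herpes_C"))
```

In this representation two genomes are never linked directly; they are connected only through the genes they share, which is the defining property of the bipartite approach.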
The CRISPR-Cas adaptive immune system defends microbes against foreign genetic elements via DNA or RNA-DNA interference. We characterize the Class 2 type VI-A CRISPR-Cas effector C2c2 and demonstrate its RNA-guided RNase function. C2c2 from the bacterium Leptotrichia shahii provides interference against RNA phage. In vitro biochemical analysis shows that C2c2 is guided by a single crRNA and can be programmed to cleave ssRNA targets carrying complementary protospacers. In bacteria, C2c2 can be programmed to knock down specific mRNAs. Cleavage is mediated by catalytic residues in the two conserved HEPN domains, mutations in which generate catalytically inactive RNA-binding proteins. These results broaden our understanding of CRISPR-Cas systems and suggest that C2c2 can be used to develop new RNA-targeting tools.
Bacterial genomes encode numerous homologs of Cas9, the effector protein of the type II CRISPR-Cas systems. The homology region includes the arginine-rich helix and the HNH nuclease domain that is inserted into the RuvC-like nuclease domain. These genes, however, are not linked to cas genes or CRISPR. Here, we show that Cas9 homologs represent a distinct group of nonautonomous transposons, which we denote ISC (insertion sequences Cas9-like). We identify many diverse families of full-length ISC transposons and demonstrate that their terminal sequences (particularly 3′ termini) are similar to those of IS605 superfamily transposons that are mobilized by the Y1 tyrosine transposase encoded by the TnpA gene and often also encode the TnpB protein containing the RuvC-like endonuclease domain. The terminal regions of the ISC and IS605 transposons contain palindromic structures that are likely recognized by the Y1 transposase. The transposons from these two groups are inserted either exactly in the middle or upstream of specific 4-bp target sites, without target site duplication. We also identify autonomous ISC transposons that encode TnpA-like Y1 transposases. Thus, the nonautonomous ISC transposons could be mobilized in trans either by Y1 transposases of other, autonomous ISC transposons or by Y1 transposases of the more abundant IS605 transposons. These findings imply an evolutionary scenario in which the ISC transposons evolved from IS605 family transposons, possibly via insertion of a mobile group II intron encoding the HNH domain, and Cas9 subsequently evolved via immobilization of an ISC transposon.
IMPORTANCE Cas9 endonucleases, the effectors of type II CRISPR-Cas systems, represent the new generation of genome-engineering tools. Here, we describe in detail a novel family of transposable elements that encode the likely ancestors of Cas9 and outline the evolutionary scenario connecting different varieties of these transposons and Cas9.
Microbial CRISPR-Cas systems are divided into Class 1, with multisubunit effector complexes, and Class 2, with single protein effectors. Currently, only two Class 2 effectors, Cas9 and Cpf1, are known. We describe here three distinct Class 2 CRISPR-Cas systems. The effectors of two of the identified systems, C2c1 and C2c3, contain RuvC-like endonuclease domains distantly related to Cpf1. The third system, C2c2, contains an effector with two predicted HEPN RNase domains. Whereas production of mature CRISPR RNA (crRNA) by C2c1 depends on tracrRNA, C2c2 crRNA maturation is tracrRNA-independent. We found that C2c1 systems can mediate DNA interference in a 5’-PAM-dependent fashion analogous to Cpf1. However, unlike Cpf1, which is a single-RNA-guided nuclease, C2c1 depends on both crRNA and tracrRNA for DNA cleavage. Finally, comparative analysis indicates that Class 2 CRISPR-Cas systems evolved on multiple occasions through recombination of Class 1 adaptation modules with effector proteins acquired from distinct mobile elements.
CRISPR-Cas adaptive immunity; Cas9; Cpf1; crRNA; tracrRNA; PAM; RuvC-like endonuclease; HEPN domain; computational discovery pipeline; RNA-seq
The microbial adaptive immune system CRISPR mediates defense against foreign genetic elements through two classes of RNA-guided nuclease effectors. Class 1 effectors utilize multi-protein complexes, whereas Class 2 effectors rely on single-component effector proteins such as the well-characterized Cas9. Here we report characterization of Cpf1, a putative Class 2 CRISPR effector. We demonstrate that Cpf1 mediates robust DNA interference with features distinct from Cas9. Cpf1 is a single RNA-guided endonuclease lacking tracrRNA, and it utilizes a T-rich protospacer adjacent motif. Moreover, Cpf1 cleaves DNA via a staggered DNA double-stranded break. Out of 16 Cpf1-family proteins, we identified two candidate enzymes, from Acidaminococcus and Lachnospiraceae, with efficient genome editing activity in human cells. Identifying this mechanism of interference broadens our understanding of CRISPR-Cas systems and advances their genome editing applications.
The history of life is punctuated by evolutionary transitions which engender emergence of new levels of biological organization that involves selection acting at increasingly complex ensembles of biological entities. Major evolutionary transitions include the origin of prokaryotic and then eukaryotic cells, multicellular organisms and eusocial animals. All or nearly all cellular life forms are hosts to diverse selfish genetic elements with various levels of autonomy including plasmids, transposons and viruses. I present evidence that, at least up to and including the origin of multicellularity, evolutionary transitions are driven by the coevolution of hosts with these genetic parasites along with sharing of ‘public goods’. Selfish elements drive evolutionary transitions at two distinct levels. First, mathematical modelling of evolutionary processes, such as evolution of primitive replicator populations or unicellular organisms, indicates that only increasing organizational complexity, e.g. emergence of multicellular aggregates, can prevent the collapse of the host–parasite system under the pressure of parasites. Second, comparative genomic analysis reveals numerous cases of recruitment of genes with essential functions in cellular life forms, including those that enable evolutionary transitions.
This article is part of the themed issue ‘The major synthetic evolutionary transitions’.
evolutionary transitions; mobile genetic elements; parasites; viruses; antivirus defence; host–parasite coevolution
The wide spread of gene exchange and loss in the prokaryotic world has prompted the concept of ‘lateral genomics’ to the point of an outright denial of the relevance of phylogenetic trees for evolution. However, the pronounced congruence of the topologies of numerous gene trees, particularly those for (nearly) universal genes, translates into the notion of a statistical tree of life (STOL), which reflects a central trend of vertical evolution. The STOL can be employed as a framework for reconstruction of the evolutionary processes in the prokaryotic world. Quantitatively, however, horizontal gene transfer (HGT) dominates microbial evolution, with the rate of gene gain and loss being comparable to the rate of point mutations and much greater than the duplication rate. Theoretical models of evolution suggest that HGT is essential for the survival of microbial populations that otherwise deteriorate due to Muller’s ratchet. Apparently, at least some bacteria and archaea possess dedicated vehicles for gene transfer that evolved from selfish elements such as plasmids and viruses. Recent phylogenomic analyses suggest that episodes of massive HGT were pivotal for the emergence of major groups of organisms such as multiple archaeal phyla as well as eukaryotes. Similar analyses appear to indicate that, in addition to donating hundreds of genes to the emerging eukaryotic lineage, mitochondrial endosymbiosis severely curtailed HGT. These results shed new light on the routes of evolutionary transitions, but caution is due given the inherent uncertainty of deep phylogenies.
Horizontal gene transfer; prokaryotes; evolutionary transitions; microbial evolution; statistical tree of life
Unicellular eukaryotes and most prokaryotes possess distinct mechanisms of programmed cell death (PCD). How could an “altruistic” trait, such as PCD, evolve in unicellular organisms? To address this question, we developed a mathematical model of the virus-host co-evolution that involves interaction between immunity, PCD and cellular aggregation. Analysis of the parameter space of this model shows that under high virus load and imperfect immunity, joint evolution of cell aggregation and PCD is the optimal evolutionary strategy. Given the abundance of viruses in diverse habitats and the wide spread of PCD in most organisms, these findings imply that multiple instances of the emergence of multicellularity and its essential attribute, PCD, could have been driven, at least in part, by the virus-host arms race.
programmed cell death; host-parasite arms race; viruses; evolution of multicellularity
Germline endogenous viral elements (EVEs) genetically preserve viral nucleotide sequences useful to the study of viral evolution, gene mutation, and the phylogenetic relationships among host organisms. Here, we describe a lineage-specific, adeno-associated virus (AAV)-derived endogenous viral element (mAAV-EVE1) found within the germline of numerous closely related marsupial species. Molecular screening of a marsupial DNA panel indicated that mAAV-EVE1 occurs specifically within the marsupial suborder Macropodiformes (present-day kangaroos, wallabies, and related macropodoids), to the exclusion of other Diprotodontian lineages. Orthologous mAAV-EVE1 locus sequences from sixteen macropodoid species, representing a speciation history spanning an estimated 30 million years, facilitated compilation of an inferred ancestral sequence that recapitulates the genome of an ancient marsupial AAV that circulated among Australian metatherian fauna sometime during the late Eocene to early Oligocene. In silico gene reconstruction and molecular modelling indicate remarkable conservation of viral structure over a geologic timescale. Characterisation of AAV-EVE loci among disparate species affords insight into AAV evolution and, in the case of macropodoid species, may offer an additional genetic basis for assignment of phylogenetic relationships among the Macropodoidea. From an applied perspective, the identified AAV “fossils” provide novel capsid sequences for use in translational research and clinical applications.
Research in quantitative evolutionary genomics and systems biology led to the discovery of several universal regularities connecting genomic and molecular phenomic variables. These universals include the log-normal distribution of the evolutionary rates of orthologous genes; the power law–like distributions of paralogous family size and node degree in various biological networks; the negative correlation between a gene's sequence evolution rate and expression level; and differential scaling of functional classes of genes with genome size. The universals of genome evolution can be accounted for by simple mathematical models similar to those used in statistical physics, such as the birth-death-innovation model. These models do not explicitly incorporate selection; therefore, the observed universal regularities do not appear to be shaped by selection but rather are emergent properties of gene ensembles. Although a complete physical theory of evolutionary biology is inconceivable, the universals of genome evolution might qualify as “laws of evolutionary genomics” in the same sense in which “law” is understood in modern physics.
Cpf1 is an RNA-guided endonuclease of a type V CRISPR-Cas system that has been recently harnessed for genome editing. Here, we report the crystal structure of Acidaminococcus sp. Cpf1 (AsCpf1) in complex with the guide RNA and its target DNA, at 2.8 Å resolution. AsCpf1 adopts a bilobed architecture, with the RNA–DNA heteroduplex bound inside the central channel. The structural comparison of AsCpf1 with Cas9, a type II CRISPR-Cas nuclease, reveals both striking similarity and major differences, thereby explaining their distinct functionalities. AsCpf1 contains the RuvC domain and a putative novel nuclease domain, which are responsible for the cleavage of the non-target and target strands, respectively, and jointly generate staggered DNA double-strand breaks. AsCpf1 recognizes the 5′-TTTN-3′ protospacer adjacent motif by base and shape readout mechanisms. Our findings provide mechanistic insights into RNA-guided DNA cleavage by Cpf1, and establish a framework for rational engineering of the CRISPR-Cpf1 toolbox.
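The 5′-TTTN-3′ motif mentioned above lends itself to a simple sequence scan. The following is a minimal sketch, not a guide-design tool: the function name and the `spacer_len` default are hypothetical, only the given strand is scanned (no reverse complement), and it assumes the protospacer lies immediately 3′ of the PAM on that strand.

```python
def find_tttn_sites(seq, spacer_len=20):
    """Return (pam_start, pam, protospacer) for each 5'-TTTN-3' on the given strand.

    spacer_len is an illustrative assumption, not a value taken from the text.
    Sites too close to the 3' end to fit a full protospacer are skipped.
    """
    seq = seq.upper()
    pam_len = 4
    sites = []
    for i in range(len(seq) - pam_len - spacer_len + 1):
        pam = seq[i:i + pam_len]
        # TTT followed by any unambiguous base (the N position).
        if pam[:3] == "TTT" and pam[3] in "ACGT":
            protospacer = seq[i + pam_len:i + pam_len + spacer_len]
            sites.append((i, pam, protospacer))
    return sites


if __name__ == "__main__":
    for site in find_tttn_sites("AATTTGACGTACGT", spacer_len=5):
        print(site)
```

Overlapping motifs (for example in a run of T residues) are reported separately, so a downstream filter would be needed for any practical use.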
Many surface structures in archaea including various types of pili and the archaellum (archaeal flagellum) are homologous to bacterial type IV pili systems (T4P). The T4P consist of multiple proteins, often with poorly conserved sequences, complicating their identification in sequenced genomes. Here we report a comprehensive census of T4P encoded in archaeal genomes using sensitive methods for protein sequence comparison. This analysis confidently identifies as T4P components about 5000 archaeal gene products, 56% of which are currently annotated as hypothetical in public databases. Combining results of this analysis with a comprehensive comparison of genomic neighborhoods of the T4P, we present models of organization of the 10 most abundant variants of archaeal T4P. In addition to the differentiation between major and minor pilins, these models include extra components, such as S-layer proteins, adhesins and other membrane and intracellular proteins. For most of these systems, dedicated major pilin families are identified including numerous stand-alone major pilin genes of the PilA family. Evidence is presented that secretion ATPases of the T4P and cognate TadC proteins can interact with different pilin sets. Modular evolution of T4P results in combinatorial variability of these systems. Potential regulatory or modulating proteins for the T4P are identified including KaiC family ATPases, vWA domain-containing proteins and the associated MoxR/GvpN ATPase, TFIIB homologs and multiple unrelated transcription regulators, some of which are associated with specific T4P. Phylogenomic analysis suggests that at least one T4P system was present in the last common ancestor of the extant archaea. Multiple cases of horizontal transfer and lineage-specific duplication of T4P loci were detected. Generally, the T4P of the archaeal TACK superphylum are more diverse and evolve notably faster than those of euryarchaea.
The abundance and enormous diversity of T4P in hyperthermophilic archaea present a major enigma. Apparently, fundamental aspects of the biology of hyperthermophiles remain to be elucidated.
type IV pili; archaea; evolution; comparative genomics; secretion ATPase
Robustness to destabilizing effects of mutations is thought of as a key factor of protein evolution. The connections between two measures of robustness, the relative core size and the computationally estimated effect of mutations on protein stability (ΔΔG), protein abundance and the selection pressure on protein-coding genes (dN/dS) were analyzed for the organisms with a large number of available protein structures including four eukaryotes, two bacteria and one archaeon. The distribution of the effects of mutations in the core on protein stability is universal and indistinguishable in eukaryotes and bacteria, centered at slightly destabilizing amino acid replacements, and with a heavy tail of more strongly destabilizing replacements. The distribution of mutational effects in the hyperthermophilic archaeon Thermococcus gammatolerans is significantly shifted toward strongly destabilizing replacements, which is indicative of stronger constraints that are imposed on proteins in hyperthermophiles. The median effect of mutations is strongly, positively correlated with the relative core size, in evidence of the congruence between the two measures of protein robustness. However, both measures show only limited correlations to the expression level and selection pressure on protein-coding genes. Thus, the degree of robustness reflected in the universal distribution of mutational effects appears to be a fundamental, ancient feature of globular protein folds whereas the observed variations are largely neutral and uncoupled from short-term protein evolution. A weak anticorrelation between protein core size and selection pressure is observed only for surface residues in prokaryotes but a stronger anticorrelation is observed for all residues in eukaryotic proteins. This substantial difference between proteins of prokaryotes and eukaryotes is likely to stem from the demonstrably higher compactness of prokaryotic proteins.
Viruses were defined as one of the two principal types of organisms in the biosphere, namely, as capsid-encoding organisms in contrast to ribosome-encoding organisms, i.e., all cellular life forms. Structurally similar, apparently homologous capsids are present in a huge variety of icosahedral viruses that infect bacteria, archaea, and eukaryotes. These findings prompted the concept of the capsid as the virus “self” that defines the identity of deep, ancient viral lineages. However, several other widespread viral “hallmark genes” encode key components of the viral replication apparatus (such as polymerases and helicases) and combine with different capsid proteins, given the inherently modular character of viral evolution. Furthermore, diverse, widespread, capsidless selfish genetic elements, such as plasmids and various types of transposons, share hallmark genes with viruses. Viruses appear to have evolved from capsidless selfish elements, and vice versa, on multiple occasions during evolution. At the earliest, precellular stage of life's evolution, capsidless genetic parasites most likely emerged first and subsequently gave rise to different classes of viruses. In this review, we develop the concept of a greater virus world which forms an evolutionary network that is held together by shared conserved genes and includes both bona fide capsid-encoding viruses and different classes of capsidless replicons. Theoretical studies indicate that selfish replicons (genetic parasites) inevitably emerge in any sufficiently complex evolving ensemble of replicators. Therefore, the key signature of the greater virus world is not the presence of a capsid but rather genetic, informational parasitism itself, i.e., various degrees of reliance on the information processing systems of the host.
Diverse eukaryotes including animals and protists are hosts to a broad variety of viruses with double-stranded (ds) DNA genomes, from the largest known viruses, such as pandoraviruses and mimiviruses, to tiny polyomaviruses. Recent comparative genomic analyses have revealed many evolutionary connections between dsDNA viruses of eukaryotes, bacteriophages, transposable elements, and linear DNA plasmids. These findings provide an evolutionary scenario that derives several major groups of eukaryotic dsDNA viruses, including the proposed order “Megavirales,” adenoviruses, and virophages from a group of large virus-like transposons known as Polintons (Mavericks). The Polintons have been recently shown to encode two capsid proteins, suggesting that these elements lead a dual lifestyle with both a transposon and a viral phase and should perhaps more appropriately be named polintoviruses. Here, we describe the recently identified evolutionary relationships between bacteriophages of the family Tectiviridae, polintoviruses, adenoviruses, virophages, large and giant DNA viruses of eukaryotes of the proposed order “Megavirales,” and linear mitochondrial and cytoplasmic plasmids. We outline an evolutionary scenario under which the polintoviruses were the first group of eukaryotic dsDNA viruses that evolved from bacteriophages and became the ancestors of most large DNA viruses of eukaryotes and a variety of other selfish elements. Distinct lines of origin are detectable only for herpesviruses (from a different bacteriophage root) and polyoma/papillomaviruses (from single-stranded DNA viruses and ultimately from plasmids). Phylogenomic analysis of giant viruses provides compelling evidence of their independent origins from smaller members of the putative order “Megavirales,” refuting the speculations on the evolution of these viruses from an extinct fourth domain of cellular life.
Polintons; Megavirales; virus evolution; capsid proteins; translation
The ancestral set of eukaryotic genes is a chimera composed of genes of archaeal and bacterial origins thanks to the endosymbiosis event that gave rise to the mitochondria and apparently antedated the last common ancestor of the extant eukaryotes. The proto-mitochondrial endosymbiont is confidently identified as an α-proteobacterium. In contrast, the archaeal ancestor of eukaryotes remains elusive, although evidence is accumulating that it could have belonged to a deep lineage within the TACK (Thaumarchaeota, Aigarchaeota, Crenarchaeota, Korarchaeota) superphylum of the Archaea. Recent surveys of archaeal genomes show that the apparent ancestors of several key functional systems of eukaryotes, the components of the archaeal “eukaryome,” such as ubiquitin signaling, RNA interference, and actin-based and tubulin-based cytoskeleton structures, are identifiable in different archaeal groups. We suggest that the archaeal ancestor of eukaryotes was a complex form, rooted deeply within the TACK superphylum, that already possessed some quintessential eukaryotic features, in particular, a cytoskeleton, and perhaps was capable of a primitive form of phagocytosis that would facilitate the engulfment of potential symbionts. This putative group of Archaea could have existed for a relatively short time before going extinct or undergoing genome streamlining, resulting in the dispersion of the eukaryome. This scenario might explain the difficulty with the identification of the archaeal ancestor of eukaryotes despite the straightforward detection of apparent ancestors to many signature eukaryotic functional systems.
The apparent ancestors of key eukaryotic features (e.g., ubiquitin signaling, RNA interference, and cytoskeletal structures) are identifiable in different Archaea. But the specific archaeal ancestor of eukaryotes remains elusive.
Prokaryotes harbor a variety of genetic replicators, including plasmids, viruses, and chromosomes, each having differing effects on the phenotype of the hosting cell. Here, we propose a classification for replicators of bacteria and archaea on the basis of their horizontal-transfer potential and the type of relationships (mutualistic, symbiotic, commensal, or parasitic) that they have with the host cell vehicle. Horizontal movement of replicators can be either active or passive, reflecting whether or not the replicator encodes the means to mediate its own transfer from one cell to another. Some replicators also have an infectious extracellular state, thus separating viruses from other mobile elements. From the perspective of the cell vehicle, the different types of replicators form a continuum from genuinely mutualistic to completely parasitic replicators. This classification provides a general framework for dissecting prokaryotic systems into evolutionarily meaningful components.
bacteria; archaea; prokaryotes; classification; replicators; cell vehicles
In a series of conceptual articles published around the millennium, Carl Woese emphasized that evolution of cells is the central problem of evolutionary biology, that the three-domain ribosomal tree of life is an essential framework for reconstructing cellular evolution, and that the evolutionary dynamics of functionally distinct cellular systems are fundamentally different, with the information processing systems “crystallizing” earlier than operational systems. The advances of evolutionary genomics over the last decade vindicate major aspects of Woese’s vision. Despite the observations of pervasive horizontal gene transfer among bacteria and archaea, the ribosomal tree of life comes across as a central statistical trend in the “forest” of phylogenetic trees of individual genes, and hence, an appropriate scaffold for evolutionary reconstruction. The evolutionary stability of information processing systems, primarily translation, becomes ever more striking with the accumulation of comparative genomic data indicating that nearly all of the few universal genes encode translation system components. Woese’s view on the fundamental distinctions between the three domains of cellular life also withstands the test of comparative genomics, although his non-acceptance of symbiogenetic scenarios for the origin of eukaryotes might not. Above all, Woese’s key prediction that understanding evolution of microbes will be the core of the new evolutionary biology appears to be materializing.
Darwinian threshold; cellular evolution; domains of life; evolutionary transitions; horizontal gene transfer; progenote
Accurate inference of orthologous genes is a pre-requisite for most comparative genomics studies, and is also important for functional annotation of new genomes. Identification of orthologous gene sets typically involves phylogenetic tree analysis, heuristic algorithms based on sequence conservation, synteny analysis, or some combination of these approaches. The most direct tree-based methods typically rely on the comparison of an individual gene tree with a species tree. Once the two trees are accurately constructed, orthologs are straightforwardly identified by the definition of orthology as those homologs that are related by speciation, rather than gene duplication, at their most recent point of origin. Although ideal for the purpose of orthology identification in principle, phylogenetic trees are computationally expensive to construct for large numbers of genes and genomes, and they often contain errors, especially at large evolutionary distances. Moreover, in many organisms, in particular prokaryotes and viruses, evolution does not appear to have followed a simple ‘tree-like’ mode, which makes conventional tree reconciliation inapplicable. Other, heuristic methods identify probable orthologs as the closest homologous pairs or groups of genes in a set of organisms. These approaches are faster and easier to automate than tree-based methods, with efficient implementations provided by graph-theoretical algorithms enabling comparisons of thousands of genomes. Comparisons of these two approaches show that, despite conceptual differences, they produce similar sets of orthologs, especially at short evolutionary distances. Synteny also can aid in identification of orthologs. Often, tree-based, sequence similarity- and synteny-based approaches can be combined into flexible hybrid methods.
homolog; ortholog; paralog; xenolog; orthologous groups; tree reconciliation; comparative genomics
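The heuristic idea of taking the closest homologous pairs as probable orthologs can be sketched with bidirectional best hits (BBH), one common graph-style heuristic. This is a simplified illustration under stated assumptions rather than the method of any particular tool: the gene identifiers and similarity scores are invented, and real pipelines also handle ties, paralogs, and score normalization.

```python
def best_hits(scores):
    """Map each query to its highest-scoring subject.

    scores: dict mapping (query, subject) -> similarity score (hypothetical values).
    """
    best = {}
    for (query, subject), score in scores.items():
        if query not in best or score > best[query][1]:
            best[query] = (subject, score)
    return {query: subject for query, (subject, _) in best.items()}


def bidirectional_best_hits(a_vs_b, b_vs_a):
    """Pairs (a, b) where b is a's best hit and a is b's best hit."""
    forward = best_hits(a_vs_b)
    reverse = best_hits(b_vs_a)
    return sorted((a, b) for a, b in forward.items() if reverse.get(b) == a)


# Hypothetical similarity scores between genes of genome A and genome B.
A_VS_B = {("a1", "b1"): 200, ("a1", "b2"): 90, ("a2", "b2"): 150, ("a3", "b1"): 120}
B_VS_A = {("b1", "a1"): 195, ("b1", "a3"): 110, ("b2", "a2"): 140, ("b2", "a1"): 95}

if __name__ == "__main__":
    print(bidirectional_best_hits(A_VS_B, B_VS_A))
```

Here `a3` has `b1` as its best hit, but `b1` prefers `a1`, so the pair is not reported; this asymmetry is how the heuristic filters out likely paralogs without building a tree.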
Biological information encoded in genomes is fundamentally different from and effectively orthogonal to Shannon entropy. The biologically relevant concept of information has to do with ‘meaning’, i.e. encoding various biological functions with various degrees of evolutionary conservation. Apart from direct experimentation, the meaning, or biological information content, can be extracted and quantified from alignments of homologous nucleotide or amino acid sequences but generally not from a single sequence, using appropriately modified information theoretical formulae. In short, information encoded in genomes is defined vertically but not horizontally. Informally but substantially, biological information density seems to be equivalent to ‘meaning’ of genomic sequences that spans the entire range from sharply defined, universal meaning to effective meaninglessness. Large fractions of genomes, up to 90% in some plants, belong within the domain of fuzzy meaning. The sequences with fuzzy meaning can be recruited for various functions, with the meaning subsequently fixed, and also could perform generic functional roles that do not require sequence conservation. Biological meaning is continuously transferred between the genomes of selfish elements and hosts in the process of their coevolution. Thus, in order to adequately describe genome function and evolution, the concepts of information theory have to be adapted to incorporate the notion of meaning that is central to biology.
information; meaning; evolution; selfish elements
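The idea that information is read "vertically" from an alignment column, rather than "horizontally" along a single sequence, can be illustrated with the standard per-column measure used for sequence logos: the maximum possible entropy minus the observed column entropy. This is an assumption chosen for illustration; the text does not specify the exact modified formulae, and the sketch ignores gaps and small-sample corrections.

```python
import math
from collections import Counter


def column_information(column, alphabet_size=4):
    """Information (bits) of one alignment column: log2(alphabet) - Shannon entropy."""
    counts = Counter(column)
    total = len(column)
    entropy = -sum((c / total) * math.log2(c / total) for c in counts.values())
    return math.log2(alphabet_size) - entropy


def alignment_information(sequences, alphabet_size=4):
    """Per-column information for equal-length aligned sequences."""
    return [column_information(col, alphabet_size) for col in zip(*sequences)]


# Hypothetical toy nucleotide alignment (four homologous sequences).
ALIGNMENT = ["ACGT", "ACGA", "ACCC", "ACTG"]

if __name__ == "__main__":
    for position, bits in enumerate(alignment_information(ALIGNMENT)):
        print(position, round(bits, 3))
```

Fully conserved columns score the maximum (2 bits for nucleotides) and fully variable columns score zero, whereas any single sequence taken alone carries no such signal.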
The CRISPR-Cas system of prokaryotic adaptive immunity displays features of a mechanism for directional, Lamarckian evolution. Indeed, this system modifies a specific locus in a bacterial or archaeal genome by inserting a piece of foreign DNA into a CRISPR array which results in acquired, heritable resistance to the cognate selfish element. A key element of the Lamarckian scheme is the specificity and directionality of the mutational process whereby an environmental cue causes only mutations that provide specific adaptations to the original challenge. In the case of adaptive immunity, the specificity of mutations is equivalent to self-nonself discrimination. Recent studies on the CRISPR mechanism have shown that the levels of discrimination can substantially differ such that in some CRISPR-Cas variants incorporation of DNA is random whereas discrimination occurs by selection of cells that carry cognate inserts. In other systems, a higher level of specificity appears to be achieved via specialized mechanisms. These findings emphasize the continuity between random and directed mutations and the critical importance of evolved mechanisms that govern the mutational process.
Reviewers: This article has been reviewed by Yitzhak Pilpel, Martijn Huynen, and Bojan Zagrovic.
CRISPR-Cas; Self-nonself discrimination; Lamarckian evolution; Darwinian evolution; DNA repair
Casposons are a superfamily of putative self-synthesizing transposable elements that are predicted to employ a homolog of the Cas1 protein as a recombinase and could have contributed to the origin of the CRISPR-Cas adaptive immunity systems in archaea and bacteria. Casposons remain uncharacterized experimentally, except for the recent demonstration of the integrase activity of the Cas1 homolog, and given their relative rarity in archaea and bacteria, the original comparative genomic analysis did not provide direct indications of their mobility. Here, we report evidence of casposon mobility obtained by comparison of the genomes of 62 strains of the archaeon Methanosarcina mazei. In these genomes, casposons are variably inserted at three distinct sites, indicative of multiple recent gains and losses. Some casposons are inserted into other mobile genetic elements that might provide vehicles for horizontal transfer of the casposons. Additionally, many M. mazei genomes contain previously undetected solo terminal inverted repeats that apparently are derived from casposons and could resemble intermediates in CRISPR evolution. We further demonstrate the sequence specificity of casposon insertion and note clear parallels with the adaptation mechanism of CRISPR-Cas. Finally, besides identifying additional representatives in each of the three originally defined families, we describe a new, fourth family of casposons.
casposons; self-synthesizing transposons; CRISPR-Cas; mobile genetic elements; transposition
The elaborate eukaryotic DNA replication machinery evolved from archaeal ancestors, which themselves show considerable complexity. Here we discuss the comparative genomic and phylogenetic analysis of the core replication enzymes, the DNA polymerases, in archaea and their relationships with the eukaryotic polymerases. In archaea, there are three groups of family B DNA polymerases, historically known as PolB1, PolB2 and PolB3. All three groups appear to descend from the last common ancestor of the extant archaea, but their subsequent evolutionary trajectories seem to have been widely different. Although PolB3 is present in all archaea, with the exception of Thaumarchaeota, and appears to be directly involved in lagging strand replication, the evolution of this gene does not follow the archaeal phylogeny, conceivably due to multiple horizontal transfers and/or dramatic differences in evolutionary rates. In contrast, PolB1 is missing in Euryarchaeota but otherwise seems to have evolved vertically. The third archaeal group of family B polymerases, PolB2, includes primarily proteins in which the catalytic centers of the polymerase and exonuclease domains are disrupted and accordingly the enzymes appear to be inactivated. The members of the PolB2 group are scattered across archaea and might be involved in repair or regulation of replication along with inactivated members of the RadA family ATPases and an additional, uncharacterized protein that are encoded within the same predicted operon. In addition to the family B polymerases, all archaea, with the exception of the Crenarchaeota, encode enzymes of a distinct family D, the origin of which is unclear. We examine multiple considerations that appear compatible with the possibility that family D polymerases are highly derived homologs of family B.
The eukaryotic DNA polymerases show a highly complex relationship with their archaeal ancestors, including contributions of proteins and domains from both the family B and the family D archaeal polymerases.
DNA replication; archaea; mobile genetic elements; DNA polymerases; enzyme inactivation
Argonaute proteins are conserved throughout all domains of life. Recently characterized prokaryotic Argonaute proteins (pAgos) participate in host defense by DNA interference, whereas eukaryotic Argonaute proteins (eAgos) control a wide range of processes by RNA interference. Here we review molecular mechanisms of guide and target binding by Argonaute proteins, and describe how the conformational changes induced by target binding lead to target cleavage. On the basis of structural comparisons and phylogenetic analyses of pAgos and eAgos, we reconstruct the evolutionary journey of the Argonaute proteins through the three domains of life and discuss how different structural features of pAgos and eAgos relate to their distinct physiological roles.
The nucleocytoplasmic large DNA viruses (NCLDVs) comprise a monophyletic group of viruses that infect animals and diverse unicellular eukaryotes. The NCLDV group includes the families Poxviridae, Asfarviridae, Iridoviridae, Ascoviridae, Phycodnaviridae, Mimiviridae and the proposed family “Marseilleviridae”. The family Mimiviridae includes the largest known viruses, with genomes in excess of one megabase, whereas the genome size in the other NCLDV families varies from 100 to 400 kilobase pairs. Most of the NCLDVs replicate in the cytoplasm of infected cells, within so-called virus factories. The NCLDVs share a common ancient origin, as demonstrated by evolutionary reconstructions that trace approximately 50 genes encoding key proteins involved in viral replication and virion formation to the last common ancestor of all these viruses. Taken together, these characteristics lead us to propose assigning an official taxonomic rank to the NCLDVs as the order “Megavirales”, in reference to the large size of the virions and genomes of these viruses.