Bacterial genomes encode numerous homologs of Cas9, the effector protein of the type II CRISPR-Cas systems. The homology region includes the arginine-rich helix and the HNH nuclease domain that is inserted into the RuvC-like nuclease domain. These genes, however, are not linked to cas genes or CRISPR. Here, we show that Cas9 homologs represent a distinct group of nonautonomous transposons, which we denote ISC (insertion sequences Cas9-like). We identify many diverse families of full-length ISC transposons and demonstrate that their terminal sequences (particularly 3′ termini) are similar to those of IS605 superfamily transposons that are mobilized by the Y1 tyrosine transposase encoded by the TnpA gene and often also encode the TnpB protein containing the RuvC-like endonuclease domain. The terminal regions of the ISC and IS605 transposons contain palindromic structures that are likely recognized by the Y1 transposase. The transposons from these two groups are inserted either exactly in the middle or upstream of specific 4-bp target sites, without target site duplication. We also identify autonomous ISC transposons that encode TnpA-like Y1 transposases. Thus, the nonautonomous ISC transposons could be mobilized in trans either by Y1 transposases of other, autonomous ISC transposons or by Y1 transposases of the more abundant IS605 transposons. These findings imply an evolutionary scenario in which the ISC transposons evolved from IS605 family transposons, possibly via insertion of a mobile group II intron encoding the HNH domain, and Cas9 subsequently evolved via immobilization of an ISC transposon.
IMPORTANCE Cas9 endonucleases, the effectors of type II CRISPR-Cas systems, represent the new generation of genome-engineering tools. Here, we describe in detail a novel family of transposable elements that encode the likely ancestors of Cas9 and outline the evolutionary scenario connecting different varieties of these transposons and Cas9.
Unicellular eukaryotes and most prokaryotes possess distinct mechanisms of programmed cell death (PCD). How could an “altruistic” trait, such as PCD, evolve in unicellular organisms? To address this question, we developed a mathematical model of virus-host co-evolution that involves interaction between immunity, PCD and cellular aggregation. Analysis of the parameter space of this model shows that under high virus load and imperfect immunity, joint evolution of cell aggregation and PCD is the optimal evolutionary strategy. Given the abundance of viruses in diverse habitats and the wide spread of PCD in most organisms, these findings imply that multiple instances of the emergence of multicellularity and its essential attribute, PCD, could have been driven, at least in part, by the virus-host arms race.
programmed cell death; host-parasite arms race; viruses; evolution of multicellularity
Research in quantitative evolutionary genomics and systems biology led to the discovery of several universal regularities connecting genomic and molecular phenomic variables. These universals include the log-normal distribution of the evolutionary rates of orthologous genes; the power law–like distributions of paralogous family size and node degree in various biological networks; the negative correlation between a gene's sequence evolution rate and expression level; and differential scaling of functional classes of genes with genome size. The universals of genome evolution can be accounted for by simple mathematical models similar to those used in statistical physics, such as the birth-death-innovation model. These models do not explicitly incorporate selection; therefore, the observed universal regularities do not appear to be shaped by selection but rather are emergent properties of gene ensembles. Although a complete physical theory of evolutionary biology is inconceivable, the universals of genome evolution might qualify as “laws of evolutionary genomics” in the same sense “law” is understood in modern physics.
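The kind of selection-free model mentioned above can be illustrated with a toy simulation. The following is a minimal sketch, assuming a discrete-time scheme in which each step is either an innovation event (a new single-gene family) or a duplication or loss of a uniformly chosen gene. The function names, parameter names and default rates are illustrative assumptions, and this linear toy variant is not the specific birth-death-innovation model used in the literature.

```python
import random
from collections import Counter


def simulate_bdim(steps, birth=0.5, death=0.5, innovation=0.05, seed=0):
    """Toy birth-death-innovation simulation (illustrative assumptions only).

    Each step is one event: with probability `innovation` a new family of
    size 1 appears; otherwise a gene chosen uniformly at random is either
    duplicated (relative weight `birth`) or lost (relative weight `death`).
    Returns the sizes of all surviving families.
    """
    rng = random.Random(seed)
    sizes = [1]  # start from a single family with a single gene
    for _ in range(steps):
        if not sizes or rng.random() < innovation:
            sizes.append(1)  # innovation: a new family enters the genome
            continue
        # Choosing a family with weight equal to its size is the same as
        # choosing one gene uniformly from the whole genome.
        idx = rng.choices(range(len(sizes)), weights=sizes)[0]
        if rng.random() < birth / (birth + death):
            sizes[idx] += 1  # gene duplication
        else:
            sizes[idx] -= 1  # gene loss
            if sizes[idx] == 0:
                sizes.pop(idx)  # the family has gone extinct
    return sizes


def size_histogram(sizes):
    """Number of families observed at each family size, sorted by size."""
    return dict(sorted(Counter(sizes).items()))


if __name__ == "__main__":
    print(size_histogram(simulate_bdim(5000)))
```

No fitness term appears anywhere in the simulation; whatever family size distribution it produces arises from the event rates alone, which is the sense in which such models do not explicitly incorporate selection.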
Viruses were defined as one of the two principal types of organisms in the biosphere, namely, as capsid-encoding organisms in contrast to ribosome-encoding organisms, i.e., all cellular life forms. Structurally similar, apparently homologous capsids are present in a huge variety of icosahedral viruses that infect bacteria, archaea, and eukaryotes. These findings prompted the concept of the capsid as the virus “self” that defines the identity of deep, ancient viral lineages. However, several other widespread viral “hallmark genes” encode key components of the viral replication apparatus (such as polymerases and helicases) and combine with different capsid proteins, given the inherently modular character of viral evolution. Furthermore, diverse, widespread, capsidless selfish genetic elements, such as plasmids and various types of transposons, share hallmark genes with viruses. Viruses appear to have evolved from capsidless selfish elements, and vice versa, on multiple occasions during evolution. At the earliest, precellular stage of life's evolution, capsidless genetic parasites most likely emerged first and subsequently gave rise to different classes of viruses. In this review, we develop the concept of a greater virus world which forms an evolutionary network that is held together by shared conserved genes and includes both bona fide capsid-encoding viruses and different classes of capsidless replicons. Theoretical studies indicate that selfish replicons (genetic parasites) inevitably emerge in any sufficiently complex evolving ensemble of replicators. Therefore, the key signature of the greater virus world is not the presence of a capsid but rather genetic, informational parasitism itself, i.e., various degrees of reliance on the information processing systems of the host.
The ancestral set of eukaryotic genes is a chimera composed of genes of archaeal and bacterial origins thanks to the endosymbiosis event that gave rise to the mitochondria and apparently antedated the last common ancestor of the extant eukaryotes. The proto-mitochondrial endosymbiont is confidently identified as an α-proteobacterium. In contrast, the archaeal ancestor of eukaryotes remains elusive, although evidence is accumulating that it could have belonged to a deep lineage within the TACK (Thaumarchaeota, Aigarchaeota, Crenarchaeota, Korarchaeota) superphylum of the Archaea. Recent surveys of archaeal genomes show that the apparent ancestors of several key functional systems of eukaryotes, the components of the archaeal “eukaryome,” such as ubiquitin signaling, RNA interference, and actin-based and tubulin-based cytoskeleton structures, are identifiable in different archaeal groups. We suggest that the archaeal ancestor of eukaryotes was a complex form, rooted deeply within the TACK superphylum, that already possessed some quintessential eukaryotic features, in particular, a cytoskeleton, and perhaps was capable of a primitive form of phagocytosis that would facilitate the engulfment of potential symbionts. This putative group of Archaea could have existed for a relatively short time before going extinct or undergoing genome streamlining, resulting in the dispersion of the eukaryome. This scenario might explain the difficulty with the identification of the archaeal ancestor of eukaryotes despite the straightforward detection of apparent ancestors to many signature eukaryotic functional systems.
The apparent ancestors of key eukaryotic features (e.g., ubiquitin signaling, RNA interference, and cytoskeletal structures) are identifiable in different Archaea. But the specific archaeal ancestor of eukaryotes remains elusive.
Prokaryotes harbor a variety of genetic replicators, including plasmids, viruses, and chromosomes, each having differing effects on the phenotype of the hosting cell. Here, we propose a classification for replicators of bacteria and archaea on the basis of their horizontal-transfer potential and the type of relationships (mutualistic, symbiotic, commensal, or parasitic) that they have with the host cell vehicle. Horizontal movement of replicators can be either active or passive, reflecting whether or not the replicator encodes the means to mediate its own transfer from one cell to another. Some replicators also have an infectious extracellular state, thus separating viruses from other mobile elements. From the perspective of the cell vehicle, the different types of replicators form a continuum from genuinely mutualistic to completely parasitic replicators. This classification provides a general framework for dissecting prokaryotic systems into evolutionarily meaningful components.
bacteria; archaea; prokaryotes; classification; replicators; cell vehicles
Diverse eukaryotes including animals and protists are hosts to a broad variety of viruses with double-stranded (ds) DNA genomes, from the largest known viruses, such as pandoraviruses and mimiviruses, to tiny polyomaviruses. Recent comparative genomic analyses have revealed many evolutionary connections between dsDNA viruses of eukaryotes, bacteriophages, transposable elements, and linear DNA plasmids. These findings provide an evolutionary scenario that derives several major groups of eukaryotic dsDNA viruses, including the proposed order “Megavirales,” adenoviruses, and virophages from a group of large virus-like transposons known as Polintons (Mavericks). The Polintons have been recently shown to encode two capsid proteins, suggesting that these elements lead a dual lifestyle with both a transposon and a viral phase and should perhaps more appropriately be named polintoviruses. Here, we describe the recently identified evolutionary relationships between bacteriophages of the family Tectiviridae, polintoviruses, adenoviruses, virophages, large and giant DNA viruses of eukaryotes of the proposed order “Megavirales,” and linear mitochondrial and cytoplasmic plasmids. We outline an evolutionary scenario under which the polintoviruses were the first group of eukaryotic dsDNA viruses that evolved from bacteriophages and became the ancestors of most large DNA viruses of eukaryotes and a variety of other selfish elements. Distinct lines of origin are detectable only for herpesviruses (from a different bacteriophage root) and polyoma/papillomaviruses (from single-stranded DNA viruses and ultimately from plasmids). Phylogenomic analysis of giant viruses provides compelling evidence of their independent origins from smaller members of the putative order “Megavirales,” refuting the speculations on the evolution of these viruses from an extinct fourth domain of cellular life.
Polintons; Megavirales; virus evolution; capsid proteins; translation
Biological information encoded in genomes is fundamentally different from and effectively orthogonal to Shannon entropy. The biologically relevant concept of information has to do with ‘meaning’, i.e. encoding various biological functions with varying degrees of evolutionary conservation. Apart from direct experimentation, the meaning, or biological information content, can be extracted and quantified from alignments of homologous nucleotide or amino acid sequences but generally not from a single sequence, using appropriately modified information theoretical formulae. In short, information encoded in genomes is defined vertically but not horizontally. Informally but substantially, biological information density seems to be equivalent to the ‘meaning’ of genomic sequences, which spans the entire range from sharply defined, universal meaning to effective meaninglessness. Large fractions of genomes, up to 90% in some plants, belong within the domain of fuzzy meaning. The sequences with fuzzy meaning can be recruited for various functions, with the meaning subsequently fixed, and also could perform generic functional roles that do not require sequence conservation. Biological meaning is continuously transferred between the genomes of selfish elements and hosts in the process of their coevolution. Thus, in order to adequately describe genome function and evolution, the concepts of information theory have to be adapted to incorporate the notion of meaning that is central to biology.
information; meaning; evolution; selfish elements
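The idea that information is defined "vertically" (down the columns of an alignment) rather than "horizontally" (along a single sequence) can be illustrated with the standard column-wise measure, log2(K) minus the Shannon entropy of the column, for an alphabet of size K. The following is a minimal sketch that uses this textbook formula without any small-sample correction; it is not the modified formula the abstract refers to, and the function names and the example alignment are hypothetical.

```python
import math
from collections import Counter


def column_information(column, alphabet_size=4):
    """Information content (bits) of one alignment column: log2(K) - H.

    Gap characters ('-') are ignored; a column made only of gaps scores 0.
    """
    residues = [c for c in column if c != "-"]
    if not residues:
        return 0.0
    n = len(residues)
    entropy = -sum((k / n) * math.log2(k / n) for k in Counter(residues).values())
    return math.log2(alphabet_size) - entropy


def alignment_information(seqs, alphabet_size=4):
    """Per-column information content for equal-length aligned sequences."""
    length = len(seqs[0])
    if any(len(s) != length for s in seqs):
        raise ValueError("all aligned sequences must have the same length")
    return [
        column_information([s[i] for s in seqs], alphabet_size)
        for i in range(length)
    ]


if __name__ == "__main__":
    # Hypothetical toy nucleotide alignment of four homologous sequences.
    alignment = ["ACGT", "ACGA", "ACCC", "ACTG"]
    print(alignment_information(alignment))
```

In the toy alignment, the two invariant columns score the maximum of 2 bits, whereas the last column, in which all four nucleotides occur equally often, scores 0 bits. A single sequence examined on its own provides no such contrast between positions.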
In a series of conceptual articles published around the millennium, Carl Woese emphasized that evolution of cells is the central problem of evolutionary biology, that the three-domain ribosomal tree of life is an essential framework for reconstructing cellular evolution, and that the evolutionary dynamics of functionally distinct cellular systems are fundamentally different, with the information processing systems “crystallizing” earlier than operational systems. The advances of evolutionary genomics over the last decade vindicate major aspects of Woese’s vision. Despite the observations of pervasive horizontal gene transfer among bacteria and archaea, the ribosomal tree of life comes across as a central statistical trend in the “forest” of phylogenetic trees of individual genes, and hence, an appropriate scaffold for evolutionary reconstruction. The evolutionary stability of information processing systems, primarily translation, becomes ever more striking with the accumulation of comparative genomic data indicating that nearly all of the few universal genes encode translation system components. Woese’s view on the fundamental distinctions between the three domains of cellular life also withstands the test of comparative genomics, although his non-acceptance of symbiogenetic scenarios for the origin of eukaryotes might not. Above all, Woese’s key prediction that understanding evolution of microbes will be the core of the new evolutionary biology appears to be materializing.
Darwinian threshold; cellular evolution; domains of life; evolutionary transitions; horizontal gene transfer; progenote
The CRISPR-Cas system of prokaryotic adaptive immunity displays features of a mechanism for directional, Lamarckian evolution. Indeed, this system modifies a specific locus in a bacterial or archaeal genome by inserting a piece of foreign DNA into a CRISPR array which results in acquired, heritable resistance to the cognate selfish element. A key element of the Lamarckian scheme is the specificity and directionality of the mutational process whereby an environmental cue causes only mutations that provide specific adaptations to the original challenge. In the case of adaptive immunity, the specificity of mutations is equivalent to self-nonself discrimination. Recent studies on the CRISPR mechanism have shown that the levels of discrimination can substantially differ such that in some CRISPR-Cas variants incorporation of DNA is random whereas discrimination occurs by selection of cells that carry cognate inserts. In other systems, a higher level of specificity appears to be achieved via specialized mechanisms. These findings emphasize the continuity between random and directed mutations and the critical importance of evolved mechanisms that govern the mutational process.
Reviewers: This article has been reviewed by Yitzhak Pilpel, Martijn Huynen, and Bojan Zagrovic.
CRISPR-Cas; Self-nonself discrimination; Lamarckian evolution; Darwinian evolution; DNA repair
Accurate inference of orthologous genes is a prerequisite for most comparative genomics studies, and is also important for functional annotation of new genomes. Identification of orthologous gene sets typically involves phylogenetic tree analysis, heuristic algorithms based on sequence conservation, synteny analysis, or some combination of these approaches. The most direct tree-based methods typically rely on the comparison of an individual gene tree with a species tree. Once the two trees are accurately constructed, orthologs are straightforwardly identified by the definition of orthology as those homologs that are related by speciation, rather than gene duplication, at their most recent point of origin. Although ideal for the purpose of orthology identification in principle, phylogenetic trees are computationally expensive to construct for large numbers of genes and genomes, and they often contain errors, especially at large evolutionary distances. Moreover, in many organisms, in particular prokaryotes and viruses, evolution does not appear to have followed a simple ‘tree-like’ mode, which makes conventional tree reconciliation inapplicable. Other, heuristic methods identify probable orthologs as the closest homologous pairs or groups of genes in a set of organisms. These approaches are faster and easier to automate than tree-based methods, with efficient implementations provided by graph-theoretical algorithms enabling comparisons of thousands of genomes. Comparisons of these two approaches show that, despite conceptual differences, they produce similar sets of orthologs, especially at short evolutionary distances. Synteny can also aid in identification of orthologs. Often, tree-based, sequence similarity- and synteny-based approaches can be combined into flexible hybrid methods.
homolog; ortholog; paralog; xenolog; orthologous groups; tree reconciliation; comparative genomics
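The heuristic of taking the closest homologous pairs as probable orthologs is often implemented as a reciprocal (bidirectional) best-hit search between two genomes. The following is a minimal sketch of that one common variant, assuming similarity scores are already available as dictionaries keyed by (query, subject) pairs. The gene identifiers and scores are hypothetical, and the sketch is not a description of any particular published pipeline.

```python
def best_hits(scores):
    """Map each query gene to its highest-scoring subject gene.

    `scores` is a dict {(query, subject): similarity_score}. When two
    subjects tie, the one encountered first is kept.
    """
    best = {}
    for (query, subject), score in scores.items():
        if query not in best or score > best[query][1]:
            best[query] = (subject, score)
    return {query: subject for query, (subject, _) in best.items()}


def reciprocal_best_hits(a_vs_b, b_vs_a):
    """Pairs (a, b) where b is the best hit of a and a is the best hit of b."""
    forward = best_hits(a_vs_b)
    reverse = best_hits(b_vs_a)
    return sorted((a, b) for a, b in forward.items() if reverse.get(b) == a)


if __name__ == "__main__":
    # Hypothetical similarity scores between genes of genome A and genome B.
    a_vs_b = {("a1", "b1"): 200, ("a1", "b2"): 150,
              ("a2", "b2"): 90, ("a3", "b1"): 180}
    b_vs_a = {("b1", "a1"): 195, ("b1", "a3"): 170,
              ("b2", "a1"): 160, ("b2", "a2"): 85}
    print(reciprocal_best_hits(a_vs_b, b_vs_a))
```

In this toy input only the pair (a1, b1) is reciprocal: a2 and a3 each have a best hit in genome B, but those genes point back to a1, which is the pattern expected when paralogs are present. No tree is built at any point, which is why such heuristics are faster and easier to automate than tree-based methods.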
Casposons are a superfamily of putative self-synthesizing transposable elements that are predicted to employ a homolog of Cas1 protein as a recombinase and could have contributed to the origin of the CRISPR-Cas adaptive immunity systems in archaea and bacteria. Casposons remain uncharacterized experimentally, except for the recent demonstration of the integrase activity of the Cas1 homolog, and given their relative rarity in archaea and bacteria, original comparative genomic analysis has not provided direct indications of their mobility. Here, we report evidence of casposon mobility obtained by comparison of the genomes of 62 strains of the archaeon Methanosarcina mazei. In these genomes, casposons are variably inserted in three distinct sites indicative of multiple, recent gains and losses. Some casposons are inserted into other mobile genetic elements that might provide vehicles for horizontal transfer of the casposons. Additionally, many M. mazei genomes contain previously undetected solo terminal inverted repeats that apparently are derived from casposons and could resemble intermediates in CRISPR evolution. We further demonstrate the sequence specificity of casposon insertion and note clear parallels with the adaptation mechanism of CRISPR-Cas. Finally, besides identifying additional representatives in each of the three originally defined families, we describe a new, fourth, family of casposons.
casposons; self-synthesizing transposons; CRISPR-Cas; mobile genetic elements; transposition
Argonaute proteins are conserved throughout all domains of life. Recently characterized prokaryotic Argonaute proteins (pAgos) participate in host defense by DNA interference, whereas eukaryotic Argonaute proteins (eAgos) control a wide range of processes by RNA interference. Here we review molecular mechanisms of guide and target binding by Argonaute proteins, and describe how the conformational changes induced by target binding lead to target cleavage. On the basis of structural comparisons and phylogenetic analyses of pAgos and eAgos, we reconstruct the evolutionary journey of the Argonaute proteins through the three domains of life and discuss how different structural features of pAgos and eAgos relate to their distinct physiological roles.
Evolution and maintenance of genetic recombination and its relation to the mutational process is a long-standing, fundamental problem in evolutionary biology that is linked to the general problem of evolution of evolvability. We explored a stochastic model of the evolution of recombination using additive fitness and infinite allele assumptions but no assumptions on the sign or magnitude of the epistasis and the distribution of mutation effects. In this model, fluctuating negative epistasis and predominantly deleterious mutations arise naturally as a consequence of the additive fitness and a reservoir from which new alleles arrive with a fixed distribution of fitness effects. Analysis of the model revealed a nonmonotonic effect of recombination intensity on fitness, with an optimal recombination rate value which maximized fitness in steady state. The optimal recombination rate depended on the mutation rate and was evolvable, that is, subject to selection. The predictions of the model were compatible with the observations on the dependence between genome rearrangement rate and gene flux in microbial genomes.
recombination; allele replacement; evolvability; gene flux; genome; rearrangement
The elaborate eukaryotic DNA replication machinery evolved from the archaeal ancestors that themselves show considerable complexity. Here we discuss the comparative genomic and phylogenetic analysis of the core replication enzymes, the DNA polymerases, in archaea and their relationships with the eukaryotic polymerases. In archaea, there are three groups of family B DNA polymerases, historically known as PolB1, PolB2 and PolB3. All three groups appear to descend from the last common ancestors of the extant archaea but their subsequent evolutionary trajectories seem to have been widely different. Although PolB3 is present in all archaea, with the exception of Thaumarchaeota, and appears to be directly involved in lagging strand replication, the evolution of this gene does not follow the archaeal phylogeny, conceivably due to multiple horizontal transfers and/or dramatic differences in evolutionary rates. In contrast, PolB1 is missing in Euryarchaeota but otherwise seems to have evolved vertically. The third archaeal group of family B polymerases, PolB2, includes primarily proteins in which the catalytic centers of the polymerase and exonuclease domains are disrupted and accordingly the enzymes appear to be inactivated. The members of the PolB2 group are scattered across archaea and might be involved in repair or regulation of replication along with inactivated members of the RadA family ATPases and an additional, uncharacterized protein that are encoded within the same predicted operon. In addition to the family B polymerases, all archaea, with the exception of the Crenarchaeota, encode enzymes of a distinct family D the origin of which is unclear. We examine multiple considerations that appear compatible with the possibility that family D polymerases are highly derived homologs of family B. 
The eukaryotic DNA polymerases show a highly complex relationship with their archaeal ancestors including contributions of proteins and domains from both the family B and the family D archaeal polymerases.
DNA replication; archaea; mobile genetic elements; DNA polymerases; enzyme inactivation
The brain is built from a large number of cell types which have been historically classified using location, morphology and molecular markers. Recent research suggests an important role of epigenetics in shaping and maintaining cell identity in the brain. To elucidate the role of DNA methylation in neuronal differentiation, we developed a new protocol for separation of nuclei from the two major populations of human prefrontal cortex neurons—GABAergic interneurons and glutamatergic (GLU) projection neurons. Major differences between the neuronal subtypes were revealed in CpG, non-CpG and hydroxymethylation (hCpG). A dramatically greater number of undermethylated CpG sites in GLU versus GABA neurons were identified. These differences did not directly translate into differences in gene expression and did not stem from the differences in hCpG methylation, as more hCpG methylation was detected in GLU versus GABA neurons. Notably, a comparable number of undermethylated non-CpG sites were identified in GLU and GABA neurons, and non-CpG methylation was a better predictor of subtype-specific gene expression compared to CpG methylation. Regions that are differentially methylated in GABA and GLU neurons were significantly enriched for schizophrenia risk loci. Collectively, our findings suggest that functional differences between neuronal subtypes are linked to their epigenetic specification.
The nucleocytoplasmic large DNA viruses (NCLDVs) comprise a monophyletic group of viruses that infect animals and diverse unicellular eukaryotes. The NCLDV group includes the families Poxviridae, Asfarviridae, Iridoviridae, Ascoviridae, Phycodnaviridae, Mimiviridae and the proposed family “Marseilleviridae”. The family Mimiviridae includes the largest known viruses, with genomes in excess of one megabase, whereas the genome size in the other NCLDV families varies from 100 to 400 kilobase pairs. Most of the NCLDVs replicate in the cytoplasm of infected cells, within so-called virus factories. The NCLDVs share a common ancient origin, as demonstrated by evolutionary reconstructions that trace approximately 50 genes encoding key proteins involved in viral replication and virion formation to the last common ancestor of all these viruses. Taken together, these characteristics lead us to propose assigning an official taxonomic rank to the NCLDVs as the order “Megavirales”, in reference to the large size of the virions and genomes of these viruses.
The rapidly growing metagenomic databases provide increasing opportunities for computational discovery of new groups of organisms. Identification of new viruses is particularly straightforward given the comparatively small size of viral genomes, although fast evolution of viruses complicates the analysis of novel sequences. Here we report the metagenomic discovery of a distinct group of diverse viruses that are distantly related to the eukaryotic virus-like transposons of the Polinton superfamily.
The sequence of the putative major capsid protein (MCP) of the unusual linear virophage associated with Phaeocystis globosa virus (PgVV) was used as a bait to identify potential related viruses in metagenomic databases. Assembly of the contigs encoding the PgVV MCP homologs followed by comprehensive sequence analysis of the proteins encoded in these contigs resulted in the identification of a large group of Polinton-like viruses (PLV) that resemble Polintons (polintoviruses) and virophages in genome size, and share with them a conserved minimal morphogenetic module that consists of major and minor capsid proteins and the packaging ATPase. With a single exception, the PLV lack the retrovirus-type integrase that is encoded in the genomes of all Polintons and the Mavirus group of virophages. However, some PLV encode a newly identified tyrosine recombinase-integrase that is common in bacteria and bacteriophages and is also found in the Organic Lake virophage group. Although several PLV genomes and individual genes are integrated into algal genomes, it appears likely that most of the PLV are viruses. Given the absence of protease and retrovirus-type integrase, the PLV could resemble the ancestral polintoviruses that evolved from bacterial tectiviruses. Apart from the conserved minimal morphogenetic module, the PLV widely differ in their genome complements but share a gene network with Polintons and virophages, suggestive of multiple gene exchanges within a shared gene pool.
The discovery of PLV substantially expands the emerging class of eukaryotic viruses and transposons that also includes Polintons and virophages. This class of selfish elements is extremely widespread and might have been a hotbed of eukaryotic virus, transposon and plasmid evolution. New families of these elements are expected to be discovered.
Electronic supplementary material
The online version of this article (doi:10.1186/s12915-015-0207-4) contains supplementary material, which is available to authorized users.
Viruses are the most abundant and genetically diverse biological entities on earth, yet the repertoire of viral proteins remains poorly explored. As the number of sequenced virus genomes grows into the thousands, and the number of viral proteins into the hundreds of thousands, we report a systematic computational analysis of the point of first-contact between viruses and their hosts, namely viral transmembrane (TM) proteins.
The complement of α-helical TM proteins in double-stranded DNA viruses infecting bacteria and archaea reveals large-scale trends that differ from those of their hosts. Viruses typically encode a substantially lower fraction of TM proteins than archaea or bacteria, with the notable exception of viruses with virions containing a lipid component such as a lipid envelope, internal lipid core, or inner membrane vesicle. Compared to bacteriophages, archaeal viruses are substantially enriched in membrane proteins. However, this feature is not always stable throughout the evolution of a viral lineage; for example, TM proteins are not part of the common heritage shared between Lipothrixviridae and Rudiviridae. In contrast to bacteria and archaea, viruses almost completely lack proteins with complicated membrane topologies composed of more than 4 TM segments, with the few detected exceptions being obvious cases of relatively recent horizontal transfer from the host.
The dramatic differences between the membrane proteomes of cells and viruses stem from the fact that viruses do not depend on essential membranes for energy transformation, ion homeostasis, nutrient transport and signaling.
Electronic supplementary material
The online version of this article (doi:10.1186/s12859-015-0817-4) contains supplementary material, which is available to authorized users.
Bacteriophages; Viruses; Comparative genomics; Transmembrane proteins; Membrane proteome
CRISPR-Cas (clustered regularly interspaced short palindromic repeats, CRISPR-associated genes) is an adaptive immunity system in bacteria and archaea that functions via a distinct self-non-self recognition mechanism that is partially analogous to the mechanism of eukaryotic RNA interference (RNAi). The CRISPR-Cas system incorporates fragments of virus or plasmid DNA into the CRISPR repeat cassettes and employs the processed transcripts of these spacers as guide RNAs to cleave the cognate foreign DNA or RNA. The Cas proteins, however, are not homologous to the proteins involved in RNAi and comprise numerous, highly diverged families. The majority of the Cas proteins contain diverse variants of the RNA recognition motif (RRM), a widespread RNA-binding domain. Despite the fast evolution that is typical of the cas genes, the presence of diverse versions of the RRM in most Cas proteins provides for a simple scenario for the evolution of the three distinct types of CRISPR-Cas systems. In addition to several proteins that are directly implicated in the immune response, the cas genes encode a variety of proteins that are homologous to prokaryotic toxins that typically possess nuclease activity. The predicted toxins associated with CRISPR-Cas systems include the essential Cas2 protein; proteins of COG1517 that, in addition to a ligand-binding domain and a helix-turn-helix domain, typically contain different nuclease domains; and several other predicted nucleases. The tight association of the CRISPR-Cas immunity systems with predicted toxins that, upon activation, would induce dormancy or cell death suggests that adaptive immunity and dormancy/suicide response are functionally coupled. Such coupling could manifest in the persistence state being induced and potentially providing conditions for more effective action of the immune system or in cell death being triggered when immunity fails.
CRISPR-Cas; adaptive immunity; innate immunity; programmed cell death; dormancy; RRM domain
CRISPR-Cas adaptive immunity systems of bacteria and archaea insert fragments of virus or plasmid DNA as spacer sequences into CRISPR repeat loci. Processed transcripts encompassing these spacers guide the cleavage of the cognate foreign DNA or RNA. Most CRISPR-Cas loci, in addition to recognized cas genes, also include genes that are not directly implicated in spacer acquisition, CRISPR transcript processing or interference. Here we comprehensively analyze sequences, structures and genomic neighborhoods of one of the most widespread groups of such genes that encode proteins containing a predicted nucleotide-binding domain with a Rossmann-like fold, which we denote CARF (CRISPR-associated Rossmann fold). Several CARF protein structures have been determined but functional characterization of these proteins is lacking. The CARF domain is most frequently combined with a C-terminal winged helix-turn-helix DNA-binding domain and “effector” domains most of which are predicted to possess DNase or RNase activity. Divergent CARF domains are also found in RtcR proteins, sigma-54 dependent regulators of the rtc RNA repair operon. CARF genes frequently co-occur with those coding for proteins containing the WYL domain with the Sm-like SH3 β-barrel fold, which is also predicted to bind ligands. CRISPR-Cas and possibly other defense systems are predicted to be transcriptionally regulated by multiple ligand-binding proteins containing WYL and CARF domains which sense modified nucleotides and nucleotide derivatives generated during virus infection. We hypothesize that CARF domains also transmit the signal from the bound ligand to the fused effector domains which attack either alien or self nucleic acids, resulting, respectively, in immunity complementing the CRISPR-Cas action or in dormancy/programmed cell death.
CRISPR; Rossmann fold; beta barrel; DNA-binding proteins; phage defense
The archaeal DNA replication system shows an unexpected level of complexity. In fact, there is a close correspondence between components of the archaeal and eukaryotic replication systems.
Recent advances in the characterization of the archaeal DNA replication system together with comparative genomic analysis have led to the identification of several previously uncharacterized archaeal proteins involved in replication and currently reveal a nearly complete correspondence between the components of the archaeal and eukaryotic replication machineries. It can be inferred that the archaeal ancestor of eukaryotes and even the last common ancestor of all extant archaea possessed replication machineries that were comparable in complexity to the eukaryotic replication system. The eukaryotic replication system encompasses multiple paralogs of ancestral components such that heteromeric complexes in eukaryotes replace archaeal homomeric complexes, apparently along with subfunctionalization of the eukaryotic complex subunits. In the archaea, parallel, lineage-specific duplications of many genes encoding replication machinery components are detectable as well; most of these archaeal paralogs remain to be functionally characterized. The archaeal replication system shows remarkable plasticity whereby even some essential components such as DNA polymerase and single-stranded DNA-binding protein are displaced by unrelated proteins with analogous activities in some lineages.
Many proteins of viruses infecting hyperthermophilic Crenarchaeota have no detectable homologs in current databases, hampering our understanding of viral evolution. We used sensitive database search methods and structural modeling to show that a nucleocapsid protein (TP1) of Thermoproteus tenax virus 1 (TTV1) is a derivative of the Cas4 nuclease, a component of the CRISPR-Cas adaptive immunity system that is encoded also by several archaeal viruses. In TTV1, the Cas4 gene was split into two, with the N-terminal portion becoming TP1, and lost some of the catalytic amino acid residues, apparently resulting in the inactivation of the nuclease. To our knowledge, this is the first described case of exaptation of an enzyme for a virus capsid protein function.
This article was reviewed by Vivek Anantharaman, Christine Orengo and Mircea Podar. For complete reviews, see the Reviewers’ reports section.
Electronic supplementary material
The online version of this article (doi:10.1186/s13062-015-0093-2) contains supplementary material, which is available to authorized users.
Virus evolution; Capsid proteins; Nucleocapsid; Virus origin; Archaea viruses
The relationship between the selection affecting codon usage and selection on protein sequences of orthologous genes in diverse groups of bacteria and archaea was examined by using the Alignable Tight Genome Clusters database of prokaryote genomes. The codon usage bias is generally low, with 57.5% of the gene-specific optimal codon frequencies (Fopt) being below 0.55. This apparent weak selection on codon usage contrasts with the strong purifying selection on amino acid sequences, with 65.8% of the gene-specific dN/dS ratios being below 0.1. For most of the genomes compared, a limited but statistically significant negative correlation between Fopt and dN/dS was observed, which is indicative of a link between selection on protein sequence and selection on codon usage. The strength of the coupling between the protein level selection and codon usage bias showed a strong positive correlation with the genomic GC content. Combined with previous observations on the selection for GC-rich codons in bacteria and archaea with GC-rich genomes, these findings suggest that selection for translational fine-tuning could be an important factor in microbial evolution that drives the evolution of genome GC content away from mutational equilibrium. This type of selection is particularly pronounced in slowly evolving, “high-status” genes. A significantly stronger link between the two aspects of selection is observed in free-living bacteria than in parasitic bacteria and in genes encoding metabolic enzymes and transporters than in informational genes. These differences might reflect the special importance of translational fine-tuning for the adaptability of gene expression to environmental changes. The results of this work establish the coupling between protein level selection and selection for translational optimization as a distinct and potentially important factor in microbial evolution.
Selection affects the evolution of microbial genomes at many levels, including both the structure of proteins and the regulation of their production. Here we demonstrate the coupling between the selection on protein sequences and the optimization of codon usage in a broad range of bacteria and archaea. The strength of this coupling varies over a wide range and strongly and positively correlates with the genomic GC content. The cause(s) of the evolution of high GC content is a long-standing open question, given the universal mutational bias toward AT. We propose that optimization of codon usage could be one of the key factors that determine the evolution of GC-rich genomes. This work establishes the coupling between selection at the level of protein sequence and at the level of codon choice optimization as a distinct aspect of genome evolution.
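The two per-gene quantities compared above, the frequency of optimal codons (Fopt) and the dN/dS ratio, can be related with a rank correlation. The following is a minimal sketch under stated assumptions, not the procedure used in the study: the function names (`fopt`, `ranks`, `spearman`) are hypothetical, the set of optimal codons is supplied by the caller, Fopt is simplified to the fraction of optimal codons among all codons that have synonymous alternatives, and the choice of Spearman correlation is an assumption, since the text does not specify the correlation measure.

```python
# Illustrative sketch only; names and the simplified Fopt definition are assumptions.

# Codons without synonymous alternatives (Met, Trp) and stop codons are skipped.
NON_INFORMATIVE = {"ATG", "TGG", "TAA", "TAG", "TGA"}


def fopt(cds, optimal_codons):
    """Fraction of informative codons in `cds` that are in `optimal_codons`."""
    cds = cds.upper()
    usable = len(cds) - len(cds) % 3  # ignore a trailing partial codon
    codons = [cds[i:i + 3] for i in range(0, usable, 3)]
    informative = [c for c in codons if c not in NON_INFORMATIVE]
    if not informative:
        raise ValueError("no informative codons in sequence")
    return sum(c in optimal_codons for c in informative) / len(informative)


def ranks(values):
    """1-based ranks of `values`, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average_rank
        i = j + 1
    return result


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    rx, ry = ranks(x), ranks(y)
    n = len(rx)
    mean_x, mean_y = sum(rx) / n, sum(ry) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(rx, ry))
    var_x = sum((a - mean_x) ** 2 for a in rx)
    var_y = sum((b - mean_y) ** 2 for b in ry)
    return cov / (var_x * var_y) ** 0.5
```

For a genome, one would compute `fopt` for each gene and pass the resulting list, together with the matching per-gene dN/dS estimates obtained separately, to `spearman`; a negative value corresponds to the direction of the correlation reported above.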
Cyanobacteria of the Synechococcus and Prochlorococcus genera are important contributors to photosynthetic productivity in the open oceans [1-3]. Here, using pre-existing metagenomic datasets from the global ocean sampling (GOS) expedition [4] as well as from viral biomes [5], we show the first evidence for the presence of photosystem I (PSI) genes in genomes of marine viruses that infect these marine cyanobacteria. Recently, core photosystem II (PSII) genes were identified in cyanophages; they were proposed to be functional in photosynthesis and in increasing viral fitness by supplementing the host production of these proteins [6-9]. The 7 cyanobacterial core PSI genes identified in this study, psaA, B, C, D, E, K and a unique J and F fusion, form a distinctive cluster in cyanophage genomes, suggestive of selection for a distinct function in virus life cycle. The existence of this PSI cluster was confirmed with overlapping and long PCR performed on environmental DNA from the Northern Line Islands. Potentially, the 7 proteins encoded by the viral genes are sufficient to form an intact monomeric PSI complex. Projection of viral predicted peptides on the cyanobacterial PSI crystal structure [10] suggested that the viral-PSI components may provide a unique way of funneling reducing power from respiratory and other electron transfer chains to PSI.