Because prokaryotic genomes experience a rapid flux of genes, selection may act at a higher level than an individual genome. We explore a quantitative model of the distributed genome whereby groups of genomes evolve by acquiring genes from a fixed reservoir which we denote as supergenome. Previous attempts to understand the nature of the supergenome treated genomes as random, independent collections of genes and assumed that the supergenome consists of a small number of homogeneous sub-reservoirs. Here we explore the consequences of relaxing both assumptions.
We surveyed several methods for estimating the size and composition of the supergenome. The methods assumed that genomes were either random, independent samples of the supergenome or that they evolved from a common ancestor along a known tree via stochastic sampling from the reservoir. The reservoir was assumed to be either a collection of homogeneous sub-reservoirs or alternatively composed of genes with Gamma distributed gain probabilities. Empirical gene frequencies were used to either compute the likelihood of the data directly or first to reconstruct the history of gene gains and then compute the likelihood of the reconstructed numbers of gains.
Supergenome size estimates using the empirical gene frequencies directly are not robust with respect to the choice of the model. By contrast, using the gene frequencies and the phylogenetic tree to reconstruct multiple gene gains produces reliable estimates of the supergenome size and indicates that a homogeneous supergenome is more consistent with the data than a supergenome with Gamma distributed gain probabilities.
supergenome; genome evolution; gene frequency distribution; ancestral reconstruction
Gene fusions can be used as tools for functional prediction and also as evolutionary markers. Fused genes often show a scattered phyletic distribution, which suggests a role for processes other than vertical inheritance in their evolution.
The evolutionary history of gene fusions was studied by phylogenetic analysis of the domains in the fused proteins and the orthologous domains that form stand-alone proteins. Clustering of fusion components from phylogenetically distant species was construed as evidence of dissemination of the fused genes by horizontal transfer. Of the 51 examined gene fusions that are represented in at least two of the three primary kingdoms (Bacteria, Archaea and Eukaryota), 31 were most probably disseminated by cross-kingdom horizontal gene transfer, whereas 14 appeared to have evolved independently in different kingdoms and two were probably inherited from the common ancestor of modern life forms. On many occasions, the evolutionary scenario also involves one or more secondary fissions of the fusion gene. For approximately half of the fusions, stand-alone forms of the fusion components are encoded by juxtaposed genes, which are known or predicted to belong to the same operon in some of the prokaryotic genomes. This indicates that evolution of gene fusions often, if not always, involves an intermediate stage, during which the future fusion components exist as juxtaposed and co-regulated, but still distinct, genes within operons.
These findings suggest a major role for horizontal transfer of gene fusions in the evolution of protein-domain architectures, but also indicate that independent fusions of the same pair of domains in distant species is not uncommon, which suggests positive selection for the multidomain architectures.
Computational predictions are critical for directing the experimental study of protein functions. Therefore it is paradoxical when an apparently erroneous computational prediction seems to be supported by experiment.
We analyzed six cases where application of novel or conventional computational methods for protein sequence and structure analysis led to non-trivial predictions that were subsequently supported by direct experiments. We show that, on all six occasions, the original prediction was unjustified, and in at least three cases, an alternative, well-supported computational prediction, incompatible with the original one, could be derived. The most unusual cases involved the identification of an archaeal cysteinyl-tRNA synthetase, a dihydropteroate synthase and a thymidylate synthase, for which experimental verifications of apparently erroneous computational predictions were reported. Using sequence-profile analysis, multiple alignment and secondary-structure prediction, we have identified the unique archaeal 'cysteinyl-tRNA synthetase' as a homolog of extracellular polygalactosaminidases, and the 'dihydropteroate synthase' as a member of the β-lactamase-like superfamily of metal-dependent hydrolases.
In each of the analyzed cases, the original computational predictions could be refuted and, in some instances, alternative strongly supported predictions were obtained. The nature of the experimental evidence that appears to support these predictions remains an open question. Some of these experiments might signify discovery of extremely unusual forms of the respective enzymes, whereas the results of others could be due to artifacts.
Genomes of bacteria and archaea (collectively, prokaryotes) appear to exist in incessant flux, expanding via horizontal gene transfer and gene duplication, and contracting via gene loss. However, the actual rates of genome dynamics and relative contributions of different types of event across the diversity of prokaryotes are largely unknown, as are the sizes of microbial supergenomes, i.e. pools of genes that are accessible to the given microbial species.
We performed a comprehensive analysis of the genome dynamics in 35 groups (34 bacterial and one archaeal) of closely related microbial genomes using a phylogenetic birth-and-death maximum likelihood model to quantify the rates of gene family gain and loss, as well as expansion and reduction. The results show that loss of gene families dominates the evolution of prokaryotes, occurring at approximately three times the rate of gain. The rates of gene family expansion and reduction are typically seven and twenty times less than the gain and loss rates, respectively. Thus, the prevailing mode of evolution in bacteria and archaea is genome contraction, which is partially compensated by the gain of new gene families via horizontal gene transfer. However, the rates of gene family gain, loss, expansion and reduction vary within wide ranges, with the most stable genomes showing rates about 25 times lower than the most dynamic genomes. For many groups, the supergenome estimated from the fraction of repetitive gene family gains includes about tenfold more gene families than the typical genome in the group although some groups appear to have vast, ‘open’ supergenomes.
Reconstruction of evolution for groups of closely related bacteria and archaea reveals an extremely rapid and highly variable flux of genes in evolving microbial genomes, demonstrates that extensive gene loss and horizontal gene transfer leading to innovation are the two dominant evolutionary processes, and yields robust estimates of the supergenome size.
Electronic supplementary material
The online version of this article (doi:10.1186/s12915-014-0066-4) contains supplementary material, which is available to authorized users.
Ribosomal proteins are encoded in all genomes of cellular life forms and are, generally, well conserved during evolution. In prokaryotes, the genes for most ribosomal proteins are clustered in several highly conserved operons, which ensures efficient co-regulation of their expression. Duplications of ribosomal-protein genes are infrequent, and given their coordinated expression and functioning, it is generally assumed that ribosomal-protein genes are unlikely to undergo horizontal transfer. However, with the accumulation of numerous complete genome sequences of prokaryotes, several paralogous pairs of ribosomal protein genes have been identified. Here we analyze all such cases and attempt to reconstruct the evolutionary history of these ribosomal proteins.
Complete bacterial genomes were searched for duplications of ribosomal proteins. Ribosomal proteins L36, L33, L31, S14 are each duplicated in several bacterial genomes and ribosomal proteins L11, L28, L7/L12, S1, S15, S18 are so far duplicated in only one genome each. Sequence analysis of the four ribosomal proteins, for which paralogs were detected in several genomes, two of the ribosomal proteins duplicated in one genome (L28 and S18), and the ribosomal protein L32 showed that each of them comes in two distinct versions. One form contains a predicted metal-binding Zn-ribbon that consists of four conserved cysteines (in some cases replaced by histidines), whereas, in the second form, these metal-chelating residues are completely or partially replaced. Typically, genomes containing paralogous genes for these ribosomal proteins encode both versions, designated C+ and C-, respectively. Analysis of phylogenetic trees for these seven ribosomal proteins, combined with comparison of genomic contexts for the respective genes, indicates that in most, if not all cases, their evolution involved a duplication of the ancestral C+ form early in bacterial evolution, with subsequent alternative loss of the C+ and C- forms in different lineages. Additionally, evidence was obtained for a role of horizontal gene transfer in the evolution of these ribosomal proteins, with multiple cases of gene displacement 'in situ', that is, without a change of the gene order in the recipient genome.
A more complex picture of evolution of bacterial ribosomal proteins than previously suspected is emerging from these results, with major contributions of lineage-specific gene loss and horizontal gene transfer. The recurrent theme of emergence and disruption of Zn-ribbons in bacterial ribosomal proteins awaits a functional interpretation.
Microbial genomes encompass a sizable fraction of poorly characterized, narrowly spread fast-evolving genes. Using sensitive methods for sequences comparison and protein structure prediction, we performed a detailed comparative analysis of clusters of such genes, which we denote “dark matter islands”, in archaeal genomes. The dark matter islands comprise up to 20 % of archaeal genomes and show remarkable heterogeneity and diversity. Nevertheless, three classes of entities are common in these genomic loci: (a) integrated viral genomes and other mobile elements; (b) defense systems, and (c) secretory and other membrane-associated systems. The dark matter islands in the genome of thermophiles and mesophiles show similar general trends of gene content, but thermophiles are substantially enriched in predicted membrane proteins whereas mesophiles have a greater proportion of recognizable mobile elements. Based on this analysis, we predict the existence of several novel groups of viruses and mobile elements, previously unnoticed variants of CRISPR-Cas immune systems, and new secretory systems that might be involved in stress response, intermicrobial conflicts and biogenesis of novel, uncharacterized membrane structures.
Electronic supplementary material
The online version of this article (doi:10.1007/s00792-014-0672-7) contains supplementary material, which is available to authorized users.
Archaeal genomes; ORFans; Genomic islands; Integration; Viruses; Defense
A dramatic increase in the prevalence of autism and Autistic Spectrum Disorders (ASD) has been observed over the last two decades in USA, Europe and Asia. Given the accumulating data on the possible role of translation in the etiology of ASD, we analyzed potential effects of rare synonymous substitutions associated with ASD on mRNA stability, splicing enhancers and silencers, and codon usage.
Presentation of the hypothesis
We hypothesize that subtle impairment of translation, resulting in dosage imbalance of neuron-specific proteins, contributes to the etiology of ASD synergistically with environmental neurotoxins.
Testing the hypothesis
A statistically significant shift from optimal to suboptimal codons caused by rare synonymous substitutions associated with ASD was detected whereas no effect on other analyzed characteristics of transcripts was identified. This result suggests that the impact of rare codons on the translation of genes involved in neuron development, even if slight in magnitude, could contribute to the pathogenesis of ASD in the presence of an aggressive chemical background. This hypothesis could be tested by further analysis of ASD-associated mutations, direct biochemical characterization of their effects, and assessment of in vivo effects on animal models.
Implications of the hypothesis
It seems likely that the synergistic action of environmental hazards with genetic variations that in themselves have limited or no deleterious effects but are potentiated by the environmental factors is a general principle that underlies the alarming increase in the ASD prevalence.
This article was reviewed by Andrey Rzhetsky, Neil R. Smalheiser, and Shamil R. Sunyaev.
Synonymous mutations; Single nucleotide polymorphism; Codon usage; Splicing enhancer; Splicing silencer; mRNA secondary structure; Transcription factor binding; Neurotoxin
The CRISPR-Cas systems of adaptive antivirus immunity are present in most archaea and many bacteria, and provide resistance to specific viruses or plasmids by inserting fragments of foreign DNA into the host genome and then utilizing transcripts of these spacers to inactivate the cognate foreign genome. The recent development of powerful genome engineering tools on the basis of CRISPR-Cas has sharply increased the interest in the diversity and evolution of these systems. Comparative genomic data indicate that during evolution of prokaryotes CRISPR-Cas loci are lost and acquired via horizontal gene transfer at high rates. Mathematical modeling and initial experimental studies of CRISPR-carrying microbes and viruses reveal complex coevolutionary dynamics.
We performed a bifurcation analysis of models of coevolution of viruses and microbial host that possess CRISPR-Cas hereditary adaptive immunity systems. The analyzed Malthusian and logistic models display complex, and in particular, quasi-chaotic oscillation regimes that have not been previously observed experimentally or in agent-based models of the CRISPR-mediated immunity. The key factors for the appearance of the quasi-chaotic oscillations are the non-linear dependence of the host immunity on the virus load and the partitioning of the hosts into the immune and susceptible populations, so that the system consists of three components.
Bifurcation analysis of CRISPR-host coevolution model predicts complex regimes including quasi-chaotic oscillations. The quasi-chaotic regimes of virus-host coevolution are likely to be biologically relevant given the evolutionary instability of the CRISPR-Cas loci revealed by comparative genomics. The results of this analysis might have implications beyond the CRISPR-Cas systems, i.e. could describe the behavior of any adaptive immunity system with a heritable component, be it genetic or epigenetic. These predictions are experimentally testable.
This manuscript was reviewed by Sandor Pongor, Sergei Maslov and Marek Kimmel. For the complete reports, go to the Reviewers’ Reports section.
The nucleocytoplasmic large DNA viruses (NCLDVs) comprise a monophyletic group of viruses that infect animals and diverse unicellular eukaryotes. The NCLDV group includes the families Poxviridae, Asfarviridae, Iridoviridae, Ascoviridae, Phycodnaviridae, Mimiviridae and the proposed family “Marseilleviridae”. The family Mimiviridae includes the largest known viruses, with genomes in excess of one megabase, whereas the genome size in the other NCLDV families varies from 100 to 400 kilobase pairs. Most of the NCLDVs replicate in the cytoplasm of infected cells, within so-called virus factories. The NCLDVs share a common ancient origin, as demonstrated by evolutionary reconstructions that trace approximately 50 genes encoding key proteins involved in viral replication and virion formation to the last common ancestor of all these viruses. Taken together, these characteristics lead us to propose assigning an official taxonomic rank to the NCLDVs as the order “Megavirales”, in reference to the large size of the virions and genomes of these viruses.
Single-stranded (ss)DNA viruses are extremely widespread, infect diverse hosts from all three domains of life and include important pathogens. Most ssDNA viruses possess small genomes that replicate by the rolling-circle-like mechanism initiated by a distinct virus-encoded endonuclease. However, viruses of the family Bidnaviridae, instead of the endonuclease, encode a protein-primed type B DNA polymerase (PolB) and hence break this pattern. We investigated the provenance of all bidnavirus genes and uncover an unexpected turbulent evolutionary history of these unique viruses. Our analysis strongly suggests that bidnaviruses evolved from a parvovirus ancestor from which they inherit a jelly-roll capsid protein and a superfamily 3 helicase. The radiation of bidnaviruses from parvoviruses was probably triggered by integration of the ancestral parvovirus genome into a large virus-derived DNA transposon of the Polinton (polintovirus) family resulting in the acquisition of the polintovirus PolB gene along with terminal inverted repeats. Bidnavirus genes for a receptor-binding protein and a potential novel antiviral defense modulator are derived from dsRNA viruses (Reoviridae) and dsDNA viruses (Baculoviridae), respectively. The unusual evolutionary history of bidnaviruses emphasizes the key role of horizontal gene transfer, sometimes between viruses with completely different genomes but occupying the same niche, in the emergence of new viral types.
Diverse transposable elements are abundant in genomes of cellular organisms from all three domains of life. Although transposons are often regarded as junk DNA, a growing body of evidence indicates that they are behind some of the major evolutionary innovations. With the growth in the number and diversity of sequenced genomes, previously unnoticed mobile elements continue to be discovered.
We describe a new superfamily of archaeal and bacterial mobile elements which we denote casposons because they encode Cas1 endonuclease, a key enzyme of the CRISPR-Cas adaptive immunity systems of archaea and bacteria. The casposons share several features with self-synthesizing eukaryotic DNA transposons of the Polinton/Maverick class, including terminal inverted repeats and genes for B family DNA polymerases. However, unlike any other known mobile elements, the casposons are predicted to rely on Cas1 for integration and excision, via a mechanism similar to the integration of new spacers into CRISPR loci. We identify three distinct families of casposons that differ in their gene repertoires and evolutionary provenance of the DNA polymerases. Deep branching of the casposon-encoded endonuclease in the Cas1 phylogeny suggests that casposons played a pivotal role in the emergence of CRISPR-Cas immunity.
The casposons are a novel superfamily of mobile elements, the first family of putative self-synthesizing transposons discovered in prokaryotes. The likely contribution of capsosons to the evolution of CRISPR-Cas parallels the involvement of the RAG1 transposase in vertebrate immunoglobulin gene rearrangement, suggesting that recruitment of endonucleases from mobile elements as ready-made tools for genome manipulation is a general route of evolution of adaptive immunity.
Mobile genetic elements; CRISPR-Cas system; Adaptive immunity; Transposons; Archaea; DNA polymerases
Gene evolution is traditionally considered within the framework of the molecular clock (MC) model whereby each gene is characterized by an approximately constant rate of evolution. Recent comparative analysis of numerous phylogenies of prokaryotic genes has shown that a different model of evolution, denoted the Universal PaceMaker (UPM), which postulates conservation of relative, rather than absolute evolutionary rates, yields a better fit to the phylogenetic data. Here, we show that the UPM model is a better fit than the MC for genome wide sets of phylogenetic trees from six species of Drosophila and nine species of yeast, with extremely high statistical significance. Unlike the prokaryotic phylogenies that include distant organisms and multiple horizontal gene transfers, these are simple data sets that cover groups of closely related organisms and consist of gene trees with the same topology as the species tree. The results indicate that both lineage-specific and gene-specific rates are important in genome evolution but the lineage-specific contribution is greater. Similar to the MC, the gene evolution rates under the UPM are strongly overdispersed, approximately 2-fold compared with the expectation from sampling error alone. However, we show that neither Drosophila nor yeast genes form distinct clusters in the tree space. Thus, the gene-specific deviations from the UPM, although substantial, are uncorrelated and most likely depend on selective factors that are largely unique to individual genes. Thus, the UPM appears to be a key feature of genome evolution across the history of cellular life.
molecular clock; genome evolution; phylogenetic trees; relative evolution rates
Alternative splicing (AS), alternative transcription initiation (ATI) and alternative transcription termination (ATT) create the extraordinary complexity of transcriptomes and make key contributions to the structural and functional diversity of mammalian proteomes. Analysis of mammalian genomic and transcriptomic data shows that contrary to the traditional view, the joint contribution of ATI and ATT to the transcriptome and proteome diversity is quantitatively greater than the contribution of AS. Although the mean numbers of protein-coding constitutive and alternative nucleotides in gene loci are nearly identical, their distribution along the transcripts is highly non-uniform. On average, coding exons in the variable 5′ and 3′ transcript ends that are created by ATI and ATT contain approximately four times more alternative nucleotides than core protein-coding regions that diversify exclusively via AS. Short upstream exons that encompass alternative 5′-untranslated regions and N-termini of proteins evolve under strong nucleotide-level selection whereas in 3′-terminal exons that encode protein C-termini, protein-level selection is significantly stronger. The groups of genes that are subject to ATI and ATT show major differences in biological roles, expression and selection patterns.
This article was reviewed by Lakshminarayan M. Iyer and I. King Jordan. For complete reviews, see the Reviewers’ Reports section.
Polintons (also known as Mavericks) and Tlr elements of Tetrahymena thermophila represent two families of large DNA transposons widespread in eukaryotes. Here, we show that both Polintons and Tlr elements encode two key virion proteins, the major capsid protein with the double jelly-roll fold and the minor capsid protein, known as the penton, with the single jelly-roll topology. This observation along with the previously noted conservation of the genes for viral genome packaging ATPase and adenovirus-like protease strongly suggests that Polintons and Tlr elements combine features of bona fide viruses and transposons. We propose the name ‘Polintoviruses’ to denote these putative viruses that could have played a central role in the evolution of several groups of DNA viruses of eukaryotes.
Polintons; Mavericks; Transposable elements; Double jelly-roll fold; Capsid proteins; Virus evolution
We have recently reconstructed the ‘hatcheries’ of the
first cells by combining geochemical analysis with phylogenomic scrutiny of the
inorganic ion requirements of universal components of modern cells (Mulkidjanian
et al.: Origin of first cells at terrestrial, anoxic
geothermal fields. Proc Natl Acad Sci USA 2012,
109:E821–830). These ubiquitous, and by inference primordial, proteins
and functional systems show affinity to and functional requirement for
K+, Zn2+, Mn2+, and phosphate. Thus,
protocells must have evolved in habitats with a high
K+/Na+ ratio and relatively high concentrations of Zn,
Mn and phosphorous compounds. Geochemical reconstruction shows that the ionic
composition conducive to the origin of cells could not have existed in marine
settings but is compatible with emissions of vapor-dominated zones of inland
geothermal systems. Under anoxic, CO2-dominated atmosphere, the ionic
composition of pools of cool, condensed vapor at anoxic geothermal fields would
resemble the internal milieu of modern cells. Such pools would be lined with
porous silicate minerals mixed with metal sulfides and enriched in K+
ions and phosphorous compounds.
Here we address some questions that have appeared in print after the
publication of our anoxic geothermal field scenario. We argue that anoxic
geothermal fields, which were identified as likely cradles of life by using a
top-down approach and phylogenomics analysis as a tool, could provide
geochemical conditions similar to those which were suggested as most conducive
for the emergence of life by the chemists who pursuit the complementary
The CRISPR-Cas systems of archaeal and bacterial adaptive immunity are classified into three types that differ by the repertoires of CRISPR-associated (cas) genes, the organization of cas operons and the structure of repeats in the CRISPR arrays. The simplest among the CRISPR-Cas systems is type II in which the endonuclease activities required for the interference with foreign deoxyribonucleic acid (DNA) are concentrated in a single multidomain protein, Cas9, and are guided by a co-processed dual-tracrRNA:crRNA molecule. This compact enzymatic machinery and readily programmable site-specific DNA targeting make type II systems top candidates for a new generation of powerful tools for genomic engineering. Here we report an updated census of CRISPR-Cas systems in bacterial and archaeal genomes. Type II systems are the rarest, missing in archaea, and represented in ∼5% of bacterial genomes, with an over-representation among pathogens and commensals. Phylogenomic analysis suggests that at least three cas genes, cas1, cas2 and cas4, and the CRISPR repeats of the type II-B system were acquired via recombination with a type I CRISPR-Cas locus. Distant homologs of Cas9 were identified among proteins encoded by diverse transposons, suggesting that type II CRISPR-Cas evolved via recombination of mobile nuclease genes with type I loci.
Any scenario of the transition from chemistry to biology should include an “energy module” because life can exist only when supported by energy flow(s). We addressed the problem of primordial energetics by combining physico-chemical considerations with phylogenomic analysis. We propose that the first replicators could use abiotically formed, exceptionally photostable activated nucleotides both as building blocks and as the main energy source. Nucleoside triphosphates could replace cyclic nucleotides as the principal energy-rich compounds at the stage of the first cells, presumably because the metal chelates of nucleoside triphosphates penetrated membranes much better than the respective metal complexes of nucleoside monophosphates. The ability to exploit natural energy flows for biogenic production of energy-rich molecules could evolve only gradually, after the emergence of sophisticated enzymes and ion-tight membranes. We argue that, in the course of evolution, sodium-dependent membrane energetics preceded the proton-based energetics which evolved independently in bacteria and archaea.
Horizontal gene transfer (HGT) is a major factor in the evolution of prokaryotes. An intriguing question is whether HGT is maintained during evolution of prokaryotes owing to its adaptive value or is a byproduct of selection driven by other factors such as consumption of extracellular DNA (eDNA) as a nutrient. One hypothesis posits that HGT can restore genes inactivated by mutations and thereby prevent stochastic, irreversible deterioration of genomes in finite populations known as Muller’s ratchet. To examine this hypothesis, we developed a population genetic model of prokaryotes undergoing HGT via homologous recombination. Analysis of this model indicates that HGT can prevent the operation of Muller’s ratchet even when the source of transferred genes is eDNA that comes from dead cells and on average carries more deleterious mutations than the DNA of recipient live cells. Moreover, if HGT is sufficiently frequent and eDNA diffusion sufficiently rapid, a subdivided population is shown to be more resistant to Muller’s ratchet than an undivided population of an equal overall size. Thus, to maintain genomic information in the face of Muller’s ratchet, it is more advantageous to partition individuals into multiple subpopulations and let them “cross-reference” each other’s genetic information through HGT than to collect all individuals in one population and thereby maximize the efficacy of natural selection. Taken together, the results suggest that HGT could be an important condition for the long-term maintenance of genomic information in prokaryotes through the prevention of Muller’s ratchet.
environmental DNA; evolution of transformation; competence; structured population; soil bacteria
Genome-wide comparison of phylogenetic trees is becoming an increasingly common approach in evolutionary genomics, and a variety of approaches for such comparison have been developed. In this article we present several methods for comparative analysis of large numbers of phylogenetic trees. To compare phylogenetic trees taking into account the bootstrap support for each internal branch, the Boot-Split Distance (BSD) method is introduced as an extension of the previously developed Split Distance (SD) method for tree comparison. The BSD method implements the straightforward idea that comparison of phylogenetic trees can be made more robust by treating tree splits differentially depending on the bootstrap support. Approaches are also introduced for detecting tree-like and net-like evolutionary trends in the phylogenetic Forest of Life (FOL), i.e., the entirety of the phylogenetic trees for conserved genes of prokaryotes. The principal method employed for this purpose includes mapping quartets of species onto trees to calculate the support of each quartet topology and so to quantify the tree and net contributions to the distances between species. We describe the applications methods used to analyze the FOL and the results obtained with these methods. These results support the concept of the Tree of Life (TOL) as a central evolutionary trend in the FOL as opposed to the traditional view of the TOL as a ‘species tree’.
Forest of life; tree of life; phylogenomic methods; tree comparison; map of quartets
The CRISPR-Cas-derived RNA-guided Cas9 endonuclease is the key element of an emerging promising technology for genome engineering in a broad range of cells and organisms. The DNA-targeting mechanism of the type II CRISPR-Cas system involves maturation of tracrRNA:crRNA duplex (dual-RNA), which directs Cas9 to cleave invading DNA in a sequence-specific manner, dependent on the presence of a Protospacer Adjacent Motif (PAM) on the target. We show that evolution of dual-RNA and Cas9 in bacteria produced remarkable sequence diversity. We selected eight representatives of phylogenetically defined type II CRISPR-Cas groups to analyze possible coevolution of Cas9 and dual-RNA. We demonstrate that these two components are interchangeable only between closely related type II systems when the PAM sequence is adjusted to the investigated Cas9 protein. Comparison of the taxonomy of bacterial species that harbor type II CRISPR-Cas systems with the Cas9 phylogeny corroborates horizontal transfer of the CRISPR-Cas loci. The reported collection of dual-RNA:Cas9 with associated PAMs expands the possibilities for multiplex genome editing and could provide means to improve the specificity of the RNA-programmable Cas9 tool.
The recently discovered Pandoraviruses are by far the largest viruses known, with their 2 megabase genomes exceeding in size the genomes of numerous bacteria and archaea. Pandoraviruses show a distant relationship with other nucleocytoplasmic large DNA viruses (NCLDV) of eukaryotes, lack some of the NCLDV core genes and in particular do not appear to be specifically related to the other, better characterized family of giant viruses, the Mimiviridae. Here we report phylogenetic analysis of 6 core NCLDV genes that confidently places Pandoraviruses within the family Phycodnaviridae, with an apparent specific affinity with Coccolithoviruses. We conclude that, despite their many unusual characteristics, Pandoraviruses are highly derived phycodnaviruses. These findings imply that giant viruses have independently evolved from smaller NCLDV on at least two occasions.
This article was reviewed by Patrick Forterre and Lakshminarayan Iyer. For the full reviews, see the Reviewers’ reports section.
Planctomycetes, Verrucomicrobia and Chlamydia are prokaryotic phyla that are sometimes grouped together as the PVC superphylum of eubacteria. Some PVC species possess interesting attributes, in particular, internal membranes that superficially resemble eukaryotic endomembranes. Some biologists now claim that PVC bacteria are nucleus-bearing prokaryotes and that they are evolutionary intermediates in the transition from prokaryote to eukaryote. PVC prokaryotes do not possess a nucleus and are not intermediates in the prokaryote-to-eukaryote transition. All of the PVC traits that are currently cited as evidence for aspiring eukaryoticity are either analogous (the result of convergent evolution), not homologous, to eukaryotic traits; or else they are the result of lateral gene transfers. Here we summarize the evidence that shows why most of the purported similarities between the PVC bacteria and eukaryotes are analogous and the rest are consequence of lateral gene acquisition.
We applied Illumina Human Methylation450K array to perform a genomic-scale single-site resolution DNA methylation analysis in neuronal and nonneuronal (primarily glial) nuclei separated from the orbitofrontal cortex of postmortem human brain. The findings were validated using enhanced reduced representation bisulfite sequencing. We identified thousands of sites differentially methylated (DM) between neuronal and nonneuronal cells. The DM sites were depleted within CpG-island–containing promoters but enriched in predicted enhancers. Classification of the DM sites into those undermethylated in neurons (neuronal type) and those undermethylated in nonneuronal cells (glial type), combined with findings of others that methylation within control elements typically negatively correlates with gene expression, yielded large sets of predicted neuron-specific and non–neuron-specific genes. These sets of predicted genes were in excellent agreement with the available direct measurements of gene expression in human and mouse. We also found a distinct set of DNA methylation patterns that were unique for neuronal cells. In particular, neuronal-type differential methylation was overrepresented in CpG island shores, enriched within gene bodies but not in intergenic regions, and preferentially harbored binding motifs for a distinct set of transcription factors, including neuron-specific activity-dependent factors. Finally, non-CpG methylation was substantially more prevalent in neurons than in nonneuronal cells.
Non-linear, parabolic (sub-exponential) and hyperbolic (super-exponential) models of prebiological evolution of molecular replicators have been proposed and extensively studied. The parabolic models appear to be the most realistic approximations of real-life replicator systems due primarily to product inhibition. Unlike the more traditional exponential models, the distribution of individual frequencies in an evolving parabolic population is not described by the Maximum Entropy (MaxEnt) Principle in its traditional form, whereby the distribution with the maximum Shannon entropy is chosen among all the distributions that are possible under the given constraints. We sought to identify a more general form of the MaxEnt principle that would be applicable to parabolic growth.
We consider a model of a population that reproduces according to the parabolic growth law and show that the frequencies of individuals in the population minimize the Tsallis relative entropy (non-additive information gain) at each time moment. Next, we consider a model of a parabolically growing population that maintains a constant total size and provide an “implicit” solution for this system. We show that in this case, the frequencies of the individuals in the population also minimize the Tsallis information gain at each moment of the ‘internal time” of the population.
The results of this analysis show that the general MaxEnt principle is the underlying law for the evolution of a broad class of replicator systems including not only exponential but also parabolic and hyperbolic systems. The choice of the appropriate entropy (information) function depends on the growth dynamics of a particular class of systems. The Tsallis entropy is non-additive for independent subsystems, i.e. the information on the subsystems is insufficient to describe the system as a whole. In the context of prebiotic evolution, this “non-reductionist” nature of parabolic replicator systems might reflect the importance of group selection and competition between ensembles of cooperating replicators.
This article was reviewed by Viswanadham Sridhara (nominated by Claus Wilke), Puushottam Dixit (nominated by Sergei Maslov), and Nick Grishin. For the complete reviews, see the Reviewers’ Reports section.
Replicator equation; Parabolic growth; Tsallis entropy; Non-extensive statistical mechanics; MaxEnt principle