The rapidly growing metagenomic databases provide increasing opportunities for computational discovery of new groups of organisms. Identification of new viruses is particularly straightforward given the comparatively small size of viral genomes, although fast evolution of viruses complicates the analysis of novel sequences. Here we report the metagenomic discovery of a distinct group of diverse viruses that are distantly related to the eukaryotic virus-like transposons of the Polinton superfamily.
The sequence of the putative major capsid protein (MCP) of the unusual linear virophage associated with Phaeocystis globosa virus (PgVV) was used as a bait to identify potential related viruses in metagenomic databases. Assembly of the contigs encoding the PgVV MCP homologs followed by comprehensive sequence analysis of the proteins encoded in these contigs resulted in the identification of a large group of Polinton-like viruses (PLV) that resemble Polintons (polintoviruses) and virophages in genome size, and share with them a conserved minimal morphogenetic module that consists of major and minor capsid proteins and the packaging ATPase. With a single exception, the PLV lack the retrovirus-type integrase that is encoded in the genomes of all Polintons and the Mavirus group of virophages. However, some PLV encode a newly identified tyrosine recombinase-integrase that is common in bacteria and bacteriophages and is also found in the Organic Lake virophage group. Although several PLV genomes and individual genes are integrated into algal genomes, it appears likely that most of the PLV are viruses. Given the absence of protease and retrovirus-type integrase, the PLV could resemble the ancestral polintoviruses that evolved from bacterial tectiviruses. Apart from the conserved minimal morphogenetic module, the PLV widely differ in their genome complements but share a gene network with Polintons and virophages, suggestive of multiple gene exchanges within a shared gene pool.
The discovery of PLV substantially expands the emerging class of eukaryotic viruses and transposons that also includes Polintons and virophages. This class of selfish elements is extremely widespread and might have been a hotbed of eukaryotic virus, transposon and plasmid evolution. New families of these elements are expected to be discovered.
Electronic supplementary material
The online version of this article (doi:10.1186/s12915-015-0207-4) contains supplementary material, which is available to authorized users.
Viruses are the most abundant and genetically diverse biological entities on earth, yet the repertoire of viral proteins remains poorly explored. As the number of sequenced virus genomes grows into the thousands, and the number of viral proteins into the hundreds of thousands, we report a systematic computational analysis of the point of first contact between viruses and their hosts, namely viral transmembrane (TM) proteins.
The complement of α-helical TM proteins in double-stranded DNA viruses infecting bacteria and archaea reveals large-scale trends that differ from those of their hosts. Viruses typically encode a substantially lower fraction of TM proteins than archaea or bacteria, with the notable exception of viruses with virions containing a lipid component such as a lipid envelope, internal lipid core, or inner membrane vesicle. Compared to bacteriophages, archaeal viruses are substantially enriched in membrane proteins. However, this feature is not always stable throughout the evolution of a viral lineage; for example, TM proteins are not part of the common heritage shared between Lipothrixviridae and Rudiviridae. In contrast to bacteria and archaea, viruses almost completely lack proteins with complicated membrane topologies composed of more than 4 TM segments, with the few detected exceptions being obvious cases of relatively recent horizontal transfer from the host.
The dramatic differences between the membrane proteomes of cells and viruses stem from the fact that viruses do not depend on essential membranes for energy transformation, ion homeostasis, nutrient transport and signaling.
Electronic supplementary material
The online version of this article (doi:10.1186/s12859-015-0817-4) contains supplementary material, which is available to authorized users.
Bacteriophages; Viruses; Comparative genomics; Transmembrane proteins; Membrane proteome
The archaeal DNA replication system shows an unexpected level of complexity. In fact, there is a close correspondence between components of the archaeal and eukaryotic replication systems.
Recent advances in the characterization of the archaeal DNA replication system together with comparative genomic analysis have led to the identification of several previously uncharacterized archaeal proteins involved in replication and currently reveal a nearly complete correspondence between the components of the archaeal and eukaryotic replication machineries. It can be inferred that the archaeal ancestor of eukaryotes and even the last common ancestor of all extant archaea possessed replication machineries that were comparable in complexity to the eukaryotic replication system. The eukaryotic replication system encompasses multiple paralogs of ancestral components such that heteromeric complexes in eukaryotes replace archaeal homomeric complexes, apparently along with subfunctionalization of the eukaryotic complex subunits. In the archaea, parallel, lineage-specific duplications of many genes encoding replication machinery components are detectable as well; most of these archaeal paralogs remain to be functionally characterized. The archaeal replication system shows remarkable plasticity whereby even some essential components such as DNA polymerase and single-stranded DNA-binding protein are displaced by unrelated proteins with analogous activities in some lineages.
Many proteins of viruses infecting hyperthermophilic Crenarchaeota have no detectable homologs in current databases, hampering our understanding of viral evolution. We used sensitive database search methods and structural modeling to show that a nucleocapsid protein (TP1) of Thermoproteus tenax virus 1 (TTV1) is a derivative of the Cas4 nuclease, a component of the CRISPR-Cas adaptive immunity system that is encoded also by several archaeal viruses. In TTV1, the Cas4 gene was split into two, with the N-terminal portion becoming TP1, and lost some of the catalytic amino acid residues, apparently resulting in the inactivation of the nuclease. To our knowledge, this is the first described case of exaptation of an enzyme for a virus capsid protein function.
This article was reviewed by Vivek Anantharaman, Christine Orengo and Mircea Podar. For complete reviews, see the Reviewers’ reports section.
Electronic supplementary material
The online version of this article (doi:10.1186/s13062-015-0093-2) contains supplementary material, which is available to authorized users.
Virus evolution; Capsid proteins; Nucleocapsid; Virus origin; Archaea viruses
The recently discovered CRISPR-Cas adaptive immune system is present in almost all archaea and many bacteria. It consists of cassettes of CRISPR repeats that incorporate spacers homologous to fragments of viral or plasmid genomes that are employed as guide RNAs in the immune response, along with numerous CRISPR-associated (cas) genes that encode proteins possessing diverse, only partially characterized activities required for the action of the system. Here, we investigate the evolution of the cas genes and show that they evolve under purifying selection that is typically much weaker than the median strength of purifying selection affecting genes in the respective genomes. The exceptions are the cas1 and cas2 genes that typically evolve at levels of purifying selection close to the genomic median. Thus, although these genes are implicated in the acquisition of spacers from alien genomes, they do not appear to be directly involved in an arms race between bacterial and archaeal hosts and infectious agents. These genes might possess functions distinct from and additional to their role in the CRISPR-Cas-mediated immune response. Taken together with evidence of the frequent horizontal transfer of cas genes reported previously and with the widespread microscale recombination within these genes detected in this work, these findings reveal the highly dynamic evolution of cas genes. This conclusion is in line with the involvement of CRISPR-Cas in antiviral immunity that is likely to entail a coevolutionary arms race with rapidly evolving viruses. However, we failed to detect evidence of strong positive selection in any of the cas genes.
When Charles Darwin formulated the central principles of evolutionary biology in the Origin of Species in 1859 and the architects of the Modern Synthesis integrated these principles with population genetics almost a century later, the principal if not the sole objects of evolutionary biology were multicellular eukaryotes, primarily animals and plants. Before the advent of efficient gene sequencing, all attempts to extend evolutionary studies to bacteria had been futile. Sequencing of the rRNA genes in thousands of microbes allowed the construction of the three-domain “ribosomal Tree of Life” that was widely thought to have resolved the evolutionary relationships between the cellular life forms. However, subsequent massive sequencing of numerous, complete microbial genomes revealed novel evolutionary phenomena, the most fundamental of these being: (1) pervasive horizontal gene transfer (HGT), in large part mediated by viruses and plasmids, that shapes the genomes of archaea and bacteria and calls for a radical revision (if not abandonment) of the Tree of Life concept, (2) Lamarckian-type inheritance that appears to be critical for antivirus defense and other forms of adaptation in prokaryotes, and (3) evolution of evolvability, i.e., dedicated mechanisms for evolution such as vehicles for HGT and stress-induced mutagenesis systems. In the non-cellular part of the microbial world, phylogenomics and metagenomics of viruses and related selfish genetic elements revealed enormous genetic and molecular diversity and extremely high abundance of viruses that come across as the dominant biological entities on earth. Furthermore, the perennial arms race between viruses and their hosts is one of the defining factors of evolution. Thus, microbial phylogenomics adds new dimensions to the fundamental picture of evolution even as the principle of descent with modification discovered by Darwin and the laws of population genetics remain at the core of evolutionary biology.
Darwin; modern synthesis; comparative genomics; tree of life; horizontal gene transfer
Cyanobacteria of the Synechococcus and Prochlorococcus genera are important contributors to photosynthetic productivity in the open oceans [1-3]. Here, using pre-existing metagenomic datasets from the global ocean sampling (GOS) expedition [4] as well as from viral biomes [5], we show the first evidence for the presence of photosystem I (PSI) genes in genomes of marine viruses that infect these marine cyanobacteria. Recently, core photosystem II (PSII) genes were identified in cyanophages; they were proposed to be functional in photosynthesis and in increasing viral fitness by supplementing the host production of these proteins [6-9]. The 7 cyanobacterial core PSI genes identified in this study, psaA, B, C, D, E, K and a unique J and F fusion, form a distinctive cluster in cyanophage genomes, suggestive of selection for a distinct function in virus life cycle. The existence of this PSI cluster was confirmed with overlapping and long PCR performed on environmental DNA from the Northern Line Islands. Potentially, the 7 proteins encoded by the viral genes are sufficient to form an intact monomeric PSI complex. Projection of viral predicted peptides on the cyanobacterial PSI crystal structure [10] suggested that the viral-PSI components may provide a unique way of funneling reducing power from respiratory and other electron transfer chains to PSI.
The origin of eukaryotes is one of the hardest problems in evolutionary biology and sometimes raises the ominous specter of irreducible complexity. Reconstruction of the gene repertoire of the last eukaryotic common ancestor (LECA) has revealed a highly complex organism with a variety of advanced features but no detectable evolutionary intermediates to explain their origin. Recently, however, genome analysis of diverse archaea led to the discovery of apparent ancestral versions of several signature eukaryotic systems, such as the actin cytoskeleton and the ubiquitin network, that are scattered among archaea. These findings inspired the hypothesis that the archaeal ancestor of eukaryotes was an unusually complex form with an elaborate intracellular organization. The latest striking discovery made by deep metagenomic sequencing vindicates this hypothesis by showing that in phylogenetic trees eukaryotes fall within a newly identified archaeal group, the Lokiarchaeota, which combine several eukaryotic signatures previously identified in different archaea. The discovery of complex archaea that are the closest living relatives of eukaryotes is most compatible with the symbiogenetic scenario for eukaryogenesis.
The numerous and diverse eukaryotic viruses with large double-stranded DNA genomes that at least partially reproduce in the cytoplasm of infected cells apparently evolved from a single virus ancestor. This major group of viruses is known as Nucleocytoplasmic Large DNA Viruses (NCLDV) or the proposed order Megavirales. Among the “Megavirales”, there are three groups of giant viruses with genomes exceeding 500 kb, namely Mimiviruses, Pithoviruses, and Pandoraviruses that hold the current record of viral genome size, about 2.5 Mb. Phylogenetic analysis of conserved, ancestral NCLDV genes clearly shows that these three groups of giant viruses have three distinct origins within the “Megavirales”. The Mimiviruses constitute a distinct family that is distantly related to Phycodnaviridae, Pandoraviruses originate from a common ancestor with Coccolithoviruses within the Phycodnaviridae family, and Pithoviruses are related to Iridoviridae and Marseilleviridae. Maximum likelihood reconstruction of gene gain and loss events during the evolution of the “Megavirales” indicates that each group of giant viruses evolved from viruses with substantially smaller and simpler gene repertoires. Initial phylogenetic analysis of universal genes, such as translation system components, encoded by some giant viruses, in particular Mimiviruses, has led to the hypothesis that giant viruses descend from a fourth, probably extinct domain of cellular life. The results of our comprehensive phylogenomic analysis of giant viruses refute the fourth domain hypothesis and instead indicate that the universal genes have been independently acquired by different giant viruses from their eukaryotic hosts.
The origin of eukaryotes is a fundamental, forbidding evolutionary puzzle. Comparative genomic analysis clearly shows that the last eukaryotic common ancestor (LECA) possessed most of the signature complex features of modern eukaryotic cells, in particular the mitochondria, the endomembrane system including the nucleus, an advanced cytoskeleton and the ubiquitin network. Numerous duplications of ancestral genes, e.g. DNA polymerases, RNA polymerases and proteasome subunits, also can be traced back to the LECA. Thus, the LECA was not a primitive organism and its emergence must have resulted from extensive evolution towards cellular complexity. However, the scenario of eukaryogenesis, and in particular the relationship between endosymbiosis and the origin of eukaryotes, is far from being clear. Four recent developments provide new clues to the likely routes of eukaryogenesis. First, evolutionary reconstructions suggest complex ancestors for most of the major groups of archaea, with the subsequent evolution dominated by gene loss. Second, homologues of signature eukaryotic proteins, such as actin and tubulin that form the core of the cytoskeleton or the ubiquitin system, have been detected in diverse archaea. The discovery of this ‘dispersed eukaryome’ implies that the archaeal ancestor of eukaryotes was a complex cell that might have been capable of a primitive form of phagocytosis and thus conducive to endosymbiont capture. Third, phylogenomic analyses converge on the origin of most eukaryotic genes of archaeal descent from within the archaeal evolutionary tree, specifically, the TACK superphylum. Fourth, evidence has been presented that the origin of the major archaeal phyla involved massive acquisition of bacterial genes. Taken together, these findings make the symbiogenetic scenario for the origin of eukaryotes considerably more plausible and the origin of the organizational complexity of eukaryotic cells more readily explainable than they appeared until recently.
endosymbiosis; phagocytosis; cytoskeleton; horizontal gene transfer; archaea
The nucleo-cytoplasmic large DNA viruses (NCLDV) constitute an apparently monophyletic group that consists of 6 families of viruses infecting a broad variety of eukaryotes. A comprehensive genome comparison and maximum-likelihood reconstruction of NCLDV evolution reveal a set of approximately 50 conserved genes that can be tentatively mapped to the genome of the common ancestor of this class of eukaryotic viruses. We address the origins and evolution of NCLDV.
Phylogenetic analysis indicates that some of the major clades of NCLDV infect diverse animals and protists, suggestive of early radiation of the NCLDV, possibly concomitant with eukaryogenesis. The core NCLDV genes seem to have originated from different sources including homologous genes of bacteriophages, bacteria and eukaryotes. These observations are compatible with a scenario of the origin of the NCLDV at an early stage of the evolution of eukaryotes through extensive mixing of genes from widely different genomes.
The common ancestor of the NCLDV probably evolved from a bacteriophage as a result of recruitment of numerous eukaryotic and some bacterial genes, and concomitant loss of the majority of phage genes except for a small core of genes coding for proteins essential for virus genome replication and virion formation.
Bacteriophage; Eukaryogenesis; Nucleo-cytoplasmic large DNA viruses, evolution; Phylogenetic analysis
The Central Dogma of molecular biology posits that transfer of information from proteins back to nucleic acids does not occur in biological systems. I argue that the impossibility of reverse translation is indeed a major, physical exclusion principle that emerges due to the transition from the digital information carriers, nucleic acids, to analog information carriers, proteins, which involves irreversible suppression of the digital information.
This article was reviewed by Itai Yanai, Martin Lercher and Frank Eisenhaber.
Central Dogma; Digital information; Analogous information; Translation; Aminoacyl-tRNA synthetases
Most of the archaea and numerous bacteria possess an elaborate system of adaptive immunity to mobile genetic elements known as the CRISPR (clustered regularly interspaced short palindromic repeats)-associated system (CRISPR-Cas), which consists of arrays of short repeats interspersed with unique DNA spacers and adjacent operons encompassing CRISPR-associated (cas) genes with predicted and, in some cases, experimentally validated nuclease, helicase, and polymerase activities. The system functions by integrating fragments of alien DNA between the repeats and employing their transcripts to degrade the DNA of the respective invading elements via an RNA interference-like mechanism. The CRISPR-Cas system is a case of apparent Lamarckian inheritance.
The CRISPR (clustered regularly interspaced short palindromic repeats)–Cas (CRISPR-associated genes) systems of archaea and bacteria provide adaptive immunity against viruses and other selfish elements and are believed to curtail horizontal gene transfer (HGT). Limiting acquisition of new genetic material could be one of the sources of the fitness cost of CRISPR–Cas maintenance and one of the causes of the patchy distribution of CRISPR–Cas among bacteria and across environments. We sought to test the hypothesis that the activity of CRISPR–Cas in microbes is negatively correlated with the extent of recent HGT. Using three independent measures of HGT, we found no significant dependence between the length of CRISPR arrays, which reflects the activity of the immune system, and the estimated number of recent HGT events. In contrast, we observed a significant negative dependence between the estimated extent of HGT and growth temperature of microbes, which could be explained by the lower genetic diversity in hotter environments. We hypothesize that the relevant events in the evolution of resistance to mobile elements and proclivity for HGT, to which CRISPR–Cas systems seem to substantially contribute, occur on the population scale rather than on the timescale of species evolution.
The recent discovery of protein modification by SAMPs, ubiquitin-like (Ubl) proteins from the archaeon Haloferax volcanii, prompted a comprehensive comparative-genomic analysis of archaeal Ubl protein genes and the genes for enzymes thought to be functionally associated with Ubl proteins. This analysis showed that most archaea encode members of two major groups of Ubl proteins with the β-grasp fold, the ThiS and MoaD families, and indicated that the ThiS family genes are rarely linked to genes for thiamine or Mo/W cofactor metabolism enzymes but instead are most often associated with genes for enzymes of tRNA modification. Therefore it is hypothesized that the ancestral function of the archaeal Ubl proteins is sulfur insertion into modified nucleotides in tRNAs, an activity analogous to that of the URM1 protein in eukaryotes. Together with additional, previously described genomic associations, these findings indicate that systems for protein quality control operating at different levels, including tRNA modification that controls translation fidelity, protein ubiquitination that regulates protein degradation, and, possibly, mRNA degradation by the exosome, are functionally and evolutionarily linked.
Evolutionary reconstructions using maximum likelihood methods point to unexpectedly high densities of introns in protein-coding genes of ancestral eukaryotic forms including the last common ancestor of all extant eukaryotes. Combined with the evidence of the origin of spliceosomal introns from invading Group II self-splicing introns, these results suggest that early ancestral eukaryotic genomes consisted of up to 80% sequences derived from Group II introns, a much greater contribution of introns than that seen in any extant genome. An organism with such an unusual genome architecture could survive only under conditions of a severe population bottleneck.
effective population size; endosymbiosis; group II self-splicing introns; origin of eukaryotes; spliceosomal introns
Evolutionary binary characters are features of species or genes, indicating the absence (value zero) or presence (value one) of some property. Examples include eukaryotic gene architecture (the presence or absence of an intron in a particular locus), gene content, and morphological characters. In many studies, the acquisition of such binary characters is assumed to represent a rare evolutionary event, and consequently, their evolution is analyzed using various flavors of parsimony. However, when gain and loss of the character are not rare enough, a probabilistic analysis becomes essential. Here, we present a comprehensive probabilistic model to describe the evolution of binary characters on a bifurcating phylogenetic tree. A fast software tool, EREM, is provided, using maximum likelihood to estimate the parameters of the model and to reconstruct ancestral states (presence and absence in internal nodes) and events (gain and loss events along branches).
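As a rough illustration of the kind of calculation such a model involves, the sketch below computes the likelihood of a presence/absence pattern on a small rooted tree under a generic two-state Markov process with constant gain and loss rates, using Felsenstein's pruning recursion. This is not the EREM implementation: the function names, the nested-list tree encoding, and the choice of the stationary distribution at the root are assumptions made for the example.

```python
import math


def transition_probs(gain, loss, t):
    """Return the 2x2 matrix P[i][j] = Pr(state j after time t | state i at start)."""
    total = gain + loss
    decay = math.exp(-total * t)
    pi1 = gain / total  # stationary probability of presence (state 1)
    pi0 = loss / total  # stationary probability of absence (state 0)
    return [
        [pi0 + pi1 * decay, pi1 * (1.0 - decay)],
        [pi0 * (1.0 - decay), pi1 + pi0 * decay],
    ]


def partial_likelihoods(node, gain, loss):
    """Return [L0, L1]: probability of the leaf states below `node` given its state.

    Assumed encoding: a leaf is the int 0 or 1 (observed state); an internal
    node is a list of (child, branch_length) pairs.
    """
    if isinstance(node, int):
        return [1.0 if node == 0 else 0.0, 1.0 if node == 1 else 0.0]
    result = [1.0, 1.0]
    for child, branch_length in node:
        child_l = partial_likelihoods(child, gain, loss)
        p = transition_probs(gain, loss, branch_length)
        for state in (0, 1):
            result[state] *= p[state][0] * child_l[0] + p[state][1] * child_l[1]
    return result


def tree_likelihood(tree, gain, loss):
    """Likelihood of the observed pattern, with the root drawn from the stationary distribution."""
    total = gain + loss
    root = partial_likelihoods(tree, gain, loss)
    return (loss / total) * root[0] + (gain / total) * root[1]


# Hypothetical three-leaf tree: ((present, present), absent) with made-up branch lengths.
example_tree = [([(1, 0.1), (1, 0.1)], 0.2), (0, 0.3)]
example_likelihood = tree_likelihood(example_tree, gain=0.5, loss=1.5)
```

Maximum likelihood estimation would then amount to searching for the gain and loss rates that maximize the product of such likelihoods over all characters; ancestral state reconstruction reuses the same partial likelihoods.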
We recently showed that spinal cord injury (SCI) leads to a decrease in mRNA editing of serotonin receptor 2C (5-HT2CR) contributing to post-SCI spasticity. Here we study post-SCI mRNA editing and global gene expression using massively parallel sequencing. Evidence is presented that the decrease in 5-HT2CR editing is caused by down-regulation of adenosine deaminase ADAR2 and that editing of at least one other ADAR2 target, potassium channel Kv1.1, is decreased after SCI. Bayesian network analysis of genome-wide transcriptome data indicates that down-regulation of ADAR2 (1) is triggered by persistent inflammatory response to SCI that is associated with activation of microglia and (2) results in changes in neuronal gene expression that are likely to contribute both to post-SCI restoration of neuronal excitability and muscle spasms. These findings have broad implications for other diseases of the Central Nervous System and could open new avenues for developing efficacious antispastic treatments.
Bacteriophages have key roles in microbial communities, to a large extent shaping the taxonomic and functional composition of the microbiome, but data on the connections between phage diversity and the composition of communities are scarce. Using taxon-specific marker genes, we identified and monitored 20 viral taxa in 252 human gut metagenomic samples, mostly at the level of genera. On average, five phage taxa were identified in each sample, with up to three of these being highly abundant. The abundances of most phage taxa vary by up to four orders of magnitude between the samples, and several taxa that are highly abundant in some samples are absent in others. Significant correlations exist between the abundances of some phage taxa and human host metadata: for example, ‘Group 936 lactococcal phages’ are more prevalent and abundant in Danish samples than in samples from Spain or the United States of America. Quantification of phages that exist as integrated prophages revealed that the abundance profiles of prophages are highly individual-specific and remain unique to an individual over a 1-year time period, and prediction of prophage lysis across the samples identified hundreds of prophages that are apparently active in the gut and vary across the samples, in terms of presence and lytic state. Finally, a prophage–host network of the human gut was established and includes numerous novel host–phage associations.
human gut; metagenomics; phage
The RAG1 and RAG2 proteins are essential subunits of the V(D)J recombinase that is required for the generation of the enormous variability of antibodies and T-cell receptors in jawed vertebrates. It was demonstrated previously that the 600-aa catalytic core of RAG1 evolved from the transposase of the Transib superfamily transposons. However, although homologs of RAG1 and RAG2 genes are adjacent in the purple sea urchin genome, a transposon encoding both proteins so far has not been reported. Here we describe such transposons in the genomes of green sea urchin, a starfish and an oyster. Comparison of the domain architectures of the RAG1 homologs in these transposons, denoted TransibSU, and other Transib superfamily transposases provides for reconstruction of the structure of the hypothetical TransibVDJ transposon that gave rise to the VDJ recombinases at the onset of vertebrate evolution some 500 million years ago.
This article was reviewed by Mart Krupovic and I. King Jordan.
Electronic supplementary material
The online version of this article (doi:10.1186/s13062-015-0055-8) contains supplementary material, which is available to authorized users.
Molecular evolution; genetics; immune system; V(D)J recombination; RAG1 and RAG2 proteins; Transib DNA transposons; Transib transposase
Search of metagenomic sequence databases for homologs of virophage capsid proteins resulted in the discovery of a new family of virophages in the sheep rumen metagenome. The genomes of the rumen virophages (RVP) encode a typical virophage major capsid protein, ATPase and protease combined with a Polinton-type, protein-primed family B DNA polymerase. The RVP genomes appear to be linear molecules, with terminal inverted repeats. Thus, the RVP seem to represent virophage-Polinton hybrids that are likely capable of formation of infectious virions. Virion proteins of mimiviruses were detected in the same metagenomes as the RVP, suggesting that the virophages of the new family parasitize giant viruses that infect protist inhabitants of the rumen.
This article was reviewed by Mart Krupovic and Kenneth Stedman; for complete reviews, see the Reviewers’ Reports section.
Electronic supplementary material
The online version of this article (doi:10.1186/s13062-015-0054-9) contains supplementary material, which is available to authorized users.
The wide spread and high rate of gene exchange and loss in the prokaryotic world translate into “network genomics”. The rates of gene gain and loss are comparable with the rate of point mutations but are substantially greater than the duplication rate. Thus, evolution of prokaryotes is primarily shaped by gene gain and loss. These processes are essential to prevent mutational meltdown of microbial populations by stopping Muller’s ratchet and appear to trigger emergence of major novel clades by opening up new ecological niches. At least some bacteria and archaea seem to have evolved dedicated devices for gene transfer. Despite the dominance of gene gain and loss, evolution of genes is intrinsically tree-like. The significant coherence between the topologies of numerous gene trees, particularly those for (nearly) universal genes, is compatible with the concept of a statistical tree of life, which forms the framework for reconstruction of the evolutionary processes in the prokaryotic world.
Microbial evolution; Phylogenetic trees; Horizontal gene transfer; Muller’s ratchet; Evolvability
Fixation of beneficial genes in bacteria and archaea (collectively, prokaryotes) is often believed to erase pre-existing genomic diversity through the hitchhiking effect, a phenomenon known as genome-wide selective sweep. Recent studies, however, indicate that beneficial genes spread through a prokaryotic population via recombination without causing genome-wide selective sweeps. These gene-specific selective sweeps seem to be at odds with the existing estimates of recombination rates in prokaryotes, which appear far too low to explain such phenomena.
We use mathematical modeling to investigate potential solutions to this apparent paradox. Most microbes in nature evolve in heterogeneous, dynamic communities, in which ecological interactions can substantially impact evolution. Here, we focus on the effect of negative frequency-dependent selection (NFDS) such as caused by viral predation (kill-the-winner dynamics). The NFDS maintains multiple genotypes within a population, so that a gene beneficial to every individual would have to spread via recombination, hence a gene-specific selective sweep. However, gene loci affected by NFDS often are located in variable regions of microbial genomes that contain genes involved in the mobility of selfish genetic elements, such as integrases or transposases. Thus, the NFDS-affected loci are likely to experience elevated rates of recombination compared with the other loci. Consequently, these loci might be effectively unlinked from the rest of the genome, so that NFDS would be unable to prevent genome-wide selective sweeps. To address this problem, we analyzed population genetic models of selective sweeps in prokaryotes under NFDS. The results indicate that NFDS can cause gene-specific selective sweeps despite the effect of locally elevated recombination rates, provided NFDS affects more than one locus and the basal rate of recombination is sufficiently low. Although these conditions might seem to contradict the intuition that gene-specific selective sweeps require high recombination rates, they actually decrease the effective rate of recombination at loci affected by NFDS relative to the per-locus basal level, so that NFDS can cause gene-specific selective sweeps.
Because many free-living prokaryotes are likely to evolve under NFDS caused by ubiquitous viruses, gene-specific selective sweeps driven by NFDS are expected to be a major, general phenomenon in prokaryotic populations.
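The way NFDS maintains multiple genotypes can be illustrated with a deliberately simplified toy calculation. The sketch below is not the population genetic model analyzed in the study (it has no recombination and no multiple loci); it only assumes, for illustration, that the fitness of each genotype falls linearly with its own frequency, as a stand-in for kill-the-winner virus predation. The function names and all parameter values are hypothetical.

```python
def next_generation(freqs, base_fitness, nfds_strength):
    """Apply one generation of selection to genotype frequencies.

    Assumed fitness of genotype i: base_fitness[i] * (1 - nfds_strength * freqs[i]),
    so that a common genotype is penalized in proportion to its frequency.
    """
    weighted = [
        x * w * (1.0 - nfds_strength * x)
        for x, w in zip(freqs, base_fitness)
    ]
    mean_fitness = sum(weighted)
    return [v / mean_fitness for v in weighted]


def simulate(freqs, base_fitness, nfds_strength, generations):
    """Iterate selection for a fixed number of generations and return the final frequencies."""
    for _ in range(generations):
        freqs = next_generation(freqs, base_fitness, nfds_strength)
    return freqs


# With NFDS, three genotypes of equal intrinsic fitness are all maintained.
coexist = simulate([0.8, 0.15, 0.05], [1.0, 1.0, 1.0], 0.5, 500)

# Without NFDS, a genotype with a 10% fitness advantage replaces its competitor.
sweep = simulate([0.5, 0.5], [1.0, 1.1], 0.0, 500)
```

In this toy setting a beneficial gene arising in one genotype could not displace the others under NFDS, which is why its spread through the whole population would have to rely on recombination.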
Electronic supplementary material
The online version of this article (doi:10.1186/s12915-015-0131-7) contains supplementary material, which is available to authorized users.
Viruses with single-stranded (ss) DNA genomes infect hosts in all three domains of life and include many medically, ecologically, and economically important pathogens. Recently, a new group of ssDNA viruses with chimeric genomes has been discovered through viral metagenomics. These chimeric viruses combine capsid protein genes and replicative protein genes that, respectively, appear to have been inherited from viruses with positive-strand RNA genomes, such as tombusviruses, and ssDNA genomes, such as circoviruses, nanoviruses or geminiviruses. Here, we describe the genome sequence of a new representative of this virus group and reveal an additional layer of chimerism among ssDNA viruses. We show that not only do these viruses encompass genes for capsid proteins and replicative proteins that have distinct evolutionary histories, but also the replicative genes themselves are chimeras of functional domains inherited from viruses of different families. Our results underscore the importance of horizontal gene transfer in the evolution of ssDNA viruses and the role of genetic recombination in the emergence of novel virus groups.
ssDNA viruses; virus evolution; origin of viruses; genetic recombination; metagenomics