Viruses infecting prokaryotic cells (phages) are the most abundant entities of the biosphere and contain a largely uncharted wealth of genomic diversity. They play a critical role in the biology of their hosts and in ecosystem functioning at large. Classical approaches to studying phages require isolation from a pure culture of the host. Direct sequencing approaches have been hampered by the small amounts of phage DNA present in most natural habitats and the difficulty in applying meta-omic approaches, such as annotation of small reads and assembly. Serendipitously, it has been discovered that cellular metagenomes of highly productive ocean waters (the deep chlorophyll maximum) contain significant amounts of viral DNA derived from cells undergoing the lytic cycle. We have taken advantage of this phenomenon to retrieve metagenomic fosmids containing viral DNA from a Mediterranean deep chlorophyll maximum sample. This method allowed the description of the complete genomes of 208 new marine phages. The diversity of these genomes was remarkable, contributing 21 genomic groups of tailed bacteriophages, of which 10 are completely new. Sequence-based methods have allowed host assignment for many of them. These predicted hosts represent a wide variety of important marine prokaryotic microbes, such as members of the SAR11 and SAR116 clades, Cyanobacteria and also the newly described low GC Actinobacteria. A metavirome constructed from the same habitat showed that many of the new phage genomes were abundantly represented. Furthermore, other available metaviromes also indicated that some of the new phages are globally distributed in low to medium latitude ocean waters. The availability of many genomes from the same sample allows a direct approach to viral population genomics, confirming the remarkable mosaicism of phage genomes.
Prokaryotic species contain extremely large gene pools (pan-genomes), the study of which has been constrained by the difficulty of obtaining enough cultivated representatives for most of them. The situation is even worse for their viruses, also known as phages, which provide and preserve part of this genomic diversity. Here we have found a way to bypass the limitation imposed by pure culture to retrieve phage genomes. We obtained large insert clones (fosmids) from natural communities that are undergoing active viral attack. This has allowed us to triple the number of genomes of marine phages and could be similarly applied to other habitats, shedding light on the biology of the most numerous and least known biological entities on the planet. These phages exhibit a remarkable degree of variation at a single geographic site, but some also seem to be prevalent worldwide. Their frequent mosaicism indicates a high level of promiscuity that goes beyond the already remarkable hybrid nature of prokaryotic genomes.
Secretins form large multimeric complexes in the outer membranes of many Gram-negative bacteria, where they function as dedicated gateways that allow proteins to access the extracellular environment. Despite their overall relatedness, different secretins use different specific and general mechanisms for their targeting, assembly, and membrane insertion. We report that all tested secretins from several type II secretion systems and from the filamentous bacteriophage f1 can spontaneously multimerize and insert into liposomes in an in vitro transcription-translation system. Phylogenetic analyses indicate that these secretins form a group distinct from the secretins of the type IV piliation and type III secretion systems, which do not autoassemble in vitro. A mutation causing a proline-to-leucine substitution allowed PilQ secretins from two different type IV piliation systems to assemble in vitro, albeit with very low efficiency, suggesting that autoassembly is an inherent property of all secretins.
In prokaryotes, genome size is associated with metabolic versatility, regulatory complexity, effective population size, and horizontal transfer rates. We therefore analyzed the covariation of genome size and operon conservation to assess the evolutionary models of operon formation and maintenance. In agreement with previous results, intraoperonic pairs of essential and of highly expressed genes are more conserved. Interestingly, intraoperonic pairs of genes are also more conserved when they encode proteins at similar cell concentrations, suggesting a role of cotranscription in diminishing the cost of waste and shortfall in gene expression. Larger genomes have fewer and smaller operons that are also less conserved. Importantly, lower conservation in larger genomes was observed for all classes of operons in terms of gene expression, essentiality, and balanced protein concentration. We reached very similar conclusions in independent analyses of three major bacterial clades (α- and β-Proteobacteria and Firmicutes). Operon conservation is inversely correlated to the abundance of transcription factors in the genome when controlled for genome size. This suggests a negative association between the complexity of genetic networks and operon conservation. These results show that genome size and/or its proxies are key determinants of the intensity of natural selection for operon organization. Our data fit better the evolutionary models based on the advantage of coregulation than those based on genetic linkage or stochastic gene expression. We suggest that larger genomes with highly complex genetic networks and many transcription factors endure weaker selection for operons than smaller genomes with fewer alternative tools for genetic regulation.
operons; prokaryotes; evolution
Phages, like many parasites, tend to have small genomes and may encode autonomous functions or manipulate those of their hosts. Recombination functions are essential for phage replication and diversification. They are also nearly ubiquitous in bacteria. The E. coli genome encodes many copies of an octamer (Chi) motif that, upon recognition by RecBCD, favors repair of double strand breaks by homologous recombination. This might allow discrimination of self from non-self because RecBCD degrades DNA lacking Chi. Bacteriophage Lambda, an E. coli parasite, lacks Chi motifs, but escapes degradation by inhibiting RecBCD and encoding its own autonomous recombination machinery. We found that only half of 275 lambdoid genomes encode recombinases, the remainder relying on the host's machinery. Unexpectedly, we found that some lambdoid phages contain extremely high numbers of Chi motifs concentrated between the phage origin of replication and the packaging site. This suggests a tight association between replication, packaging and RecBCD-mediated recombination in these phages. Indeed, phages lacking recombinases strongly over-represent Chi motifs. Conversely, phages encoding recombinases and inhibiting host recombination machinery select for the absence of Chi motifs. Host and phage recombinases use different mechanisms and the latter are more tolerant of sequence divergence. Accordingly, we show that phages encoding their own recombination machinery have more mosaic genomes resulting from recent recombination events and have more diverse gene repertoires, i.e. larger pan-genomes. We discuss the costs and benefits of superseding or manipulating host recombination functions and how this decision shapes phage genome structure and evolvability.
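As a rough illustration of how Chi motif abundance might be measured in a phage genome, the following minimal Python sketch counts the E. coli Chi octamer (5'-GCTGGTGG-3') and its reverse complement and reports a density per kilobase. The function names and the toy sequence are illustrative assumptions, not the procedure used in the study, which also considered where motifs lie relative to the replication origin and the packaging site.

```python
# Minimal sketch (assumed helper names): count Chi motifs on both strands.
CHI = "GCTGGTGG"  # E. coli Chi octamer, written 5'->3'

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an uppercase DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def count_motif(seq: str, motif: str) -> int:
    """Count occurrences of motif in seq, allowing overlaps."""
    count = 0
    start = seq.find(motif)
    while start != -1:
        count += 1
        start = seq.find(motif, start + 1)
    return count


def chi_density(seq: str, motif: str = CHI) -> dict:
    """Count the motif on each strand and report motifs per kilobase."""
    seq = seq.upper()
    forward = count_motif(seq, motif)
    # A motif on the reverse strand appears as its reverse complement
    # when reading the forward strand.
    reverse = count_motif(seq, reverse_complement(motif))
    per_kb = 1000 * (forward + reverse) / len(seq) if seq else 0.0
    return {"forward": forward, "reverse": reverse, "per_kb": per_kb}


if __name__ == "__main__":
    # Toy sequence (hypothetical): one Chi on each strand.
    toy = "AAGCTGGTGGTTCCACCAGCAA"
    print(chi_density(toy))
```

Comparing such densities between phages that encode recombinases and phages that do not would be one simple way to look for the over- and under-representation described above; a real analysis would also need a null model for expected motif counts given genome composition.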
Bacterial viruses, called bacteriophages, are extremely abundant in the biosphere. They have key roles in the regulation of bacterial populations and in the diversification of bacterial genomes. Among these viruses, lambdoid phages are very abundant in enterobacteria and exchange genetic material very frequently. This latter process is thought to increase phage diversity and therefore facilitate adaptation to hosts. Recombination is also essential for the replication of many lambdoid phages. Lambdoids have been described to encode their own recombination genes and inhibit their hosts'. In this study, we show that lambdoids are split regarding their capacity to encode autonomous recombination functions and that this affects the abundance of recombination-related sequence motifs. Half of the phages encode an autonomous system and inhibit their hosts'. The trade-off between superseding and manipulating the hosts' recombination functions has important consequences. The phages encoding autonomous recombination functions have more diverse gene repertoires and recombine more frequently. Viruses, as many other parasites, have small genomes and depend on their hosts for several housekeeping functions. Hence, they often face trade-offs between supersession and manipulation of molecular machineries. Our results suggest these trade-offs may shape viral gene repertoires, their sequence composition and even influence their evolvability.
Despite increasing interest in coagulase-negative staphylococci (CoNS), little information is available about their bacteriophages. We isolated and sequenced three novel temperate Siphoviridae phages (StB12, StB27, and StB20) from the CoNS Staphylococcus hominis and S. capitis species. The genome sizes are around 40 kb, and open reading frames (ORFs) are arranged in functional modules encoding lysogeny, DNA metabolism, morphology, and cell lysis. Bioinformatics analysis allowed us to assign a potential function to half of the predicted proteins. Structural elements were further identified by proteomic analysis of phage particles, and DNA-packaging mechanisms were determined. Interestingly, the three phages show identical integration sites within their host genomes. In addition to this experimental characterization, we propose a novel classification based on the analysis of 85 phage and prophage genomes, including 15 originating from CoNS. Our analysis established 9 distinct clusters and revealed close relationships between S. aureus and CoNS phages. Genes involved in DNA metabolism and lysis and potentially in phage-host interaction appear to be widespread, while structural genes tend to be cluster specific. Our findings support the notion of a possible reciprocal exchange of genes between phages originating from S. aureus and CoNS, which may be of crucial importance for pathogenesis in staphylococci.
Quorum sensing (QS) regulates the onset of bacterial social responses as a function of cell density and has an important impact on virulence. Autoinducer-2 (AI-2) is a signal that has the peculiarity of mediating both intra- and interspecies bacterial QS. We analyzed the diversity of all components of AI-2 QS across 44 complete genomes of Escherichia coli and Shigella strains. We used phylogenetic tools to study its evolution and determined the phenotypes of single-deletion mutants to predict phenotypes of natural strains. Our analysis revealed many likely adaptive polymorphisms both in gene content and in nucleotide sequence. We show that all natural strains possess the signal emitter (the luxS gene), but many lack a functional signal receptor (complete lsr operon) and the ability to regulate extracellular signal concentrations. This result is in striking contrast with the canonical species-specific QS systems, where one often finds orphan receptors, without a cognate synthase, but not orphan emitters. Our analysis indicates that selection actively maintains a balanced polymorphism for the presence/absence of a functional lsr operon, suggesting diversifying selection on the regulation of signal accumulation and recognition. These results can be explained either by niche-specific adaptation or by selection for a coercive behavior where signal-blind emitters benefit from forcing other individuals in the population to hasten cooperative behaviors.
genome evolution; gene loss; E. coli; balancing selection; social cheater; bacteria signaling
Rapid turnover of mobile elements drives the plasticity of bacterial genomes. Integrated bacteriophages (prophages) encode host-adaptive traits and represent a sizable fraction of bacterial chromosomes. We hypothesized that natural selection shapes prophage integration patterns relative to the host genome organization. We tested this idea by detecting and studying 500 prophages of 69 strains of Escherichia and Salmonella. Phage integrases often target not only conserved genes but also intergenic positions, suggesting purifying selection for integration sites. Furthermore, most integration hotspots are conserved between the two host genera. Integration sites also seem to be selected at the large chromosomal scale, as they are nonrandomly organized in terms of the origin–terminus axis and the macrodomain structure. The genes of lambdoid prophages are systematically co-oriented with the bacterial replication fork and display the host's high frequency of polarized FtsK-orienting polar sequence motifs required for chromosome segregation. matS motifs are strongly avoided by prophages, suggesting counter selection of motifs disrupting macrodomains. These results show how natural selection for seamless integration of prophages in the chromosome shapes the evolution of the bacterium and the phage. First, integration sites have been highly conserved for many millions of years, favoring lysogeny over the lytic cycle for temperate phages. Second, the global distribution of prophages is intimately associated with the chromosome structure and the patterns of gene expression. Third, the phage endures selection for DNA motifs that pertain exclusively to the biology of the prophage in the bacterial chromosome. Understanding prophage genetic adaptation sheds new light on the coexistence of horizontal transfer and organized bacterial genomes.
Proteins secreted to the extracellular environment or to the periphery of the cell envelope, the secretome, play essential roles in foraging, antagonistic and mutualistic interactions. We hypothesize that arms races, genetic conflicts and varying selective pressures should lead to the rapid change of sequences and gene repertoires of the secretome. The analysis of 42 bacterial pan-genomes shows that secreted, and especially extracellular proteins, are predominantly encoded in the accessory genome, i.e. among genes not ubiquitous within the clade. Genes encoding outer membrane proteins might engage more frequently in intra-chromosomal gene conversion because they are more often in multi-genic families. The gene sequences encoding the secretome evolve faster than the rest of the genome and in particular at non-synonymous positions. Cell wall proteins in Firmicutes evolve particularly fast when compared with outer membrane proteins of Proteobacteria. Virulence factors are over-represented in the secretome, notably in outer membrane proteins, but cell localization explains more of the variance in substitution rates and gene repertoires than sequence homology to known virulence factors. Accordingly, the repertoires and sequences of the genes encoding the secretome change fast in the clades of obligatory and facultative pathogens and also in the clades of mutualists and free-living bacteria. Our study shows that cell localization shapes genome evolution. In agreement with our hypothesis, the repertoires and the sequences of genes encoding secreted proteins evolve fast. The particularly rapid change of extracellular proteins suggests that these public goods are key players in bacterial adaptation.
Type 3 secretion systems (T3SSs) are essential components of two complex bacterial machineries: the flagellum, which drives cell motility, and the non-flagellar T3SS (NF-T3SS), which delivers effectors into eukaryotic cells. Yet the origin, specialization, and diversification of these machineries remained unclear. We developed computational tools to identify homologous components of the two systems and to discriminate between them. Our analysis of >1,000 genomes identified 921 T3SSs, including 222 NF-T3SSs. Phylogenomic and comparative analyses of these systems argue that the NF-T3SS arose from an exaptation of the flagellum, i.e. the recruitment of part of the flagellum structure for the evolution of the new protein delivery function. This reconstructed chronology of the exaptation process proceeded in at least two steps. An intermediate ancestral form of NF-T3SS, whose descendants still exist in Myxococcales, lacked elements that are essential for motility and included a subset of NF-T3SS features. We argue that this ancestral version was involved in protein translocation. A second major step in the evolution of NF-T3SSs occurred via recruitment of secretins to the NF-T3SS, an event that occurred at least three times from different systems. In rhizobiales, a partial homologous gene replacement of the secretin resulted in two genes of complementary function. Acquisition of a secretin was followed by the rapid adaptation of the resulting NF-T3SSs to multiple, distinct eukaryotic cell envelopes where they became key in parasitic and mutualistic associations between prokaryotes and eukaryotes. Our work elucidates major steps of the evolutionary scenario leading to extant NF-T3SSs. It demonstrates how molecular evolution can convert one complex molecular machine into a second, equally complex machine by successive deletions, innovations, and recruitment from other molecular systems.
Most motile bacteria use a flagellum to move. The extracellular components of flagella are secreted by their own Type 3 Secretion System (T3SS). The non-flagellar T3SS (NF-T3SS), also named injectisome, includes many proteins that are homologous to flagellar components. NF-T3SSs are employed by many plant and animal pathogens to deliver effectors to host cells, including toxins. NF-T3SSs are complex protein machineries with >15 components that connect bacterial cell envelopes to eukaryotic cell membranes, including the intervening extracellular space. In this study, we designed computational tools to distinguish flagella and NF-T3SSs from other bacterial protein sequences. We show that NF-T3SSs evolved from the flagellum by a series of genetic deletions, innovations, and recruitments of components from other cellular structures. Our evolutionary analysis suggests that NF-T3SSs then quickly adapted to different eukaryotic cells while maintaining a core structure that remains highly similar to the flagellum. This is an example of evolutionary tinkering where a complex structure arises by exaptation, the recruitment of elements that evolved initially for other functions in other cellular structures.
Genetic exchange by conjugation is responsible for the spread of resistance, virulence,
and social traits among prokaryotes. Recent work has unraveled the functioning of the
underlying type IV secretion systems (T4SS) and their distribution and recruitment for other
biological processes (exaptation), notably pathogenesis. We analyzed the phylogeny of key
conjugation proteins to infer the evolutionary history of conjugation and T4SS. We show
that single-stranded DNA (ssDNA) and double-stranded DNA (dsDNA) conjugation, while both
based on a key AAA+ ATPase, diverged before the last common ancestor of
bacteria. The two key ATPases of ssDNA conjugation are monophyletic, having diverged at an
early stage from dsDNA translocases. Our data suggest that ssDNA conjugation arose first
in diderm bacteria, possibly Proteobacteria, and then spread to other bacterial phyla,
including bacterial monoderms and Archaea. Identifiable T4SS fall within eight
monophyletic groups, determined by both taxonomy and structure of the cell envelope.
Transfer to monoderms might have occurred only once, but followed diverse adaptive paths.
Remarkably, some Firmicutes developed a new conjugation system based on an atypical
relaxase and an ATPase derived from a dsDNA translocase. The observed evolutionary rates
and patterns of presence/absence of specific T4SS proteins show that conjugation systems
are often and independently exapted for other functions. This work provides a natural basis
for the classification of all kinds of conjugative systems, thus tackling a problem that
is growing as fast as genomic databases. Our analysis provides the first global picture of
the evolution of conjugation and shows how a self-transferable complex multiprotein
system has adapted to different taxa and often been recruited by the host. As conjugation
systems became specific to certain clades and cell envelopes, they may have biased the
rate and direction of gene transfer by conjugation within prokaryotes.
bacterial conjugation; horizontal gene transfer; type IV protein secretion; exaptation; plasmid evolution
Members of the genus Flavobacterium occur in a variety of ecological niches and represent an interesting diversity of lifestyles. Flavobacterium branchiophilum is the main causative agent of bacterial gill disease, a severe condition affecting various cultured freshwater fish species worldwide, in particular salmonids in Canada and Japan. We report here the complete genome sequence of strain FL-15 isolated from a diseased sheatfish (Silurus glanis) in Hungary. The analysis of the F. branchiophilum genome revealed putative mechanisms of pathogenicity strikingly different from those of the other, closely related fish pathogen Flavobacterium psychrophilum, including the first cholera-like toxin in a non-Proteobacteria and a wealth of adhesins. The comparison with available genomes of other Flavobacterium species revealed a small genome size, large differences in chromosome organization, and fewer rRNA and tRNA genes, in line with its more fastidious growth. In addition, horizontal gene transfer shaped the evolution of F. branchiophilum, as evidenced by its virulence factors, genomic islands, and CRISPR (clustered regularly interspaced short palindromic repeats) systems. Further functional analysis should help in the understanding of host-pathogen interactions and in the development of rational diagnostic tools and control strategies in fish farms.
Many studies have been devoted to understand the mechanisms used by pathogenic bacteria to exploit human hosts. These mechanisms are very diverse in the detail, but share commonalities whose quantification should enlighten the evolution of virulence from both a molecular and an ecological perspective. We mined the literature for experimental data on infectious dose of bacterial pathogens in humans (ID50) and also for traits with which ID50 might be associated. These compilations were checked and complemented with genome analyses. We observed that ID50 varies in a continuous way by over 10 orders of magnitude. Low ID50 values are very strongly associated with the capacity of the bacteria to kill professional phagocytes or to survive in the intracellular milieu of these cells. Inversely, high ID50 values are associated with motile and fast-growing bacteria that use quorum-sensing based regulation of virulence factors expression. Infectious dose is not associated with genome size and shows insignificant phylogenetic inertia, in line with frequent virulence shifts associated with the horizontal gene transfer of a small number of virulence factors. Contrary to previous proposals, infectious dose shows little dependence on contact-dependent secretion systems and on the natural route of exposure. When all variables are combined, immune subversion and quorum-sensing are sufficient to explain two thirds of the variance in infectious dose. Our results show the key role of immune subversion in effective human infection by small bacterial populations. They also suggest that cooperative processes might be important for successful infection by bacteria with high ID50. Our results suggest that trade-offs between selection for population growth-related traits and selection for the ability to subvert the immune system shape bacterial infectiousness. 
Understanding these trade-offs provides guidelines to study the evolution of virulence and in particular the micro-evolutionary paths of emerging pathogens.
Every pathogen is unique and uses distinctive combinations of specific mechanisms to exploit the human host. Yet, several common themes in the ways pathogens use these mechanisms can be found among distantly related bacteria. The understanding of these common themes provides useful concepts and uncovers important principles in pathogenesis. Here, we have made a cross-species analysis of traits thought to be relevant for virulence of bacterial pathogens. We have found that the infectious dose of pathogens is much lower when they are able to kill professional phagocytes of the immune system or to survive in the intracellular milieu of these cells. On the other hand, bacteria requiring a higher infectious dose are more likely to be motile, to be fast-growing and to regulate the expression of virulence factors when the population quorum is high enough to be effective in starting an infection. This suggests that infectious dose results from a trade-off between selection for fast coordinated growth and the ability to subvert the immune system. This trade-off may underlie other traits, such as the ability of a pathogen to live outside of an association with a host. Understanding the patterns shaping infectious dose will facilitate the prediction of evolutionary paths of emerging pathogens.
In order to get further insights into the role of the clustered, regularly interspaced, short palindromic repeats (CRISPRs) in Escherichia coli, we analyzed the CRISPR diversity in a collection of 290 strains, in the phylogenetic framework of the strains represented by multilocus sequence typing (MLST). The set included 263 natural E. coli isolates exposed to various environments and isolated over a 20-year period from humans and animals, as well as 27 fully sequenced strains. Our analyses confirm that there are two largely independent pairs of CRISPR loci (CRISPR1 and -2 and CRISPR3 and -4), each associated with a different type of cas genes (Ecoli and Ypest, respectively), but that each pair of CRISPRs has similar dynamics. Strikingly, the major phylogenetic group B2 is almost devoid of CRISPRs. The majority of genomes analyzed lack Ypest cas genes and contain CRISPR3 with spacers matching Ypest cas genes. The analysis of relatedness between strains in terms of spacer repertoire and the MLST tree shows a pattern where closely related strains (MLST phylogenetic distance of <0.005 corresponding to at least hundreds of thousands of years) often exhibit identical CRISPRs while more distantly related strains (MLST distance of >0.01) exhibit completely different CRISPRs. This suggests rare but radical turnover of spacers in CRISPRs rather than CRISPR gradual change. We found no link between the presence, size, or content of CRISPRs and the lifestyle of the strains. Our data suggest that, within the E. coli species, CRISPRs do not have the expected characteristics of a classical immune system.
Summary: Plasmids are key vectors of horizontal gene transfer and essential genetic engineering tools. They code for genes involved in many aspects of microbial biology, including detoxification, virulence, ecological interactions, and antibiotic resistance. While many studies have dissected the mechanisms of mobility in model plasmids, the identification and characterization of plasmid mobility from genome data are unexplored. By reviewing the available data and literature, we established a computational protocol to identify and classify conjugation and mobilization genetic modules in 1,730 plasmids. This allowed the accurate classification of proteobacterial conjugative or mobilizable systems in a combination of four mating pair formation and six relaxase families. The available evidence suggests that half of the plasmids are nonmobilizable and that half of the remaining plasmids are conjugative. Some conjugative systems are much more abundant than others and preferably associated with some clades or plasmid sizes. Most very large plasmids are nonmobilizable, with evidence of ongoing domestication into secondary chromosomes. The evolution of conjugation elements shows ancient divergence between mobility systems, with relaxases and type IV coupling proteins (T4CPs) often following separate paths from type IV secretion systems. Phylogenetic patterns of mobility proteins are consistent with the phylogeny of the host prokaryotes, suggesting that plasmid mobility is in general circumscribed within large clades. Our survey suggests the existence of unsuspected new relaxases in archaea and new conjugation systems in cyanobacteria and actinobacteria. Few genes, e.g., T4CPs, relaxases, and VirB4, are at the core of plasmid conjugation, and together with accessory genes, they have evolved into specific systems adapted to specific physiological and ecological contexts.
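To make the three mobility classes concrete, here is a minimal Python sketch of a rule-based classifier built on the core genes named above (relaxase, T4CP, VirB4). The decision rule, the marker labels, and the example plasmids are assumptions made for illustration only; the actual protocol relies on curated protein profiles and is not reproduced here.

```python
# Minimal sketch (assumed rule): classify plasmid mobility from marker genes.
from collections import Counter
from typing import Dict, Iterable


def classify_mobility(markers: Iterable[str]) -> str:
    """Assign a mobility class from the set of detected core markers.

    Assumed rule: relaxase + T4CP + VirB4 -> conjugative;
    relaxase without the full set -> mobilizable;
    no relaxase -> nonmobilizable.
    """
    found = set(markers)
    if {"relaxase", "T4CP", "VirB4"} <= found:
        return "conjugative"
    if "relaxase" in found:
        return "mobilizable"
    return "nonmobilizable"


def summarize(plasmids: Dict[str, Iterable[str]]) -> Counter:
    """Tally mobility classes over a mapping of plasmid name -> markers."""
    return Counter(classify_mobility(m) for m in plasmids.values())


if __name__ == "__main__":
    # Hypothetical plasmids and marker hits, for illustration only.
    example = {
        "pA": ["relaxase", "T4CP", "VirB4"],
        "pB": ["relaxase"],
        "pC": [],
        "pD": ["T4CP"],
    }
    print(summarize(example))
```

In practice the marker hits would come from profile searches against the plasmid proteomes, and thresholds for calling a hit would need to be chosen and validated separately.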
Horizontal gene transfer shapes the genomes of prokaryotes by allowing rapid acquisition of novel adaptive functions. Conjugation allows the broadest range and the highest gene transfer input per transfer event. While conjugative plasmids have been studied for decades, the number and diversity of integrative conjugative elements (ICE) in prokaryotes remained unknown. We defined a large set of protein profiles of the conjugation machinery to scan over 1,000 genomes of prokaryotes. We found 682 putative conjugative systems among all major phylogenetic clades and showed that ICEs are the most abundant conjugative elements in prokaryotes. Nearly half of the genomes contain a type IV secretion system (T4SS), with larger genomes encoding more conjugative systems. Surprisingly, almost half of the chromosomal T4SS lack co-localized relaxases and, consequently, might be devoted to protein transport instead of conjugation. This class of elements is preponderant among small genomes, is less commonly associated with integrases, and is rarer in plasmids. ICEs and conjugative plasmids in proteobacteria have different preferences for each type of T4SS, but all types exist in both chromosomes and plasmids. Mobilizable elements outnumber self-conjugative elements in both ICEs and plasmids, which suggests an extensive use of T4SS in trans. Our evolutionary analysis indicates that switches of plasmids to and from ICEs were frequent and that extant elements began to differentiate only relatively recently. According to the present results, ICEs are the most abundant conjugative elements in practically all prokaryotic clades and might be far more frequently domesticated into non-conjugative protein transport systems than previously thought.
While conjugative plasmids and ICEs have different means of genomic stabilization, their mechanisms of mobility by conjugation show strikingly conserved patterns, arguing for a unitary view of conjugation in shaping the genomes of prokaryotes by horizontal gene transfer.
Some mobile genetic elements spread genetic information horizontally between prokaryotes by conjugation, a mechanism by which DNA is transferred directly from one cell to the other. Among the processes allowing genetic transfer between cells, conjugation is the one allowing the simultaneous transfer of larger amounts of DNA and between the least related cells. As such, conjugative systems are key players in horizontal transfer, including the transfer of antibiotic resistance to and between many human pathogens. Conjugative systems are encoded both in plasmids and in chromosomes. The latter are called Integrative Conjugative Elements (ICE); and their number, identity, and mechanism of conjugation were poorly known. We have developed an approach to identify and characterize these elements and found more ICEs than conjugative plasmids in genomes. While both ICEs and plasmids use similar conjugative systems, there are remarkable preferences for some systems in some elements. Our evolutionary analysis shows that plasmid conjugative systems have often given rise to ICEs and vice versa. Therefore, ICEs and conjugative plasmids should be regarded as one and the same, the differences in their means of existence in cells probably the result of different requirements for stabilization and/or transmissibility of the genetic information they contain.
Gene duplication followed by neo- or sub-functionalization deeply impacts the evolution of protein families and is regarded as the main source of adaptive functional novelty in eukaryotes. While there is ample evidence of adaptive gene duplication in prokaryotes, it is not clear whether duplication outweighs the contribution of horizontal gene transfer in the expansion of protein families. We analyzed closely related prokaryote strains or species with small genomes (Helicobacter, Neisseria, Streptococcus, Sulfolobus), average-sized genomes (Bacillus, Enterobacteriaceae), and large genomes (Pseudomonas, Bradyrhizobiaceae) to untangle the effects of duplication and horizontal transfer. After removing the effects of transposable elements and phages, we show that the vast majority of expansions of protein families are due to transfer, even among large genomes. Transferred genes (xenologs) persist longer in prokaryotic lineages, possibly because of a greater or longer-lasting adaptive role. On the other hand, duplicated genes (paralogs) are expressed more, and, when persistent, they evolve more slowly. This suggests that gene transfer and gene duplication have very different roles in shaping the evolution of biological systems: transfer allows the acquisition of new functions and duplication leads to higher gene dosage. Accordingly, we show that paralogs share most protein–protein interactions and genetic regulators, whereas xenologs share very few of them. Prokaryotes invented most of life's biochemical diversity. Therefore, the study of the evolution of biological systems should explicitly account for the predominant role of horizontal gene transfer in the diversification of protein families.
Prokaryotes can be found in the most diverse and severe ecological niches of the planet. Their rapid adaptation is, in part, the result of the ability to acquire genetic information horizontally. This means that prokaryotes utilize two major paths to expand their repertoire of protein families: they can duplicate a pre-existing gene or acquire it by horizontal transfer. In this study, we track family expansions among closely related strains of prokaryotic species. We find that the majority of gene expansions arrive via transfer, not via duplication. Additionally, we find that duplicate genes tend to be more transient and evolve more slowly than transferred ones, highlighting different roles with respect to adaptation and evolution. These results suggest that prevailing theories aimed at understanding the evolution of biological systems grounded on gene duplication might be poorly fit to explain the evolution of prokaryotic systems, which include the vast majority of life's biochemical diversity.
Prokaryotes thrive in spite of the vast number and diversity of their viruses. This partly results from the evolution of mechanisms to inactivate or silence the action of exogenous DNA. Among these, Clustered Regularly Interspaced Short Palindromic Repeats (CRISPR) are unique in providing adaptive immunity against elements with high local resemblance to genomes of previously infecting agents. Here, we analyze the CRISPR loci of 51 complete genomes of Escherichia and Salmonella. CRISPR occur in two pairs of loci in Escherichia and in a single pair in Salmonella, each pair showing a similar turnover rate, repeat sequence and putative linkage to a common set of cas genes. Yet, phylogeny shows that CRISPR and associated cas genes have different evolutionary histories, the latter being frequently exchanged or lost. In our set, one CRISPR pair seems specialized in plasmids, often matching genes coding for the replication, conjugation and antirestriction machinery. Strikingly, this pair also matches the cognate cas genes, in which case these genes are absent. The unexpectedly high conservation of this anti-CRISPR suggests selection to counteract the invasion of mobile elements containing functional CRISPR/cas systems. There are few spacers in most CRISPR, and these rarely match genomes of known phages. Furthermore, we found that strains that diverged less than 250 thousand years ago show virtually identical CRISPR. The lack of congruence between cas, CRISPR and the species phylogeny and the slow pace of CRISPR change make CRISPR poor epidemiological markers in enterobacteria. All these observations are at odds with the expectedly abundant and dynamic repertoire of spacers in an immune system aiming at protecting bacteria from phages. Since we observe purifying selection for the maintenance of CRISPR, these results suggest that alternative evolutionary roles for CRISPR remain to be uncovered.
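To make the notion of spacers matching plasmid or phage sequences concrete, the following minimal sketch splits a CRISPR array on its repeat and tests each spacer against a target on both strands. It assumes a perfectly conserved repeat and exact substring matching; the function names and toy sequences are hypothetical, and the study itself presumably relied on similarity searches that tolerate mismatches.

```python
def extract_spacers(array: str, repeat: str) -> list[str]:
    """Return the sequences found between consecutive copies of the repeat."""
    parts = array.upper().split(repeat.upper())
    # parts[0] and parts[-1] are the flanks outside the first and last repeat.
    return [p for p in parts[1:-1] if p]


def reverse_complement(seq: str) -> str:
    """Reverse complement of an unambiguous DNA sequence."""
    return seq.upper().translate(str.maketrans("ACGT", "TGCA"))[::-1]


def spacers_matching(spacers: list[str], target: str) -> list[str]:
    """Spacers found exactly in the target, on either strand."""
    forward = target.upper()
    reverse = reverse_complement(forward)
    return [s for s in spacers if s in forward or s in reverse]


repeat = "GGGCCC"  # toy repeat, far shorter than a real CRISPR repeat
array = "AT" + repeat + "ACGTAC" + repeat + "TTGACA" + repeat + "GG"
spacers = extract_spacers(array, repeat)
print(spacers)
print(spacers_matching(spacers, "CCTGTCAACC"))
```

In the toy target, only the second spacer is found, and only on the reverse strand, which is why both orientations are checked.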
Microbial minimal generation times range from a few minutes to several weeks. They are evolutionarily determined by variables such as environment stability, nutrient availability, and community diversity. Selection for fast growth adaptively imprints genomes, resulting in gene amplification, adapted chromosomal organization, and biased codon usage. We found that these growth-related traits in 214 species of bacteria and archaea are highly correlated, suggesting they all result from growth optimization. While modeling their association with maximal growth rates in view of synthetic biology applications, we observed that codon usage biases are better correlates of growth rates than any other trait, including rRNA copy number. Systematic deviations from our model reveal two distinct evolutionary processes. First, genome organization shows more evolutionary inertia than growth rates. This results in over-representation of growth-related traits in fast degrading genomes. Second, selection for these traits depends on optimal growth temperature: for similar generation times purifying selection is stronger in psychrophiles, intermediate in mesophiles, and lower in thermophiles. Using this information, we created a predictor of maximal growth rate adapted to small genome fragments. We applied it to three metagenomic environmental samples to show that a transiently rich environment, such as the human gut, selects for fast-growers, that a toxic environment, such as the acid mine biofilm, selects for low growth rates, whereas a diverse environment, like the soil, shows all ranges of growth rates. We also demonstrate that microbial colonizers of the infant gut grow faster than the stabilized gut communities of human adults. In conclusion, we show that one can predict maximal growth rates from sequence data alone, and we propose that such information can be used to facilitate the manipulation of generation times.
Our predictor allows inferring growth rates in the vast majority of uncultivable prokaryotes and paves the way to the understanding of community dynamics from metagenomic data.
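As an illustration of what a codon usage bias measurement can look like, the sketch below compares synonymous codon choice in a set of presumably highly expressed genes with that of a reference gene set. This is a deliberately crude distance invented for illustration: it is not the index or the predictor used in the study, the function names are hypothetical, and the standard genetic code is assumed.

```python
from collections import Counter
from itertools import product

# Standard genetic code, codons enumerated in TCAG order ("*" marks stops).
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): a for c, a in zip(product(BASES, repeat=3), AMINO)}


def codon_counts(cds_list):
    """Count sense codons over in-frame coding sequences."""
    counts = Counter()
    for cds in cds_list:
        cds = cds.upper()
        for i in range(0, len(cds) - len(cds) % 3, 3):
            codon = cds[i:i + 3]
            if CODON_TABLE.get(codon, "*") != "*":  # skip stops and ambiguous codons
                counts[codon] += 1
    return counts


def synonymous_fractions(counts):
    """Fraction of each codon among the codons encoding the same amino acid."""
    totals = Counter()
    for codon, n in counts.items():
        totals[CODON_TABLE[codon]] += n
    return {codon: n / totals[CODON_TABLE[codon]] for codon, n in counts.items()}


def codon_bias_distance(highly_expressed, reference):
    """Mean absolute difference in synonymous codon fractions between two gene sets.

    Only amino acids observed in both sets contribute to the average.
    """
    fh = synonymous_fractions(codon_counts(highly_expressed))
    fr = synonymous_fractions(codon_counts(reference))
    shared = {CODON_TABLE[c] for c in fh} & {CODON_TABLE[c] for c in fr}
    codons = [c for c, aa in CODON_TABLE.items() if aa in shared]
    if not codons:
        return 0.0
    return sum(abs(fh.get(c, 0.0) - fr.get(c, 0.0)) for c in codons) / len(codons)


# Toy input: alanine is encoded only by GCT in one set and only by GCC in the other.
print(codon_bias_distance(["GCTGCTGCT"], ["GCCGCCGCC"]))
```

Because the measure works on codon counts, it can in principle be computed on gene fragments as well as on complete genomes, which is the property that matters for metagenomic data; relating any such measure to generation times would still require a fitted model like the one described above.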
Microbial minimal generation times vary from a few minutes to several weeks. The reasons for this disparity have been thought to lie in different life-history strategies: fast-growing microbes grow extremely fast in rich media, but are less capable of dealing with stress and/or poor nutrient conditions. Prokaryotes have evolved a set of genomic traits to grow fast, including biased codon usage and transient or permanent gene multiplication for dosage effects. Here, we studied the relative role of these traits and show they can be used to predict minimal generation times from the genomic data of the vast majority of microbes that cannot be cultivated. We show that this inference can also be made with incomplete genomes and thus be applied to metagenomic data to test hypotheses about the biomass productivity of biotopes and the evolution of microbiota in the human gut after birth. Our results also allow a better understanding of the co-evolution between growth rates and genomic traits and how they can be manipulated in synthetic biology. Growth rates have been a key variable in microbial physiology studies in the last century, and we show how intimately they are linked with genome organization and prokaryotic ecology.
Microbes engage in a remarkable array of cooperative behaviors, secreting shared proteins that are essential for foraging, shelter, microbial warfare, and virulence. These proteins are costly, rendering populations of cooperators vulnerable to exploitation by nonproducing cheaters arising by gene loss or migration. In such conditions, how can cooperation persist?
Our model predicts that differential gene mobility drives intragenomic variation in investment in cooperative traits. More mobile loci generate stronger among-individual genetic correlations at these loci (higher relatedness) and thereby allow the maintenance of more cooperative traits via kin selection. By analyzing 21 Escherichia genomes, we confirm that genes coding for secreted proteins—the secretome—are very frequently lost and gained and are associated with mobile elements. We show that homologs of the secretome are overrepresented among human gut metagenomics samples, consistent with increased relatedness at secretome loci across multiple species. The biosynthetic cost of secreted proteins is shown to be under intense selective pressure, even more than for highly expressed proteins, consistent with a cost of cooperation driving social dilemmas. Finally, we demonstrate that mobile elements are in conflict with their chromosomal hosts over the chimeric ensemble's social strategy, with mobile elements enforcing cooperation on their otherwise selfish hosts via the cotransfer of secretome genes with “mafia strategy” addictive systems (toxin-antitoxin and restriction-modification).
Our analysis matches the predictions of our model suggesting that horizontal transfer promotes cooperation, as transmission increases local genetic relatedness at mobile loci and enforces cooperation on the resident genes. As a consequence, horizontal transfer promoted by agents such as plasmids, phages, or integrons drives microbial cooperation.
Oxygen is not only one of life's essential elements but also a source of protein damage, mutagenesis, and ageing. Many proteome adaptations have been proposed to tackle such stresses and we assessed them using comparative genomics in a phylogenetic context. First, we find that aerobiosis is a trait with important phylogenetic inertia but that oxygen content in proteins is not. Instead, oxygen content is close to the expected values given the nucleotide composition. Accordingly, we find no evidence of oxygen being a scarce resource for protein synthesis even among anaerobes. Second, we searched for counterselection of amino acids more prone to oxidation among aerobes. Only cysteine follows the expected trend, whereas tryptophan follows the inverse one. When analyzing composition in the context of protein structures and residue accessibility, we find that all oxidable residues are avoided at the surface of proteins. Yet, there is no difference between aerobes and anaerobes in this respect, and the effect might be explained by the hydrophobicity of these residues. Third, we revisited the hypothesis that atmospheric enrichment in molecular oxygen led to the development of the communication capabilities of eukaryotes. With a larger data set and adequate controls, we confirm the trend of longer oxygen-rich outer domains in transmembrane proteins of eukaryotes. Yet, we find no significant association between oxygen concentration in the environment and this trait within prokaryotes, suggesting that this difference is clade specific and independent of oxygen availability. We find that genes involved in cellular responses to oxygen are much more frequent among aerobes, and we suggest that they erase most expected differences in terms of proteome composition between organisms facing high and low oxygen concentrations.
oxidative stress; cysteine; protein evolution; hydrophobicity; evolution
Summary: Intragenic duplications of genetic material have important biological roles because of their protein sequence and structural consequences. We developed Swelfe to find internal repeats at three levels. Swelfe quickly identifies statistically significant internal repeats in DNA and amino acid sequences and in 3D structures using dynamic programming. The associated web server also shows the relationships between repeats at each level and facilitates visualization of the results.
Supplementary information: Supplementary data are available at Bioinformatics online.
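The idea of finding an internal repeat by dynamic programming can be sketched as a local alignment of a sequence against itself, restricted to cells above the main diagonal so that the trivial self-match is excluded. The code below is a minimal illustration under assumed scoring values (match 2, mismatch -1, linear gap -2); it is not Swelfe's algorithm, it does not assess statistical significance, and it does not prevent the two copies from overlapping.

```python
def best_internal_repeat(seq: str, match: int = 2, mismatch: int = -1, gap: int = -2):
    """Smith-Waterman style self-alignment over cells with i < j.

    Returns (score, end_i, end_j): the best local alignment score and the
    1-based end positions of the two repeat copies, or (0, 0, 0) if none.
    """
    n = len(seq)
    # H[i][j] is the best score of an alignment ending at seq[i-1] and seq[j-1].
    H = [[0] * (n + 1) for _ in range(n + 1)]
    best = (0, 0, 0)
    for i in range(1, n + 1):
        for j in range(i + 1, n + 1):  # strictly above the diagonal
            s = match if seq[i - 1] == seq[j - 1] else mismatch
            score = max(0, H[i - 1][j - 1] + s, H[i - 1][j] + gap, H[i][j - 1] + gap)
            H[i][j] = score
            if score > best[0]:
                best = (score, i, j)
    return best


print(best_internal_repeat("ABCDEXXABCDE"))
```

In the toy sequence, the two copies of `ABCDE` end at positions 5 and 12 and align with five matches. The same recurrence applies to DNA or amino acid sequences once a suitable substitution scoring is plugged in; repeats in 3D structures need a different formulation.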
Mycoplasma genitalium is associated with reproductive tract disease in women and may persist in the lower genital tract for months, potentially increasing the risk of upper tract infection and transmission to uninfected partners. Despite its exceptionally small genome (580 kb), approximately 4% is composed of repeated elements known as MgPar sequences (MgPa repeats) based on their homology to the mgpB gene that encodes the immunodominant MgPa adhesin protein. The presence of these MgPar sequences, as well as mgpB variability between M. genitalium strains, suggests that mgpB and MgPar sequences recombine to produce variant MgPa proteins. To examine the extent and generation of diversity within single strains of the organism, we examined mgpB variation within M. genitalium strain G-37 and observed sequence heterogeneity that could be explained by recombination between the mgpB expression site and putative donor MgPar sequences. Similarly, we analyzed mgpB sequences from cervical specimens from a persistently infected woman (21 months) and identified 17 different mgpB variants within a single infecting M. genitalium strain, confirming that mgpB heterogeneity occurs over the course of a natural infection. These observations support the hypothesis that recombination occurs between the mgpB gene and MgPar sequences and that the resulting antigenically distinct MgPa variants may contribute to immune evasion and persistence of infection.
We present MAGIC, an integrative and accurate method for comparative genome mapping. Our method consists of two phases: preprocessing for identifying “maximal similar segments,” and mapping for clustering and classifying these segments. MAGIC's main novelty lies in its biologically intuitive clustering approach, which aims towards both calculating reorder-free segments and identifying orthologous segments. In the process, MAGIC efficiently handles ambiguities resulting from duplications that occurred before the speciation of the considered organisms from their most recent common ancestor. We demonstrate both MAGIC's robustness and scalability: the former is asserted with respect to its initial input and with respect to its parameters' values. The latter is asserted by applying MAGIC to distantly related organisms and to large genomes. We compare MAGIC to other comparative mapping methods and provide detailed analysis of the differences between them. Our improvements allow a comprehensive study of the diversity of genetic repertoires resulting from large-scale mutations, such as indels and duplications, including explicitly transposable and phagic elements. The strength of our method is demonstrated by detailed statistics computed for each type of these large-scale mutations. MAGIC enabled us to conduct a comprehensive analysis of the different forces shaping prokaryotic genomes from different clades, and to quantify the importance of novel gene content introduced by horizontal gene transfer relative to gene duplication in bacterial genome evolution. We use these results to investigate the breakpoint distribution in several prokaryotic genomes.
Comparative genomics is an important discipline with applications in evolutionary, genetic, and genome rearrangement studies. When comparing genomes, one is usually interested in investigating the relation between the genomic segments to establish their evolutionary origin: are the segments orthologous, and hence inherited from their most recent common ancestor? Are they paralogs, and hence duplicated from an ancestral segment? Did the segments undergo reordering? Were the segments deleted or inserted and—if so—how (insertion sequence, prophage, horizontal gene transfer)?
In this paper, Swidan et al. present MAGIC, a new approach for comparative genome mapping. The main novelty of this approach is the biologically intuitive clustering step, which aims towards both calculating reorder-free segments and identifying orthologous segments. The authors demonstrate MAGIC's robustness, relative to both its initial input and to its parameters' values. MAGIC's scalability is demonstrated by running it on distantly related organisms and on large genomes. In addition, Swidan et al. provide a detailed analysis of the differences between MAGIC and other comparative mapping methods.
Applying MAGIC to several prokaryotic pairs enabled the authors to address the aforementioned questions and to quantitatively study the different evolutionary forces shaping the prokaryotic genome as well as to investigate their breakpoint distribution.