Summary: Plasmids are key vectors of horizontal gene transfer and essential genetic engineering tools. They carry genes involved in many aspects of microbial biology, including detoxification, virulence, ecological interactions, and antibiotic resistance. While many studies have dissected the mechanisms of mobility in model plasmids, the identification and characterization of plasmid mobility from genome data remain unexplored. By reviewing the available data and literature, we established a computational protocol to identify and classify conjugation and mobilization genetic modules in 1,730 plasmids. This allowed the accurate classification of proteobacterial conjugative or mobilizable systems in a combination of four mating pair formation and six relaxase families. The available evidence suggests that half of the plasmids are nonmobilizable and that half of the remaining plasmids are conjugative. Some conjugative systems are much more abundant than others and are preferentially associated with some clades or plasmid sizes. Most very large plasmids are nonmobilizable, with evidence of ongoing domestication into secondary chromosomes. The evolution of conjugation elements shows ancient divergence between mobility systems, with relaxases and type IV coupling proteins (T4CPs) often following separate paths from type IV secretion systems. Phylogenetic patterns of mobility proteins are consistent with the phylogeny of the host prokaryotes, suggesting that plasmid mobility is in general circumscribed within large clades. Our survey suggests the existence of unsuspected new relaxases in archaea and new conjugation systems in cyanobacteria and actinobacteria. A few genes, e.g., T4CPs, relaxases, and VirB4, are at the core of plasmid conjugation, and together with accessory genes, they have evolved into specific systems adapted to specific physiological and ecological contexts.
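As a rough illustration of how such a classification might be encoded, the sketch below assigns a mobility class from the set of marker genes detected on a plasmid. The decision rule is an assumption made for illustration, not a rule stated in this summary: a plasmid counts as conjugative when a relaxase, a T4CP, and VirB4 are all present, as mobilizable when a relaxase is present without the full set, and as nonmobilizable otherwise. The function name and marker labels are hypothetical.

```python
from typing import Set


def classify_mobility(markers: Set[str]) -> str:
    """Assign a mobility class from the marker genes detected on a plasmid.

    Assumed rule (illustrative only):
      relaxase + T4CP + VirB4  -> conjugative
      relaxase without the set -> mobilizable
      no relaxase              -> nonmobilizable
    """
    has_relaxase = "relaxase" in markers
    has_t4cp = "T4CP" in markers
    has_virb4 = "VirB4" in markers

    if has_relaxase and has_t4cp and has_virb4:
        return "conjugative"
    if has_relaxase:
        return "mobilizable"
    return "nonmobilizable"


# Hypothetical marker sets for three plasmids.
examples = {
    "plasmid_A": {"relaxase", "T4CP", "VirB4"},
    "plasmid_B": {"relaxase"},
    "plasmid_C": set(),
}
classes = {name: classify_mobility(m) for name, m in examples.items()}
```

In practice the marker sets would come from protein profile searches; the sketch covers only the final decision step.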
DNA sequences purified from distinct organisms, e.g., nonvertebrates versus vertebrates, were shown to differ in their TLR9 signalling properties, especially when mouse bone marrow-derived or human dendritic cells (DCs) are used as target cells. Here we found that the DC-targeting immunostimulatory property of Leishmania major DNA is shared by other Trypanosomatidae DNA, suggesting that this is a general trait of these eukaryotic single-celled parasites. We first documented, in vitro, that the low level of immunostimulatory activity of vertebrate DNA is not due to its limited access to the TLR9 of DCs. In addition, vertebrate DNA inhibits the activation induced by the parasite DNA. This inhibition could result from the presence of competing elements for TLR9 activation and suggests that DNA from different species can be discriminated by mouse and human DCs. Second, using computational analysis of genomic DNA sequences, we detected over-represented inhibitory and under-represented stimulatory sequences in the vertebrate genomes, whereas the L. major genome displays the opposite trend. Interestingly, this contrast between L. major and vertebrate genomes in the frequency of these motifs is shared by other Trypanosomatidae genomes (Trypanosoma cruzi, T. brucei and T. vivax). We also addressed the possibility that proteins expressed in DCs could interact with DNA and promote TLR9 activation. We found that TLR9 is specifically activated by HMGB1-bound L. major DNA and that HMGB1 binds L. major DNA in preference to mouse DNA. Our results highlight that both DNA sequence and vertebrate DNA-binding proteins, such as the mouse HMGB1, allow TLR9 signaling to be initiated and achieved by Trypanosomatidae DNA.
Distinct laboratory mouse models have helped elucidate some of the processes that account for so-called resistance or vulnerability to cutaneous inoculation of the Leishmania major parasite. The outcome ranges from rapid healing (C57BL/6 mice) to progressive, nonhealing disease (BALB/c mice). Distinct cell lineages contribute to sensing and processing molecules derived from the L. major parasite. Previous studies revealed a role for intracellular Toll-like receptor 9 (TLR9) in host resistance to L. major. L. major DNA is involved in the innate immune response, since it induces TLR9 signaling and activation of dendritic cells. We further explored L. major DNA sequences, focusing on their features (a) as direct TLR9 agonists or antagonists and (b) once bound to endogenous DNA-binding proteins. More specifically, we used mouse dendritic cells as sensing cells for L. major DNA and for DNA from other Trypanosomatidae, in comparison with vertebrate DNA. Overall, the data underscore a counter-selection of TLR9 agonist motifs in vertebrate DNA that is not found in Trypanosomatidae DNA, and suggest how TLR9 could discriminate between pathogen and self DNA to maintain cellular integrity.
Biologists often wish to use their knowledge on a few experimental models of a given molecular system to identify homologs in genomic data. We developed a generic tool for this purpose.
Macromolecular System Finder (MacSyFinder) provides a flexible framework to model the properties of molecular systems (cellular machinery or pathway) including their components, evolutionary associations with other systems and genetic architecture. Modelled features also include functional analogs and the multiple uses of the same component by different systems. Models are used to search for molecular systems in complete genomes or in unstructured data like metagenomes. The components of the systems are searched by sequence similarity using hidden Markov model (HMM) protein profiles. Hits are assigned to a given system when they comply with the content and organization specified in the system model. A graphical interface, MacSyView, facilitates the analysis of the results by showing overviews of component content and genomic context. To exemplify the use of MacSyFinder we built models to detect and classify CRISPR-Cas systems following a previously established classification. We show that MacSyFinder makes it easy to define an accurate “Cas-finder” using publicly available protein profiles.
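The idea of accepting hits only when they comply with a model's content and organization can be sketched as follows. This is a minimal illustration under stated assumptions, not MacSyFinder's implementation or API: hits are taken to be (gene position, component name) pairs already produced by a profile search, genes are grouped when they lie within a hypothetical `max_gap` positions of each other, and a group is kept when it contains at least a hypothetical `min_mandatory` number of distinct mandatory components. The component names in the example are placeholders.

```python
from typing import List, Set, Tuple

Hit = Tuple[int, str]  # (gene position along the replicon, component name)


def cluster_hits(hits: List[Hit], max_gap: int) -> List[List[Hit]]:
    """Group hits whose consecutive gene positions differ by at most max_gap."""
    clusters: List[List[Hit]] = []
    for hit in sorted(hits):
        if clusters and hit[0] - clusters[-1][-1][0] <= max_gap:
            clusters[-1].append(hit)
        else:
            clusters.append([hit])
    return clusters


def find_systems(hits: List[Hit], mandatory: Set[str],
                 min_mandatory: int, max_gap: int) -> List[List[Hit]]:
    """Keep clusters holding at least min_mandatory distinct mandatory components."""
    systems = []
    for cluster in cluster_hits(hits, max_gap):
        found = {name for _, name in cluster} & mandatory
        if len(found) >= min_mandatory:
            systems.append(cluster)
    return systems


# Placeholder hits: three neighbouring genes and one isolated gene.
hits = [(10, "cas1"), (11, "cas2"), (13, "cas9"), (200, "cas1")]
systems = find_systems(hits, mandatory={"cas1", "cas2", "cas9"},
                       min_mandatory=2, max_gap=3)
```

Here the three neighbouring genes form one candidate system, while the isolated hit at position 200 is discarded because it does not reach the quorum on its own.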
Availability and Implementation
MacSyFinder is a standalone application implemented in Python. It requires Python 2.7, Hmmer and makeblastdb (version 2.2.28 or higher). It is freely available with its source code under a GPLv3 license at https://github.com/gem-pasteur/macsyfinder. It is compatible with all platforms supporting Python and Hmmer/makeblastdb. The “Cas-finder” (models and HMM profiles) is distributed as a compressed tarball archive as Supporting Information.
Bacterial genomics has greatly expanded our understanding of microdiversification patterns within a species, but analyses at higher taxonomical levels are necessary to understand and predict the independent rise of pathogens in a genus. We have sampled, sequenced, and assessed the diversity of genomes of validly named and tentative species of the Acinetobacter genus, a clade including major nosocomial pathogens and biotechnologically important species. We inferred a robust global phylogeny and delimited several new putative species. The genus is very ancient and extremely diverse: Genomes of highly divergent species share more orthologs than certain strains within a species. We systematically characterized elements and mechanisms driving genome diversification, such as conjugative elements, insertion sequences, and natural transformation. We found many error-prone polymerases that may play a role in resistance to toxins, antibiotics, and in the generation of genetic variation. Surprisingly, temperate phages, poorly studied in Acinetobacter, were found to account for a significant fraction of most genomes. Accordingly, many genomes encode clustered regularly interspaced short palindromic repeats (CRISPR)-Cas systems with some of the largest CRISPR-arrays found so far in bacteria. Integrons are strongly overrepresented in Acinetobacter baumannii, which correlates with its frequent resistance to antibiotics. Our data suggest that A. baumannii arose from an ancient population bottleneck followed by population expansion under strong purifying selection. The outstanding diversification of the species occurred largely by horizontal transfer, including some allelic recombination, at specific hotspots preferentially located close to the replication terminus. Our work sets a quantitative basis to understand the diversification of Acinetobacter into emerging resistant and versatile pathogens.
comparative genomics; bacterial genus; evolution; mobile genetic elements; nosocomial pathogens
Stress-responsive error-prone DNA polymerase genes transferred along with key symbiotic genes ease the evolution of a soil bacterium into a legume endosymbiont by accelerating adaptation of the recipient bacterial genome to its new plant host.
Horizontal gene transfer (HGT) is an important mode of adaptation and diversification of prokaryotes and eukaryotes and a major event underlying the emergence of bacterial pathogens and mutualists. Yet it remains unclear how complex phenotypic traits such as the ability to fix nitrogen with legumes have successfully spread over large phylogenetic distances. Here we show, using experimental evolution coupled with whole genome sequencing, that co-transfer of imuABC error-prone DNA polymerase genes with key symbiotic genes accelerates the evolution of a soil bacterium into a legume symbiont. Following introduction of the symbiotic plasmid of Cupriavidus taiwanensis, the Mimosa symbiont, into pathogenic Ralstonia solanacearum we challenged transconjugants to become Mimosa symbionts through serial plant-bacteria co-cultures. We demonstrate that a mutagenesis imuABC cassette encoded on the C. taiwanensis symbiotic plasmid triggered a transient hypermutability stage in R. solanacearum transconjugants that occurred before the cells entered the plant. The generated burst in genetic diversity accelerated symbiotic adaptation of the recipient genome under plant selection pressure, presumably by improving the exploration of the fitness landscape. Finally, we show that plasmid imuABC cassettes are over-represented in rhizobial lineages harboring symbiotic plasmids. Our findings shed light on a mechanism that may have facilitated the dissemination of symbiotic competency among α- and β-proteobacteria in natura and provide evidence for the positive role of environment-induced mutagenesis in the acquisition of a complex lifestyle trait. We speculate that co-transfer of complex phenotypic traits with mutagenesis determinants might frequently enhance the ecological success of HGT.
Horizontal gene transfer has an extraordinary impact on microbe evolution and diversification, by allowing exploration of new niches such as higher organisms. This is the case for rhizobia, a group of phylogenetically diverse bacteria that form a nitrogen-fixing symbiotic relationship with most leguminous plants. While these arose through horizontal transfer of symbiotic plasmids, this in itself is usually unproductive, and full expression of the acquired traits needs subsequent remodeling of the genome to ensure the ecological success of the transfer. Here we uncover a mechanism that accelerates the evolution of a soil bacterium into a legume symbiont. We show that key symbiotic genes are co-transferred with genes encoding stress-responsive error-prone DNA polymerases that transiently elevate the mutation rate in the recipient genome. This burst in genetic diversity accelerates the symbiotic evolution process under selection pressure from the host plant. A more widespread involvement of plasmid mutagenesis cassettes in rhizobium evolution is supported by their overrepresentation in rhizobia-containing lineages. Our findings provide evidence for the role of environment-induced mutagenesis in the acquisition of a complex lifestyle trait and predict that co-transfer of complex phenotypic traits with mutagenesis determinants might help successful horizontal gene transfer.
The roles of restriction-modification (R-M) systems in providing immunity against horizontal gene transfer (HGT) and in stabilizing mobile genetic elements (MGEs) have been much debated. However, few studies have precisely addressed the distribution of these systems in light of HGT, its mechanisms and its vectors. We analyzed the distribution of R-M systems in 2261 prokaryote genomes and found their frequency to be strongly dependent on the presence of MGEs, CRISPR-Cas systems, integrons and natural transformation. Yet R-M systems are rare in plasmids, in prophages and nearly absent from other phages. Their abundance depends on genome size for small genomes where it relates with HGT but saturates at two occurrences per genome. Chromosomal R-M systems might evolve under cycles of purifying and relaxed selection, where sequence conservation depends on the biochemical activity and complexity of the system and total gene loss is frequent. Surprisingly, analysis of 43 pan-genomes suggests that solitary R-M genes rarely arise from the degradation of R-M systems. Solitary genes are transferred by large MGEs, whereas complete systems are more frequently transferred autonomously or in small MGEs. Our results suggest means of testing the roles for R-M systems and their associations with MGEs.
Secretins form large multimeric complexes in the outer membranes of many Gram-negative bacteria, where they function as dedicated gateways that allow proteins to access the extracellular environment. Despite their overall relatedness, different secretins use different specific and general mechanisms for their targeting, assembly, and membrane insertion. We report that all tested secretins from several type II secretion systems and from the filamentous bacteriophage f1 can spontaneously multimerize and insert into liposomes in an in vitro transcription-translation system. Phylogenetic analyses indicate that these secretins form a group distinct from the secretins of the type IV piliation and type III secretion systems, which do not autoassemble in vitro. A mutation causing a proline-to-leucine substitution allowed PilQ secretins from two different type IV piliation systems to assemble in vitro, albeit with very low efficiency, suggesting that autoassembly is an inherent property of all secretins.
Conjugation of DNA through a type IV secretion system (T4SS) drives horizontal gene transfer. Yet little is known about the diversity of these nanomachines. We previously found that T4SS can be divided into eight classes based on the phylogeny of the only ubiquitous protein of T4SS (VirB4). Here, we use an ab initio approach to identify protein families systematically and specifically associated with VirB4 in each class. We built profiles for these proteins and used them to scan 2262 genomes for the presence of T4SS. Our analysis led to the identification of thousands of occurrences of 116 protein families for a total of 1623 T4SS. Importantly, our profiles almost always identified the essential genes of well-studied T4SS. This allowed us to build a database with the largest number of T4SS described to date. Using profile–profile alignments, we reveal many new cases of homology between components of distant classes of T4SS. We mapped these similarities on the T4SS phylogenetic tree and thus obtained the patterns of acquisition and loss of these protein families in the history of T4SS. The identification of the key VirB4-associated proteins paves the way toward experimental analysis of poorly characterized T4SS classes.
The human bacterial pathogen Listeria monocytogenes is emerging as a model organism to study RNA-mediated regulation in pathogenic bacteria. A class of non-coding RNAs called CRISPRs (clustered regularly interspaced short palindromic repeats) has been reported to confer bacterial resistance against invading bacteriophages and conjugative plasmids. CRISPR function relies on the activity of CRISPR-associated (cas) genes that encode a large family of proteins with nuclease or helicase activities and DNA- and RNA-binding domains. Here, we characterized a CRISPR element (RliB) that is expressed and processed in the L. monocytogenes strain EGD-e, which is completely devoid of cas genes. Structural probing revealed that RliB has an unexpected secondary structure comprising basepair interactions between the repeats and the adjacent spacers in place of canonical hairpins formed by the palindromic repeats. Moreover, in contrast to other CRISPR-Cas systems identified in Listeria, RliB-CRISPR is ubiquitously present among Listeria genomes at the same genomic locus and is never associated with the cas genes. We showed that RliB-CRISPR is a substrate for the endogenously encoded polynucleotide phosphorylase (PNPase) enzyme. The spacers of the different Listeria RliB-CRISPRs share many sequences with temperate and virulent phages. Furthermore, we show that a cas-less RliB-CRISPR lowers the acquisition frequency of a plasmid carrying the matching protospacer, provided that trans-encoded cas genes of a second CRISPR-Cas system are present in the genome. Importantly, we show that PNPase is required for RliB-CRISPR-mediated DNA interference. Altogether, our data reveal a previously undescribed CRISPR system whose processing and activity both depend on PNPase, highlighting a new and unexpected function for PNPase in “CRISPRology”.
CRISPR-Cas systems provide bacteria and archaea with an adaptive immunity that protects them against invading bacteriophages and plasmids. In this study, we characterize a CRISPR (RliB-CRISPR) that is present in all L. monocytogenes strains at the same genomic locus but is never associated with a cas operon. It is an unusual CRISPR that, as we demonstrate, has a secondary structure consisting of basepair interactions between the repeat sequence and the adjacent spacer. We show that the RliB-CRISPR is processed by the endogenously encoded polynucleotide phosphorylase enzyme (PNPase). In addition, we show that the RliB-CRISPR system requires PNPase and the presence of trans-encoded cas genes of a second CRISPR-Cas system to mediate DNA interference directed against a plasmid carrying a matching protospacer. Altogether, our data reveal a novel type of CRISPR system in bacteria that requires the endogenously encoded PNPase enzyme for its processing and interference activity.
In prokaryotes, genome size is associated with metabolic versatility, regulatory complexity, effective population size, and horizontal transfer rates. We therefore analyzed the covariation of genome size and operon conservation to assess the evolutionary models of operon formation and maintenance. In agreement with previous results, intraoperonic pairs of essential and of highly expressed genes are more conserved. Interestingly, intraoperonic pairs of genes are also more conserved when they encode proteins at similar cell concentrations, suggesting a role of cotranscription in diminishing the cost of waste and shortfall in gene expression. Larger genomes have fewer and smaller operons that are also less conserved. Importantly, lower conservation in larger genomes was observed for all classes of operons in terms of gene expression, essentiality, and balanced protein concentration. We reached very similar conclusions in independent analyses of three major bacterial clades (α- and β-Proteobacteria and Firmicutes). Operon conservation is inversely correlated to the abundance of transcription factors in the genome when controlled for genome size. This suggests a negative association between the complexity of genetic networks and operon conservation. These results show that genome size and/or its proxies are key determinants of the intensity of natural selection for operon organization. Our data fit better the evolutionary models based on the advantage of coregulation than those based on genetic linkage or stochastic gene expression. We suggest that larger genomes with highly complex genetic networks and many transcription factors endure weaker selection for operons than smaller genomes with fewer alternative tools for genetic regulation.
operons; prokaryotes; evolution
Phages, like many parasites, tend to have small genomes and may encode autonomous functions or manipulate those of their hosts. Recombination functions are essential for phage replication and diversification. They are also nearly ubiquitous in bacteria. The E. coli genome encodes many copies of an octamer (Chi) motif that upon recognition by RecBCD favors repair of double strand breaks by homologous recombination. This might allow discrimination of self from non-self because RecBCD degrades DNA lacking Chi. Bacteriophage Lambda, an E. coli parasite, lacks Chi motifs, but escapes degradation by inhibiting RecBCD and encoding its own autonomous recombination machinery. We found that only half of 275 lambdoid genomes encode recombinases, the remaining relying on the host's machinery. Unexpectedly, we found that some lambdoid phages contain extremely high numbers of Chi motifs concentrated between the phage origin of replication and the packaging site. This suggests a tight association between replication, packaging and RecBCD-mediated recombination in these phages. Indeed, phages lacking recombinases strongly over-represent Chi motifs. Conversely, phages encoding recombinases and inhibiting host recombination machinery select for the absence of Chi motifs. Host and phage recombinases use different mechanisms and the latter are more tolerant to sequence divergence. Accordingly, we show that phages encoding their own recombination machinery have more mosaic genomes resulting from recent recombination events and have more diverse gene repertoires, i.e. larger pan genomes. We discuss the costs and benefits of superseding or manipulating host recombination functions and how this decision shapes phage genome structure and evolvability.
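Counting Chi motifs in a genome sequence is a simple string scan, sketched below. The sketch assumes the E. coli Chi octamer 5'-GCTGGTGG-3' and counts it separately on the given strand and on the reverse strand, since recognition is orientation dependent. The function names and the toy sequence are illustrative, and this is not the pipeline used in the study; testing for over- or under-representation would additionally require an expectation model, which is not shown.

```python
from typing import Tuple

CHI = "GCTGGTGG"  # E. coli Chi octamer, written 5' to 3'
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an uppercase DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def count_motif(seq: str, motif: str) -> int:
    """Count occurrences of motif in seq, allowing overlapping matches."""
    count = 0
    start = seq.find(motif)
    while start != -1:
        count += 1
        start = seq.find(motif, start + 1)
    return count


def count_chi_both_strands(seq: str) -> Tuple[int, int]:
    """Return (forward-strand count, reverse-strand count) of Chi motifs."""
    seq = seq.upper()
    return count_motif(seq, CHI), count_motif(seq, reverse_complement(CHI))


# Toy sequence with one Chi on each strand.
forward, reverse = count_chi_both_strands("AAGCTGGTGGTTCCACCAGCAA")
```

Keeping the two strand counts separate makes it possible to relate motif orientation to features such as the origin of replication or the packaging site.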
Bacterial viruses, called bacteriophages, are extremely abundant in the biosphere. They have key roles in the regulation of bacterial populations and in the diversification of bacterial genomes. Among these viruses, lambdoid phages are very abundant in enterobacteria and exchange genetic material very frequently. This latter process is thought to increase phage diversity and therefore facilitate adaptation to hosts. Recombination is also essential for the replication of many lambdoid phages. Lambdoids have been described to encode their own recombination genes and inhibit their hosts'. In this study, we show that lambdoids are split regarding their capacity to encode autonomous recombination functions and that this affects the abundance of recombination-related sequence motifs. Half of the phages encode an autonomous system and inhibit their hosts'. The trade-off between superseding and manipulating the hosts' recombination functions has important consequences. The phages encoding autonomous recombination functions have more diverse gene repertoires and recombine more frequently. Viruses, as many other parasites, have small genomes and depend on their hosts for several housekeeping functions. Hence, they often face trade-offs between supersession and manipulation of molecular machineries. Our results suggest these trade-offs may shape viral gene repertoires, their sequence composition and even influence their evolvability.
Despite increasing interest in coagulase-negative staphylococci (CoNS), little information is available about their bacteriophages. We isolated and sequenced three novel temperate Siphoviridae phages (StB12, StB27, and StB20) from the CoNS Staphylococcus hominis and S. capitis species. The genome sizes are around 40 kb, and open reading frames (ORFs) are arranged in functional modules encoding lysogeny, DNA metabolism, morphology, and cell lysis. Bioinformatics analysis allowed us to assign a potential function to half of the predicted proteins. Structural elements were further identified by proteomic analysis of phage particles, and DNA-packaging mechanisms were determined. Interestingly, the three phages show identical integration sites within their host genomes. In addition to this experimental characterization, we propose a novel classification based on the analysis of 85 phage and prophage genomes, including 15 originating from CoNS. Our analysis established 9 distinct clusters and revealed close relationships between S. aureus and CoNS phages. Genes involved in DNA metabolism and lysis and potentially in phage-host interaction appear to be widespread, while structural genes tend to be cluster specific. Our findings support the notion of a possible reciprocal exchange of genes between phages originating from S. aureus and CoNS, which may be of crucial importance for pathogenesis in staphylococci.
Quorum sensing (QS) regulates the onset of bacterial social responses as a function of cell density, with an important impact on virulence. Autoinducer-2 (AI-2) is a signal that has the peculiarity of mediating both intra- and interspecies bacterial QS. We analyzed the diversity of all components of AI-2 QS across 44 complete genomes of Escherichia coli and Shigella strains. We used phylogenetic tools to study its evolution and determined the phenotypes of single-deletion mutants to predict phenotypes of natural strains. Our analysis revealed many likely adaptive polymorphisms both in gene content and in nucleotide sequence. We show that all natural strains possess the signal emitter (the luxS gene), but many lack a functional signal receptor (complete lsr operon) and the ability to regulate extracellular signal concentrations. This result is in striking contrast with the canonical species-specific QS systems, where one often finds orphan receptors, without a cognate synthase, but not orphan emitters. Our analysis indicates that selection actively maintains a balanced polymorphism for the presence/absence of a functional lsr operon, suggesting diversifying selection on the regulation of signal accumulation and recognition. These results can be explained either by niche-specific adaptation or by selection for a coercive behavior where signal-blind emitters benefit from forcing other individuals in the population to hasten cooperative behaviors.
genome evolution; gene loss; E. coli; balancing selection; social cheater; bacteria signaling
Rapid turnover of mobile elements drives the plasticity of bacterial genomes. Integrated bacteriophages (prophages) encode host-adaptive traits and represent a sizable fraction of bacterial chromosomes. We hypothesized that natural selection shapes prophage integration patterns relative to the host genome organization. We tested this idea by detecting and studying 500 prophages of 69 strains of Escherichia and Salmonella. Phage integrases often target not only conserved genes but also intergenic positions, suggesting purifying selection for integration sites. Furthermore, most integration hotspots are conserved between the two host genera. Integration sites also seem to be selected at the large chromosomal scale, as they are nonrandomly organized in terms of the origin–terminus axis and the macrodomain structure. The genes of lambdoid prophages are systematically co-oriented with the bacterial replication fork and display the host's high frequency of polarized FtsK-orienting polar sequence motifs required for chromosome segregation. matS motifs are strongly avoided by prophages, suggesting counter-selection of motifs disrupting macrodomains. These results show how natural selection for seamless integration of prophages in the chromosome shapes the evolution of the bacterium and the phage. First, integration sites have been highly conserved for many millions of years, favoring lysogeny over the lytic cycle for temperate phages. Second, the global distribution of prophages is intimately associated with the chromosome structure and the patterns of gene expression. Third, the phage endures selection for DNA motifs that pertain exclusively to the biology of the prophage in the bacterial chromosome. Understanding prophage genetic adaptation sheds new light on the coexistence of horizontal transfer and organized bacterial genomes.
Proteins secreted to the extracellular environment or to the periphery of the cell envelope, the secretome, play essential roles in foraging, antagonistic and mutualistic interactions. We hypothesize that arms races, genetic conflicts and varying selective pressures should lead to the rapid change of sequences and gene repertoires of the secretome. The analysis of 42 bacterial pan-genomes shows that secreted, and especially extracellular proteins, are predominantly encoded in the accessory genome, i.e. among genes not ubiquitous within the clade. Genes encoding outer membrane proteins might engage more frequently in intra-chromosomal gene conversion because they are more often in multi-genic families. The gene sequences encoding the secretome evolve faster than the rest of the genome and in particular at non-synonymous positions. Cell wall proteins in Firmicutes evolve particularly fast when compared with outer membrane proteins of Proteobacteria. Virulence factors are over-represented in the secretome, notably in outer membrane proteins, but cell localization explains more of the variance in substitution rates and gene repertoires than sequence homology to known virulence factors. Accordingly, the repertoires and sequences of the genes encoding the secretome change fast in the clades of obligatory and facultative pathogens and also in the clades of mutualists and free-living bacteria. Our study shows that cell localization shapes genome evolution. In agreement with our hypothesis, the repertoires and the sequences of genes encoding secreted proteins evolve fast. The particularly rapid change of extracellular proteins suggests that these public goods are key players in bacterial adaptation.
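The distinction between core and accessory genes used above can be made concrete with a small sketch. It assumes that gene families have already been built and that each family maps to the set of genomes in which it occurs; a family is called core when it is present in every genome of the clade and accessory otherwise. The family names, genome labels, and the helper for computing the accessory share of a gene subset are hypothetical.

```python
from typing import Dict, Set, Tuple


def split_pan_genome(families: Dict[str, Set[str]],
                     genomes: Set[str]) -> Tuple[Set[str], Set[str]]:
    """Split gene families into core (present in all genomes) and accessory."""
    core = {fam for fam, present in families.items() if genomes <= present}
    accessory = set(families) - core
    return core, accessory


def accessory_fraction(subset: Set[str], accessory: Set[str]) -> float:
    """Fraction of the families in subset that belong to the accessory genome."""
    if not subset:
        return 0.0
    return len(subset & accessory) / len(subset)


# Hypothetical clade of three genomes and four gene families.
genomes = {"g1", "g2", "g3"}
families = {
    "famA": {"g1", "g2", "g3"},  # ubiquitous
    "famB": {"g1"},
    "famC": {"g2", "g3"},
    "famD": {"g1", "g2", "g3"},  # ubiquitous
}
core, accessory = split_pan_genome(families, genomes)
secreted = {"famB", "famC", "famD"}  # placeholder set of secreted families
share = accessory_fraction(secreted, accessory)
```

Comparing this share between secreted and non-secreted families is one simple way to express the enrichment of the secretome in the accessory genome.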
Type 3 secretion systems (T3SSs) are essential components of two complex bacterial machineries: the flagellum, which drives cell motility, and the non-flagellar T3SS (NF-T3SS), which delivers effectors into eukaryotic cells. Yet the origin, specialization, and diversification of these machineries remained unclear. We developed computational tools to identify homologous components of the two systems and to discriminate between them. Our analysis of >1,000 genomes identified 921 T3SSs, including 222 NF-T3SSs. Phylogenomic and comparative analyses of these systems argue that the NF-T3SS arose from an exaptation of the flagellum, i.e. the recruitment of part of the flagellum structure for the evolution of the new protein delivery function. The reconstructed chronology indicates that the exaptation process proceeded in at least two steps. An intermediate ancestral form of NF-T3SS, whose descendants still exist in Myxococcales, lacked elements that are essential for motility and included a subset of NF-T3SS features. We argue that this ancestral version was involved in protein translocation. A second major step in the evolution of NF-T3SSs occurred via recruitment of secretins to the NF-T3SS, an event that occurred at least three times from different systems. In Rhizobiales, a partial homologous gene replacement of the secretin resulted in two genes of complementary function. Acquisition of a secretin was followed by the rapid adaptation of the resulting NF-T3SSs to multiple, distinct eukaryotic cell envelopes where they became key in parasitic and mutualistic associations between prokaryotes and eukaryotes. Our work elucidates major steps of the evolutionary scenario leading to extant NF-T3SSs. It demonstrates how molecular evolution can convert one complex molecular machine into a second, equally complex machine by successive deletions, innovations, and recruitment from other molecular systems.
Most motile bacteria use a flagellum to move. The extracellular components of flagella are secreted by their own Type 3 Secretion System (T3SS). The non-flagellar T3SS (NF-T3SS), also named injectisome, includes many proteins that are homologous to flagellar components. NF-T3SSs are employed by many plant and animal pathogens to deliver effectors to host cells, including toxins. NF-T3SSs are complex protein machineries with >15 components that connect bacterial cell envelopes to eukaryotic cell membranes, including the intervening extracellular space. In this study, we designed computational tools to distinguish flagella and NF-T3SSs from other bacterial protein sequences. We show that NF-T3SSs evolved from the flagellum by a series of genetic deletions, innovations, and recruitments of components from other cellular structures. Our evolutionary analysis suggests that NF-T3SSs then quickly adapted to different eukaryotic cells while maintaining a core structure that remains highly similar to the flagellum. This is an example of evolutionary tinkering where a complex structure arises by exaptation, the recruitment of elements that evolved initially for other functions in other cellular structures.
Genetic exchange by conjugation is responsible for the spread of resistance, virulence,
and social traits among prokaryotes. Recent work has unraveled the functioning of the
underlying type IV secretion systems (T4SS) and their distribution and recruitment for other
biological processes (exaptation), notably pathogenesis. We analyzed the phylogeny of key
conjugation proteins to infer the evolutionary history of conjugation and T4SS. We show
that single-stranded DNA (ssDNA) and double-stranded DNA (dsDNA) conjugation, while both
based on a key AAA+ ATPase, diverged before the last common ancestor of
bacteria. The two key ATPases of ssDNA conjugation are monophyletic, having diverged at an
early stage from dsDNA translocases. Our data suggest that ssDNA conjugation arose first
in diderm bacteria, possibly Proteobacteria, and then spread to other bacterial phyla,
including bacterial monoderms and Archaea. Identifiable T4SS fall within eight
monophyletic groups, determined by both taxonomy and structure of the cell envelope.
Transfer to monoderms might have occurred only once, but followed diverse adaptive paths.
Remarkably, some Firmicutes developed a new conjugation system based on an atypical
relaxase and an ATPase derived from a dsDNA translocase. The observed evolutionary rates
and patterns of presence/absence of specific T4SS proteins show that conjugation systems
are often and independently exapted for other functions. This work brings a natural basis
for the classification of all kinds of conjugative systems, thus tackling a problem that
is growing as fast as genomic databases. Our analysis provides the first global picture of
the evolution of conjugation and shows how a self-transferable complex multiprotein
system has adapted to different taxa and often been recruited by the host. As conjugation
systems became specific to certain clades and cell envelopes, they may have biased the
rate and direction of gene transfer by conjugation within prokaryotes.
bacterial conjugation; horizontal gene transfer; type IV protein secretion; exaptation; plasmid evolution
Members of the genus Flavobacterium occur in a variety of ecological niches and represent an interesting diversity of lifestyles. Flavobacterium branchiophilum is the main causative agent of bacterial gill disease, a severe condition affecting various cultured freshwater fish species worldwide, in particular salmonids in Canada and Japan. We report here the complete genome sequence of strain FL-15 isolated from a diseased sheatfish (Silurus glanis) in Hungary. The analysis of the F. branchiophilum genome revealed putative mechanisms of pathogenicity strikingly different from those of the other, closely related fish pathogen Flavobacterium psychrophilum, including the first cholera-like toxin in a non-Proteobacteria and a wealth of adhesins. The comparison with available genomes of other Flavobacterium species revealed a small genome size, large differences in chromosome organization, and fewer rRNA and tRNA genes, in line with its more fastidious growth. In addition, horizontal gene transfer shaped the evolution of F. branchiophilum, as evidenced by its virulence factors, genomic islands, and CRISPR (clustered regularly interspaced short palindromic repeats) systems. Further functional analysis should help in the understanding of host-pathogen interactions and in the development of rational diagnostic tools and control strategies in fish farms.
Many studies have been devoted to understanding the mechanisms used by pathogenic bacteria to exploit human hosts. These mechanisms are very diverse in their details, but share commonalities whose quantification should enlighten the evolution of virulence from both a molecular and an ecological perspective. We mined the literature for experimental data on infectious dose of bacterial pathogens in humans (ID50) and also for traits with which ID50 might be associated. These compilations were checked and complemented with genome analyses. We observed that ID50 varies in a continuous way by over 10 orders of magnitude. Low ID50 values are very strongly associated with the capacity of the bacteria to kill professional phagocytes or to survive in the intracellular milieu of these cells. Inversely, high ID50 values are associated with motile and fast-growing bacteria that use quorum-sensing based regulation of virulence factor expression. Infectious dose is not associated with genome size and shows insignificant phylogenetic inertia, in line with frequent virulence shifts associated with the horizontal gene transfer of a small number of virulence factors. Contrary to previous proposals, infectious dose shows little dependence on contact-dependent secretion systems and on the natural route of exposure. When all variables are combined, immune subversion and quorum-sensing are sufficient to explain two thirds of the variance in infectious dose. Our results show the key role of immune subversion in effective human infection by small bacterial populations. They also suggest that cooperative processes might be important for successful infection by bacteria with high ID50. Our results suggest that trade-offs between selection for population growth-related traits and selection for the ability to subvert the immune system shape bacterial infectiousness. Understanding these trade-offs provides guidelines to study the evolution of virulence and in particular the micro-evolutionary paths of emerging pathogens.
Every pathogen is unique and uses distinctive combinations of specific mechanisms to exploit the human host. Yet, several common themes in the ways pathogens use these mechanisms can be found among distantly related bacteria. The understanding of these common themes provides useful concepts and uncovers important principles in pathogenesis. Here, we have made a cross-species analysis of traits thought to be relevant for virulence of bacterial pathogens. We have found that the infectious dose of pathogens is much lower when they are able to kill professional phagocytes of the immune system or to survive in the intracellular milieu of these cells. On the other hand, bacteria requiring a higher infectious dose are more likely to be motile, to be fast-growing, and to regulate the expression of virulence factors so that they are expressed only when the population quorum is high enough to be effective in starting an infection. This suggests that infectious dose results from a trade-off between selection for fast coordinated growth and the ability to subvert the immune system. This trade-off may underlie other traits such as the ability of a pathogen to live outside of an association with a host. Understanding the patterns shaping infectious dose will facilitate the prediction of evolutionary paths of emerging pathogens.
In order to get further insights into the role of the clustered, regularly interspaced, short palindromic repeats (CRISPRs) in Escherichia coli, we analyzed the CRISPR diversity in a collection of 290 strains, in the phylogenetic framework of the strains represented by multilocus sequence typing (MLST). The set included 263 natural E. coli isolates exposed to various environments and isolated over a 20-year period from humans and animals, as well as 27 fully sequenced strains. Our analyses confirm that there are two largely independent pairs of CRISPR loci (CRISPR1 and -2 and CRISPR3 and -4), each associated with a different type of cas genes (Ecoli and Ypest, respectively), but that each pair of CRISPRs has similar dynamics. Strikingly, the major phylogenetic group B2 is almost devoid of CRISPRs. The majority of genomes analyzed lack Ypest cas genes and contain CRISPR3 with spacers matching Ypest cas genes. The analysis of relatedness between strains in terms of spacer repertoire and the MLST tree shows a pattern where closely related strains (MLST phylogenetic distance of <0.005 corresponding to at least hundreds of thousands of years) often exhibit identical CRISPRs while more distantly related strains (MLST distance of >0.01) exhibit completely different CRISPRs. This suggests rare but radical turnover of spacers in CRISPRs rather than CRISPR gradual change. We found no link between the presence, size, or content of CRISPRs and the lifestyle of the strains. Our data suggest that, within the E. coli species, CRISPRs do not have the expected characteristics of a classical immune system.
Horizontal gene transfer shapes the genomes of prokaryotes by allowing rapid acquisition of novel adaptive functions. Conjugation allows the broadest range and the highest gene transfer input per transfer event. While conjugative plasmids have been studied for decades, the number and diversity of integrative conjugative elements (ICEs) in prokaryotes remained unknown. We defined a large set of protein profiles of the conjugation machinery to scan over 1,000 genomes of prokaryotes. We found 682 putative conjugative systems among all major phylogenetic clades and showed that ICEs are the most abundant conjugative elements in prokaryotes. Nearly half of the genomes contain a type IV secretion system (T4SS), with larger genomes encoding more conjugative systems. Surprisingly, almost half of the chromosomal T4SS lack co-localized relaxases and, consequently, might be devoted to protein transport instead of conjugation. This class of elements is preponderant among small genomes, is less commonly associated with integrases, and is rarer in plasmids. ICEs and conjugative plasmids in proteobacteria have different preferences for each type of T4SS, but all types exist in both chromosomes and plasmids. Mobilizable elements outnumber self-conjugative elements in both ICEs and plasmids, which suggests an extensive use of T4SS in trans. Our evolutionary analysis indicates that switches of plasmids to and from ICEs were frequent and that extant elements began to differentiate only relatively recently. According to the present results, ICEs are the most abundant conjugative elements in practically all prokaryotic clades and might be far more frequently domesticated into non-conjugative protein transport systems than previously thought. While conjugative plasmids and ICEs have different means of genomic stabilization, their mechanisms of mobility by conjugation show strikingly conserved patterns, arguing for a unitary view of conjugation in shaping the genomes of prokaryotes by horizontal gene transfer.
Some mobile genetic elements spread genetic information horizontally between prokaryotes by conjugation, a mechanism by which DNA is transferred directly from one cell to the other. Among the processes allowing genetic transfer between cells, conjugation is the one allowing the simultaneous transfer of larger amounts of DNA and between the least related cells. As such, conjugative systems are key players in horizontal transfer, including the transfer of antibiotic resistance to and between many human pathogens. Conjugative systems are encoded both in plasmids and in chromosomes. Those encoded in chromosomes are called Integrative Conjugative Elements (ICEs), and their number, identity, and mechanism of conjugation were poorly known. We have developed an approach to identify and characterize these elements and found more ICEs than conjugative plasmids in genomes. While both ICEs and plasmids use similar conjugative systems, there are remarkable preferences for some systems in some elements. Our evolutionary analysis shows that plasmid conjugative systems have often given rise to ICEs and vice versa. Therefore, ICEs and conjugative plasmids should be regarded as one and the same, the differences in their means of existence in cells probably being the result of different requirements for stabilization and/or transmissibility of the genetic information they contain.
Gene duplication followed by neo- or sub-functionalization deeply impacts the evolution of protein families and is regarded as the main source of adaptive functional novelty in eukaryotes. While there is ample evidence of adaptive gene duplication in prokaryotes, it is not clear whether duplication outweighs the contribution of horizontal gene transfer in the expansion of protein families. We analyzed closely related prokaryote strains or species with small genomes (Helicobacter, Neisseria, Streptococcus, Sulfolobus), average-sized genomes (Bacillus, Enterobacteriaceae), and large genomes (Pseudomonas, Bradyrhizobiaceae) to untangle the effects of duplication and horizontal transfer. After removing the effects of transposable elements and phages, we show that the vast majority of expansions of protein families are due to transfer, even among large genomes. Transferred genes (xenologs) persist longer in prokaryotic lineages, possibly due to a higher/longer adaptive role. On the other hand, duplicated genes (paralogs) are expressed more, and, when persistent, they evolve more slowly. This suggests that gene transfer and gene duplication have very different roles in shaping the evolution of biological systems: transfer allows the acquisition of new functions and duplication leads to higher gene dosage. Accordingly, we show that paralogs share most protein–protein interactions and genetic regulators, whereas xenologs share very few of them. Prokaryotes invented most of life's biochemical diversity. Therefore, the study of the evolution of biological systems should explicitly account for the predominant role of horizontal gene transfer in the diversification of protein families.
Prokaryotes can be found in the most diverse and severe ecological niches of the planet. Their rapid adaptation is, in part, the result of the ability to acquire genetic information horizontally. This means that prokaryotes utilize two major paths to expand their repertoire of protein families: they can duplicate a pre-existing gene or acquire it by horizontal transfer. In this study, we track family expansions among closely related strains of prokaryotic species. We find that the majority of gene expansions arrive via transfer, not via duplication. Additionally, we find that duplicate genes tend to be more transient and evolve more slowly than transferred ones, highlighting different roles with respect to adaptation and evolution. These results suggest that prevailing theories aimed at understanding the evolution of biological systems grounded on gene duplication might be poorly fit to explain the evolution of prokaryotic systems, which include the vast majority of life's biochemical diversity.
Prokaryotes thrive in spite of the vast number and diversity of their viruses. This partly results from the evolution of mechanisms to inactivate or silence the action of exogenous DNA. Among these, Clustered Regularly Interspaced Short Palindromic Repeats (CRISPR) are unique in providing adaptive immunity against elements with high local resemblance to genomes of previously infecting agents. Here, we analyze the CRISPR loci of 51 complete genomes of Escherichia and Salmonella. CRISPR are found in two pairs of loci in Escherichia and in a single pair in Salmonella, each pair showing a similar turnover rate, repeat sequence and putative linkage to a common set of cas genes. Yet, phylogeny shows that CRISPR and associated cas genes have different evolutionary histories, the latter being frequently exchanged or lost. In our set, one CRISPR pair seems specialized in plasmids, often matching genes coding for the replication, conjugation and antirestriction machinery. Strikingly, this pair also matches the cognate cas genes, in which case these genes are absent. The unexpectedly high conservation of this anti-CRISPR suggests selection to counteract the invasion of mobile elements containing functional CRISPR/cas systems. There are few spacers in most CRISPR, and these rarely match genomes of known phages. Furthermore, we found that strains that diverged less than 250 thousand years ago show virtually identical CRISPR. The lack of congruence between cas, CRISPR and the species phylogeny and the slow pace of CRISPR change make CRISPR poor epidemiological markers in enterobacteria. All these observations are at odds with the expectedly abundant and dynamic repertoire of spacers in an immune system aiming at protecting bacteria from phages. Since we observe purifying selection for the maintenance of CRISPR, these results suggest that alternative evolutionary roles for CRISPR remain to be uncovered.
Microbial minimal generation times range from a few minutes to several weeks. They are evolutionarily determined by variables such as environment stability, nutrient availability, and community diversity. Selection for fast growth adaptively imprints genomes, resulting in gene amplification, adapted chromosomal organization, and biased codon usage. We found that these growth-related traits in 214 species of bacteria and archaea are highly correlated, suggesting they all result from growth optimization. While modeling their association with maximal growth rates in view of synthetic biology applications, we observed that codon usage biases are better correlates of growth rates than any other trait, including rRNA copy number. Systematic deviations from our model reveal two distinct evolutionary processes. First, genome organization shows more evolutionary inertia than growth rates. This results in over-representation of growth-related traits in fast degrading genomes. Second, selection for these traits depends on optimal growth temperature: for similar generation times purifying selection is stronger in psychrophiles, intermediate in mesophiles, and lower in thermophiles. Using this information, we created a predictor of maximal growth rate adapted to small genome fragments. We applied it to three metagenomic environmental samples to show that a transiently rich environment, such as the human gut, selects for fast-growers, that a toxic environment, such as the acid mine biofilm, selects for low growth rates, whereas a diverse environment, such as the soil, shows all ranges of growth rates. We also demonstrate that microbial colonizers of the infant gut grow faster than the stabilized gut communities of human adults. In conclusion, we show that one can predict maximal growth rates from sequence data alone, and we propose that such information can be used to facilitate the manipulation of generation times. Our predictor allows inferring growth rates in the vast majority of uncultivable prokaryotes and paves the way to the understanding of community dynamics from metagenomic data.
Microbial minimal generation times vary from a few minutes to several weeks. The reasons for this disparity have been thought to lie in different life-history strategies: fast-growing microbes grow extremely fast in rich media, but are less capable of dealing with stress and/or poor nutrient conditions. Prokaryotes have evolved a set of genomic traits to grow fast, including biased codon usage and transient or permanent gene multiplication for dosage effects. Here, we studied the relative role of these traits and show they can be used to predict minimal generation times from the genomic data of the vast majority of microbes that cannot be cultivated. We show that this inference can also be made with incomplete genomes and thus be applied to metagenomic data to test hypotheses about the biomass productivity of biotopes and the evolution of microbiota in the human gut after birth. Our results also allow a better understanding of the co-evolution between growth rates and genomic traits and how they can be manipulated in synthetic biology. Growth rates have been a key variable in microbial physiology studies in the last century, and we show how intimately they are linked with genome organization and prokaryotic ecology.