Secretins form large multimeric complexes in the outer membranes of many Gram-negative bacteria, where they function as dedicated gateways that allow proteins to access the extracellular environment. Despite their overall relatedness, different secretins use different specific and general mechanisms for their targeting, assembly, and membrane insertion. We report that all tested secretins from several type II secretion systems and from the filamentous bacteriophage f1 can spontaneously multimerize and insert into liposomes in an in vitro transcription-translation system. Phylogenetic analyses indicate that these secretins form a group distinct from the secretins of the type IV piliation and type III secretion systems, which do not autoassemble in vitro. A mutation causing a proline-to-leucine substitution allowed PilQ secretins from two different type IV piliation systems to assemble in vitro, albeit with very low efficiency, suggesting that autoassembly is an inherent property of all secretins.
Quorum sensing (QS) regulates the onset of bacterial social responses as a function of cell density and has an important impact on virulence. Autoinducer-2 (AI-2) is a signal that has the peculiarity of mediating both intra- and interspecies bacterial QS. We analyzed the diversity of all components of AI-2 QS across 44 complete genomes of Escherichia coli and Shigella strains. We used phylogenetic tools to study its evolution and determined the phenotypes of single-deletion mutants to predict phenotypes of natural strains. Our analysis revealed many likely adaptive polymorphisms both in gene content and in nucleotide sequence. We show that all natural strains possess the signal emitter (the luxS gene), but many lack a functional signal receptor (complete lsr operon) and the ability to regulate extracellular signal concentrations. This result is in striking contrast with the canonical species-specific QS systems where one often finds orphan receptors, without a cognate synthase, but not orphan emitters. Our analysis indicates that selection actively maintains a balanced polymorphism for the presence/absence of a functional lsr operon, suggesting diversifying selection on the regulation of signal accumulation and recognition. These results can be explained either by niche-specific adaptation or by selection for a coercive behavior where signal-blind emitters benefit from forcing other individuals in the population to hasten cooperative behaviors.
genome evolution; gene loss; E. coli; balancing selection; social cheater; bacteria signaling
Rapid turnover of mobile elements drives the plasticity of bacterial genomes. Integrated bacteriophages (prophages) encode host-adaptive traits and represent a sizable fraction of bacterial chromosomes. We hypothesized that natural selection shapes prophage integration patterns relative to the host genome organization. We tested this idea by detecting and studying 500 prophages of 69 strains of Escherichia and Salmonella. Phage integrases often target not only conserved genes but also intergenic positions, suggesting purifying selection for integration sites. Furthermore, most integration hotspots are conserved between the two host genera. Integration sites also seem to be selected at the large chromosomal scale, as they are nonrandomly organized in terms of the origin–terminus axis and the macrodomain structure. The genes of lambdoid prophages are systematically co-oriented with the bacterial replication fork and display the host's high frequency of polarized FtsK-orienting polar sequence motifs required for chromosome segregation. matS motifs are strongly avoided by prophages, suggesting counterselection of motifs disrupting macrodomains. These results show how natural selection for seamless integration of prophages in the chromosome shapes the evolution of the bacterium and the phage. First, integration sites are highly conserved for many millions of years, favoring lysogeny over the lytic cycle for temperate phages. Second, the global distribution of prophages is intimately associated with the chromosome structure and the patterns of gene expression. Third, the phage endures selection for DNA motifs that pertain exclusively to the biology of the prophage in the bacterial chromosome. Understanding prophage genetic adaptation sheds new light on the coexistence of horizontal transfer and organized bacterial genomes.
Proteins secreted to the extracellular environment or to the periphery of the cell envelope, the secretome, play essential roles in foraging, antagonistic and mutualistic interactions. We hypothesize that arms races, genetic conflicts and varying selective pressures should lead to the rapid change of sequences and gene repertoires of the secretome. The analysis of 42 bacterial pan-genomes shows that secreted, and especially extracellular proteins, are predominantly encoded in the accessory genome, i.e. among genes not ubiquitous within the clade. Genes encoding outer membrane proteins might engage more frequently in intra-chromosomal gene conversion because they are more often in multi-genic families. The gene sequences encoding the secretome evolve faster than the rest of the genome and in particular at non-synonymous positions. Cell wall proteins in Firmicutes evolve particularly fast when compared with outer membrane proteins of Proteobacteria. Virulence factors are over-represented in the secretome, notably in outer membrane proteins, but cell localization explains more of the variance in substitution rates and gene repertoires than sequence homology to known virulence factors. Accordingly, the repertoires and sequences of the genes encoding the secretome change fast in the clades of obligatory and facultative pathogens and also in the clades of mutualists and free-living bacteria. Our study shows that cell localization shapes genome evolution. In agreement with our hypothesis, the repertoires and the sequences of genes encoding secreted proteins evolve fast. The particularly rapid change of extracellular proteins suggests that these public goods are key players in bacterial adaptation.
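The core/accessory distinction used above (accessory genes are those "not ubiquitous within the clade") can be illustrated with a minimal sketch. The genome names, gene family names, and the presence table below are hypothetical toy data, and the function names are illustrative only; this is not the pipeline used in the study.

```python
from typing import Dict, Iterable, Set, Tuple


def split_pan_genome(
    presence: Dict[str, Set[str]], genomes: Set[str]
) -> Tuple[Set[str], Set[str]]:
    """Split gene families into core (present in every genome of the clade)
    and accessory (missing from at least one genome)."""
    core, accessory = set(), set()
    for family, carriers in presence.items():
        if genomes <= carriers:  # every genome carries this family
            core.add(family)
        else:
            accessory.add(family)
    return core, accessory


def accessory_fraction(families: Iterable[str], accessory: Set[str]) -> float:
    """Fraction of the given gene families that fall in the accessory genome."""
    families = set(families)
    if not families:
        return 0.0
    return len(families & accessory) / len(families)


# Hypothetical toy clade of three genomes and four gene families.
GENOMES = {"g1", "g2", "g3"}
PRESENCE = {
    "famA": {"g1", "g2", "g3"},
    "famB": {"g1", "g2", "g3"},
    "famC": {"g1"},
    "famD": {"g2", "g3"},
}
# Hypothetical set of families predicted to encode secreted proteins.
SECRETED = {"famB", "famC", "famD"}

core, accessory = split_pan_genome(PRESENCE, GENOMES)
print(sorted(core), sorted(accessory))
print(accessory_fraction(SECRETED, accessory))
```

Comparing `accessory_fraction` for secreted families against the same fraction for all families is one simple way to ask whether the secretome is enriched in the accessory genome.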
Type 3 secretion systems (T3SSs) are essential components of two complex bacterial machineries: the flagellum, which drives cell motility, and the non-flagellar T3SS (NF-T3SS), which delivers effectors into eukaryotic cells. Yet the origin, specialization, and diversification of these machineries remained unclear. We developed computational tools to identify homologous components of the two systems and to discriminate between them. Our analysis of >1,000 genomes identified 921 T3SSs, including 222 NF-T3SSs. Phylogenomic and comparative analyses of these systems argue that the NF-T3SS arose from an exaptation of the flagellum, i.e. the recruitment of part of the flagellum structure for the evolution of the new protein delivery function. In our reconstructed chronology, the exaptation process proceeded in at least two steps. An intermediate ancestral form of NF-T3SS, whose descendants still exist in Myxococcales, lacked elements that are essential for motility and included a subset of NF-T3SS features. We argue that this ancestral version was involved in protein translocation. A second major step in the evolution of NF-T3SSs occurred via recruitment of secretins to the NF-T3SS, an event that occurred at least three times from different systems. In Rhizobiales, a partial homologous gene replacement of the secretin resulted in two genes of complementary function. Acquisition of a secretin was followed by the rapid adaptation of the resulting NF-T3SSs to multiple, distinct eukaryotic cell envelopes where they became key in parasitic and mutualistic associations between prokaryotes and eukaryotes. Our work elucidates major steps of the evolutionary scenario leading to extant NF-T3SSs. It demonstrates how molecular evolution can convert one complex molecular machine into a second, equally complex machine by successive deletions, innovations, and recruitment from other molecular systems.
Most motile bacteria use a flagellum to move. The extracellular components of flagella are secreted by their own Type 3 Secretion System (T3SS). The non-flagellar T3SS (NF-T3SS), also named injectisome, includes many proteins that are homologous to flagellar components. NF-T3SSs are employed by many plant and animal pathogens to deliver effectors to host cells, including toxins. NF-T3SSs are complex protein machineries with >15 components that connect bacterial cell envelopes to eukaryotic cell membranes, including the intervening extracellular space. In this study, we designed computational tools to distinguish flagella and NF-T3SSs from other bacterial protein sequences. We show that NF-T3SSs evolved from the flagellum by a series of genetic deletions, innovations, and recruitments of components from other cellular structures. Our evolutionary analysis suggests that NF-T3SSs then quickly adapted to different eukaryotic cells while maintaining a core structure that remains highly similar to the flagellum. This is an example of evolutionary tinkering where a complex structure arises by exaptation, the recruitment of elements that evolved initially for other functions in other cellular structures.
Genetic exchange by conjugation is responsible for the spread of resistance, virulence,
and social traits among prokaryotes. Recent work has unraveled the functioning of the
underlying type IV secretion systems (T4SS) and their distribution and recruitment for other
biological processes (exaptation), notably pathogenesis. We analyzed the phylogeny of key
conjugation proteins to infer the evolutionary history of conjugation and T4SS. We show
that single-stranded DNA (ssDNA) and double-stranded DNA (dsDNA) conjugation, while both
based on a key AAA+ ATPase, diverged before the last common ancestor of
bacteria. The two key ATPases of ssDNA conjugation are monophyletic, having diverged at an
early stage from dsDNA translocases. Our data suggest that ssDNA conjugation arose first
in diderm bacteria, possibly Proteobacteria, and then spread to other prokaryotic phyla,
including bacterial monoderms and Archaea. Identifiable T4SS fall within eight
monophyletic groups, determined by both taxonomy and structure of the cell envelope.
Transfer to monoderms might have occurred only once, but followed diverse adaptive paths.
Remarkably, some Firmicutes developed a new conjugation system based on an atypical
relaxase and an ATPase derived from a dsDNA translocase. The observed evolutionary rates
and patterns of presence/absence of specific T4SS proteins show that conjugation systems
are often and independently exapted for other functions. This work brings a natural basis
for the classification of all kinds of conjugative systems, thus tackling a problem that
is growing as fast as genomic databases. Our analysis provides the first global picture of
the evolution of conjugation and shows how a self-transferrable complex multiprotein
system has adapted to different taxa and often been recruited by the host. As conjugation
systems became specific to certain clades and cell envelopes, they may have biased the
rate and direction of gene transfer by conjugation within prokaryotes.
bacterial conjugation; horizontal gene transfer; type IV protein secretion; exaptation; plasmid evolution
Members of the genus Flavobacterium occur in a variety of ecological niches and represent an interesting diversity of lifestyles. Flavobacterium branchiophilum is the main causative agent of bacterial gill disease, a severe condition affecting various cultured freshwater fish species worldwide, in particular salmonids in Canada and Japan. We report here the complete genome sequence of strain FL-15 isolated from a diseased sheatfish (Silurus glanis) in Hungary. The analysis of the F. branchiophilum genome revealed putative mechanisms of pathogenicity strikingly different from those of the other, closely related fish pathogen Flavobacterium psychrophilum, including the first cholera-like toxin in a non-Proteobacteria and a wealth of adhesins. The comparison with available genomes of other Flavobacterium species revealed a small genome size, large differences in chromosome organization, and fewer rRNA and tRNA genes, in line with its more fastidious growth. In addition, horizontal gene transfer shaped the evolution of F. branchiophilum, as evidenced by its virulence factors, genomic islands, and CRISPR (clustered regularly interspaced short palindromic repeats) systems. Further functional analysis should help in the understanding of host-pathogen interactions and in the development of rational diagnostic tools and control strategies in fish farms.
Many studies have been devoted to understanding the mechanisms used by pathogenic bacteria to exploit human hosts. These mechanisms are very diverse in their details, but share commonalities whose quantification should enlighten the evolution of virulence from both a molecular and an ecological perspective. We mined the literature for experimental data on infectious dose of bacterial pathogens in humans (ID50) and also for traits with which ID50 might be associated. These compilations were checked and complemented with genome analyses. We observed that ID50 varies in a continuous way by over 10 orders of magnitude. Low ID50 values are very strongly associated with the capacity of the bacteria to kill professional phagocytes or to survive in the intracellular milieu of these cells. Inversely, high ID50 values are associated with motile and fast-growing bacteria that use quorum-sensing based regulation of virulence factor expression. Infectious dose is not associated with genome size and shows insignificant phylogenetic inertia, in line with frequent virulence shifts associated with the horizontal gene transfer of a small number of virulence factors. Contrary to previous proposals, infectious dose shows little dependence on contact-dependent secretion systems and on the natural route of exposure. When all variables are combined, immune subversion and quorum-sensing are sufficient to explain two thirds of the variance in infectious dose. Our results show the key role of immune subversion in effective human infection by small bacterial populations. They also suggest that cooperative processes might be important for successful infection by bacteria with high ID50. Our results suggest that trade-offs between selection for population growth-related traits and selection for the ability to subvert the immune system shape bacterial infectiousness.
Understanding these trade-offs provides guidelines to study the evolution of virulence and in particular the micro-evolutionary paths of emerging pathogens.
Every pathogen is unique and uses distinctive combinations of specific mechanisms to exploit the human host. Yet, several common themes in the ways pathogens use these mechanisms can be found among distantly related bacteria. The understanding of these common themes provides useful concepts and uncovers important principles in pathogenesis. Here, we have made a cross-species analysis of traits thought to be relevant for virulence of bacterial pathogens. We have found that the infectious dose of pathogens is much lower when they are able to kill professional phagocytes of the immune system or to survive in the intracellular milieu of these cells. On the other hand, bacteria requiring a higher infectious dose are more likely to be motile and fast-growing, and to regulate the expression of virulence factors so that they are produced only when the population quorum is high enough to be effective in starting an infection. This suggests that infectious dose results from a trade-off between selection for fast coordinated growth and the ability to subvert the immune system. This trade-off may underlie other traits such as the ability of a pathogen to live outside of an association with a host. Understanding the patterns shaping infectious dose will facilitate the prediction of evolutionary paths of emerging pathogens.
To gain further insight into the role of the clustered, regularly interspaced, short palindromic repeats (CRISPRs) in Escherichia coli, we analyzed the CRISPR diversity in a collection of 290 strains, in the phylogenetic framework of the strains represented by multilocus sequence typing (MLST). The set included 263 natural E. coli isolates exposed to various environments and isolated over a 20-year period from humans and animals, as well as 27 fully sequenced strains. Our analyses confirm that there are two largely independent pairs of CRISPR loci (CRISPR1 and -2 and CRISPR3 and -4), each associated with a different type of cas genes (Ecoli and Ypest, respectively), but that each pair of CRISPRs has similar dynamics. Strikingly, the major phylogenetic group B2 is almost devoid of CRISPRs. The majority of genomes analyzed lack Ypest cas genes and contain CRISPR3 with spacers matching Ypest cas genes. The analysis of relatedness between strains in terms of spacer repertoire and the MLST tree shows a pattern where closely related strains (MLST phylogenetic distance of <0.005, corresponding to at least hundreds of thousands of years) often exhibit identical CRISPRs, while more distantly related strains (MLST distance of >0.01) exhibit completely different CRISPRs. This suggests rare but radical turnover of spacers in CRISPRs rather than gradual change. We found no link between the presence, size, or content of CRISPRs and the lifestyle of the strains. Our data suggest that, within the E. coli species, CRISPRs do not have the expected characteristics of a classical immune system.
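The comparison of spacer repertoires against MLST distance described above can be sketched as follows. The MLST thresholds (<0.005 and >0.01) come from the text; the use of Jaccard similarity on spacer sets, the function names, and the toy spacer identifiers are assumptions made for illustration, not the measure reported in the study.

```python
def spacer_jaccard(spacers_a, spacers_b):
    """Jaccard similarity between two spacer repertoires (order ignored).

    Two empty repertoires are treated as identical; that is a convention
    chosen for this sketch, not something stated in the source.
    """
    a, b = set(spacers_a), set(spacers_b)
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def relatedness_class(mlst_distance):
    """Bin a pair of strains using the MLST thresholds quoted in the text."""
    if mlst_distance < 0.005:
        return "close"
    if mlst_distance > 0.01:
        return "distant"
    return "intermediate"


# Hypothetical strain pairs: (MLST distance, spacers of strain 1, spacers of strain 2).
PAIRS = [
    (0.001, ["sp1", "sp2", "sp3"], ["sp1", "sp2", "sp3"]),
    (0.020, ["sp1", "sp2", "sp3"], ["sp7", "sp8"]),
]

for distance, first, second in PAIRS:
    print(relatedness_class(distance), spacer_jaccard(first, second))
```

Under the pattern reported above, "close" pairs would tend to score near 1 and "distant" pairs near 0, with few pairs in between.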
Summary: Plasmids are key vectors of horizontal gene transfer and essential genetic engineering tools. They code for genes involved in many aspects of microbial biology, including detoxification, virulence, ecological interactions, and antibiotic resistance. While many studies have dissected the mechanisms of mobility in model plasmids, the identification and characterization of plasmid mobility from genome data are unexplored. By reviewing the available data and literature, we established a computational protocol to identify and classify conjugation and mobilization genetic modules in 1,730 plasmids. This allowed the accurate classification of proteobacterial conjugative or mobilizable systems in a combination of four mating pair formation and six relaxase families. The available evidence suggests that half of the plasmids are nonmobilizable and that half of the remaining plasmids are conjugative. Some conjugative systems are much more abundant than others and preferably associated with some clades or plasmid sizes. Most very large plasmids are nonmobilizable, with evidence of ongoing domestication into secondary chromosomes. The evolution of conjugation elements shows ancient divergence between mobility systems, with relaxases and type IV coupling proteins (T4CPs) often following separate paths from type IV secretion systems. Phylogenetic patterns of mobility proteins are consistent with the phylogeny of the host prokaryotes, suggesting that plasmid mobility is in general circumscribed within large clades. Our survey suggests the existence of unsuspected new relaxases in archaea and new conjugation systems in cyanobacteria and actinobacteria. Few genes, e.g., T4CPs, relaxases, and VirB4, are at the core of plasmid conjugation, and together with accessory genes, they have evolved into specific systems adapted to specific physiological and ecological contexts.
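A mobility classification built on the core genes named above (relaxase, T4CP, VirB4) can be sketched as a simple decision rule. The rule below is an assumption made for illustration: the actual protocol relies on searches with protein profiles and handles more cases than three booleans can express. The function names and the toy plasmid table are hypothetical.

```python
from collections import Counter


def classify_plasmid(has_relaxase: bool, has_t4cp: bool, has_virb4: bool) -> str:
    """Assumed rule: no relaxase means nonmobilizable; a relaxase together
    with a T4CP and VirB4 means conjugative; a relaxase alone (or with an
    incomplete transfer apparatus) means mobilizable."""
    if not has_relaxase:
        return "nonmobilizable"
    if has_t4cp and has_virb4:
        return "conjugative"
    return "mobilizable"


def class_fractions(calls):
    """Fraction of plasmids falling in each mobility class."""
    counts = Counter(calls)
    total = sum(counts.values())
    return {name: count / total for name, count in counts.items()}


# Hypothetical gene-presence calls: (relaxase, T4CP, VirB4) for four plasmids.
TOY_PLASMIDS = {
    "pToy1": (True, True, True),
    "pToy2": (True, False, False),
    "pToy3": (False, False, False),
    "pToy4": (False, True, False),
}

calls = [classify_plasmid(*genes) for genes in TOY_PLASMIDS.values()]
print(calls)
print(class_fractions(calls))
```

The toy table is constructed to mirror the proportions stated in the text (half nonmobilizable, and half of the remainder conjugative), which correspond to roughly one quarter conjugative and one quarter mobilizable overall.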
Horizontal gene transfer shapes the genomes of prokaryotes by allowing rapid acquisition of novel adaptive functions. Conjugation allows the broadest range and the highest gene transfer input per transfer event. While conjugative plasmids have been studied for decades, the number and diversity of integrative conjugative elements (ICE) in prokaryotes remained unknown. We defined a large set of protein profiles of the conjugation machinery to scan over 1,000 genomes of prokaryotes. We found 682 putative conjugative systems among all major phylogenetic clades and showed that ICEs are the most abundant conjugative elements in prokaryotes. Nearly half of the genomes contain a type IV secretion system (T4SS), with larger genomes encoding more conjugative systems. Surprisingly, almost half of the chromosomal T4SS lack co-localized relaxases and, consequently, might be devoted to protein transport instead of conjugation. This class of elements is preponderant among small genomes, is less commonly associated with integrases, and is rarer in plasmids. ICEs and conjugative plasmids in proteobacteria have different preferences for each type of T4SS, but all types exist in both chromosomes and plasmids. Mobilizable elements outnumber self-conjugative elements in both ICEs and plasmids, which suggests an extensive use of T4SS in trans. Our evolutionary analysis indicates that switches of plasmids to and from ICEs were frequent and that extant elements began to differentiate only relatively recently. According to the present results, ICEs are the most abundant conjugative elements in practically all prokaryotic clades and might be far more frequently domesticated into non-conjugative protein transport systems than previously thought.
While conjugative plasmids and ICEs have different means of genomic stabilization, their mechanisms of mobility by conjugation show strikingly conserved patterns, arguing for a unitary view of conjugation in shaping the genomes of prokaryotes by horizontal gene transfer.
Some mobile genetic elements spread genetic information horizontally between prokaryotes by conjugation, a mechanism by which DNA is transferred directly from one cell to the other. Among the processes allowing genetic transfer between cells, conjugation is the one allowing the simultaneous transfer of larger amounts of DNA and between the least related cells. As such, conjugative systems are key players in horizontal transfer, including the transfer of antibiotic resistance to and between many human pathogens. Conjugative systems are encoded both in plasmids and in chromosomes. The latter are called Integrative Conjugative Elements (ICE); and their number, identity, and mechanism of conjugation were poorly known. We have developed an approach to identify and characterize these elements and found more ICEs than conjugative plasmids in genomes. While both ICEs and plasmids use similar conjugative systems, there are remarkable preferences for some systems in some elements. Our evolutionary analysis shows that plasmid conjugative systems have often given rise to ICEs and vice versa. Therefore, ICEs and conjugative plasmids should be regarded as one and the same, the differences in their means of existence in cells probably the result of different requirements for stabilization and/or transmissibility of the genetic information they contain.
Gene duplication followed by neo- or sub-functionalization deeply impacts the evolution of protein families and is regarded as the main source of adaptive functional novelty in eukaryotes. While there is ample evidence of adaptive gene duplication in prokaryotes, it is not clear whether duplication outweighs the contribution of horizontal gene transfer in the expansion of protein families. We analyzed closely related prokaryote strains or species with small genomes (Helicobacter, Neisseria, Streptococcus, Sulfolobus), average-sized genomes (Bacillus, Enterobacteriaceae), and large genomes (Pseudomonas, Bradyrhizobiaceae) to untangle the effects of duplication and horizontal transfer. After removing the effects of transposable elements and phages, we show that the vast majority of expansions of protein families are due to transfer, even among large genomes. Transferred genes (xenologs) persist longer in prokaryotic lineages, possibly due to a higher/longer adaptive role. On the other hand, duplicated genes (paralogs) are expressed more, and, when persistent, they evolve slower. This suggests that gene transfer and gene duplication have very different roles in shaping the evolution of biological systems: transfer allows the acquisition of new functions and duplication leads to higher gene dosage. Accordingly, we show that paralogs share most protein–protein interactions and genetic regulators, whereas xenologs share very few of them. Prokaryotes invented most of life's biochemical diversity. Therefore, the study of the evolution of biological systems should explicitly account for the predominant role of horizontal gene transfer in the diversification of protein families.
Prokaryotes can be found in the most diverse and severe ecological niches of the planet. Their rapid adaptation is, in part, the result of the ability to acquire genetic information horizontally. This means that prokaryotes utilize two major paths to expand their repertoire of protein families: they can duplicate a pre-existing gene or acquire it by horizontal transfer. In this study, we track family expansions among closely related strains of prokaryotic species. We find that the majority of gene expansions arise via transfer, not via duplication. Additionally, we find that duplicate genes tend to be more transient and evolve slower than transferred ones, highlighting different roles with respect to adaptation and evolution. These results suggest that prevailing theories aimed at understanding the evolution of biological systems grounded on gene duplication might be poorly fit to explain the evolution of prokaryotic systems, which include the vast majority of life's biochemical diversity.
Prokaryotes thrive in spite of the vast number and diversity of their viruses. This partly results from the evolution of mechanisms to inactivate or silence the action of exogenous DNA. Among these, Clustered Regularly Interspaced Short Palindromic Repeats (CRISPR) are unique in providing adaptive immunity against elements with high local resemblance to genomes of previously infecting agents. Here, we analyze the CRISPR loci of 51 complete genomes of Escherichia and Salmonella. CRISPR occur in two pairs of loci in Escherichia and in a single pair in Salmonella, each pair showing a similar turnover rate, repeat sequence and putative linkage to a common set of cas genes. Yet, phylogeny shows that CRISPR and associated cas genes have different evolutionary histories, the latter being frequently exchanged or lost. In our set, one CRISPR pair seems specialized in plasmids, often matching genes coding for the replication, conjugation and antirestriction machinery. Strikingly, this pair also matches the cognate cas genes, in which case these genes are absent. The unexpectedly high conservation of this anti-CRISPR suggests selection to counteract the invasion of mobile elements containing functional CRISPR/cas systems. There are few spacers in most CRISPR, which rarely match genomes of known phages. Furthermore, we found that strains that diverged less than 250 thousand years ago show virtually identical CRISPR. The lack of congruence between cas, CRISPR and the species phylogeny and the slow pace of CRISPR change make CRISPR poor epidemiological markers in enterobacteria. All these observations are at odds with the expectedly abundant and dynamic repertoire of spacers in an immune system aiming at protecting bacteria from phages. Since we observe purifying selection for the maintenance of CRISPR, these results suggest that alternative evolutionary roles for CRISPR remain to be uncovered.
Microbial minimal generation times range from a few minutes to several weeks. They are evolutionarily determined by variables such as environment stability, nutrient availability, and community diversity. Selection for fast growth adaptively imprints genomes, resulting in gene amplification, adapted chromosomal organization, and biased codon usage. We found that these growth-related traits in 214 species of bacteria and archaea are highly correlated, suggesting they all result from growth optimization. While modeling their association with maximal growth rates in view of synthetic biology applications, we observed that codon usage biases are better correlates of growth rates than any other trait, including rRNA copy number. Systematic deviations from our model reveal two distinct evolutionary processes. First, genome organization shows more evolutionary inertia than growth rates. This results in over-representation of growth-related traits in fast degrading genomes. Second, selection for these traits depends on optimal growth temperature: for similar generation times purifying selection is stronger in psychrophiles, intermediate in mesophiles, and lower in thermophiles. Using this information, we created a predictor of maximal growth rate adapted to small genome fragments. We applied it to three metagenomic environmental samples to show that a transiently rich environment, such as the human gut, selects for fast-growers, that a toxic environment, such as the acid mine biofilm, selects for low growth rates, whereas a diverse environment, such as the soil, shows all ranges of growth rates. We also demonstrate that microbial colonizers of babies' guts grow faster than the stabilized gut communities of human adults. In conclusion, we show that one can predict maximal growth rates from sequence data alone, and we propose that such information can be used to facilitate the manipulation of generation times.
Our predictor allows inferring growth rates in the vast majority of uncultivable prokaryotes and paves the way to the understanding of community dynamics from metagenomic data.
Microbial minimal generation times vary from a few minutes to several weeks. The reasons for this disparity have been thought to lie in different life-history strategies: fast-growing microbes grow extremely fast in rich media, but are less capable of dealing with stress and/or poor nutrient conditions. Prokaryotes have evolved a set of genomic traits to grow fast, including biased codon usage and transient or permanent gene multiplication for dosage effects. Here, we studied the relative role of these traits and show they can be used to predict minimal generation times from the genomic data of the vast majority of microbes that cannot be cultivated. We show that this inference can also be made with incomplete genomes and thus be applied to metagenomic data to test hypotheses about the biomass productivity of biotopes and the evolution of microbiota in the human gut after birth. Our results also allow a better understanding of the co-evolution between growth rates and genomic traits and how they can be manipulated in synthetic biology. Growth rates have been a key variable in microbial physiology studies in the last century, and we show how intimately they are linked with genome organization and prokaryotic ecology.
Microbes engage in a remarkable array of cooperative behaviors, secreting shared proteins that are essential for foraging, shelter, microbial warfare, and virulence. These proteins are costly, rendering populations of cooperators vulnerable to exploitation by nonproducing cheaters arising by gene loss or migration. In such conditions, how can cooperation persist?
Our model predicts that differential gene mobility drives intragenomic variation in investment in cooperative traits. More mobile loci generate stronger among-individual genetic correlations at these loci (higher relatedness) and thereby allow the maintenance of more cooperative traits via kin selection. By analyzing 21 Escherichia genomes, we confirm that genes coding for secreted proteins (the secretome) are very frequently lost and gained and are associated with mobile elements. We show that homologs of the secretome are overrepresented among human gut metagenomic samples, consistent with increased relatedness at secretome loci across multiple species. The biosynthetic cost of secreted proteins is shown to be under intense selective pressure, even more than for highly expressed proteins, consistent with a cost of cooperation driving social dilemmas. Finally, we demonstrate that mobile elements are in conflict with their chromosomal hosts over the chimeric ensemble's social strategy, with mobile elements enforcing cooperation on their otherwise selfish hosts via the cotransfer of secretome genes with “mafia strategy” addictive systems (toxin-antitoxin and restriction-modification).
Our analysis matches the predictions of our model, suggesting that horizontal transfer promotes cooperation, as transmission increases local genetic relatedness at mobile loci and enforces cooperation on the resident genes. As a consequence, horizontal transfer promoted by agents such as plasmids, phages, or integrons drives microbial cooperation.
Oxygen is not only one of life's essential elements but also a source of protein damage, mutagenesis, and ageing. Many proteome adaptations have been proposed to tackle such stresses and we assessed them using comparative genomics in a phylogenetic context. First, we find that aerobiosis is a trait with important phylogenetic inertia but that oxygen content in proteins is not. Instead, oxygen content is close to the expected values given the nucleotide composition. Accordingly, we find no evidence of oxygen being a scarce resource for protein synthesis even among anaerobes. Second, we searched for counterselection of amino acids more prone to oxidation among aerobes. Only cysteine follows the expected trend, whereas tryptophan follows the inverse one. When analyzing composition in the context of protein structures and residue accessibility, we find that all oxidizable residues are avoided at the surface of proteins. Yet, there is no difference between aerobes and anaerobes in this respect, and the effect might be explained by the hydrophobicity of these residues. Third, we revisited the hypothesis that atmospheric enrichment in molecular oxygen led to the development of the communication capabilities of eukaryotes. With a larger data set and adequate controls, we confirm the trend of longer oxygen-rich outer domains in transmembrane proteins of eukaryotes. Yet, we find no significant association between oxygen concentration in the environment and this trait within prokaryotes, suggesting that this difference is clade specific and independent of oxygen availability. We find that genes involved in cellular responses to oxygen are much more frequent among aerobes, and we suggest that they erase most expected differences in terms of proteome composition between organisms facing high and low oxygen concentrations.
oxidative stress; cysteine; protein evolution; hydrophobicity; evolution
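As an illustration of what "oxygen content in proteins" could mean in practice, the sketch below counts oxygen atoms in amino-acid side chains and averages them per residue. The abstract does not define its measure, so this definition, the function name `mean_side_chain_oxygen`, and the choice to ignore backbone oxygens are assumptions made only for this example.

```python
# Oxygen atoms in each amino-acid side chain (backbone atoms excluded).
# Residues not listed here have no side-chain oxygen.
SIDE_CHAIN_OXYGENS = {"D": 2, "E": 2, "N": 1, "Q": 1, "S": 1, "T": 1, "Y": 1}


def mean_side_chain_oxygen(protein):
    """Return the mean number of side-chain oxygen atoms per residue.

    Hypothetical helper: one plausible way to summarize oxygen content,
    not necessarily the measure used in the study.
    """
    residues = [aa for aa in protein.upper() if aa.isalpha()]
    if not residues:
        raise ValueError("empty sequence")
    total = sum(SIDE_CHAIN_OXYGENS.get(aa, 0) for aa in residues)
    return total / len(residues)
```

Comparing this value with the one expected from a genome's nucleotide composition would need a null model of amino-acid usage, which is outside the scope of this sketch.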
Summary: Intragenic duplications of genetic material have important biological roles because of their protein sequence and structural consequences. We developed Swelfe to find internal repeats at three levels. Swelfe quickly identifies statistically significant internal repeats in DNA and amino acid sequences and in 3D structures using dynamic programming. The associated web server also shows the relationships between repeats at each level and facilitates visualization of the results.
Supplementary information: Supplementary data are available at Bioinformatics online.
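The idea of finding internal repeats by dynamic programming can be sketched as a local alignment of a sequence against itself, restricted to the cells above the main diagonal so that the trivial self-match is excluded. This is a minimal illustration and not Swelfe's implementation: the scoring values, the function name `best_internal_repeat`, and the absence of any significance test are assumptions, and the two aligned copies are not prevented from overlapping.

```python
def best_internal_repeat(seq, match=2, mismatch=-1, gap=-2):
    """Return (score, end_i, end_j) of the best local self-alignment.

    end_i and end_j are 1-based end positions of the two repeat copies,
    with end_i < end_j. Scores are illustrative, not a real substitution
    matrix.
    """
    n = len(seq)
    # H[i][j]: best score of a local alignment ending at seq[i-1], seq[j-1].
    # Only cells with i < j are filled; the rest stay at 0.
    H = [[0] * (n + 1) for _ in range(n + 1)]
    best = (0, 0, 0)
    for i in range(1, n + 1):
        for j in range(i + 1, n + 1):
            s = match if seq[i - 1] == seq[j - 1] else mismatch
            H[i][j] = max(
                0,                    # start a new local alignment
                H[i - 1][j - 1] + s,  # extend with a match or mismatch
                H[i - 1][j] + gap,    # gap in the second copy
                H[i][j - 1] + gap,    # gap in the first copy
            )
            if H[i][j] > best[0]:
                best = (H[i][j], i, j)
    return best
```

For a toy sequence such as `"MKTAYGGMKTAY"`, the best-scoring cell marks the end of the two `MKTAY` copies. A traceback from that cell would recover the full repeat boundaries.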
Mycoplasma genitalium is associated with reproductive tract disease in women and may persist in the lower genital tract for months, potentially increasing the risk of upper tract infection and transmission to uninfected partners. Despite its exceptionally small genome (580 kb), approximately 4% is composed of repeated elements known as MgPar sequences (MgPa repeats) based on their homology to the mgpB gene that encodes the immunodominant MgPa adhesin protein. The presence of these MgPar sequences, as well as mgpB variability between M. genitalium strains, suggests that mgpB and MgPar sequences recombine to produce variant MgPa proteins. To examine the extent and generation of diversity within single strains of the organism, we examined mgpB variation within M. genitalium strain G-37 and observed sequence heterogeneity that could be explained by recombination between the mgpB expression site and putative donor MgPar sequences. Similarly, we analyzed mgpB sequences from cervical specimens from a persistently infected woman (21 months) and identified 17 different mgpB variants within a single infecting M. genitalium strain, confirming that mgpB heterogeneity occurs over the course of a natural infection. These observations support the hypothesis that recombination occurs between the mgpB gene and MgPar sequences and that the resulting antigenically distinct MgPa variants may contribute to immune evasion and persistence of infection.
We present MAGIC, an integrative and accurate method for comparative genome mapping. Our method consists of two phases: preprocessing for identifying “maximal similar segments,” and mapping for clustering and classifying these segments. MAGIC's main novelty lies in its biologically intuitive clustering approach, which aims towards both calculating reorder-free segments and identifying orthologous segments. In the process, MAGIC efficiently handles ambiguities resulting from duplications that occurred before the speciation of the considered organisms from their most recent common ancestor. We demonstrate both MAGIC's robustness and scalability: the former is asserted with respect to its initial input and with respect to its parameters' values. The latter is asserted by applying MAGIC to distantly related organisms and to large genomes. We compare MAGIC to other comparative mapping methods and provide detailed analysis of the differences between them. Our improvements allow a comprehensive study of the diversity of genetic repertoires resulting from large-scale mutations, such as indels and duplications, including explicitly transposable and phagic elements. The strength of our method is demonstrated by detailed statistics computed for each type of these large-scale mutations. MAGIC enabled us to conduct a comprehensive analysis of the different forces shaping prokaryotic genomes from different clades, and to quantify the importance of novel gene content introduced by horizontal gene transfer relative to gene duplication in bacterial genome evolution. We use these results to investigate the breakpoint distribution in several prokaryotic genomes.
Comparative genomics is an important discipline with applications in evolutionary, genetic, and genome rearrangement studies. When comparing genomes, one is usually interested in investigating the relation between the genomic segments to establish their evolutionary origin: are the segments orthologous, and hence inherited from their most recent common ancestor? Are they paralogs, and hence duplicated from an ancestral segment? Did the segments undergo reordering? Were the segments deleted or inserted and—if so—how (insertion sequence, prophage, horizontal gene transfer)?
In this paper, Swidan et al. present MAGIC, a new approach for comparative genome mapping. The main novelty of this approach is the biologically intuitive clustering step, which aims towards both calculating reorder-free segments and identifying orthologous segments. The authors demonstrate MAGIC's robustness, relative to both its initial input and to its parameters' values. MAGIC's scalability is demonstrated by running it on distantly related organisms and on large genomes. In addition, Swidan et al. provide a detailed analysis of the differences between MAGIC and other comparative mapping methods.
Applying MAGIC to several prokaryotic pairs enabled the authors to address the aforementioned questions and to quantitatively study the different evolutionary forces shaping the prokaryotic genome as well as to investigate their breakpoint distribution.
Hemiascomycete yeasts cover an evolutionary span comparable to that of the entire phylum of chordates. Since this group currently contains the largest number of complete genome sequences, it presents unique opportunities to understand the evolution of genome organization in eukaryotes. We inferred rates of genome instability on all branches of a phylogenetic tree for 11 species and calculated species-specific rates of genome rearrangements. We characterized all inversion events that occurred within synteny blocks between six representatives of the different lineages. We show that the rates of macro- and microrearrangements of gene order are correlated within individual lineages but are highly variable across different lineages. The most unstable genomes correspond to the pathogenic yeasts Candida albicans and Candida glabrata. Chromosomal maps have been intensively shuffled by numerous interchromosomal rearrangements, even between species that have retained a very high physical fraction of their genomes within small synteny blocks. Despite this intensive reshuffling of gene positions, essential genes, which cluster in low recombination regions in the genome of Saccharomyces cerevisiae, tend to remain syntenic during evolution. This work reveals that the high plasticity of eukaryotic genomes results from rearrangement rates that vary not only between lineages but also at different evolutionary times within a given lineage.
The yeast Saccharomyces cerevisiae has proved to be a very powerful model organism for deciphering the molecular functioning of our cells. It was also the first eukaryote (the domain of life that includes humans) whose genome was completely sequenced, in 1996. There are hundreds of species of yeast covering a tremendous genetic diversity. Almost ten years after the release of the first complete eukaryotic genome sequence, yeasts are still at the forefront of the field of genomics as they represent the monophyletic group of eukaryotes for which the largest number of complete genome sequences has been unveiled. The comparative analysis of their organization now provides an exquisite tool to dissect the mechanistic underpinnings of the process of genome evolution. This study reveals the extraordinary plasticity of eukaryotic genomes. It also shows that genomes get rearranged at different rates, both between lineages and at different evolutionary times within a given lineage. Finally, in spite of their distant phylogenetic relationship, pathogenic yeasts such as the two main causative agents of human candidiasis, Candida albicans and Candida glabrata, harbor the most unstable genomes of all lineages.
Homologous recombination is a housekeeping process involved in the maintenance of chromosome integrity and generation of genetic variability. Although detailed biochemical studies have described the mechanism of action of its components in model organisms, there is no recent extensive assessment of this knowledge, using comparative genomics and taking advantage of available experimental data on recombination. Using comparative genomics, we assessed the diversity of recombination processes among bacteria, and simulations suggest that we missed very few homologs. The work included the identification of orthologs and the analysis of their evolutionary history and genomic context. Some genes, for proteins such as RecA, the resolvases, and RecR, were found to be nearly ubiquitous, suggesting that the large majority of bacterial genomes are capable of homologous recombination. Yet many genomes show incomplete sets of presynaptic systems, with RecFOR being more frequent than RecBCD/AddAB. There is a significant pattern of co-occurrence between these systems and antirecombinant proteins such as the ones of mismatch repair and SbcB, but no significant association with nonhomologous end joining, which seems rare in bacteria. Surprisingly, a large number of genomes in which homologous recombination has been reported lack many of the enzymes involved in the presynaptic systems. The lack of obvious correlation between the presence of characterized presynaptic genes and experimental data on the frequency of recombination suggests the existence of still-unknown presynaptic mechanisms in bacteria. It also indicates that, at the moment, the assessment of the intrinsic stability or recombination isolation of bacteria in most cases cannot be inferred from the identification of known recombination proteins in the genomes.
Genomes evolve mostly by modifications involving large pieces of genetic material (DNA). Exchanges of chromosome pieces between different organisms as well as intragenomic movements of DNA regions are the result of a process named homologous recombination. The central actor of this process, the RecA protein, is amazingly conserved from bacteria to humans. In addition to its role in the generation of genetic variability, homologous recombination is also the guardian of genome integrity, as it acts to repair DNA damage. RecA-catalyzed DNA exchange (synapse) is facilitated by the action of presynaptic enzymes and completed by postsynaptic enzymes (resolvases). In addition, some enzymes counteract RecA. Here, the researchers assess the diversity of recombination proteins among 117 different bacterial species. They find that resolvases are nearly as ubiquitous and as well conserved at the sequence level as RecA. This suggests that the large majority of bacterial genomes are capable of homologous recombination. Presynaptic systems are less ubiquitous, and there is no obvious correlation between their presence and experimental data on the frequency of recombination. However, there is a significant pattern of co-occurrence between these systems and antirecombinant proteins.
Telomerase replicates chromosome ends, a function necessary for maintaining genome integrity. We have identified the gene that encodes the catalytic reverse transcriptase (RT) component of this enzyme in the malaria parasite Plasmodium falciparum (PfTERT) as well as the orthologous genes from two rodent and one simian malaria species. PfTERT is predicted to encode a basic protein that contains the major sequence motifs previously identified in known telomerase RTs (TERTs). At ∼2500 amino acids, PfTERT is three times larger than other characterized TERTs. We observed remarkable sequence diversity between TERT proteins of different Plasmodial species, with conserved domains alternating with hypervariable regions. Immunofluorescence analysis revealed that PfTERT is expressed in asexual blood stage parasites that have begun DNA synthesis. Surprisingly, rather than at telomere clusters, PfTERT typically localizes into a discrete nuclear compartment. We further demonstrate that this compartment is associated with the nucleolus, hereby defined for the first time in P. falciparum.
In Escherichia coli and Bacillus subtilis, essentiality, not expressivity, drives the distribution of genes between the two replicating strands. Although essential genes tend to be coded in the leading replicating strand, the underlying selective constraints and the evolutionary extent of these findings have still not been subject to comparative studies. Here, we extend our previous analysis to the genomes of low G + C firmicutes and γ-proteobacteria, and in a second step to all sequenced bacterial genomes. The inference of essentiality by homology allows us to show that essential genes are much more frequent in the leading strand than other genes, even when compared with non-essential highly expressed genes. Smaller biases were found in the genomes of obligatory intracellular bacteria, for which the assignment of essentiality by homology from fast growing free-living bacteria is most problematic. Cross-comparisons used to assess potential errors in the assignment of essentiality by homology revealed that, in most cases, variations in the assignment criteria have little influence on the overall results. Essential genes tend to be more conserved in the leading strand than average genes, which is consistent with selection for this positioning and may impose a strong constraint on chromosomal rearrangements. These results indicate that essentiality plays a fundamental role in the distribution of genes in most bacterial genomes.
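The comparison of strand distributions described above can be sketched as follows, assuming a single circular chromosome with known origin and terminus coordinates. A gene is counted as leading-strand when it is transcribed in the direction of replication fork movement. The function names, the coordinate convention, and the toy gene lists are hypothetical and serve only to illustrate the calculation.

```python
def on_leading_strand(start, strand, ori, ter, genome_len):
    """True if a gene is co-oriented with the replication fork.

    Assumes a circular chromosome replicated bidirectionally from `ori`
    to `ter`. On the replichore running clockwise (increasing coordinates)
    from ori, '+' strand genes are leading; on the other replichore,
    '-' strand genes are leading.
    """
    gene_offset = (start - ori) % genome_len
    ter_offset = (ter - ori) % genome_len
    clockwise_replichore = gene_offset < ter_offset
    return (strand == "+") == clockwise_replichore


def leading_fraction(genes, ori, ter, genome_len):
    """Fraction of (start, strand) genes located on the leading strand."""
    flags = [on_leading_strand(s, st, ori, ter, genome_len) for s, st in genes]
    return sum(flags) / len(flags)


# Toy data (invented): (start coordinate, strand) on a 1000-bp chromosome.
essential = [(100, "+"), (300, "+"), (700, "-"), (900, "+")]
others = [(150, "+"), (250, "-"), (650, "-"), (800, "+")]

print(leading_fraction(essential, ori=0, ter=500, genome_len=1000))
print(leading_fraction(others, ori=0, ter=500, genome_len=1000))
```

With real annotations, the two fractions would typically be compared with a contingency-table test. Here they only show how the bias is tallied.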