Summary: Plasmids are key vectors of horizontal gene transfer and essential genetic engineering tools. They code for genes involved in many aspects of microbial biology, including detoxication, virulence, ecological interactions, and antibiotic resistance. While many studies have dissected the mechanisms of mobility in model plasmids, the identification and characterization of plasmid mobility from genome data remain unexplored. By reviewing the available data and literature, we established a computational protocol to identify and classify conjugation and mobilization genetic modules in 1,730 plasmids. This allowed the accurate classification of proteobacterial conjugative or mobilizable systems in a combination of four mating pair formation and six relaxase families. The available evidence suggests that half of the plasmids are nonmobilizable and that half of the remaining plasmids are conjugative. Some conjugative systems are much more abundant than others and are preferentially associated with some clades or plasmid sizes. Most very large plasmids are nonmobilizable, with evidence of ongoing domestication into secondary chromosomes. The evolution of conjugation elements shows ancient divergence between mobility systems, with relaxases and type IV coupling proteins (T4CPs) often following separate paths from type IV secretion systems. Phylogenetic patterns of mobility proteins are consistent with the phylogeny of the host prokaryotes, suggesting that plasmid mobility is in general circumscribed within large clades. Our survey suggests the existence of unsuspected new relaxases in archaea and new conjugation systems in cyanobacteria and actinobacteria. Few genes, e.g., T4CPs, relaxases, and VirB4, are at the core of plasmid conjugation, and together with accessory genes, they have evolved into specific systems adapted to specific physiological and ecological contexts.
Plasmids are, along with phages and integrative conjugative elements, the key vectors of horizontal gene transfer and essential tools in genetic engineering. They often code for genes involved in detoxication, virulence, ecological interactions, and antibiotic resistance. Hence, an understanding of plasmid mobility is essential to an understanding of the evolution of these important bacterial traits, which often bear on human health or well-being. However, while the last decades have brought remarkable insight into plasmid mobility mechanisms in model systems (5, 36, 50, 54, 55, 123; reviewed in references 52 and 82), the identification and characterization of plasmid mobility on a global scale remain unexplored. Such a global approach is needed for several reasons. First, if we want to give a systems biology answer to the problems of plasmid dissemination, e.g., the dissemination of multiple-drug-resistant bacteria in hospital or farm environments or the evolution of catabolic plasmids in contaminated environments, we must understand plasmid mobility. A global understanding of plasmid mobility can help us control the dissemination of these troublesome genes and thus help curb the recent increase in human mortality caused by infectious diseases in advanced countries. By the same token, it can improve the efficiency of bioremediation efforts or the biological fight against other types of damaging bacteria. Second, microbial ecology is undergoing a revolution caused by the availability of metagenomic approaches that allow the sampling of the genetic diversity of microbial communities. Knowing how information flows in such communities is essential to interpreting how communities respond to changes. Third, the ability of plasmids to move between different hosts is of technical and industrial importance.
The identification of the diversity of mobility mechanisms might allow the development of new, more-efficient, and better-adapted vectors for genetic engineering and for the release of genetically modified microorganisms for bioremediation, pest control, and other applications. Most prokaryotes remain genetically intractable, and an understanding of the natural mechanisms of gene mobility is bound to allow the creation of new tools. Finally, conjugation is a secretion system that must adapt to cell physiology. An understanding of its diversity might shed light on how existing variants of this secretion mechanism are adapted to particular cellular envelopes or environments.
In this review we make use of the abundant available genomic data to extract a few general concepts that, we hope, will help our understanding of plasmid mobility. To carry out such an analysis, we first established a computational protocol to identify conjugation and mobilization genetic modules in 1,730 plasmids and used these data to establish a plasmid classification system. This allowed an accurate classification of proteobacterial conjugative or mobilizable systems in a combination of four mating pair formation (MPF) and six mobilization families (the term family is used here as it refers to protein families, that is, a set of proteins that are related in sequence and share a biological function). Few genes, e.g., those coding for conjugative coupling proteins, relaxases, and VirB4 proteins, are at the core of plasmid conjugation. Together with several auxiliary genes, they have evolved into systems with specific adaptations to the cell physiology and to ecological strategies. Second, we used this inventory of plasmid mobility and family-specific genes to characterize systems in other prokaryotic clades. We found that, globally, one-fourth of the plasmids are conjugative, and as many are mobilizable. Half of all plasmids are classed as being nonmobilizable. Third, evolutionary analysis allowed us to trace the evolution of conjugation elements and showed an ancient divergence between mobility systems, with relaxases and type IV coupling proteins (T4CPs) often following separate paths from type IV secretion systems (T4SSs). Phylogenetic patterns of mobility proteins are consistent with the phylogeny of the host prokaryotes, suggesting that plasmid mobility is in general circumscribed within large clades. Surprisingly, most very large plasmids are nonmobilizable. We have made no attempt to change the naming of GenBank replicons. 
Therefore, we used all replicons named “plasmid” and none named “chromosome.” There is no consensual distinction between plasmids and chromosomes. We therefore review the claims that many very large plasmids are becoming secondary chromosomes. Indeed, we find that these plasmids tend to be nonmobilizable and contain appreciable amounts of essential genes. The last section of this article discusses outstanding issues in comparative genomics of plasmids in relation to the understanding of their mobility.
Genetic information flows at high rates among prokaryotes, leading bacteria and archaea to have a small core genome that is conserved within a species and a large pangenome that is highly variable (137). Widespread horizontal gene transfer has profound evolutionary implications (35, 42, 90, 109). First, it allows homologous recombination between closely related strains or species in a process resembling eukaryotic sex (55, 142). Second, it leads to the integration of new genetic information, creating large functional leaps that allow fast adaptation to new environments or to stressful conditions (62). Third, gene mobility has been proposed to drive microbial cooperative processes (105). Of the three classical mechanisms of horizontal transfer (transformation, transduction, and conjugation), conjugation is thought to be quantitatively the most important (66). This is because phages have restricted host ranges and small cargo regions, whereas some plasmids can conjugate between distantly related organisms, including conjugation from bacteria to eukaryotes (10, 69). Plasmids, along with integrative conjugative elements (ICEs), are the major players in conjugation processes. Many of the genes allowing bacteria to metabolize toxic organic compounds such as antibiotics are carried by plasmids (9, 106, 138). Plasmids also often code for information essential for the interaction of bacteria with multicellular eukaryotes, including nitrogen fixation by rhizobia (95), plant cell manipulation by Agrobacterium species (59), and virulence in Shigella species (21) and many other human pathogens. Plasmids are also molecular biology workhorses whose mobility opened up the possibility of genetic manipulation (28).
Mobility is an essential part of plasmid fitness. It is also key to understanding the epidemiology of plasmid-carried traits such as virulence and antibiotic resistance. Accordingly, two functions are deemed essential for plasmid survival: DNA replication and horizontal spread. The latter may occur by conjugation if a plasmid carries two sets of genes. The set of mobility (MOB) genes is essential and allows conjugative DNA processing (the MOB genes were also called Dtr genes, for DNA-transfer replication). In addition, a membrane-associated mating pair formation (MPF) complex, which is a form of type IV secretion system (T4SS), provides the mating channel. A plasmid that codes for its own set of MPF genes is called self-transmissible or conjugative. If it uses an MPF of another genetic element present in the cell, it is called mobilizable. Plasmids that are neither conjugative nor mobilizable are called nonmobilizable; they spread by natural transformation or by transduction. Hence, plasmids can be classified into three categories according to mobility: conjugative, mobilizable, and nonmobilizable.
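The three-way scheme above reduces to a simple rule on two observations per plasmid. The sketch below is a hypothetical illustration in Python, not the authors' pipeline: it assumes that the presence of a relaxase gene (MOB) and of a complete MPF module have already been determined, for example by homology searches, and it treats MPF genes without a relaxase as nonmobilizable because the relaxase is required for conjugative DNA processing.

```python
from enum import Enum


class Mobility(Enum):
    CONJUGATIVE = "conjugative"
    MOBILIZABLE = "mobilizable"
    NONMOBILIZABLE = "nonmobilizable"


def classify_mobility(has_relaxase: bool, has_mpf: bool) -> Mobility:
    """Assign a plasmid to one of the three mobility categories.

    has_relaxase: the plasmid encodes a relaxase (MOB module).
    has_mpf: the plasmid encodes its own mating pair formation (T4SS) module.
    """
    if has_relaxase and has_mpf:
        # MOB plus MPF: self-transmissible.
        return Mobility.CONJUGATIVE
    if has_relaxase:
        # MOB only: must use the MPF of a coresident element.
        return Mobility.MOBILIZABLE
    # No relaxase: cannot be processed for conjugative transfer, even if
    # (possibly degraded) MPF genes are present.
    return Mobility.NONMOBILIZABLE


print(classify_mobility(has_relaxase=True, has_mpf=False).value)  # prints "mobilizable"
```

The rule deliberately keys on the relaxase first, mirroring the working assumption used throughout this review that relaxase-containing plasmids are transmissible.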
The only protein component of the conjugative machinery that is common to all transmissible, i.e., conjugative or mobilizable, plasmids is the relaxase (Fig. 1). The relaxase is a key protein in conjugation, since it recognizes the origin of transfer (oriT), a short DNA sequence which is the only sequence required in cis for a plasmid to be conjugally transmissible. The relaxase catalyzes the initial and final stages in conjugation, that is, the initial cleavage of oriT in the donor, to ultimately produce the DNA strand that will be transferred, and the final ligation of the transported DNA in the recipient cell that reconstitutes the conjugated plasmid (see reference 36 for a recent review). Conjugative relaxases are structurally related to rolling-circle replication initiator proteins, and they catalyze similar biochemical reactions. However, they are easily distinguished because relaxase amino acid sequences are linear permutations of the replication initiator sequences, as discussed previously (57, 63). Mobilizable plasmids carry only the relaxosomal components oriT, a relaxase gene, and one or more nicking auxiliary proteins. On the other hand, conjugative plasmids carry all the machinery needed for self-transfer. This includes, besides the above-mentioned relaxosome components, the type IV coupling protein (T4CP) and the components of the mating channel that assemble a T4SS. The T4CP is involved in the connection between the relaxosome and the transport channel (11, 41, 94, 101). It is also thought to energize the process of DNA transport (135, 136). The conjugative mating channel is basically a protein secretion channel, which transports the relaxase protein bound to the DNA to be transferred (43, 58). According to the nomenclature of protein secretion mechanisms, it is a T4SS (54).
The phylogenetic relationship among relaxases has been traced, leading to a classification of conjugative systems into six MOB families: MOBF, MOBH, MOBQ, MOBC, MOBP, and MOBV (52, 57). This classification extends to the entire mobility region, which includes the nicking auxiliary proteins (145) and the T4CPs (57). There have been reports on the classification and phylogenetic relationships among NTPases (53, 112), yet little is known about the classification of conjugative T4SSs in prokaryotes, except for some specific clades such as Rickettsia (147) or for VirB4 of T4SSs involved in pathogenicity (53). While the manuscript was being finished, a review giving an extensive description of known T4SSs in prokaryotes, including many conjugative systems, was published (5). That analysis showed some common themes among T4SSs as well as an important diversity, and here we aim to quantify them.
No estimate exists of the proportion of plasmids that are conjugative, mobilizable, or nontransmissible. Even though there are over 1,700 complete plasmid genomes in GenBank, there is experimental evidence of transmissibility for just a small fraction of them (57). Since several proteins involved in conjugation have been extensively studied, we figured that it should be possible to formulate an educated hypothesis of plasmid mobility just by analyzing plasmid genomes. Thus, to carry out this review, we identified and classified key proteins in the two major modules (i.e., sets of functionally related genetic elements) involved in plasmid mobility: relaxases and T4SSs. By identifying relaxases, we obtained a list of putatively transmissible and nontransmissible plasmids. Throughout the following discussion, we will assume that plasmids containing relaxases are either mobilizable or conjugative. We then defined the major characteristics of T4SSs and identified them on plasmids, thereby separating conjugative from mobilizable plasmids. We did much of this survey in two steps. Initially, an intensive expert study allowed the identification and classification of plasmids. Using these results as guidelines, we created a set of methodologies to automate the analysis. Our results (see Table S1 in the supplemental material for the complete list of plasmids analyzed) suggest that the majority of plasmids are not transmissible and that many conjugative systems remain to be found in most bacterial clades.
By early 2009 there were 1,730 complete plasmid sequences in GenBank. According to their annotation files, these plasmids code for a total of 106,288 proteins. Some plasmids are very small, the smallest being Thermotoga petrophila RKU1 plasmid pRKU1, with 846 bp and carrying only the rep gene. Nine small plasmids carry no annotated gene. On the other hand, some other plasmids are very large, for example, 22 plasmids carrying more than 500 genes. The largest replicon marked as a plasmid, Ralstonia solanacearum GMI1000 plasmid pGMI1000MP, carrying 1,674 genes, is larger than 20% of the published bacterial chromosomes. It follows that the coding potential of plasmids is both large and diverse. The set of plasmids that we analyzed comes from two different sources, plasmid-sequencing projects (~62%), which are motivated by scientific interest in the plasmid itself, and microbial genome projects (~38%), which typically show a marginal interest in plasmids. The latter sample is thus less biased by the interests of the plasmid biology community. Both samples overrepresent culturable prokaryotes and often include strains that were extensively cultured before sequencing. It is unclear if this fact changes the structure of plasmids significantly. An analysis of Escherichia coli K-12 strains MG1655 and W3110, which diverged in the laboratory over 50 years ago, showed very few differences between these strains, suggesting that genomes are relatively stable in the laboratory in the absence of extensive transposition (68). One likely effect of extensive subculturing of bacteria is plasmid loss. This leads to an undersampling of the natural pool of plasmids.
Most plasmids have a description of the original source, even if this is sometimes not retrievable in a simple manner. According to this information, a few bacterial phyla contain most of the sequenced bacterial genomes. As shown in Fig. 2, almost half of the plasmids come from proteobacteria (46%) and especially from gammaproteobacteria (29%). Firmicutes, spirochetes, and actinobacteria make up most of the remaining data set. Overall, these four clades correspond to nearly 80% of all sequenced plasmids. Unsurprisingly, qualitatively similar, albeit weaker, biases are found among the completely sequenced prokaryotic chromosomes (Fig. 2). Firmicutes and gammaproteobacteria are extensively sampled because they include the plasmids that are best studied and also the ones most associated with the dissemination of antibiotic resistance among human-pathogenic bacteria. Notably, clades containing many large plasmids, such as alphaproteobacteria and cyanobacteria, have been sampled mostly thanks to microbial genome projects.
Larger plasmids are hosted by prokaryotes with larger chromosomes (Spearman's ρ correlation between chromosome and plasmid lengths of 0.34; P < 0.001). Within genome sequences, chromosomes have a higher coding density than plasmids (86% versus 75%; P < 0.001 by a Wilcoxon test). Surprisingly, larger plasmids have higher coding densities than do smaller ones (Spearman's ρ between gene density and plasmid size of 0.45; P < 0.001) (see Fig. S1 in the supplemental material). No such trend is found for prokaryotic chromosomes (ρ = −0.03; P = 0.37). To test if these results were due to poor annotations, we made automatic reannotations using Glimmer v3.02 (37). In one assay we used the plasmid sequence for the training set, whereas in the other we used the host chromosome. In both cases the extents of the correlation between plasmid size and gene density were similar or increased (data not shown). Hence, while larger plasmids have coding densities approaching those of chromosomes, small plasmids are less coding dense. This surprising result might be due to the inability of current annotation protocols to identify small protein-encoding genes and stable RNA genes. If so, small plasmids might have a higher proportion of these small genes. Alternatively, one might imagine that, below a certain size, the gene coding density decreases because some incompressible part of the plasmid is occupied by sequences governing the DNA structural scaffold or signals regulating replication or transmission. Since the automatic reannotation efforts did not significantly increase the number of coding sequences, we used the gene positions of the GenBank files in this assessment of plasmid mobility. The key proteins discussed in this work, relaxase, T4CP, and VirB4, are large, and therefore, their genes are unlikely to have passed unnoticed by annotation software. We have not attempted at this stage to extensively identify pseudogenes associated with plasmid mobility. 
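Both statistics used in this paragraph, coding density and Spearman's ρ, can be computed from GenBank gene coordinates. The sketch below, in plain Python with hypothetical toy inputs, assumes 1-based inclusive gene coordinates and ignores genes that wrap around the origin of a circular replicon. It computes Spearman's ρ as the Pearson correlation of tie-averaged ranks and returns only ρ, not the P value; a library routine such as scipy.stats.spearmanr would be the usual choice for the test itself.

```python
from typing import List, Sequence, Tuple


def coding_density(length: int, genes: List[Tuple[int, int]]) -> float:
    """Fraction of a replicon's bases covered by at least one gene.

    genes are 1-based inclusive (start, end) pairs with start <= end.
    Overlapping genes are merged so that shared bases are counted once.
    """
    covered = 0
    cur_start = cur_end = None
    for start, end in sorted(genes):
        if cur_end is None or start > cur_end:
            # Close the previous merged block and open a new one.
            if cur_end is not None:
                covered += cur_end - cur_start + 1
            cur_start, cur_end = start, end
        else:
            # Overlapping gene: extend the current block.
            cur_end = max(cur_end, end)
    if cur_end is not None:
        covered += cur_end - cur_start + 1
    return covered / length


def average_ranks(values: Sequence[float]) -> List[float]:
    """1-based ranks; tied values receive the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def spearman_rho(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman's rho as the Pearson correlation of the ranks."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = sum((a - mx) ** 2 for a in rx) ** 0.5
    sy = sum((b - my) ** 2 for b in ry) ** 0.5
    return cov / (sx * sy)


# Hypothetical toy data: plasmid sizes (bp) and their coding densities.
sizes = [2_000, 8_000, 45_000, 120_000, 600_000]
densities = [0.55, 0.70, 0.74, 0.82, 0.88]
print(round(spearman_rho(sizes, densities), 2))
```

Because ρ depends only on ranks, it captures the monotonic trend between size and density without assuming a linear relationship.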
However, we did make extensive analyses of pseudogenization events using tBLASTn for intergenic regions of plasmids carrying all genes expected from a conjugative plasmid except a key one, e.g., virD4 or virB4.
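Searching intergenic regions for remnants of a missing gene first requires the regions themselves. A minimal sketch follows. It assumes 1-based inclusive gene coordinates on a linear representation of the plasmid, so a gap spanning the origin of a circular plasmid is returned as two pieces, and it uses a hypothetical min_len filter. The resulting nucleotide segments would then serve as tBLASTn subjects for protein queries such as VirB4 or VirD4; the BLAST step itself is not shown.

```python
from typing import List, Tuple


def intergenic_regions(length: int, genes: List[Tuple[int, int]],
                       min_len: int = 1) -> List[Tuple[int, int]]:
    """Return 1-based inclusive (start, end) stretches not covered by any gene.

    min_len is a hypothetical cutoff discarding gaps too short to hold a
    recognizable relic of a large gene such as virB4.
    """
    gaps = []
    next_free = 1  # first base not yet covered by a gene
    for start, end in sorted(genes):
        if start > next_free:
            gaps.append((next_free, start - 1))
        next_free = max(next_free, end + 1)
    if next_free <= length:
        gaps.append((next_free, length))
    return [(s, e) for s, e in gaps if e - s + 1 >= min_len]


def extract(sequence: str, regions: List[Tuple[int, int]]) -> List[str]:
    """Slice the nucleotide sequence for each region (1-based inclusive)."""
    return [sequence[s - 1:e] for s, e in regions]


# Toy 20-bp "plasmid" with two genes; the gap becomes a tBLASTn subject.
seq = "ATGAAATAGCCCCCATGTAA"
regions = intergenic_regions(len(seq), [(1, 9), (15, 20)])
print(regions, extract(seq, regions))  # prints [(10, 14)] ['CCCCC']
```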
An expert classification of MOB families was carried out previously for a large set of plasmids (52, 57). In this review, we followed this classification methodology as a first step and extended it manually to an analysis of 1,000 plasmids, which were classified into six families and 31 subfamilies. We then used these results to automatically cluster relaxases into six families (MOBP, MOBF, MOBV, MOBQ, MOBH, and MOBC), as shown in Fig. 3 and explained in the legend of Fig. 3. All classes could be distinguished by using the automatic clustering procedure, with eight clusters accounting for 98.5% of all relaxases (see Fig. S2 in the supplemental material). However, some relaxases were in small clusters (e.g., two to three items). Changing the clustering parameters (i.e., the inflation parameter in the Markov cluster algorithm [MCL]) did show a lower-level structure, suggesting some heterogeneity in the classes and justifying their further division into subclasses (as attempted in reference 57). In this review we have not extended our automatic classification procedure to subclasses because many classes still contain few elements, precluding the establishment of reliable automatic classification. Among 1,730 plasmids, we found a total of 741 relaxases in 673 plasmids (Fig. 4). All families were found in the database, with MOBH (28 plasmids) and MOBC (24 plasmids) being the least represented, while MOBP was the most represented (273 plasmids). As shown previously (57), within each family there are subfamilies corresponding to phylogenetic clades that contain more members, e.g., MOBF12, which harbors relaxases of IncF plasmids; MOBP11, which groups relaxases of IncP plasmids; MOBP5, which contains mostly MOBHEN (ColE1-like) relaxases; MOBV1, which groups relaxases of plasmids belonging to several Inc groups from firmicutes; and MOBC2, producing aggregation substances.
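For readers unfamiliar with MCL, the sketch below shows its two alternating operations on a column-stochastic matrix: expansion (matrix squaring, i.e., a two-step random walk) and inflation (raising entries to the power of the inflation parameter and renormalizing), which strengthens strong edges and weakens weak ones. Higher inflation tends to yield finer clusters, consistent with the lower-level structure revealed when the parameter was changed. This is a toy pure-Python version under our own assumptions: edge weights stand in for pairwise sequence similarity scores, self-loops of weight 1 are added, and a fixed convergence tolerance is used. Analyses of real data sets would use the reference mcl program.

```python
from typing import List, Set

Matrix = List[List[float]]


def normalize_columns(m: Matrix) -> Matrix:
    """Scale each column to sum to 1 (column-stochastic matrix)."""
    n = len(m)
    sums = [sum(m[i][j] for i in range(n)) for j in range(n)]
    return [[m[i][j] / sums[j] if sums[j] else 0.0 for j in range(n)]
            for i in range(n)]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def mcl(similarity: Matrix, inflation: float = 2.0,
        max_iter: int = 100, tol: float = 1e-9) -> List[Set[int]]:
    """Markov cluster algorithm on a symmetric, non-negative similarity matrix."""
    n = len(similarity)
    # Add self-loops, a common MCL preprocessing step.
    m = normalize_columns([[similarity[i][j] + (1.0 if i == j else 0.0)
                            for j in range(n)] for i in range(n)])
    for _ in range(max_iter):
        expanded = matmul(m, m)  # expansion
        inflated = normalize_columns(
            [[v ** inflation for v in row] for row in expanded])  # inflation
        delta = max(abs(inflated[i][j] - m[i][j])
                    for i in range(n) for j in range(n))
        m = inflated
        if delta < tol:
            break
    # Rows that keep probability mass are attractors; their nonzero
    # columns are the members of one cluster.
    clusters: List[Set[int]] = []
    for i in range(n):
        members = {j for j in range(n) if m[i][j] > 1e-6}
        if members and members not in clusters:
            clusters.append(members)
    return clusters


# Toy graph: proteins 0-2 and 3-5 form two unlinked similarity groups.
sim = [[0, 1, 1, 0, 0, 0],
       [1, 0, 1, 0, 0, 0],
       [1, 1, 0, 0, 0, 0],
       [0, 0, 0, 0, 1, 1],
       [0, 0, 0, 1, 0, 1],
       [0, 0, 0, 1, 1, 0]]
print(mcl(sim))  # prints [{0, 1, 2}, {3, 4, 5}]
```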
MOBP and MOBQ are uniformly distributed in plasmids of all size ranges, but the remaining families are associated with plasmids of different sizes (P < 0.001 by a Wilcoxon test comparing the size distributions of plasmid families) (Fig. 4). MOBF and MOBH are typical of, or exclusive to, large plasmids. MOBC is present in mid-sized plasmids. MOBV is almost absent from large plasmids and present in over 50% of all plasmids less than 5 kb long. This finding suggests that particular MOB modules (and, thus, specific DNA-processing mechanisms) have adapted to plasmids of different sizes, possibly as a result of the interaction between conjugation and replication systems. For example, MOBV plasmids of firmicutes, which replicate by a rolling circle, are known to become unstable with increasing size (19, 20, 71, 86).
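The size comparison can be illustrated with a two-sided Wilcoxon rank-sum (Mann-Whitney U) test. The text does not state which Wilcoxon variant or corrections were used, so this is an assumption: the sketch below uses the large-sample normal approximation without tie or continuity corrections, and the plasmid sizes in the example are hypothetical. For exact or tie-corrected P values, scipy.stats.mannwhitneyu is a standard choice.

```python
import math
from typing import List, Sequence, Tuple


def average_ranks(values: Sequence[float]) -> List[float]:
    """1-based ranks; tied values receive the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def rank_sum_test(a: Sequence[float],
                  b: Sequence[float]) -> Tuple[float, float, float]:
    """Return (U for sample a, z score, two-sided P value)."""
    n1, n2 = len(a), len(b)
    ranks = average_ranks(list(a) + list(b))
    u1 = sum(ranks[:n1]) - n1 * (n1 + 1) / 2
    mean_u = n1 * n2 / 2
    sd_u = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    z = (u1 - mean_u) / sd_u
    p = math.erfc(abs(z) / math.sqrt(2))  # equals 2 * (1 - Phi(|z|))
    return u1, z, p


# Hypothetical sizes (kb) of MOBV and MOBF plasmids.
mobv = [2.1, 3.4, 4.0, 4.8, 6.5, 9.0]
mobf = [38.0, 62.5, 95.0, 110.2, 150.7, 210.3]
u, z, p = rank_sum_test(mobv, mobf)
print(f"U = {u}, z = {z:.2f}, P = {p:.4f}")
```

A rank-based test suits plasmid sizes, whose distributions are strongly skewed and span several orders of magnitude.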
We were surprised by the large fraction of large plasmids lacking relaxases. To find out whether there might be relaxases that were not detected by our automated analysis, all plasmids within the 30- to 60-kb size range that did not give a hit against relaxases (70 out of a total of 140 plasmids in January 2008) were manually inspected for the presence of relaxase-like proteins. This can be done because most relaxase families belong to a single protein superfamily, the so-called 3H family, named for a signature cluster of three histidines in the catalytic site (57). After a thorough examination, only one plasmid that contained a possible new relaxase was found. Bacteroides thetaiotaomicron VPI-5482 plasmid p5482 (33,038 bp) contains some genes coding for T4SS components, a putative T4CP, MobC (25% identity to that of plasmid pBFY46), and also a putative relaxase, MobB, which shows 27% and 28% identity to similar proteins in plasmids pBIF10 and pBUN24, respectively. These three putative relaxases were not retrieved by PSI-BLAST starting from any of the six known relaxase families. Since the three motifs typical of the 3H class of relaxases could be identified, they might represent the first members of a seventh relaxase family. There is no experimental information concerning the transfer abilities of these plasmids, and currently, the number of known elements in the family is small. Hence, we left it out of the graphical representations. If the approach that we followed here for plasmids of 30 to 60 kb were applied to larger plasmids lacking a relaxase gene similar to the queries used so far, additional relaxase families might be uncovered. It should be emphasized, nevertheless, that for each of the six relaxase classes analyzed in this work, there is at least one prototype plasmid for which detailed molecular studies warrant the inclusion of the relevant genes within the realm of conjugative relaxases.
Since relaxases show ancestral homology to other DNA-processing proteins (RC replication proteins, IS91-like transposases, and so on), extreme care has to be exercised before new gene families are proposed as relaxases. As will become evident at the end of this work, we are convinced that new relaxase families remain to be discovered, particularly in phyla other than the proteobacteria. However, before claiming such a discovery, it is imperative that experimental data within the proposed family rigorously identify the protein as a relaxase.
After automating the classification of relaxases, we set out to identify putative T4SSs associated with the conjugative pilus. For this, we classified known T4SSs according to protein homology among a dozen plasmids from different MOB and Inc types. This allowed the clustering of all known proteobacterial T4SSs into four groups (Fig. 3), as suggested previously (72). For each group we used the nomenclature of one model T4SS representative of the group, namely, the vir system for MPFT (25, 123), F for MPFF (82), R64 for MPFI (77), and ICEHIN1056 for Haemophilus influenzae genome island-like MPFG (73). For each group we selected a dozen genes for which the function was known and/or for which inactivation was shown experimentally to strongly reduce conjugation rates (see the legend of Fig. 3). We then classified genes into three categories: nearly ubiquitous, group specific, and others. Finally, we built phylogenetic trees to find potential false assignments.
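The three-way split of candidate genes can be expressed as a rule on how often each gene family is found within each MPF group. The thresholds below (90%, 80%, and 10%) are hypothetical placeholders chosen for illustration; the text does not give numerical cutoffs, and the expert analysis also weighed functional evidence.

```python
from typing import Dict


def categorize_gene(freq_by_group: Dict[str, float],
                    ubiquitous_min: float = 0.9,
                    specific_min: float = 0.8,
                    specific_max_elsewhere: float = 0.1) -> str:
    """Classify a gene family from its prevalence in each MPF group.

    freq_by_group maps an MPF group name to the fraction of that group's
    T4SSs carrying a homologue of the gene (0.0 to 1.0).
    All thresholds are illustrative assumptions.
    """
    # Nearly ubiquitous: common in every group.
    if all(f >= ubiquitous_min for f in freq_by_group.values()):
        return "nearly ubiquitous"
    # Group specific: common in exactly one group, rare in all the others.
    common = [g for g, f in freq_by_group.items() if f >= specific_min]
    rare_elsewhere = all(f <= specific_max_elsewhere
                         for g, f in freq_by_group.items() if g not in common)
    if len(common) == 1 and rare_elsewhere:
        return f"group specific ({common[0]})"
    return "other"


# Hypothetical prevalence profiles for two gene families.
print(categorize_gene({"MPFT": 0.97, "MPFF": 0.95, "MPFI": 0.92, "MPFG": 1.0}))
print(categorize_gene({"MPFT": 0.0, "MPFF": 0.93, "MPFI": 0.04, "MPFG": 0.0}))
```

The first call prints "nearly ubiquitous" and the second "group specific (MPFF)"; a gene that is moderately common in several groups falls into "other".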
In accordance with data reported in the literature (5), we found that the only gene present in practically all T4SSs codes for a member of the VirB4 family (TraC in the F plasmid and TraU in R64). However, even though TraU is generally believed to be a homologue of VirB4, it could not be retrieved in our analysis due to the lack of significant sequence similarity. Instead, the search for distant homologues of VirB4 with PSI-BLAST retrieved other homologous but functionally different proteins such as T4CPs. The low similarity between TraU and VirB4, together with the higher similarity of VirB4 to VirD4, is puzzling. One might suppose that TraU evolved extremely fast, but we found no very long branches in the TraU tree, and one would still expect TraU to branch within the proteobacterial VirB4 proteins, since its distribution seems to be confined to this clade. It is therefore more likely that either TraU arose before the split of VirB4 and the T4CP or it arose independently from another ATPase. Current data do not allow one to unambiguously choose between these hypotheses.
We disentangled VirB4 and T4CP proteins with a clustering procedure and fine-tuned the clusters using phylogenetic analyses (see the legend of Fig. 3). All 255 T4CPs identified by expert annotation in 1,000 plasmids fell into just two clusters. We also found all proteins annotated as VirB4 in a few separate clusters. We retrieved a total of 327 VirB4-encoding genes (including homologues of virB4 and traU). Importantly, these genes were always inside the region coding for the T4SS, whereas the gene coding for the T4CP was often located in other regions of the plasmid.
Other nearly ubiquitous protein families, most notably VirB11, were not reliable markers of the presence of a T4SS. They were also poorly specific to T4SS groups, and thus, we called them “nonspecific” proteins. For example, VirB11 is homologous to ATPases of type II secretion systems (24), while it is absent from most T4SSs of firmicutes and archaea and from model plasmids such as F (5). These “nonspecific” proteins were used in the expert analysis to confirm the presence or absence of T4SSs but were ignored in the automatic analysis, without consequences for its accuracy. The remaining genes were separated into families according to their discriminative power. Some families matched functionally unrelated proteins, were too rare, or did not converge in PSI-BLAST searches (Fig. 3). These genes were discarded. The remaining genes were used as T4SS group-specific markers, as they were present in the vast majority of members of each T4SS group, while they were absent from other groups of T4SSs. They are likely to have important roles in conjugation. Some of the genes that are poor markers may also have important roles in some specific systems. However, genes whose inactivation had little effect on the frequency of conjugation turned out to be bad markers of specific T4SSs (e.g., p1056.30 for chromosomally encoded MPFG or TraO for MPFI T4SSs). These sporadic genes may provide accessory functions, evolve quickly, or be prone to analogous gene replacement. For example, many known T4SSs contain a lipoprotein, but this protein shows wide variations in size and sequence, and we found it to be irretrievable by sequence similarity searches, often even among closely related plasmids.
In summary, six families of MOB modules and four families of MPF modules represent most of the diversity of known conjugative transfer systems. Extensive work in microbial genetics and ecology needs to be carried out to explain why some of these modules associate preferentially among themselves (and with specific replication modules) and how they have adapted to particular ecosystems. Naturally, the ensuing question is how complete this set of known systems is and how it is represented among prokaryotes.
We found 253 regions with several hits to T4SS type-specific genes in prokaryotic plasmids. There were 66 plasmids where VirB4 did not colocalize with a prototypical proteobacterial T4SS, of which 40 were from firmicutes and 13 were from archaea (see Table S1 in the supplemental material). Four proteobacterial plasmids had VirB4 but no clearly identifiable T4SS: the Magnetospirillum gryphiswaldense MSR-1 plasmid, Gluconacetobacter diazotrophicus PAl 5 plasmid pGDIPal5I, the Pseudomonas syringae pv. phaseolicola 1448A large plasmid, and Achromobacter denitrificans plasmid pEST4011. Inspection of these plasmids showed evidence of a degraded T4SS. One system (Vibrio species plasmid p23023) coded for homologues of several T4SS proteins, e.g., VirD4, VirB10, VirB1, and VirB11, but not for any of the genes that we used to specify the MPF type (Fig. 3), i.e., it had no clearly identifiable type-specific genes. For all cases where we found a putative T4SS but no VirB4, a detailed inspection revealed either a virB4 pseudogene or an incomplete T4SS. In short, among the 798 plasmids of proteobacteria, we found 236 plasmids coding for VirB4 or TraU and 236 plasmids coding for type-specific T4SS proteins. Two hundred thirty-two of these, i.e., an outstanding 98%, coincided exactly. Since the searches for T4SS classes and for VirB4 were carried out independently and with different methods, the large overlap of the results strongly suggests that we can classify practically all VirB4-containing conjugative T4SSs in proteobacteria. Correspondingly, these four classes of T4SSs represent practically all the diversity of conjugative plasmids of sequenced proteobacteria.
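The agreement between the two independent searches is a set-overlap calculation. A minimal sketch follows, using placeholder plasmid identifiers (not real accessions) arranged so that the counts match those reported: 236 plasmids per method and 232 in common, i.e., 232/236 ≈ 98%.

```python
def concordance(found_by_a: set, found_by_b: set) -> tuple:
    """Return (shared count, fraction of A also in B, fraction of B also in A)."""
    shared = found_by_a & found_by_b
    return len(shared), len(shared) / len(found_by_a), len(shared) / len(found_by_b)


# Placeholder IDs: method A (VirB4/TraU search), method B (type-specific T4SS genes).
virb4_hits = {f"plasmid_{i}" for i in range(236)}
t4ss_hits = {f"plasmid_{i}" for i in range(4, 240)}
shared, frac_a, frac_b = concordance(virb4_hits, t4ss_hits)
print(shared, f"{frac_a:.1%}", f"{frac_b:.1%}")  # prints: 232 98.3% 98.3%
```

Reporting the overlap relative to each method separately matters when the two searches return different totals; here both happen to find 236 plasmids.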
Among classified conjugative plasmids of proteobacteria (Fig. 5), there was a clear overrepresentation of MPFT plasmids (142 plasmids [58%]) over MPFF plasmids (77 plasmids [31%]). MPFI plasmids were even rarer (25 plasmids [10%]). The same analysis restricted to plasmids sequenced with the microbial chromosomes showed similar frequencies (see Fig. S3 in the supplemental material). Thus, the biased distribution of these plasmids was not caused by the specific biases of plasmid models and might be representative of their true frequency in nature. We found two occurrences of MPFG in plasmids: Marinobacter aquaeolei VT8 plasmid pMAQU02 and Haemophilus influenzae plasmid ICEhin1056. Thus, contrary to previous suggestions (73), this type of T4SS is not associated exclusively with ICEs, although it is indeed rare among sequenced plasmids. The distribution of the three most abundant T4SSs showed a significant association with plasmid size (P < 0.001 by a Wilcoxon test) (Fig. 5). For instance, MPFT was found more frequently in very large and in small plasmids, whereas MPFI was absent from small plasmids and MPFF was more frequent in intermediate-sized plasmids. The overrepresentation of MPFT in small plasmids may result from its lower complexity. For example, model MPFT systems, such as those of RP4 or R388, are composed of about 12 genes and allow mating only on solid surfaces. On the other hand, model MPFF and MPFI systems, such as those of F and R64, respectively, are composed of around 30 genes and also allow mating at high frequencies in liquid culture (3, 15). It is not surprising that an additional set of genes is required for a functional liquid mating system. Results regarding the largest plasmids are harder to interpret given the small sample size (eight plasmids).
We could then analyze the joint distribution of MOB and MPF modules in the most populated set in our database, that of proteobacteria. Only 28% of all proteobacterial plasmids contained both a relaxase and a T4SS and were therefore classed as conjugative (MOB plus MPF). These plasmids correspond to about half (54%) of the transmissible plasmids. The high frequency of cooccurrence of VirB4 with both T4CP- and T4SS-specific genes in proteobacteria supports the idea that VirB4 can be used as a marker of T4SSs.
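The classification rule used in this analysis can be written as a small decision function. This is a minimal sketch under stated assumptions: the field names are hypothetical, a T4SS is taken to be present when VirB4/TraU or an MPF type was detected, and, following the definition of transmissible plasmids used in the text, a relaxase is required for a plasmid to count as mobilizable or conjugative.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Optional


@dataclass
class PlasmidModules:
    """Hypothetical per-plasmid summary of detected mobility genes."""
    has_relaxase: bool              # MOB module detected
    has_virb4: bool                 # VirB4 or TraU homologue detected
    mpf_type: Optional[str] = None  # "T", "F", "I", "G", or None if unclassified


def mobility_class(p: PlasmidModules) -> str:
    has_t4ss = p.has_virb4 or p.mpf_type is not None
    if p.has_relaxase and has_t4ss:
        return "conjugative"      # MOB plus MPF
    if p.has_relaxase:
        return "mobilizable"      # MOB only
    # No known relaxase: under this rule a T4SS alone does not make it transmissible.
    return "nonmobilizable"


plasmids = [
    PlasmidModules(True, True, "T"),
    PlasmidModules(True, False),
    PlasmidModules(False, False),
    PlasmidModules(False, True, "I"),
]
counts = Counter(mobility_class(p) for p in plasmids)
transmissible = counts["conjugative"] + counts["mobilizable"]
print(counts, counts["conjugative"] / transmissible)
```

The last line computes the share of conjugative plasmids among transmissible ones, the quantity reported as 54% for proteobacteria.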
Most plasmids contain just one MPF module and one associated MOB module, with surprisingly few exceptions. We found only one case of a cooccurrence of T4SS-specific genes of different MPF types in the same plasmid. In the transferrable plasmid pP99-018 of Photobacterium damselae subsp. piscicida (74), we found two MPFT genes (virB8 and virB9) and seven MPFF genes (traLEKVWUN). A more detailed inspection of the DNA sequence of this plasmid showed a complete MPFF module and an incomplete MPFT module, which is part of a recently acquired transposon-like element. Moreover, these MPFT genes are absent from a closely related plasmid from the same host. Therefore, our survey excludes the existence of hybrid T4SSs. While we found other plasmids carrying homologues of different systems, these invariably showed evidence of pseudogenization. Only 50 plasmids harbored two relaxases, and nine plasmids contained three relaxases. We found 10 plasmids that contained two T4SSs, in all cases two MPFT, usually associated with one MOBQ and one MOBP relaxase and two T4CPs (see Table S2 in the supplemental material). It would be tempting to suggest that one T4SS is responsible for conjugation and that the other is responsible for the secretion of unrelated proteins. However, the finding of multiple relaxases in these plasmids argues against this view. Interestingly, 9 of the 10 plasmids with two MPFs were in pathogens or mutualists of plants. These include the Agrobacterium tumefaciens Ti plasmid, which indeed codes for two T4SSs, one dedicated to Ti plasmid conjugation and the other dedicated to the transfer of T-DNA to plants (59). We are therefore inclined to think that all these plasmids carry two conjugation systems. These may either arise from the cointegration of plasmids or be used for different purposes or target cells.
For example, plasmids pDOJH10L and pK214 contain more than one relaxase and also contain more than one replication initiation protein (85, 111). Since plasmids carrying several relaxases or T4SSs clearly tend to carry ones of the same MOB family or T4SS type (63% and 100% of cases, respectively), it is possible that cointegration is less damaging for long-term plasmid stability if the relaxases and T4SSs are of the same type. On the other hand, this might simply reflect the preference of a given T4SS for that host or environment or a higher frequency of cointegration among plasmids of the same type. Indeed, very similar plasmids are expected to recombine at higher rates, since their recent common ancestry means that they carry very similar genes, such as those coding for MOB and MPF.
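The 63% and 100% figures can be read as fractions of plasmids carrying more than one relaxase (or T4SS) in which all copies belong to the same family (or type). A hypothetical sketch of that tally, assuming each plasmid is represented by the list of families detected on it and that "same family" means all copies are identical in family:

```python
from typing import Dict, List


def same_family_fraction(modules: Dict[str, List[str]]) -> float:
    """Among plasmids carrying more than one module, return the fraction
    whose modules all belong to the same family (or MPF type)."""
    multi = [families for families in modules.values() if len(families) > 1]
    if not multi:
        return float("nan")  # no plasmid with multiple modules
    same = sum(1 for families in multi if len(set(families)) == 1)
    return same / len(multi)


relaxases = {  # hypothetical plasmids and their MOB families
    "pA": ["MOBQ", "MOBQ"],
    "pB": ["MOBQ", "MOBP"],
    "pC": ["MOBF"],              # single relaxase: ignored
    "pD": ["MOBP", "MOBP", "MOBP"],
}
print(same_family_fraction(relaxases))
```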
The plasmid database shows a bimodal distribution of plasmid sizes, with a clear local minimum at ~20 kb (Fig. 6). It has often been suggested that mobilizable plasmids tend to be smaller than 30 kb, while conjugative plasmids are usually larger. Indeed, plasmids coding for relaxases tend to be larger than the others (median sizes of 35 kb and 11 kb, respectively; P < 0.001 by a Wilcoxon test), and conjugative plasmids are even larger (median, 181 kb; P < 0.001). The classification of plasmids in terms of mobility shows that conjugative plasmids are distributed around an average of 100 kb, while mobilizable plasmids have a main peak at 5 kb and a broad, flat, secondary peak at around 150 kb (Fig. 6). Thus, broadly speaking, this dichotomy holds. Moreover, plasmids smaller than 30 kb that code for a relaxase rarely code for a T4CP (6%), whereas larger ones generally do (86%).
Nevertheless, the distribution of nonmobilizable plasmids is also multimodal (Fig. 6), suggesting that things may not be that simple. Nonmobilizable plasmids show three distinctive peaks for small (around 5 kb), average (around 35 kb), and large (more than 300 kb) plasmids. The percentages of nonmobilizable plasmids larger than 20 kb are 42% in proteobacteria and 41% in the other clades, showing that the identification of large nonmobilizable plasmids is not an artifact caused by poorly known plasmid mobility in some clades. Interestingly, the two largest plasmids in our data set are classified as nonmobilizable: Burkholderia phymatum STM815 plasmid pBPHY01 and Ralstonia solanacearum GMI1000 plasmid pGMI1000MP. Both are more than 1.9 Mb in size. Thus, a lack of relaxases, while more frequent in the very small plasmids, is also found among average- and large-sized plasmids. This result was surprising and may challenge some common assumptions in plasmid biology. It is discussed more extensively below (see How Mobile Are Nonmobilizable Plasmids?).
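The size comparisons above rely on the Wilcoxon rank-sum test (equivalent to the Mann-Whitney U test). The sketch below implements its two-sided version with the normal approximation and a tie correction, using only the standard library; it omits the continuity correction, and the plasmid sizes in the example are invented for illustration. For real analyses, a tested implementation such as `scipy.stats.mannwhitneyu` is preferable.

```python
import math


def rank_sum_test(x, y):
    """Two-sided Wilcoxon rank-sum (Mann-Whitney U) test.

    Normal approximation with tie correction, no continuity correction.
    Returns (U for sample x, z score, two-sided p-value).
    """
    n1, n2 = len(x), len(y)
    values = list(x) + list(y)
    order = sorted(range(len(values)), key=values.__getitem__)
    ranks = [0.0] * len(values)
    tie_term = 0.0
    i = 0
    while i < len(order):
        # find the run of tied values starting at sorted position i
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1  # average of the 1-based ranks
        t = j - i + 1
        tie_term += t ** 3 - t
        i = j + 1
    n = n1 + n2
    u = sum(ranks[:n1]) - n1 * (n1 + 1) / 2
    mean_u = n1 * n2 / 2
    var_u = n1 * n2 / 12 * ((n + 1) - tie_term / (n * (n - 1)))
    if var_u == 0:  # all values tied: no evidence of a shift
        return u, 0.0, 1.0
    z = (u - mean_u) / math.sqrt(var_u)
    return u, z, math.erfc(abs(z) / math.sqrt(2))


# Invented plasmid sizes (kb), for illustration only.
with_relaxase = [35, 48, 60, 120, 181, 22, 95]
without_relaxase = [4, 6, 11, 9, 15, 30, 2]
u, z, p = rank_sum_test(with_relaxase, without_relaxase)
print(f"U = {u}, z = {z:.2f}, p = {p:.4f}")
```

A rank-based test suits plasmid sizes, whose distributions are skewed and multimodal, because it compares ranks rather than means.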
Of the 673 plasmids harboring relaxases, conjugative transfer has been experimentally verified for only 123 (57), covering all size ranges considered here except plasmids larger than 1 Mb. In proteobacteria, the smallest plasmid that we identified as being putatively conjugative is 21.8 kb long (pCRY from Yersinia pestis) and is highly homologous to plasmids shown to be conjugative (128). In other clades, the smallest VirB4-containing plasmid was pSci1, with only 13 kb, found in Spiroplasma citri GII3. This strain harbors several plasmids for which transmission has been demonstrated, but it is unclear whether the plasmid codes for a self-sufficient T4SS (17). Mollicutes contain small conjugative elements measuring much less than 30 kb (23, 98, 107). Since they lack most T4SS genes, one could imagine that they are “degraded” nonconjugative derivatives of ancestral plasmids. However, most mollicutes are not undergoing rapid genome degradation, have few pseudogenes, and have highly coding-dense genomes (125). Since there is evidence that horizontal gene transfer is frequent in the clade (126), it is more parsimonious to think that they code for a simpler conjugative machinery. The largest plasmid encoding a T4SS is pSymA from Sinorhizobium meliloti 1021, which has 1.35 Mb. The largest mobilizable plasmid is pSymB, also from Sinorhizobium meliloti 1021, with 1.68 Mb. Thus, most of the very large plasmids are not conjugative (Fig. 6; see also Table S1 in the supplemental material).
MOB families were differentially represented among mobilizable and conjugative plasmids. MOBV was found almost exclusively among mobilizable plasmids, while MOBF and MOBH were present almost exclusively in conjugative plasmids. Although we know of no molecular reason that can explain this difference, we speculate that large plasmids might need additional regulation. Accordingly, MOBF and MOBH relaxosomes are structurally more complex and contain comparatively large oriTs (>300 bp) (2, 51, 81, 92, 118). On the other hand, MOBQ1 and certain MOBV relaxosomes are simple, contain short oriTs (<100 bp), and are promiscuous in their use of different helper T4SSs as mating channels (49, 52, 100). MOBC1 plasmids are a case apart, since they are mobilizable but code for their own T4CP (22). If we assume that conjugation requires two recognition steps, relaxosome-T4CP and T4CP-T4SS, then MOBC1 plasmids, which supply their own T4CP, depend on the helper only for the T4CP-T4SS step; this would explain why they can use the broadest set of helper T4SSs for mobilization (22), although this has yet to be confirmed in natural settings. The association between classes of relaxases and T4SSs in proteobacteria is far from random (Fig. 7). While MOBP is the largest relaxase family, it is absent from MPFF plasmids, which show a clear overrepresentation of MOBF relaxases and, to a lesser extent, MOBH relaxases. MOBC relaxases are particularly abundant in MPF-unclassified plasmids, i.e., in plasmids from clades other than proteobacteria. Once again, this suggests large differences between the archetypical plasmids of proteobacteria and those of other clades.
As expected from accumulated knowledge of plasmid conjugation, plasmids contain more relaxases than T4CPs, and more T4CPs than T4SSs (Fig. 8). Plasmid transfer has been demonstrated for only four plasmids that do not contain (known) relaxases: the circular plasmid SCP2 of Streptomyces coelicolor and the linear plasmid SLP2 of Streptomyces lividans, which transfer between mycelia through an FtsK-like protein (18, 84); plasmid pC194 of Staphylococcus aureus, which is transferred by the conjugative transposon Tn916 (104); and pKPN2 of Klebsiella pneumoniae, which contains two oriTs related to those of plasmids F and R64 and was mobilized by plasmid F (150). Thus, in all four cases, transfer can be explained by existing knowledge.
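One way to make "far from random" concrete is to compare the observed count of each relaxase family and MPF type combination with the count expected if the two were independent (row total × column total / grand total), and to sum the Pearson chi-square contributions. The sketch below does this for a small table; the counts are invented for illustration and are not the data behind Fig. 7.

```python
def association_table(table):
    """table[row][col] = count. Returns (observed/expected ratios, Pearson chi-square)."""
    rows = list(table)
    cols = sorted({c for r in rows for c in table[r]})
    row_tot = {r: sum(table[r].values()) for r in rows}
    col_tot = {c: sum(table[r].get(c, 0) for r in rows) for c in cols}
    total = sum(row_tot.values())
    ratios, chi2 = {}, 0.0
    for r in rows:
        for c in cols:
            observed = table[r].get(c, 0)
            expected = row_tot[r] * col_tot[c] / total  # under independence
            if expected == 0:
                continue  # empty row or column: nothing to compare
            ratios[(r, c)] = observed / expected
            chi2 += (observed - expected) ** 2 / expected
    return ratios, chi2


counts = {  # hypothetical MOB family x MPF type counts
    "MOBF": {"F": 8, "T": 2},
    "MOBP": {"F": 0, "T": 10},
}
ratios, chi2 = association_table(counts)
print(ratios[("MOBF", "F")], ratios[("MOBP", "F")], round(chi2, 2))  # 2.0 0.0 13.33
```

A ratio above 1 marks an overrepresented combination (here MOBF with MPFF) and a ratio of 0 an absent one (MOBP with MPFF). With small expected counts, an exact test is more appropriate than the chi-square approximation.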
Surprisingly, we found 34 plasmids lacking a T4CP while having VirB4. Most of these were from firmicutes (13 cases), archaea (13 cases), or proteobacteria (7 cases). This might indicate the presence of T4SSs specialized in the transport of proteins rather than in conjugation. Among proteobacteria, all seven plasmids had a recognizable T4SS with conserved VirB4 or TraU homologues, and three of them contained relaxases. The association of T4CP-lacking T4SSs with relaxases suggests that their T4CPs have been recently lost and that these T4SSs were involved in conjugation. Surprisingly, 22% of the archaeal plasmids have VirB4 homologues, but only one has T4CP and relaxase (Haloarcula marismortui plasmid pNG500). There are a few plasmids from archaea (e.g., pING and pNOB) that are known to be conjugative (60), and indeed, we found virB4 homologues in their genomes. However, they do not contain genes coding for a recognizable relaxase or T4CP. Considering that T4CP is the most conserved protein of those analyzed in this work and that it can be found in all other clades containing relaxases, this result is puzzling. Given the very large family of NTPases to which T4CPs belong (112), it may be that a more distant homologue than the ones considered in this work performs the T4CP role in archaea (60, 114, 124, 130). These results suggest that T4CPs and relaxases are (radically) different in archaea but still interact with VirB4-containing T4SSs.
Some 90 plasmids encode a T4CP but not VirB4. Of these, some belong to MOBC1, the only group of mobilizable plasmids shown to code for their own T4CP (52, 57). Most of the remaining ones correspond to plasmids with uncharacterized T4SSs, such as the cyanobacterial plasmids (see below). In fact, in contrast to archaea, in cyanobacteria and actinobacteria we found many relaxases and T4CPs but few T4SSs. The above-mentioned plasmids of Streptomyces contain FtsK-like proteins involved in double-stranded DNA (dsDNA) mobility in mycelia (61). The existence of this alternative mobilization mechanism in Streptomyces could be invoked to explain the low number of conjugative plasmids found in actinobacteria. However, our data set contains 147 plasmids of actinobacteria, of which only 25 are from Streptomyces. Since many of the other actinobacteria do not produce mycelia, it is unclear why so many plasmids code for relaxases (39%) or T4CPs (15%) but no known T4SS or VirB4. The results concerning cyanobacteria are even more surprising: we found 31 mobilizable plasmids, many of which are larger than 100 kb, but no T4SS. Naturally, if no plasmid in these hosts is conjugative, the mobilizable plasmids are effectively not mobile. This strongly suggests that cyanobacteria use an as-yet-uncharacterized conjugation system, which lacks a clear homologue of virB4 or of any T4SS-specific genes of proteobacteria. To test whether more-sensitive sequence searches could detect distant homologues of virB4, we built profiles for this protein using HMMER (43a). This allowed the retrieval of very distant homologues of VirB4 in both cyanobacteria and actinobacteria. Experimental work to test the validity of these results is ongoing. If these results are confirmed, the conjugation systems of cyanobacteria and actinobacteria would be very divergent from, but phylogenetically related to, the VirB4-associated T4SSs of proteobacteria, firmicutes, and archaea.
We investigated how variants of the key elements of transmissibility, relaxases and T4CPs, are distributed among clades (Fig. 9). Firmicutes lack MOBF and MOBH but show an overrepresentation of MOBV, which is rare in proteobacteria. Actinobacteria seem to be dominated by the MOBF family and in particular by MOBF2 relaxases; they lack MOBH and MOBV and have few MOBC relaxases. Within proteobacteria, the gamma subdivision contains representatives of all six MOB groups, whereas the alphaproteobacteria lack MOBH and MOBC and show many MOBQ relaxases. This is due mainly to MOBQ2 relaxases of tumorigenic and symbiotic plasmids of Agrobacterium and Rhizobiales. Surprisingly, the association of a T4CP with the relaxases of different MOB families is far from random. While 62% of proteobacterial relaxases have an associated T4CP, this drops to 36% in actinobacteria and 19% in firmicutes. Relaxases of some subfamilies of MOBF2, MOBQ1, MOBP5 (previously MOBHEN), and MOBV are never associated with a cognate T4CP.
All four classes of T4SSs were found exclusively in plasmids of proteobacteria, with the exception of Prosthecochloris aestuarii plasmid pPAES01 (a member of the Chlorobi) and a few plasmids from noncultivated bacteria whose host phylogenetic classification is unknown. While MPFT clearly predominates in most clades, MPFF is equally abundant in gammaproteobacteria (Fig. 9). MPFI is less frequent: it is most abundant in gammaproteobacteria and absent from delta- and epsilonproteobacteria, possibly owing to the small sample size. The finding that prototypical conjugative T4SSs are exclusive to proteobacteria is puzzling because some of these plasmids can conjugate into cells of very distant clades (99, 139), such as cyanobacteria (75), where we found no plasmids of these types. Plasmids of firmicutes have also been shown to transfer to proteobacteria and actinobacteria (79, 143). Hence, our results suggest that although some proteobacterial plasmids have a broad host range, they tend to reside in one bacterial clade.
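The profile search mentioned above can be reproduced in outline with the HMMER command-line tools: `hmmbuild` builds a profile from a multiple alignment, and `hmmsearch --tblout` writes one whitespace-delimited line per target sequence. The sketch below assumes HMMER 3 is installed; the function names, file names, and the E-value cutoff of 1e-5 are illustrative choices, not the settings used in this work. Only the parser is exercised in the example.

```python
import os
import subprocess


def run_virb4_search(alignment, proteome, tblout, evalue=1e-5):
    """Build a profile HMM from an alignment and search a protein set (needs HMMER 3)."""
    profile = alignment + ".hmm"
    subprocess.run(["hmmbuild", profile, alignment], check=True,
                   stdout=subprocess.DEVNULL)
    subprocess.run(["hmmsearch", "--tblout", tblout, "-E", str(evalue),
                    "-o", os.devnull, profile, proteome], check=True)


def parse_tblout(lines, max_evalue=1e-5):
    """Return (target, query, full-sequence E-value, bit score) for each hit."""
    hits = []
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and comment/header lines
        fields = line.split(maxsplit=18)  # column 19 is a free-text description
        evalue, score = float(fields[4]), float(fields[5])
        if evalue <= max_evalue:
            hits.append((fields[0], fields[2], evalue, score))
    return hits


example = [  # hypothetical tblout lines
    "# target name  accession  query name ...",
    "pX_0042 - VirB4 - 1.2e-30 105.3 0.1 2e-30 104.0 0.1 1.0 1 1 0 1 1 1 1 hypothetical protein",
    "pY_0007 - VirB4 - 0.5 3.0 0.0 0.9 2.0 0.0 1.1 1 1 0 1 1 1 0 -",
]
print(parse_tblout(example))
```

Very distant homologues typically sit near the chosen cutoff, so hits retrieved this way need independent support, as the text notes for the cyanobacterial and actinobacterial candidates.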
Overall, these analyses suggest that our understanding of plasmid mobility is excellent for proteobacteria and poor for most other clades, including clades as environmentally important as archaea, cyanobacteria, and actinobacteria. They also show a complex evolutionary history, with different clades sharing or lacking different components of the conjugation machinery. We therefore set up an evolutionary analysis of the coevolution between mobilization and conjugation.
The T4CP is the most informative protein of the conjugation machinery because it is a large, highly conserved protein with homologues in almost all conjugative plasmids. Furthermore, the phylogeny of T4CPs closely matches that of relaxases (57). We selected 184 T4CPs sharing less than 95% similarity to reduce sampling redundancy. The phylogenetic tree of the T4CPs confirmed on a larger scale that relaxases and T4CPs evolve in parallel, since large clusters in the T4CP tree corresponded almost invariably to the same relaxase family or to an absence of known relaxases (Fig. 10). The VirB4 tree showed a more scattered distribution of relaxase families, especially among MPFT plasmids. From a functional point of view, although T4CPs interact with components of both the relaxosome and the T4SS, the T4CP-relaxosome interactions seem to be more system specific (94, 96, 97, 101). Moreover, T4CP-encoding genes are usually adjacent to the genes coding for the relaxosome components (50, 57), and this conservation of gene order is another sign of coevolution and functional relatedness. In light of these results, it is becoming evident that T4CPs are part of the conjugative DNA-processing module (what we call in this review the MOB module). There are, however, some exceptions to this rule, mostly concerning the MOBP family, which shows a looser linkage between MOB and T4CP. The ample diversity within the MOBP family might be the cause of this weaker association, which is consistent with the clustering of these relaxases into separate groups based on global sequence similarity (see Fig. S2 in the supplemental material). Interestingly, plasmids without known relaxases are often clustered together in both phylogenetic trees. It is tempting to view this as a further indication of the existence of novel, as-yet-unknown relaxases in some clades.
The basal positions in the T4CP tree show a few plasmids grouped into three clusters corresponding to MPFI-containing plasmids, a variety of MPFT plasmids from different phylogenetic origins containing all MOBC relaxases, and a group of MOBH-containing MPFT and MPFF plasmids (Fig. 10). The inner groups consist essentially of two clusters of MOBF-containing plasmids, separating MOBF plasmids from cyanobacterial plasmids lacking identifiable T4SSs, and a somewhat intermingled group of plasmids associated with MOBP5, MOBQ, or both. Figure 10 (middle) shows the phylogenetic origin of the plasmid hosts containing T4CP or VirB4. It is apparent from Fig. 10 that the large clusters of T4CP sequences tend to correspond to coherent taxonomic clusters. While this might seem surprising given the mobile nature of plasmids and, in particular, of broad-host-range plasmids, it is perfectly compatible with our analysis of gene repertoires of T4SSs, which showed a strict association of archetypical systems with proteobacteria. In fact, the only groups containing large phylogenetic diversity, e.g., MOBC-associated T4CPs, correspond to branches rooting very deeply in the tree. In our analysis, closely related genes usually correspond to plasmids in closely related clades. For example, archaeal VirB4s, for which we could find no matching T4CP, cluster together, and so do cyanobacterial and actinobacterial T4CPs, for which we could not find T4SSs. Hence, these results further suggest an important evolutionary inertia in the mechanisms of plasmid mobility, where mobility between distant clades is sporadic and short-lived.
The VirB4 tree is highly concordant with our classification of the archetypical T4SSs (Fig. 10). As mentioned above, separate trees were built for TraU and VirB4 because the similarity between the two families is too weak to relate them in a single tree. The early divergence of MPFI is consistent with the basal position of the MPFI-associated T4CPs in the T4CP tree. MPFI plasmids are known to produce a functionally idiosyncratic type of pilus. They can mate in liquid medium, so in this respect they resemble MPFF plasmids. However, they contact other cells by a specific fimbrial system related to type 2 secretion (76). This originality might be the hallmark of an early divergence from, or an origin independent of, the remaining conjugative systems.
Among the remaining plasmids, there is a neat division of the VirB4 proteins into two groups. One group includes MPFT and MPFF, suggesting that the divergence between these two classes of T4SSs was comparatively recent, in agreement with the higher level of similarity of their T4SS loci (25, 123). Interestingly, some MPFT/MOBP11 plasmids sit at the basal position on the side of the MPFF group (Fig. 10). This finding suggests that MPFF and MPFG derived from the same lineage of MPFT. More plasmid sequences are required to confirm this branching, which, if correct, would suggest that the ancestral proteobacterial VirB4-based conjugation system was MPFT. It is tempting to speculate that the ancestral MPFT-containing plasmids, which mated exclusively on surfaces, gave rise to the more-complex MPFF systems by acquiring additional functions that allow mating in liquid. Indeed, the other related T4SSs found in firmicutes also seem to carry fewer genes than MPFF systems. On the other hand, MPFT systems include broad-host-range plasmids (140) and plasmids capable of retrotransfer (134). Thus, along with the capacity to mate in liquid medium, MPFF evolved toward a narrower host range. This may represent an inevitable tradeoff between the complexity of the conjugation machinery and plasmid host range, since the complex, independently derived, liquid-mating MPFI systems also have a narrow host range (33).
The presence of a given T4SS cannot be trivially correlated with the life-style of the host. For instance, the genus Vibrio is composed of planktonic organisms that thrive in brackish waters. Of the six Vibrio plasmids listed in Table S1 in the supplemental material that contain mobility genes, V. harveyi plasmid pVIBHAR and V. fischeri plasmid pES100 contain MPFT, while V. vulnificus plasmids pC4602-1 and pYJ016 and V. tapetis plasmid pVT1 contain MPFF. Finally, V. fischeri plasmid pMJ100 lacks relaxases but contains a T4CP and MPFI. Thus, all three MPF systems are found in Vibrio plasmids. Likewise, the best-studied bacterium, E. coli, has plasmids with all three MPF types. Studies of the possible relative advantages of each mating type in a given environment are lacking.
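Selecting sequences that share less than 95% similarity is commonly done with a greedy procedure: sort the sequences from longest to shortest and keep each one only if it falls below the threshold relative to every sequence already kept. The sketch below illustrates that procedure with hypothetical function names. As a crude stand-in for alignment-based similarity it uses `difflib.SequenceMatcher.ratio()`, which is an assumption for illustration only; dedicated tools (e.g., CD-HIT) or alignment scores would be used in practice, and nothing here reflects how the 184 T4CPs were actually chosen.

```python
from difflib import SequenceMatcher


def crude_similarity(a, b):
    """Proxy for sequence similarity in [0, 1]; not an alignment-based identity."""
    return SequenceMatcher(None, a, b, autojunk=False).ratio()


def nonredundant(seqs, threshold=0.95, similarity=crude_similarity):
    """Greedy selection: longest sequences first; keep a sequence only if its
    similarity to every already-kept sequence is below the threshold."""
    kept = []
    for name in sorted(seqs, key=lambda n: len(seqs[n]), reverse=True):
        if all(similarity(seqs[name], seqs[k]) < threshold for k in kept):
            kept.append(name)
    return kept


t4cps = {  # toy sequences, not real T4CPs
    "t4cp_a": "MKLVTAAGRS" * 10,
    "t4cp_b": "MKLVTAAGRS" * 10,   # identical to t4cp_a, hence redundant
    "t4cp_c": "WWWWPPPPYY" * 10,
}
print(nonredundant(t4cps))
```

Processing longer sequences first means that each retained representative tends to be the most complete member of its redundancy group.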
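Whether tree clusters correspond to coherent taxonomic clusters can be summarized with a purity score: for each cluster, count the members belonging to its most common host taxon, then divide the sum by the total number of members. The sketch below assumes that clusters have already been cut from the tree; the cluster names and taxa are hypothetical, and purity is one of several possible measures, not the one used in this review.

```python
from collections import Counter


def cluster_purity(clusters):
    """clusters: mapping of cluster id -> list of host taxa of its members."""
    total = sum(len(members) for members in clusters.values())
    if total == 0:
        return float("nan")
    dominant = sum(Counter(members).most_common(1)[0][1]
                   for members in clusters.values() if members)
    return dominant / total


clusters = {  # hypothetical T4CP clusters labeled by host clade
    "cluster_1": ["Gammaproteobacteria", "Gammaproteobacteria", "Alphaproteobacteria"],
    "cluster_2": ["Firmicutes", "Firmicutes"],
}
print(cluster_purity(clusters))
```

Purity should be read together with cluster sizes, because splitting a tree into many small clusters trivially raises it (singletons always have purity 1).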
Finally, there is a large congruent group of VirB4 proteins in plasmids from archaea and firmicutes (plus other Gram-positive bacteria), with the two groups being well-separated. This strongly suggests that these systems are more closely related among themselves than with the T4SSs of proteobacteria. Since firmicutes are more distantly related to archaea than to other bacteria, these results suggest an origin of VirB4-based MPFs dating from after the divergence of archaea and bacteria. These MPFs would have then spread among prokaryotes by horizontal gene transfer, possibly by conjugation. The close relationship between VirB4 homologues of firmicutes and archaea might then result from the adaptation of a lineage of ancestral T4SSs that was particularly plastic and capable of adapting to different cellular envelope structures.
Most evolutionary models assume that plasmids must be mobile to persist because plasmid carriage incurs a fitness cost (39, 78, 89, 91). Plasmids are regarded as selfish genetic elements because they can spread and survive without necessarily increasing their host fitness (7, 113). In this light, how can one interpret the existence of so many nontransmissible plasmids? Some of the very small plasmids may be remnants of larger replicons that lost genetic information and thus remain transiently carried in spite of being nonmobile. These plasmids would be expected to be on their way to extinction. However, since our survey uncovered that more than half of the plasmids are in fact nontransmissible, this explanation seems unsatisfactory.
Plasmids might move between cells by processes other than direct conjugation, e.g., by transduction, natural transformation, or cointegration into mobile plasmids. However, these mechanisms are expected to transfer plasmids at much lower rates than conjugation. They are also expected to favor small plasmids. Transformation requires the presence of intact plasmids in the environment in spite of environmental nucleases, their entry into the cell, and, when transformation includes a preliminary step of DNA restriction, their reconstitution inside the cell. Transduction requires plasmids to be packaged into a phage capsid and is therefore restricted to plasmids no larger than the phage genome. Transfer by cointegration into other replicons requires the presence of other mobilizable elements and of integration mechanisms. Since half of the plasmids, including the largest ones, are nonmobilizable, these results seriously question the view that all plasmids are purely parasitic elements, a view that would require high rates of transfer (13, 91).
It follows that the persistence of nonmobile plasmids has to be explained by the potentially useful genes (for the host) that they carry. Plasmids of Borrelia take up a significant part of their cellular genomes (26). Plasmids of these intracellular obligate parasites might have rare opportunities for plasmid transfer. The 39 different virulence plasmids of Borrelia garinii recombine frequently (32), and selection for gene conversion by antigenic variation might offset the cost of the loss of the capacity for frequent horizontal transfer by conjugation. There is also evidence of some exchange of genetic material between Borrelia species, since both plasmid and chromosomal sequences show marks of homologous recombination between different lineages (115). Some plasmids of Borrelia, the cp32 family, transfer by transduction (27, 151). These plasmids might then have a dual nature, like the bacteriophage P1 plasmid of E. coli; that is, they are maintained in cells like plasmids and transfer like phages (121). Plasmids in Buchnera species have most likely been kept without horizontal transfer but with exchanges with the chromosome for millions of years, possibly to increase the gene dosage of genes involved in mutualism with aphids (80). Traits encoded in mobile elements are both functionally and ecologically peculiar, and they might be durably and adaptively associated with plasmids (116). There is thus some evidence suggesting that plasmids with little or no mobility may survive for long periods of time by natural selection.
Surprisingly, most plasmids larger than 300 kb are nonmobilizable (Fig. 6). Such large, coding-dense plasmids are expected to have an important impact on cell fitness, and given their size, they are unlikely to be transferred by transformation or transduction. Their loss can be prevented only if their presence is selected for in the long term. One is then tempted to suggest that many such very large plasmids are on the way to being domesticated into secondary chromosomes (recently called chromids). In multichromosomal bacteria, which occur in clades such as Rhizobium, Burkholderia, and Vibrio (108), secondary chromosomes are smaller, contain few essential genes, and code for niche-specific functions (45, 67, 127). Some of these chromosomes contain a plasmid-like origin of replication; e.g., the origin of chromosome 2 in Vibrio cholerae resembles the oriVs of P1 and F plasmids (46), and this chromosome replicates at a different moment in the cell cycle (117).
While there is no consensus definition distinguishing a chromosome from a plasmid, a chromosome is often seen as any replicon containing essential housekeeping genes (108). Many large plasmids carry essential RNA genes, such as tRNA and rRNA genes (Fig. 11). In Deinococcus geothermalis and Ralstonia solanacearum, some chromosomal copies of the 16S rRNA genes are more similar to plasmid copies than to other chromosomal ones. In general, intact rRNA gene copies present in plasmids are as similar to the copies in chromosomes as the chromosomal copies are to one another, showing in all cases more than 99% identity. Since the identity threshold commonly used to define species in prokaryotes from 16S sequences is 97% (129), this strongly suggests that these plasmids carry rRNA genes acquired from the species that hosted them at the time of isolation. Our failure to observe rRNA genes significantly different from the chromosomal copies is a further indication that these plasmids have little, if any, mobility.
Genes coding for essential proteins are nearly always absent from small plasmids (Fig. 11) but are present in ~60% of large plasmids (100 to 400 kb) and 90% of very large plasmids (>400 kb). Highly persistent genes, i.e., genes present in most strains, many of which are not strictly essential (48), are also frequently found, in small numbers, in large replicons (67). Unlike tRNA and rRNA genes, genes encoding essential or persistent proteins rarely have multiple copies in a genome, and therefore, the chromosome cannot compensate for the loss of the plasmid. Naturally, the presence of essential genes in a plasmid does not preclude plasmid mobility to other cells; it just prevents the plasmid from being lost by its host. The data presented in this review suggest that most very large plasmids cannot be lost by segregation without large negative fitness effects. In fact, even very large plasmids that are conjugative tend to show little mobility. For example, the large symbiotic plasmid of Rhizobium leguminosarum (>1 Mb) is conjugative but shows little mobility in natural populations (148). Similarly, the largest plasmid of three strains of Rhizobium leguminosarum bv. trifolii, which is nonmobilizable in our analysis, could not be cured (8).
Plasmids tend to be A+T rich relative to their hosts (119) and have different oligonucleotide frequencies (144). However, large plasmids and secondary chromosomes tend to have compositions closer to that of the primary chromosome (67, 131). This is interpreted as the result of the long-standing presence of the plasmid in the host lineage, leading to amelioration of the plasmid genome toward the chromosomal composition (83), since the replication machineries are shared between replicons. It was recently proposed that secondary chromosomes and most very large plasmids should be called “chromids,” defined as replicons with plasmid-like maintenance and replication systems, nucleotide composition close to that of the primary chromosome, and carriage of some essential or persistent genes (67). It has been proposed that chromids allow bacteria to have a larger genome without incurring a chromosome replication time penalty leading to slower growth (149). However, bacterial replication is uncoupled from cell division, and multiple simultaneous rounds of chromosomal replication make partitioning the genome into multiple replicons unnecessary. Furthermore, there is no association between maximal growth rate and genome size (146). Accordingly, few of the fastest-growing bacteria, and a number of slow-growing ones, have chromids (29).
It might be thought that large domesticated plasmids would be doomed to disappear because genes coding for adaptive traits could migrate to the chromosome, rendering the costly and nonmobile plasmid useless to the cell. Nevertheless, it has been proposed that the above-mentioned highly dynamic Borrelia plasmids do not integrate into the chromosome to stabilize chromosome organization (120). By itself, this might explain a long-standing presence of plasmids, but plasmids also coevolve with their hosts to impose little or no cost (14, 40). A plasmid conferring an adaptive advantage to the host does not require the maintenance of costly functions such as surface exclusion, poison-antidote systems, or even mobility. For example, both the fusion of Sinorhizobium meliloti replicons (65) and the division of the Bacillus subtilis genome (70) had little effect on growth in rich media. Overall, these results are consistent with the idea that very large nonmobilizable plasmids are on the way to becoming secondary chromosomes.
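The identity values above are computed from aligned sequences. A minimal sketch, assuming that the two 16S rRNA copies are already aligned (gaps written as '-') and using one common convention: identical positions divided by the alignment columns that are not gapped in both sequences. Other conventions give slightly different numbers, and the sequences in the example are invented.

```python
def percent_identity(aligned_a, aligned_b):
    """Percent identity of two aligned sequences, ignoring double-gap columns."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("sequences must come from the same alignment")
    columns = [(x, y) for x, y in zip(aligned_a.upper(), aligned_b.upper())
               if not (x == "-" and y == "-")]
    if not columns:
        raise ValueError("alignment has no informative columns")
    # double-gap columns were removed, so x == y here implies a shared base
    matches = sum(1 for x, y in columns if x == y)
    return 100.0 * matches / len(columns)


chromosomal = "ACGTTGCA" * 50
plasmid = chromosomal[:-2] + "GG"   # two substitutions in 400 positions
pid = percent_identity(chromosomal, plasmid)
print(pid, pid >= 97.0)  # 99.5 True
```

Comparing the result with the 97% threshold shows why identities above 99% place a plasmid-borne rRNA copy within the host species.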
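Compositional comparisons of this kind often use GC content together with dinucleotide relative abundances, ρXY = fXY/(fX·fY), computed on both strands. The average absolute difference δ* over the 16 dinucleotides then measures how far apart the "genomic signatures" of two replicons are. The sketch below follows that general definition; it is an illustration, not necessarily the exact measure used in the cited studies, and the example sequences are invented.

```python
from itertools import product

COMPLEMENT = str.maketrans("ACGT", "TGCA")
DINUCLEOTIDES = ["".join(p) for p in product("ACGT", repeat=2)]


def gc_content(seq):
    acgt = [b for b in seq.upper() if b in "ACGT"]
    return sum(b in "GC" for b in acgt) / len(acgt)


def relative_abundance(seq):
    """Dinucleotide rho values counted on the sequence and its reverse complement."""
    seq = seq.upper()
    strands = [seq, seq.translate(COMPLEMENT)[::-1]]
    mono = dict.fromkeys("ACGT", 0)
    di = dict.fromkeys(DINUCLEOTIDES, 0)
    for s in strands:
        for b in s:
            if b in mono:
                mono[b] += 1
        for i in range(len(s) - 1):
            if s[i:i + 2] in di:  # skips dinucleotides containing N, etc.
                di[s[i:i + 2]] += 1
    n_mono, n_di = sum(mono.values()), sum(di.values())
    rho = {}
    for d in DINUCLEOTIDES:
        expected = (mono[d[0]] / n_mono) * (mono[d[1]] / n_mono)
        rho[d] = (di[d] / n_di) / expected if expected else 0.0
    return rho


def signature_distance(seq_f, seq_g):
    """delta*: mean absolute difference of the 16 rho values."""
    rf, rg = relative_abundance(seq_f), relative_abundance(seq_g)
    return sum(abs(rf[d] - rg[d]) for d in DINUCLEOTIDES) / 16


chromosome = "ATGCGCGATCGGCTA" * 40  # invented sequences
plasmid = "ATATTAAATGCAATT" * 40
print(gc_content(chromosome), gc_content(plasmid))
print(signature_distance(chromosome, plasmid))
```

Counting both strands makes the signature independent of which strand is sequenced, so a replicon and its reverse complement have a distance of zero. Under the amelioration hypothesis, δ* between a long-resident plasmid and its host chromosome is expected to shrink over time.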
One of the major contributions that we expect from this review is the demonstration that many conjugative transfer systems can be found and classified just by searching for similarities to known relaxases, T4CPs and T4SS main ATPase sequences (VirB4). The use of cluster techniques allows matching the expert work usually involved in the classification of relaxases and T4SSs with great accuracy. The simplicity of the methodology will allow anyone to reuse it to characterize newly sequenced plasmids. In this respect, we are now creating ready-to-use hidden Markov model-based protein domains specific to each of the proteins analyzed in this study to make them available on a dedicated website. However, for plasmid comparative genomics to take off as an important part of plasmid biology, there are still several hindrances. These hindrances relate to resources for data storage and analysis and how these might allow using the extensive information on individual plasmids to understand the biology of other plasmids. Some effort has been done by comprehensive databases such as ACLAME (87). However, such databases are often short-lived or updated at a much slower pace than GenBank, as is the case of the Plasmid Genome Database (102), the Database of Plasmid Replicons, or the Genome Database of Naturally Occurring Plasmids. The ideal plasmid database should also establish links between the plasmid and its host chromosomes (or range thereof) and provide a rationale for functional classification. The ACLAME (87) database provides these links and a growing gene ontology that facilitates the expert analysis of plasmid sequence data from a systems biology perspective.
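The workflow of similarity searches followed by clustering can be illustrated with the simplest clustering scheme, single linkage: proteins are nodes, hits above a score cutoff are edges, and clusters are the connected components, found here with a union-find structure. This is a hypothetical sketch; the protein names, bit scores, and cutoff are invented, and the review does not state that single linkage is the clustering technique it used.

```python
def single_linkage(names, hits, min_score):
    """Group proteins into clusters: any two proteins linked by a hit with
    score >= min_score end up in the same cluster (connected components)."""
    parent = {n: n for n in names}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving keeps trees shallow
            x = parent[x]
        return x

    for a, b, score in hits:
        if score >= min_score:
            root_a, root_b = find(a), find(b)
            if root_a != root_b:
                parent[root_a] = root_b  # merge the two clusters
    groups = {}
    for n in names:
        groups.setdefault(find(n), []).append(n)
    return sorted(sorted(g) for g in groups.values())


proteins = ["relax_1", "relax_2", "relax_3", "relax_4"]
hits = [("relax_1", "relax_2", 80.0), ("relax_2", "relax_3", 75.0),
        ("relax_3", "relax_4", 10.0)]  # hypothetical bit scores
print(single_linkage(proteins, hits, min_score=50.0))
```

Single linkage can chain distant sequences together through intermediates, as relax_1 and relax_3 are joined here only through relax_2. More conservative schemes, such as Markov clustering, are often preferred for large protein families.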
Gene nomenclature can also dampen the enthusiasm of newcomers to plasmid biology. It is confusing that sets of orthologous genes can have different names in different plasmids. For example, VirB4 homologues go by over 50 different names in the database, including VirB4, TrbE, TraE, TraC, MpfC, and TraB, not to mention the distantly homologous TraU family. Conversely, different plasmids have similarly named genes that are not homologous. This is largely a historical problem: some genes were named by their order in the mobility locus, others by homology, and still others, whose homology was not obvious, in yet other ways. It is also counterintuitive to call one particular T4SS (MPFI) a type 2 conjugative pilus or to call T2SSs “type IV pili.” Rationalizing the nomenclature in the field would facilitate comparative approaches but naturally requires a community-level effort to establish new names and assess synonyms. While care must be taken in renaming genes, as sequence similarity does not necessarily imply identical function, the current naming of plasmid genes makes genome comparison using annotation files impossible. Our plasmid mobility classification can be generated automatically by sequence similarity searches followed by clustering. This allows not only analyses such as those presented here but also analyses of environmental plasmid sets, such as those found in metagenomic data sets (12, 110, 122, 132, 133).
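The step of grouping similarity-search hits into families can be illustrated with single-linkage clustering, one simple way to turn pairwise hits into protein families. This is an assumed, minimal sketch rather than the clustering used in this study: the input format (query, subject, E-value tuples from an all-against-all search), the E-value cutoff of 1e-5, and the protein identifiers are all hypothetical.

```python
from collections import defaultdict


def single_linkage_clusters(proteins, hits, max_evalue=1e-5):
    """Group proteins into families by single-linkage clustering.

    proteins: iterable of protein identifiers.
    hits: iterable of (query, subject, evalue) tuples, e.g. parsed from an
          all-against-all similarity search (hypothetical input format).
    max_evalue: assumed cutoff; two proteins are linked if a hit between
          them has an E-value at or below this threshold.
    Returns the clusters as sorted lists, themselves sorted.
    """
    parent = {p: p for p in proteins}

    def find(x):
        # Path halving keeps the union-find trees shallow.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    def union(a, b):
        ra, rb = find(a), find(b)
        if ra != rb:
            parent[rb] = ra

    for query, subject, evalue in hits:
        # Ignore weak hits and hits to proteins outside the input set.
        if evalue <= max_evalue and query in parent and subject in parent:
            union(query, subject)

    groups = defaultdict(list)
    for p in parent:
        groups[find(p)].append(p)
    return sorted(sorted(g) for g in groups.values())


# Hypothetical identifiers and E-values, for illustration only.
proteins = ["relA", "relB", "relC", "relD"]
hits = [("relA", "relB", 1e-30), ("relB", "relC", 1e-8), ("relC", "relD", 0.5)]
print(single_linkage_clusters(proteins, hits))  # [['relA', 'relB', 'relC'], ['relD']]
```

Single linkage is cheap to compute with a union-find structure, but it can chain distantly related sequences into one cluster through intermediates, so a practical pipeline would likely add stricter criteria (such as alignment coverage or profile-based searches) to keep families such as the MOB groups apart.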
We now have an automated method for the systematic classification of relaxases, T4CPs, and T4SSs that allows the evolutionary patterns of plasmids to be traced on phylogenetic trees. The evolutionary histories of relaxases, T4CPs, and T4SSs are not fully congruent but suggest coevolution. Relaxases can thus serve as markers for plasmid classification, much as 16S rRNA sequences do for genomes, except that around half of the plasmids lack relaxases and that several relaxase families exist. Still, MOB is a broader phylogenetic marker than the origins of replication used to define Inc types (30), which evolve too quickly to support deep phylogenetic trees. T4SSs are also good phylogenetic markers but are of more limited use because most plasmids lack them. The T4CP is an excellent marker because, so far, it consists of a single protein family, allowing the deepest evolutionary relationships to be established. The study of T4CP evolution, and of its interactions with the relaxosome and the T4SS, will certainly shed light on the evolution of the conjugation machinery in clades other than proteobacteria and firmicutes.
Within proteobacteria, the 98% overlap between VirB4-based and specific T4SS-based searches suggests that our knowledge of the conjugative pilus is now good enough to classify plasmid mobility automatically. Our analysis suggests that the classical division of T4SSs into two groups, T4SSa and T4SSb (31), should be revised into four groups, one of which (MPFG) is rare and another of which (MPFT) is by far the most frequent. However, this classification scheme applies only to plasmids of proteobacteria. Outside this clade, our analyses suggest that new relaxases and T4SSs remain to be found. Some clades, notably in archaea, have known mobilizable plasmids containing a putative T4SS but no relaxase or T4CP. Some plasmids contain a small but coherent set of mobility genes with low homology to known relaxases, such as the new family mentioned above, represented by Bacteroides thetaiotaomicron VPI-5482 plasmid p5482. Other clades, notably cyanobacteria, contain many plasmids with relaxases and T4CPs but no close VirB4 homologues. Given that cyanobacteria are among the most abundant cells on earth and that their genomes carry many plasmids, uncovering how plasmids spread in this clade should become a priority. Indeed, the 408-kb Anabaena MOBV plasmid pCC7120α was reported to be transmissible (103), although the exact mechanism of transmission was not investigated. As mentioned above, the VirB4 analogue of MPFI is more similar to VirD4 than to VirB4. It is thus likely that other, still unknown, proteins energize as yet unknown types of T4SSs. Mining large genomes for relaxases and T4SSs is therefore bound to uncover novel families. This should be done in conjunction with the identification of ICEs in genomes, which remain largely unexplored by comparative genomics.
It is tempting to relate the current gaps in our knowledge of relaxases and T4SSs to our finding that known broad-host-range proteobacterial plasmids are in fact found only in proteobacteria. This reinforces previous observations that the conjugation range of plasmids is larger than the range of hosts that they typically occupy (34) and that plasmids tend to have nucleotide compositions close to that of the hosts in which they are most often found (131). While some plasmids can conjugate between very different clades, they are not found there naturally (at least not significantly often), and our phylogenetic analysis suggests that the diversification of mobility-associated protein families takes place within narrower clades. Plasmids do not shuffle modules freely; they tend to cluster within given clades, and this preference is presumably related to specific features of a given plasmid design and to host physiology (50). For instance, the T4SSs of firmicutes seem smaller than those of proteobacteria, possibly because these cells lack an outer membrane. Elements of the T4SS with homologues in both proteobacteria and firmicutes have been found to interact in equivalent ways (1), but the remaining machinery may work very differently; e.g., T4SSs of firmicutes are not known to form a mating pilus (1). The T4SSs of mollicutes, which lack both an outer membrane and a cell wall, seem even simpler. While conjugation systems have certainly adapted to the peculiarities of bacterial membranes, it makes little sense to set Gram-positive plasmids against Gram-negative ones. As a case in point, T4SSs of proteobacterial plasmids seem to have more in common with those of firmicutes than with those of cyanobacteria, even though cyanobacteria, like proteobacteria, have two membranes.
The emerging picture is thus that conjugation systems are adapted to taxonomic clades, and more research on the differences between plasmids of proteobacteria, firmicutes, actinobacteria, cyanobacteria, archaea, and other clades is needed. As a result of these adaptive processes, few elements of the conjugation machinery are shared by firmicutes and proteobacteria, and even fewer by cyanobacteria and archaea.
Plasmid design or host adaptation is also likely to account for phylogenetic specificities in MOB and MPF modules. This is analogous to the evolution of prokaryotic chromosomes: although rampant recombination could have eroded phylogenetic lineages, this has not happened, because some core genes are rarely exchanged successfully (88, 141). Whatever makes prokaryotic classification useful and meaningful appears to play the same role in the classification of plasmid mobility systems, most likely because of the adaptive coevolution of the different elements of the mobility machinery with the host.
According to our interpretation of the sequence analysis carried out in this work, 15% of prokaryotic plasmids are conjugative, 24% mobilizable, and 61% nontransmissible. In proteobacteria, for which our predictions are more accurate, the corresponding percentages are broadly similar: 28%, 23%, and 49%, respectively. Thus, about half of the plasmids are nontransmissible, and the remaining ones are divided more or less evenly between conjugative and mobilizable plasmids. This finding suggests that many models of plasmid evolution that require high rates of horizontal spread for plasmid survival might have to be revised to account for the fact that at least half of the plasmids probably have low transfer rates. Our phylogenetic analysis shows that transfer between distant phyla is rare and describes the evolutionary history of T4SSs. MPFT is by far the most abundant T4SS, and MPFG and MPFF might derive from an ancestral MPFT. However, MPFI and the T4SSs of other clades did not derive from MPFT, which seems to be a proteobacterial invention. The high frequency of MPFT might reflect a particularly successful design, even though such plasmids are notoriously poor at mating in liquid media (16, 38). It might also reflect a sequencing bias toward host-associated proteobacteria. Incoming metagenomic data should provide more accurate and less biased estimates of the diversity and frequency of the different types of MPF and MOB. In any case, a full account of the evolutionary history of conjugation will require a parallel study of ICEs and the uncovering of conjugation mechanisms in prokaryotes lacking identifiable T4SSs and/or relaxases, that is, the vast majority of prokaryotes.
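The arithmetic behind the half-and-half summary can be checked directly from the percentages given above. The helper below (a hypothetical name) computes, for each breakdown, the fraction of plasmids that are transmissible and the share of those that are conjugative; it uses only the numbers reported in the text.

```python
def transmissible_split(conjugative, mobilizable, nontransmissible):
    """Return (transmissible fraction, conjugative share of transmissible).

    Inputs are percentages (or counts) for the three mobility classes.
    """
    total = conjugative + mobilizable + nontransmissible
    transmissible = conjugative + mobilizable
    return transmissible / total, conjugative / transmissible


# Percentages reported in the text: conjugative, mobilizable, nontransmissible.
breakdowns = {
    "all prokaryotes": (15, 24, 61),
    "proteobacteria": (28, 23, 49),
}

for label, percentages in breakdowns.items():
    trans, conj_share = transmissible_split(*percentages)
    print(f"{label}: {trans:.2f} transmissible, {conj_share:.2f} of those conjugative")
# all prokaryotes: 0.39 transmissible, 0.38 of those conjugative
# proteobacteria: 0.51 transmissible, 0.55 of those conjugative
```

The split is closest to even in proteobacteria, where about 55% of transmissible plasmids are conjugative; across all prokaryotes, conjugative plasmids make up under 40% of the transmissible ones.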
Work in the laboratory of F.D.L.C. was supported by grants BFU2008-00995/BMC (Spanish Ministry of Education), REIPI RD06/0008/1012 (RETICS Research Network, Instituto de Salud Carlos III, Spanish Ministry of Health), and LSHM-CT-2005_019023 (European VI Framework Program). Research in the laboratory of E.P.C.R. is supported by the CNRS and the Institut Pasteur. C.S. was supported by a BRAVO fellowship from the University of Arizona (HHMI grant 52005889). M.V.F.'s research was supported by grants FIS PI07/0664 (Spanish Fondo de Investigación Sanitaria, Instituto de Salud Carlos III) and CES08/008 (Spanish Fondo de Investigación Sanitaria, Instituto de Salud Carlos III, and Fundación Marqués de Valdecilla-IFIMAV), REIPI RD06/0008/0031 (Ministerio de Sanidad y Consumo, Instituto de Salud Carlos III), and LSHE-CT-2007-037410 (European VI Framework Program).
We thank Pascal Sirand-Pugnet for comments on the manuscript.
Chris Smillie received his B.S. in Molecular and Cellular Biology and Mathematics from the University of Arizona, where he worked with Dr. Howard Ochman on bacterial genome dynamics and next-generation sequencing. The following year, he was funded by a BRAVO research fellowship to study plasmid evolution and ecology in Dr. Eduardo Rocha's laboratory at the University of Paris. He is currently enrolled as a graduate student in the Computational and Systems Biology Ph.D. program at the Massachusetts Institute of Technology, where he is a student in Dr. Eric Alm's laboratory. His research interests include the development of computational methods for understanding microbial evolution and ecology, particularly from metagenomic sequencing projects.
M. Pilar Garcillán-Barcia, Ph.D. in Molecular Biology and M.Sc. in Biochemistry, is currently a postdoctoral researcher at the Institute of Biomedicine and Biotechnology of Cantabria. Her work has been focused on the molecular mechanisms and diversity of transposable elements and plasmid conjugation. Her main interests are the study of plasmid evolution and its applicability to synthetic biology.
Dr. M. Victoria Francia is a molecular biologist interested in the regulation of gene transfer in bacteria. Her research focuses on mobile genetic elements, mainly plasmids and integrons, in the context of the emergence and dissemination of multiple-antibiotic resistance. Her research is also concerned with studying the emergence and spread of important nosocomial pathogens such as MRSA (methicillin-resistant Staphylococcus aureus) and VRE (vancomycin-resistant enterococci).
Eduardo P. C. Rocha received M.Sc. degrees in Biochemical Engineering and in Applied Mathematics, a Ph.D. in Bioinformatics, and an Habilitation à Diriger des Recherches in Biology. He is a CNRS senior researcher at the Institut Pasteur in Paris, where he leads the Microbial Evolutionary Genomics group. His main interests are the study of genome organization and dynamics and especially the ensuing tradeoffs from an evolutionary perspective.
Fernando de la Cruz is Professor of Genetics at the University of Cantabria, Santander, Spain. He has worked for more than 30 years on various aspects of plasmid biology, with a focus on the mechanism of bacterial conjugation. His main interests at present revolve around the diversity and evolution of plasmids and their application in Systems and Synthetic Biology.
†Supplemental material for this article may be found at http://mmbr.asm.org/.