All plasmids previously isolated from Thermococcales were small RC plasmids (23–25). Here we report the isolation, sequencing and characterization of three new plasmids from Thermococcales that are larger and probably replicate via the theta mode. The plasmids pTN2 and pP12-1 encode a DNA polymerase fused to a domain of unknown function. These proteins are homologous to the RepA proteins of the plasmids pTIK4 from S. neozelandicus and pXZ1 from S. solfataricus, and to proteins encoded by integrated elements present in genomes of Methanococcales. They therefore form a new family of archaeal RepA proteins common to mobile elements present in both Euryarchaea and Crenarchaea. The DNA polymerase domains of these RepA proteins have no detectable sequence similarity to previously known DNA polymerases, including the DNA polymerase discovered by Georg Lipps in pRN1 (14), indicating that these proteins could be the prototypes of a new DNA polymerase family. As in the case of the pRN1 polymerase, the DNA polymerase of pTN2 exhibits a primase activity (not shown, manuscript in preparation) and could be used for the initiation as well as for the elongation step of plasmid DNA replication. The DNA polymerase domains of the Rep proteins of pTN2 and pP12-1 are linked at their C-terminus to a large domain of unknown function. In pRN-type plasmids, the DNA polymerase domain of the Rep protein is fused at its C-terminus to a helicase domain. This is probably not the case for pTN2 and pP12-1, since their C-terminal domains do not contain a detectable ATP-binding site. Indeed, in pTN2 and pP12-1 the genes encoding the Rep-DNA polymerase are contiguous to genes encoding a putative SF1 helicase that is probably involved, together with the Rep proteins, in plasmid replication. It is tempting to speculate that the C-terminal domains of the pTN2 and pP12-1 Rep proteins bear an origin-binding activity, since the combination of such an activity with their DNA polymerase/primase activities and the unwinding activity of the associated SF1 helicase would reconstitute a complete DNA replication system for the plasmids.
The plasmid pT26-2 does not seem to encode a DNA polymerase and probably recruits a cellular DNA polymerase. Interestingly, this plasmid encodes a putative replicative helicase that would also correspond to a new family of replication proteins. It is striking that the polymerase/helicase cassette of pTN2/pP12-1 and the putative replicative helicase of pT26-2 are both located downstream of the largest intergenic region present on each of the three plasmids. These regions are predicted to be replication origins by cumulative GC skew analysis and include many direct and inverted repeats, reminiscent of the ‘iterons’ typical of bacterial plasmid replication origins (55).
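As a rough illustration of the cumulative GC skew analysis mentioned above, the following Python sketch computes (G − C)/(G + C) in non-overlapping windows and sums it along the sequence; turning points of the resulting curve are candidate sites where strand composition switches, such as a replication origin or terminus. The function names, the window size and the toy sequence are assumptions made for illustration only; they are not the software or parameters used in this study, and which extremum marks the origin has to be interpreted together with other evidence such as the repeats described above.

```python
def cumulative_gc_skew(seq, window=100):
    """Return (window_start, cumulative_skew) pairs for non-overlapping windows.

    The skew of one window is (G - C) / (G + C); windows without G or C
    contribute zero. The window size is an arbitrary illustrative choice.
    """
    seq = seq.upper()
    points = []
    total = 0.0
    for start in range(0, len(seq), window):
        chunk = seq[start:start + window]
        g = chunk.count("G")
        c = chunk.count("C")
        if g + c:
            total += (g - c) / (g + c)
        points.append((start, total))
    return points


def skew_extrema(points):
    """Return the window starts of the minimum and maximum cumulative skew."""
    min_pos = min(points, key=lambda p: p[1])[0]
    max_pos = max(points, key=lambda p: p[1])[0]
    return min_pos, max_pos


# Toy sequence (hypothetical, not a real plasmid): a C-rich half
# followed by a G-rich half, so the skew changes sign at position 200.
toy = "ACCC" * 50 + "AGGG" * 50
profile = cumulative_gc_skew(toy, window=20)
min_pos, max_pos = skew_extrema(profile)
print(min_pos, max_pos)
```

On this toy sequence the cumulative curve falls through the C-rich half and rises through the G-rich half, so its minimum lies in the last window before the composition switch. On a real circular plasmid the sequence start is arbitrary, so the curve is usually inspected as a whole rather than reduced to a single coordinate.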
The three new plasmids of Thermococcales described here could become interesting models to study plasmid replication both in vitro and in vivo in hyperthermophilic archaea. Furthermore, the identification of their replication cassettes will facilitate the use of these plasmids in future vector construction for T. kodakaraensis, which is becoming the model species for genetic and molecular studies in hyperthermophilic archaea (28). It would be interesting, for instance, to express different genes from different compatible plasmids in the same strain for complementation studies. We already know that pTN1 and pTN2 are compatible, since both are present in the same strain of T. nautilus. It should also be possible to use the SSV-type integrase of pT26-2 to insert groups of foreign genes in Thermococcales. Conversely, the genetic tools already available for T. kodakaraensis should help to analyse the biological functions of the proteins encoded by the three plasmids described here if, as in the case of pTN1, they can be propagated in T. kodakaraensis.
The plasmid pT26-2 encodes many homologues of proteins encoded by ‘virus-like elements’ present in Thermococcales and Methanococcales, including an SSV-type integrase, thus defining a large family of plasmids and integrated elements that probably predated the separation of these two archaeal orders. The pT26-2 family is characterized by seven ‘core genes’ present in all members of the family. Whereas pTN2 and pP12-1 can be considered cryptic plasmids, the evolutionary relationships between pT26-2 and virus-like elements present in Thermococcales (TKV2/3, TGV1, PHV1) and Methanococcales (MMPV1, MMC7V1/2, MVV1, MMC6V1) suggest that pT26-2 could be a conjugative plasmid or a viral genome (or derived from such elements). Indeed, pT26-2 encodes hydrophobic proteins that are clustered in one half of the plasmid, suggesting that they could form protein complexes involved in DNA transfer, either for conjugation or viral infection. The putative glycoprotease activity of the large and conserved membrane protein t26-5p also suggests that pT26-2 and related elements have the ability to destroy and/or modify the cell envelope to allow DNA transfer during either conjugation or viral infection. Krupovic and Bamford (57) have shown that one of the genes present in TKV4 encodes a capsid protein of the PRD1-adenovirus type (with a double jelly-roll fold), suggesting that TKV4 is a bona fide virus. In contrast, we could not detect any gene encoding a putative capsid protein in pT26-2 or related elements. We also did not detect virus particles in cultures of T. nautilus, or in cultures of the Thermococcales containing related elements (54). These strains produce virus-like vesicles, and some intracellular DNA is strongly associated with vesicles of T. gammatolerans, but we failed to specifically detect either DNA from pT26-2 or related elements or proteins encoded by these elements in these vesicles (data not shown).
The origin and mode of evolution of plasmids and viruses, as well as their position in the tree of life, remain controversial topics (58–63). It is sometimes assumed that these biological entities essentially evolve by recruiting cellular genes (62). However, in contradiction with this view, homologues of cellular genes are usually rare in viral genomes and plasmids, with the exception of cellular genes that are present in integrated viruses and/or plasmids (58–60). Indeed, none of the proteins encoded by the three plasmids studied here has cellular homologues, except for genes mostly present within plasmids and/or viruses integrated in cellular genomes. The putative SF1 helicases of pTN2 and pP12-1 are homologous to bacterial UvrD-type helicases; however, they are more closely related to putative helicases present in archaeal genomes, which are most likely of viral origin (Supplementary Figure S4). This indicates that none of the proteins encoded by the three plasmids studied here corresponds to a bona fide cellular gene that has been recently captured from a host cell.
The presence of a high proportion of ORFans among plasmid and viral genes is another specific feature that cannot be explained easily in the traditional view of virus/plasmid evolution. The question of the origin and the nature of cellular and viral ORFans has been raised repeatedly (64). Most of them encode short proteins, and it has sometimes been argued that these ORFs do not encode bona fide proteins. The three plasmids studied here also encode a relatively large proportion of small ORFs. We can assume that most of them are real genes because they are conserved either between pTN2 and pP12-1, or between pT26-2 and the virus-like elements present in the genomes of Thermococcales and Methanococcales. The characterization of a large family of pT26-2-related elements present in the genomes of Thermococcales and Methanococcales illustrates our recent observation, made at a larger scale, that viral/plasmid genes represent a large reservoir of genes that constantly invade bacterial and archaeal genomes (41).
Plasmids and viruses often encode large proteins that are involved in genome replication or in the formation of structural apparatus, such as virions or DNA-transferring machinery. These proteins, which have either no cellular homologues or only distantly related ones, can be considered viral-specific proteins (59) or virus hallmark proteins (60). The three plasmids studied here encode several examples of such proteins. This is the case, for instance, for the two large proteins conserved in all members of the pT26-2 family (t26-5p and t26-6p). Structural analysis indeed confirmed the uniqueness of one of them, the protein t26-6p, which contains three novel folds (53). Many of the large proteins encoded by plasmids and DNA viruses are involved in DNA replication, repair, recombination and/or integration, and this is indeed the case in the present study. These proteins are usually conserved in all members of the same plasmid/viral family (such as the integrases of the pT26-2 family or the Rep-DNA polymerases and the putative SF1 helicases of the pTN2 family), but they can also be shared by different plasmid families (as in the case of the pTN2-type SF1 helicase, the DNA polymerase/primase or the SSV1-like integrases).
To explain the existence of many proteins involved in DNA replication, repair or recombination among viral-specific proteins, it has been suggested that DNA and DNA-manipulating enzymes originated first in an ancient virosphere, and that only a subset of these proteins was later transferred into modern cellular lineages (61). This hypothesis predicts that many new types and families of viral-specific DNA-manipulating enzymes remain to be discovered in the plasmid/viral world. The present work fulfils this prediction since, by studying only three plasmids, we discovered a new family of DNA polymerases and probably a new family of helicases. It gives us the exciting expectation that more new families of enzymes involved in DNA metabolism await discovery in systematic analyses of as yet uncharacterized proteins encoded by plasmids and viruses.
Interestingly, the proteins encoded by the plasmids and related viruses of Thermococcales studied here have no close homologues encoded by bacterial or eukaryotic viruses/plasmids, but only by archaeal plasmids and viruses, thus confirming that the three domains have largely independent viral reservoirs (68). The pTN2 and pP12-1 plasmids share three genes with the virus PAV1 from P. abyssi, which is reminiscent of the presence of SSV1-like genes in the genome of the Sulfolobus pRN-type plasmid pSSVx (18). A homologue of a PAV1 gene has also recently been detected in TKV4 from T. kodakaraensis, showing that gene exchange between PAV1 and plasmids has not been limited to the pTN2 family. Finally, the putative SF1 helicase of pTN2 and pP12-1 is also present in the virus-like element TGV2 from T. gammatolerans. All these observations suggest that viruses and plasmids infecting and/or present in Thermococcales share a common pool of genes that can be moved from one genome to another by recombination. The same phenomenon has been observed in Bacteria. For instance, Krupovic and Bamford (69) have shown that some viruses of the Corticoviridae family, whose prototype is the bacteriovirus PM2, use a replication machinery of plasmid origin instead of the viral DNA replication machinery of PM2.
It is often assumed that the evolution of plasmids and viruses cannot be reconstructed because they are constantly involved in horizontal gene transfer between various cellular lineages (62). However, a recent phylogenomic analysis of 16 genomes of T4-related viruses identified a conserved core of 24 genes that all (but one) exhibit a similar phylogenetic pattern, indicating that these viruses mainly evolved vertically (70). Similarly, the phylogeny of the core proteins of the pT26-2 family is congruent with that of Thermococcales (using Methanococcales as outgroup, see Supplementary Figure S12). It has recently been observed that viruses and their hosts exhibit similar biogeographic patterns in genomic studies focusing on the archaeal species S. islandicus. This shows that cells and their associated viruses and plasmids co-evolved at the level of species. The analysis of the proteins encoded by the three plasmids from Thermococcales studied here suggests that co-evolution occurred at the order level as well. Indeed, the most closely related homologues of the pTN2, pP12-1 and pT26-2 proteins are mostly found in viruses, plasmids or integrated elements of Thermococcales. If Thermococcales proteins are removed from the analysis, the most closely related homologues are frequently found in Methanococcales, which are closely related to Thermococcales in phylogenetic trees of Archaea based on ribosomal proteins or RNA polymerase subunits (38). Our large-scale analysis of the relationship between the pT26-2 family and other plasmids or integrated elements (CAGs) present in Archaea confirms this view, since the network of evolutionary interactions between these elements overlaps with the phylogenetic pattern of the archaeal domain (Supplementary Figure S11B). These results show that gene transfers between viral, plasmid and cellular lineages have not obliterated the phylogenetic signal that testifies to their co-evolution with their hosts. This has important implications for the current debate about the inclusion of viruses in the tree of life. Indeed, although viruses and plasmids cannot be placed in a tree based on universally conserved proteins, they can probably be included as companions in a universal tree of life based on the evolution of cellular species. We think that further systematic analyses of free or integrated plasmids and viruses in all archaeal, bacterial and eukaryotic groups are now needed in order to draw a comprehensive tree of life unifying all types of living entities present in the biosphere.