The discrepancy between the phylogenetic trees of the late genes and those of the regulatory region of the 31 phages, together with the many instances in which regulatory region genes of types other than the P2 or 186 types (e.g., lambda type integrases in phages classified as Peduovirinae) are associated with P2-like late genes, clearly demonstrates that the evolution of P2-like phages is no exception to the evolutionary processes found among other groups of phages. They do indeed have mosaic genomes that consist of functional modules. In addition, GenBank contains hundreds of P2-like sequences coding for structural proteins found in bacterial genomes. Even though many of these are cryptic phages lacking the regulatory region, some are likely to be functional P2-like prophages with sets of regulatory regions other than the P2 or 186 types.
The majority of phages with complete P2-like genomes (sensu stricto, i.e., resembling phage P2) are of two distinct types, one with a P2 type of regulatory region and another with a phage 186 type, and it appears that only two recombination events have occurred between their regulatory regions and their late genes.
Recombination within the same type of regulatory region has been detected before,9 but it seems to be confined to closely related phages of the same regulatory region type. Although the phylogenetic inference is weak, no chimeric regulatory region was found among the 31 phages. The int–transcriptional switch regions of both types are strongly coadapted due to their multifunctionality, and the evolution of the regions is probably constrained by this complexity. The regions are simply too dissimilar to allow homologous recombination, and illegitimate recombination would most likely result in dysfunctional regulatory genes and ruin the precise lock-and-key fit of the proteins. New regions of each type are more likely to arise through many small mutational changes, as well as through recombination between similar regions. The two int–transcriptional switch regions have accordingly diverged into several subgroups, within which the individual genes show different evolutionary rates.
The crystal structure of P2 C has recently been determined.22 The N-terminal part of the C protein is predicted to contain four α-helices, of which helix 3 has been hypothesized to be the DNA recognition helix of a helix-turn-helix (HTH) motif formed together with helix 2. This is in accordance with the finding that this part of the 13 C proteins of the P2 type is highly variable, which corresponds to the variation in their target DNA sequences. There are only two conserved amino acids in helix 3, both at its C-terminal end. The structure of P2 C also revealed a fifth α-helix and a β-sheet not detected by the JPRED structure prediction. The sequences of these are quite conserved in all 13 C proteins from phages with a P2 type transcriptional switch. The C-terminal parts of these C proteins also contain another conserved region, GQIAPALA, located after the structurally determined β-sheet in P2 C but before the last 14 residues that form the flexible C-terminal tail (Fig. S1).
The structure of the CI protein of phage 186 has also been determined, and the N-terminal domain contains five α-helices, of which helices 2 and 3 form the HTH motif. The C-terminal domain consists of a highly twisted ten-stranded β-sheet that is involved in the assembly of a heptamer of dimers.23 The alignment of the CI proteins showed that PsP3 and prophage ΦECO3 have a longer C-terminal domain than phage 186, and that SopEΦ, Fels-2 and ΦSEN2 have a longer coupler between the two domains (Fig. S2).
It is doubtful whether these proteins, the C type and the CI type, have a common evolutionary ancestry. Both groups are, however, structurally conserved at the N-terminal end, which interacts with DNA via an HTH motif, and it could be hypothesized that the C type of repressor protein is the result of an old truncation of CI. However, the CI proteins are all twice the size of the C proteins, and the sequences of the two groups are impossible to align. A more plausible hypothesis is that the C gene is a horizontally transferred addition to the P2 type of regulatory region.
The Cox/Apl proteins have actually differentiated into more than four groups, since there are analogous proteins that do not resemble any of the Cox/Apl proteins in the alignment. The proteins of HP1/HP2 and PsP3 are different from the rest, and the Cox/Apl proteins in ΦCTX and ΦRSA1 have not been identified. The Cox/Apl proteins of HP1/HP2 have a secondary structure predicted to be similar to that of group 4, but the protein of PsP3 is a singleton without detectable conserved domains or any detectable relation to a protein of known function.
The Cox/Apl proteins identified in the analyses are also highly differentiated, especially between groups. Groups 1 and 2, both belonging to the Cox type, share six residues, but the other groups are only structurally similar over the HTH domains (Fig. S3). Thus the phylogenetic relationship of the groups cannot be resolved, and the evolution of the Cox/Apl proteins needs to be studied further.
The integrases are more conserved than the transcriptional switch genes and obviously share a common ancestry. They can clearly be divided into two groups, the P2 type and the 186 type of integrases, which share secondary structure and some residues. The same phylogenetic groups are present in the analyses of both transcriptional switch genes, which, in combination with the analyses of the secondary structure of these proteins, points to a monophyletic origin of the entire regulatory region. Midpoint rooting of the Int and Cox/Apl phylogenetic trees places the Int–CI–Apl type closer to the root, which suggests that it is the older type and that the less differentiated Int–C–Cox is a derived state that has evolved to become less complicated. The 186 type of transcriptional switch is similar to the intricate lambda switch in that it also depends on the additional gene cII, whereas the P2-like switch not only lacks an equivalent of this gene but also has a smaller C protein, half the size of CI. It cannot be ruled out that the 186 type of switch is distantly related to the transcriptional switch type of phage λ.
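For readers unfamiliar with midpoint rooting, the idea can be illustrated with a minimal Python sketch. It is not the software or the data used in this study: the function names (`distances_from`, `midpoint_root`) and the toy tree with leaves A to D are hypothetical and chosen only for illustration. The sketch finds the two most distant leaves of an unrooted tree and places the root halfway along the path between them.

```python
def distances_from(edges, start):
    """Return (distance, parent) maps for a walk over the tree from `start`."""
    adjacency = {}
    for a, b, length in edges:
        adjacency.setdefault(a, []).append((b, length))
        adjacency.setdefault(b, []).append((a, length))
    dist = {start: 0.0}
    parent = {start: None}
    stack = [start]
    while stack:
        node = stack.pop()
        for neighbour, length in adjacency[node]:
            if neighbour not in dist:
                dist[neighbour] = dist[node] + length
                parent[neighbour] = node
                stack.append(neighbour)
    return dist, parent


def midpoint_root(edges, leaves):
    """Locate the midpoint of the longest leaf-to-leaf path.

    Returns (u, v, offset): the root lies on edge (u, v),
    `offset` branch-length units away from u.
    """
    # Step 1: find the pair of leaves separated by the longest path.
    first, second, longest = None, None, -1.0
    for a in leaves:
        dist, _ = distances_from(edges, a)
        for b in leaves:
            if dist[b] > longest:
                first, second, longest = a, b, dist[b]

    # Step 2: walk back from `second` toward `first` until the
    # halfway point of that path falls on the current edge.
    dist, parent = distances_from(edges, first)
    half = longest / 2
    node = second
    while dist[parent[node]] > half:
        node = parent[node]
    u = parent[node]
    return u, node, half - dist[u]


# Hypothetical toy tree (not data from this study): (node, node, branch length).
toy_edges = [
    ("n1", "A", 1.0),
    ("n1", "B", 2.0),
    ("n1", "n2", 1.0),
    ("n2", "C", 4.0),
    ("n2", "D", 0.5),
]
toy_leaves = ["A", "B", "C", "D"]

print(midpoint_root(toy_edges, toy_leaves))
```

In this toy tree the longest path runs between B and C, so the root is placed on the branch leading to C. Midpoint rooting assumes roughly equal evolutionary rates along the two longest lineages, which is one reason to treat the inferred age of the Int–CI–Apl type with caution.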
The distribution of these two types of transcriptional switches does not seem to be strongly associated with phages utilizing certain hosts. Previous studies have concluded that the P2 type of switch is confined to phages with an E. coli
Our analyses show that the P2 type can also be found in prophages in Yersinia, and that the 186 type is widely distributed among phages in many bacterial genera, e.g., Salmonella, and Aeromonas, as well as in E. coli. There are also prophages from E. coli with a genome containing a P2 type of regulatory region and structural genes more similar to those of phages like HP1, HP2 or K139. These results on the distribution of the two types of regulatory regions and their host preferences are not in accordance with earlier studies, which showed a high occurrence of the P2 type of transcriptional switch in E. coli but no sign of phages with a 186 type. If the 186 type of transcriptional switch is actually common in all γ-proteobacteria, it is surprising that not a single one was found among 38 sequenced E. coli
The results of the hybridizations may explain this discrepancy, since they show that 186-like prophages are scarce within E. coli. In addition, phage 186 may have been isolated from an E. coli host, but it grows poorly on many E. coli strains. Under laboratory conditions, phage 186 is propagated on E. coli
It could thus be questioned whether either E. coli
is the preferred host for phage 186 or if it is another bacterium related to these or to Klebsiella. In either case, the observations are in accordance with the view that the 186 type is older and spread over more bacterial genera than the P2 type.
From a taxonomic point of view, it would be justified to let the subfamily Peduovirinae contain at least two genera: P2 types and 186 types. Since phage HP1 was the first phage with the 186 type of regulatory genes to be fully sequenced, we suggest naming the two groups “P2-like phages” and “HP1-like phages” within the Peduovirinae subfamily.
Previous studies have shown a differentiation of P2-like phage genomes that is consistent with the phylogeny of the hosts, indicating a host preference.8 Although we see no signs of host preference in this study, there can be several explanations for a lack of correlation between different sets of genes and host association. P2-like phages were initially isolated from commensal gut bacteria and sewage. Only phages that could be propagated on standard laboratory E. coli strains were isolated, and the sampling of phages was thus biased. Many of the prophages identified in bacterial genome sequencing projects might be old, inactive prophages with mutationally deteriorated genomes that show a poor relationship to recent functional phages. Several of the bacteria that harbor the prophages identified in this study are not commensals but pathogens, reflecting a sequencing bias toward such strains. Pathogenic E. coli may have genomes that are as much as 20% larger, and the extra genes are often organized into horizontally transferable pathogenicity islands, which may contain prophages. Genes may also be transferred between bacteria by conjugation or by means of other vectors. Consequently, phage genes may hitch-hike along with other genetic elements that are horizontally transferred and eventually be found in atypical genomes.