The generalized transducing double-stranded DNA bacteriophage ES18 has an icosahedral head and a long noncontractile tail, and it infects both rough and smooth Salmonella enterica strains. We report here the complete 46,900-bp genome nucleotide sequence and provide an analysis of the sequence. Its 79 genes and their organization clearly show that ES18 is a member of the lambda-like (lambdoid) phage group; however, it contains a novel set of genes that program assembly of the virion head. Most of its integration-excision, immunity, Nin region, and lysis genes are nearly identical to those of the short-tailed Salmonella phage P22, while other early genes are nearly identical to Escherichia coli phages λ and HK97, S. enterica phage ST64T, or a Shigella flexneri prophage. Some of the ES18 late genes are novel, while others are most closely related to phages HK97, lambda, or N15. Thus, the ES18 genome is mosaically related to other lambdoid phages, as is typical for all group members. Analysis of virion DNA showed that it is circularly permuted and about 10% terminally redundant and that initiation of DNA packaging series occurs across an approximately 1-kbp region rather than at a precise location on the genome. This supports a model in which ES18 terminase can move substantial distances along the DNA between recognition and cleavage of DNA destined to be packaged. Bioinformatic analysis of large terminase subunits shows that the different functional classes of phage-encoded terminases can usually be predicted from their amino acid sequence.
Generalized transducing bacteriophages are valuable members of the arsenal of tools for the genetic study of bacteria. ES18 is a temperate, generalized transducing double-stranded DNA (dsDNA) phage that naturally infects Salmonella enterica serovar Typhimurium (48), as well as some serovar Enteritidis, Dublin, Pullorum, Gallinarum, and Paratyphi B strains (52, 76). In addition, it can infect Escherichia coli if it displays the Salmonella surface receptor (47). ES18, which has also been called typing phage A18, was originally isolated in about 1953 after its release from Salmonella sp. strain BA19, in which it apparently resided as a prophage (8, 75). It has an estimated genome size of about 46,000 bp (74). It is of technical interest because, unlike the well-characterized Salmonella transducing phage P22, it can infect rough strains that do not produce full-length O-antigen, for which no other transducing phages have been studied (48). Yamamoto (98) showed that ES18 is able to recombine to form viable hybrid phages with both the short-tailed phage P22 and the long-tailed phage Fels-1, both of which are now considered to be lambdoid phages (34). Thus, ES18 was also thought to be a lambdoid phage. This was tentatively confirmed in studies by Schmieger and colleagues (74), which found that the ES18 prophage repressor and lysis regions are very similar to those of the short-tailed phage P22 in both sequence and genome location. It has been reported (without publication of the data) that ES18 has a long flexible tail (48, 98), and the ES18 head and tail gene DNAs have been shown not to hybridize to the DNAs of phages P22 or λ (74). Because it seemed possible that it harbored a novel set of head assembly genes, and in order to foster further research on this interesting and potentially useful phage, we have sequenced and annotated the complete ES18 genome and analyzed its DNA packaging strategy.
ES18 was propagated on S. enterica strain Q1 (98) or on strain DB7000 that was cured of the Fels-2 prophage (7) (this prophage restricts ES18 growth). All bacterial cultures were grown in Luria-Bertani medium at 37°C. ES18 virions were partially purified as follows: cells were shaken at 37°C for 90 min after infection, at which point the culture was shaken with chloroform to complete lysis. Cell debris was removed by centrifugation for 20 min at 8,000 rpm in a Beckman JA-10 rotor at 4°C, and phage particles were pelleted by spinning overnight at 8,000 rpm in the same rotor at 4°C. The pellet was gently resuspended in TM (10 mM Tris-Cl [pH 7.4], 1 mM MgCl2), and insoluble material was removed by low-speed centrifugation. The resulting preparation was pelleted onto a CsCl cushion (density, 1.4 g/ml), dialyzed against TM, and finally sedimented through a 10-to-40% sucrose gradient at 25,000 rpm for 2 h at 4°C in an SW40.1 Beckman rotor; ES18 virions sediment as a visibly opalescent band about three-quarters of the way down the gradient, which was harvested by syringe puncture of the tube. DNA was isolated from ES18 virions by sodium dodecyl sulfate (SDS) release and ethanol precipitation (17), followed by phenol extraction. Virions on carbon-coated Parlodion films supported by 400-mesh copper grids were stained for electron microscopy with 0.5% uranyl formate and examined with a Philips Morgagni 268 microscope operating at 80 kV.
Nucleotide sequence determination was performed by dideoxy chain termination methods as previously described (66). Bulk sequencing of a random plasmid library of sheared DNA was performed until an average of >7-fold coverage was obtained (768 sequencing runs). Twenty-two primers were designed to sequence across weak areas by using virion DNA as template, and PhredPhrap (32) assembled the sequence data into one circular sequence. The DNA sequence analysis software used was DNA Strider (24), GeneMark (5), Staden programs (78), BLAST (2), BPROM (http://www.softberry.com/berry.phtml?topic=gfindb), tRNAscan (57; http://www.genetics.wustl.edu/eddy/tRNAscan-SE), and DNA Master (http://cobamide2.bio.pitt.edu/). Neighbor-joining trees were constructed with CLUSTAL X for the Macintosh (43, 84), and maximum likelihood analysis was performed with PHYLIP (28; http://evolution.genetics.washington.edu/phylip.html). Polyacrylamide gel electrophoresis of proteins in SDS, staining with Coomassie brilliant blue R-250 (Bio-Rad, Richmond, Calif.), and determination of the N-terminal amino acid sequence by the University of Utah Protein Facility were performed as previously described (27). Agarose gel electrophoresis and contour-clamped homogeneous electric field (CHEF) electrophoresis of DNA molecules were performed as previously described, as were Southern (77) analyses of electrophoresis gels (14).
ES18 virions are unusual in that they do not band in CsCl equilibrium density gradients at a density near 1.5 g/ml as do nearly all other tailed-phage virions; this is presumed to be due to the inability of Cs+ ions to penetrate the head shell, although other explanations are possible. Heating the virions to 45°C for 10 min or addition of Triton X-100 in the presence of CsCl did not alter the banding density of ES18 virions (data not shown). This property has precluded simple, one-step equilibrium density purification of the virions to near homogeneity. ES18 virions were thus partially purified by differential centrifugation as described in Materials and Methods.
ES18 virions have been reported informally to have long, flexible tails by Kuo and Stocker (48) and Yamamoto (98) (although its morphology has been incorrectly discussed elsewhere as being “P2-like”). Figure 1 shows that ES18 does indeed have a long, flexible, noncontractile tail, as well as a head that is hexagonal in outline and so almost certainly icosahedral. Its head is 56 nm in diameter, and its tail is 210 nm long and 12 nm wide. We observed no obvious side tail fibers or other tail tip structures. Figure 2A shows an SDS-12.5% polyacrylamide electrophoresis gel of the partially purified virion preparation. The genes that encode several of the virion proteins have been identified (discussed below), and two host proteins, flagellin and OmpA, that are present in large contaminating structures (flagella and outer membrane vesicles, respectively) have been identified by their N-terminal amino acid sequences (indicated in Fig. 2A). ES18 virions contain DNA molecules that are 51.5 ± 1 kbp in length as measured by CHEF gel electrophoresis (Fig. 2B). This indicates that the virion chromosome is about 10% terminally redundant, since the sequence of the genome is 46,900 bp in length (below). In addition, like that of P22 DNA, the ES18 virion DNA electrophoresis band is wider than that of λ DNA (Fig. 2B). This is a typical feature of headful packaging phages, since size measurement of their packaged chromosome is imprecise (12, 83). Such terminal redundancy and imprecise length measurement are not unexpected, since ES18 is a generalized transducing phage (48) and all characterized naturally occurring generalized transducing phages utilize the headful packaging strategy.
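The terminal redundancy estimate above follows from simple arithmetic on the two lengths quoted in the text. The short sketch below reproduces it; the function name and the choice to express redundancy relative to genome length (rather than virion DNA length) are assumptions made for illustration, not part of the original analysis.

```python
# Values quoted in the text above.
GENOME_BP = 46_900      # length of the assembled circular genome sequence
VIRION_BP = 51_500      # virion DNA length measured by CHEF electrophoresis
VIRION_ERR_BP = 1_000   # reported measurement uncertainty (+/- 1 kbp)


def terminal_redundancy(virion_bp: int, genome_bp: int) -> float:
    """Fraction of the genome that is present twice in one virion DNA molecule.

    Assumption: redundancy is expressed relative to the genome length.
    """
    return (virion_bp - genome_bp) / genome_bp


central = terminal_redundancy(VIRION_BP, GENOME_BP)
low = terminal_redundancy(VIRION_BP - VIRION_ERR_BP, GENOME_BP)
high = terminal_redundancy(VIRION_BP + VIRION_ERR_BP, GENOME_BP)

print(f"terminal redundancy: {central:.1%} (range {low:.1%} to {high:.1%})")
```

With the quoted values the central estimate comes out just under 10%, and the ±1-kbp uncertainty in the measured virion DNA length translates into an uncertainty of roughly two percentage points either way.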
The complete nucleotide sequence of the ES18 genome (GenBank accession no. AY736146) was determined by sequencing random DNA clones of virion DNA fragments as described in Materials and Methods. The sequence assembled unambiguously into a single circular sequence 46,900 bp in length; we assigned bp 1 to the first base pair of the predicted small terminase subunit gene, so its map can be easily compared to the standard phage P22 and λ maps.
We describe briefly here the predicted ES18 genes from left to right across the genome. Figure 3 shows a map of the 79 predicted ES18 genes, and Table 1 lists salient features of these protein-encoding genes (no tRNA genes were found in the ES18 genome). Most of the morphogenetic portion of the putative ES18 late operon (the bulk of the left half of the virion chromosome) is not closely related to morphogenetic regions of previously characterized phages. Genes 1, 2, 5, and 6 are putative head genes, because their protein products have weak similarities to phage small and large terminase subunits, portal proteins, and phage SPP1 gene 7 head assembly factor (HAF) (81), respectively (Table 1). These functions of the ES18 head genes, as tentatively deduced from their weak sequence matches and the relative abundance of the encoded proteins in virions, fit the stereotypical lambdoid phage morphogenetic gene order, which is 5′-small terminase-large terminase-portal-HAF-decoration-coat-3′ on the mRNA (HAF refers to homologues of the phage SPP1 gene 7 head assembly factor).
Amino-terminal sequencing of virion proteins isolated from SDS-polyacrylamide electrophoresis gels (Fig. 2A) showed that the products of genes 8 and 9 are major virion components; we determined their N-terminal sequences to be H2N-SRYRRV and H2N-AVGGFTR, respectively. Neither protein has any recognizable homologue in the current sequence database. The gene 8 protein's N-terminal methionine has been removed (common when a penultimate serine is present), and the N-terminal 51 amino acids have been cleaved from the 358-amino-acid primary translation product of the gene 9 protein. The predicted size of the cleaved protein (33,248 Da) agrees well with the measured size of 32,000 Da. Quantitation of the Coomassie brilliant blue staining intensity of the bands in SDS-polyacrylamide electrophoresis gels indicated that the molar ratio of gp8/gp9 was 0.8 ± 0.1 in virions; since staining efficiencies for different proteins may vary, it seems likely that there is one gp8 for each gp9 molecule in the virion. The major components of known phage heads are coat proteins, which build the head shell, and decoration proteins, which stabilize the coat shell by binding to its exterior surface. Proteolytic removal of an N-terminal peptide from phage coat proteins during assembly is common (13), and head decoration proteins are usually rather small and are not known to be cleaved. In addition, decoration protein genes are in some cases immediately 5′ of the coat protein gene. From these observations, we feel it is very likely that gene 8 encodes a decoration protein and gene 9 encodes the coat protein.
The predicted gene 5 protein has weak sequence similarity to other phage portal proteins, and it was identified in virions, since its amino-terminal sequence was determined to be H2N-XALNDA, where ALNDA is identical to amino acids 12 through 16 of the predicted ES18 gene 5 protein (Fig. 2A). The unidentified amino acid X in position 1 of the determined sequence is amino acid number 11 (a histidine) in the predicted gene 5 protein sequence, indicating that 10 N-terminal amino acids have been removed by proteolysis from the primary translation product. Removal of a small number of N-terminal portal protein amino acids by proteolysis during head maturation has precedent in phage λ gene B protein's cleavage (35, 90). The staining intensity of the gene 5 protein relative to that of coat protein (Fig. 2A) is consistent with the expected 12 molecules of portal protein per virion (13). The cleavage of the putative coat protein and the stereotyped order of lambdoid head gene clusters suggest that ES18 gene 7 may be the capsid maturation protease, but further experimentation will be required to test this hypothesis.
It is noteworthy that the ES18 region of genes 1 through approximately 11 defines a new type of head gene module that has not been previously observed; this module is shown in comparison to the five previously known lambdoid head types in Fig. 4. These six modules encode the formation of heads that are indistinguishable in the electron microscope, and almost certainly all have an icosahedral T=7 structure. This diagram is not meant to imply that additional head types or mosaics of these very different types will not be found. For example, KO2 (11), PY54 (41), ΦP27 (72), and SFV (1) comprise a rather distantly related subtype within the HK97 group, and Sf6 (17) and HK620 (21) have a large terminase subunit that is more closely related to the ES18 terminase than that of other members of the P22 group (see below). In addition, we point out that although these six groups are very highly diverged, a number of the genes within each functional group are nonetheless homologous (i.e., diverged from a common ancestor). For example, most of these six large terminase and portal protein types can be shown to be members of the extended terminase and portal families after multiple rounds of Ψ-BLAST (2) analysis, and the HK97 and P22 coat proteins are thought to have the same polypeptide fold in spite of no recognizable amino acid sequence similarity (33, 44, 93). On the other hand, the putative proteases of lambda and HK97 are members of the ClpP and procapsid protease families with different folds (20) and so may have arrived at parallel functions in head assembly by convergent evolution.
Genes approximately 12 through 29 include the putative tail genes of ES18. Gene 16 and 21 proteins have weak similarities to the major tail shaft subunits and tail tape measure proteins, respectively, of other lambdoid phages that have long flexible tails, most notably those of phage HK97 (45). Our failure to obtain N-terminal amino acid sequence from the putative tail shaft protein suggests that its N terminus is blocked. Genes 23, 26, 27, 28, and 29 are homologues of phage λ tail tip genes M, L, K, I, and J, respectively (Table 1). The gene J protein is the host range-determining protein in λ virions which binds to the target cell's surface receptor (46, 91), and J protein and its homologues are not known to utilize surface polysaccharides as receptors. This agrees with the fact that ES18 is known to infect both O-antigen-producing (smooth) and O-antigen-defective (rough) strains of Salmonella enterica (48). It adsorbs to the host outer membrane FhuA protein, a TonB-dependent ferrichrome transporter that is also the receptor in E. coli for phages φ80, T1, and T5 (6, 47). However, the ES18 J-like protein is only rather distantly related to receptor binding proteins in these three phages (φ80 [G. Plunkett, personal communication], T1, and T5 [accession no. AY543070]), and it has been shown that φ80 and ES18 bind to different surfaces of the FhuA protein (47), suggesting that they may have evolved independently to utilize the same receptor.
Genes 30, 31, 32, and 33 are homologues of phage N15 (71) and HK022 (45) genes that are located in parallel regions of those genomes. Of the latter four genes, only ES18 genes 32 and 33 have homologues of known function. Gene 32 is similar to the N15 gene 24 and φ80 cor genes, which are lysogenic conversion genes that function to exclude superinfecting phages such as φ80 and T1 (58, 89); however, unlike the N15 and φ80 (and HK022) cor genes, ES18 gene 32 lies in the opposite orientation from the late operon. The N-terminal sequence analysis (H2N-SAGT) of the isolated virion proteins (above) suggests that gp33, which has weak homology to some phage tail fibers, is present in virions; the gp33 band in the gel is substantially smaller than the predicted translation product, suggesting that it might be cleaved (Fig. 2A). The functions of the other two genes in this region, 30 and 31, are not known.
Although the ES18 tail genes show overall similarity and organization to the tail genes of better-studied lambdoid phages with long noncontractile tails, ES18 has four predicted genes, 17 through 20, between the tail shaft subunit (gene 16) and the tape measure (gene 21). Other lambdoid tail gene clusters of this type which have been examined carry only two open reading frames in this interval. It therefore seems likely that one or more of these four genes are “morons,” or non-tail assembly genes that have been recently inserted into this location (38, 45). This idea is supported by the fact that gene 19 homologues are found in E. coli phage ΦP27 (72) and a Pasteurella multocida prophage (70), where in both cases they are adjacent to phage-carried toxin genes; in the former it is clearly not in the tail cluster and in the latter it lies at a different location but near putative tail genes. The two open reading frames in this interval in phage λ encode a tail assembly chaperone (97), and the downstream one is only expressed through translational frameshifting from the first gene (53); such pairs of overlapping genes related by a translational frameshift are in fact nearly universally present among phages with long tails (97). We are unable to identify the site of the expected frameshift for ES18 with confidence, but we suggest that a plausible possibility, based on patterns seen in other phages, would be a +1 frameshift occurring at the end of gene 17, into the open reading frame that extends just to the beginning of gene 18. Finally, tail region genes 22, 24, and 25 (functions unknown) do not have homologues in the studied lambdoid phages. Genes 3, 18 to 20, 22 to 25, and 32 (cor) correspond to the four portions of the late operon with the lowest G+C contents, suggesting that they have a different evolutionary history from the rest of the late operon, and so are all moron candidates.
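The G+C comparison used above to flag moron candidates can be illustrated with a short calculation. The sketch below is a minimal example under stated assumptions: the toy sequence, the gene names, and the coordinates are hypothetical placeholders (not ES18 data), and flagging genes that fall below the overall G+C fraction is only one simple criterion, not necessarily the one used in the original analysis.

```python
def gc_content(seq: str) -> float:
    """Return the G+C fraction of a DNA sequence (case-insensitive)."""
    seq = seq.upper()
    if not seq:
        raise ValueError("empty sequence")
    return (seq.count("G") + seq.count("C")) / len(seq)


def gene_gc(genome: str, genes: dict[str, tuple[int, int]]) -> dict[str, float]:
    """G+C fraction per gene; coordinates are 1-based and inclusive."""
    return {name: gc_content(genome[start - 1:end])
            for name, (start, end) in genes.items()}


def low_gc_genes(genome: str, genes: dict[str, tuple[int, int]]) -> list[str]:
    """Names of genes whose G+C fraction is below that of the whole sequence."""
    overall = gc_content(genome)
    per_gene = gene_gc(genome, genes)
    return [name for name, gc in per_gene.items() if gc < overall]


# Hypothetical toy data, for illustration only.
toy_genome = "GCGCGCGCGC" + "ATATATATAT" + "GCGCATGCGC"
toy_genes = {"geneA": (1, 10), "geneB": (11, 20), "geneC": (21, 30)}

print(gene_gc(toy_genome, toy_genes))
print(low_gc_genes(toy_genome, toy_genes))
```

In practice the gene coordinates would come from an annotation such as the GenBank record for the genome, and a sliding-window profile could be used instead of per-gene values.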
ES18's divergent early operons are in a typically lambdoid arrangement and have similar control elements. Between these operons, the immunity region contains genes 55 and 56 (Repressor and Cro) which, as previously pointed out by Schicklmaier and Schmieger (74), encode proteins that are identical to their homologues in P22. This agrees with the observation that ES18 has the same immunity (repressor) specificity as P22 (48, 82). In addition, ES18 gene 57 protein is identical to the P22 c1 gene protein. Transcriptional antitermination should be exerted on the early and late operons by gene 54 protein (85% identical over 75 amino acids to the P22 gene 24 protein) and gene 73 protein (a distantly related λ Q protein homologue), respectively. The former appears to have a target specificity identical to that of the P22 protein, since sequences identical to P22's nut sites are present at the expected locations, but the latter has no known close relatives and so likely has a novel target specificity.
The early left operon of ES18 is typical of lambdoid phages; it contains 21 genes (34 through 54) that are mosaically related to other known lambdoid phages. From 5′ to 3′ (right to left on the map), the putative early left operon mRNA carries genes that are similar to λ's genes N, cIII, and kil, HK620 gene hkaM, P22 erf, Sf6 gene 27 (a bacterium-type single-stranded DNA binding protein), P22 abc2, HK97 gene 38, P22 eaE and eaD, PY54 gene 46, 933W gene L007, ST64T gene eaa1, 15 gene 46, and P22 eaA (see Table 1 for details). Only two genes in this region, 40 and 53, have no known homologues. Finally, the ES18 integration function integrase and excisionase (encoded by genes 34 and 35, respectively) are 96% identical to those of P22 and its putative attP integration site is identical to that of P22, and so it is virtually certain that ES18 utilizes the P22 attB attachment site for integration into the host chromosome as was suggested by Kuo and Stocker (48).
The ES18 early right operon is also typical of lambdoid phages. It contains 18 genes, 56 through 73, of which only 60, 62, and 68 do not have homologues in parallel locations in other lambdoid phage genomes. The putative ES18 DNA replication initiation gene 59 protein is 93% identical to the λ O gene product (and its putative 85-bp replication origin sequence has only three differences from that of lambda), but its putative partner gp61 is only a very distant relative (if it is a homologue at all) of the λ P protein; gp61 contains primase and helicase homologies, and its closest relatives are in a Shigella defective prophage Flex3 (9) and the phage T7 gene 4 primase protein (85). The ES18 Nin region's genes 63 through 72 are most closely related to those of P22.
At the right end of the genome, near the promoter-proximal end of the ES18 late operon, as Schicklmaier and Schmieger (74) previously reported, the holin and lysozyme lysis genes, 74/75 and 76, respectively, are nearly identical to those of P22, and λ Rz and Rz1 homologues, genes 77 and 78, are present but less closely related. Immediately downstream, gene 79 is homologous to the phage P22 rha (orf201) gene, whose product is detrimental for lytic growth in the absence of host integration host factor function (10, 40).
Thus, the ES18 genome is a perfectly “typical” lambdoid phage genome in that it has a standard lambda-like transcriptional program, its predicted gene functions have the canonical lambdoid arrangement on the genome, and it is clearly mosaically related to other characterized lambdoid phage genomes. Perhaps the most notable feature of this mosaicism is that the ES18 head assembly genes are very different from those of any of the currently known lambdoid phages.
Bacteriophage headful nucleases have no strong sequence specificity. This lack of specificity allows the packaging of some host DNA, and all naturally occurring generalized transducing phages utilize a headful packaging strategy. The generalized transducing phages that have been studied in detail, most notably phages P22 and P1, use a DNA recognition-cleavage mechanism that utilizes a pac site for recognition of DNA that is to be packaged (79, 80, 96). Replicated phage concatemeric DNA is recognized at a pac site by the phage terminase, a cut is made in the DNA at or near that point, and a processive series of packaging events proceeds in one direction from the DNA break thus produced. The result is virion DNA that is terminally redundant and partially circularly permuted (4, 42, 86, 87). When such virion DNA is cleaved by a restriction enzyme, a unique fragment, one of whose ends is the packaging series initiation cut, is generated only from the first member of each packaging series. This pac fragment is thus present in submolar amounts relative to the true restriction fragments (usually one-half to one-fourth of that expected if they were present in every DNA molecule). The presence of such a submolar fragment is considered to be diagnostic of this type of headful DNA packaging, as opposed to mechanisms like those of phages λ or T7, which generate ends at precisely the same location in all virion DNA molecules. However, the virion DNAs of some apparently headful packaging phages do not exhibit an obvious pac fragment, e.g., 933W (68), A118 (55), HK620 (21), APSE-1 (88), HSIC (J. Paul, personal communication), Sf6 (17), 11 (56), 42 (64), Aa23 (94), and probably T4 (54). 
We have recently presented evidence that Sf6 packages DNA in a manner that is very similar to that of the phages that do exhibit a pac fragment; however, the DNA cut that initiates the packaging series occurs not at a precise location but at scattered locations within an 1,800-bp region such that the pac fragment electrophoresis band, although present, is so diffuse as to be essentially invisible to staining (17).
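The sequential headful mechanism described above can be made concrete with a small model. The sketch below is an idealization under stated assumptions: every headful has exactly the same length, every packaging series has the same number of headfuls, and the function names are hypothetical. It uses the ES18 genome and virion DNA lengths quoted elsewhere in the text to show how successive headfuls become circularly permuted, and why the pac fragment is submolar.

```python
GENOME_BP = 46_900   # genome length quoted in the text
HEADFUL_BP = 51_500  # approximate virion DNA length quoted in the text


def headful_series(start_bp: int, n_headfuls: int,
                   genome_bp: int = GENOME_BP,
                   headful_bp: int = HEADFUL_BP) -> list[tuple[int, int]]:
    """Return (left_end, right_end) genome coordinates for each headful.

    Positions are offsets on the circular genome. Each headful begins where
    the previous one ended, so every end shifts by the terminal redundancy
    (headful_bp - genome_bp) relative to the previous headful.
    """
    series = []
    left = start_bp % genome_bp
    for _ in range(n_headfuls):
        right = (left + headful_bp) % genome_bp
        series.append((left, right))
        left = right
    return series


def pac_fragment_molarity(n_headfuls: int) -> float:
    """Fraction of virion DNAs carrying the series-initiating (pac) end.

    Assumption: only the first headful of each series has that end, and all
    series contain the same number of headfuls.
    """
    return 1 / n_headfuls


for left, right in headful_series(0, 3):
    print(f"headful from offset {left} to offset {right}")
print(pac_fragment_molarity(2), pac_fragment_molarity(4))
```

Under these assumptions, series of two to four headfuls give a pac fragment at one-half to one-fourth of the molar amount of an ordinary restriction fragment, matching the range quoted above.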
Schicklmaier and Schmieger (74) reported that ES18 virion DNA does not have cohesive ends and does not exhibit a pac fragment in ethidium bromide-stained electrophoresis gels. Our findings above show that its virion chromosome is about 10% terminally redundant and has an imprecise virion DNA length, strongly suggesting that ES18 utilizes a headful packaging mechanism similar to that of Sf6. To determine whether this is true, we searched for the diagnostic diffuse pac fragment. Figure 5A shows that restricted ES18 virion DNA does indeed exhibit a submolar, diffuse DNA band in electrophoresis gels when cleaved by the restriction endonucleases BamHI, BssHII, ScaI, or NgoMIV. This diffuse band is visible upon Southern analysis when using an appropriate probe, but it is not visible after ethidium bromide staining. Analysis of double digests showed that the variable end of each of these fragments is its left end (e.g., double digests of BamHI with BssHII or NgoMIV yield only the BamHI-sized diffuse band [data not shown]), and Fig. 5B shows that the diffuse ends of all of these fragments coincide and are centered on the putative ES18 small terminase subunit encoding gene 1. Each of these diffuse putative pac fragment bands is about 1,000 bp in width.
These findings support a model for ES18 DNA packaging in which initiation of DNA packaging series occurs at many locations between bp −300 (46,600) and 700 in the genome sequence, and DNA insertion into the procapsid proceeds rightward on the sequence from the site of initiation. This large initiation region is reminiscent of packaging by phage Sf6 (17) and is related to the well-characterized sequential headful packaging by phage P22 (96); however, ES18 initiation occurs within a region that is larger than P22's 120 bp and smaller than Sf6's 1,800 bp. Since the diffuse pac fragment band is submolar compared to true restriction fragments from outside the initiation region and because intact restriction fragments that span this region are present (Fig. 5A), it is almost certain that ES18, like P22, packages additional headfuls of DNA in a sequential manner along the concatemeric product of rolling-circle DNA replication. We note that in P22, Sf6, SPP1, T4, and P1 the region where the packaging series initiates is within or overlaps the gene that encodes the small terminase subunit (19, 54, 80, 96), the protein that carries the specificity for recognition of the DNA to be packaged (15, 16, 19). In P22 this site has been genetically identified, and it lies near the center of the 120-bp region within which P22 initiates DNA packaging (96); similarly, the Sf6 pac site also lies near the center of the 1,800-bp region within which it initiates packaging (E. Gilcrease and S. Casjens, unpublished results). We predict that ES18 will similarly carry a pac DNA packaging recognition site within its 1,000-bp packaging series initiation region, and we note that 8 of 10 bp and 10 of 11 bp at positions 240 to 250 and 462 to 472, respectively, are identical to sections of the pac site region of phage Sf6 (Gilcrease and Casjens, unpublished). It seems possible that one or both are part of the ES18 pac site. If true, its terminase must be able to either move or reach up to 500 bp along the DNA from the recognition site before cleaving to start a packaging series.
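Because the genome sequence is circular, the initiation region quoted above spans the point where the numbering wraps, which is why bp −300 is also written as 46,600. The sketch below shows that bookkeeping; the helper names are hypothetical, and the 1-based numbering convention (so that position 0 corresponds to the last base pair) is an assumption chosen to be consistent with the coordinates given in the text.

```python
GENOME_BP = 46_900                    # genome length quoted in the text
REGION_START, REGION_END = -300, 700  # initiation region, signed coordinates


def wrap(pos: int, genome_bp: int = GENOME_BP) -> int:
    """Map a signed coordinate onto 1..genome_bp of the circular sequence."""
    return (pos - 1) % genome_bp + 1


def in_initiation_region(pos: int) -> bool:
    """True if a signed coordinate lies within the initiation region."""
    return REGION_START <= pos <= REGION_END


width = REGION_END - REGION_START          # span of the region in bp
center = (REGION_START + REGION_END) // 2  # midpoint, signed coordinate
half_width = width // 2                    # farthest cut from a central site

print(wrap(REGION_START), width, center, half_width)

# Candidate Sf6-like sequence matches mentioned in the text.
for first, last in [(240, 250), (462, 472)]:
    print(first, last, in_initiation_region(first) and in_initiation_region(last))
```

If the recognition site lay at the midpoint of the region, the most distant initiation cuts would be about 500 bp away, which is the distance the terminase would need to span in the model described above.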
Phage terminases recognize DNA for packaging and have a nuclease activity that is responsible for creating the ends of the virion chromosome (18). The putative gene 2-encoded ES18 large terminase subunit's closest relatives among the characterized phages are those from phages Sf6, HK620, APSE-1, and HSIC, which are all headful packaging phages that exhibit no obvious, sharp pac fragment band upon electrophoretic separation of their restricted virion DNAs (see above). This observation prompted us to ask if the terminases that create different types of virion DNA ends were members of recognizably separable amino acid sequence groups. The dsDNA, tailed phage virion chromosomes are known to have several different types of end structures, including the (i) 5′- and (ii) 3′-protruding single-stranded cohesive ends with no terminal redundancy or circular permutation (e.g., 5′ phages λ and P2, 3′ HK97), (iii) direct terminal repeats with no circular permutation (e.g., phages T3 and T7), (iv) host DNA attached to both ends (e.g., phage Mu), and (v) the headful packaging phages that have virion DNAs with both terminal redundancy and circular permutation (e.g., phages T4, SPP1, P1, and P22). Although we did not include them in this analysis, we note that the eukaryotic herpesviruses appear to also have terminase proteins which are involved in viral genome maturation and which have sequences that are related to the phage terminases (69). In addition, phage φ29 and its relatives are unique among the tailed dsDNA phages in that they do not cleave the DNA during packaging and have a protein covalently bound to each DNA end. Although arguments can be made that φ29 gene 16 protein and its close relatives have similar (perhaps even homologous?) functions to other large terminase subunits, they have no convincing overall sequence similarity to the other terminases, so they were not included in the following analysis.
We compared a set of 114 large terminase subunit sequences that includes nearly all those for which both the terminase sequence and virion DNA end structure are known, as well as a number whose end structure is not known. This set includes all of the five types mentioned above. Figure 6 shows a neighbor-joining tree of these large terminase subunits. They fall into at least eight robust groups that have bootstrap values of ≥992 out of a possible 1,000, each of which includes only terminases known to create a single type of DNA end (when they have been studied). Several of the major DNA packaging types form single, convincing groups, while others divide into several types. The 5′ cohesive end-forming enzymes split cleanly into two types, exemplified by those of phages λ and P2. The 3′ cohesive end-forming terminases cluster, with one exception (phage VP16T), into a robust single large group, as do the T7-like enzymes. The Mu-like phages, so classified on the grounds that they all appear to utilize a transposition replication mechanism, also form one robust terminase group. The headful packaging terminases fall into several separable groups at this resolution (see below). Maximum likelihood analysis (data not shown) gave nearly identical terminase clustering, further supporting the robustness of these groupings. Previous smaller studies (51, 61) have noticed relationships among some of these terminases but did not include enough terminases to see the complete picture presented here. Clearly, terminases with similar enzymatic end-generating functions usually cluster together.
Thus, we believe that the structure of virion DNA ends can be accurately predicted for phages where this is not known experimentally, if their putative terminase amino acid sequence falls convincingly within one of these robust groups. For example, we predict that Gifsy-2 (29, 59), SfV (1), and VWB (3) will be found to have λ type 5′ cohesive ends, 3′ cohesive ends, and headful packaging ends, respectively, in spite of the fact that several of these (e.g., Gifsy-2 and VWB) have terminases that are quite different from any studied terminase. This improved understanding of terminase relationships could also be helpful in searches for generalized transducing phages of particular bacteria, since all known generalized transducing phages utilize headful packaging and many such terminases can be recognized from their sequence.
Although the vast majority of large terminase subunits do fall into the above groups, we hasten to point out that these groups do not encompass all known terminases. In particular, 3′ cohesive end phages TM4, MS1, and r1t and headful packaging phages P1 and 11, as well as T7-like VpV262, have terminases that are quite different from the majority of such enzymes and were not included in our analysis. TM4, MS1, and r1t, which form a related subgroup, usually fell within the 3′ cohesive end group. The inclusion of the above seven phage terminases sometimes (depending upon which combination was included) tended to lower the overall bootstrap value for their groups by virtue of their relatively weak affinities for those groups.
Several of the major Fig. 6 terminase groups have phage members that infect different classes of bacteria (e.g., proteobacteria and firmicutes in the 3′ cohesive end, P22-like headful, and T7-like groups), and terminases from different groups can be found in phages with very similar molecular lifestyles and that infect closely related host bacteria (e.g., the 3′-extended cohesive end terminases, 5′-extended cohesive end terminases, and headful terminases are found in the clearly otherwise lambdoid phages HK97, λ, and P22, respectively). This is no doubt only a small fraction of such relationships that exist, since the genomes of tailed phages that infect only a few of the many bacterial classes have been studied or sequenced, but it does suggest that either all types of terminases were present before the classes of bacteria diverged or, perhaps more likely, that in the past phages have switched hosts or phages have exchanged terminase genes across vast evolutionary distances (39). Possible examples of relatively recent terminase exchange among the lambdoid phages are the similar terminases of phages ES18 and Sf6, which are otherwise quite different members of this group; similarly, the terminases of Bacillus subtilis (firmicute) phage 105 and E. coli (proteobacteria) ΦP27 lie within the same rather robust subgroup.
On the other hand, in spite of the accumulating evidence that phages have engaged in much horizontal exchange of genetic material over the eons, terminases do not appear to have been shuffled randomly among the dsDNA phages. For example, among those phages that have been studied in sufficient detail to know, all the terminases within the P2-like, Mu-like, T4-like, and T7-like groups (Fig. 6) are found only in phages that are sufficiently similar in other ways (e.g., transcriptional organization and replication strategies) to form the same groups without reference to their terminases. Such relationships may reflect as-yet-poorly understood functional connections between DNA packaging and other aspects of the lifestyle of these phages.
The headful packaging terminases appear to be more diverse than the other types. Three of the four well-characterized headful terminases fall into two separable groups, typified by T4 and P22/SPP1 (Fig. 6), while the fourth, that of phage P1 (not shown in Fig. 6), does not fall convincingly into any of the observed clusters. In addition, the Mu-like terminases and gene transfer agent (GTA) terminase, although they have not been studied in detail, must have headful properties, since neither packages DNA with sequence-specific ends. The Rhodobacter GTA (49, 50) putative terminase clusters with several others from prophage-like entities in the published Brucella (23), Agrobacterium (31, 95), and Caulobacter (65) genome sequences, and it will be interesting to see if the latter three also have gene transfer function. Similarly, phage BBC5 (a Sinorhizobium meliloti phage; GenBank accession no. AF448724) and Bcep22 (a Burkholderia cepacia phage; GenBank accession no. AY349011) terminases cluster fairly robustly near the T4-like group and, although their packaging mechanisms have not been reported, this analysis suggests that they are likely to be headful packagers. Finally, the headful TP901-1 and Aa23 terminases form a small, moderately supported group, and those of 933W and 11 (the latter is not shown in Fig. 6) do not cluster strongly with each other or any other terminase in our analysis set. This larger diversity of headful terminases compared to the other types suggests that they may be the ancestral terminase type.
The ES18 large terminase subunit falls into a headful subgroup with seven members (Fig. 6), five of which, ES18, Sf6, HK620, HSIC, and APSE-1, have been reported to have virion DNA that has no cohesive end and no obvious pac fragment (the other two members of this subgroup are prophages in bacterial genome sequences whose parental phages have not been studied; both of these, Flu in Haemophilus influenzae Rd and Plu10 in Photorhabdus luminescens TT01, are probably defective prophages [25, 39]). Headful terminases that do not generate a pac fragment are not, however, limited to this subgroup. For example, phages A118, 933W, 11, and Aa23 are headful packagers that do not exhibit an obvious pac fragment, and their terminases fall outside of this subgroup (and, except for A118, are not robustly within any of the Fig. 6 groups). It may be that the presence of a pac fragment is a somewhat artificial distinction, since the three headful packaging phages that do show a pac fragment and have been studied in detail do not actually initiate packaging at a truly precise site, but the initiation cuts are distributed over a number of base pairs. In phages SPP1, P1, and P22, packaging series initiate with cuts that occur within 9-, 12-, and 120-bp regions, respectively (22, 80, 96). It may well be that no headful packagers initiate at a precise site and that the size of the regions over which different phages initiate packaging series is a continuum rather than falling neatly into “pac fragment” and “no pac fragment” categories. If true, it would not be surprising that the two types do not all break into neatly distinct groups.
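The continuum argument can be illustrated by placing the reported initiation-region sizes on a single scale. The sketch below uses only the widths quoted in this report (9, 12, and 120 bp for SPP1, P1, and P22, and approximately 1 kbp for ES18, entered here as a rounded 1,000 bp); the helper `region_width` and its example cut positions are hypothetical and serve only to show how such a width would be derived from mapped cut sites.

```python
from typing import Dict, List, Sequence, Tuple

# Widths (bp) of the regions over which packaging series initiate, as quoted
# in the text. The ES18 value is approximate ("about 1 kbp"), rounded here.
INITIATION_REGION_BP: Dict[str, int] = {
    "SPP1": 9,
    "P1": 12,
    "P22": 120,
    "ES18": 1000,
}


def region_width(cut_positions: Sequence[int]) -> int:
    """Width of the smallest region containing all initiation cuts (inclusive)."""
    return max(cut_positions) - min(cut_positions) + 1


def ranked_widths(widths: Dict[str, int]) -> List[Tuple[str, int]]:
    """Phages ordered from the most to the least precise initiation region."""
    return sorted(widths.items(), key=lambda item: item[1])


ranked = ranked_widths(INITIATION_REGION_BP)
previous = None
for phage, width in ranked:
    step = "" if previous is None else f"  ({width / previous:.1f}x the previous)"
    print(f"{phage:>5}: {width:>5} bp{step}")
    previous = width

# Hypothetical mapped cut sites, only to show how a width would be computed.
print(region_width([100, 104, 108]))
```

Ordered this way, the four phages form a graded series, so any cutoff separating "pac fragment" from "no pac fragment" phages would be placed somewhat arbitrarily along it.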
This work was supported by grant MCB-990526 to S.C. from the National Science Foundation and by grant GM51975 from the National Institutes of Health to R.H. and G.H.
We thank John Paul and Guy Plunkett for information on the HSIC terminase and 80 J gene sequences, respectively, prior to their publication. We thank Jon Seger for help with PHYLIP analysis.