Bacteriophage K1F specifically infects Escherichia coli strains that produce the K1 polysaccharide capsule. Like several other K1 capsule-specific phages, K1F encodes an endo-neuraminidase (endosialidase) that is part of the tail structure and allows the phage to recognize and degrade the polysaccharide capsule. The complete nucleotide sequence of the K1F genome reveals that it is closely related to bacteriophage T7 in both genome organization and sequence similarity. The most striking difference between the two phages is that K1F encodes the endosialidase at the position occupied by the tail fiber gene in T7. This contrasts with bacteriophage K1-5, another K1-specific phage, which encodes a very similar endosialidase as part of a tail gene “module” at the end of the phage genome. It appears that diverse phages have acquired endosialidase genes by horizontal gene transfer and that these genes or their products have adapted to different genome and virion architectures.
Many strains of Escherichia coli produce the K1 antigen, a thick polysaccharide capsule composed mainly of α-2,8-linked polysialic acid (46). This capsule contributes to pathogenicity by allowing the bacteria to evade the immune system and to cross barriers such as the blood-brain barrier. The K1 capsule may also protect against certain phages, such as T7, that recognize structures beneath the capsule, by physically blocking adsorption (43). Other phages, in contrast, require the K1 capsule for infection (19). Such phages typically possess a virion-bound endo-neuraminidase (endosialidase) that degrades the capsule by cleaving the α-2,8 linkage, giving the phage access to the bacterial surface. Phage-encoded endosialidase genes that have been cloned and sequenced include those of K1E (28), K1-5 (41), 63D (30), and K1F (37), all of which share significant sequence similarity to one another.
Analysis of several phage genome sequences has led to the classification of the T7 supergroup, whose members share common morphological, biological, and genomic characteristics (20; for a review of the T7 group, see reference 32). T7 supergroup phages whose genomes have been sequenced include coliphages T7 (13), T3 (34), and K1-5 (42); yersiniophages A1122 (16) and YeO3-12 (35); salmonellaphage SP6 (12, 42); cyanophage P60 (4); vibriophage VpV262 (20); Pseudomonas phages gh-1 (25) and KMV (26); and roseophage SIO1 (40). Also, the genome sequence of Pseudomonas putida KT2440 revealed a prophage with considerable sequence similarity to members of the T7 supergroup, although it is not known whether this prophage is still functional (52). It has become apparent that the T7 supergroup contains subgroups of phages that are more similar to each other than to other supergroup members. One of these subgroups contains phages very similar to T7 itself and includes T3, A1122, YeO3-12, and the slightly more distant gh-1. Another subgroup includes salmonellaphage SP6 and K1-5, the latter of which can infect either K1 or K5 E. coli strains (42). The SP6 subgroup has diverged considerably in sequence from the T7 subgroup and has a somewhat different genome organization, but its members probably replicate in a manner similar to T7. Unlike the T7 subgroup, SP6 and K1-5 have a tail gene module at the end of the linear genome that encodes capsule-degrading tail proteins. In K1-5, one of these genes encodes the endosialidase responsible for degrading the K1 capsule. Members of the T7 subgroup, on the other hand, encode tail fibers that simply recognize host lipopolysaccharide and have no known capsule-degrading activity.
K1F was first isolated from sewage in 1984 (51) and was used to identify polysialic acid in neuronal membranes as well as to detect the K1 antigen in E. coli (53). K1F encodes a phage-bound endosialidase that was shown to be part of the tail structure of the virion (37). The endosialidase undergoes autocatalytic processing in which 152 amino acids are cleaved from the C terminus to generate the active enzyme (33). The enzyme has also been shown to form trimers, which appear to be common among phage tail proteins, including the T7 tail fiber (48). Other than its host range, little else is known about the biology of K1F. In this work, we determined the complete nucleotide sequence of the K1F genome and found that K1F is another member of the T7 supergroup. Unlike K1-5, K1F is more closely related to T7, T3, A1122, YeO3-12, and gh-1, and we therefore assign it to the T7 subgroup rather than the SP6 subgroup. Also unlike K1-5, K1F does not encode its endosialidase in a module at the end of the genome; the gene is instead located at the position analogous to that of the T7 tail fiber gene. The K1F endosialidase has an N-terminal head-binding domain with similarity to the T7 tail fiber. It appears that K1F evolved from a close T7-like ancestor in which the C-terminal portion of the tail fiber was replaced by an endosialidase domain, allowing the phage to replicate on K1 strains of E. coli.
The K1F genome sequence was determined by a combination of shotgun sequencing and primer walking (Fig. 1). Open reading frames (ORFs) were identified by a combination of visual inspection, translated BLAST searches (1), and analysis of the predicted translated sequences using GCG-lite (11). The genome consists of a single double-stranded DNA molecule of 39,704 bp with a GC content of 50%. We have annotated 42 ORFs (see Table S1 in the supplemental material), all of which are transcribed from the same DNA strand. The K1F genome is flanked by terminal repeats of 179 bp. It is tightly packed, with 88% of the sequence predicted to encode proteins. Because the two genomes are similar in both organization and sequence, we use the same gene-numbering system as that of T7, numbering from left to right on the genetic map. ORFs that are absent from, or have no sequence similarity to, previously characterized T7-like phages are named by their length in amino acids (e.g., orf156). As in T7, the genome can be divided into three regions: early, middle, and late.
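The genome statistics quoted above (GC content and the fraction of sequence predicted to encode proteins) and the length-based naming of unassigned ORFs can be illustrated with a minimal Python sketch. It assumes a forward-strand-only scan (all K1F ORFs lie on one strand), ATG starts only, a hypothetical minimum ORF length (`min_aa`), and a toy sequence; it is not the annotation procedure used in this study, which relied on visual inspection, BLAST, and GCG-lite.

```python
"""Toy single-strand ORF scan with GC content and coding fraction.

Assumptions (hypothetical, not the K1F annotation pipeline): ATG starts only,
TAA/TAG/TGA stops, forward strand only, and ORFs nested inside a longer ORF
in the same frame are ignored.
"""

STOPS = {"TAA", "TAG", "TGA"}


def gc_content(seq: str) -> float:
    """Fraction of G + C bases in seq."""
    seq = seq.upper()
    return (seq.count("G") + seq.count("C")) / len(seq)


def find_orfs(seq: str, min_aa: int = 30):
    """Return sorted (start, end, name) tuples, 0-based and half-open.

    The end coordinate includes the stop codon. Each ORF is named by its
    length in amino acids (stop codon not counted), e.g. 'orf156'.
    """
    seq = seq.upper()
    orfs = []
    for frame in range(3):
        i = frame
        while i + 3 <= len(seq):
            if seq[i:i + 3] == "ATG":
                # Walk codon by codon until the first in-frame stop codon.
                for j in range(i, len(seq) - 2, 3):
                    if seq[j:j + 3] in STOPS:
                        aa_len = (j - i) // 3
                        if aa_len >= min_aa:
                            orfs.append((i, j + 3, f"orf{aa_len}"))
                        i = j  # resume scanning after this stop codon
                        break
                else:
                    break  # open to the end of the sequence: stop this frame
            i += 3
    return sorted(orfs)


def coding_fraction(seq_len: int, orfs) -> float:
    """Fraction of positions covered by at least one ORF."""
    covered = set()
    for start, end, _ in orfs:
        covered.update(range(start, end))
    return len(covered) / seq_len


if __name__ == "__main__":
    toy = "CC" + "ATG" + "GCT" * 4 + "TAA" + "GG"  # hypothetical toy sequence
    orfs = find_orfs(toy, min_aa=5)
    print(orfs)  # [(2, 20, 'orf5')]
    print(round(gc_content(toy), 3), round(coding_fraction(len(toy), orfs), 3))
```

For a real 39,704-bp genome the same functions apply unchanged; the minimum length threshold mainly trades sensitivity for spurious short ORFs.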
The first gene predicted to be transcribed from the K1F genome is 0.3, which is likely to be involved in antirestriction (49). Other than the endosialidase gene (see below), this is the only ORF carried by K1F that is more similar to its counterparts in phages K1-5 and SP6 than to that of T7, which could reflect differences in the host strains that these phages infect. Like other T7-like phages, K1F encodes an RNA polymerase (RNAP; gene 1.0), which is responsible for transcription of most of the phage genes but is also involved in other functions, such as translocating phage DNA into the cell (17) and DNA replication, maturation, and packaging (55). A DNA ligase gene analogous to T7 gene 1.3 is present and marks the end of the early genes. K1F carries fewer genes in the early region than T7 and lacks equivalents of T7 genes 0.4, 0.5, 0.7, and 1.2.
The middle region encodes mainly proteins involved in DNA metabolism, including a host RNA polymerase inhibitor (2.0), single-stranded DNA binding protein (2.5), endonuclease (3.0), lysozyme (3.5), primase/helicase (4.0), DNA polymerase (DNAP; 5.0), and exonuclease (6.0). Each is closely related to its counterpart in at least one member of the close-knit T7 family, such as T7, T3, or YeO3-12. This region also carries several smaller putative T7-like genes, including 1.6, 1.7, 1.8, 5.5, 5.7, and 6.7, whose functions are unknown. ORFs 50, 156, 71, 50, 57, and 122 have no significant sequence similarity to any sequences in the databases.
Several bacterial and bacteriophage genomes have been found to contain group I introns, including those of phages phiI and W31, which are possibly members of the T7 supergroup (2). Both have a group I intron inserted 156 bp from the end of the DNAP gene. K1F also contains a putative group I intron within its DNAP gene, likewise 156 bp from the end (positions 14937 to 15534 on the genome). As in phiI and W31, the intron encodes a 131-amino-acid homing endonuclease very similar (77% identity) to I-TsII (for a review, see reference 18). T3, T7, and YeO3-12 also contain ORFs with similarity to homing endonucleases (34); however, these do not appear to be parts of introns and are located in intergenic regions.
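The intron coordinates and the size of its homing endonuclease ORF can be compared with simple arithmetic, assuming the coordinates are inclusive and that the 131 codons are followed by a single stop codon:

```latex
\begin{align*}
L_{\text{intron}} &= 15534 - 14937 + 1 = 598\ \text{bp} \\
L_{\text{ORF}} &= (131 + 1) \times 3 = 396\ \text{bp} \\
L_{\text{ORF}} / L_{\text{intron}} &= 396 / 598 \approx 0.66
\end{align*}
```

On these assumptions, roughly two thirds of the intron would be occupied by the endonuclease coding sequence.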
The late region of the T7 family encodes the virion structural proteins as well as many of the proteins involved in maturation and cell lysis. The organization of the K1F structural genes is nearly identical to that of T7. The first ORF encodes the head-tail connector (8.0) and is followed by those encoding the scaffolding protein (9.0), capsid (10.0), tail tube (11.0 and 12.0), internal virion proteins (13.0 to 16.0), and tail protein (17.0; see below). The genes for the major structural proteins are followed by those involved in lysis and maturation, which include genes 17.5, 18, 18.5, and 19.0.
Perhaps the most striking difference between K1F and T7 lies in the tail protein, gp17. In T7, T3, A1122, and YeO3-12, the corresponding gene encodes the tail fiber protein, which mediates recognition of and adsorption to the host LPS. The K1F counterpart, however, shares amino acid similarity with the T7 tail fiber only in a small N-terminal region, the head-binding portion (37). Its central catalytic region is highly similar to those of other phage endosialidases (57% amino acid identity to 63D, 64% to CUS-3, and 80% to K1-5/K1E), which are involved in both recognition and depolymerization of the K1 polysaccharide capsule.
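Percent identity values such as these depend on how an alignment is scored. The Python sketch below implements one common convention (identical residues divided by aligned columns in which neither sequence has a gap); the toy alignment is hypothetical, and this is not necessarily the convention behind the values reported here.

```python
def percent_identity(aligned_a: str, aligned_b: str, gap: str = "-") -> float:
    """Percent identity over columns where neither sequence has a gap.

    This is one common convention; other tools divide by the full alignment
    length or by the length of the shorter sequence, giving different numbers.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    compared = identical = 0
    for a, b in zip(aligned_a.upper(), aligned_b.upper()):
        if a == gap or b == gap:
            continue  # gapped columns are excluded from the denominator
        compared += 1
        identical += a == b
    if compared == 0:
        raise ValueError("no ungapped columns to compare")
    return 100.0 * identical / compared


if __name__ == "__main__":
    # Hypothetical pre-aligned protein fragments.
    print(percent_identity("MKV-LA", "MKIALA"))  # 80.0
```

Because the denominator changes between conventions, identities quoted from different sources are best compared only when computed the same way.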
The initial report of the cloning of the endosialidase gene gave a length of 2,763 bp (37). However, a sequencing error was later discovered, and the actual gene length was found to be 3,195 bp, with the C terminus of the protein product cleaved posttranslationally (33). Our results confirm a gene length of 3,195 bp.
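For orientation, the corrected gene length converts to protein length as follows, assuming the 3,195 bp includes the stop codon and that processing removes exactly the 152 C-terminal residues reported (33):

```latex
\begin{align*}
N_{\text{codons}} &= 3195 / 3 = 1065 \\
N_{\text{aa, full length}} &= 1065 - 1 = 1064 \\
N_{\text{aa, processed}} &= 1064 - 152 = 912
\end{align*}
```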
Like most members of the closely related T7 family, K1F is flanked by terminal repeats, suggesting a similar replication strategy. Immediately following the left terminal repeat is an equivalent of the transcriptional termination site known as the CJ terminator (bases 179 to 186), where RNAP, complexed with lysozyme, pauses at the concatemer junction during maturation and packaging (29, 54, 55).
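Direct terminal repeats, such as the 179-bp repeats that flank K1F, can be located in an assembled linear genome as the longest sequence shared by both ends in the same orientation. The Python sketch below computes this with the standard Knuth-Morris-Pratt prefix function (a string "border"); it assumes the assembly is written out end to end with both repeat copies present, caps the repeat at half the genome length, and uses a hypothetical toy sequence. It is not the method used in this study.

```python
def prefix_function(s: str) -> list:
    """KMP prefix function: pi[i] is the longest proper border of s[:i + 1]."""
    pi = [0] * len(s)
    for i in range(1, len(s)):
        k = pi[i - 1]
        while k > 0 and s[i] != s[k]:
            k = pi[k - 1]  # fall back to the next shorter border
        if s[i] == s[k]:
            k += 1
        pi[i] = k
    return pi


def terminal_repeat_length(genome: str) -> int:
    """Length of the longest direct repeat found at both ends (at most half the genome)."""
    s = genome.upper()
    if not s:
        return 0
    pi = prefix_function(s)
    k = pi[-1]
    # Walk the chain of borders until the repeat fits in half the sequence.
    while k > len(s) // 2:
        k = pi[k - 1]
    return k


if __name__ == "__main__":
    toy = "ACGTTG" + "CCCAAAGGG" + "ACGTTG"  # hypothetical 6-bp terminal repeat
    print(terminal_repeat_length(toy))  # 6
```

The prefix function runs in linear time, so the same call would handle a full ~40-kb genome without difficulty.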
Most gene transcription in T7 supergroup phages is driven by phage-encoded RNAPs, which typically recognize very specific promoter sequences. K1F encodes an RNAP similar to that of T7 (64% amino acid identity), so we predicted that K1F might have a similar promoter consensus sequence. Mainly by visual analysis, we identified 10 putative promoters, most of which have T7 homologues (see Table S2 in the supplemental material). The K1F consensus sequence from −8 to +2 is identical to that of T7. However, bases −9 through −12, which are important for promoter specificity, differ. In T7, any change from the consensus C at −9 results in an inactive promoter (21). Because K1F has a T at this position, T7 RNAP is unlikely to initiate transcription from a K1F promoter, and the two RNAPs likely have different promoter specificities.
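The reasoning about position −9 can be expressed as a small check. The Python sketch below encodes the widely used T7 consensus promoter (TAATACGACTCACTATAGGGAGA, written from −17 to +6) and applies only the rule cited above (21): the −8 to +2 core must match and −9 must be C. The variant sequence is hypothetical, built by changing −9 to T; it is not an actual K1F promoter, and real promoter recognition depends on more positions than this toy rule considers.

```python
# Widely used T7 promoter consensus, written from -17 to +6;
# transcription starts at the G at +1.
T7_CONSENSUS = "TAATACGACTCACTATAGGGAGA"
PLUS_ONE = 17  # 0-based index of +1 (17 bases lie upstream of it)

# Core positions -8..-1, +1, +2 (promoter numbering has no position 0).
CORE = [p for p in range(-8, 3) if p != 0]


def base_at(promoter: str, pos: int) -> str:
    """Base at a promoter coordinate; promoter must be aligned like T7_CONSENSUS."""
    if pos == 0:
        raise ValueError("promoter coordinates skip 0")
    index = PLUS_ONE + pos if pos < 0 else PLUS_ONE + pos - 1
    return promoter[index]


def t7_core_matches(candidate: str) -> bool:
    """True if positions -8 to +2 are identical to the T7 consensus."""
    return all(base_at(candidate, p) == base_at(T7_CONSENSUS, p) for p in CORE)


def t7_rule_allows(candidate: str) -> bool:
    """Toy rule from the text: conserved -8..+2 core plus a C at -9."""
    return t7_core_matches(candidate) and base_at(candidate, -9) == "C"


if __name__ == "__main__":
    # Hypothetical variant: T7 consensus with -9 changed from C to T.
    variant = T7_CONSENSUS[:8] + "T" + T7_CONSENSUS[9:]
    print(t7_core_matches(variant), t7_rule_allows(variant))  # True False
```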
K1F appears to have fewer phage promoters in the middle region of its genome and lacks equivalents of T7 promoters 1.1a, 1.1b, 3.8, 4c, 4.3, and 4.7 (see Fig. S2 in the supplemental material). The late region of K1F appears to contain analogues of all the T7 late promoters. We identified an OR promoter in K1F but not an OL promoter. OR is involved in packaging (6, 7) and possibly replication (39). A phage-specific promoter was found immediately upstream of the RNA polymerase gene in K1F, suggesting some autoregulation of this gene; gh-1 is the only other T7-like phage with a phage-specific promoter in this region.
The T7 genome contains three strong σ70 promoters responsible for early gene transcription, including transcription of the phage RNA polymerase gene. We have annotated one such promoter in K1F (positions 924 to 951).
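Host σ70 promoters are usually described by two hexamers, the −35 box (consensus TTGACA) and the −10 box (consensus TATAAT), separated by a spacer of roughly 17 bp. The Python sketch below is a crude consensus scan for such box pairs; the mismatch budget and spacer range are hypothetical parameters, the toy sequence is invented, and this is not how the K1F promoter at positions 924 to 951 was identified.

```python
SIGMA70_MINUS35 = "TTGACA"
SIGMA70_MINUS10 = "TATAAT"


def mismatches(a: str, b: str) -> int:
    """Number of differing positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


def scan_sigma70(seq: str, max_mismatches: int = 2, spacers=range(15, 20)):
    """Return (start, end, spacer, total_mismatches) for candidate box pairs.

    start is the first base of the -35 box and end is one past the last base
    of the -10 box (0-based, half-open). max_mismatches is a combined budget
    for both boxes (hypothetical threshold).
    """
    seq = seq.upper()
    hits = []
    for i in range(len(seq) - 5):
        mm35 = mismatches(seq[i:i + 6], SIGMA70_MINUS35)
        if mm35 > max_mismatches:
            continue
        for spacer in spacers:
            j = i + 6 + spacer  # start of the -10 box
            if j + 6 > len(seq):
                break
            total = mm35 + mismatches(seq[j:j + 6], SIGMA70_MINUS10)
            if total <= max_mismatches:
                hits.append((i, j + 6, spacer, total))
    return hits


if __name__ == "__main__":
    toy = "GG" + "TTGACA" + "C" * 17 + "TATAAT" + "GG"  # hypothetical
    print((2, 31, 17, 0) in scan_sigma70(toy))  # True
```

Consensus scans like this one produce many false positives in real genomes, which is one reason annotations of this kind are typically confirmed by inspection and by position relative to known genes.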
In T7, the end of early gene transcription is marked by a rho-independent terminator (TE) immediately following the ligase gene (23). K1F has a sequence at the equivalent position (7128 to 7161) with the characteristics of such terminators that could serve this function. Efficient termination of host RNAP in T7 requires gene 0.7, which encodes a kinase that phosphorylates E. coli RNAP (38). Because K1F lacks this gene, this terminator alone is presumably sufficient to end early transcription. K1F also has a potential analogue of the T7 Tφ terminator (positions 22314 to 22352), which terminates transcription by the phage RNAP.
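Rho-independent (intrinsic) terminators are generally recognized by a GC-rich inverted repeat that can form a hairpin, followed by a run of T residues on the nontemplate strand. The Python sketch below is a crude scan for exactly that pattern; the stem length, loop range, GC threshold, and T-run length are hypothetical parameters, and the toy sequence is invented. It is not how the K1F terminator candidates were identified.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(s: str) -> str:
    """Reverse complement of a DNA string (A, C, G, T only)."""
    return s.upper().translate(COMPLEMENT)[::-1]


def find_terminator_like(seq: str, stem: int = 6, loop_min: int = 3,
                         loop_max: int = 8, t_run: int = 4,
                         min_stem_gc: float = 0.5):
    """Return (start, end) spans of hairpin + T-run candidates (0-based, half-open).

    A candidate is a left arm whose reverse complement appears after a loop of
    loop_min..loop_max bases, with at least min_stem_gc GC in the stem and
    t_run T residues immediately after the right arm.
    """
    seq = seq.upper()
    n = len(seq)
    hits = []
    for i in range(n - 2 * stem - loop_min + 1):  # start of the left arm
        left = seq[i:i + stem]
        gc = (left.count("G") + left.count("C")) / stem
        if gc < min_stem_gc:
            continue
        for loop in range(loop_min, loop_max + 1):
            j = i + stem + loop  # start of the right arm
            if j + stem + t_run > n:
                break
            if seq[j:j + stem] != revcomp(left):
                continue
            if seq[j + stem:j + stem + t_run] == "T" * t_run:
                hits.append((i, j + stem + t_run))
    return hits


if __name__ == "__main__":
    # Hypothetical: GC-rich stem, 4-nt loop, then a T run.
    toy = "AAAA" + "GCCGCC" + "TTCG" + "GGCGGC" + "TTTTTT" + "AAA"
    print((4, 24) in find_terminator_like(toy))  # True
```

Dedicated terminator predictors also score hairpin stability and the quality of the T tract; this sketch only checks the qualitative features named above.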
The primary origin of replication in T7 lies downstream of the 1.1a and 1.1b promoters, which are used by T7 RNAP to initiate replication of the leading strand (14, 15). K1F has a relatively large noncoding region at the analogous position in its genome but lacks analogues of promoters 1.1a and 1.1b. However, K1F promoter 1.3 is positioned such that it could serve both to initiate replication and to initiate transcription of downstream genes. In addition, T7 is known not to require the primary origin for growth, and replication can also be initiated from secondary origins associated with promoters 6.5, 13, and OR (39). All of these promoters are present in K1F, and any one of them could potentially serve as the primary origin.
Double-stranded DNA phages are known to have mosaic genome structures (3, 22, 36) composed of functional modules, which can be genes, clusters of genes, or domains within a gene, acquired through horizontal transfer. Such mosaicism is evident in all members of the T7 supergroup, which makes classification of these phages by classical criteria particularly difficult (27). However, based on overall genome organization and sequence similarity, K1F is clearly most closely related to T7, T3, A1122, YeO3-12, and gh-1 within the larger T7 supergroup. K1-5, on the other hand, has a somewhat different genome organization and greater sequence similarity to SP6, and the two together form a separate subgroup within the T7 supergroup. Only two genes in K1F show more sequence similarity to K1-5 than to any member of the T7 subgroup: gene 0.3 and, as discussed below, the endosialidase gene.
Like all K1-specific phages studied at the molecular level, K1F encodes a tail-associated endosialidase, a key determinant of host specificity that allows the phage to penetrate the polysaccharide capsule. The K1F endosialidase shares sequence similarity with the N-terminal portion of the T7 tail fiber protein, the region responsible for attachment to the virion, and it appears that K1F arose from a T7-like ancestor in which an endosialidase gene became fused to a portion of the tail fiber gene. Another phage-encoded endosialidase, that of phage 63D, which has a longer tail and is probably not a member of the T7 supergroup (30), has an N-terminal domain with sequence similarity to the minor structural protein of salmonellaphage MB78 (9); this domain could perhaps serve to connect the endosialidase to the head. Genome sequencing of E. coli K1 strain RS218 has uncovered yet another probable K1-specific prophage, CUS-3, which is also not a member of the T7 supergroup but is more similar to the lysogenic phages HK620 and P22 (10). It is not known whether this phage can still undergo a lytic cycle, but CUS-3 does encode an endosialidase that shares sequence similarity with the central portion of the endosialidases described above. The N-terminal 120-amino-acid sequence of the CUS-3 endosialidase is nearly identical (90% amino acid identity) to the head-binding domain of the HK620 tailspike (8) and the well-studied P22 tailspike (47), as well as to the head-binding domains of Sf6 (5), ST64T (31), and APSE-1 (50). Again, fusion of an endosialidase gene to a portion of a tail gene appears to have occurred in an ancestor of CUS-3. The K1-5 endosialidase differs from the three above in that it has no head-binding domain at all and is possibly linked to the head by a separate polypeptide, a feature that may allow the phage to attach two different tail proteins to its head (42).
It appears that phage-encoded endosialidases have been adapted to a variety of phage structural architectures, particularly with regard to the mechanism by which they are linked to the virion.
The genomes of several T7 supergroup members have now been sequenced, and a clearer picture of how these viruses evolved is coming into view. However, the phages examined so far are somewhat skewed toward those that infect laboratory strains, which often have no capsule or a greatly reduced one. There are many potential T7-like phages that infect a wide range of hosts producing chemically different capsules (44). Further work is needed to determine how these phages have evolved to infect diverse bacteria.
The GenBank accession number for the K1F genome is DQ111067.
This research was supported by the Intramural Research Program of the National Institute of Mental Health, National Institutes of Health.
We thank Eric Vimr for supplying strains and Ian Molineux and Sankar Adhya for helpful discussions.
†Supplemental material for this article may be found at http://jb.asm.org/.