The K1F genome sequence was determined by a combination of shotgun sequencing and primer walking (Fig. ). Open reading frames (ORFs) were identified using a combination of visual inspection and translated BLAST searches (1) and by analysis of the predicted translated sequence using GCG-lite (11). The genome consists of a single double-stranded DNA molecule of 39,704 bp with a GC content of 50%. We have annotated 42 ORFs (see Table S1 in the supplemental material), all of which are transcribed from the same strand of the DNA molecule. The K1F genome is flanked by terminal repeats of 179 bp. It is a tightly packed genome, with 88% of the sequence predicted to encode proteins. Because the two genomes are similar in both organization and sequence, we use the same gene-numbering system as that of T7, numbering from left to right on the genetic map. ORFs that are absent from, or have no sequence similarity to, any previously characterized T7-like phage are named simply by their amino acid length (e.g., orf156). Like T7, the genome can be divided into three regions: early, middle, and late.
FIG. 1. ORF-by-ORF comparison of the genomes of T7, K1F, and K1-5. The blue shading indicates the percent amino acid identity of the open reading frames. Open reading frames throughout the K1F genome show considerably more sequence similarity to T7 than to K1-5.
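The summary statistics quoted above (GC content and the fraction of the genome predicted to encode proteins) can be recomputed from a sequence and a list of ORF coordinates. The following is a minimal sketch, not the procedure used for the annotation: the function names `gc_content` and `coding_fraction` are hypothetical, ORF coordinates are assumed to be 1-based and inclusive, and the coordinates in the usage note are toy values rather than K1F data.

```python
def gc_content(seq: str) -> float:
    """Fraction of G and C bases in a DNA sequence (case-insensitive)."""
    seq = seq.upper()
    if not seq:
        raise ValueError("empty sequence")
    return (seq.count("G") + seq.count("C")) / len(seq)


def coding_fraction(genome_length: int, orfs: list[tuple[int, int]]) -> float:
    """Fraction of genome positions covered by at least one ORF.

    `orfs` holds 1-based inclusive (start, end) pairs (an assumption of this
    sketch). Overlapping ORFs are merged so shared bases are counted once.
    """
    covered = 0
    last_end = 0
    for start, end in sorted(orfs):
        # Skip any part of this ORF that an earlier ORF already covered.
        start = max(start, last_end + 1)
        if end >= start:
            covered += end - start + 1
            last_end = end
    return covered / genome_length
```

For example, `coding_fraction(100, [(1, 40), (31, 60), (71, 98)])` counts 60 bases for the two overlapping ORFs plus 28 for the third, giving 0.88 on this toy 100-bp genome.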
The first gene predicted to be transcribed on the K1F genome is 0.3, which is likely to be involved in antirestriction (49). Other than the endosialidase (see below), this is the only open reading frame carried by K1F that is more similar to those of phages K1-5 and SP6 than to that of T7, which could reflect differences in the host strains that these phages infect. Characteristic of the T7-like phages, K1F encodes an RNA polymerase (RNAP; 1.0), which is responsible for transcription of most of the phage genes but is also involved in other functions, such as translocating phage DNA into the cell (17), as well as DNA replication, maturation, and packaging (55). A DNA ligase gene analogous to that of T7, 1.3, is present and marks the end of the early genes. K1F carries fewer genes in the early region and is missing equivalents to T7 genes 0.4, and 1.2.
The middle region encodes mainly proteins involved in DNA metabolism and includes a host RNA polymerase inhibitor (2.0), single-stranded DNA binding protein (2.5), endonuclease (3.0), lysozyme (3.5), primase/helicase (4.0), DNA polymerase (DNAP; 5.0), and exonuclease (6.0). All are closely related to one of the close-knit T7 family members, including T7, T3, and YeO3-12. This region carries several smaller putative T7-like genes, including 1.6, 1.7, 1.8, 5.5, 5.7, and 6.7; the function of these ORFs is unknown. ORFs 50, 156, 71, 50, 57, and 122 have no significant sequence similarity to anything in the databases.
Several bacterial and bacteriophage genomes have been found to contain group I introns, including those of phages phiI and W31, which are possibly members of the T7 supergroup (2). Both have a group I intron inserted 156 bp from the end of the DNAP gene. K1F also contains a putative group I intron within its DNAP gene, also 156 bp from the end (positions 14937 to 15534 on the genome). Like those of phiI and W31, the intron encodes a 131-amino-acid homing endonuclease very similar (77% identity) to I-TsII (for a review, see reference 18). T3, T7, and YeO3-12 also contain ORFs with similarity to homing endonucleases (34); however, these do not appear to be part of introns and are located in intergenic regions.
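As a consistency check on the coordinates given above, the intron span can be compared with the coding capacity needed for the 131-amino-acid endonuclease. This is a small sketch under two assumptions that the text does not state explicitly: the genome positions are 1-based and inclusive, and the endonuclease ORF (with its stop codon) lies entirely within the intron. The helper names are hypothetical.

```python
INTRON_START = 14937     # genome position quoted in the text
INTRON_END = 15534       # genome position quoted in the text (assumed inclusive)
ENDONUCLEASE_AA = 131    # protein length quoted in the text


def span_length(start: int, end: int) -> int:
    """Length in bp of a 1-based inclusive coordinate span."""
    return end - start + 1


def orf_length_bp(n_amino_acids: int, include_stop: bool = True) -> int:
    """Nucleotides needed to encode a protein, optionally with a stop codon."""
    return 3 * (n_amino_acids + (1 if include_stop else 0))


intron_bp = span_length(INTRON_START, INTRON_END)   # 598
endonuclease_bp = orf_length_bp(ENDONUCLEASE_AA)    # 396
fits = endonuclease_bp <= intron_bp                 # True
```

Under these assumptions the intron is 598 bp long and the endonuclease ORF occupies 396 bp of it, leaving about 200 bp of intron sequence outside the ORF.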
The late region of the T7 family encodes the virion structural proteins as well as many of the proteins involved in maturation and cell lysis. The organization of the K1F structural genes is nearly identical to that of T7. The first ORF is the head-tail connector (8.0), followed by those coding for the scaffolding protein (9.0), capsid (10.0), tail tube (11.0 and 12.0), internal virion proteins (13.0 to 16.0), and the tail protein (17.0; see below). Following the genes coding for the major structural proteins are those involved in lysis and maturation, which include genes 17.5, 18, 18.5, and 19.0.
Perhaps the most striking difference between K1F and T7 is the tail protein, gp17. In T7, T3, and YeO3-12, the gene coding for this product encodes the tail fiber protein, which is involved in recognition of and adsorption to the host LPS. The K1F counterpart, however, has only a small region of amino acid similarity to the T7 tail fiber, at the N-terminal head-binding portion (37). The central catalytic region is highly similar to those of other phage endosialidases (57% amino acid identity to 63D, 64% to CUS-3, and 80% to K1-5/K1E), which are involved in both recognition and depolymerization of the K1 polysaccharide capsule.
The initial report of the cloning of the endosialidase gene gave a length of 2,763 bp (37). However, a sequencing error was later discovered, and it was found that the actual gene length is 3,195 bp and that the C terminus of the protein product is posttranslationally cleaved (33). Our results confirm a gene length of 3,195 bp.
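To make the size of the correction concrete, the two reported gene lengths can be converted to codon counts. The sketch below assumes that each quoted length includes the stop codon, which the text does not state; `codons` and `protein_length` are hypothetical helper names.

```python
def codons(gene_bp: int) -> int:
    """Number of codons in a gene, requiring a whole number of triplets."""
    if gene_bp % 3 != 0:
        raise ValueError(f"{gene_bp} bp is not a multiple of 3")
    return gene_bp // 3


def protein_length(gene_bp: int, has_stop: bool = True) -> int:
    """Amino acids in the primary translation product.

    Assumes (for this sketch) that the gene length includes the stop codon.
    """
    return codons(gene_bp) - (1 if has_stop else 0)


corrected_aa = protein_length(3195)       # 1064
initial_aa = protein_length(2763)         # 920
extra_codons = codons(3195 - 2763)        # 144
```

Both lengths are multiples of 3, and the corrected gene is 432 bp (144 codons) longer than the initial report. The value computed here is the length of the primary translation product, before the C-terminal cleavage described above.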
Like most of the members of the closely related T7 family, K1F is flanked by terminal repeats, suggesting similar replication strategies. Immediately following the left terminal repeat is an equivalent to the transcriptional termination site known as the CJ terminator (bases 179 to 186), where RNAP, complexed with lysozyme, pauses at the concatemer junction during maturation and packaging (29).
Most gene transcription among T7 supergroup phages is driven by phage-encoded RNAPs, which typically recognize very specific promoter sequences. K1F encodes an RNAP similar to that of T7 (64% amino acid identity), so we predicted that K1F might have a similar promoter consensus sequence. Mainly by visual analysis, we identified 10 putative promoters, most of which have T7 homologues (see Table S2 in the supplemental material). The consensus sequence from −8 to +2 is identical to that of T7. However, bases −9 through −12, which are important for promoter specificity, differ. In T7, any change from the consensus C at −9 results in an inactive promoter (21). Since K1F has a T at this position, it is unlikely that the T7 RNA polymerase will initiate transcription from a K1F promoter, and it is likely that the two RNAPs have different promoter specificities.
This exclusivity of promoter specificity seems to be the trend among the T7-like phages (24), and it seems that some selective pressure must have produced this feature.
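The position-by-position comparison described above can be sketched in code. The example uses the commonly cited 23-bp T7 promoter consensus (positions −17 to +6, with no position 0) and reports where a candidate promoter differs from it. The K1F promoter sequences themselves are in Table S2 and are not reproduced here, so the candidate in the usage note is a hypothetical sequence built by placing a T at −9 of the T7 consensus; the function names are also illustrative.

```python
T7_CONSENSUS = "TAATACGACTCACTATAGGGAGA"  # positions -17 .. +6
FIRST_POS = -17


def index_of(pos: int) -> int:
    """Map a promoter position to a 0-based string index (there is no position 0)."""
    last_pos = len(T7_CONSENSUS) + FIRST_POS  # +6
    if pos == 0 or not FIRST_POS <= pos <= last_pos:
        raise ValueError(f"invalid promoter position: {pos}")
    return pos - FIRST_POS if pos < 0 else pos - FIRST_POS - 1


def positions() -> list[int]:
    """All promoter positions covered by the consensus, in order."""
    return [p for p in range(FIRST_POS, 7) if p != 0]


def mismatches(candidate: str) -> list[int]:
    """Promoter positions at which `candidate` differs from the T7 consensus."""
    candidate = candidate.upper()
    if len(candidate) != len(T7_CONSENSUS):
        raise ValueError("candidate must span positions -17 to +6")
    return [p for p in positions()
            if candidate[index_of(p)] != T7_CONSENSUS[index_of(p)]]


# Hypothetical candidate: T7 consensus with T instead of C at -9.
hypothetical = T7_CONSENSUS[:index_of(-9)] + "T" + T7_CONSENSUS[index_of(-9) + 1:]
```

Here `mismatches(hypothetical)` returns `[-9]`, the single substitution that, according to the text, is enough to inactivate a promoter for T7 RNAP. A real comparison would run each putative K1F promoter through the same function.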
K1F appears to have fewer phage promoters throughout the middle region of its genome and is missing equivalents to T7 promoters 1.1a, and 4.7 (see Fig. S2 in the supplemental material). The late region of K1F appears to have all of the T7 analogues. We identified a OR promoter in K1F but not a OL
is involved in packaging (6) and possibly replication (39). A phage-specific promoter was found immediately upstream of the RNA polymerase gene in K1F, suggesting that there is some autoregulation of this gene; phage gh-1 is the only other T7-like phage to have a phage-specific promoter in this region.
The T7 genome contains three strong σ70 promoters responsible for early gene transcription, including transcription of the phage RNA polymerase gene. We have annotated one such promoter in K1F (positions 924 to 951).
The end of transcription of the early genes in T7 is marked by a rho-independent terminator (TE) immediately following the ligase gene (23). K1F has a sequence in the equivalent position (7128 to 7161) that is characteristic of such terminators and could serve this function. Efficient termination of host RNAP in T7 requires gene 0.7, which encodes a kinase that phosphorylates E. coli
). Since K1F lacks this gene, we must assume that this terminator alone is sufficient to end transcription. K1F also has a potential analogue of the T7 Tφ terminator (positions 22314 to 22352), which terminates transcription by the phage RNAP.
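Rho-independent terminators are generally recognized by a GC-rich inverted repeat (a stem-loop in the transcript) followed by a run of T residues in the DNA. The sketch below is a deliberately naive scanner for that pattern and is not the method used to annotate the K1F terminators, which the text does not describe. The function name, the stem, loop, and tail sizes, and the test sequence are all assumptions chosen for illustration; real predictions normally also weigh hairpin stability.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq: str) -> str:
    """Reverse complement of an upper-case DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def find_hairpin_t_run(seq, stem=6, loop_range=(3, 8), tail=8, min_t=5):
    """Return (start, end) of the first perfect stem-loop followed by a T-rich tail.

    Coordinates are 0-based and half-open. A hit needs a `stem`-bp perfect
    inverted repeat, a loop whose length lies in `loop_range`, and at least
    `min_t` T residues in the `tail` bases after the stem. Returns None if
    nothing matches.
    """
    seq = seq.upper()
    for i in range(len(seq) - stem + 1):
        left = seq[i:i + stem]
        target = revcomp(left)
        for loop in range(loop_range[0], loop_range[1] + 1):
            j = i + stem + loop                    # start of the right arm
            right = seq[j:j + stem]
            t_tail = seq[j + stem:j + stem + tail]
            if len(right) < stem or len(t_tail) < tail:
                continue
            if right == target and t_tail.count("T") >= min_t:
                return (i, j + stem + tail)
    return None


# Synthetic test sequence (not from K1F): 6-bp stem, 4-nt loop, 8 T residues.
toy = "GGGG" + "GCCGCC" + "GAAA" + "GGCGGC" + "TTTTTTTT" + "AC"
```

On the synthetic sequence, `find_hairpin_t_run(toy)` returns `(4, 28)`, the span covering the stem-loop and the T run. Requiring a perfect stem keeps the sketch short but would miss terminators with mismatched or bulged stems.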
The primary origin of replication in T7 is downstream of the 1.1a promoters, which are used by T7 RNAP to initiate replication of the leading strand (14). K1F has a relatively large noncoding region in the analogous position on the genome but does not have analogues to promoters 1.1a. However, K1F promoter 1.3 is positioned such that it could serve both to initiate replication and to initiate transcription of the downstream gene(s). In addition, it is known that T7 does not require the primary origin for growth, and replication can also be initiated from secondary origins associated with promoters 6.5, and OR
). These are all present in K1F, and any one of these could potentially serve as the primary origin.