After tectivirus, myovirus is the second most ubiquitous family of Thermus phages isolated so far. However, all of the collected Thermus myoviruses, except for ϕYS40 (which has a genome size of >100 kb), resemble coliphage Mu or P2 with a genome size of 28–34 kb.9
This report describes the isolation of another myovirus, ϕTMA, whose large genome and morphology are similar to those of ϕYS40 but whose host specificity differs.
The closed circular maps of the ϕTMA and ϕYS40 genomes each consisted of approximately 152 kb of DNA. This value is, however, inconsistent with the previous report in which the molecular weight of the ϕYS40 virion DNA, as determined by sucrose density gradient centrifugation, was estimated to be 1.36 × 10⁸ (approximately 206 kb, assuming that the average molecular weight of a single DNA base pair is 660) using the T4 DNA as an internal marker (1.3 × 10⁸, approximately 197 kb),21
suggesting that the ϕYS40 genome is longer than the T4 genome. In this study, the length of the genomic DNA purified from the phage particles was determined more precisely using PFGE, and was found to be more than 200 kb. In the T4-related phages, DNA packaging starts from any free end of a long concatemer duplicated in the host cell and terminates when the head is full, giving rise to a linear DNA with various terminally redundant ends within their capsids.24
The redundancy accounts for the circular permutation of the genome and is usually several percent of the circular map, which, for example, is approximately 3% in T4. In contrast, the length of the DNA within the phage particles of ϕTMA and ϕYS40 is more than 50 kb longer than the circular map, and thus, the redundancy is more than 30%. The low mobility of the ϕYS40 genomic DNA in the PFGE assay could not be explained on the basis of its covalent binding to a protein, such as the YSP_065 protein that was predicted as the protein primer in DNA replication, because the DNA sample used in the PFGE analysis was pre-treated with proteinase K. The large redundancy may facilitate homologous recombination, contributing to the repair of frequently occurring mutations at a high temperature. Consistent with this idea, several genes supposedly involved in the DNA repair, such as the putative recA
genes, were found in the ϕTMA and ϕYS40 genomes. It is, however, unclear whether, among all the Thermus myoviruses, the large redundancy is specific to ϕYS40 and ϕTMA, because we currently do not have appropriate information on the precise genome sizes and complete genome sequences of other Thermus myoviruses. Another possible explanation for such a long terminal repeat is that when the thermophilic phages are propagated in environments other than their native environment (i.e., in the laboratory), many genes are no longer needed and are lost, while the terminal repeat increases in length to physically compensate for the lost genes in their capsids and to maintain the efficiency of DNA packaging and injection. Results of the PFGE assay also suggest a larger difference in length between the genomic DNAs of ϕTMA and ϕYS40 than their nucleotide sequences suggest. The difference might reflect the difference in capsid lengths, one of the determinants of DNA length in T4.25
It has been shown that the head length of the T4 capsids could be controlled by mutations in certain amino acids of the major structural protein, and the head length is determined by a vernier-type mechanism that involves interaction between the core and shell proteins.26
The putative major head proteins of ϕTMA and ϕYS40, encoded by the TMA_072 and YSP_073 genes, respectively, could be responsible for the differences in space within the capsid. However, other capsid proteins could also be responsible for determining the length and shape of the capsid.27
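The size arithmetic in this paragraph can be checked with a short sketch. It assumes the 660 Da average base-pair mass stated in the text. The 202 kb packaged length is only an illustrative lower bound standing in for "more than 50 kb longer than the circular map". The function names are hypothetical.

```python
# Average mass of one DNA base pair in daltons, as assumed in the text.
BP_MASS_DA = 660


def mass_to_kb(molecular_weight_da, bp_mass_da=BP_MASS_DA):
    """Convert a DNA molecular weight (Da) to a length in kilobase pairs."""
    return molecular_weight_da / bp_mass_da / 1000


def terminal_redundancy_percent(packaged_kb, circular_map_kb):
    """Express the excess of packaged DNA over the circular map as a percentage of the map."""
    return (packaged_kb - circular_map_kb) / circular_map_kb * 100


# Molecular weights quoted from the earlier sedimentation study.
ys40_kb = mass_to_kb(1.36e8)
t4_kb = mass_to_kb(1.3e8)

# Illustrative assumption: packaged DNA exactly 50 kb longer than the 152 kb map.
redundancy = terminal_redundancy_percent(152 + 50, 152)

print(f"phiYS40 virion DNA: {ys40_kb:.0f} kb")
print(f"T4 marker DNA: {t4_kb:.0f} kb")
print(f"Terminal redundancy (lower bound): {redundancy:.1f}%")
```

Because 50 kb is a lower bound on the excess, the computed percentage is likewise a lower bound on the redundancy.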
T4 has approximately 290 probable protein-coding genes packed into its 169 kb long genome, whereas ϕTMA and ϕYS40 have 168 and 170 protein-coding genes, respectively, in their 152 kb long genomes. The average lengths of ORFs in the ϕTMA and ϕYS40 genomes are 837 bp and 836 bp, respectively, which are longer than the average length of ORFs (588 bp) in the T4 genome, suggesting that the protein-coding genes in the thermophilic phages are longer than those in the T4 phage. The smaller number of protein-coding genes in ϕTMA and ϕYS40 compared with the counterpart mesophilic large virulent phages suggests that the thermophilic phages might have a simpler life cycle than the mesophilic ones. Alternatively, it can be argued that the thermophilic phages might require longer proteins to attain stability at higher temperature; however, comparative analysis of complete genomes of mesophilic, thermophilic and hyperthermophilic organisms indicated a trend toward shortened thermophilic proteins relative to their mesophilic homologs,28
and this trend seems to hold true for proteins of thermophilic phages.29
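The gene-density comparison above can be made explicit with a minimal sketch. It uses only the gene counts and circular-map sizes quoted in the text. The dictionary layout and variable names are illustrative assumptions.

```python
# Gene counts and genome sizes (kb) as quoted in the text.
phages = {
    "T4": {"genes": 290, "genome_kb": 169},
    "phiTMA": {"genes": 168, "genome_kb": 152},
    "phiYS40": {"genes": 170, "genome_kb": 152},
}

# Protein-coding genes per kilobase of genome.
density = {
    name: data["genes"] / data["genome_kb"] for name, data in phages.items()
}

for name, value in density.items():
    print(f"{name}: {value:.2f} genes per kb")
```

On these figures, T4 carries roughly one and a half times as many protein-coding genes per kilobase as either Thermus phage.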
One of the significant differences between the genomes of the two Thermus phages is the presence of the transposase and resolvase genes in the ϕTMA genome. Some transposons have a pair of transposase and resolvase, where the transposase helps to form a cointegrate between the donor and recipient replicons by carrying a directly repeated copy of the transposable unit, and then the intermediate is separated by the resolvase into donor and recipient replicons, each containing one copy of the transposon.30
The amino acid sequence of the ϕTMA transposase is most similar to that of the hyperthermophile S. azorense
Az-Fu1 and very similar to that found in the T. scotoductus
SA-01 genome (accession no. YP_004201456). In addition, the resolvase gene in the ϕTMA genome is considerably similar to that of T. scotoductus
SA-01 (accession no. YP_004201455). The transposase and resolvase genes of T. scotoductus
SA-01 reside next to each other and their order is the same as that in ϕTMA. Thus, genetic exchange could occur between the (hyper)thermophiles and the phages. This speculation is in accordance with an earlier suggestion that diverse Thermus phages have access to a common gene pool,11
although as of now there is no report describing the presence of a pair of transposase and resolvase in other phage genomes. In sharp contrast to the host cell's GC content (69%), the GC contents of the transposase (32%) and resolvase (31%) genes of ϕTMA are very similar to the average GC content of the ϕTMA genomic DNA (32.6%), suggesting that the elements may have transposed from other genomes with low GC content and that the transposition may have occurred over a time long enough for each gene to ameliorate to a lower GC content. It however remains unclear whether the DNA region that contains the genes for the transposase and resolvase is transposable. Our sequence analysis did not reveal any inverted repeats flanking these genes to function as an insertion sequence.
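The GC contents compared above are simple base-composition percentages. The following is a minimal sketch of how such a value is computed for a single gene or genome sequence. The function name is hypothetical, and this is not presented as the software used in the study.

```python
def gc_content_percent(sequence):
    """Return the percentage of G and C among the unambiguous bases (A, C, G, T)."""
    seq = sequence.upper()
    gc = sum(seq.count(base) for base in "GC")
    total = sum(seq.count(base) for base in "ACGT")
    if total == 0:
        raise ValueError("sequence contains no A, C, G or T bases")
    return 100.0 * gc / total


# Toy input for illustration only; not a sequence from the phage genomes.
print(gc_content_percent("ATGC"))
```

Ambiguity codes such as N are ignored in both the numerator and the denominator, so partially resolved sequences do not bias the estimate.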
In T-even related phages, host range specificity can be changed by amino acid substitution or duplication/mutational alteration of the His-boxes found in the C-terminal portion of gene 37 tail fibers that bind to receptors on the host bacterial surface.31
ϕTMA showed broader host specificity than ϕYS40. The amino acid sequences of the long tail fiber proteins of ϕTMA and ϕYS40 differed strikingly in their C-terminal portions, including a deletion of 30 amino acid residues in the ϕTMA protein. Thus, the C-terminal region of the long tail fiber protein of the thermophilic phages might also be critical for host discrimination, as was found with the mesophilic phages. Because the tail fiber proteins of ϕTMA and ϕYS40 did not have any His-boxes, their structures might be considerably different from those of the mesophilic phages. There are other possible determinants of host range specificity, including the T. thermophilus
DNA modification/restriction system. However, we have observed that ϕTMA, which can infect both HB8 and HB27, could form plaques on both strains with virtually the same efficiency irrespective of the host used for the previous passage (results not shown).
The phage resistant strains selected in the presence of excess ϕTMA and ϕYS40 lacked the pilus fiber on the host cell surface. Deletion of the pilA
gene led to the phage resistance in HB27 and HB8 (this study). These results demonstrate that the myoviruses infect T. thermophilus
via the pilus of the host strain. It has been shown that the phage PO4 binds to the type IV pili expressed on the surface of Pseudomonas aeruginosa32
and also to the pili heterologously expressed on the cell surface of Neisseria gonorrhoeae
suggesting that the P. aeruginosa
pili are the primary receptors for the phage PO4. Because infection of the T. thermophilus
strains by ϕTMA and ϕYS40 was dependent on the existence of the host pili, they could also be the primary receptors for the thermophilic phages. The amino acid sequence of the pilA
gene products of HB8 and HB27 differed from each other, especially in the C-terminal region that included the putative disulfide-bonded loop region, which has been shown to be critical for the pilus assembly and twitching motility in P. aeruginosa.
It has been hypothesized that the sequence diversity of pilin found in pathogenic bacteria reflects an evolutionary compromise between the retention of the function as a retractile tether for twitching motility and antigenic variation against host immune system.34
On the other hand, the significant difference found in the pilus structural protein of the T. thermophilus
likely reflects competition against the phages. The astonishing divergences in the primary sequences of pilins of the thermophiles and putative tail fiber proteins of the phages must have resulted partly from the competition between the thermophiles and the phages in the hot springs. In addition to the differences in the type IV pili, comparative genomics of HB8 and HB27 have shown striking differences in the cell surface determinants, including the S-layer proteins and cell envelope-modifying enzymes, such as the glycosyltransferases.35
These strain-specific surface structures other than the pili might also act as countermeasures against phages in the natural environment.
SDS-PAGE analysis of the ϕTMA structural proteins suggested that the proteins in the ϕTMA phage particles are very stable. We found that proper denaturation of the proteins by boiling prior to sample loading on the gel was very important for reproducibility. Boiling for a short time (<3 min) caused poor reproducibility, with a few protein bands missing from the SDS-PAGE analysis. In the previous report,10 the protein composition of the ϕYS40 virions was analyzed only by mass spectrometry. We identified six ϕTMA virion structural subunits that corresponded to the products of the TMA_019, TMA_066, TMA_067, TMA_068, TMA_072 and TMA_166 genes on the basis of their N-terminal sequence and mass spectrometric analyses. These gene products were also found in ϕYS40, and the relative abundances of these proteins in the two phages were similar (i.e., the TMA_072 protein was most abundant, followed by the TMA_019 protein). We found that both the TMA_072 and TMA_166 proteins were processed at the C-terminal side of a lysine residue. This result suggests that a trypsin-type protease is involved in the phage assembly process. The major capsid proteins of T4 phage36
one of ϕKZ-related phages, also exhibited posttranslational cleavage, although in those cases proteolysis occurred after a glutamate residue. Based on the abundance and posttranslational cleavage data, we conclude that the TMA_072 gene encodes the precursor of the major capsid subunit, even though the encoded TMA_072 protein did not show any sequence homology with the subunits of the registered virus particles available in the public databases, except for the ortholog found in ϕYS40.
We could also speculate on the function of some of the other capsid proteins. First, the primary sequence of the protein encoded by the TMA_068 gene revealed low but significant homology with the tail sheath proteins of other phages of the Myoviridae family, which have a contractile tail and linear double-stranded DNA. Although the Myoviridae virion generally contains many copies of the sheath protein, the TMA_068 gene product was not abundant in the SDS-PAGE analysis. The transmission electron microscopic analysis of ϕTMA frequently showed contracted phage particles, which could have been caused by osmotic shock during the dialysis following the cesium chloride gradient ultracentrifugation step. Another reason for contraction might be the sensitivity of the phage particles to the high concentration of CsCl used for the density gradient centrifugation. It has been shown previously that ϕYS40 is easily inactivated at high salt concentration.21
A similar sensitivity to high salt concentration could be responsible for the contraction of the ϕTMA tail, leading to aggregation of the sheath protein. We speculate that the aggregate is resistant to denaturation by the standard SDS-PAGE sample buffer and is too large to enter the separating gel, which could be a major reason for the observed low abundance of the TMA_068 gene product. Second, the product of the TMA_067 gene could be a tube protein because of its abundance and molecular weight (24 kDa), which is close to those of the tube proteins of other phages (e.g., 18 kDa and 15.9 kDa for the T4 phage and K phage, respectively). Finally, the product of the TMA_073 gene and that of its counterpart, the YSP_074 gene of ϕYS40, showed partial similarity to the HK97 prohead protease. A strictly conserved pair of serine and histidine residues found in the prohead protease superfamily38
was also conserved in the proteins encoded by the TMA_073 and YSP_074 genes. In each phage, this gene is located next to the one encoding the major head structural protein described above. We, therefore, hypothesize that the TMA_073 protein of ϕTMA (and also the YSP_074 protein of ϕYS40) is the protease involved in head maturation. Thus, the order of the genes encoding the head protease, the major capsid protein, and the tail-related sheath and tube proteins is highly conserved among other Myoviridae. It is hard to speculate on the function of the TMA_066 protein in ϕTMA virion morphogenesis, because its primary sequence did not share any homology with the sequences of proteins involved in virion morphogenesis that are available in the public databases. However, it showed weak homology (22% amino acid sequence identity) with the putative tube protein encoded by the TMA_067 gene (see above). The gp54 protein of T4, whose amino acid sequence shows partial similarity to the tail tube structural protein of T4, is believed to function as a tail tube initiator.39
It is possible that the TMA_066 protein could help in the formation of the tail tube of the thermophilic phage.
Gene orders for the capsid proteins of Thermus, Staphylococcus and Escherichia phages
In conclusion, we isolated a thermophilic myovirus, ϕTMA, evolutionarily related to ϕYS40. Their divergence seems to be partly driven by coevolution with the host thermophile that proceeds via interaction between the tail fiber of the phages and the pili of the host cells and involves a mobile element encoding a transposase. The small number of predicted ORFs suggests a unique, simple life cycle for the thermophilic large myoviruses. The unexpectedly large terminal redundancy in their genomes suggests a role in the maintenance of genetic information through circular permutation and also implies novel significance in processes such as DNA repair. The gene order and processing of the head proteins of these phages resemble those of the mesophilic phages.