We identified three heterogeneous loci while comparing the sequences of the Gamma and Cherry phages (Fig. , yellow highlighted areas). We first became aware of heterogeneity near the integrase when performing confirmatory restriction mapping of the WγC genome from a plaque-purified phage preparation grown on B. cereus ATCC 4342 (Table , locus I). The map revealed additional DNA that was not included in the Cherry phage assembly (data not shown). Primers 10BB (AATTGTATCATCGAGTATTAATAGC) and 10AX (TGTAAGTATCGATACCTAATCG) were designed to subclone this conflicting region using a TOPO TA cloning kit (Invitrogen, Carlsbad, CA), for the production of a microlibrary for sequencing and for primer walking of the PCR product. For diagnostic purposes, primers 10BE (TGTGGTGAGCCAATTACAGC) and 10AK (TTTCGCTATCTGCATATTTGAG) were designed to amplify this locus (Fig. ). PCR using primers 10BE and 10AK generated a 1,155-bp product (form C) for WγC and WγL but a 3,797-bp product (form A) for this variant, which we refer to as WγC′ (Cherry prime; DQ222852) (Table ). Assembly of the previous Cherry sequences with the sequence of the 3,797-bp PCR product reconciled the restriction map data.
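Expected product sizes for a primer pair can be checked in silico once an assembly is available. The following is a minimal Python sketch, not the procedure used in this study. It assumes both primers are written 5′ to 3′, that the second primer anneals to the opposite strand, and that primer matches are exact. The template is a made-up toy sequence built around primers 10BE and 10AK, and the function names are illustrative.

```python
# Toy in silico PCR: predict amplicon length from exact primer matches.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of an unambiguous DNA sequence."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def amplicon_length(template, forward, reverse):
    """Length of the product spanning both primers, or None if either is absent."""
    template = template.upper()
    start = template.find(forward.upper())
    if start == -1:
        return None
    # The reverse primer appears on the top strand as its reverse complement.
    reverse_site = reverse_complement(reverse)
    site = template.find(reverse_site, start + len(forward))
    if site == -1:
        return None
    return site + len(reverse_site) - start


PRIMER_10BE = "TGTGGTGAGCCAATTACAGC"
PRIMER_10AK = "TTTCGCTATCTGCATATTTGAG"

# Hypothetical template: flank + 10BE + 40-bp filler + binding site of 10AK + flank.
toy_template = (
    "AAAA" + PRIMER_10BE + "ACGT" * 10 + reverse_complement(PRIMER_10AK) + "CCCC"
)

print(amplicon_length(toy_template, PRIMER_10BE, PRIMER_10AK))  # 82
```

Run against an assembled form A or form C sequence instead of the toy template, the same call should return the predicted size of the diagnostic product, provided the assumptions above hold.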
When a different stock of the Gamma phage (WγU) was sequenced, this region had yet another form, 1,794 bp in size (Fig. , form B). To determine the scope of variability in this region, we conducted PCR experiments with primers 10BB and 10AK on 24 well-isolated plaques from each stock lysate grown on B. cereus ATCC 4342 (data not shown). From these results, we concluded that this region of the Gamma/Cherry phage genomes exists in three distinct forms (A, B, and C) and that none of the stocks tested was genetically pure. For the WγU stock, 13 out of 23 total plaques (57%) were positive for form B (1,794 bp) and 10 out of 23 (43%) had form C (1,155 bp), but there was no PCR product for form A. Among the WγL plaques that gave a product, 16 out of 20 (80%) contained form A (3,797 bp) and 5 out of 20 (25%) contained form C; no form B was observed. WγC was similar to WγL in that no form B was observed: 14 out of 21 plaques (67%) amplified the form A product and 7 out of 21 (33%) gave form C. Only WγU produced form B.
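The percentages above follow directly from the plaque counts. As a small worked example, the Python sketch below recomputes them for two of the stocks (WγU and WγC) from the counts given in the text. The dictionary layout and function names are illustrative choices, not part of the original analysis.

```python
# Recompute the reported form frequencies from the plaque counts in the text.
def percent(count, total):
    """Percentage of plaques, rounded to the nearest whole number."""
    return round(100 * count / total)


# Counts as reported: plaques scored per stock and plaques positive for each form.
plaque_counts = {
    "WγU": {"total": 23, "forms": {"B": 13, "C": 10}},
    "WγC": {"total": 21, "forms": {"A": 14, "C": 7}},
}


def summarize(counts):
    """Map each stock to {form: percentage of scored plaques}."""
    return {
        stock: {form: percent(n, entry["total"]) for form, n in entry["forms"].items()}
        for stock, entry in counts.items()
    }


for stock, forms in summarize(plaque_counts).items():
    print(stock, forms)
```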
The second locus of heterogeneity was initially discovered only in the WγC preparation, affecting the coding sequence of a putative replisome organizer (CHERRY0030; Table , locus II, and Fig. , blue diamond). At coordinates 27025 to 27049, the consensus sequence of WγC from the whole shotgun assembly was STTcttyTTKgTTKTTCTTTTTYTTK (lowercase letters indicate the presence of gaps in some of the aligned sequences). Further inspection of the underlying sequence reads showed that this ambiguous sequence was a composite of two distinct sequences, each supported by about equal numbers of clones. Two library clones matched part of the form A sequence and bridged the ambiguity region, which provided assembly data to support two distinct forms near the integrase. To determine whether form I or II sequences belonged with the WγC phages, we designed nested primer sets: primers P44705 (TGATTTTCTATGATGCTGTGTTG) and P44482 (AATAGTTGAAGAATATACACTTCC) to first amplify a 2,165-bp product, and then primers P41871 (CCCATACAACTCAATTGGGAG) and P41870 (GTGCAAATAACGTGCTCGGTC) to obtain high-quality sequence data close to the ambiguity region (Fig. ). The sequences of these PCR products confirmed that the form II ambiguity sequence is linked to form A (WγC′) and the form I ambiguity sequence is linked to form C (WγC). Since this study was completed, the sequences of two additional Gamma phage isolates (Wγd and WγP [unpublished]), Wβ (28), and Fah (22) have become available for comparison. With the addition of Fah, this locus was expanded to 13 amino acids, with a total of four different variations observed (Table ).
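To illustrate how two distinct sequences in a mixed read set can collapse into an ambiguous consensus like the one reported for this locus, here is a minimal Python sketch. It applies the standard IUPAC nucleotide ambiguity codes (for example S = C/G, Y = C/T, K = G/T) column by column and writes a position in lowercase when any read has a gap there, mirroring the convention described above. The two aligned reads are invented toy sequences, not the actual form I and form II sequences, and the assembler used in the study may have called its consensus differently.

```python
# Column-wise IUPAC consensus of pre-aligned reads ('-' marks a gap).
IUPAC = {
    frozenset("A"): "A", frozenset("C"): "C",
    frozenset("G"): "G", frozenset("T"): "T",
    frozenset("AG"): "R", frozenset("CT"): "Y", frozenset("CG"): "S",
    frozenset("AT"): "W", frozenset("GT"): "K", frozenset("AC"): "M",
    frozenset("CGT"): "B", frozenset("AGT"): "D",
    frozenset("ACT"): "H", frozenset("ACG"): "V",
    frozenset("ACGT"): "N",
}


def consensus(aligned_reads):
    """Return an IUPAC consensus; lowercase marks columns where some read has a gap."""
    reads = [read.upper() for read in aligned_reads]
    if len({len(read) for read in reads}) != 1:
        raise ValueError("reads must be aligned to the same length")
    calls = []
    for column in zip(*reads):
        bases = frozenset(base for base in column if base != "-")
        if not bases:
            calls.append("-")  # every read has a gap in this column
            continue
        code = IUPAC[bases]
        calls.append(code.lower() if "-" in column else code)
    return "".join(calls)


# Hypothetical aligned reads standing in for two co-existing sequence forms.
toy_form_x = "CTTCTT"
toy_form_y = "GTT-TC"
print(consensus([toy_form_x, toy_form_y]))  # STTcTY
```

When roughly equal numbers of reads support each form, as described for the WγC preparation, every position where the two forms differ becomes an ambiguity code, which is why the underlying reads had to be inspected directly.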
A third locus of heterogeneity between Gamma and Cherry phages was identified during comparative analysis of the three phage genomes (Table and Fig. , locus III). The WγU and WγC main assemblies have identical sequences in this region, while WγL and a 7,578-bp variant assembly from WγU (Fig. ) share a different sequence. This region in WγU encodes three proteins (GAMMAUSAM0038/CHERRY0036, GAMMAUSAM0039/CHERRY0037, and GAMMAUSAM0040/CHERRY0038). Both GAMMAUSAM0038/CHERRY0036 and GAMMAUSAM0039/CHERRY0037 have matches to proteins with no known function from other phages. GAMMAUSAM0040/CHERRY0038 is predicted to encode a fosfomycin resistance protein (Table ). It is unclear whether GAMMAUSAM0040/CHERRY0038 can produce a functional protein, because the insertion of a cytosine nucleotide at position 67 caused a frameshift in both WγU and WγC; however, a nonframeshifted homolog, gp41, was recently shown to confer fosfomycin resistance (28).
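The effect of a single-nucleotide insertion on the reading frame can be shown with a short Python sketch. It assumes positions are 1-based and counted from the first base of the start codon, and that the inserted base ends up at the stated position. Under that assumption, position 67 = 3 × 22 + 1 is the first base of codon 23, so the first 22 codons would be unaffected. The sequence below is an invented toy ORF, not the GAMMAUSAM0040/CHERRY0038 sequence, and the function names are illustrative.

```python
# Toy demonstration of a frameshift caused by a one-base insertion.
def codons(seq):
    """Split a sequence into complete codons, dropping any trailing partial codon."""
    usable = len(seq) - len(seq) % 3
    return [seq[i:i + 3] for i in range(0, usable, 3)]


def insert_base(seq, position, base):
    """Insert `base` so that it occupies the given 1-based position."""
    return seq[:position - 1] + base + seq[position - 1:]


def first_changed_codon(original, mutated):
    """1-based index of the first codon that differs, or None if none differ."""
    pairs = zip(codons(original), codons(mutated))
    for index, (before, after) in enumerate(pairs, start=1):
        if before != after:
            return index
    return None


toy_orf = "ATGGCAGATTTAGAA"              # hypothetical 5-codon ORF
shifted = insert_base(toy_orf, 7, "C")   # insertion at the first base of codon 3

print(codons(toy_orf))   # ['ATG', 'GCA', 'GAT', 'TTA', 'GAA']
print(codons(shifted))   # ['ATG', 'GCA', 'CGA', 'TTT', 'AGA']
print(first_changed_codon(toy_orf, shifted))  # 3
```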
The equivalent region in WγL and a 7,578-bp assembly from WγU (Fig. ) is larger than the region in the WγC main assembly, encoding two proteins (GAMMALSU0036/GAMMAUSAMA0007 and GAMMALSU0037/GAMMAUSAMA0008). GAMMALSU0036/GAMMAUSAMA0007 is predicted to encode a 479-amino-acid protein with 95 copies of a G-X-X repeat that is found in members of the collagen superfamily and in proteins that are structural components of the exosporium of B. anthracis and B. cereus spores and form a triple helix. The distribution of repeats has the structure [GXX]5. This open reading frame (ORF) is predicted to belong to the collagen repeat superfamily based on HMM (PF01391) and BLASTP matches. GAMMALSU0037/GAMMAUSAMA0008 is predicted to encode a 193-amino-acid protein that matches HMM PF07883, a cupin domain protein. In bacteria, proteins with one or two cupin domains, which form a beta barrel structure, can have either isomerase or epimerase activities that modify cell wall carbohydrates. The best NCBI-BLASTP match is a hypothetical protein, CTC01899 from Clostridium tetani.
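For scale, 95 copies of a three-residue repeat account for 95 × 3 = 285 of the 479 residues. A rough way to locate such repeats in a protein sequence is to scan for uninterrupted, in-frame runs of G-X-X triplets, as in the Python sketch below. This is only an illustrative pattern search on an invented toy sequence. The prediction described above relied on HMM (PF01391) and BLASTP matches, which are more sensitive than a regular expression.

```python
# Locate uninterrupted runs of G-X-X triplets in a protein sequence.
import re

GXX_RUN = re.compile(r"(?:G[A-Z]{2})+")


def gxx_runs(protein, min_repeats=2):
    """Return (0-based start, repeat count) for each run with at least min_repeats triplets."""
    runs = []
    for match in GXX_RUN.finditer(protein.upper()):
        repeats = len(match.group()) // 3
        if repeats >= min_repeats:
            runs.append((match.start(), repeats))
    return runs


# Hypothetical sequence: 2 leading residues, six G-X-X triplets, 2 trailing residues.
toy_protein = "MK" + "GPPGAT" * 3 + "LL"
print(gxx_runs(toy_protein))  # [(2, 6)]
```

The `min_repeats` threshold is an arbitrary choice that filters out isolated glycines, which would otherwise be reported as single-triplet runs.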
We propose that the Gamma phage encodes the collagen repeat protein either to function in host recognition or possibly to make the bacillus spore more stable, ensuring its survival under stress. It is also possible that the collagen repeat protein, the cupin domain protein, or both account for the ability of bacteriophage Gamma to infect encapsulated B. anthracis strains when Wβ cannot. The Gamma phage has not been shown to form lysogens in B. anthracis, but the allelic variant Wα has been shown to survive within B. anthracis. This phage-trapping phenomenon has been observed during infection of B. subtilis 3610 by the virulent phage Φe (32) and by phage PBS1 in B. subtilis.
There is also the question of the origin of fosfomycin resistance and the collagen repeat/cupin domain regions. It is possible that through propagation of these phages on various hosts, in various labs, they acquired these loci via recombination with prophages that existed in the host genome. We have evidence that contradicts this hypothesis, because PCRs on B. cereus strain W and on a mitomycin-induced prophage from strain W (presumably Wβ) gave products for both regions (data not shown). This indicates that these two forms existed in the parental host strain W.