The genetic relatedness of the Bacillus anthracis typing phages Gamma and Cherry was determined by nucleotide sequencing and comparative analysis. The genomes of these two phages were identical except at three variable loci, which showed heterogeneity within individual lysates and among Cherry, Wβ, Fah, and four Gamma bacteriophage sequences.
Bacteriophages are useful tools for bacterial species and strain differentiation (7, 20). Gamma phage (Wγ) susceptibility is an initial test for differentiating Bacillus anthracis from closely related Bacillus cereus group species (1, 3). Even though lysis is highly specific for B. anthracis, there are a few B. cereus strains that can be infected by Gamma phage (8, 29). The history of Gamma phage is quite complex. McCloy isolated Wβ phage that was induced from B. cereus strain W (ATCC 11950) and found it to be somewhat B. anthracis specific, infecting only a few strains of B. cereus (21). Wβ formed turbid plaques on B. anthracis and failed to infect the original source, B. cereus strain W; however, a rare clear plaque mutant, called Wα, could infect both B. anthracis and B. cereus strain W. Both Wβ and Wα could infect only B. anthracis strains that lacked a capsule (21), limiting their usefulness as typing phages. Gamma phage was originally isolated by Brown and Cherry in 1955 as a W phage variant formed by reinfecting B. cereus strain W with a lysate of W phage (3). It has the unique properties of being able to infect both smooth (encapsulated) and rough (nonencapsulated) B. anthracis strains and being unable to infect Bacillus strains that are lysogenic for Wβ phage. Since many B. anthracis strains are encapsulated, Wγ became a valuable tool for typing B. anthracis strains. Another B. anthracis phage called Cherry phage has also been used for typing, albeit less frequently (13); however, its relationship to Gamma phage was not known. Gamma and Cherry phages appear identical under the electron microscope, and both belong to the Siphoviridae morphotype (13, 36).
We completed and analyzed the nucleotide sequences of the Gamma and Cherry phages to determine their genetic relatedness. During the course of sequencing, it was determined through restriction enzyme mapping and PCR experiments that the stock phage preparations were heterogeneous, which led to the acquisition and sequencing of a second Gamma phage preparation from USAMRIID. Comparison of the complete genome sequences has revealed the location of three distinct variable genetic loci. These variable loci were also compared with the sequences of Wβ, Wγd, WγP, and Fah (Table 1). Overall, this work provides a striking example of how diagnostic bacteriophages can evolve over several years in different laboratories.
Gamma-LSU phage (WγL) and Cherry phage (WγC) DNA were provided by Pamala R. Coker while at Louisiana State University. The phages were propagated on B. anthracis strain Vollum by plating on Trypticase soy agar with 5% sheep blood (Remel, Kansas) followed by amplification in nutrient broth. Bacterial cells were removed from the lysate by filtration through a 0.22-μm syringe filter prior to isolation of bacteriophage genomic DNA. A stock Gamma-USAMRIID phage (WγU) lysate was obtained from John Ezzell, USAMRIID, Fort Detrick, MD, and propagated on B. cereus ATCC 4342. A single isolated plaque was picked after overnight growth from a lawn of B. cereus ATCC 4342 using the agar layer method (2). Bacteriophages from this plaque were propagated on B. cereus ATCC 4342 on agar plates (2). The resulting cell lysate was passed over DE52 cellulose resin to remove unpackaged, contaminating nucleic acids in the lysate (18). The flowthrough was then filtered through a 0.22-μm syringe filter to remove bacterial cells. WγL and WγC genomic DNA was purified using a QIAGEN Lambda DNA extraction kit (QIAGEN, Germany). The DNA extraction procedure was modified from the QIAGEN Lambda DNA extraction kit (QIAGEN, Germany) by resuspending the polyethylene glycol phage pellet in 215 μl of buffer L4 and 4.3 μl of proteinase K (20 mg/ml) followed by incubation at 56°C for 2 h prior to the addition of the remaining buffer L4 and incubation following the manufacturer's instructions.
The complete nucleotide sequences were determined for the B. anthracis typing phages WγU (37,253 bp in length and 35.22% G+C at 34-fold coverage), WγL (38,067 bp in length and 35.63% G+C at 46-fold coverage), and WγC (36,615 bp in length and 35.26% G+C at 15-fold coverage) using methods previously described (11). To identify potential coding regions, the Glimmer gene finder (9) was modified by training with a set of B. cereus (15, 24, 25) coding regions. Functional assignments for predicted coding regions were based on characterized matches to a nonredundant database and a collection of hidden Markov models (HMMs). Since the sequenced DNA was obtained from functional phages and not bacterial chromosomes, novel nomenclature was used to reflect this. For example, a “hypothetical phage protein” is a coding region that is not similar to anything in the current databases but may be a phage protein. In contrast, a “conserved phage protein” is a coding region that has no defined function but is present in at least one other phage or prophage region. The total numbers of predicted coding regions are 55 (WγU), 50 (WγL), and 51 (WγC). The Gamma and Cherry phages are identical at the nucleotide level except in three loci (Fig. 1 and Table 1). Because these three genomes are so similar, we will refer to them collectively as the Gamma/Cherry phage unless specifying those unique loci. The mean of the BLASTP score ratio (23a) was used to compute best matches. WγC was the best match to WγL and WγU, having a BLASTP score ratio close to 1 (perfect match). The next best match is the λBa02 prophage from B. anthracis Ames followed by an unpublished induced functional prophage from Bacillus thuringiensis 4I1.
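The score-ratio comparison mentioned above can be illustrated with a minimal sketch. It assumes the usual definition of a BLAST score ratio: the score of a query protein against its best match in another genome, divided by the score of the query against itself, so that 1 indicates a perfect match and 0 indicates no match. The function names and the scores are hypothetical and are not taken from this study.

```python
from statistics import mean


def score_ratio(match_score: float, self_score: float) -> float:
    """Ratio of a query-vs-subject BLASTP score to the query's self-match score."""
    if self_score <= 0:
        raise ValueError("self_score must be positive")
    return match_score / self_score


def mean_score_ratio(pairs):
    """Average the score ratio over (match_score, self_score) pairs, one pair per protein."""
    return mean(score_ratio(match, self_hit) for match, self_hit in pairs)


# Hypothetical scores for three proteins of one phage compared with another genome:
# an identical protein, a diverged protein, and a protein with no match at all.
example_pairs = [(500.0, 500.0), (300.0, 400.0), (0.0, 250.0)]
example_mean = mean_score_ratio(example_pairs)  # (1.0 + 0.75 + 0.0) / 3
```

Under this definition, two genomes that share every protein at full identity give a mean of 1, which is the sense in which a ratio "close to 1" denotes a near-perfect match.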
The gold standard 16S rRNA gene (5, 12, 17) and core housekeeping genes (4, 27) serve as reliable markers for phylogenetic analysis of bacterial and archaeal chromosomal evolution. In contrast, there are no bacteriophage genes that are suitable universal markers for phylogenetic analysis of phage genome evolution (26). Among the tailed bacteriophages (the order Caudovirales), there is a relatively conserved operon of genes that code for the proteins responsible for phage packaging and head morphogenesis. Two of these proteins, large terminase (6, 14, 30) and portal (19), have been used in phylogenetic analysis. Recently, the large terminase phylogeny was demonstrated by Casjens and colleagues to have predictive value: the clustering of the large terminase correlates with the mechanism of DNA packaging and with the structure of the virion ends (6).
A phylogenetic tree of large terminase protein sequences was used to determine the group to which the Gamma/Cherry phage belongs (data not shown). The large terminase protein sequences from 32 bacteriophage genomes were aligned using T-Coffee (23). One thousand bootstrapped replicates were generated as described previously (11), except the default settings of the PROTDIST and NEIGHBOR programs were used (10). The large terminase protein of the Gamma/Cherry phage grouped with the phages that generate 3′-extended cos ends (data not shown).
Since the large terminase protein of Gamma and Cherry phages grouped with phages of gram-positive bacteria having known 3′ overhang single-stranded cohesive (cos) ends and we observed no terminal redundancy in the genome sequence, which would have suggested a pac site mechanism of DNA packaging, we hypothesized that the Gamma/Cherry phage packages DNA using a 3′ overhang cos site mechanism. We tested this hypothesis by sequencing the PCR product that was formed after religation of the cos ends (Fig. 1). Two unique primers, P44087 (TCAATCTGACTAATTCAGCAGC) and P44086 (GGATAAGAATAGATACTACGACC), were designed to face outward and read the DNA sequence of each end of the linear phage genomic DNA (Fig. 1). By comparing the sequence of the PCR product to the sequence of the ends, the sequence of the cos site, CGCCGCCCC (Fig. 1A), was determined to be 9 nucleotides in length, which is similar to the cos site of related Clostridium perfringens phage 3626 (CGCAGTGTC) and identical to the cos site of B. anthracis bacteriophage Fah (22).
We identified three heterogeneous loci while comparing the sequences of the Gamma and Cherry phages (Fig. 1A, yellow highlighted areas). We first became aware of heterogeneity near the integrase when performing confirmatory restriction mapping of the WγC genome from a plaque-purified phage preparation grown on B. cereus ATCC 4342 (Table 1, locus I). The map revealed additional DNA that was not included in the Cherry phage assembly (data not shown). Primers 10BB (AATTGTATCATCGAGTATTAATAGC) and 10AX (TGTAAGTATCGATACCTAATCG) were designed to subclone this conflicting region using a TOPO TA cloning kit (Invitrogen, Carlsbad, CA), for the production of a microlibrary for sequencing and for primer walking of the PCR product. For diagnostic purposes, primers 10BE (TGTGGTGAGCCAATTACAGC) and 10AK (TTTCGCTATCTGCATATTTGAG) were designed to amplify this locus (Fig. 1B). PCR using primers 10BE and 10AK generated a 1,155-bp product (form C) for WγC and WγL but a 3,797-bp product (form A) for this variant, which we refer to as WγC′ (Cherry prime; DQ222852) (Table 1). Assembly of the previous Cherry sequences with the sequence of the 3,797-bp PCR product reconciled the restriction map data.
When a different stock of the Gamma phage (WγU) was sequenced, we found that this region turned out to have yet another form, with a size of 1,794 bp (Fig. 1B, form B). To determine the scope of variability in this region, we conducted PCR experiments with primers 10BB and 10AK on 24 well-isolated plaques from each stock lysate grown on B. cereus ATCC 4342 (data not shown). From these results, we concluded that there were three distinct forms (A, B, and C) from this region of the Gamma/Cherry phage genomes and that each stock tested is not genetically pure. For the WγU stock, there were 13 out of 23 total plaques (57%) that were positive for form B (1,794 bp) and 10 out of 23 (43%) that had form C (1,155 bp), but there was no PCR product for form A. WγL contained form A (3,797 bp) in 16 out of 20 (80%) and form C in 5 out of 20 (25%) of those plaques that gave a product but no form B. WγC was similar to WγL in that no form B was observed, but 14 out of 21 plaques (67%) amplified the form A product and 7 out of 21 (33%) gave form C. Only WγU produced form B.
The second locus of heterogeneity was initially discovered only in the WγC preparation, affecting the coding sequence of a putative replisome organizer (CHERRY0030; Table 1, locus II, and Fig. 1A, blue diamond). At coordinates 27025 to 27049, the consensus sequence of WγC from the whole shotgun assembly was (STTcttyTTKgTTKTTCTTTTTYTTK; lowercase letters indicate the presence of gaps in some of the aligned sequences). Further inspection of the underlying sequence reads showed that this ambiguous sequence was the result of a composite of two distinct sequences, each having about equal numbers of supporting clones. There were two library clones that matched part of the form A sequence and bridged the ambiguity region, which provided assembly data to support two distinct forms near the integrase. To determine whether form I or II sequences belonged with the WγC or WγC′ phages, we designed nested sets of the primers P44705 (TGATTTTCTATGATGCTGTGTTG) and P44482 (AATAGTTGAAGAATATACACTTCC) to first amplify a 2,165-bp product and then primers P41871 (CCCATACAACTCAATTGGGAG) and P41870 (GTGCAAATAACGTGCTCGGTC) to obtain high-quality sequence data close to the ambiguity region (Fig. 1). The sequences of these PCR products confirmed that the form II ambiguity sequence is linked to form A (WγC′) and the form I ambiguity sequence is linked to form C (WγC). Since this study was completed, the sequences of two additional Gamma phage isolates (Wγd and WγP [unpublished]), Wβ (28), and Fah (22) have become available for comparison. With the addition of Fah, this locus was expanded to 13 amino acids, with a total of four different variations observed (Table 1).
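The way a mixed read population produces ambiguity codes such as S, Y, and K in the WγC consensus described above can be shown with a small sketch. It collapses aligned sequences into one IUPAC consensus and writes a column in lowercase when some sequences have a gap there, mirroring the lowercase convention stated for the assembly. The function name and the toy sequences are hypothetical; they are not the actual sequencing reads.

```python
# Standard IUPAC nucleotide ambiguity codes, keyed by the set of bases observed.
IUPAC = {
    frozenset("A"): "A", frozenset("C"): "C", frozenset("G"): "G", frozenset("T"): "T",
    frozenset("AG"): "R", frozenset("CT"): "Y", frozenset("CG"): "S",
    frozenset("AT"): "W", frozenset("GT"): "K", frozenset("AC"): "M",
    frozenset("CGT"): "B", frozenset("AGT"): "D", frozenset("ACT"): "H",
    frozenset("ACG"): "V", frozenset("ACGT"): "N",
}


def consensus(seqs):
    """Collapse equal-length aligned sequences into one IUPAC consensus string.

    A column with a gap ('-') in some sequences is reported in lowercase;
    a column that is a gap in every sequence is reported as '-'.
    """
    if len({len(s) for s in seqs}) != 1:
        raise ValueError("sequences must be aligned to the same, nonzero length")
    out = []
    for column in zip(*seqs):
        bases = frozenset(b.upper() for b in column if b != "-")
        if not bases:
            out.append("-")
            continue
        code = IUPAC[bases]
        out.append(code.lower() if "-" in column else code)
    return "".join(out)


# Two hypothetical variants that differ at the first and last positions.
mixed = consensus(["CTTG", "GTTT"])      # C/G -> S, G/T -> K
gapped = consensus(["AC-T", "ACGT"])     # third column has a gap in one read
```

A consensus containing several such codes in a short window, as at this locus, is therefore a hint that two distinct sequences were assembled together rather than that the individual base calls were poor.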
A third locus of heterogeneity between Gamma and Cherry phages was identified during comparative analysis of the three phage genomes (Table 1 and Fig. 1A, locus III). WγU and WγC main assemblies have identical sequences in this region, while WγL and a 7,578-bp variant assembly from WγU (Fig. 1A) share a different sequence. This region in WγU/WγC encodes three proteins (GAMMAUSAM0038/CHERRY0036, GAMMAUSAM0039/CHERRY0037, and GAMMAUSAM0040/CHERRY0038). Both GAMMAUSAM0038/CHERRY0036 and GAMMAUSAM0039/CHERRY0037 have matches to proteins with no known function from other phages. GAMMAUSAM0040/CHERRY0038 is predicted to encode a fosfomycin resistance protein (Table 1). It is unclear whether GAMMAUSAM0040/CHERRY0038 is able to produce a functional protein, because the insertion of a cytosine nucleotide at position 67 caused a frameshift in both WγU and WγC; however, a nonframeshifted homolog, gp41 in Wγd, was recently shown to confer fosfomycin resistance (28).
The equivalent region in WγL and a 7,578-bp assembly from WγU (Fig. 1) is larger than the region in the WγC and WγU main assembly, encoding two proteins (GAMMALSU0036/GAMMAUSAMA0007 and GAMMALSU0037/GAMMAUSAMA0008). GAMMALSU0036/GAMMAUSAMA0007 is predicted to encode a 479-amino-acid protein with 95 copies of a G-X-X repeat that is found in members of the collagen superfamily and proteins that are structural components of the exosporium of B. anthracis (33) and B. cereus (35) spores and form a triple helix. The distribution of repeats has the structure [GXX]5-T-[GXX]43-P-[GXX]5-P-[GXX]4-T-[GXX]38. This open reading frame (ORF) is predicted to belong to the collagen repeat superfamily based on HMM (PF01391) and BLASTP matches. GAMMALSU0037/GAMMAUSAMA0008 is predicted to encode a 193-amino-acid protein that matches HMM PF07883, a cupin domain protein. In bacteria, proteins with one or two cupin domains, which form a beta barrel structure, can have either isomerase or epimerase activities that modify cell wall carbohydrates. The best NCBI-BLASTP match is a hypothetical protein, CTC01899 from Clostridium tetani E88.
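As a quick check on the repeat layout given above, the following sketch tallies the G-X-X triplets and the residues they span directly from the layout string. The function name is hypothetical, and the parser assumes only the notation used in the text: blocks of the form [GXX]n separated by single-residue interruptions.

```python
import re


def summarize_repeat_layout(layout: str):
    """Return (number of G-X-X triplets, residues spanned) for a layout string."""
    triplets = 0
    residues = 0
    for part in layout.split("-"):
        block = re.fullmatch(r"\[GXX\](\d+)", part)
        if block:
            count = int(block.group(1))
            triplets += count
            residues += 3 * count
        else:
            # Single-residue interruption between repeat blocks, such as T or P.
            residues += len(part)
    return triplets, residues


layout = "[GXX]5-T-[GXX]43-P-[GXX]5-P-[GXX]4-T-[GXX]38"
triplet_count, residue_span = summarize_repeat_layout(layout)
```

The five blocks sum to 5 + 43 + 5 + 4 + 38 = 95 triplets, in agreement with the 95 copies stated, and together with the four interrupting residues they span 289 of the 479 predicted amino acids.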
We propose that the Gamma phage encodes the collagen repeat protein either to function in host recognition or possibly to make the bacillus spore more stable, ensuring its survival under stress. It is also entirely possible that either the collagen repeat protein or the cupin domain protein or both account for the ability of bacteriophage Gamma to infect encapsulated B. anthracis strains when Wβ cannot. The Gamma phage has not been shown to form lysogens in B. anthracis, but the allelic variant Wα has been shown to survive within B. anthracis spores (16). This phage-trapping phenomenon has been observed during infection of B. subtilis 3610 by the virulent phage Φe (32) and by phage PBS1 in B. subtilis SB19 (34).
There is also the question of the origin of fosfomycin resistance and the collagen repeat/cupin domain regions. It is possible that through propagation of these phages on various hosts, in various labs, they acquired these loci via recombination with prophages that existed in the host genome. We have evidence that contradicts this hypothesis, because PCRs on B. cereus strain W and on a mitomycin-induced prophage from strain W (presumably Wβ) gave products for both regions (data not shown). This indicates that these two forms existed in the parental host strain W.
The type of recombinase encoded by a bacteriophage determines target site specificity. For example, tyrosine recombinases that have a tropism for tRNA genes typically have what appears to be a target site duplication flanking the ends of the integrated prophage genome, which corresponds to the core sequence of the att site. In contrast, serine recombinases have very small core att sites that are flanked by inverted repeats (31) and may or may not have any recognizable target site duplication. An in silico method to identify the type of recombinase is to use HMMs. GAMMAUSAM0027 of WγU, GAMMALSU0027 of WγL, and CHERRY0027 of WγC match PF0235, an HMM model for serine recombinases, above the trusted cutoff. Multiple sequence alignments of the Gamma/Cherry recombinases with members of the serine recombinase family (data not shown) enabled a prediction of the catalytic serine residue at amino acid residue 13.
Sequence analysis of the three forms (A, B, and C) near the integrase attP region of the Gamma/Cherry phage revealed a conserved breakpoint 31 nucleotides downstream of the integrase stop codon (Fig. 2A, yellow ORF). Further inspection of this region revealed inverted repeats with the breakpoint in the center of predicted stem-loop structures (Fig. 2). It is common for serine recombinases to use inverted repeats as the substrate for integration (31). The MFOLD program (37) was used to calculate the structure and free energy of the putative attP region for two of the three forms (Fig. 1 and 2). The largest attP region (Fig. 1B and 2A, form A) was predicted to form the best inverted repeat and the most stable structure (ΔG = −10.6 kcal/mol). The medium-sized attP region (Fig. 1B and 2B, form B) formed a stem-loop with a predicted free energy of −5.4 kcal/mol (Fig. 2). We were unable to find an inverted repeat or a predicted secondary structure for the smallest attP region (Fig. 1B and 2C, form C), suggesting that this form may have been derived through illegitimate recombination. Given these data, we hypothesize that form C is unable to integrate into attB, while forms A and B may be functional attP substrates capable of site-specific integration into attB. Form A is the ancestral form, since Wβ has this sequence (Table 1). Form A was shown to serve as a substrate for site-specific recombination in B. anthracis by targeting BA1618 (Fig. 2) (R. Calendar, personal communication). Bacteriophages Fah, WγC′, and WγP also have form A (Table 1), suggesting that these phages are also capable of integration/excision reactions. Further studies are necessary to determine whether form B is a functional attP sequence.
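The notion of an inverted repeat centered on a breakpoint can be made concrete with a short sketch. It counts how many consecutive Watson-Crick base pairs can form when the sequence to the left of a position folds back onto the sequence to its right. This is only a perfect-complement check; it is not the MFOLD algorithm and computes no free energy. The function names and the example sequence are hypothetical and are not the attP sequences of these phages.

```python
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return "".join(COMPLEMENT[base] for base in reversed(seq.upper()))


def stem_length(seq: str, left: int, right: int) -> int:
    """Count base pairs of a perfect stem whose innermost pair is (left, right).

    left and right are 0-based indices with left < right; the bases strictly
    between them form the loop. Pairing extends outward until a mismatch or
    the end of the sequence is reached.
    """
    seq = seq.upper()
    pairs = 0
    while (
        left - pairs >= 0
        and right + pairs < len(seq)
        and COMPLEMENT.get(seq[left - pairs]) == seq[right + pairs]
    ):
        pairs += 1
    return pairs


# Hypothetical hairpin: a 5-nt arm, a 4-nt loop, and the arm's reverse complement.
arm = "GCACG"
hairpin = arm + "AAAA" + reverse_complement(arm)   # GCACGAAAACGTGC
pairs_in_hairpin = stem_length(hairpin, 4, 9)      # innermost pair flanks the loop
pairs_without_repeat = stem_length("AAAAAAAAAA", 4, 5)
```

A sequence with no inverted repeat around the chosen position returns 0, which corresponds loosely to the situation described for form C, where no stem-loop could be predicted.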
We present the complete nucleotide sequences and a comparison of two B. anthracis-specific bacteriophages that are used for typing. To our surprise, we discovered heterogeneity within each of three phage lysate stocks used to make purified phage DNA for whole shotgun sequencing. There was also heterogeneity between Gamma phage stock lysates from two different sources and among Wβ, Wγd, WγP, and Fah sequences. We conclude that the Gamma phage, Cherry phage, and Fah are essentially the same phage, containing variations at three distinct locations within the genome and demonstrating significant heterogeneity within their populations.
The nucleotide sequences of B. anthracis WγL, WγU, and WγC genomes and minor variant assemblies have been deposited at GenBank (http://www.ncbi.nlm.nih.gov/GenBank/) under accession numbers DQ222851 to DQ222855 and DQ294634.
We thank Pamala R. Coker for supplying the Gamma-LSU and Cherry phage lysates and purified genomic DNA preparations and John W. Ezzell for supplying the Gamma-USAMRIID phage lysate. We also thank Robert T. DeBoy and Eric Eisenstadt for insightful discussions about the manuscript, Shanmuga Sozhamannan and Karen E. Nelson for critically reviewing the manuscript, and Richard Calendar for communicating unpublished results and for stimulating discussions on the history of the W phages.
This work was supported by NSF grant 0242162.