Some bacteriophages target potentially pathogenic bacteria by exploiting surface-associated virulence factors as receptors. For example, phage have been identified that exhibit specificity for Vi capsule producing Salmonella enterica serovar Typhi. Here we have characterized the Vi-associated E1-typing bacteriophage using a number of molecular approaches. The absolute requirement for Vi capsule expression for infectivity was demonstrated using different Vi-negative S. enterica derivatives. The phage particles were shown to have an icosahedral head and a long noncontractile tail structure. The genome is 45,362 bp in length with defined capsid and tail regions that exhibit significant homology to the S. enterica transducing phage ES18. Mass spectrometry was used to confirm the presence of a number of hypothetical proteins in the Vi phage E1 particle and demonstrate that a number of phage proteins are modified posttranslationally. The genome of the Vi phage E1 is significantly related to other bacteriophages belonging to the same serovar Typhi phage-typing set, and we demonstrate a role for phage DNA modification in determining host specificity.
Since the early days of their discovery, bacteriophages have been exploited as tools for typing bacteria, particularly those with pathogenic potential (14). Phage can exhibit specific patterns of host infectivity, even within isolates of a particular bacterial species. Phage infectivity can be limited by many factors including DNA restriction systems, the availability of receptors and phage immunity. Sophisticated phage-typing schemes have been developed for certain pathogens such as Staphylococcus aureus and Salmonella enterica. Indeed, within S. enterica, specialized typing schemes are applied to different serovars within the species. For example, different sets of phage are used for typing S. enterica serovar Typhimurium and S. enterica serovar Typhi. Most of these phage-typing schemes were developed in the middle of the last century, often involving the sequential “adaptation” of phage to grow on different sets of clinical isolates of a particular pathogen. The adapted typing phage then became established as typing tools for use in different reference laboratories. Phage typing is labor-intensive and is mainly limited to a few centralized reference laboratories.
It is now recognized that phage have played a pivotal role in driving the evolution of pathogens. The genomes of many bacterial pathogens harbor multiple resident prophages or their remnants. Indeed, diversity in these prophage-related elements is a critical feature that distinguishes different isolates (31). These prophages often encode “cargo” or accessory genes that can contribute to the pathogenicity of bacteria (20). The interplay between phage and their host bacteria is now an area of intensive research. One interesting feature of typing phage is that they are often highly adapted to virulent forms of a particular pathogen. For example, in the case of serovar Typhi, the cause of human typhoid, phage can target the Vi- or virulence-associated polysaccharide capsular antigen normally present on the surface of clinical isolates. Vi expression is known to be a relatively unstable phenotype, and yet the genetic locus for Vi is maintained in the serovar Typhi population and Vi is even exploited as an antigenic target for an efficacious typhoid vaccine (19). The targeting of the Vi exopolysaccharide capsule as a receptor in many ways mirrors that of the range of phage that target the Escherichia coli K1 polysialic acid capsule (21).
Although Vi phage were first described over 60 years ago, their molecular structures have remained virtually unknown. Consequently, we decided to use molecular approaches to characterize members of the serovar Typhi phage typing set and describe here some of their interesting features.
Serovar Typhi Vi phage type II was used in the present study. The Vi phage E1 stock (designated G2362) was obtained from the Bruce Stocker laboratory (Stanford University). This phage type was routinely propagated by infection of a serovar Typhi Ty2 derivative, BRD948, which is highly attenuated and thus allowed its use in a CL2 environment (16). The Vi phage D1 and the specific host strain, serovar Typhi M223/224, were obtained from the Colindale reference laboratory. The serovar Typhi strain, BA256, containing a kanamycin cassette in the tviB gene was constructed using the red recombinase system of Datsenko and Wanner (9). All bacterial strains were routinely grown on Luria broth or agar plates and, in the case of serovar Typhi BRD948, the medium required supplementation with aromatic amino acids as described previously (12). Growth was performed at 37°C for both broth and plate cultures. Serovar Typhimurium derivative C5507, which harbors the viaB locus and expresses surface-associated Vi capsule, was kindly provided by Michel Popoff (Pasteur Institute, Paris, France).
Standard phage techniques were used to obtain high-titer stocks of the D1 and E1 phage lysotypes (27). Briefly, 1 liter of phage lysate was obtained by using a multiplicity of infection of 1.0, yielding a titer of 2 × 10^10 PFU/ml after the infection took place over 7 h at 37°C. After removal of the cell debris by centrifugation, the lysate was treated with RNase and DNase I to remove bacterial contaminants. The phage lysate was then concentrated by treatment with NaCl to a final concentration of 1 M and with PEG 8000 (10% [wt/vol]) and left overnight at 4°C. The precipitated phage were collected by centrifugation at 11,000 × g. After resuspension of the phage particles in 16 ml of lambda diluent (10 mM Tris, 1 mM MgCl2 [pH 7.6]), an equal volume of chloroform was added, and the mixture was briefly mixed and spun at 11,000 × g for 15 min at 4°C to pellet the phage particles and remove the polyethylene glycol and cell debris. The phage were resuspended in lambda diluent and finally purified by two-step CsCl density-gradient centrifugation. Phage DNA isolation from the purified phage particles was by a combination of formamide treatment and ethanol precipitation (27). The phage DNA pellet was resuspended in 10 mM Tris (pH 8.5). Comparative restriction enzyme analysis of the D1 and E1 phage was performed as recommended by the enzyme manufacturers (Roche and New England Biolabs, respectively).
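The titer and multiplicity of infection quoted above follow from simple arithmetic, which can be sketched as below. This is an illustrative calculation only: the plaque count, dilution factor, plated volume, and host cell density are hypothetical values chosen to reproduce a titer of 2 × 10^10 PFU/ml, not measurements reported in this study.

```python
def plaque_titer(plaques, dilution_factor, plated_volume_ml):
    """Titer in PFU/ml from the plaque count on a single plate.

    dilution_factor is the total fold dilution of the lysate (e.g. 1e7).
    """
    return plaques * dilution_factor / plated_volume_ml


def phage_for_moi(cells_per_ml, culture_volume_ml, moi):
    """Number of PFU required to infect a culture at a given MOI."""
    return cells_per_ml * culture_volume_ml * moi


# Hypothetical plate: 200 plaques from 0.1 ml of a 10^7-fold dilution.
titer = plaque_titer(plaques=200, dilution_factor=1e7, plated_volume_ml=0.1)
print(f"titer = {titer:.1e} PFU/ml")

# Hypothetical culture: 1 liter at 1e8 cells/ml infected at an MOI of 1.0.
needed = phage_for_moi(cells_per_ml=1e8, culture_volume_ml=1000, moi=1.0)
print(f"phage needed = {needed:.1e} PFU")
```

At the reported titer, 1 liter of lysate would contain on the order of 2 × 10^13 PFU in total.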
Purified phage DNA of the E1 lysotype was randomly fragmented by sonication and several small insert libraries were generated in pUC19 by using size fractions ranging from 1.4 to 4 kb. The prophage genome was sequenced to a depth of 10× coverage using Applied Biosystems BigDye terminator chemistry on ABI 3730 automated sequencers. The sequence was assembled, finished, and annotated as described previously (22), using the program Artemis (4) to collate data and facilitate annotation. The genome sequence of phage Vi was compared to other viral genomes pairwise by using the Artemis Comparison Tool (6).
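As a rough guide to the scale of the sequencing effort, the number of reads needed for a given depth can be estimated as depth × genome length / mean read length. The sketch below uses the reported genome length (45,362 bp) and 10× depth; the mean read length of 700 bp is an assumption made purely for illustration and is not stated in the text.

```python
import math


def reads_for_coverage(genome_length_bp, target_coverage, mean_read_length_bp):
    """Smallest whole number of reads giving at least the target mean depth."""
    return math.ceil(genome_length_bp * target_coverage / mean_read_length_bp)


def expected_coverage(n_reads, mean_read_length_bp, genome_length_bp):
    """Mean depth of coverage delivered by n_reads of a given mean length."""
    return n_reads * mean_read_length_bp / genome_length_bp


GENOME_LENGTH = 45362  # bp, as reported for Vi phage E1
READ_LENGTH = 700      # bp, assumed value for illustration only

n = reads_for_coverage(GENOME_LENGTH, 10, READ_LENGTH)
print(n, round(expected_coverage(n, READ_LENGTH, GENOME_LENGTH), 2))
```

This estimate ignores failed reads, vector sequence, and uneven library representation, so a real project would sequence more clones than the minimum.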
The CsCl-purified phage particles were dialyzed against 2 liters of lambda diluent (with two changes of buffer) and used directly for electron microscopic studies. Copper palladium grids (200 mesh) with carbon-Formvar support film were glow-discharged minutes before adhering the phage particles. The particle suspension was prepared in fresh 0.1 M ammonium acetate buffer diluted until just visibly turbid. Then, 5 μl of suspension was applied to the support film surface, and particles were allowed to settle for 30 s. Next, 5 μl of 5% aqueous ammonium molybdate with 1% trehalose was added for a few seconds and then removed with cut filter paper. The grid was then air dried for 30 min before imaging was performed on a 120kV Philips Tecnai Spirit Biotwin with a Tietz F224HD digital TemCam at magnifications ranging from ×10,000 to ×60,000. A size measurement of the E1 phage head and tail regions was carried out by using 50 transmission electron microscopic images.
CsCl-purified E1 lysotype phage particles were dialyzed against 2 liters of lambda diluent, with two changes of buffer, to remove all traces of CsCl from solution. The phage preparations were then reduced with 1 mM dithiothreitol, alkylated with 2 mM iodoacetamide, and separated in a 4 to 12% Bis-Tris NuPAGE gel (Invitrogen). Proteins were stained with colloidal Coomassie blue. Protein bands were excised, destained completely, and digested with trypsin (sequencing grade; Roche). Peptides were extracted with 5% formic acid-50% acetonitrile, dried, and resuspended in 0.2% formic acid prior to mass spectrometry analysis. Analysis was performed on an Ultimate 3000 Nano and Capillary LC system (Dionex) connected to a 7T Finnigan LTQ-FT mass spectrometer (Thermo Electron, Bremen, Germany) equipped with a nanoelectrospray ion source. Samples were desalted on a trap and then separated on a PepMap column (75 μm [inner diameter] by 15 cm; Dionex) with a gradient of 4 to 40% acetonitrile-0.1% formic acid. The LTQ-FT was operated in standard data-dependent mode with an FT-ICR resolution of 100,000 at m/z 400. The three most abundant multiply charged ions were subject to tandem mass spectrometry in the LTQ ion trap at an isolation width of 3 Da and with a dynamic exclusion width of ±5 ppm and duration of 30 s. Peak lists were generated by using Bioworks 3.2 (Thermo Electron). The data were subjected to a database search with Mascot Server 2.1 (www.matrixscience.com) against an in-house-built serovar Typhi and E1 lysotype phage translated genomic database using the following parameters: enzyme trypsin; peptide mass tolerance, ±20 ppm; fragment mass tolerance, ±0.4 Da; maximum number of missed cleavages, 1; and variable modifications, carbamidomethyl cysteine, carbamidomethyl lysine, carbamidomethyl N terminus, and methionine oxidation.
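The ±20 ppm peptide mass tolerance used in the database search is a relative error window, so its absolute width scales with the peptide mass. A minimal sketch of that calculation follows; the function names and example masses are hypothetical and this is not the search engine's own code.

```python
def ppm_error(observed_mass, theoretical_mass):
    """Signed mass error in parts per million."""
    return (observed_mass - theoretical_mass) / theoretical_mass * 1e6


def ppm_window(theoretical_mass, tol_ppm=20.0):
    """Lower and upper mass bounds for a symmetric ppm tolerance."""
    delta = theoretical_mass * tol_ppm / 1e6
    return theoretical_mass - delta, theoretical_mass + delta


def within_tolerance(observed_mass, theoretical_mass, tol_ppm=20.0):
    """True if the observed mass lies inside the ppm window."""
    return abs(ppm_error(observed_mass, theoretical_mass)) <= tol_ppm


# Hypothetical 1,000 Da peptide: the 20 ppm window is about +/-0.02 Da wide.
low, high = ppm_window(1000.0)
print(f"{low:.3f} to {high:.3f} Da")
print(within_tolerance(1000.015, 1000.0))  # roughly 15 ppm off
print(within_tolerance(1000.030, 1000.0))  # roughly 30 ppm off
```

By contrast, the ±0.4 Da fragment tolerance is an absolute window, which reflects the lower mass accuracy of the ion trap used for tandem spectra.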
We have deposited the DNA sequence and annotation of the E1 serovar Typhi Vi-specific bacteriophage at the EMBL under accession number AM491472.
Vi phage E1, isolate G2362 (referred to from now on as Vi phage E1), is reported to exploit Vi polysaccharide as a primary receptor (14). To confirm this using a modern genetic approach, we utilized Vi phage E1 to infect serovar Typhi Ty2 derivatives harboring a defined mutation in the tviB gene, which is an essential component of the viaB locus that encodes the production and export of the Vi polysaccharide to the cell surface. A mutation in the tviB gene completely eliminates Vi expression, and no capsule is therefore produced. Although serovar Typhi Ty2 was exquisitely susceptible to Vi phage infection, forming individual clear plaques at low dilutions, no plaques were formed on the isogenic Vi-negative serovar Typhi tviB derivative, even when extremely high titers of >10^9 were applied. We then performed a simple screen involving the treatment of serovar Typhi cultures with Vi phage E1 with selection for phage-resistant mutant derivatives. In all cases such isolates were Vi negative. Finally, we infected a serovar Typhimurium derivative, C5507, that harbors the viaB locus and expresses surface-associated Vi exopolysaccharide. This strain was found to be susceptible to Vi phage E1, although relatively low titers were generated on the first round of infection. When Vi phage E1 was subsequently harvested from these initial infections and plate stocks were made, they were highly infective for serovar Typhimurium C5507. Again, a tviB, Vi-negative derivative of C5507 was found not to be susceptible to Vi phage E1 infection.
The complete nucleotide sequence of the Vi phage E1 genome was determined and found to be circular and 45,362 bp in length, with an average G+C content of 47.03%. The Vi phage E1 genome sequence was split at a point that would aid comparison with the related ES18 phage. The genome was predicted to encode 56 coding sequences (CDS) with a coding density of 85.4%. Of the predicted CDS, a number could be assigned functions based on sequence similarity and conserved protein motifs (summarized in Table 1).
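Summary statistics such as G+C content and coding density can be derived directly from a sequence and its CDS coordinates. The sketch below is one simple way to do so, under stated assumptions: coordinates are 1-based and inclusive, bases shared by overlapping CDS are counted once, and features spanning the origin of the circular map are not handled. The toy inputs are hypothetical, and the annotation pipeline used in this study may define coding density differently.

```python
def gc_content(seq):
    """Percentage of G and C bases in a nucleotide sequence."""
    seq = seq.upper()
    return 100.0 * (seq.count("G") + seq.count("C")) / len(seq)


def coding_density(genome_length, cds_intervals):
    """Percentage of the genome covered by at least one CDS.

    cds_intervals holds (start, end) pairs, 1-based and inclusive.
    Overlapping intervals are merged so shared bases count once.
    """
    covered = 0
    current_end = 0
    for start, end in sorted(cds_intervals):
        if end <= current_end:
            continue  # fully contained in an earlier interval
        start = max(start, current_end + 1)
        covered += end - start + 1
        current_end = end
    return 100.0 * covered / genome_length


# Toy example: a 100 bp genome with three CDS, two of which overlap.
print(gc_content("ATGCGGCCAT"))
print(coding_density(100, [(1, 40), (31, 60), (81, 90)]))
```

For scale, a coding density of 85.4% on a 45,362 bp genome corresponds to roughly 38.7 kb of coding sequence.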
It was evident from sequence analysis that Vi phage E1 shared significant conservation in both sequence and gene order with two previously characterized bacteriophages, ES18 (7) and phage belonging to the T1 coliphage family (Table 1 and Fig. 1). Thus, the functions of the Vi phage E1 protein coding genes and/or regions are discussed in relation to these two other enteric bacteriophages, where appropriate. Phage ES18 is a generalized transducing phage that naturally infects serovar Typhimurium, Enteritidis, Gallinarum, and Paratyphi B strains, as well as Escherichia coli (18), and has been used as a Salmonella typing phage (29). ES18 is able to lysogenize its host and is highly recombinogenic, being able to form chimeras with other lambdoid phage such as P22 and Fels-1 (32, 33).
A comparison of the head region of ES18 showed that Vi phage E1 possesses orthologues (amino acid sequence identities ranging from 28 to 36% over the entire length of the protein) of the proposed ES18 large terminase (gene product 2 [gp2], VIP0002), portal protein (gp5, VIP0004), procapsid assembly protein (gp6, VIP0005), capsid decoration protein (gp8, VIP0016), and putative coat protein (gp9, VIP0017). In addition, Vi phage E1 also encodes an orthologue of ES18 gene 7, VIP0026, which may act as the head maturation protease (18). As is common for functionally related phage genes, the sequence and gene order of the head region found in ES18 and Vi phage E1 is conserved in the range found with other lambdoid phage genomes (7, 26). This highly conserved gene arrangement for the head region predicts that VIP0001 should encode the small subunit terminase, although there is no obvious sequence similarity with any other phage terminase genes.
Within this apparent conservation of gene order for the capsid, there is a synteny break between the ES18 and Vi phage E1 head regions, with an apparent insertion of nine CDS in the Vi E1 phage genome (VIP0006 to VIP0014). The CDS VIP0006 to VIP0009 all share sequence similarities with phage lysozymes and holins (Table 1). VIP0008 is the probable holin of E1 phage based upon its size and hydrophobicity plot. The analogous lysis region in ES18 is located elsewhere on the phage (genes 74 to 78; Fig. 1) (7), and the predicted products of these genes share little or no sequence identity with the Vi phage E1 lytic genes (data not shown). There is also a second break in synteny between the proposed head and tail regions of the Vi phage E1 (Fig. 1) in which six CDS are predicted (VIP0018 to VIP0023); none of these CDS could be assigned functions based on protein motifs or database comparisons. In addition to the head region, some of the ES18 tail genes are also conserved with the Vi phage E1. These include the ES18 tail shaft (gp16, VIP0029; 28% identity) and the tail tape measure proteins (gp21, VIP0033; 29% identity). After VIP0033 the large-scale conservation between ES18 and the Vi phage E1 genomes is lost (see Fig. 1). Phage genomes are highly modular, and so it is not uncommon for discrete parts of related phage to display conservation while the remaining regions in the phage are unrelated or highly divergent.
Analysis of the remaining Vi phage E1-specific tail genes (VIP0034 to VIP0040) showed sequence similarity with other phage. For example, VIP0037 is similar to the Fels-1 phage tail assembly protein. In addition, the gene product of VIP0040 shares sequence similarity with the endo-N-acylneuraminidase produced by the E. coli K1-specific phage K1F (23), particularly at the N terminus, which includes the glycine-rich hinge region identified by Petter and Vimr (23). Vi phage E1 lacks the partial T7 tail fiber protein homology found in the phage K1F endo-N-acylneuraminidase gene, and the encoded protein is present in phage K1F attached to the head. The morphology of the E1 phage differs fundamentally from the T7-like K1-F phage in this respect due to the presence of the long tail structure, which is also shared by the ES18 phage. We suggest that the rest of VIP0040, based upon the similarities to the endosialidase genes, contains the region responsible for Vi capsule recognition and degradation, allowing the phage to infect the serovar Typhi host. The capsule of neuroinvasive E. coli K1 is composed of a large polymer of up to 200 residues of α-2,8-linked sialic acid (polysialic acid) compared to the α-1,4-linked N-acetylgalactosaminuronate of the Vi capsule of serovar Typhi. The K1 homopolymer of E. coli is an important virulence factor protecting the bacterium from the immune system (2) and also serves as an attachment site for several lytic phage. To infect the bacterial host the K1 phage, like the Vi phage, must penetrate this capsular polysaccharide to gain access to the cell surface (21). This penetration is dependent on endosialidases that form part of the tail spikes and are required for both specific binding and degradation of the capsule. Thus, these depolymerases are important for determining specificity and host range.
The divergence in the tail regions of ES18 and Vi phage E1 genomes is consistent with their known target receptors. ES18, like T1, is known to bind to both rough and smooth Salmonella strains alike because it does not target surface exopolysaccharides but instead binds to FhuA, an outer membrane transport protein for ferrichrome (5, 7).
The Vi phage E1 also shares some sequence conservation with coliphage T1 and T1-like phages. T1 is one of the seven archetypal T phages; it is a lytic phage that can infect E. coli and some strains of Shigella (10, 11). The biology of this phage has been extensively reported, along with a genome sequence (25). Vi phage E1 carries several orthologues of the T1 replication and recombination genes (see Fig. 1). These include VIP0047, which carries a zinc-binding motif and is significantly similar to the gene product of T1 gene 24, the phage DNA primase. In addition, the products of VIP0045 and VIP0041 are related in sequence to the T1 ATP-dependent helicase HelA (gp22) and the single-stranded DNA-binding protein Ssb (gp27), respectively. The presence of ssb is common in phage; for example, this gene is found in T1, T3, T7, and ES18 and, in many instances, such as T3 and T7, the host Ssb cannot substitute for the phage-encoded gene product.
Like T1, Vi phage E1 is predicted to harbor a general recombination system, with VIP0042 and VIP0043 encoding orthologues of the T1 recombination and exodeoxyribonuclease VIII family proteins, Erf (gp28) and RecE (gp29), respectively. In T1 these genes are essential for phage genome concatemers to form and recombine ready for packaging. Mutants lacking these two functions fail to package their DNA, although the loss of recE in T1 can be partially compensated for by host-encoded recE (24).
Genome comparisons between phages T1 and Vi phage E1 also showed that both phage carry multiple zinc-dependent HNH homing endonucleases in their genomes. T1 carries three such endonucleases, endA to -C, whereas Vi phage E1 carries two complete CDS (VIP0046 and VIP0052) and a partial CDS (VIP0049) of this class. Homing endonucleases are more commonly associated with group I introns and inteins, wherein they promote the mobility of themselves and these associated genetic elements in a process called homing. Homing is a process whereby the intron encounters a cognate copy of the gene into which it has been inserted but which lacks the intron (13). However, not all phage-encoded homing endonucleases are associated with group I introns or inteins; some exist as free-standing genes (this appears to be the case for Vi E1) (30). These homing endonucleases promote the exchange of flanking genes between related phage and are frequently found as pseudogenes or remnants that are rapidly cleared from phage genomes.
Vi phage E1 also possesses other CDS of interest. For example, VIP0028 shares 26 to 27% sequence identity with the Roi proteins from phages HK022 and φ80 (15, 17). Roi protein, although dispensable, mediates the dependence of phage on integration host factor for plaque formation (8).
Highly purified Vi phage E1 particles yielded at least nine major candidate polypeptides after sodium dodecyl sulfate-polyacrylamide gel electrophoresis (SDS-PAGE). Visible polypeptide bands were cut out of the gels and subjected to mass spectrometric analysis. This analysis indicated that contamination from host bacterial polypeptides was low, and peptides corresponding to predicted phage encoded proteins were readily identified. The proteins identified in the mass spectrometric analysis are summarized in Table 2. Identified proteins corresponded to the phage-encoded head, tail, and tail-fiber proteins (summarized in Fig. 2). Several of the polypeptides detected on SDS-PAGE migrated with an estimated molecular mass lower than that predicted from their DNA sequence. This is likely to be a consequence of posttranslational processing of the proteins, a possibility supported by the failure to detect peptides corresponding to the carboxyl termini of these proteins during the mass spectrometric analysis. This processing is a common feature of phage proteins and was observed for the VIP0040 maturation/adhesion protein, which shares sequence similarity with the endo-N-acylneuraminidase produced by the E. coli K1-specific phage K1E (21). It is possible that, just as for phages K1E and K1F, the C-terminal fragment of VIP0040 may serve as an intramolecular chaperone and that its presence in the primary translation product is required for activity of the enzymatic moiety that then degrades the capsule (21). The putative phage tail protein (VIP0033) and the phage head assembly protein (VIP0005) also undergo C-terminal posttranslational modification. The combined SDS-PAGE and mass spectrometric approach helped to confirm the annotation of a number of hypothetical proteins, including VIP0004.
CsCl-purified Vi phage E1 particles were examined by electron microscopy (Fig. 2). In most morphological aspects this phage resembled the serovar Typhimurium ES18 virion (7), with which it also shares sequence similarity and synteny in the head and tail regions. Like ES18, the Vi phage E1 has a long flexible noncontractile tail and a head that is also hexagonal in outline, and the phage is therefore likely to belong to the family Siphoviridae (morphotype B1). We measured 50 individual Vi E1 phage particles to obtain accurate data for this analysis. The Vi E1 phage head is 55 nm in diameter (56 nm for ES18), and the average tail length is 205 nm (210 nm for ES18), with a width of 11 nm (12 nm for ES18). The tail tape measure protein of E1 phage is slightly smaller than the corresponding protein of ES18, which is consistent with the marginal difference between the respective tail lengths. Although no fine side tail fibers could be detected (also absent in ES18), image analysis revealed a complex of four or five spiked structures that may carry, at their tips, the VIP0040 protein predicted to provide the putative capsular depolymerizing activity.
The Vi phage E1 characterized in the present study is a representative of a historical phage typing set now in use in diagnostic laboratories around the world (14). The typing set was originally created by adapting a type II Vi phage onto different serovar Typhi host strains. It is not clear whether the typing phage in the set have retained similar genome structures during passage over the past decades. Consequently, DNA was prepared from two additional representative Vi typing phage, A1 and D1, after growth on either a permissive host, serovar Typhi M223/224, or after adaptation to growth on serovar Typhi Ty2, which is permissive for the Vi phage E1.
Phage D1 DNA prepared after growth on serovar Typhi M223/224 generated a cleavage pattern with SspI and DraI, whose recognition sites consist of A and T bases only, that was indistinguishable from that of phage E1 DNA cleaved with the same enzymes (Fig. 3). This would be expected if both phage were derived from a single isolate of a Vi type II progenitor phage (14). However, cleavage of similar DNA using the restriction enzymes AflII, PstI, and SalI showed some interesting differences in susceptibility. D1 phage DNA prepared by propagation on the typing strain serovar Typhi M223/224, unlike E1 DNA, was not detectably cleaved with these enzymes. An inability to cleave D1 phage DNA with these enzymes could be due either to a lack of restriction enzyme target sites or to modification and protection of the DNA by the bacterial host.
The former possibility was deemed unlikely since DNA sequence analysis revealed that the Vi phage E1 genome possesses such target sites. Consequently, this phenomenon is most likely to be a consequence of the phage being grown in their type-specific hosts, which have differing DNA restriction and modification characteristics. To support this hypothesis, the type D1 phage was grown and adapted to the type E1-specific serovar Typhi Ty2, and the subsequent D1 phage DNA then yielded a restriction enzyme cleavage pattern with PstI and SalI that exactly matched that for the E1 type phage DNA prepared from phage grown on serovar Typhi Ty2 (data not shown). Interestingly, the A1 phage DNA also yielded a cleavage pattern indistinguishable from that of phage D1, suggesting that this is a general property of phage from this set.
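Whether a genome carries target sites for a given enzyme can be checked by scanning the sequence for the recognition motif. The sketch below counts sites for the five enzymes named above using their standard recognition sequences; because all five sites are palindromic, scanning one strand is sufficient. The function name and the toy sequence are hypothetical, and this is an illustration of the check rather than the analysis actually performed in this study.

```python
# Standard recognition sequences (all palindromic) for the enzymes in the text.
RECOGNITION_SITES = {
    "SspI": "AATATT",
    "DraI": "TTTAAA",
    "AflII": "CTTAAG",
    "PstI": "CTGCAG",
    "SalI": "GTCGAC",
}


def count_sites(genome, site, circular=True):
    """Count occurrences of a recognition site, including overlapping ones.

    For a circular genome the first len(site) - 1 bases are appended so that
    sites spanning the arbitrary start of the sequence are also found.
    """
    genome = genome.upper()
    if circular:
        genome += genome[: len(site) - 1]
    count = 0
    pos = genome.find(site)
    while pos != -1:
        count += 1
        pos = genome.find(site, pos + 1)
    return count


# Toy 20 bp sequence; its PstI site only exists when the ends are joined.
toy = "GCAGTTTAAACCGTCGACCT"
for enzyme, site in RECOGNITION_SITES.items():
    print(enzyme, count_sites(toy, site))
```

A sequence-based count only shows that sites are present. Whether they are cut depends on the modification state of the DNA, which is the point of the host-switching experiment described above.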
This evidence clearly shows that the type II Vi phage have been derived from a common progenitor in a manner somewhat analogous to that of the Salmonella typing scheme of Anderson, in which most phage belong to the P22 branch of the lambdoid phages (28). We can therefore conclude that differences in the infectivity of particular type II phage for different serovar Typhi strains are largely dependent on the restriction and/or modification systems encoded by these strains. This is the key to the discriminatory ability of the phage-typing scheme devised for serovar Typhi (14).
The study presented here allows us to draw some important conclusions about the Vi phage E1. This phage has exquisite specificity for the Vi capsular antigen for infectivity. This Vi dependence may be associated with the presence of a gene, VIP0040, encoding a product that shares sequence similarity with endo-N-acylneuraminidases and that could function to both bind and degrade Vi antigen during infection. The genome of the Vi phage E1 exhibits extensive regions of homology with the temperate phage ES18, another well-characterized S. enterica phage that has proven to be extremely valuable as a typing tool. Blocks of DNA homology were detected between the head and tail genes of the two phages, pointing to a likely recombination of a lytic and a temperate phage as the origin of the Vi E1 phage. Interestingly, the morphology of the E1 phage bears no similarity to that of the bacteriophages that target the E. coli K1 capsule, which lack the long tail structure shared by the E1 and ES18 phages. On the other hand, the type III to VII typing phages that target the serovar Typhi Vi capsule have a T7-like morphology similar to these K1 phage (1; our unpublished data).
Baron et al. have demonstrated that the type II Vi phage have the ability to act as general transducers, for example, in transferring xylose-fermenting ability and streptomycin resistance to serovar Typhi strains lacking these factors (3). In this transducing ability, these phage also resemble the ES18 phage. SDS-PAGE in combination with the mass spectrometric analysis identified a number of proteins in the phage particle. Significantly, some of these polypeptides migrated more rapidly in the gel than would have been predicted from the annotation. Careful analysis of the peptides identified for these proteins showed that in all cases there was a complete absence of peptides associated with the carboxyl-terminal regions of the CDS corresponding to these proteins. This finding supports the contention that these particular proteins have undergone posttranslational modification during their incorporation into the virion particle, such as the removal of the C terminus of VIP0040 in order for activation of the Vi recognition and degradation enzyme in a manner similar to that of the endosialidase gene of the E. coli K1 phage (21).
During revision of the manuscript, and specifically during updating of the annotation in Table 1, extensive synteny was observed between the S. enterica serovar Typhi Vi type II phage E1 and a putative lysogenic phage in the genome of Enterobacter sakazakii (accession number NC_009778). This synteny extended to regions of the E1 adhesion/maturation protein, VIP0040, and included the glycine-rich hinge region identified by Petter and Vimr (23) in the endo-N-acylneuraminidase of phage K1F. E. sakazakii is also known to produce a range of exopolysaccharides.
We thank the core sequencing and informatics teams at the Sanger Institute for their assistance and The Wellcome Trust for its support. We also thank Hans Ackermann for invaluable advice.
Published ahead of print on 11 January 2008.