Bacteriophages (phages) are natural viral predators of bacteria. They are in a constant evolutionary arms race with host bacteria; the survival of phages over millions of years is a testament to their ability to overcome bacterial resistance mechanisms by constantly evolving in parallel with their hosts. Isolation of new phages is rapid, facile and inexpensive, and there is an abundant supply of phages in nature, making them ideal weapons to combat bacterial infections. Despite the fact that they are non-toxic to animals and plants,
1 phages are not as widely used for biocontrol and therapeutics as one would imagine. Since the introduction of antibiotics in the 1940s to treat bacterial infections in humans and livestock, their widespread use, and in many instances misuse, has resulted in the current crisis of multi-drug resistant bacteria. This crisis, combined with a decline over the past several decades in the discovery of new classes of antibiotics that are effective against these resistant bacteria, has brought about a renewed interest in alternatives to antibiotics, such as phages or phage-encoded lytic enzymes.
2-5 Since their discovery around 1915–1917, phages have served as excellent research tools,
6 although the promise of their antibacterial potential has not been fully realized.
7 Despite the apparent attractiveness of phages as antimicrobials, history is replete with false starts that have suppressed the field for decades at a time.
1-3,7 Besides human therapy approaches, whole-phage preparations have also been widely evaluated as biocontrol agents for food production. Numerous studies attest to the efficacy of selected phages or phage cocktails against foodborne pathogens, such as Listeria, Salmonella or E. coli.
8-13 The first phage preparations, such as Listex™ (Micreos) or ListShield™ (Intralytix), have received approval from regulatory agencies and are being used in food production. The application of phage lytic enzymes in food production has received intensive research interest, as these enzymes present a highly effective and practical means of decontamination (reviewed in refs.
14, 15). Phage particles or their components have also been used successfully as detection agents for pathogens. Phage-based methods can make detection faster and more sensitive. Newer developments include phage-amplification assays coupled with MALDI-MS,
16,17 detection by lysis products (reviewed in ref.
18), reporter bacteriophages (reviewed in ref.
19) or detection by receptor binding, to list just a few.
Today, technologies exist that allow cost-effective sequencing of hundreds of viral or bacterial genomes per year, and we can anticipate in the not-too-distant future further advances that might allow routine whole-genome screening of every pathogen encountered in a clinic
20 or on contaminated foodstuff. The majority of the sequenced bacterial genomes reveal the presence of one or more partial or complete prophage genomes. Even closely related genomes appear to possess different sets of prophages.
21 Thus, the phage gene pool is larger and more diverse than the rest of the chromosome. In our experience, some prophage regions are recalcitrant to cloning, most likely due to toxicity of the gene products to the bacterium. Genes revealed by whole genome sequencing and screening of phage collections will potentially yield new generations of antimicrobials. Whole-genome sequencing has become mandatory for regulatory approval of any healthcare or food-industry application of phage or phage products,
22-24 but today’s researchers are faced with an array of sequencing platforms and assembly options and the massive amounts of data they produce.
25 The current wave of high-throughput sequencing efforts began in 2005 with the introduction of the Roche/454 sequencer, followed by other platforms such as SOLiD, Solexa (Illumina), Helicos, Ion Torrent and PacBio, with a further wave of platforms yet to be released, such as the nanopore-based MinION and GridION of Oxford Nanopore.
26 Furthermore, the relatively small footprint required by these technologies, in terms of both laboratory space and personnel, has brought about the democratization of genome sequencing: whole-genome sequencing can now be done in any laboratory, even one with limited resources, and is therefore no longer the prerogative of large genome centers. Each of the 2nd and 3rd generation sequencing platforms has its own unique features and distinct advantages over other platforms. However, all these platforms produce very high sequence outputs compared with the throughput of conventional Sanger sequencing platforms. Conservative estimates by the Genomic Standards Consortium in 2009 placed the numbers of prokaryotic and eukaryotic genomes completed by 2012 at over 10,000 and ~2,000, respectively.
27 According to the GOLD genomes database, currently there are a total of approximately 15,000 prokaryotic and 3,000 eukaryotic genomes listed, of which only about 20% are finished genomes.
Although phage genomes are orders of magnitude smaller in size, whole-genome sequencing of phage has not kept pace with the current trend in high throughput sequencing of bacteria and other organisms. There has only been a slow increase in the number of complete bacteriophage genomes published.
28 The NCBI genome database contains around 600 Caudovirales genomes to date as well as some unclassified phage genomes.
The lack of parity in phage genome sequencing can be attributed to several problems unique to sequencing and assembly of phage genomes. (1) Phages are not self-replicating and rely on their host macromolecular machinery for their replication and growth and hence isolation of phage genomic material completely devoid of host genetic material involves extensive purification steps. Although one can, in many cases, separate the reads pertaining to the host bioinformatically post-sequencing, the presence of prophage sequences in the host chromosomes may pose a problem for such filtration. (2) Sometimes phage preparations are associated with host debris and cellular membrane fractions that contaminate the genomic material and interfere with subsequent steps of DNA sequencing. (3) Phages, especially the exclusively lytic phages, have notoriously highly methylated genomes because bacteria possess restriction-modification systems to safeguard the integrity of their genomes from invading DNAs. In order to overcome such restriction systems, phages have evolved mechanisms such as genome methylation so that they are able to infect and grow in their host bacteria. From a practical standpoint, such highly methylated sequences are recalcitrant to many of the routine genetic manipulations including shearing, cloning and DNA sequencing. In conventional cloning-based shotgun Sanger sequencing, many of the phage fragments are underrepresented and/or unclonable due to toxicity of the genes for the cloning host, usually an
E. coli strain. This problem is avoided in the next-generation sequencing platforms by virtue of cloning-free PCR amplification of fragments in oil-and-water microreactors (emulsion PCR). However, in many instances, highly methylated DNA is a poor template for PCR and sometimes even for fragmentation of the DNA by the usual procedures, such as nebulization with compressed nitrogen gas. (4) Some phage genomes have an extreme GC content that differs markedly from that of their host. Such extremes may pose a problem for PCR and sequencing. (5) Phage genomes are also known to contain complex genomic structures, such as extremely long direct or inverted repeats and terminal redundancies, that are problematic for assembly of the whole-genome sequence from the reads. Many assembly algorithms break the contigs at these repeats, requiring further manual evaluation and confirmation by other methods, such as PCR, restriction analysis or Sanger sequencing.
29,30 (6) Regions of uneven sequence depth along the length of the genome, which arise when amplifying or generating libraries using random-priming methods, may cause problems for many of the common assembly algorithms because the programs assume that this uneven coverage is due to repeats or contamination, resulting in artificially poor assemblies. (7) Almost 80% of the genome sequences in the Genomes OnLine Database (GOLD) are unfinished draft sequences. For bacteria and other organisms, complete genome finishing may not be a requisite for many applications, but for small genomes such as those of bacteriophages, finishing the genome sequence is essential to obtain a more complete understanding of their biology, for example to confirm their lifestyle by identification/exclusion of genes encoding lysogeny control functions, or to identify their potential for generalized transduction of host DNA by assessing the physical genome structure.
29 Hence, phage researchers are faced with an increased demand for resources in order to finish and polish phage genomes before publication is possible. (8) Whereas in bacterial and human genomics, mapping of reads to a finished reference genome can be a powerful analytical tool not just for genome assembly, but also for discovery of genetic variations such as insertions/deletions (indels) and single nucleotide polymorphisms (SNPs), in phage genomics this is very seldom feasible due to the absence of a reference genome for any given phage. Phage genomes are extremely mosaic in nature
31 and even closely related phages are highly divergent, rendering reference mapping a futile effort. (9) In general, a lack of resources for phage genome sequencing within the reach of individual phage researchers, coupled with a general lack of interest in and support for phage genomics by the journals and the funding agencies, has resulted in too few complete phage reference genomes.
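Two of the problems listed above, extreme GC content (4) and terminal redundancy (5), lend themselves to a quick computational check on an assembled contig. The following Python sketch is illustrative only and is not the pipeline used in this work: the function names `windowed_gc` and `terminal_overlap`, the window sizes and the minimum overlap length are all hypothetical choices, and the overlap test requires an exact match, which real data with sequencing errors may not satisfy.

```python
from typing import List


def windowed_gc(seq: str, window: int = 100, step: int = 50) -> List[float]:
    """Return the GC fraction of each sliding window along seq.

    Window and step sizes are illustrative defaults, not recommendations.
    """
    seq = seq.upper()
    fractions = []
    # A sequence shorter than one window is reported as a single window.
    for start in range(0, max(len(seq) - window, 0) + 1, step):
        chunk = seq[start:start + window]
        if chunk:
            gc = chunk.count("G") + chunk.count("C")
            fractions.append(gc / len(chunk))
    return fractions


def terminal_overlap(contig: str, min_len: int = 20) -> int:
    """Length of the longest suffix of contig that exactly equals its prefix.

    A non-zero result hints that the contig ends repeat its beginning, as
    expected for a terminally redundant or circularly permuted genome.
    Returns 0 if no exact overlap of at least min_len bases exists.
    """
    contig = contig.upper()
    # Overlaps longer than half the contig are not considered here.
    for length in range(len(contig) // 2, min_len - 1, -1):
        if contig[:length] == contig[-length:]:
            return length
    return 0


if __name__ == "__main__":
    # Toy contig: a 20-base segment repeated at both ends of a spacer.
    repeat = "ACGTACGGTCAAGCTTAGCA"
    toy = repeat + "T" * 30 + repeat
    print("terminal overlap:", terminal_overlap(toy))
    print("GC per window:", windowed_gc(toy, window=35, step=35))
```

A positive overlap from such a check is only a hint; as noted above, repeat structures at contig ends still need confirmation by PCR, restriction analysis or Sanger sequencing.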
Despite all the challenges outlined above, 2nd and 3rd generation sequencing platforms offer the best opportunity for whole-genome sequencing of phages. In this report, we describe our efforts to sequence a large number of bacteriophages using both conventional and 2nd/3rd generation sequencing approaches. We also present general guidelines for obtaining a complete genome sequence of phages using a blended approach.