Phages (viruses that infect bacteria) are ubiquitous on Earth, where they are the most abundant and diverse biological entities [1]. Phages have been central to many tools and discoveries in molecular biology, and serve important ecological functions, including structuring microbial communities [4], driving evolution through gene transfer [6], and playing major roles in biogeochemical cycling [8]. Since phages are often host-specific predators [10], it is important to understand not only the abundance of phages, but also which types of phages are present in the environment.
Phages are extremely diverse, encompassing a wide range of virion properties, genome sizes and types, host ranges, and lifestyles. Phages are typically classified by the International Committee on Taxonomy of Viruses (ICTV) based on morphology and nucleic acid type [12] or by sequence-based taxonomic systems [13]. Traditional culture-based methods for exploring the diversity of phages in the environment are limited because they require the bacterial host to be in culture, and most environmental bacteria cannot be cultured using standard laboratory techniques [17]. Recently, molecular techniques have overcome these limitations, revealing a vast diversity of phages in natural environments without the need for culturing [19].
Development of the 16S ribosomal RNA gene as a molecular marker for studying microbial communities revolutionized the field of microbial ecology by allowing researchers to access the vast diversity of uncultured microbes in natural systems [24]. However, exploration and comparative genomics of environmental phage communities have been hampered by the lack of a universally conserved genetic marker that can be used to examine the diversity of phages and trace their evolutionary histories. Although no single gene is found in all known phages, groups of related phage genomes often share conserved genes ("signature genes"), which have been used to examine phage diversity. For example, conserved regions of phage structural proteins, such as the portal vertex protein (g20) and the major capsid protein (g23), are routinely used to characterize genetic diversity in T4-like myophage communities [22]. Other studies have used the DNA polymerase gene to examine the diversity and evolution of T7-like podophages [20]. Numerous auxiliary metabolic genes (i.e., phage-encoded metabolic genes that were previously thought to be restricted to cellular genomes [2]) involved in photosynthesis, carbon metabolism, and nucleotide metabolism have also been used as signature genes for marine phages [30]. Although these signature genes are restricted to specific subsets of phage genomes and are not present in all phage types, they are good targets for designing PCR primers to explore related uncultured phages in environmental samples. Further examination of environmental phage diversity would be greatly enhanced by the development of PCR assays for additional signature genes.
With advances in sequencing technologies and the success of student-driven research/outreach programs [40], an increasing number of phage genomes are sequenced each year and are available for bioinformatic analyses [3]. As of February 2011, the genomes of 636 phages and 33 archaeal viruses were available in the PhAnToMe database (http://www.phantome.org/). Many phage ecologists are interested in mining these genomes to identify signature genes and design PCR primers for them. Numerous tools and databases exist to identify and analyze homologous gene sequences (e.g., COGs [42], OrthoMCL [43], HMMER [44]). One major limitation of these existing tools is that they are confined to cellular organisms, and very few available tools incorporate viral genomes (e.g., CoreGenes [45], CoreExtractor [15]). Likewise, numerous tools for primer design and analysis exist (e.g., CODEHOP [46], IDT Oligo Analyzer [47], Primer3 [48]), yet they have many restrictions regarding input file requirements (nucleotide sequence, protein sequence, or multiple nucleotide alignment), primer type (non-degenerate or degenerate), genomes of interest, physicochemical properties, input and output format, and usability. In practice, identifying conserved genes and designing PCR primers to amplify them currently requires several stand-alone steps that are not integrated into a single workflow. When performed manually, this process can be time-consuming, tedious, and error-prone.
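To make the distinction between non-degenerate and degenerate primers concrete, the following is a minimal sketch of how a degenerate primer can be derived from a block of aligned nucleotide sequences using the standard IUPAC ambiguity codes. It is an illustration only and does not represent the method used by any of the tools cited above; the function name `degenerate_consensus` and the toy sequences are hypothetical, and the sketch assumes gap-free sequences of equal length containing only A, C, G, and T.

```python
from math import prod

# Standard IUPAC nucleotide ambiguity codes, keyed by the set of bases they cover.
IUPAC = {
    frozenset("A"): "A", frozenset("C"): "C",
    frozenset("G"): "G", frozenset("T"): "T",
    frozenset("AG"): "R", frozenset("CT"): "Y",
    frozenset("CG"): "S", frozenset("AT"): "W",
    frozenset("GT"): "K", frozenset("AC"): "M",
    frozenset("CGT"): "B", frozenset("AGT"): "D",
    frozenset("ACT"): "H", frozenset("ACG"): "V",
    frozenset("ACGT"): "N",
}


def degenerate_consensus(aligned):
    """Return (primer, degeneracy) for a gap-free alignment block.

    `aligned` is a list of equal-length strings over A/C/G/T (assumption:
    gaps and other symbols have already been excluded from the region).
    """
    if not aligned or len({len(s) for s in aligned}) != 1:
        raise ValueError("sequences must be non-empty and of equal length")
    # One set of observed bases per alignment column.
    columns = [frozenset(col) for col in zip(*(s.upper() for s in aligned))]
    for col in columns:
        if col not in IUPAC:
            raise ValueError(f"unsupported symbols in column: {sorted(col)}")
    primer = "".join(IUPAC[col] for col in columns)
    # Degeneracy = number of distinct oligonucleotides the primer represents.
    degeneracy = prod(len(col) for col in columns)
    return primer, degeneracy


# Toy aligned region (hypothetical sequences, not taken from any phage genome).
region = ["ATGGCA", "ATGGCT", "ACGGCA"]
primer, degeneracy = degenerate_consensus(region)
print(primer, degeneracy)
```

A non-degenerate primer corresponds to the special case in which every column contains a single base, giving a degeneracy of 1. In practice, primer design also weighs properties such as melting temperature and GC content, which this sketch leaves out.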
In light of these problems, PhiSiGns provides a convenient web interface that allows biologists to perform a dynamic search against selected phage genomes of interest, identify signature genes, generate sequence alignments, and design primers for PCR amplification, all in one environment that increases efficiency and productivity. Signature genes identified using this tool can be used to build phylogenetic trees and study phage evolution. Furthermore, primers designed using PhiSiGns can be used to amplify related sequences from environmental samples to increase knowledge of uncultured phage diversity.
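The idea of searching a user-selected set of genomes for shared genes can be sketched as follows. This is a simplified illustration under stated assumptions, not the algorithm implemented in PhiSiGns: it assumes that genes have already been grouped into families (for example, by a sequence similarity search), and the function name `signature_genes`, the `min_fraction` parameter, and the genome and family labels are all hypothetical.

```python
from collections import Counter


def signature_genes(genome_families, min_fraction=1.0):
    """Return gene families present in at least `min_fraction` of the genomes.

    `genome_families` maps a genome name to the collection of gene-family
    labels found in that genome (assumption: family assignment was done
    beforehand by some homology search).
    """
    if not genome_families:
        return []
    # Count each family once per genome, regardless of copy number.
    counts = Counter(
        family
        for families in genome_families.values()
        for family in set(families)
    )
    needed = min_fraction * len(genome_families)
    return sorted(family for family, n in counts.items() if n >= needed)


# Toy input: hypothetical genomes and family labels, not real annotations.
selected = {
    "phage_A": {"portal", "capsid", "polymerase"},
    "phage_B": {"portal", "capsid"},
    "phage_C": {"portal", "capsid", "polymerase"},
}
print(signature_genes(selected))                    # families shared by all genomes
print(signature_genes(selected, min_fraction=0.6))  # families shared by most genomes
```

Relaxing `min_fraction` reflects the observation that signature genes are typically shared by a group of related phages rather than by all phages, so the choice of genomes and threshold determines which candidates emerge.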