The bacteriophage population is large, dynamic, ancient, and genetically diverse. Limited genomic information shows that phage genomes are mosaic, and the genetic architecture of phage populations remains ill-defined. To understand the population structure of phages infecting a single host strain, we isolated, sequenced, and compared 627 phages of Mycobacterium smegmatis. Their genetic diversity is considerable, and there are 28 distinct genomic types (clusters) with related nucleotide sequences. However, amino acid sequence comparisons show pervasive genomic mosaicism, and quantification of inter-cluster and intra-cluster relatedness reveals a continuum of genetic diversity, albeit with uneven representation of different phages. Furthermore, rarefaction analysis shows that the mycobacteriophage population is not closed, and there is a constant influx of genes from other sources. Phage isolation and analysis were performed by a large consortium of academic institutions, illustrating the substantial benefits of a disseminated, structured program involving large numbers of freshman undergraduates in scientific discovery.
Viruses are unable to replicate independently. To generate copies of itself, a virus must instead invade a target cell and commandeer that cell's replication machinery. Different viruses are able to invade different types of cell, and a group of viruses known as bacteriophages (or phages for short) replicate within bacteria. The enormous number and diversity of phages in the world mean that they play an important role in virtually every ecosystem.
Despite their importance, relatively little is known about how different phage populations are related to each other and how they evolved. Many phages contain their genetic information in the form of strands of DNA. Using genetic sequencing to find out where and how different genes are encoded in the DNA can reveal information about how different viruses are related to each other. These relationships are particularly complicated in phages, as they can exchange genes with other viruses and microbes.
Previous studies comparing the genomes (the complete DNA sequences) of relatively small numbers of phages that infect the Mycobacterium group of bacteria have found that the phages can be sorted into ‘clusters’ based on similarities in their genes and where these are encoded in their DNA. However, the number of phages investigated so far has been too small to conclude how different clusters are related. Are the clusters separate, or do they form a ‘continuum’ with different genes and DNA sequences shared between different clusters?
Here, Pope, Bowman, Russell et al. compare the individual genomes of 627 bacteriophages that infect the bacterial species Mycobacterium smegmatis. This is by far the largest number of phage genomes analyzed from a single host species, and it provides a much clearer understanding of the complexity and diversity of these phages. The isolation, sequencing and analysis of the hundreds of M. smegmatis bacteriophage genomes were performed by an integrated research and education program, called the Science Education Alliance Phage Hunters Advancing Genomics and Evolutionary Science (SEA-PHAGES) program. This enabled thousands of undergraduate students from different institutions to contribute to the phage discovery and sequencing project, and to co-author the report. SEA-PHAGES therefore shows that it is possible to successfully incorporate genuine scientific research into an undergraduate course, and that doing so can benefit both the students and researchers involved.
The results show that while the genomes could be categorized into 28 clusters, genomes in different clusters are not completely unrelated. Instead, a spread of diversity is seen, as genes and groups of genes are shared between different clusters. Pope, Bowman, Russell et al. further reveal that the phage population is in a constant state of change, and continuously acquires genes from other microorganisms and viruses.
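The rarefaction analysis mentioned in the abstract can be illustrated with a minimal sketch. It is not the authors' pipeline: the function name `rarefaction_curve`, the gene family labels, and the four toy genomes are all hypothetical, and each genome is assumed to be reducible to a set of gene family identifiers. The idea is to count how many distinct gene families have been seen after sampling 1, 2, 3, ... genomes, averaged over many random sampling orders. A curve that keeps rising rather than flattening suggests an open population that continues to yield new genes.

```python
import random


def rarefaction_curve(genomes, n_permutations=100, seed=0):
    """Mean number of distinct gene families seen after sampling k genomes.

    genomes: list of sets, each holding the gene family labels of one genome
             (hypothetical representation, assumed for this sketch).
    Returns a list whose k-th entry (0-based) is the mean count after
    sampling k + 1 genomes in random order.
    """
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    totals = [0] * len(genomes)
    for _ in range(n_permutations):
        order = list(genomes)
        rng.shuffle(order)
        seen = set()
        for k, genome in enumerate(order):
            seen |= genome           # add this genome's gene families
            totals[k] += len(seen)   # cumulative distinct families so far
    return [total / n_permutations for total in totals]


# Toy data invented for illustration only; not taken from the study.
toy_genomes = [
    {"fam1", "fam2", "fam3"},
    {"fam2", "fam3", "fam4"},
    {"fam5", "fam6"},
    {"fam1", "fam6", "fam7"},
]

curve = rarefaction_curve(toy_genomes)
for k, mean_families in enumerate(curve, start=1):
    print(f"{k} genomes sampled: {mean_families:.2f} gene families on average")
```

Averaging over random orders removes the dependence on which genome happens to be sampled first, so the shape of the curve reflects the population rather than the sampling sequence.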