GC content in ssDNA and dsDNA phages was highly correlated with host GC content (r² = 0.82 for ssDNA phages, 0.84 for dsDNA phages; the two correlations did not differ significantly, p = 0.72) across a very wide range of host GC content (~0.25 to ~0.72) (Fig. 1). A previous study found significant differences between ssDNA and dsDNA phages in how closely their nucleotide content correlated with that of their hosts,33 but the additional 333 dsDNA and 13 ssDNA reference sequences added to GenBank since that analysis suggest there is no difference (Table S1 and Fig. S1). ssDNA phages exhibited a pronounced genomic thymine bias (average 0.30 T), but nonetheless infected hosts with a range of GC contents (0.25 to 0.70) as wide as that of dsDNA phages (0.26 to 0.72).
Figure 1. Correlation between host and phage genomic GC content. Gray squares indicate dsDNA phages, open squares ssDNA phages. Best-fit linear regression lines are solid for dsDNA (r² = 0.84) and dashed for ssDNA (r² = 0.82). There was no significant ...
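As an illustration of the quantities behind this comparison, the following minimal sketch computes genomic GC content and the squared Pearson correlation (r²) between paired host and phage values. It is not the analysis pipeline used in this study; the function names `gc_content` and `r_squared` are hypothetical, and ambiguous bases are assumed to be ignored.

```python
def gc_content(seq):
    """Fraction of G and C among the unambiguous bases (A, C, G, T)."""
    seq = seq.upper()
    gc = sum(seq.count(base) for base in "GC")
    total = gc + sum(seq.count(base) for base in "AT")
    if total == 0:
        raise ValueError("sequence contains no unambiguous bases")
    return gc / total


def r_squared(xs, ys):
    """Squared Pearson correlation coefficient of two paired lists."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length lists with at least 2 points")
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("r squared is undefined when one variable is constant")
    return (sxy * sxy) / (sxx * syy)
```

Given one list of host GC values and a matching list of phage GC values, `r_squared(host_gc, phage_gc)` returns the kind of statistic reported above for each phage group.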
Correlated GC content was a poor predictor of a strong CAI match between E. coli and the coat genes of its phages. The mean CAI of ssDNA coliphages was 0.706, while the dsDNA phages were significantly better matched to E. coli (0.744, p < 0.001; Fig. 2). The dsDNA figure includes eight coliphage genomes for which tail protein-encoding genes were used rather than coat protein-encoding genes, because properly annotated coat genes were absent. The inclusion of tail genes did not change the results of this analysis (p < 0.001 with and without the eight tail genes). The evidence of selection for translational efficiency is therefore stronger for dsDNA phages.
Figure 2. Mean coat gene CAI with 95% confidence intervals for ssDNA (n = 11) and dsDNA (n = 34) coliphages.
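For readers unfamiliar with the index, CAI is conventionally defined as the geometric mean of per-codon relative adaptiveness values (w), where each w is a codon's frequency in a reference gene set divided by the frequency of the most-used synonymous codon. The sketch below follows that general definition under stated assumptions: the reference counts are supplied already grouped by amino acid, codons absent from the weight table (for example ATG, TGG, and stop codons) are skipped, and unused reference codons receive a pseudo-value of 0.01. The function names and the pseudo-value are illustrative choices, not details taken from this study.

```python
import math


def relative_adaptiveness(ref_counts):
    """Compute w for each codon from reference-set codon counts.

    ref_counts maps an amino acid to {codon: count} for its synonymous codons.
    """
    weights = {}
    for codons in ref_counts.values():
        top = max(codons.values())
        if top == 0:
            continue  # amino acid never observed in the reference set
        for codon, count in codons.items():
            # Assumption: a small pseudo-value keeps log(w) defined for
            # codons that never occur in the reference set.
            weights[codon] = count / top if count > 0 else 0.01
    return weights


def cai(cds, weights):
    """Geometric mean of w over the codons of an in-frame coding sequence."""
    cds = cds.upper()
    usable = len(cds) - len(cds) % 3  # ignore a trailing partial codon
    logs = [
        math.log(weights[cds[i:i + 3]])
        for i in range(0, usable, 3)
        if cds[i:i + 3] in weights
    ]
    if not logs:
        raise ValueError("no scorable codons in sequence")
    return math.exp(sum(logs) / len(logs))
```

A gene built entirely from the reference set's preferred codons scores 1.0, and the score falls as rarer synonymous codons are used.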
Comparison of the GC content of the first two positions of each codon (GC1,2) and the third position (GC3) of these genes revealed an interesting pattern: for both ssDNA and dsDNA coliphages, GC1,2 was restricted to a tight range between about 0.45 and 0.55. dsDNA GC3 varied over a wide range, from 0.26 to 0.69, but ssDNA GC3 occupied a narrower range, from 0.30 to 0.54 (Fig. 3). Furthermore, when plotted with a line representing a perfect correlation between GC1,2 and GC3, all but one of the ssDNA phages fell to the left of that line (Fig. 3), indicating a paucity of GC in the third codon position of their coat genes. Conversely, the dsDNA coat genes were GC3-rich or GC3-poor in approximately equal numbers. Past studies have indicated that strong mutational biases often occur with low levels of CUB,34-36 possibly because a strong, non-specific mutational pressure would prevent any persistent, directional changes in the genome. The consistently lower GC3 content of the ssDNA genes suggests that a specific mutational pressure might be reducing GC3 content in a directional manner, disrupting the effects of selection for translational efficiency.
Figure 3. GC1,2/GC3 correlation for ssDNA (open squares) and dsDNA (gray squares) coliphage coat genes. Solid line indicates perfect correlation. Points above the line indicate genes deficient in GC3; points below denote genes enriched in GC3.
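The two quantities plotted here can be computed from a coding sequence as in the following sketch. It assumes an in-frame sequence containing only A, C, G, and T, and it counts every complete codon supplied, including any stop codon; the function name `gc12_gc3` is hypothetical and this is not necessarily how the values in this study were calculated.

```python
def gc12_gc3(cds):
    """Return (GC1,2, GC3) for an in-frame coding sequence of A, C, G, T."""
    cds = cds.upper()
    usable = len(cds) - len(cds) % 3  # ignore a trailing partial codon
    if usable == 0:
        raise ValueError("need at least one complete codon")
    # Bases at codon positions 1 and 2, then bases at codon position 3.
    first_two = "".join(cds[i:i + 2] for i in range(0, usable, 3))
    third = cds[2:usable:3]

    def gc_fraction(bases):
        return sum(bases.count(b) for b in "GC") / len(bases)

    return gc_fraction(first_two), gc_fraction(third)
```

A gene whose GC3 value is lower than its GC1,2 value is GC3-deficient in the sense used above.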
We further investigated the GC3-poor nature of ssDNA coliphage coat genes with RSCU analysis. It revealed statistically significant variation in use for 15 of 59 codons between ssDNA and dsDNA phages (p < 0.03 for TTG, p < 0.002 for CTT and TCC, p < 0.001 for all other codons; Fig. 4). Notably, for four of the five codons used more frequently by ssDNA than by dsDNA coliphages, thymine was in the third position. No codon enriched in dsDNA phages relative to ssDNA phages contained thymine in the third position.
Figure 4. Mean RSCU values and 95% confidence intervals for individual codons with statistically significant differences in usage between ssDNA (open squares) and dsDNA (gray squares) coliphage coat or tail genes.
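RSCU is conventionally defined as a codon's observed count divided by the count expected if all synonymous codons for that amino acid were used equally. The sketch below applies that definition to a single coding sequence using the standard genetic code, excluding Met, Trp, and stop codons, which leaves the 59 codons referred to above. The function name `rscu` and the choice to omit amino acids absent from the gene are assumptions made for illustration.

```python
from collections import Counter
from itertools import product

# Standard genetic code, codons enumerated in TCAG order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def rscu(cds):
    """Return {codon: RSCU} for the degenerate amino acids present in cds."""
    cds = cds.upper()
    usable = len(cds) - len(cds) % 3  # ignore a trailing partial codon
    counts = Counter(cds[i:i + 3] for i in range(0, usable, 3))

    # Group synonymous codons, skipping stops and single-codon amino acids.
    families = {}
    for codon, aa in CODON_TABLE.items():
        if aa not in "*MW":
            families.setdefault(aa, []).append(codon)

    values = {}
    for codons in families.values():
        total = sum(counts[c] for c in codons)
        if total == 0:
            continue  # amino acid absent from this gene
        for c in codons:
            # Observed count over the count expected under equal usage.
            values[c] = counts[c] * len(codons) / total
    return values
```

An RSCU of 1.0 means a codon is used as often as expected under equal usage; values above 1.0 indicate a preferred codon.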
Calculation of RSCUs of coat genes in 28 ssDNA phages with a diverse host range confirmed this pattern: codons with thymine in the third position were extremely overrepresented (p < 0.001) for six amino acids (A, D, G, I, T, V) and were significantly favored (p < 0.012) for three more (H, P, S) (Fig. 5). Only one of the remaining nine degenerate amino acids had a statistically preferred codon in ssDNA phages (GAA for E, p < 0.01).
Figure 5. RSCU values and 95% confidence intervals for ssDNA phage coat gene codons that exhibited an NNT codon preference. Preferred NNT codons indicated by bold triangles, NNV codons indicated by squares.
We subdivided our data set to separately examine the two morphologically distinct families of ssDNA phages, the Inoviridae and the Microviridae. Because inoviruses are frequently vertically transmitted and can productively infect their hosts without causing lysis, they might be under increased selective pressure to match the genomes of their more permanently associated hosts. However, RSCU comparisons revealed no consistent patterns associated with phage lifestyle: no difference in RSCU was evident for 11 of the 16 NNT codons in these groups (Fig. S2).
Cytosines are comparatively unstable and readily undergo spontaneous deamination to uracil, resulting in C to T transitions after unrepaired replication.37 This spontaneous deamination occurs 100 times more frequently in ssDNA than in dsDNA, resulting in a higher mutation rate at cytosines38 than at other bases in ssDNA phage.39 ssDNA phage genomes appear to spend more time truly single-stranded, as they do not experience consistent intra-strand base pairing or regular secondary structure formation while encapsidated.40-45 As a result, ssDNA phage genomes have unpaired bases more often than ssRNA genomes, which are constrained by extensive stem-loop formation both in the cytosol and when encapsidated.46
Any thymine-increasing bias does not appear to have a discernible effect on genomic nucleotide content relative to the phages’ primary hosts. Rather, it is likely that cytosine transitions in the first or second codon positions are subject to strong purifying selection relative to the wobble position,47-49 and the signature of this mutational bias is observed only in the overabundance of thymine in the third position of synonymous ssDNA phage codons. The significant overrepresentation of NNT codons is strongly indicative of a biased mutational pressure acting in concert with strong selection against non-synonymous substitutions.
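The position dependence of this argument can be checked directly against the standard genetic code. The sketch below, which assumes that C to T changes are read on the coding strand and uses a hypothetical function name, counts how many single C to T substitutions at each codon position leave the encoded amino acid unchanged.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def synonymous_c_to_t(position):
    """Return (synonymous, total) C->T changes at codon position 0, 1, or 2."""
    synonymous = total = 0
    for codon, aa in CODON_TABLE.items():
        if codon[position] != "C":
            continue
        mutant = codon[:position] + "T" + codon[position + 1:]
        total += 1
        if CODON_TABLE[mutant] == aa:
            synonymous += 1
    return synonymous, total
```

Under the standard code, all 16 third-position C to T changes are synonymous, compared with 2 of 16 at the first position and none at the second, which is consistent with a deamination signature that survives purifying selection mainly at the wobble position.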
Genomic architecture (nucleic acid, segmentation, strandedness), while acknowledged as an important characteristic in virus taxonomy, is not typically included in broad-scale analyses of viral evolution. Instead, most comparisons focus on a single kind of virus,50 and while many of these studies have provided insight into the codon usage biases of individual viruses, this is the first observation of a specific bias with a possible mechanistic explanation. Comparing across two architectures, we found that strandedness plays a critical role in the composition of phage genomes and in determining the limits of ssDNA viral adaptation to their hosts.