Protein clusters containing viral sequences originating from the VF and LF were also categorized in the context of KEGG pathways, with the addition of a virus structure category, in order to assess the functional potential of Indian Ocean virioplankton. The majority of protein sequences from the VF and LF were not mapped to a pathway and remained uncharacterized (VF = ~80%; LF = ~37%). This level of functional novelty was not unexpected given the high abundance of hypothetical proteins in each category (VF = ~68%; LF = ~50%). Smaller proportions of the data were considered poorly characterized (VF = ~3%; LF = ~5%) or were not specific to a particular pathway (VF = ~3%; LF = ~8%). The remaining sequences were mapped primarily to the Virus Structure, Metabolism and Genetic Information Processing categories (Table S5). A heatmap of functional categories (Figure S1) further demonstrated the partitioning of VF and LF sequences observed through PCA, with differential clustering of the VF and LF. The vast majority of VF (~79%) and LF (~80%) sequences within the Genetic Information Processing pathway were categorized as putative DNA replication, recombination and repair proteins (Table S5).
The largest proportion of viral sequences within the Metabolism pathway was attributed to energy metabolism, with a slight enrichment in the VF (Table S5). A relatively small proportion of these sequences from the VF and LF was classified into lower-level categories, including nitrogen metabolism and oxidative phosphorylation. Although nitrogen metabolism genes have been previously documented in viral metagenomes created from a variety of environmental settings [11], the exact nature of the genes contributing to this pathway was unclear. Several studies have demonstrated that viruses infecting eukaryotic phytoplankton and zooplankton (likely to be retained in the LF) carry genes of either host or bacterial (i.e., prey) origin [38]–[45]. However, the majority of these genes are involved in lipid, carbohydrate and protein metabolism and polyamine biosynthesis rather than nitrogen metabolism or oxidative phosphorylation [46]–[48]. Only VF sequences possessed Enzyme Commission (EC) numbers that could be linked directly to the KEGG nitrogen reduction and fixation pathway, and these were examined in more detail. The majority of VF sequences within the nitrogen metabolism category were annotated as glutamate synthase (n = 98), which, together with glutamine synthetase, comprises the GS-GOGAT pathway. This pathway facilitates ammonium assimilation in phytoplankton
[49] and is dependent on the availability of nitrogen compounds in the environment. The Indian Ocean is considered an oligotrophic water mass with very low concentrations of available nitrogen [50], and nitrogen concentrations measured in our samples were indeed indicative of a nitrogen-limited environment (Table S1). The presence of glutamate synthase genes suggests that viruses may play a role in nitrogen modulation and assimilation during the infection of host cells. Proteins involved in the oxidative phosphorylation (OP) pathway were much more abundant than photosynthesis-related proteins, comprising ~30% of VF and ~11% of LF sequences within the energy metabolism category (Table S5), with 466 VF and 25 LF sequences possessing EC numbers. NADH dehydrogenase I subunit and inorganic diphosphatase were represented in both the VF (n = 255 and 53, respectively) and LF (n = 7 and 18, respectively), while the cbb3-type cytochrome C oxidase subunit I was detected only in the VF (n = 158). To the best of our knowledge, this is the first report of viral cytochrome C oxidase and inorganic diphosphatase genes in the marine environment. NADH dehydrogenase and cytochrome C oxidase are both components of the electron transport chain in bacteria, which is ultimately used to produce ATP. Viral type I NADH dehydrogenase genes were first reported by Alperovitch-Lavy and colleagues
[51] and were detected through a combined analysis of GOS microbial scaffolds and long PCR amplification of viral fractions collected from the Pacific Line Islands [52]. Interestingly, the viral NADH dehydrogenase genes were co-localized on viral scaffolds (and amplicons) containing photosystem I and II genes, suggesting that cyanophage encode this complex. A subsequent search of GOS scaffolds by Sharon and coworkers (2011) for viral auxiliary metabolic genes also revealed the presence of viral type I NADH dehydrogenase subunits putatively involved in cyclic electron flow around PSI and respiration during viral infection. Again, these genes were attributed to cyanophage, since the majority of scaffolds containing viral auxiliary genes that were examined in that study appeared to be related to known cyanophages [15]. It is possible that the viral NADH dehydrogenase genes observed in this study are of cyanophage origin, given the abundance of cyanophage-like sequences in the Indian Ocean data. However, the abundance of virus-SAR86 host predictions (discussed later), coupled with the presence of viral cbb3-type cytochrome C oxidases, which are only found in proteobacteria, suggests that viruses that infect heterotrophic bacteria may also be the source of these genes. The enzyme inorganic diphosphatase catalyzes the conversion of diphosphate (PPi) to phosphate (Pi), which is needed for the production of ATP. Of the three OP enzymes, inorganic diphosphatase was the most evenly distributed between the VF and LF, suggesting that a diverse group of viruses may carry this gene. If the viral version of inorganic diphosphatase is expressed and functional during infection, viruses could potentially contribute to host ATP production. This process could temporarily prolong the lifespan of the host and increase replication efficiency, analogous to viral NADH dehydrogenase and PS genes. An alternative hypothesis is that viral inorganic diphosphatase is used to produce Pi for incorporation into viral nucleic acids. Phosphate concentration in the marine environment is thought to influence virus production due to the inherently high nucleic acid to protein ratio of viruses
[52]. The ability to influence the availability of phosphate during infection could maximize nucleic acid biosynthesis. Furthermore, a variety of phosphorus metabolism genes have been detected in the genomes of cultivated viruses that infect heterotrophic bacteria, in cyanophage genomes [53], [54], and in numerous viral metagenomes [11], [12], [55], suggesting that viruses have developed multiple strategies to address phosphate-limiting conditions. It is now well known that cyanophages carry photosynthesis (PS) related genes, including those associated with photosystems I and II
[14], [56]–[59], and the presence of viral PS genes has been documented in numerous marine metagenomic studies [7], [12], [13], [15], [18]. However, only a small proportion of VF sequences (0.42%) could be mapped to proteins involved in photosynthesis based on KEGG classification of protein clusters. A direct BLAST analysis of VF and LF sequences using PSI and PSII genes collected from cyanophage genomes (PSII) as well as Prochlorococcus and Synechococcus (PSI) (Table S6) did reveal the presence of additional viral PS genes. The PSII genes psbA and psbD (total = 6,877) far outnumbered the PSI genes previously noted in the marine environment, including psaA, psaB, psaC, psaD, psaE and psaK (total = 371) (Table S7). Viral PSI genes were also noted in previous analyses of GOS microbial metagenomic data, including 17 samples collected from the Indian Ocean [14], [15]. It is hypothesized that viral PSI components may facilitate electron donation from sources other than plastocyanin to the PSI of their hosts, thereby increasing ATP generation for replication [14]. The discrepancy in the abundance of viral PSII versus PSI genes in the Indian Ocean data suggests that cyanophage may benefit more from carrying PSII genes, which have been shown to supplement photosynthesis in culture [60], [61].
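
The sequence counts quoted in this section can be cross-checked with a few lines of arithmetic. The sketch below is illustrative only and is not part of the original analysis pipeline: the dictionary layout and the helper names `fraction_totals` and `vf_share` are assumptions introduced here, the LF count of 0 for the cbb3-type cytochrome C oxidase is inferred from the statement that it was detected only in the VF, and the numbers are transcribed from the text rather than recomputed from Tables S5 or S7.

```python
# Counts transcribed from the text; the data layout and names are hypothetical.
op_counts = {
    "NADH dehydrogenase I subunit": {"VF": 255, "LF": 7},
    "inorganic diphosphatase": {"VF": 53, "LF": 18},
    # Detected only in the VF, so the LF count is taken to be 0 (assumption).
    "cbb3-type cytochrome C oxidase subunit I": {"VF": 158, "LF": 0},
}

# Total BLAST hits per photosystem, as reported in the text.
ps_counts = {"PSII": 6877, "PSI": 371}


def fraction_totals(counts):
    """Sum per-enzyme counts within each size fraction (VF, LF)."""
    totals = {}
    for per_fraction in counts.values():
        for fraction, n in per_fraction.items():
            totals[fraction] = totals.get(fraction, 0) + n
    return totals


def vf_share(counts):
    """Fraction of each enzyme's sequences that came from the VF."""
    return {
        enzyme: c["VF"] / (c["VF"] + c["LF"])
        for enzyme, c in counts.items()
    }


op_totals = fraction_totals(op_counts)
shares = vf_share(op_counts)
psii_to_psi = ps_counts["PSII"] / ps_counts["PSI"]

print("OP sequences with EC numbers per fraction:", op_totals)
for enzyme, share in shares.items():
    print(f"{enzyme}: {share:.1%} of sequences from the VF")
print(f"PSII:PSI ratio = {psii_to_psi:.1f}")
```

Under these assumptions, the per-enzyme counts sum to the 466 VF and 25 LF sequences stated in the text, inorganic diphosphatase has the lowest VF share of the three OP enzymes (consistent with its more even distribution between fractions), and PSII hits outnumber PSI hits by roughly 18.5 to 1.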