Viruses NY-2A and AR158, members of the family Phycodnaviridae, genus Chlorovirus, infect the fresh water, unicellular, eukaryotic, chlorella-like green alga, Chlorella NC64A. The 368,683-bp genome of NY-2A and the 344,690-bp genome of AR158 are the two largest chlorella virus genomes sequenced to date; NY-2A contains 404 putative protein-encoding and 7 tRNA-encoding genes and AR158 contains 360 putative protein-encoding and 6 tRNA-encoding genes. The protein-encoding genes are almost evenly distributed on both strands, and intergenic space is minimal. Two of the NY-2A genes, those encoding the large subunit of ribonucleotide reductase and a superfamily II helicase, contain inteins. These are the first inteins to be detected in the chlorella viruses. Approximately 40% of the viral gene products resemble entries in the public databases, including some that are unexpected for a virus. These include GDP-D-mannose dehydratase, fucose synthase, aspartate transcarbamylase, Ca++ transporting ATPase and ubiquitin. Comparison of NY-2A and AR158 protein-encoding genes with those of the prototype chlorella virus PBCV-1 indicates that 85% of the genes are present in all three viruses.
Members and prospective members of the family Phycodnaviridae constitute a genetically diverse but morphologically similar group of viruses that have eukaryotic algal hosts from both fresh and marine waters. The phycodnaviruses have dsDNA genomes that range in size from 170 kb to 560 kb and the viruses have hundreds of protein-encoding genes (Dunigan et al., 2006; Wilson et al., 2005). The phycodnaviruses, together with the poxviruses, iridoviruses, asfarviruses, and the recently discovered 1.2-Mb Mimivirus, share a common evolutionary ancestor that may have arisen at the point of eukaryogenesis, 2 to 3 billion years ago (Iyer et al., 2006; Raoult et al., 2004; Villarreal and DeFilippis, 2000; Villarreal, 2005). These viruses share nine gene products and at least two of these viral families encode an additional 41 homologous gene products (Iyer et al., 2006). Collectively, these viruses are referred to as nucleocytoplasmic large DNA viruses (NCLDV) (Iyer et al., 2001).
The most studied phycodnaviruses are the chlorella viruses that belong to the genus Chlorovirus (Van Etten, 2003; Yamada et al., 2006). The chloroviruses infect certain fresh water, unicellular, eukaryotic, chlorella-like green algae, which normally exist as endosymbionts in various protists, such as Paramecium bursaria (Kawakami and Kawakami, 1978; Van Etten et al., 1982), Hydra viridis (Meints et al., 1981), and Acanthocystis turfacea (Bubeck and Pfitzner, 2005). The prototype chlorella virus, Paramecium bursaria chlorella virus (PBCV-1), has a 331-kb genome that contains 366 putative protein-encoding genes and a polycistronic gene that encodes 11 tRNAs. PBCV-1 and the two viruses described in this report, NY-2A and AR158, infect Chlorella NC64A (NC64A viruses), an endosymbiont of P. bursaria that was originally isolated in North America. Other viruses infect Chlorella Pbi (Pbi viruses), an endosymbiont of P. bursaria that was isolated in Europe.
To investigate the diversity of the chlorella viruses, we are sequencing the genomes of several additional family members. The current manuscript describes the sequencing and annotation of the 369-kb genome from virus NY-2A and the 345-kb genome from virus AR158. NY-2A was chosen for sequencing for two reasons. First, it has the largest genome of the 36 partially-characterized Chlorella NC64A viruses. Second, its genome is heavily methylated relative to that of the prototype virus PBCV-1 [45% of the cytosines are 5-methylcytosine (5mC) and 37% of the adenines are N6-methyladenine (6mA) versus 1.9% 5mC, 1.5% 6mA]. Virus AR158 was chosen because it was the only NC64A virus that appeared to lack a gene encoding a 94 amino acid potassium ion channel protein (Kcv) that is believed to be involved in viral infection (e.g. Frohns et al., 2006). As reported here, AR158 encodes a truncated 33 amino acid K+ channel protein. The following manuscript (Fitzgerald et al., 2006) describes the sequence and annotation of two viruses that infect Chlorella Pbi.
As part of the chlorella virus genome sequencing effort, a project website has been created at http://greengene.uml.edu. This site contains the genomic DNA sequence assemblies as well as the predicted amino acid sequences of all virus-encoded ORFs and is viewable in text format or through a graphical genome browser. This database also contains the complete annotation for each chlorella virus-encoded ORF. The supplemental data files referenced below are also available at this site.
The NY-2A and AR158 genomes were assembled into contiguous sequences of 368,683 bp and 344,690 bp (Table 1), respectively, which agree with their predicted sizes determined by pulsed-field gel electrophoresis (unpublished results). Since the presumed hairpin termini were not sequenced, the leftmost nucleotide in the assembled sequences was designated 1.
To orient the NY-2A and AR158 genomes relative to the prototype virus PBCV-1, plots of PBCV-1 proteins and either the NY-2A or the AR158 proteins were compared. These alignments reveal a high degree of gene co-linearity between NY-2A, AR158, and PBCV-1 (Fig. 1). The average G+C content of the NY-2A and AR158 genomes is 40.7%, a concentration similar to the 40.0% G+C content of PBCV-1 (Van Etten et al., 1985).
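The G+C content quoted above is a simple base-composition statistic. As a minimal sketch (the function name `gc_content` and the decision to skip ambiguous bases such as N are assumptions made for illustration, not details taken from this study), it could be computed from an assembled sequence as follows:

```python
def gc_content(seq: str) -> float:
    """Return the G+C content of a DNA sequence as a percentage.

    Assumption: ambiguous bases (anything other than A, C, G, T) are
    ignored rather than counted in the denominator.
    """
    seq = seq.upper()
    counted = [base for base in seq if base in "ACGT"]
    if not counted:
        raise ValueError("sequence contains no unambiguous bases")
    gc = sum(1 for base in counted if base in "GC")
    return 100.0 * gc / len(counted)


# Toy example: 4 of 10 unambiguous bases are G or C.
print(gc_content("GGCCAATTAA"))
```

Applied to a full genome assembly, the same function would give the genome-wide average; a sliding-window variant would be needed to look for local compositional shifts.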
A putative protein-encoding region, or open-reading frame (ORF), was defined as a continuous stretch of DNA that translates into a polypeptide that is initiated by an ATG translation start codon and extends for 64 or more additional codons. Using this criterion, 886 ORFs were identified in the 369-kb NY-2A genome and 815 ORFs were identified in the 345-kb AR158 genome. The ORF names were based on three criteria. First, the NY-2A ORF names begin with either a “B” for a major ORF (predicted to be a protein-encoding gene) or a “b” for a minor ORF (not considered a true protein-encoding gene). The names for the AR158 ORFs begin with either a “C” for a major ORF or a “c” for a minor ORF. Second, the ORFs were numbered consecutively in the order in which they appeared in the genome after alignment with the PBCV-1 genome. Third, the letter R or L following the ORF number indicates that the transcript runs either left-to-right or right-to-left, respectively. The letters “B” or “b” were chosen to name the NY-2A ORFs, which is the second NC64A virus genome sequenced, and “C” or “c” was chosen to name the AR158 ORFs, which is the third NC64A virus genome sequenced, thus avoiding confusion between the different chlorella viruses. The letters distinguish these virus ORFs from PBCV-1 ORFs (designated with an “A” or “a”).
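The naming convention above is regular enough to be parsed mechanically. The following sketch is illustrative only: the names `parse_orf_name` and `OrfName` are hypothetical, and the optional lowercase suffix before the strand letter is an assumption added to accommodate names such as C305aR that appear later in the text. Gene names written entirely in lowercase (e.g. b249r) are deliberately rejected, because that styling cannot be distinguished from a minor ORF by the prefix alone.

```python
import re
from typing import NamedTuple

# Prefix letter -> virus, as stated in the text.
VIRUS_BY_PREFIX = {"A": "PBCV-1", "B": "NY-2A", "C": "AR158"}

# prefix letter, ORF number, optional lowercase suffix (assumption), strand letter
ORF_NAME = re.compile(r"^([ABCabc])(\d+)([a-z]?)([RL])$")


class OrfName(NamedTuple):
    virus: str
    major: bool       # True for an uppercase prefix (major ORF)
    number: int
    suffix: str       # "" when absent
    direction: str    # "left-to-right" or "right-to-left"


def parse_orf_name(name: str) -> OrfName:
    """Split an ORF name such as 'B832R' into its components."""
    match = ORF_NAME.match(name)
    if match is None:
        raise ValueError(f"not a recognised ORF name: {name!r}")
    prefix, number, suffix, strand = match.groups()
    return OrfName(
        virus=VIRUS_BY_PREFIX[prefix.upper()],
        major=prefix.isupper(),
        number=int(number),
        suffix=suffix,
        direction="left-to-right" if strand == "R" else "right-to-left",
    )


print(parse_orf_name("B832R"))
print(parse_orf_name("c064L"))
```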
The 886 NY-2A ORFs and 815 AR158 ORFs were classified into major or minor ORFs based on the following criteria. When an ORF, of either the same or opposite polarity, resided within or significantly overlapped another ORF, the larger ORF was classified as a major ORF and the smaller ORFs were classified as minor. All of the ORFs were analyzed using the non-redundant, Pfam, and COG databases and ORFs predicted to encode a functional protein were classified as major. These conditions led to the prediction that 404 of the 886 NY-2A ORFs and 360 of the 815 AR158 ORFs probably encode proteins.
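The two rules above (the larger of two overlapping ORFs is major; an ORF with database support is major) can be sketched as a simple greedy procedure. This is not the procedure used in the study: the text does not quantify "significantly overlapped", so the 50% threshold, the longest-first ordering, the `has_db_hit` flag, and the example coordinates are all assumptions chosen for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Orf:
    name: str
    start: int               # 1-based, inclusive
    end: int                 # 1-based, inclusive
    has_db_hit: bool = False  # hypothetical flag: database support for a functional protein

    @property
    def length(self) -> int:
        return self.end - self.start + 1


def overlap_fraction(a: Orf, b: Orf) -> float:
    """Shared length divided by the length of the shorter ORF (strand is ignored)."""
    shared = min(a.end, b.end) - max(a.start, b.start) + 1
    return max(0, shared) / min(a.length, b.length)


def classify(orfs, min_overlap=0.5):
    """Return {name: 'major' | 'minor'} using the assumed overlap threshold."""
    labels = {}
    majors = []
    # Visit the longest ORFs first so the larger of two overlapping ORFs wins.
    for orf in sorted(orfs, key=lambda o: o.length, reverse=True):
        overlaps_major = any(overlap_fraction(orf, m) >= min_overlap for m in majors)
        if orf.has_db_hit or not overlaps_major:
            labels[orf.name] = "major"
            majors.append(orf)
        else:
            labels[orf.name] = "minor"
    return labels


# Hypothetical ORFs: x2 lies inside x1; x4 also lies inside x1 but has a database hit.
example = [
    Orf("x1", 1, 900),
    Orf("x2", 100, 400),
    Orf("x3", 850, 1500),
    Orf("x4", 200, 450, has_db_hit=True),
]
print(classify(example))
```

In practice, borderline cases would still need manual inspection, which is why the threshold is exposed as a parameter rather than fixed.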
The Intein Database and Registry (InBase) was used to identify two inteins within the NY-2A ORFs, which are the first inteins identified in the chlorella viruses. The ribonucleotide reductase large subunit (B832R) contains a 337 amino acid intein that resembles an intein named CIV RIR1 (E value = 5E-66) from Chilo iridescent virus (Amitai et al., 2004; Perler, 2002; Pietrokovski, 1998). A second 384 amino acid intein, Lpe Helicase (E value = 1E-31) from Listonella pelagia phage phiHSIC (Paul et al., 2005; Perler, 2002), exists in a putative helicase (B508R).
In addition, several introns exist in the NY-2A and AR158 genomes. The DNA polymerase genes (b249r and c230r) contain an identically located 86-nucleotide spliceosome-processed intron with 5’-AG/GUGAGU and 3’-UGCAG/UU splice site sequences, as well as a predicted branch point UCAC sequence (Grabherr et al., 1992). The DNA polymerase genes from 38 other NC64A viruses also have either an 86- or a 101-nucleotide intron at the same position (Zhang et al., 2001).
The pyrimidine dimer-specific glycosylase genes, b076l and c064l, have an identical 81-nucleotide spliceosome-processed intron that is also present in two other NC64A viruses (Sun et al., 2000). NY-2A contains three additional introns. A 402-nucleotide self-splicing group I intron is found in a putative transcription factor TFIIS gene (b175l); this intron is 99% identical to the intron found in PBCV-1 TFIIS (a125l) (Li et al., 1995). A related intron (75.6% identity to the TFIIS intron) also occurs in NY-2A gene b496l and AR158 gene c437l; the function of this gene product is unknown. This self-splicing intron occurs in one or more genes from 80 other NC64A viruses (Nishida et al., 1998; Yamada et al., 1994). Finally, a small intron (13 nucleotides) occurs in the NY-2A tRNATyr; this intron and tRNA are absent in AR158.
GCG software was used to determine several general characteristics and properties for each ORF, including the nucleotide composition of the ORF, the A+T content of the 50 nucleotides upstream of the ORF (the region likely to contain the promoter), the frame in which the putative protein is encoded, the number of amino acids in the encoded protein, the predicted protein molecular weight, and the isoelectric point. These properties are listed in Supplements 1 and 2. Figure 2 reports some general characteristics for both the NY-2A and AR158 major ORFs, including the relative orientation of the ORFs (Fig. 2A, B). The directions in which the ORFs are encoded are slightly skewed toward the reverse (~54%) orientation for both viruses. The average size of all the putative NY-2A proteins is 279 amino acids (Fig. 2C), while the average size of all the putative AR158 proteins is 287 amino acids (Fig. 2D); nearly 45% of the proteins are 65 to 200 amino acids long. The predicted pIs of the proteins are depicted in Figure 2E, F. Despite a trend for the proteins to have a pI in the 10–11 pH range, a peak also occurs at pH 4.5. Basic proteins are probably associated with the virion where they presumably help neutralize the negatively charged genomic DNA. However, the functions of the proteins that have pIs in the 4.5 range vary [e.g., an exonuclease (B214R and C200R), a SKP-1 protein (B068L and C056L), a sliding clamp processivity factor protein (PCNA) (B261L and C241L), and an ornithine/arginine decarboxylase (B278R and C256R)]. Figures 2G, H indicate the intergenic space between the major ORFs. Approximately 65% of the ORFs in both viruses are separated by less than 100 nucleotides.
Every ORF was compared with the non-redundant database at NCBI using the criteria described in the Materials and Methods section. The Pfam and the COG databases were used to identify conserved domains and proteins in the NY-2A and AR158 ORFs (Supplements 1 and 2). Gene maps of the NY-2A and AR158 genomes illustrate the location of the putative genes (Fig. 3) and some of the ORFs are listed by their predicted metabolic function (Table 2). Only a few NY-2A and AR158 gene products have been tested for activity (see below). However, we assume that any NY-2A and AR158 encoded proteins that have functional PBCV-1 homologs are also functional.
Eighty-four to ninety-eight percent of the major ORFs are homologous between any two of the three NC64A viruses, i.e. NY-2A, AR158 and PBCV-1. This finding suggests that the majority of major ORFs from the NC64A viruses are essential for virus replication in nature. The average amino acid identity between homologous proteins from PBCV-1 and either NY-2A or AR158 is 73%. There is an average of 87% amino acid identity between NY-2A and AR158 homologs. NY-2A and AR158 contain the 9 genes that are shared by all the NCLDV viruses (Iyer et al., 2001; Iyer et al., 2006).
NY-2A and AR158 have 13 ORFs that are involved in either DNA replication, recombination, or repair, such as δ-DNA polymerase (B249R and C230R), type-II DNA topoisomerase (B781L and C707L), superfamily III helicase (B623L and C562L), DNA primase (B633R and C573R), RNase H (B547R and C491R), exonuclease (B214R and C200R), ATP-dependent DNA ligase (B734R and C658R), two ORFs that resemble sliding clamp processivity factor (PCNA) proteins (B261L, B767L, C241L, and C694L), replication factor C protein (B571L and C519L), and pyrimidine dimer-DNA glycosylase (B076L and C064L) (Table 2).
The NY-2A and AR158 type-II DNA topoisomerases have approximately 40% amino acid identity with type-II topoisomerases from several eukaryotic organisms. The enzyme is ATP-dependent and functions by threading a double-stranded DNA segment through a transient double-strand break in the DNA (Roca, 1995). PBCV-1 encodes one of the smallest known type-II DNA topoisomerases (1061 amino acids) (Lavrukhin et al., 2000). The PBCV-1 enzyme cleaves dsDNAs approximately 30 times faster than the human type-II DNA topoisomerase (Fortune et al., 2001). The NY-2A type-II DNA topoisomerase is the same size as its PBCV-1 homolog (1061 amino acids) and has 90% amino acid identity to it. The AR158 type-II DNA topoisomerase has 1062 amino acids and has 89% identity to its PBCV-1 homolog; however, it has 98% amino acid identity to its NY-2A homolog.
Like PBCV-1, NY-2A and AR158 encode two proteins that resemble PCNA proteins. The NY-2A (B261L and B767L) and the AR158 (C241L and C694L) proteins are more similar to their homologs from other organisms than they are to each other. This finding suggests that the viral PCNA genes did not arise recently by gene duplication. PCNA interacts with proteins not only involved in DNA replication but also DNA repair and post-replicative processing, such as DNA methyltransferases and DNA transposases (Warbrick, 2000). Because the chlorella viruses encode proteins involved in both DNA repair and DNA methylation, the two PCNAs may serve different functions in their respective viral life cycles.
No recognizable RNA polymerase or RNA polymerase components have been detected in any of the chlorella viruses that have been sequenced, including NY-2A and AR158. This observation supports the idea that infectious viral DNAs are targeted to the nucleus and that host RNA polymerase(s) initiates viral transcription, possibly in conjunction with virion-packaged transcription factors. NY-2A and AR158 encode at least four putative transcription factor-like elements: TFIIB (B154L and C142L), TFIID (B743R and C669R), TFIIS (B175L and C167L), and VLTF2-type transcription factor (B647R and C586R). However, none of these proteins is packaged in the PBCV-1 virion (Dunigan et al., manuscript in preparation), and they are unlikely to be packaged in the NY-2A or AR158 virions. NY-2A and AR158 encode two proteins that are involved in creating an mRNA cap structure, an mRNA capping enzyme (B148R and C137R) and an RNA triphosphatase (B612R and C556R). NY-2A and AR158 also encode an RNase III enzyme (B628R and C568R) that is presumably involved in processing viral mRNAs and/or tRNAs.
In the immediate-early phase of infection, the host is reprogrammed to transcribe viral RNAs, which in PBCV-1 begins 5–10 min p.i. It is not known how this process occurs, but histone methylation may be involved in inhibiting host transcription. PBCV-1 encodes a 119-amino acid protein that contains a SET domain (named vSET) that di-methylates Lys27 in histone 3 (Manzur et al., 2003). vSET is packaged in the PBCV-1 virion and accumulating evidence indicates that vSET could be involved in repressing host transcription after PBCV-1 infection (Manzur et al., manuscript in preparation). NY-2A and AR158 each contain a vSET homolog (B813L and C731L). Furthermore, unlike PBCV-1, NY-2A and AR158 encode a second protein that contains a SET domain (B268L and C245L); B268L and C245L are 190 amino acids long and have 25% amino acid identity to the smaller vSET proteins. In addition to these histone methyltransferases, NY-2A and AR158 encode a putative SWI/SNF family helicase (B738L and C663L) and a SWI/SNF chromatin remodeling complex protein (B258R and C239R). Both proteins are also implicated in chromatin remodeling (Kim and Clark, 2002).
Finally, NY-2A and AR158, as well as all the chlorella viruses, encode a putative cytosine deaminase (B271R and C246R). This observation suggests that either some of the viral transcripts or host transcripts may undergo post-transcriptional editing (Gerber and Keller, 2001).
PBCV-1 was the first virus found to encode a translation elongation factor (EF) (Yamada et al., 1993). The PBCV-1 protein has about 45% amino acid identity to an EF-3 protein from fungi (Belfield and Tuite, 1993; Chakraburtty, 2001). The fungal protein stimulates EF-1 GTP-dependent binding of an aminoacyl-tRNA to the ribosome A site. Like fungal EF-3 proteins, the virus-encoded proteins have an ABC transporter family signature and two ATP/GTP-binding site motifs. AR158 encodes a 918 amino acid protein (C788L) that has 93% amino acid identity to PBCV-1 EF-3. However, NY-2A lacks an EF-3 encoding gene.
NY-2A and AR158 have genes that encode proteins involved in post-translational modifications, including prolyl-4-hydroxylase (B126R and C118R), protein kinases (see below), and glycosyltransferases (see below). NY-2A and AR158 also encode a protein disulfide isomerase (B611L and C554L), a SKP-1 protein (B068L and C056L) and a thiol oxidoreductase (B630R and C570R). Additionally, the two viruses encode proteins involved in protein degradation including a ubiquitin C-terminal hydrolase (B150L and C140L), a ring finger ubiquitin ligase (B645L and C584L), and two Zn metallopeptidases (B685L, B803L, C621L, and C726L). NY-2A is the first chlorella virus found to encode a 78 amino acid ubiquitin protein (B699L).
The NY-2A and AR158 genomes were analyzed for tRNAs using the tRNAscan-SE program (Lowe and Eddy, 1997). NY-2A is predicted to encode 7 tRNAs: 2 for Leu and 1 each for Arg, Asn, Lys, Tyr, and Val (Table 3). These 7 tRNAs are clustered in a region of the NY-2A genome, nucleotide sequence 194,698 to 195,554. AR158 is predicted to encode 6 tRNAs: 2 for Leu and 1 each for Arg, Asn, Ile, and Val (Table 3). These 6 tRNAs are also clustered in the AR158 genome, nucleotide sequence 172,099 to 172,780. Presumably, the tRNAs are transcribed as a large precursor RNA and processed via intermediates to mature RNAs as they are in chlorella virus CVK2 (Nishida et al., 1999). Only five of the eleven tRNAs encoded by PBCV-1 are found in both NY-2A and AR158; neither NY-2A nor AR158 encodes a unique tRNA. Although the orientation of the tRNA genes is the same in all three genomes, their order varies between the viruses (Table 3). None of the tRNAs have a CCA sequence at the 3' end of the acceptor stem. Typically, these three nucleotides are added post-transcriptionally.
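The overlap between the two tRNA complements listed above can be checked with a multiset comparison. The sketch below uses only the amino acid assignments given in this paragraph; the PBCV-1 complement is not enumerated here, so it is left out, and the variable names are illustrative.

```python
from collections import Counter

# tRNA complements as listed in the text (amino acid -> number of tRNA genes).
ny2a = Counter({"Leu": 2, "Arg": 1, "Asn": 1, "Lys": 1, "Tyr": 1, "Val": 1})
ar158 = Counter({"Leu": 2, "Arg": 1, "Asn": 1, "Ile": 1, "Val": 1})

shared = ny2a & ar158        # multiset intersection
only_ny2a = ny2a - ar158     # present in NY-2A but not AR158
only_ar158 = ar158 - ny2a    # present in AR158 but not NY-2A

print("NY-2A total:", sum(ny2a.values()))
print("AR158 total:", sum(ar158.values()))
print("shared:", dict(shared))
print("NY-2A only:", dict(only_ny2a))
print("AR158 only:", dict(only_ar158))
```

The five shared tRNAs (two Leu, Arg, Asn, Val) are consistent with the statement that five tRNAs are common to both viruses, although this comparison by amino acid alone does not confirm that the anticodons match.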
One tRNA from NY-2A, tRNATyr, is predicted to contain a 13-nucleotide intron. The insertion of a small intron in the tyrosine tRNA (anti-codon GTA) also occurs in PBCV-1; however, the intron and tRNA are absent in AR158. Codon usage analyses of viral-encoded proteins indicate a strong correlation between the abundance of the viral-encoded tRNAs and their usage in viral proteins.
NY-2A and AR158 encode eight enzymes involved in nucleotide metabolism. These enzymes are important because the DNA concentration in viral-infected cells increases at least four-fold following infection (Van Etten et al., 1984). Therefore, large quantities of dNTPs must be synthesized to support viral DNA replication. NY-2A and AR158 each encode the small (B641R and C579R) and large (B832R and C748R) subunits of ribonucleotide reductase, aspartate transcarbamylase (B222R and C204R), dUTP pyrophosphatase (B741L and C667L), deoxycytidylate (dCMP) deaminase (B795R and C718R), glutaredoxin (B592L and C538L), thioredoxin (B581L and C528L) and thymidylate synthase X (B865R and C793R).
PBCV-1 was the first virus found to encode a functional aspartate transcarbamylase, the key regulatory enzyme in the de novo biosynthesis of pyrimidines (Landstein et al., 1996). NY-2A and AR158 each encode an aspartate transcarbamylase that has 84–85% amino acid identity to the PBCV-1 homolog. The NY-2A and AR158 aspartate transcarbamylases have 98% amino acid identity to each other.
Two NY-2A- and AR158-encoded enzymes, dUTP pyrophosphatase and dCMP deaminase, produce dUMP, the substrate for thymidylate synthetase. The chlorella viruses, including NY-2A and AR158, lack a traditional thymidylate synthetase A. Instead, they encode a protein that is a member of a new family of flavin-dependent thymidylate synthetases called ThyX (Graziani et al., 2004; Myllykallio et al., 2002).
Both NY-2A and AR158 encode several Ser/Thr protein kinases (Table 2) and a protein that resembles a dual-specificity phosphatase (B430L and C378L). The large number of viral-encoded proteins involved in phosphorylation/dephosphorylation suggests that they are involved in one or more signal transduction pathways that are important for virus replication. AR158, but not NY-2A or PBCV-1, also encodes an 870 amino acid Ca2+ transporting ATPase (C785L).
The chlorella viruses are the first viruses to encode K+ channel proteins (called Kcv) and Kcv genes from 40 NC64A viruses, including NY-2A (B336R), have been expressed successfully in Xenopus oocytes (Kang et al., 2004; Plugge et al., 2000). AR158 is unusual in that it only has a 33 amino acid Kcv (C305aR) homolog. The AR158 Kcv homolog lacks the N-terminal portion of the protein including the predicted transmembrane 1 and pore helix domains (Fig. 4). The truncated protein has 100% amino acid identity to the C-terminal portion of NY-2A Kcv.
PBCV-1, NY-2A, and AR158 encode several proteins with high identities to enzymes involved in either manipulating sugars, synthesizing polysaccharides, or transferring sugars to proteins. Two of the viral encoded enzymes, GDP-D-mannose dehydratase (GMD) (B163R and C155R) and fucose synthase (B395L and C344L), comprise a three-step pathway that converts GDP-D-mannose to GDP-L-fucose. Unexpectedly, the PBCV-1 GMD differs from other GMDs because in addition to the dehydratase activity, the protein also has a strong stereospecific NADPH-dependent reductase activity that produces GDP-D-rhamnose (Tonetti et al., 2003). Both fucose and rhamnose are present in the glycans attached to the PBCV-1 major capsid protein (Nandhagopal et al., 2002; Wang et al., 1993).
NY-2A and AR158 also encode a glucosamine synthetase (B143R and C132R). NY-2A encodes one UDP-glucose dehydrogenase (B465R), whereas AR158 has two genes (C413R and C729L) that encode the enzyme; thus AR158 is similar to the chlorella virus CVK2 (Ali et al., 2005). Unlike PBCV-1, neither NY-2A nor AR158 has a hyaluronan synthase homolog. Instead, NY-2A and AR158 each encode two chitin synthases (B139R, B472R, C128R, and C418R) like some other NC64A viruses (Ali et al., 2005). Chitin, rather than hyaluronan, is formed on the external surface of cells infected with viruses that encode chitin synthases (Ali et al., 2005; Kawasaki et al., 2002).
NY-2A encodes three (B159R, B618R, and B736L) and AR158 encodes four (C150R, C265R, C559R, and C661L) glycosyltransferases, which are probably involved in glycosylation of the virus major capsid proteins (Graves et al., 2001). NY-2A and AR158 also encode a putative polysaccharide deacetylase (B469L and C415L) which is absent in PBCV-1.
NY-2A and AR158 encode several proteins involved in lipid metabolism including lipoprotein lipase (B550R and C494R), lysophospholipase (B354L and C315L), N-acetyl-transferase (B853L and C767L), glycerophosphoryl diesterase (B075L and C063L), and patatin phospholipase (B226L and C208L).
NY-2A and AR158 encode five proteins that may be involved in degrading Chlorella NC64A cell walls either during virus infection or virus release. These proteins include a chitinase (B239R and C220R), a chitosanase (B393L and C342L), a β-1,3-glucanase (B137L and C126L), and two polysaccharide lyases that cleave chains of either β- or α-1,4-linked glucuronic acids (B288L, B468R, C263L, and C414R) (Sugimoto et al., 2004). The larger polyglucuronic acid lyases (B288L and C263L) are 306 amino acids in length and have 72% amino acid identity to a PBCV-1 A215L homolog. In contrast, the smaller (251 amino acids) polyglucuronic acid lyases (B468R and C414R) are 100% identical to each other but only have 36% amino acid identity to the PBCV-1 A215L homolog.
Chlorella viruses contain different concentrations of 5mC and 6mA in their genomes (Van Etten et al., 1991). Therefore, it is not surprising that these viruses encode multiple 5mC and 6mA DNA methyltransferases (MTase). The NY-2A genome is heavily methylated (45% 5mC and 37% 6mA). The level of methylation in the AR158 genome is unknown. NY-2A encodes 18 DNA MTases (Table 4): eleven are 6mA MTases and 7 are 5mC MTases. AR158 encodes 16 MTases (Table 4): nine are 6mA MTases and 7 are 5mC MTases.
The DNA sequences methylated by some of the NY-2A MTases are known (Table 4) (Chan et al., 2006; Zhang et al., 1998). NY-2A also encodes two companion site-specific endonucleases, CviQI (B542R) and Nt.CviQII (B361R). CviQI creates double-stranded breaks 5' to the T in the palindromic sequence G/TAC (Xia et al., 1987), whereas Nt.CviQII makes single-stranded breaks 5' to the A in R/AG sequences (Chan et al., 2006; Zhang et al., 1998). Both endonucleases are inhibited when 6mA is present in their cleavage sites.
Nothing is known about the specificity of the AR158 DNA MTases and possible site-specific endonucleases. However, all 16 of the AR158 MTases are homologous to one of the NY-2A MTases and so one can predict some of the sequences they methylate. For example, ORF C486R has 100% amino acid identity with the NY-2A CviQI restriction endonuclease and so it is predicted to cleave G/TAC sequences. In contrast, the NY-2A nicking enzyme Nt.CviQII only has 37% amino acid identity with AR158 ORF C618L. However, C618L has 94% amino acid identity with another chlorella virus nicking enzyme, Nt.CviPII, which cleaves at /CCD sequences (Chan et al., 2004; Xia et al., 1988). Therefore, C618L is predicted to encode a nicking endonuclease, probably cleaving at /CCD sequences.
NY-2A has 6 ORFs that resemble bacterial transposases and AR158 has 4 such ORFs (Table 2). Three of the NY-2A transposases have internal resolvases whereas only one AR158 transposase has an internal resolvase. NY-2A and AR158 also encode an ORF that resembles a Tlr 6Fp DNA mobile protein (B168R and C160R). This protein is a member of a family of genetic elements limited to Tetrahymena thermophila (Wuitschick et al., 2002).
In addition to the transposases and resolvases, NY-2A contains 33 ORFs and AR158 contains 21 ORFs with motifs found in homing endonucleases (Table 2). Homing endonucleases are rare DNA-cleaving enzymes typically encoded by introns and inteins. These endonucleases are classified into four families (Belfort and Roberts, 1997). Fifteen of the NY-2A ORFs are members of the GIY-YIG family, and eighteen are members of the HNH family. The AR158 genome encodes 12 ORFs that are members of the GIY-YIG family and 9 that are members of the HNH family. Thus, collectively, the NY-2A and AR158 viruses have many protein-encoding genes that could facilitate DNA rearrangements either within or between viruses, and possibly with their hosts. These genes are distributed throughout both virus genomes. However, it is not known if any of these virus-encoded transposases, resolvases, and homing endonucleases, including ones encoded by PBCV-1, are functional.
PBCV-1 was the first virus reported to encode polyamine biosynthetic enzymes, including two pathways to synthesize putrescine. All four PBCV-1 genes, which encode functional enzymes (Kaiser et al., 1999; Morehead et al., 2002; Baumann et al., manuscript in preparation), are also present in NY-2A and AR158. These enzymes are ornithine/arginine decarboxylase (ODC) (B278R and C256R), agmatine iminohydrolase (B844R and C759R), N-carbamoylputrescine amidohydrolase (B116R and C104R), and homospermidine synthase (B305R and C286R). ODC catalyzes the first and rate-limiting step in polyamine biosynthesis, the decarboxylation of ornithine to putrescine (Davis et al., 1992). The PBCV-1 ODC is the smallest ODC characterized to date (372 amino acids) (Morehead et al., 2002). Unexpectedly, the PBCV-1 enzyme decarboxylates arginine more efficiently than ornithine (Shah et al., 2004). NY-2A and AR158 each encode a 372 codon ODC, with 83–86% amino acid identity to its PBCV-1 homolog. The product of arginine decarboxylation is agmatine; agmatine iminohydrolase and N-carbamoylputrescine amidohydrolase convert agmatine to putrescine.
Homospermidine synthase synthesizes homospermidine, a rare polyamine, from two molecules of putrescine (Kaiser et al., 1999). The NY-2A and AR158 homospermidine synthase homologs have 93% amino acid identity with the PBCV-1 enzyme.
Polyamines putrescine, spermidine, and spermine are common in cells and also are structural components of many viruses, where they help to neutralize viral nucleic acids (Tyms et al., 1990). PBCV-1 virions, as well as uninfected and virus-infected Chlorella NC64A cells, contain putrescine, cadaverine, spermidine, and homospermidine. However, it is unlikely that these polyamines are important in neutralizing the chlorella virus DNA because the number of polyamine molecules per virion could only neutralize ~0.2% of the DNA phosphate residues (Kaiser et al., 1999).
NY-2A and AR158 encode two additional enzymes involved in amine metabolism, monoamine oxidase (B289L and C264L) and histidine decarboxylase (B796L and C719L). The finding that PBCV-1, NY-2A and AR158 each encode these six proteins suggests that they must serve some important role(s) in the replication of the viruses.
NY-2A and AR158 also encode several other putative proteins, including an ABC transporter protein (B606L and C551L), an amidase (B371L and C327L), and an AAA+ class ATPase (B073L and C061L). Finally, NY-2A and AR158 each encode an FkbM methyltransferase (B183L and C173L) that has some similarity to an enzyme from several Mycobacterium species that methylates rhamnose (Jeevarajah et al., 2002).
NY-2A ORF B585L and AR158 ORF C532L have the highest amino acid identity (96%) and are the same size as the PBCV-1 major capsid protein, A430L. Therefore, we assume that B585L and C532L are the NY-2A and AR158 major capsid proteins, respectively. However, NY-2A and AR158 have additional ORFs that have significant amino acid sequence identity to the PBCV-1 major capsid protein. For example, B617L and C558L have 94% amino acid identity and are the same size as A430L. Other NY-2A and AR158 ORFs, including B059R, B529R, B748L, B825L, C048R, C470R, C675L, and C741L, have 33 to 41% amino acid identity to A430L.
A total of 131 of the NY-2A ORFs resemble 1 or more other NY-2A ORFs based on a blastp search with an E-value of less than 1E-10, suggesting that they might be either gene families or gene duplications. However, this number is somewhat misleading since some of these ORFs are grouped as families because they contain a common conserved domain, e.g. ankyrin repeats or a PAPK repeat, even though the amino acid sequence similarity of the rest of the protein is small. A total of 13 families have two members, 4 families have three members, 3 families have four members, 4 families have five members, 2 families have six members, 2 families have seven members, 1 family has eight members, 1 family has ten members, and 1 family has seventeen members.
A similar analysis indicates that 125 of the AR158 ORFs resemble 1 or more other AR158 ORFs. A total of 17 families have two members, 1 family has four members, 3 families have five members, 1 family has six members, 3 families have seven members, 1 family has nine members, 1 family has twelve members, and 1 family has twenty-four members.
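The family-size distributions above can be cross-checked against the stated totals with a short tally. The sketch below only restates the counts given in the two preceding paragraphs; the dictionary and function names are illustrative.

```python
# family size -> number of families of that size, as listed in the text
ny2a_families = {2: 13, 3: 4, 4: 3, 5: 4, 6: 2, 7: 2, 8: 1, 10: 1, 17: 1}
ar158_families = {2: 17, 4: 1, 5: 3, 6: 1, 7: 3, 9: 1, 12: 1, 24: 1}


def orfs_in_families(size_to_count: dict) -> int:
    """Total number of ORFs that belong to some family."""
    return sum(size * count for size, count in size_to_count.items())


def family_count(size_to_count: dict) -> int:
    """Total number of families."""
    return sum(size_to_count.values())


for virus, families in (("NY-2A", ny2a_families), ("AR158", ar158_families)):
    print(virus, family_count(families), "families,", orfs_in_families(families), "ORFs")
```

The tallies reproduce the totals of 131 NY-2A ORFs and 125 AR158 ORFs, distributed across 31 and 28 families, respectively.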
The 368,683-bp NY-2A genome and the 344,690-bp AR158 genome, the largest chlorella virus genomes sequenced to date, are predicted to encode 404 and 360 proteins as well as 7 and 6 tRNAs, respectively. The putative protein-encoding genes are relatively evenly distributed on both strands and intergenic space is minimal. Approximately 40% of the gene products have been identified; some have prokaryotic characteristics whereas others resemble eukaryotic proteins. Approximately 85% of the NY-2A, AR158 and PBCV-1 protein-encoding genes are homologous, suggesting that these proteins are important in NC64A virus replication. However, there are some interesting exceptions in which a gene is present in only one virus, e.g. ubiquitin is encoded only by NY-2A, Ca++ transporting ATPase is encoded only by AR158, and Cu/Zn superoxide dismutase is encoded only by PBCV-1. Consequently, the total number of genes in the chlorella virus gene pool exceeds that of any single isolate.
Plaque-forming viruses NY-2A and AR158 were isolated from fresh-water samples collected in New York state (August, 1984) and Buenos Aires, Argentina (August, 1997), respectively. The NY-2A and AR158 host, Chlorella NC64A, was grown on MBBM medium (Van Etten et al., 1983). The viruses were produced and purified, and the viral DNAs were isolated, using methods and protocols developed for PBCV-1 (Van Etten et al., 1981; Van Etten et al., 1983). Both DNAs were sequenced to 8-fold coverage and assembled at The Institute for Genomic Research (TIGR).
A potential protein-encoding region, or ORF, was defined as a continuous stretch of DNA that translated into a polypeptide initiated by an ATG translation start codon and extended for 64 or more codons using the standard genetic code. The ORF Finder program (http://bioinformatics.org/sms/orf_find.html) was used to identify all potential ORFs that met this criterion. The ORFs were numbered consecutively starting at the beginning of the genome (as determined by alignment with the PBCV-1 genome). The letter R or L following the number indicates that the orientation of the putative ORF is either left-to-right or right-to-left, respectively.
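The ORF definition above can be illustrated with a minimal sketch. This is not the ORF Finder program itself. It assumes that the 64-codon minimum counts the ATG through the last sense codon (the stop codon is excluded), that an ORF begins at the first ATG following a stop in the same frame, and that ORFs running off the end of the sequence without a stop are ignored. The function names and the 0-based, half-open coordinates are illustrative.

```python
STOPS = {"TAA", "TAG", "TGA"}  # stop codons of the standard genetic code
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of an uppercase DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def orfs_in_strand(seq, min_codons):
    """Yield (start, end) for ORFs in the three frames of one strand.

    Coordinates are 0-based and half-open on the strand given, and
    `end` includes the stop codon.
    """
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i  # first ATG since the previous stop
            elif start is not None and codon in STOPS:
                # Assumption: length counts sense codons only (ATG included).
                if (i - start) // 3 >= min_codons:
                    yield start, i + 3
                start = None


def find_orfs(genome, min_codons=64):
    """Return sorted (start, end, orientation) tuples in forward coordinates.

    Orientation is "R" (left-to-right) or "L" (right-to-left).
    """
    genome = genome.upper()
    n = len(genome)
    found = [(s, e, "R") for s, e in orfs_in_strand(genome, min_codons)]
    for s, e in orfs_in_strand(reverse_complement(genome), min_codons):
        found.append((n - e, n - s, "L"))  # map back to the forward strand
    return sorted(found)


# Synthetic example: ATG + 63 alanine codons + stop = 64 sense codons.
demo = "ATG" + "GCT" * 63 + "TAA"
print(find_orfs(demo))                      # [(0, 195, 'R')]
print(find_orfs(reverse_complement(demo)))  # [(0, 195, 'L')]
```

Sorting by forward-strand coordinate mirrors the idea of numbering ORFs consecutively from the beginning of the genome, although the exact numbering rule used for the viral ORFs is not specified beyond that.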
Dot plots of the virus major ORFs were created to determine the orientation of the NY-2A and AR158 genomes relative to the PBCV-1 genome. Every major ORF was individually plotted against the PBCV-1 major ORFs using blastp (protein vs. protein). Similarities between the two ORFs with E-values <10⁻³ are presented. Putative tRNA genes were identified using the tRNAscan-SE program developed at Washington University School of Medicine (Lowe and Eddy, 1997). Gene families were identified when a major ORF had an E-value of less than 10⁻¹⁰ to another ORF within the same genome.
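The gene-family criterion can be sketched as a grouping step over pairwise blastp hits within one genome. The text does not state how families with more than two members were assembled, so the single-linkage grouping below (ORFs connected through any chain of qualifying hits form one family) is an assumption, as are the function name, the `(query, subject, evalue)` input layout, and the placeholder ORF names.

```python
from collections import defaultdict


def group_families(hits, max_evalue=1e-10):
    """Group ORFs into families by single linkage (an assumption).

    hits: iterable of (query_id, subject_id, evalue) tuples from an
    all-against-all protein comparison within one genome.
    Returns a list of families (sorted lists of ORF ids), largest first.
    """
    parent = {}

    def find(x):
        # Union-find lookup with path halving.
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for query, subject, evalue in hits:
        # Ignore self-hits and hits that miss the E-value cutoff.
        if query != subject and evalue < max_evalue:
            parent[find(query)] = find(subject)

    families = defaultdict(set)
    for orf in list(parent):
        families[find(orf)].add(orf)
    return sorted(
        (sorted(members) for members in families.values()),
        key=lambda members: (-len(members), members),
    )


# Hypothetical hits; the ORF names are placeholders, not viral ORFs.
example_hits = [
    ("orfA", "orfB", 1e-30),
    ("orfB", "orfC", 1e-12),
    ("orfD", "orfE", 1e-5),   # fails the cutoff
    ("orfF", "orfF", 0.0),    # self-hit, ignored
    ("orfG", "orfH", 1e-11),
]
print(group_families(example_hits))
# [['orfA', 'orfB', 'orfC'], ['orfG', 'orfH']]
```

Single linkage is the most permissive choice: two ORFs that share only a common repeat domain with a third ORF end up in the same family, which is one way the family counts can be inflated by conserved domains.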
Each ORF identified was used in a search for homologs using the protein-protein BLAST (blastp) program (Altschul et al., 1990) against the non-redundant (NR) protein databases at NCBI. Searches of the NR database used the BLOSUM62 scoring matrix. Each putative ORF was scanned for potential functional attributes using Pfam version 18.0 (Finn et al., 2006). Every identified ORF was additionally scanned to determine whether it belonged to a particular cluster of orthologous groups (COG). In each of the analyses the top 10 results were recorded regardless of the E-values.
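Recording the top 10 results per ORF regardless of E-value can be sketched as follows. The sketch assumes the hits are available in the standard 12-column tab-delimited BLAST layout (query id, subject id, percent identity, alignment length, mismatches, gap opens, query start, query end, subject start, subject end, E-value, bit score) and that "top" means highest bit score. Neither the file format nor the ranking key is stated in the text, and the function name is illustrative.

```python
import csv
import io
from collections import defaultdict


def top_hits(tabular_text, n=10):
    """Return {query_id: [(subject_id, evalue, bitscore), ...]}.

    Keeps the n best hits per query by bit score and applies no
    E-value cutoff. Assumes 12-column tab-delimited BLAST output.
    """
    per_query = defaultdict(list)
    for row in csv.reader(io.StringIO(tabular_text), delimiter="\t"):
        if not row or row[0].startswith("#"):
            continue  # skip blank lines and comment lines
        query, subject = row[0], row[1]
        evalue, bitscore = float(row[10]), float(row[11])
        per_query[query].append((subject, evalue, bitscore))
    return {
        query: sorted(hits, key=lambda hit: -hit[2])[:n]
        for query, hits in per_query.items()
    }
```

Because no E-value filter is applied, weak matches are retained alongside strong ones, so the recorded hits still need to be judged individually before a function is assigned.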
The NY-2A and AR158 DNA sequences have been deposited in the GenBank database (accession numbers DQ491002 and DQ491003, respectively) and the sequences can also be found at http://greengene.uml.edu.
We thank James Gurnon for preparing the virus DNAs and Shmuel Pietrokovski for identifying the inteins in NY-2A. This investigation was supported in part by National Science Foundation grant EF-0333197 (MG and JVE), by National Institutes of Health grant GM32441 (JVE) and by the Center of Biomedical Research Excellence program of the National Center for Research Resources Grant P20-RR15635 (JVE).