Heterosigma akashiwo virus (HaV) is a large double-stranded DNA virus infecting the single-cell bloom-forming raphidophyte (golden brown alga) H. akashiwo. A molecular phylogenetic sequence analysis of HaV DNA polymerase showed that it forms a sister group with Phycodnaviridae algal viruses. All 10 examined HaV strains, which had distinct intraspecies host specificities, included an intein (protein intron) in their DNA polymerase genes. The 232-amino-acid inteins differed from each other by no more than a single nucleotide change. All inteins were present at the same conserved position, coding for an active-site motif, which is also occupied by inteins in mimivirus (a very large double-stranded DNA virus of amoebae) and in several archaeal DNA polymerase genes. The HaV intein is closely related to the mimivirus intein, and both apparently form a monophyletic group with the archaeal inteins. These observations suggest the occurrence of horizontal transfers of inteins between viruses of different families and between archaea and viruses and reveal that viruses might be reservoirs and intermediates in horizontal transmissions of inteins. The homing endonuclease domain of the HaV intein alleles is mostly deleted. The mechanism keeping their sequences basically identical in HaV strains specific for different hosts is as yet unknown. One possibility is that rapid and local changes in the HaV genome change its host specificity. This is the first report of inteins found in viruses infecting eukaryotic algae.
Heterosigma akashiwo virus (HaV) is a large double-stranded DNA (dsDNA) virus that infects the single-cell raphidophyte (golden brown alga) H. akashiwo (28). This host organism is a typical, harmful, bloom-forming phytoplankton (20). HaV specifically infects this microalga and has no other known hosts. Intensive field surveys have shown that (i) HaV infection influences both the total abundance and the clonal composition of host blooms, with infection being one of the significant factors terminating H. akashiwo blooms (26, 47, 50), and (ii) distinct types of virus and host coexist in natural waters, with HaV strains infecting specific host types and with different host types being sensitive to infections by certain HaV strains (26, 47, 50). In contrast, only scant HaV genomic data are currently available (27); a sequence fragment from a putative ATPase gene was determined and shown to have some similarity to that of the typical algal virus Paramecium bursaria Chlorella virus 1 (PBCV-1) (54). Indeed, HaV has some properties in common with the virus family Phycodnaviridae (52), i.e., a large dsDNA genome, a large capsid size (>100 nm in diameter), a polyhedral morphology, and no obvious external membrane or tails. Genomic data would permit the study of HaV's phylogeny, its molecular biology, and possibly the factors governing its interactions with its raphidophyte host.
Eukaryotes, prokaryotes, and some viruses include similar B-family DNA polymerase genes (55). B-family DNA polymerase is one of the few conserved genes consistently present in dsDNA viruses. These genes are thus suitable for studying the viruses' phylogenetic relationships to one another (e.g., see references 6, 8, 9, 43, and 55). Several Phycodnaviridae genera (Chlorovirus, Prasinovirus, Prymnesiovirus, and Phaeovirus) were established on the basis of full or fragmentary DNA polymerase sequences, morphological features, and propagation characteristics (52).
During genomic sequencing of HaV, we identified its DNA polymerase gene (as well as several conserved viral genes) (data not shown). The DNA polymerase gene was found to include an intein element. Inteins are inserted in the coding regions of other proteins, are translated with them, and then are autocatalytically spliced out to form the mature product (35). Inteins are selfish genetic elements, as they are not known to confer any advantage to their host proteins and species. They survive by several strategies (16, 40). They have a negligible impact on the fitness of their host genes due to the efficient autoprocessive removal of their encoded protein domains from their host precursor proteins. They are integrated in conserved positions of essential genes, where they are difficult to remove without compromising their host genes. Finally, most inteins include an endonuclease (EN) domain that enables them to mediate the insertion of their gene into unoccupied integration points (homing sites) (4).
In the present study, we (i) characterize the host specificities of 10 HaV strains, including measurements of the HaV genome size and an examination of the HaV DNA polymerase gene sequence and its transcription; (ii) analyze the phylogeny of HaV by comparing its DNA polymerase with those of other viruses; (iii) describe the features and phylogeny of the HaV DNA polymerase intein; (iv) assess the implications of the diversity of this intein among HaV strains with different intraspecies host ranges; and (v) discuss possible roles of viruses in intein dispersion.
Ten HaV strains and six Heterosigma akashiwo strains were used for the present experiments (Table 1), all of which were free from bacterial contamination. Among them, HaV01 was the only strain isolated from Unoshima Port, Fukuoka Prefecture, Japan. The other virus strains and all host strains were isolated from Hiroshima Bay, Japan. Algal cultures were grown in modified SWM3 medium enriched with 2 nM Na2SeO3 (21, 22), with a 12-h-12-h light-dark cycle of 130 to 150 μmol photons m−2 s−1 generated with cool white fluorescent illumination at 20°C. Because HaV strains have strain-specific host ranges (29), each virus strain was propagated by using a suitable host strain (Table 1). In order to verify the differences in host specificities of HaV strains, we conducted a cross-reactivity test; briefly, an aliquot (800 μl) of vigorously growing host culture was challenged with a fresh suspension (50 μl) of each virus strain and monitored for lytic activity by optical microscopy during incubation under the conditions described above. Cultures that showed no symptoms of lysis for 2 weeks were considered unsuitable hosts for the virus strain.
Twenty liters of a vigorously growing culture of H. akashiwo H93616 was inoculated with HaV01 at a multiplicity of infection of >1 and lysed under the conditions described above. The resulting lysate was concentrated 50-fold by use of a 100,000-Da-cutoff membrane (Amicon DC10L; Millipore). It was centrifuged at 4,500 × g at 4°C for 10 min, and the supernatant was sequentially passed through 8.0-μm- and 0.8-μm-pore-size filters (Nuclepore) to remove cellular debris. The filtered supernatants were ultracentrifuged at 20,000 × g at 4°C for 30 min. The resultant viral pellet was washed with 10 ml of sterilized water, centrifuged again at 4,500 × g at 4°C for 10 min, and then passed through 0.45-μm-pore-size filters to remove cellular debris.
The genome size of HaV01 was determined by pulsed-field gel electrophoresis. An aliquot of the virus suspension was ultracentrifuged at 20,000 × g at 4°C for 30 min, and the resultant viral pellet was resuspended in 500 μl TE buffer (10 mM Tris-HCl base, 1 mM EDTA, pH 8.0). Equal volumes of virus concentrate and molten 1% agarose L (Nippon Gene Co., Ltd.) were mixed, dispensed into plug molds, and left to solidify. The plugs were then punched out of the molds into a small volume of digestion buffer containing 250 mM EDTA, 1% sodium dodecyl sulfate, and 1 mg ml−1 of proteinase K (Wako Pure Chemical Industries, Ltd.) and incubated in the dark at 50°C overnight. The digestion buffer was decanted and the plugs were washed four times for 1 h each time in TE buffer. Agarose plugs were stored at 4°C in storage buffer (20 mM Tris-HCl, 50 mM EDTA, pH 8.0). They were next placed into the wells of a 1.2% SeaKem GTG agarose (FMC-Bioproducts) gel in 0.5× TBE gel buffer (90 mM Tris-borate and 1 mM EDTA, pH 8.0) with an overlay of molten 0.7% agarose L. Molecular weight markers (agarose plugs containing lambda phage concatemers; Bio-Rad) were also run in the gel. The samples were electrophoresed by use of a Gene Navigator system (Amersham Biosciences) operating at 180 V, with a pulse interval of 30 s at 13°C for 12 h in 0.5× TBE tank buffer (45 mM Tris-borate and 1 mM EDTA, pH 8.0). Following electrophoresis, nucleic acids were visualized by staining of the gel for 30 min with ethidium bromide (Nakalai Tesque, Inc.).
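Reading a genome size off a pulsed-field gel amounts to interpolating between marker bands. The following is a minimal sketch of that step, not the procedure used by the authors: it assumes a lambda monomer of about 48.5 kbp (a value not stated in the text), and the migration distances are hypothetical numbers chosen only to make the example run.

```python
import math

# Assumption: approximate size of one lambda genome unit; not given in the text.
LAMBDA_MONOMER_KBP = 48.5


def ladder_sizes(n_bands):
    """Sizes (kbp) of the first n_bands concatemers: monomer, dimer, trimer, ..."""
    return [LAMBDA_MONOMER_KBP * i for i in range(1, n_bands + 1)]


def estimate_size_kbp(distance_mm, ladder):
    """Estimate a band size by linear interpolation of log10(size) versus
    migration distance between the two flanking marker bands.

    ladder: list of (distance_mm, size_kbp) pairs.
    """
    points = sorted(ladder)  # ascending distance, i.e. descending size
    for (d1, s1), (d2, s2) in zip(points, points[1:]):
        if d1 <= distance_mm <= d2:
            frac = (distance_mm - d1) / (d2 - d1)
            log_size = math.log10(s1) + frac * (math.log10(s2) - math.log10(s1))
            return 10 ** log_size
    raise ValueError("band lies outside the ladder range")


# Hypothetical migration distances (mm) for the 8-mer down to the monomer.
HYPOTHETICAL_DISTANCES = [12.0, 16.0, 20.0, 24.0, 28.0, 32.0, 36.0, 40.0]
HYPOTHETICAL_LADDER = list(zip(HYPOTHETICAL_DISTANCES, reversed(ladder_sizes(8))))

if __name__ == "__main__":
    print(round(estimate_size_kbp(19.8, HYPOTHETICAL_LADDER), 1))
```

Log-linear interpolation is only a rough approximation for pulsed-field gels, where mobility depends strongly on the pulse interval, so estimates are most trustworthy for bands that fall between closely spaced markers.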
For viral DNA preparation before sequencing, each viral suspension was treated with proteinase K (final concentration, 1 mg ml−1; Wako Pure Chemical Industries, Ltd.) and sarcosyl (final concentration, 1%; Internal Technologies, Inc.) in sterilized water at 55°C for 1.5 h, and then the genomic DNA was extracted by the use of MagExtractor (Toyobo Co. Ltd.) according to the manufacturer's instructions. A shotgun library was generated to facilitate DNA sequencing of the viral genome; briefly, the purified genomic DNA was physically sheared by the use of Hydroshear (GeneMachines) and then separated in a 1× Tris-acetate-EDTA-agarose gel. A fraction corresponding to fragments of 1.5 to 2 kbp in length was excised from the gel and purified through GenElute agarose spin columns (Supelco, Inc.) according to the manufacturer's recommendations. The purified DNA fragments were blunt ended by T4 DNA polymerase (Takara Bio Inc.) and ligated to EcoRI-SmaI linker adapters (Takara Bio Inc.). The resultant fragments were treated with T4 polynucleotide kinase, and the inserts were ligated into the Lambda ZAPII/EcoRI insertion vector (Stratagene) to construct a shotgun library. Sequencing was conducted by means of the dideoxy method using an ABI PRISM 3100 genetic analyzer (Applied Biosystems). Sequence fragments were assembled with Genetyx-Win software (Software Development Co., Ltd.) and the Genome Gambler system (Xanagen Inc.). Annotation was conducted by using BLAST (Basic Local Alignment Search Tool). Although whole-genome sequencing is still under way, a putative DNA polymerase gene with an inserted intein element (see below) was found, as well as several conserved viral genes, through the processes mentioned above.
Based on the nucleotide sequence of the HaV01 DNA polymerase gene, a primer set comprised of DPC-F (5′-TAC AGG GTA AGA AAG ACC GTA ATG-3′) and DPC-R (5′-GGT TTA TCG ACA CGT GTC CAT ATC-3′) was designed to amplify a core fragment of the DNA polymerase gene approximately corresponding to the region between primers AVS1 and AVS2 established by Chen and Suttle (9). The use of this primer set on reverse-transcribed HaV01 cDNA would test whether the intein sequence was removed by RNA splicing, resulting in a fragment of 737 bp (the 1,433-bp fragment minus the 696 bp encoding the 232-amino-acid intein), or not, resulting in a fragment of 1,433 bp.
Another primer set, IT-F (5′-CAT CGA GCG TAA CGA CAA AAG G-3′) and IT-R (5′-CTA TCG GTT GTG GAA AAG TCT TGC-3′), was designed to amplify and directly sequence intein elements inserted at the conserved amino acid motif YGD/TDS, with the slash representing the intein insertion point (see below).
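Basic properties of the four primers given above can be checked with a few lines of code. The sketch below uses only the primer sequences from the text; the melting temperature uses the Wallace rule of thumb (2°C per A/T, 4°C per G/C), which is a rough approximation introduced here for illustration and is not the method the authors used to choose their annealing temperature.

```python
# Primer sequences as printed in the text (5' to 3'), spaces included.
PRIMERS = {
    "DPC-F": "TAC AGG GTA AGA AAG ACC GTA ATG",
    "DPC-R": "GGT TTA TCG ACA CGT GTC CAT ATC",
    "IT-F": "CAT CGA GCG TAA CGA CAA AAG G",
    "IT-R": "CTA TCG GTT GTG GAA AAG TCT TGC",
}

COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def clean(seq):
    """Remove the triplet spacing used in print and normalize case."""
    return seq.replace(" ", "").upper()


def gc_fraction(seq):
    """Fraction of G and C bases in the primer."""
    s = clean(seq)
    return sum(base in "GC" for base in s) / len(s)


def wallace_tm(seq):
    """Rule-of-thumb melting temperature in degrees C (rough approximation)."""
    s = clean(seq)
    gc = sum(base in "GC" for base in s)
    return 2 * (len(s) - gc) + 4 * gc


def reverse_complement(seq):
    """Sequence of the opposite strand, read 5' to 3'."""
    return "".join(COMPLEMENT[base] for base in reversed(clean(seq)))


if __name__ == "__main__":
    for name, seq in PRIMERS.items():
        print(name, len(clean(seq)), round(gc_fraction(seq), 2), wallace_tm(seq))
```

The reverse complement of a reverse primer is the form in which it appears on the coding strand, which is convenient when locating the primer sites in a gene sequence.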
To verify whether the intervening sequence was spliced at the mRNA stage and when and if the DNA polymerase gene was transcribed, we conducted a reverse transcription-PCR (RT-PCR) experiment. Two liters of a vigorously growing culture of H. akashiwo H93616 was inoculated with HaV01 at a multiplicity of infection of >1, and an aliquot (~250 ml) was withdrawn at 0, 1, 3, 5, 7, 9, 12, and 24 h postinoculation. The cells in each sample were collected by centrifugation at 2,000 × g at 4°C for 10 min. The total RNA was then isolated by use of an RNeasy plant mini kit (QIAGEN Inc.), treated with DNase I (final concentration, 0.2 mg ml−1; Promega Co., Ltd.) for 1 h at 37°C, and collected by phenol-chloroform extraction. For RT-PCRs, High Fidelity RNA PCR kits (Takara Bio Inc.) were used; briefly, 0.5 μg of the total RNA was reverse transcribed by use of the oligo(dT) adaptor primer FB and Bca PLUS reverse transcriptase. PCR amplification was performed with 20-μl mixtures containing ~500 ng of template cDNA prepared as described above (the sample from time zero served as a negative control) or HaV01 genomic DNA (positive control), 1× EX Taq buffer (Takara Bio Inc.), each deoxynucleoside triphosphate at a concentration of 200 μM, 20 pmol (each) of the primers DPC-F and DPC-R, and 1 U of EX Taq DNA polymerase (Takara Bio Inc.) by use of the GeneAmp PCR system 9700 (Applied Biosystems) according to the following cycle parameters: denaturation at 95°C (40 s), annealing at 53°C (30 s), and extension at 72°C (1 min). Following 30 rounds of amplification, the PCR products were electrophoresed in 2% (wt/vol) agarose S (Nippon Gene Co., Ltd.) gels, and the nucleic acids were visualized by ethidium bromide staining.
The viral genomic DNAs of the 10 HaV strains (Table 1) were prepared according to the method described above. PCR amplification was performed with 20-μl mixtures containing ~500 ng of template viral DNA (with HaV01 DNA serving as a positive control) or genomic DNA of uninfected H. akashiwo H93616 (as a negative control), 1× EX Taq buffer (Takara Bio Inc.), each deoxynucleoside triphosphate at a concentration of 200 μM, 20 pmol (each) of the primers IT-F and IT-R, and 1 U of EX Taq DNA polymerase (Takara Bio Inc.) by use of the GeneAmp PCR system 9700 (Applied Biosystems) according to the following cycle parameters: denaturation at 95°C (40 s), annealing at 53°C (30 s), and extension at 72°C (1 min). Following 30 rounds of amplification, the PCR products were electrophoresed in 1% (wt/vol) agarose S gels to verify that a single band was obtained from each sample. Each amplicon (2.5 μl) was added to 1 μl ExoSAP-IT (Amersham Biosciences), incubated at 37°C for 15 min and 80°C for 15 min, and directly sequenced with the primers IT-F and IT-R by the dideoxy method using an ABI PRISM 3100 genetic analyzer.
Phylogenetic analyses were performed with the amino acid sequences of the DNA polymerases and inteins. Phylogenetic trees were calculated from confidently aligned regions of homologous proteins by use of the PHYLIP v3.61 (12) and MOLPHY v2.3b3 (1) packages. DNA polymerase sequences from various representative viruses and several environmental samples that were very likely to be related viruses were multiply aligned with the MEME program (3). This identified 10 ungapped conserved motif regions, totaling 198 amino acids, in each sequence. These regions were also found in the human (and other) DNA polymerase delta sequence, which was added as an outgroup to the viral DNA polymerase sequences. The same analysis of full-length viral DNA polymerases identified 10 regions, totaling 216 amino acids, in each sequence. These regions partially overlapped the 10 regions mentioned above but were distributed across entire sequences and could be identified in more virus families. A preliminary analysis showed that the HaV and other viral DNA polymerases are a definite subcluster of delta-type DNA polymerases (data not shown). A multiple sequence alignment of the DNA polymerase amino acid sequence of HaV01 and those of four algal viruses (PBCV-1, NY-2A, FsV, and EsV-1) whose DNA polymerase genes were fully sequenced was conducted by using the CLUSTAL W program (48) to verify the positions of specific motifs in the HaV DNA polymerase.
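The idea of building trees from ungapped conserved regions can be illustrated with a short sketch: the blocks are cut out of each sequence, concatenated, and the resulting columns can then be resampled with replacement, as is done in bootstrap analyses. This is not the code of MEME, PHYLIP, or MOLPHY; the function names, block coordinates, and toy sequences are all hypothetical.

```python
import random


def concatenate_blocks(sequence, blocks):
    """Join the conserved regions of one sequence.

    blocks: list of (start, end) pairs, 0-based, end exclusive.
    """
    return "".join(sequence[start:end] for start, end in blocks)


def bootstrap_columns(alignment, rng):
    """Return one bootstrap replicate: alignment columns drawn with replacement.

    alignment: dict mapping a name to an aligned sequence; all equal length.
    """
    lengths = {len(seq) for seq in alignment.values()}
    if len(lengths) != 1:
        raise ValueError("all aligned sequences must have the same length")
    length = lengths.pop()
    columns = [rng.randrange(length) for _ in range(length)]
    return {name: "".join(seq[c] for c in columns) for name, seq in alignment.items()}


if __name__ == "__main__":
    # Toy sequences and block positions, for illustration only.
    toy = {"virusA": "MKYGDTDSVLAG", "virusB": "MRYGDTDSILSG"}
    blocks = [(2, 8), (9, 12)]
    trimmed = {name: concatenate_blocks(seq, blocks) for name, seq in toy.items()}
    replicate = bootstrap_columns(trimmed, random.Random(0))
    print(trimmed)
    print(replicate)
```

Each replicate would be passed to the tree-building step, and the fraction of replicates recovering a given cluster gives its bootstrap support.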
Intein sequence motifs of the protein splicing domain were identified according to the method of Pietrokovski (39). These six motifs contained a total of 75 conserved positions. All were present in all inteins, except for motif N4, which was missing from or divergent in some inteins. The Drosophila melanogaster hedgehog protein HINT domain (19) was used as an outgroup for the inteins.
The DNA polymerase sequences used for phylogenetic analysis were as follows (scientific name, with abbreviation in parentheses, followed by the database accession number [referring to the National Center for Biotechnology Information database unless otherwise stated]): African swine fever virus (ASFV), P43139; Autographa californica nuclear polyhedrosis baculovirus (AcNPV), P18131; Chilo iridescent virus (CIV), 5725646; Chlorella virus NY-2A (NY-2A), AAA88827; Chrysochromulina brevifilum virus PW1 (CbV-PW1), AAB49739; Ectocarpus silliculosus virus (EsV-1), NP_077578; Emiliania huxleyi virus 86 (EhV86), sequence obtained from the Sanger Institute (http://www.sanger.ac.uk/Projects/EhV); Epstein-Barr virus (EBV), CAA24805; Feldmania species virus (FsV), AAB67116; fowlpox virus (FowPV), NP_039057; guinea pig cytomegalovirus (GpcMV), AAA43832; herpes simplex virus type 1 (HSV-1), CAA28464; human DNA polymerase delta catalytic subunit (HsDPOD), P28340; human herpesvirus 3 (VzV), P09252; human herpesvirus 5 (HcMV), P08546; Lymantria dispar multicapsid nuclear polyhedrosis baculovirus (LdMNPV), BAA02036; Micromonas pusilla virus SP1 (MpV-SP1), AAB66713; mimivirus DNA polymerase (MimiV), YP_142676 (originally kindly provided by Ogata and Claverie); mouse cytomegalovirus 1 (McMV), P27172; Orgyia pseudotsugata multicapsid polyhedrosis baculovirus (OpMNPV), Q83948; Paramecium bursaria Chlorella virus (PBCV-1), AAC00532; Phaeocystis globosa virus 03T (PgV-03T), AAR05090; red sea-bream iridovirus (RSIV), 6015024; regular mosquito iridovirus (RMIV), 21668345; suid herpesvirus 1 (PrV), AAA74383; unidentified phycodnavirus isolate (PSB99-1), 15593089; unidentified phycodnavirus isolate (PSC99-2), 15593104; unidentified phycodnavirus isolate (SIA99-1), 15593080; unidentified phycodnavirus isolate (SO98-4), 15593036; unidentified phycodnavirus isolate (SO98-5), 15593039; unidentified phycodnavirus isolate (BSB99-2), 15593074; unidentified phycodnavirus isolate (BSA99-5), 15593053; 
unidentified phycodnavirus isolate (MIB99-1), 15593083; unidentified phycodnavirus isolate (MIB99-2), 15593086; unidentified probable virus isolate (2098319), 44566937; variola virus (VARV), P33793; and the HaV01 DNA polymerase sequence presented in this paper (HaV01), AB194136 (DDBJ accession number).
The intein sequences used for phylogenetic analysis were all inserted into type B DNA polymerases. The organisms' scientific names, with intein names in parentheses, and database accession numbers (referring to the National Center for Biotechnology Information database unless otherwise stated) were as follows: D. melanogaster hedgehog HINT domain (Dme HH), Q02936; Haloarcula marismortui ATCC 43049 (Hma Pol), YP_136425.1; Haloferax volcanii (Hvo Pol), CAG38139.1; mimivirus (MimiV Pol), YP_142676 (originally kindly provided by Ogata and Claverie); Nanoarchaeum equitans Kin4-M (Neq Pol), NP_963362.1; Pyrococcus horikoshii (Pho Pol), O59610; Pyrococcus sp. strain GB-D (Psp Pol-1), Q51334; Methanococcus jannaschii (Mja Pol-1), Q58295; M. jannaschii (Mja Pol-2), Q58295; Thermococcus aggregans (Tag Pol-1), CAA73475.1; T. aggregans (Tag Pol-2), CAA73475.1; T. aggregans (Tag Pol-3), CAA73475.1; Thermococcus fumicolans (Tfu Pol-1), P74918; T. fumicolans (Tfu Pol-2), P74918; Thermococcus hydrothermalis (Thy Pol-1), CAC18555.1; T. hydrothermalis (Thy Pol-2), CAC18555.1; Thermococcus kodakaraensis KOD1 (Tko Pol-1), S71551; T. kodakaraensis KOD1 (Tko Pol-2), S71551; Thermococcus litoralis (Tli Pol-1), S42459; T. litoralis (Tli Pol-2), S42459; Thermococcus sp. strain GE8 (Tsp-GE8 Pol-1), CAC12850.1; Thermococcus sp. strain GE8 (Tsp-GE8 Pol-2), Q9HH84; and the HaV01 DNA polymerase intein sequence presented in this paper (HaV01), AB194136 (DDBJ accession number).
The sequences obtained for the HaV01 DNA polymerase and the HaV01 DNA polymerase intein are available under DDBJ accession number AB194136.
The host ranges of 10 HaV strains for Heterosigma akashiwo strains were examined (Table 1). The HaV strains can be roughly divided into four groups according to their host ranges: HaV01 is lytic to strains H00-46, H00-58, and H93616; HaV117, HaV128, and HaV161 are lytic to strains H93616, H94608, and H00-39; HaV132, HaV133, and HaV168 are lytic to all six host strains, with different intensities; and HaV127, HaV140, and HaV180 are lytic to strains H00-46, H00-58, H93616, and H00-39 (and partially also to H94608). Tomaru et al. (50) divided HaV strains isolated from Hiroshima Bay into three groups based on their intraspecies host ranges. In the present study, the HaV01 strain, which originated from Unoshima Port, ~150 km from Hiroshima Bay, showed an intraspecies host range distinct from those of the HaV strains from Hiroshima Bay. Although the mechanism determining the intraspecies host ranges of HaV strains has not yet been clarified, the high degree of diversity among HaV strains in terms of intraspecies host specificity is assumed to reflect the results of coevolution of the host-virus systems. Nevertheless, the HaV strains maintained in our laboratory did not change their host specificities.
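The grouping described above can be reproduced by treating each strain's host range as a set and collecting strains with identical sets. The sketch below encodes only what is stated in this paragraph; the sixth host strain is not named in the text, so it appears as an explicit placeholder, and partial lysis (HaV127, HaV140, and HaV180 on H94608) is deliberately ignored.

```python
from collections import defaultdict

NAMED_HOSTS = ["H00-46", "H00-58", "H93616", "H94608", "H00-39"]
# Placeholder: the text does not name the sixth host strain.
SIXTH_HOST = "unnamed-sixth-host"
ALL_HOSTS = set(NAMED_HOSTS) | {SIXTH_HOST}

# Full lysis only, as described in the paragraph above.
HOST_RANGE = {
    "HaV01": {"H00-46", "H00-58", "H93616"},
    "HaV117": {"H93616", "H94608", "H00-39"},
    "HaV128": {"H93616", "H94608", "H00-39"},
    "HaV161": {"H93616", "H94608", "H00-39"},
    "HaV132": set(ALL_HOSTS),
    "HaV133": set(ALL_HOSTS),
    "HaV168": set(ALL_HOSTS),
    "HaV127": {"H00-46", "H00-58", "H93616", "H00-39"},
    "HaV140": {"H00-46", "H00-58", "H93616", "H00-39"},
    "HaV180": {"H00-46", "H00-58", "H93616", "H00-39"},
}


def group_by_host_range(host_range):
    """Collect virus strains that lyse exactly the same set of hosts."""
    groups = defaultdict(list)
    for virus, hosts in host_range.items():
        groups[frozenset(hosts)].append(virus)
    return sorted(sorted(members) for members in groups.values())


if __name__ == "__main__":
    for members in group_by_host_range(HOST_RANGE):
        print(members)
```

Grouping on exact set identity is the simplest possible criterion; a graded measure of lysis intensity would need the values from Table 1, which are not reproduced here.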
Pulsed-field gel electrophoresis showed that HaV01 has a genome size of ~294 kbp (Fig. 1). Among the dsDNA viruses infecting microalgae reported so far, this is considerably larger than Micromonas pusilla virus (200 kbp) (C. Suttle, personal communication) and smaller than the Chlorella viruses (330 to 380 kbp) (54), Heterocapsa circularisquama virus (356 kbp) (Fig. 1), Emiliania huxleyi virus (410 to 415 kbp) (8, 43), Chrysochromulina ericina virus (510 kbp) (42), and Pyramimonas orientalis virus (560 kbp) (42).
In the process of sequencing of the genome of HaV01, a putative DNA polymerase gene was identified by its sequence similarity to other DNA polymerase genes. We identified the gene initiator methionine codon among several possible ATG codons by the adenine (A) and thymidine (T) frequency in its 5′ region and the inclusion of an appropriately positioned TATA box. The identified translation start point has 84% and 86% A+T contents in the 50 and 100 bases preceding it, respectively, whereas the 50 bases preceding the other three downstream ATG codons contain <76% A+T content. In addition, this ATG codon was the only one to be preceded by a TATA box (Fig. 2).
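The A+T criterion used to pick the start codon is easy to express in code. The following is a minimal sketch under stated assumptions: it scans one strand for every ATG regardless of reading frame and ranks them by the A+T fraction of the preceding window. The function names and the toy sequence are hypothetical, and the TATA box check described above is not included.

```python
def at_fraction(seq):
    """Fraction of A and T bases in a DNA string."""
    seq = seq.upper()
    if not seq:
        raise ValueError("empty sequence")
    return sum(base in "AT" for base in seq) / len(seq)


def rank_start_codons(genome, window=50):
    """List (index, upstream A+T fraction) for each ATG that has a full
    upstream window, sorted with the most A+T-rich upstream region first."""
    genome = genome.upper()
    hits = []
    i = genome.find("ATG")
    while i != -1:
        if i >= window:
            hits.append((i, at_fraction(genome[i - window:i])))
        i = genome.find("ATG", i + 1)
    return sorted(hits, key=lambda hit: hit[1], reverse=True)


if __name__ == "__main__":
    # Toy sequence: an A+T-rich leader, one ATG, a G+C-rich stretch, another ATG.
    toy = "AT" * 5 + "ATG" + "GC" * 5 + "ATG"
    for index, fraction in rank_start_codons(toy, window=10):
        print(index, fraction)
```

Running the same function with window=50 and window=100 on the real upstream region would correspond to the two percentages reported in the paragraph above.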
A multiple sequence alignment of the HaV01 DNA polymerase amino acid sequence and those of four other viruses (PBCV-1, NY-2A, EsV-1, and FsV) showed that the HaV01 protein includes all of the conserved domains of B-family DNA polymerases (Fig. 3). These are the amino-terminal 3′-5′ exonuclease domain, which contains the exoI to exoIII motifs (5), and the carboxyl-terminal nucleotide polymerase domain, which contains conserved motifs I to V (56).
A phylogenetic analysis of highly conserved regions common to viral DNA polymerases (Fig. 4A) showed that all analyzed algal viruses are more closely related to each other than to other groups of analyzed DNA viruses. Within this group, HaV01 significantly clusters with the Phaeocystis globosa and Chrysochromulina brevifilum viruses infecting microalgae (6, 45) (79% bootstrap value for 1,000 trials). The polymerases of these two microalgal viruses are more similar to each other than to that of HaV01, with a 100% bootstrap value. This microalgal virus cluster is distinct from viruses infecting macroalgae (EsV-1 and FsV) and from chlorella viruses (PBCV-1 and NY-2A) (Fig. 4A). A similar analysis with major capsid proteins (49) also found HaV to significantly cluster with phycodnaviruses (http://bioinformatics.weizmann.ac.il/~pietro/HAV_inteins/). A sequence analysis of DNA polymerase gene core fragments from the HaV strains with different intraspecies host ranges found that they are almost identical to that of HaV01 at the nucleotide sequence level (data not shown). Integrating these results with previous observations on the basic characteristics of this virus (28), we concluded that HaV most likely belongs to the family Phycodnaviridae. Following the nomenclature system used for other algal viruses (52), we propose here a new genus, “Raphidovirus,” in the family Phycodnaviridae, in which HaV, which is infectious to the raphidophyte H. akashiwo, should be the first member.
Unexpectedly, the highly conserved motif I of the polymerase domain, YGDTDS, which includes a catalytic-site residue, is interrupted at its center (YGD/TDS) by a 232-amino-acid insert sequence (Fig. 3 and 5). The intervening sequence includes all of the motifs of intein protein-splicing domains (Fig. 5; see below).
A nested PCR technique for amplifying a conserved region of the DNA polymerases of Phycodnaviridae designed by Chen and Suttle (9) was inapplicable to HaV (data not shown) as well as to several other algal viruses (7, 9, 42). In the case of HaV, this was presumably due to the intein insertion (this paper) and a mismatch of the degenerate primers, AVS1 and AVS2 (9; data not shown).
An insert coding for 232 amino acids was found next to the active-site residue in the highly conserved motif I of the HaV polymerase domain (YGD/TDS, with the slash indicating the insertion point) (62) (Fig. 3 and 5). This exact point, termed the Pol-c intein integration point, includes inteins in the type B polymerases of several archaeal DNA polymerase genes (YAD/TDG, with the slash indicating the insertion point in archaeal genes) (e.g., see references 30 and 58) and in the recently described type B polymerase of mimivirus, a huge dsDNA virus infecting amoebae (YGD/TDS) (41). Inteins integrated at the same point in homologous proteins are termed intein alleles and are often more similar to each other than to other inteins due to vertical and horizontal transfer (37, 40). A phylogenetic analysis of intein allele groups showed the HaV and mimivirus inteins clustered together, with a bootstrap value of 97%, and with other Pol-c alleles, with a bootstrap value of 72% (Fig. 6). The HaV Pol intein is more similar in its sequence to Pol-c alleles than to other inteins and is most similar to the mimivirus intein (Fig. 7).
The codon usage, base and dinucleotide compositions, and GC content of the HaV Pol intein are not significantly different from those of its polymerase host gene and of other available HaV protein coding regions (data not shown). This suggests that the intein and the DNA polymerase have been joined for a long time, and the intein seems well adapted to its host.
The HaV intein includes all of the motifs characterizing intein protein-splicing domains, except for motif N4, which is missing from or very divergent in Pol-c alleles and a few other inteins (Fig. 5) (39). All of the identified protein-splicing motifs are at typical positions within the intein and include active-site residues present in other inteins. The central HaV intein region is only about 90 amino acids long (Fig. 5). A comparison to the mimivirus intein revealed a deletion of 131 amino acids in this central region. It is thus most likely a remnant of a typical intein LAGLIDADG homing EN domain (10) that underwent a deletion of its central part (Fig. 7). About one-fifth of all inteins do not have an EN domain, and several allele groups include some members with and some without EN domains. This includes the Pol-c alleles, to which the HaV intein belongs (30).
To test the transcription of the HaV DNA polymerase gene, we performed RT-PCR amplification with primers specific for this gene. The gene began to be transcribed at 1 h postinfection in HaV-infected cells and was constitutively transcribed throughout the virus replication process. As expected, no sign of RNA splicing of the intein or any other DNA polymerase gene region was detected (Fig. 8).
Integrating these observations, we concluded that the intervening sequence is an intein that can most likely splice proteins. The latter conclusion is indirect and is based on (i) the presence of the intein in a gene that is transcribed and thus probably crucial for the virus (Fig. 8), (ii) the position of the intein in a highly conserved active-site motif (Fig. 3 and 5), and (iii) the presence and conservation of intein motifs (Fig. 5, N1, N2, N3, C1, and C2) known to be required for protein splicing (39).
Inteins are thought to be dispensable to their hosts and are gained and lost by horizontal transfer and genomic deletions (16, 40). We thus tested for the presence of an intein at the Pol-c integration point in all 10 HaV strains examined for this study (Table 1). PCRs using the specific primers IT-F and IT-R amplified intein products from all 10 HaV strains. All amplified inteins and their flanks had identical nucleotide sequences, except for a single thymidine-to-cytosine nucleotide polymorphism in two strains (HaV133 and HaV180) that changed intein Met34 into a Thr (Fig. 5). The almost absolute identity of the HaV strains' DNA polymerase inteins (Fig. 5) and their flanks (data not shown) is surprising given the different host specificities of the viruses (Table 1). Furthermore, the absence of silent mutations indicates that the strains, or at least their DNA polymerase loci, are very closely related to each other. Since the 10 tested HaV strains have different host specificities (Table 1), the genetic determinants of this trait might be significantly more diverse than the DNA polymerase locus. It is also possible that horizontal transfer events between different viral strains unlinked the determinants from the DNA polymerase loci. Even in the case of the intensively studied Chlorella viruses, the molecular mechanism of algal virus host specificity has not been entirely elucidated (53). Although the recent discovery of a chloroviral surface protein that interacts with the host Chlorella cell wall is a promising clue on which to focus (34), a full understanding of the mechanism that determines the intraspecies host specificity awaits further characterization.
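The distinction drawn here between a replacement polymorphism and silent mutations can be made concrete by comparing two coding sequences codon by codon. The sketch below assumes the standard genetic code; the two short sequences are illustrative and are not taken from the HaV intein. Under the standard code, ATG is the only Met codon, and ACG is the only Thr codon that differs from it at a single position.

```python
BASES = "TCAG"
# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ...
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def classify_differences(seq1, seq2):
    """Compare two in-frame coding sequences of equal length.

    Returns one tuple per differing codon:
    (codon number, codon in seq1, codon in seq2, residue 1, residue 2, kind),
    where kind is "silent" or "replacement".
    """
    seq1, seq2 = seq1.upper(), seq2.upper()
    if len(seq1) != len(seq2) or len(seq1) % 3 != 0:
        raise ValueError("sequences must be the same length and a multiple of 3")
    differences = []
    for pos in range(0, len(seq1), 3):
        codon1, codon2 = seq1[pos:pos + 3], seq2[pos:pos + 3]
        if codon1 != codon2:
            aa1, aa2 = CODON_TABLE[codon1], CODON_TABLE[codon2]
            kind = "silent" if aa1 == aa2 else "replacement"
            differences.append((pos // 3 + 1, codon1, codon2, aa1, aa2, kind))
    return differences


if __name__ == "__main__":
    # Illustrative sequences: one Met/Thr difference and one silent Ala difference.
    for row in classify_differences("ATGGCT", "ACGGCC"):
        print(row)
```

Applied to the aligned intein sequences of the 10 strains, a comparison of this kind would be expected to report a single replacement and no silent differences, which is the pattern described above.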
The first intein of viruses infecting eukaryotes was identified in the ribonucleotide reductase gene of Chilo iridescent virus (CIV), an icosahedral cytoplasmic insect iridovirus with a large DNA genome (38). The protein-splicing pathway of the CIV intein has recently been studied by the use of specific amino acid substitutions (2). Other related iridoviruses were found to include related intein alleles (Amitai et al., unpublished data). Very recently, a DNA polymerase intein was identified in the giant amoeba-infecting mimivirus (41). Several bacteriophages and bacterial prophages were also found to contain inteins. These intein host proteins include the ribonucleotide reductase catalytic subunit gene of Bacillus subtilis prophages (13, 23, 24), DNA polymerase type I of the APSE1 phage infecting the secondary bacterial endosymbiont of Acyrthosiphon pisum (51), the large terminase subunits of mycobacteriophages (36), and a cofactor in the head assembly of a Mu-like Methylococcus prophage (57). In the present study, we have identified the first intein of viruses infecting eukaryotic algae, which is the second type of intein described for viruses infecting eukaryotes.
Several types of introns occur in the large dsDNA Chlorella algal viruses (53). These include spliceosomally processed (17, 44, 61), tRNA (31), and self-splicing group I (25, 32, 60) introns. Introns of this last type are related to inteins: both remove themselves from the precursor product (group I introns do so at the RNA level and inteins do so at the protein level), both are inserted in conserved motifs of essential genes, both frequently include an EN domain that mediates their homing, and both are considered mobile selfish genetic elements (11). Chlorella virus group I introns were suggested to be horizontally transferred from other species (32, 59).
The HaV intein insertion point (Pol-c) includes archaeal inteins from two Euryarchaeota classes, i.e., Halobacteria and Thermococci, and from Nanoarchaeum equitans, which belongs to the Nanoarchaeota phylum. Mimivirus polymerase also includes a Pol-c intein allele that is particularly closely related to the HaV intein, and both are significantly clustered with the Thermococci Pol-c inteins. HaVs and mimiviruses are from distinct virus families (41; the present work), and their polymerase genes are not specifically related beyond their intein insertions (Fig. 4B). This suggests the occurrence of at least two horizontal transfer events: one between archaea and viruses and one between different viruses. The timing of these events is unknown, but the HaV intein currently has the same codon usage and nucleotide composition as its host gene. Besides this parsimonious scenario, other transfers might have occurred through unknown intermediates. In this context, we note that mimivirus can infect amoebae and also survive in mammalian cells (D. Raoult, personal communication) and that amoebae can ingest and transport various microbes (18) and are thus potential vectors for horizontal transfer events. This is supported by our observation of amoeba-like cells ingesting and digesting H. akashiwo phytoplankton in natural coastal waters (http://bioinformatics.weizmann.ac.il/~pietro/HAV_inteins/).
The presence of inteins in viral DNA polymerase and ribonucleotide reductase genes (38, 41; Amitai et al., unpublished data; the present work) indicates that viral inteins might not be very rare. Sequence data for more diverse genes and virus families might show viruses to be reservoirs of intein elements and intermediates in their transmission between distant species.
Inteins and self-splicing introns apparently undergo cycles of gaining and losing EN activity (14, 15, 33). Both elements can survive without EN activity, but having it is a major advantage since it mediates the horizontal transfer of the elements to unoccupied insertion points (homing sites) (4). However, once all available insertion points in the population are occupied, there is no use for EN activity and it probably even harms the cell by nonspecific cleavage of its genome. Thus, the EN domain will quickly inactivate. Eventually some individuals in the population will lose the inserted elements, reestablishing the advantage of EN activity for the elements.
The central region of the HaV intein is a remnant of a typical EN homing domain (Fig. 7). Thus, the HaV intein is currently unable to promote its own homing. The presence of almost identical copies of the intein and its flanks in 10 different HaV strains (Fig. 5 and data not shown) suggests that these regions only recently diverged and did not even accumulate any silent mutations. This raises the question of how these HaV strains that are so similar across the Pol gene region have different host specificities. The genetic determinants of host specificity might be in variable regions of the HaV genome, while other genes are very conserved (27). The complete genome sequence of HaV and a comparative analysis between different strains of the host-range-determining regions will help to clarify our present findings.
We are grateful to T. Okuno, K. Mise, Y. Sako, and N. Nomura (Kyoto University, Japan) for their helpful advice about the present research. Thanks are also due to T. Shiba, T. Maeda, and M. Furushita (National Fisheries University, Japan) and to K. Doga (Mitsui Knowledge Industry Co., Ltd.) for their technical advice on shotgun sequencing and assembly and annotation procedures. J. M. Claverie and H. Ogata are thanked for providing the sequence of the mimivirus DNA polymerase gene.
This work was partially supported by funding from The Industrial Technology Research Grant Program in 2004 from the New Energy and Industrial Technology Development Organization (NEDO) of Japan and from The Israel Science Foundation, founded by the Israel Academy of Sciences and Humanities. S. Pietrokovski holds the Ronson and Harris Career Development Chair.