Viruses with genomes greater than 300 kb and up to 1200 kb are being discovered with increasing frequency. These large viruses (often called giruses) can encode up to 900 proteins and also many tRNAs. Consequently, these viruses have more protein-encoding genes than many bacteria, and the concept of small particle/small genome that once defined viruses is no longer valid. Giruses infect bacteria and animals although most of the recently discovered ones infect protists. Thus, genome gigantism is not restricted to a specific host or phylogenetic clade. To date, most of the giruses are associated with aqueous environments. Many of these large viruses (phycodnaviruses and Mimiviruses) probably have a common evolutionary ancestor with the poxviruses, iridoviruses, asfarviruses, ascoviruses, and the recently discovered Marseillevirus. One issue that is perhaps not appreciated by the microbiology community is that large viruses, even ones classified in the same family, can differ significantly in morphology, lifestyle, and genome structure. This review focuses on some of these differences rather than providing extensive details about individual viruses.
Typically, one views viruses as small particles that readily pass through 0.2-µm filters and contain small genomes with a few protein-encoding genes. However, huge viruses with large dsDNA genomes that encode hundreds of proteins, often called giruses, are now being discovered with increasing frequency. This review concentrates on viruses with genomes in excess of 300 kb and focuses on partially characterized viruses with annotated genomes (Table 1; annotated genomes in the public domain have an accession number). Most of these viruses inhabit aquatic environments and infect protists. Examples include Mimivirus, which infects amoebae and has the largest genome (~1.2 Mb); viruses that infect algae (phycodnaviruses) and have genomes up to ~560 kb; viruses that infect bacteria and have genomes up to ~670 kb; and White spot shrimp virus (WSSV), which has a genome of ~305 kb. At least one member of the poxvirus family has a genome larger than 300 kb (canarypox virus – 360 kb); however, most poxviruses have genomes ranging from 180 to 290 kb and therefore we have not discussed canarypox virus in this review. The polydnaviruses are enigmatic with respect to genes and particle structure, means of replication, and transmission (see sidebar, Polydnaviruses); these are not considered further. Other large, dsDNA-containing viruses have genomes ranging from 100 to 280 kb, including herpesviruses, asfarviruses, baculoviruses, iridoviruses, and ascoviruses, and also are not discussed in this review.
To put the size of these large viral genomes into perspective, the smallest free-living bacterium, Mycoplasma genitalium, encodes 470 proteins (18). Although estimates of the minimum genome size required to support life are ~250 protein-encoding genes (45), a symbiotic bacterium (Candidatus Hodgkinia cicadicola) in the cicada Diciceroprocta semicincta has a 145-kb genome (37). Thus, giruses have more protein-encoding genes (CDSs) than some single-celled organisms.
Given the coding capacity of these large viruses, it is not surprising that they encode many proteins that are atypical of or novel for a virus. However, the majority of their CDSs do not match anything in the databases. Some of these viruses also encode introns and inteins, which are uncommon in viruses. A type IB intron exists in several phycodnaviruses (66). Mimivirus has six self-excising introns (9). Because introns are often detected when they interrupt coding sequences of known proteins, additional introns located within anonymous virus CDSs will probably be discovered. Inteins are protein-splicing domains encoded by mobile intervening sequences, and they catalyze their own excision from the host protein. Although found in all domains of life, their distribution is sporadic. Mimivirus and phycodnaviruses NY-2A and CeV01 are among the few intein-containing viruses.
The morphogenesis of large viruses is also interesting because presumably they are too large to self-assemble. Furthermore, structures of giruses vary significantly (Figure 1). Therefore, these viruses must encode proteins that measure size and other proteins that serve as assembly catalysts or scaffolds.
One issue that is perhaps not appreciated by the microbiology community is that large viruses, even ones classified in the same family, can differ significantly in morphology, lifestyle, and genome structure. This review focuses on some of these differences and only provides brief details about individual viruses. See noted comprehensive reviews for more information about specific viruses.
Most large viruses have been discovered and characterized in the last few years. Two exceptions are bacteriophage G, initially described about 40 years ago (13) and largely ignored until recently, and the chlorella virus Paramecium bursaria chlorella virus 1 (PBCV-1), which was first described in 1982 (62) and has since been studied continuously.
Large viruses went undetected for many years for several reasons. First, there were technical problems. For example, classical virus isolation procedures include filtration through 0.2-µm-pore filters to remove bacteria and protists. However, these filters also often exclude large viruses. Standard methods for plaquing bacteriophage missed large phages because the high soft agar concentrations prevented formation of visible plaques (54). In addition, large viruses may grow more slowly than smaller viruses and have lower burst sizes. Large viruses have larger surface areas and are thus more likely to aggregate and/or adsorb to extraneous material. None of these issues is a problem as long as one is aware of them. Second, the hosts for some of these large viruses were not examined for virus infections until recently. Finally, the discovery of some of these large viruses was serendipitous; e.g., Mimivirus was initially believed to be a parasitic bacterium (9).
It is now obvious that many large viruses await discovery. For example, Monier et al. (41) recently conducted a metagenomic study using samples collected on the Sorcerer II Global Ocean Sampling Expedition to determine the relative abundance of DNA polymerase fragments that could be assigned to virus groups. In 86% of sample sites, Mimivirus relatives were the most abundant, after bacteriophages. This high abundance suggests that Mimivirus-like viruses may infect other marine protists (11). In another metagenomics study using three other proteins as the queries, phycodnaviruses were commonly found (31).
Many of the viruses listed in Table 1 probably have a common evolutionary ancestor, perhaps arising before the divergence of the major eukaryotic kingdoms (26, 51, 64, 71). These viruses, which include the phycodnaviruses, poxviruses, asfarviruses, iridoviruses, ascoviruses, and the Mimiviruses, are referred to as nucleocytoplasmic large dsDNA viruses (NCLDVs) (25, 26). Recently, another large NCLDV, named Marseillevirus, that is distantly related to the iridoviruses and ascoviruses was isolated from an amoeba (6). NCLDVs contain 9 common genes, and 177 additional genes are present in at least two of the virus families (71).
Although the hypothesis of a common ancestor for the NCLDVs is generally accepted, there is disagreement on the size and morphology of its ancestral virus and how it diverged into the different virus families. A recent maximum-likelihood reconstruction of NCLDV evolution produced a set of 47 conserved genes, which are considered the minimum genome for the common ancestor; NCLDVs then evolved by losing some of these common genes and acquiring new genes from their hosts and bacterial endosymbionts as well as by gene duplications (71). Another scenario suggests the ancestral NCLDV was a huge virus or even a cellular organism that evolved primarily via genome contraction (51). Finally, Filee et al. (15) proposed that NCLDVs evolved from a small DNA virus by gene acquisition from cells.
The origin of NCLDVs is controversial. For example, some researchers have suggested that NCLDVs should be considered the fourth kingdom of life (12, 51), others have suggested that NCLDV genes arose from the original gene pool that led to prokaryotes and eukaryotes (26), and still others have suggested that horizontal gene transfer has driven the evolution of their genomes (39). These suggestions have stimulated controversy over whether the tree of life should include these viruses. Another interesting hypothesis is that primitive NCLDVs gave rise to the eukaryotic nucleus or vice versa (5). Regardless, some viruses, including the NCLDVs, have a long evolutionary history, and viruses probably contributed to the emergence and subsequent structure of modern cellular life forms (29).
It is becoming difficult to classify some of these large viruses into distinct families. Recent phylogenetic analysis of the DNA polymerase protein from four putative phycodnaviruses illustrates this problem. (a) The DNA polymerase from three phycodnaviruses, CeV01, PpV01, and PoV01 (Table 1), is more similar to Mimivirus than to the other phycodnaviruses (41). (b) The HcDNAV polymerase indicates its closest relative is African swine fever virus (50). Therefore, it is clear that giruses, like the DNA phages (23), have exchanged genes for eons.
Contributing to the uncertainty about NCLDV evolution is the discovery that the structure of the PBCV-1 major capsid protein (MCP) resembles MCPs from other smaller dsDNA viruses with hosts in all three domains of life, including human adenoviruses, bacteriophage PRD1, and a virus infecting an archaeon, Sulfolobus solfataricus. This similarity suggests that these three viruses might have a common evolutionary ancestor with the NCLDVs, despite the lack of amino acid sequence similarity among their MCPs (32).
All NCLDVs are assembled in virus factories located in the cytoplasm. The role of the nucleus in the replication of NCLDVs varies. For example, poxviruses (43) and Mimivirus (46) carry out their entire life cycle in the cytoplasm. In contrast, the nucleus is probably essential for replication of the phycodnaviruses and other NCLDVs. However, the nuclear role in virus replication probably differs, even among the phycodnaviruses.
The four viruses in Table 1 that are not NCLDVs are a polydnavirus, WSSV, and two bacteriophages, phage G and 201φ2-1. WSSV, which causes huge economic losses to the shrimp industry, is an enigma because it is not obviously related to known viruses. Large bacteriophages, referred to as jumbo phages, resemble smaller phages that may have acquired increased genome functions over evolutionary time.
Mimivirus, Mamavirus, and Marseillevirus all infect amoebae; the first two are the largest viruses ever reported (9). Mamavirus has an 18.3-kb DNA satellite virus (called Sputnik) that can only replicate in the presence of Mamavirus (34).
Mimivirus virions have an icosahedral core capsid with a diameter of ~500 nm. The capsid is uniformly covered with a 140-nm-thick layer of closely packed fibers, forming an ~750-nm spherical object (Figure 1a) (67). This peripheral fiber layer, which is absent in other NCLDVs, might be linked to the heterotrophic nature of the host amoeba; i.e., Mimivirus particles might mimic the bacteria on which amoebae prey. For amoebae to initiate phagocytosis, individual particles have to be larger than 600 nm in diameter or aggregate before being engulfed (30). The external fiber layer might also serve another role. After engulfing bacteria, bacterial surface lipopolysaccharides (LPSs) stimulate endocytic vesicle formation in amoebae. Therefore, the outer layer of Mimivirus particles, which stains gram positive (51), might resemble the gram-positive bacterial surface LPSs. Consistent with this hypothesis, Mimivirus encodes several sugar-manipulating enzymes and some of them are directly related to surface LPS-specific sugars, such as perosamine (9).
Electron tomography and volume-reconstruction analyses of viral particles inside infected amoeba cells at late stages of infection establish that the Mimivirus capsid is composed of two superimposed shells with different densities (72). The inside of the virus particle has a membranous sac enveloping the viral genome. In addition to the two shells, a prominent fivefold star-shaped structure (the stargate) is located at one icosahedral vertex and extends along the entire length of the five icosahedral edges that center around this unique vertex (Figure 1b) (46, 56, 67, 72). Mimivirus initiates infection by phagocytosis followed by lysosome fusion with the phagosomes. This lysosomal activity is predicted to open the viral capsid at the stargate portal. The fusion of the particle’s internal membrane with the endocytic vacuole membrane forms a large membrane conduit through which the genome-containing Mimivirus core enters the cytoplasm. Although initial studies suggested the Mimivirus genome moved into the nucleus and shuttled back to the cytoplasm following a few rounds of replication (56), a more recent study indicates its genome remains in the cytoplasm and that, like the poxviruses, the entire replication cycle takes place in the cytoplasm (9, 46). Like poxviruses, Mimivirus possesses its own transcription machinery and packages 12 transcription proteins in its particles. Transcription of early Mimivirus genes, in conjunction with a conserved promoter element AAAATTGA, is believed to occur in the core particles. The cores release virus DNA, forming cytoplasmic replication factories where virus DNA replication begins, followed by transcription of late genes. The replication factories form around the viral cores and expand until they occupy a large fraction of the amoeba cell volume at 6 h post infection (p.i.).
Later stages of the Mimivirus replication cycle occur from 6 to 9 h p.i., when empty fiberless procapsids, which are only partially assembled, as well as icosahedral procapsids undergoing DNA packaging, appear at the periphery of the large replication factories. A statistical survey of particles undergoing DNA packaging indicated that 60% of the particles package DNA through a face-centered rather than a vertex-centered aperture. Thus, Mimivirus DNA exit and packaging proceed through different portals, which is a unique feature among viruses (72).
The Mimivirus linear ~1.2-Mb genome encodes ~910 predicted CDSs and six tRNA genes. Despite its much larger genome, Mimivirus has a high coding density (90.5%) similar to that of other NCLDVs. Adjacent open reading frames (ORFs) are separated by an average of 157 bp. In contrast to some other NCLDV members, Mimivirus genomic termini lack large inverted repeats. Instead the Mimivirus genome has a 617-bp inverted repeat beginning at nucleotide position 22,515; its unique complementary counterpart begins at nucleotide position 1,180,529 (9). The extreme conservation of these intergenic regions suggests they serve an important role in Mimivirus replication. Pairing these regions produces a putative Q-like form in the genome, with a long (22,514 bp) and a short (259 bp) tail. The short tail region has no CDSs. The long tail region has a lower coding density than the rest of the genome (75% versus 90.5%) and larger intergenic spacers (435 bp versus 157 bp on average). This region encodes 12 proteins, 7 of which are involved in DNA replication.
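The coordinates quoted above fix the geometry of the putative Q-like form, and the arithmetic can be checked with a short sketch. The sketch assumes that positions are 1-based and inclusive, that the complementary repeat also spans 617 bp from its quoted start, and that the short tail lies downstream of it; the function and key names are illustrative only.

```python
# Values quoted in the text; the coordinate convention (1-based, inclusive)
# is an assumption made for this sketch.
REPEAT_LEN = 617
LEFT_REPEAT_START = 22_515
RIGHT_REPEAT_START = 1_180_529
SHORT_TAIL = 259


def q_form_geometry(left_start, right_start, repeat_len, short_tail):
    """Derive segment lengths of the Q-like form from the repeat positions."""
    # Everything upstream of the first repeat forms the long tail.
    long_tail = left_start - 1
    # Last base of the downstream repeat, assuming it is also repeat_len long.
    right_repeat_end = right_start + repeat_len - 1
    # The short tail is assumed to follow the downstream repeat directly.
    implied_genome_len = right_repeat_end + short_tail
    # Bases lying strictly between the two paired repeats form the loop.
    loop_len = right_start - (left_start + repeat_len)
    return {
        "long_tail": long_tail,
        "loop_len": loop_len,
        "implied_genome_len": implied_genome_len,
    }


geometry = q_form_geometry(
    LEFT_REPEAT_START, RIGHT_REPEAT_START, REPEAT_LEN, SHORT_TAIL
)
for name, value in geometry.items():
    print(f"{name}: {value:,} bp")
```

Under these assumptions the long tail comes out at 22,514 bp, matching the figure in the text, and the implied total length of 1,181,404 bp is consistent with the ~1.2-Mb genome size.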
Thirty-three percent of the Mimivirus genes are related to at least one other Mimivirus gene because of gene duplications (55). Thirty-six of the Mimivirus 910 CDSs are associated with functions not previously found in a virus (9). For instance, Mimivirus possesses a complete set of DNA repair enzymes capable of correcting nucleotide mismatches as well as errors induced by oxidation, UV irradiation, and alkylating agents. Mimivirus is also the only virus to encode three topoisomerases. In addition, Mimivirus encodes a variety of polysaccharide-, amino acid–, and lipid-manipulating enzymes. Such metabolic capabilities, although covering a broader biochemical spectrum in Mimivirus, also exist in other NCLDVs, especially the phycodnaviruses (66), where they often vary among isolates. Probably the most unexpected discovery in the Mimivirus genome was finding homologs to 10 translation-related proteins (9). Finally, Mimivirus, like the chlorella viruses, encodes several putative glycosyltransferases that might help glycosylate its MCP. Proteomic analysis of the Mimivirus virion identified 114 virus-encoded proteins, including the transcription proteins mentioned above.
Three viruses, PBCV-1, EhV, and EsV, are selected to represent the Phycodnaviridae family. These three viruses group into a single family (14, 66) and at first glance appear to be more similar than they actually are. As noted below, the apparent long evolutionary history of these three viruses has led to major differences in propagation strategies (lytic versus lysogenic), virus release (lytic versus budding), and virus structure (unique vertex with a spike versus probably no spike). The fact that these three viruses only have 14 common genes provides additional evidence of their long evolutionary history. Thus, over 1000 different genes exist just among these three phycodnaviruses!
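The "over 1000 different genes" figure can be reproduced as a rough upper bound from the CDS counts quoted elsewhere in this review (365 for PBCV-1, 472 for EhV-86, and 231 for EsV-1). The sketch below assumes that the 14 genes common to all three viruses are the only overlap; it ignores genes shared by just two of the viruses and paralogs within a genome, so the true number of distinct genes could be lower. The names are illustrative only.

```python
# CDS counts quoted in this review for the three representative viruses.
CDS_COUNTS = {"PBCV-1": 365, "EhV-86": 472, "EsV-1": 231}
SHARED_BY_ALL = 14  # genes common to all three viruses


def distinct_gene_upper_bound(cds_counts, shared_by_all):
    """Upper bound on distinct genes, assuming only the core set overlaps."""
    total = sum(cds_counts.values())
    # Each core gene is counted once per virus; keep only one copy of each.
    redundant = shared_by_all * (len(cds_counts) - 1)
    return total - redundant


upper_bound = distinct_gene_upper_bound(CDS_COUNTS, SHARED_BY_ALL)
print(f"Total CDSs: {sum(CDS_COUNTS.values())}")
print(f"Upper bound on distinct genes: {upper_bound}")
```

With these inputs the three genomes carry 1068 CDSs in total and at most 1040 distinct genes.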
The chlorella viruses (genus Chlorovirus) infect symbiotic chlorella, often called zoochlorellae, which are associated with the protozoan Paramecium bursaria, the coelenterate Hydra viridis, and the heliozoon Acanthocystis turfacea (58, 68). Paramecium bursaria chlorella virus (PBCV-1) is the type member of the genus (61). The zoochlorellae are resistant to virus infection in the symbiotic state. Fortunately, some zoochlorellae can be grown independently of their hosts, permitting plaque assay of the viruses and synchronous infection of their hosts. Therefore, one can study the virus replication cycle in detail. The 46.2-Mb genome of the PBCV-1 host Chlorella NC64A was sequenced recently by the Department of Energy Joint Genome Institute and its genome annotation is publicly available. Availability of both host and virus sequences makes chlorella viruses a favorable model system.
Freshwater throughout the world contains chlorella viruses with titers as high as 100,000 plaque-forming units (PFUs) per milliliter of native water, although typically, virus titers are 1–100 PFU ml⁻¹. The titers fluctuate during the year, with the highest titers occurring in the spring. Although chlorella viruses are ubiquitous in freshwater, little is known about their natural history. For example, do they have another host?
Cryo-electron microscopy and 3D image reconstruction of PBCV-1 indicate the outer capsid is icosahedral and covers a single lipid bilayered membrane, which is required for infection. The capsid shell consists of 1680 donut-shaped trimeric capsomers plus 12 pentameric capsomers, one at each icosahedral vertex. The trimeric capsomers are arranged into 20 triangular facets (trisymmetrons, each containing 66 trimers) and 12 pentagonal facets (pentasymmetrons, each containing 30 trimers and one pentamer at the icosahedral vertices) (Figure 1c). The PBCV-1 capsid forms a quasi-equivalent lattice with a triangulation number of T = 169d (66).
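The capsomer counts and the triangulation number quoted above are mutually consistent, as a brief check shows. The sketch relies on the standard Caspar-Klug relations, which are not spelled out in the text: an icosahedral lattice with triangulation number T = h² + hk + k² has 10T + 2 capsomers, 12 of which sit on the fivefold vertices. The function names are illustrative only.

```python
def capsomer_counts(trisymmetrons=20, trimers_per_tri=66,
                    pentasymmetrons=12, trimers_per_penta=30):
    """Count trimeric and total capsomers from the facet composition."""
    trimers = trisymmetrons * trimers_per_tri + pentasymmetrons * trimers_per_penta
    # One pentameric capsomer sits at the center of each pentasymmetron.
    return trimers, trimers + pentasymmetrons


def triangulation_pairs(t):
    """Return all (h, k) with 0 <= h <= k and h*h + h*k + k*k == t."""
    pairs = []
    h = 0
    while 3 * h * h <= t:  # with h <= k the value is at least 3*h*h
        for k in range(h, t + 1):
            value = h * h + h * k + k * k
            if value == t:
                pairs.append((h, k))
            if value >= t:
                break
        h += 1
    return pairs


trimers, capsomers = capsomer_counts()
implied_t = (capsomers - 2) // 10  # invert: capsomers = 10T + 2
print(trimers, capsomers, implied_t)
print(triangulation_pairs(implied_t))
```

The facet composition gives 1680 trimers and 1692 capsomers in total, which corresponds to T = 169. Two index pairs yield this value, (0, 13) and (7, 8); only the latter is a skew lattice with a handedness, which is what the "d" suffix in 169d denotes.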
Recent fivefold symmetry averaging 3D reconstruction experiments revealed that one of the PBCV-1 vertices has a cylindrical spike, 250 Å long and 50 Å wide (Figure 1c) (7). A pocket exists between the inside of the unique vertex and the enveloped nucleocapsid; i.e., the internal virus membrane departs from icosahedral symmetry adjacent to the unique vertex (Figure 1d). Consequently, the virus DNA located inside the envelope is packaged nonuniformly in the particle. The PBCV-1 MCP is a glycoprotein and comprises ~40% of the total virus protein. The MCP consists of two eight-stranded, antiparallel β-barrel jelly-roll domains related by a pseudo-sixfold rotation (48).
External fibers extend from some of the trisymmetron capsomers (probably one per trisymmetron) and may facilitate attachment to the host (Figure 1c, e). The spike at the unique vertex is too thin to deliver DNA and so it probably aids in penetration of the wall. PBCV-1 initiates infection by attaching rapidly and specifically to the Chlorella NC64A cell wall (58), probably by the fibers mentioned above (7). Following host cell wall degradation by virus-packaged enzyme(s), the viral internal membrane presumably fuses with the host membrane, facilitating entry of the viral DNA and virion-associated proteins into the cell, leaving an empty capsid attached to the surface (57). This fusion process initiates rapid depolarization of the host membrane and the rapid release of K+ from the cell. The rapid loss of K+ and associated water fluxes from the host reduce its turgor pressure, which may aid ejection of viral DNA and virion-associated proteins into the host. Depolarization may also prevent infection by a second virus (22).
PBCV-1 lacks a recognizable RNA polymerase gene, and so circumstantial evidence suggests PBCV-1 DNA and DNA-associated proteins quickly move to the nucleus, where early transcription begins 5 to 10 min p.i. (66). In this immediate-early phase of infection (5–10 min p.i.), host transcription machinery is reprogrammed to transcribe viral DNA. Details of reprogramming are unknown, but host chromatin remodeling is probably involved. PBCV-1 encodes a SET domain–containing protein (referred to as vSET) that methylates Lys-27 in histone 3. vSET is packaged in the PBCV-1 virion, and circumstantial evidence indicates vSET helps to repress host transcription following PBCV-1 infection (44). In addition, host chromosomal DNA degradation begins within minutes after infection, presumably by PBCV-1-encoded and packaged DNA restriction endonuclease(s) (1). This degradation also inhibits host transcription and facilitates recycling of nucleotides for viral DNA replication.
Viral DNA replication begins 60 to 90 min p.i. and is followed by transcription of late genes (58). Approximately 2 to 3 h p.i., assembly of virus capsids begins in localized regions in the cytoplasm, which become prominent 3 to 4 h p.i. Five to 6 h p.i. the cytoplasm fills with infectious progeny virus particles, and localized lysis of the host cell releases progeny at 6–8 h p.i. Each cell releases ~1000 particles, of which ~30% are infectious.
Global transcription analysis of PBCV-1 genes during virus replication (70) indicates that (a) 98% of the 365 PBCV-1 protein-encoding genes are expressed in laboratory conditions, (b) 63% of the genes are expressed before 60 min p.i. (classified as early genes), (c) 37% of the genes are expressed after 60 min p.i. (classified as late genes), and (d) 43% of the early gene transcripts are also detected at late times following infection (classified as early/late genes).
The PBCV-1 genome is a linear, ~334-kb, nonpermuted dsDNA molecule with covalently closed hairpin termini. Identical ~2.2-kb inverted repeats flank each hairpin end. The remainder of the PBCV-1 genome contains primarily single-copy DNA. Of the 365 predicted PBCV-1 CDSs, ~35% resemble proteins of known function, including many that are novel for a virus (e.g., hyaluronan synthase, K+ channel protein, and four polyamine biosynthetic enzymes). PBCV-1 CDSs are evenly distributed on both DNA strands with minimal intergenic spaces. Exceptions to this rule include a 1788-nucleotide sequence in the middle of the PBCV-1 genome that encodes 11 tRNAs [cotranscribed as a large precursor and then processed to mature tRNAs (68)].
Most chlorella virus genomes contain methylated bases. For example, genomes from 37 sampled chlorella viruses have 5-methylcytosine (5mC) in amounts ranging from 0.12 to 47.5% of the total cytosine. In addition, 24 of the 37 viral DNAs contain N6-methyladenine (6mA) in amounts ranging from 1.5 to 37% of the total adenine (61). The methylated bases occur in specific DNA sequences, which led to the discovery that the chlorella viruses encode multiple 5mC and 6mA DNA methyltransferases. About 25% of the virus-encoded DNA methyltransferases have companion DNA site-specific (restriction) endonucleases, including some with unique cleavage specificities (61).
Five additional chlorella viruses have been sequenced, including two more viruses (NY-2A and AR158) that infect the same host as PBCV-1, Chlorella NC64A; two (MT325 and FR483) that infect Chlorella Pbi; and one (ATCV-1) that infects Chlorella SAG 3.83 (66). Approximately 80% of the genes are common to all six sequenced chlorella viruses, suggesting they are essential for virus replication. However, the total number of chlorella-virus-encoded genes is much larger than the number present in any one virus. Not surprisingly, orthologs from viruses infecting the same host are the most similar; the average amino acid identity between orthologs from PBCV-1 and NY-2A or AR158 is ~73%. PBCV-1 and MT325 or FR483 orthologs have ~50% amino acid identity, and PBCV-1 and ATCV-1 orthologs have ~49% amino acid identity. Using PBCV-1 as a model, there is high synteny between the three viruses that infect Chlorella NC64A. In contrast, PBCV-1 has only slight synteny with the two Pbi viruses and the SAG virus (16).
Many PBCV-1-encoded enzymes are either the smallest or among the smallest proteins in their family. Phylogenetic analyses indicate some of these minimalist proteins are potential evolutionary precursors of more complex cellular proteins. Despite their small size, the virus enzymes typically have all the catalytic properties of larger enzymes. Their small size and the fact that they are often laboratory friendly have made them excellent models for mechanistic and structural studies (66).
The chloroviruses are also unusual because they encode enzymes involved in sugar metabolism. For example, two PBCV-1-encoded enzymes synthesize GDP-L-fucose from GDP-D-mannose (19), and three enzymes contribute to the synthesis of hyaluronan, a linear polysaccharide typically found in vertebrates (68). All three genes are transcribed early during PBCV-1 infection and hyaluronan accumulates on the external surface of the infected chlorella cells. In addition, PBCV-1 encodes at least five putative glycosyltransferases that likely participate in glycosylating the virus MCP (60).
The coccolithophore Emiliania huxleyi is a globally important unicellular marine phytoplankton. The alga forms huge blooms that extend over 100,000 km² and it is important in ocean carbon and sulfur cycles, as well as influencing the climate (20). It is now generally accepted that the Emiliania huxleyi virus (EhV) contributes to the collapse of these blooms (66).
E. huxleyi has two phenotypes in its haplodiploid lifestyle. The diploid calcified phase forms algal blooms; this form is infected by EhV (genus Coccolithovirus). In contrast, the ecological status of the noncalcified haploid phase is largely unknown. However, haploid cells are resistant to EhV (17).
Currently, no detailed structural studies exist for the icosahedral EhV virion (Figure 1f, g). However, the initial assumption that it is structurally similar to PBCV-1 is probably incorrect because the EhV capsid is surrounded by an external lipid membrane and the virus enters its host either by fusing with the host plasma membrane or by endocytosis (36). In contrast, PBCV-1 uncoats at the surface of the cell wall.
EhV has a different propagation strategy than either the lytic chlorella viruses or the latent EsV-1 viruses. The host alga E. huxleyi is covered with a calcium carbonate shell that would appear to create a physical barrier to virus adsorption. However, despite this barrier, virus adsorption to the host membrane is rapid and intrinsically linked to the host cell cycle (36). Real-time fluorescence microscopy revealed that EhV-86 rapidly enters its host intact via either an endocytotic or an envelope fusion mechanism where it rapidly disassembles.
Whereas both the chlorella viruses and EsV-1 depend on host transcription machinery, EhV is unique among the phycodnaviruses because it has six RNA polymerase-encoding genes (65). These genes suggest some virus independence from the host nucleus. Viral transcription begins immediately after infection, but it is limited to a specific 100-kb region, containing ~150 CDSs, of the virus genome. This 100-kb region contains a unique promoter element, and only the genes transcribed during the first hour p.i. contain this element. Thus, these CDSs undoubtedly play a crucial and integral role early during virus infection. However, none of these CDSs matches anything in the databases.
Intriguingly, proteomic analysis did not detect any transcriptional proteins in mature EhV-86 virions; therefore, host nuclear RNA polymerase(s) is presumably responsible for early transcription (66). Between 1 and 2 h p.i., a second transcription phase begins with gene expression occurring from the remainder of the genome. Because viral RNA polymerase components are expressed in this second phase, viral replication may no longer be nuclear dependent and transcription may occur in the cytoplasm. At ~4.5 h p.i., virus progeny begin to be released via budding, during which EhV-86 virions become enveloped with host plasma membrane. Therefore, unlike chlorella viruses, for which nascent infectious virions accumulate in the cytoplasm prior to release by cell lysis, EhV virions are released gradually (36).
The EhV-86 407-kb genome, encoding 472 CDSs, was originally thought to be linear. PCR amplification over the termini revealed a random A/T single nucleotide overhang (50% A, 50% T), suggesting the virus genome has both linear and circular phases. EhV-86 has three repeat families (none of which is located at the ends of the genome); one repeat family is postulated to act as a replication origin (suggesting a circular form of DNA replication), another family is postulated to contain immediate-early promoter elements, and the last family has a large repetitive proline-rich domain that may bind calcium (66).
EhV-86 also has some unusual CDSs, including an entire metabolic pathway of seven genes encoding sphingolipid metabolic enzymes (65). The host also contains genes encoding this entire pathway and it is clear that horizontal gene transfer occurred between EhV and E. huxleyi (42). However, the direction of the transfer is not obvious. This biosynthetic pathway appears to function during lytic infection and the glycosphingolipids (GSLs) produced induce programmed cell death (PCD) with corresponding activation of an algal metacaspase, an essential activity for EhV-86 replication. Susceptible hosts accumulate both algal and viral derived GSLs that may coordinate virus maturation, whereas resistant cells accumulate only algal derived GSLs. The viral GSLs accumulate in the viral envelope, and it is hypothesized that this is a mechanism to activate virus release and subsequently induce PCD in surrounding algal cells during natural blooms as a type of quorum-sensing device, terminating the bloom (63). This example of cell signaling by the E. huxleyi/EhV interaction suggests that aquatic viruses are very much in control of their environment in ways virologists and ecologists are only just beginning to fathom.
Ectocarpus siliculosus virus 1 (EsV-1) is the type species for the genus Phaeovirus and its infection strategy is regarded as typical for the genus (59, 66). Collectively, Phaeovirus members infect free-swimming, wall-less gametes or spores of filamentous marine brown macroalgae (order Ectocarpales, class Phaeophyceae) by fusing with the host plasma membrane. Their hosts are members of benthic communities in near-shore coastal environments in all the world’s oceans. Phaeovirus DNAs are integrated into the host genome and are passed to daughter cells during cell division. The EsV-1 genome persists as a latent infection in vegetative cells, and infected algae show no obvious growth or developmental defects, except for partial or total inhibition of their reproductive organs. The viral genome is only expressed in sporangia and gametangia cells, where the cellular organelles disintegrate and are replaced with densely packed viral particles. Environmental stimuli, such as temperature and light, cause lysis of reproductive organs, synchronously releasing spores or gametes as well as viruses. Phaeoviruses are the only known phycodnaviruses to infect members of more than one algal family.
EsV-1 has a linear dsDNA genome with almost perfect inverted repeats at each end that allow circularization. Indeed, before sequencing, EsV-1 was thought to have a circular genome. The inverted repeats are proposed to anneal with each other to form a cruciform structure that effectively circularizes the genome. In addition to the terminal repeats, tandem repeats are located throughout the EsV-1 genome and comprise ~12% of the total genome size. The genome also contains several single-stranded regions randomly distributed over its length whose functions are unknown. Another characteristic of the EsV-1 genome is its low gene density compared with the other phycodnaviruses. The 231 CDSs occupy only 70% of the EsV-1 genome; they are located in islands of densely packed genes that are separated by large regions of DNA repeats and noncoding sequences.
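The idea of terminal inverted repeats that can anneal with each other can be made concrete with a small sketch. It compares the first k bases of a linear sequence with the reverse complement of the last k bases and reports the fraction of matching positions. The function names, the window length k, and the toy sequences are illustrative assumptions, not data or methods from the EsV-1 analysis.

```python
# Minimal sketch: score how well the two ends of a linear genome form
# an inverted repeat. All sequences below are made-up toy examples.

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an uppercase/lowercase DNA string."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


def terminal_repeat_identity(genome: str, k: int) -> float:
    """Fraction of positions at which the first k bases match the
    reverse complement of the last k bases (1.0 = perfect inverted repeat)."""
    if k <= 0 or 2 * k > len(genome):
        raise ValueError("k must be positive and at most half the genome length")
    left = genome[:k].upper()
    right = reverse_complement(genome[-k:])
    matches = sum(a == b for a, b in zip(left, right))
    return matches / k


# Toy genome whose right end is the exact reverse complement of its left end.
perfect = "AACCGGTA" + "TTTT" + reverse_complement("AACCGGTA")
# Same genome with one base changed in the right-hand repeat.
almost = "AACCGGTA" + "TTTT" + "TACCGGTA"

print(terminal_repeat_identity(perfect, 8))
print(terminal_repeat_identity(almost, 8))
```

A position-by-position comparison like this cannot handle insertions or deletions within the repeats; a real analysis would more likely align the two ends with a dedicated alignment tool.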
EsV-1 also encodes some unusual CDSs, including six putative hybrid histidine kinases (two-component systems that form part of a stimulus-responsive transduction pathway) that are widespread in archaea and bacteria. The relevance of these genes to EsV-1 infection is unknown.
The first reported appearance of WSSV occurred in 1992–1993 in shrimp farms in southern provinces of mainland China and also in northern Taiwan. The virus quickly spread to shrimp farming regions all over the world, including North and South America, Europe, and the Middle East. WSSV is lethal to most commercially cultivated penaeid shrimp species, causing serious economic damage. For example, an acute outbreak of white spot disease in cultured shrimp can result in 100% fatality in 3 to 10 days (35).
Unlike many viruses, WSSV infects a wide range of marine, brackish water, and freshwater crustaceans in addition to penaeid shrimp, including crayfishes, crabs, spiny lobsters, and hermit crabs. However, WSSV infection is usually not lethal to these other crustaceans; consequently, these other crustaceans may serve as virus reservoirs.
Structurally, WSSV virions resemble baculoviruses, and WSSV was initially classified as a baculovirus. However, WSSV has now been assigned to its own family, Nimaviridae (genus Whispovirus). WSSV virions are enveloped and cylindrical to elliptical in shape (Figure 1h). They measure 80–120 nm wide and 250–380 nm long. Some purified virions contain a 279- to 310-nm filamentous tail-like appendage at one end. The nucleocapsid has a segmented appearance, with ring-like segments running perpendicular to the longitudinal axis of the nucleocapsid (Figure 1). Each segment (or ring) is composed of two parallel rows of 12–14 globular subunits, each of which is approximately 8–10 nm in diameter. At least 40 structural proteins have been identified in the virus particles (35).
WSSV infection studies have been hampered by the lack of a cell culture system, although this may be changing (27). Currently, researchers agree that WSSV replicates and assembles in the nucleus, and in an acute infection, its life cycle is completed within 24 h. However, there are conflicting reports on the events associated with morphogenesis. One report suggests that nuclear protein is packaged into a partially enveloped empty capsid, whereas another report suggests that the electron-dense nucleocapsid is assembled first and then enveloped by a viral membrane (35).
The ~305-kb WSSV genome is circular. Most of the WSSV genome sequences are unique and only 3% of the genome consists of repetitive sequences. These repetitive sequences are organized into nine homologous regions containing 47 repeated mini-segments, which are distributed throughout the genome, mainly in intergenic regions.
Annotation of the WSSV genome identified 531 ORFs that consist of at least 60 codons (35). One hundred and eighty-one ORFs are non-overlapping and are classified as CDSs. About 80% of these CDSs have a potential 3′-polyadenylation site (AATAAA). The sizes of the proteins encoded by these CDSs range from 60 to 6077 amino acids. The 6077-amino-acid CDS encodes the extraordinarily large MCP. Only 45 of the CDSs resemble known proteins (>20% amino acid identity) or contain recognizable motifs. Twenty-seven CDSs are classified into 10 WSSV gene families; these families probably arose from gene duplications.
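The two annotation criteria mentioned above, a minimum ORF length of 60 codons and a downstream AATAAA motif, can be illustrated with a short sketch. It is not the pipeline used for the WSSV annotation: it scans only the strand it is given, treats the sequence as linear, defines an ORF as ATG to the first in-frame stop, counts the start codon but not the stop, and uses an arbitrary 200-base search window for the motif. All function names and the toy sequence are assumptions made for illustration.

```python
# Minimal sketch of an ORF scan with a length threshold, plus a search for
# a polyadenylation-like motif downstream of each ORF. Toy data only.

STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_orfs(strand_seq: str, min_codons: int = 60):
    """Return (start, end, n_codons) for ATG-to-stop ORFs in the three
    forward frames of strand_seq. Coordinates are 0-based, end-exclusive,
    and end includes the stop codon; n_codons excludes the stop codon."""
    seq = strand_seq.upper()
    orfs = []
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i                      # first ATG opens the ORF
            elif start is not None and codon in STOP_CODONS:
                n_codons = (i - start) // 3
                if n_codons >= min_codons:
                    orfs.append((start, i + 3, n_codons))
                start = None                   # look for the next ORF
    return orfs


def has_polya_signal(strand_seq: str, orf_end: int, window: int = 200) -> bool:
    """True if AATAAA occurs within `window` bases downstream of the ORF."""
    return "AATAAA" in strand_seq.upper()[orf_end:orf_end + window]


# Toy sequence: ATG + 60 alanine codons + stop, then a short spacer and motif.
toy = "ATG" + "GCT" * 60 + "TAA" + "CC" + "AATAAA"
for start, end, n in find_orfs(toy):
    print(start, end, n, has_polya_signal(toy, end))
```

To cover both strands, the same function can be run on the reverse complement of the sequence. Deciding which overlapping ORFs to keep as CDSs is a separate step that this sketch does not attempt.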
The few identifiable WSSV CDSs primarily encode gene products involved in nucleotide metabolism (35). Surprisingly, only one WSSV CDS, a DNA polymerase, is related to DNA replication. WSSV also encodes a collagen-like protein, which is the first collagen gene to be identified in a virus genome.
Large dsDNA bacteriophages are being discovered with increasing frequency (33). However, when this review was written only two phages, 670-kb phage G and 317-kb phage 2012-1, had genomes larger than 300 kb (Table 1). Both viruses are members of the tailed family Myoviridae. Phage G infects Bacillus megaterium and phage 2012-1 infects Pseudomonas chlororaphis. Although both genomes have been sequenced, annotation of only 2012-1 is in the public domain.
The majority of the proteins predicted from the genome sequences of these phages have no database matches, and the genomes themselves are diverse enough to preclude the detailed comparative analysis that has occurred with smaller phages, for which hundreds of genome sequences are available. However, one can extrapolate the better-known genome organizations and mechanisms of evolution seen in the smaller phages to the jumbo phages. Typically, larger phages contain the same core genes (structural and DNA replication genes), plus many additional, generally smaller genes that do not match anything in the databases and can usually be deleted without affecting replication. It is possible that the jumbo phages evolved from smaller tailed phages, possibly in a process mediated by constraints on genome size by capsid size (24).⁴
The phage G genome sequence is 498 kb, but the packaged chromosome is ~670 kb. The ~172 kb of extra DNA corresponds to a terminal redundancy of about 35% of the unique sequence. Phage G is predicted to have 682 CDSs, and about 10% of these belong to families of paralogs. Phage G, like some other large viruses, encodes several translation system components, e.g., 17 tRNAs covering 14 codon specificities and a homolog of a serine aminoacyl-tRNA synthetase. The phage 2012-1 genome is also circularly permuted and terminally redundant and is predicted to encode 468 CDSs.
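The terminal redundancy figure for phage G follows from simple arithmetic on the two sizes given above. The sketch below assumes that redundancy is expressed relative to the unique genome sequence (498 kb); expressed relative to the packaged chromosome, the same 172 kb would be a smaller fraction. The function name is illustrative.

```python
# Minimal sketch: terminal redundancy from the unique genome length and the
# packaged chromosome length, both in kb.

def terminal_redundancy(unique_kb: float, packaged_kb: float) -> float:
    """Extra packaged DNA as a fraction of the unique genome sequence."""
    if unique_kb <= 0 or packaged_kb < unique_kb:
        raise ValueError("expected 0 < unique_kb <= packaged_kb")
    return (packaged_kb - unique_kb) / unique_kb


# Values for phage G as stated in the text: 498-kb sequence, ~670-kb chromosome.
redundancy = terminal_redundancy(498, 670)
print(f"{redundancy:.1%}")  # 34.5%, i.e. about 35%
```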
Although giruses are probably ancient, they are relatively new to virologists. Even with our limited knowledge, research efforts on large viruses are contributing scientific and economic benefits. For example, chlorella viruses, which encode as many as 400 CDSs, are sources of new and unexpected genes. Some of these genes encode commercially important enzymes such as DNA restriction endonucleases, and many of the viral proteins are the smallest in their class. Consequently, these proteins serve as biochemical models for mechanistic and structural studies (21). The viruses are also a source of genetic elements for genetically engineering other organisms. Examples include (a) promoter elements from chlorella viruses that function well in both monocots and dicots of higher plants, as well as in bacteria (38); and (b) a translational enhancer element from a chlorella virus that functions well in Arabidopsis (49).
The hosts for some of these viruses either have been sequenced recently or are in the process of being sequenced. Annotation of these sequences will certainly contribute to studies on giruses. However, one obstacle to studying these viruses is that currently none of the eukaryotic viruses described in this review can be genetically modified by molecular techniques. The development of successful and reproducible host transformation procedures should lead to the genetic analysis of these viruses, which would be a major achievement.
It is obvious that the discovery and characterization of giruses are in their infancy and that many more interesting and unusual members await discovery. For example, metagenomic studies on environmental microbial DNA sequences collected in the Sargasso Sea revealed many homologs of Mimivirus genes. Thus, many Mimivirus relatives certainly exist in nature, some of which probably infect novel protists. Classifying these newly discovered large viruses will be complicated because of horizontal gene swapping.
The origin of giruses is controversial. One interesting suggestion is that amoebae, which harbor many diverse microorganisms, such as viruses, are melting pots for gene mixing, leading to new viruses, including large viruses with complex gene repertoires of various origins (6).
Research in the Van Etten laboratory was supported in part by Public Health Service grant GM32441 and NIH grant P21RR15635 from the COBRE program of the National Center for Research Resources. We thank Michele Malchow for help with the figures.
² The chlorella viruses were the first nonbacterial source of DNA restriction endonucleases.
³ PBCV-1 was the first virus reported to encode most, if not all, of the machinery to glycosylate its MCP.
⁴ In the construction of the virion of all the tailed phages, an empty protein capsid is assembled first and then DNA is pumped into the capsid, presumably by a head-full packaging mechanism. This puts an upper size limit on the genome, and in fact the DNA is usually packed as tightly as physically possible. This agrees with the circularly permuted and terminally redundant structure of the phage DNAs.