This paper is about the taxonomy and genomics of herpesviruses. Each theme is presented as a digest of current information flanked by commentaries on past activities and future directions.
The International Committee on Taxonomy of Viruses recently instituted a major update of herpesvirus classification. The former family Herpesviridae was elevated to a new order, the Herpesvirales, which now accommodates 3 families, 3 subfamilies, 17 genera and 90 species. Future developments will include revisiting the herpesvirus species definition and the criteria used for taxonomic assignment, particularly in regard to the possibilities of classifying the large number of herpesviruses detected only as DNA sequences by polymerase chain reaction.
Nucleotide sequence accessions in primary databases, such as GenBank, consist of the sequences plus annotations of the genetic features. The quality of these accessions is important because they provide a knowledge base that is used widely by the research community. However, updating the accessions to take account of improved knowledge is essentially reserved to the original depositors, and this activity is rarely undertaken. Thus, the primary databases are likely to become antiquated. In contrast, secondary databases are open to curation by experts other than the original depositors, thus increasing the likelihood that they will remain up to date. One of the most promising secondary databases is RefSeq, which aims to furnish the best available annotations for complete genome sequences. Progress in regard to improving the RefSeq herpesvirus accessions is discussed, and insights into particular aspects of herpesvirus genomics arising from this work are reported.
Systematics is usually taken to be synonymous with the classification of organisms, but for the purposes of this paper I have employed a broader definition that includes both taxonomy and the systematic aspects of genomics. I address aspects of how the current situation in each area has been reached and how it might develop in future, and provide detailed summaries of current information. I also discuss a selection of new findings that have emerged from genomic studies. Readers should note that herpesvirus names are referred to in this paper not in full but by the acronyms listed in Table 1.
Taxonomy – the systematic classification of living organisms – is an exercise in categorization that helps human beings cope with the world. Of the taxa into which organisms may be classified, only those of species, genus, subfamily, family and order are presently applicable to viruses. The grouping of viruses into these taxa proceeds on the basis of identifying shared properties, and has applications in understanding virus evolution, identifying the origins of emerging diseases, and developing rational treatment strategies.
The criteria used to classify viruses have necessarily depended on the knowledge and technology available at the time. In the early days, aspects of biology such as host range and broad pathogenic and epidemiological features of the disease constituted the majority of information available. With the advent of electron microscopy and physicochemical methods, data on the morphological properties and constituents of virus particles became accessible, and these are still important today in assigning viruses to higher taxa. Thus, for example, all herpesviruses share a virion structure in which a DNA core is surrounded by an icosahedral (20-faceted) capsid consisting of 12 pentavalent and 150 hexavalent capsomeres. The capsid is embedded in a proteinaceous matrix called the tegument, which in turn is invested in a glycoprotein-containing lipid envelope. The advent of specific antibodies facilitated the study of antigenic relationships, which are generally detectable only between closely related viruses; in the case of herpesviruses, those in the same genus. The era of nucleic acid sequencing has led to the wide application of sequence-based phylogeny to classification. Indeed, the quantitative basis and broad utility of this approach have resulted in this becoming a key taxonomical discriminator in all parts of the tree of life, to the point where it dominates all other criteria.
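As an aside on the capsid description above, the capsomere counts can be checked against the standard Caspar-Klug relation for icosahedral capsids. The triangulation number T = 16 is not given in the text; it is supplied here as background for the herpesvirus capsid and should be read as an added assumption.

```latex
N_{\text{capsomeres}} = 10T + 2 = 10(16) + 2 = 162,
\qquad
N_{\text{pentavalent}} = 12,
\qquad
N_{\text{hexavalent}} = 10(T - 1) = 150 .
```

The 12 pentavalent capsomeres occupy the vertices of the icosahedron, and the remaining 150 are hexavalent, in agreement with the figures quoted.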
Advances in virus taxonomy have been recorded in a series of eight reports published at intervals since 1971 by the International Committee on Taxonomy of Viruses (ICTV). The latest report in book form was published in 2005 (Fauquet et al., 2005). In deference to the electronic age, annual updates of the list of virus species were published in 2007, 2008 and 2009 at the ICTV website (http://www.ictvonline.org).
Virus classification is advanced through a voting procedure involving the members of the ICTV. The proposals on which voting takes place are prepared by the ICTV Executive Committee from submissions made by individuals in the virological community, in particular those associated with Study Groups devoted to particular virus groups (usually families). The development of taxonomy as summarized in the ICTV reports is regularly promoted, supplemented and discussed by expert publications from Study Groups, including that focused on the herpesviruses.
In the first ICTV report (Wildy, 1971), the genus Herpesvirus was established, consisting of 23 viruses and 4 groups of viruses named according to the usages of the day (e.g. herpesvirus of saimiri). In the second ICTV report (Fenner, 1976), this genus was elevated to the family Herpetoviridae, which, presumably because of the misleading association of this name with reptiles and amphibians, was renamed Herpesviridae in the third ICTV report (Matthews, 1979). Also, a formal system for naming herpesviruses was founded (Roizman et al., 1973), implemented (Fenner, 1976; Matthews, 1979), elaborated (Roizman et al., 1981) and consolidated (Francki et al., 1991; Roizman et al., 1992).
This naming system specified that each herpesvirus should be named after the taxon (family or subfamily) to which its primary natural host belongs. The subfamily name was used for viruses from members of the family Bovidae or from primates (the virus name ending in –ine, e.g. bovine), and the host family name for other viruses (ending in –id, e.g. equid). Human herpesviruses were treated as an exception (i.e. human rather than hominid). Following the host-derived term, the word herpesvirus was added, succeeded by an arabic number, which bore no implied meaning about the taxonomic or biological properties of the virus. Thus, the formal name of pseudorabies virus (also known as Aujeszky's disease virus) was established as suid herpesvirus 1. Since herpesviruses had previously been named on an ad hoc basis, sometimes with the effect that a virus might have several names, the formal system promised a degree of clarity and simplicity to students and scientists in the research field. However, a number of practical disadvantages of the formal naming system emerged. Most importantly, many virus names (e.g. Epstein–Barr virus) were so widely accepted that they could not be dislodged (e.g. in this case by human herpesvirus 4). This led to the use of a dual nomenclature in the literature for some herpesviruses.
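The composition rule described above can be sketched in a few lines of Python. This is an illustration only: the function names are hypothetical, and the suffix handling assumes the usual zoological endings (-inae for a host subfamily, -idae for a host family), which the text implies but does not spell out. The human exception is handled by passing the host term directly.

```python
def host_term(taxon: str, rank: str) -> str:
    """Derive the host-derived term from a host taxon name.

    Assumption of this sketch: subfamilies end in '-inae' (term ends in
    '-ine') and families end in '-idae' (term ends in '-id').
    """
    name = taxon.lower()
    if rank == "subfamily" and name.endswith("inae"):
        return name[: -len("inae")] + "ine"
    if rank == "family" and name.endswith("idae"):
        return name[: -len("idae")] + "id"
    raise ValueError(f"cannot derive a host term from {taxon!r} ({rank})")


def formal_name(term: str, number: int) -> str:
    """Compose host term + 'herpesvirus' + arabic number."""
    if number < 1:
        raise ValueError("number must be a positive integer")
    return f"{term} herpesvirus {number}"


# Examples drawn from the text
print(formal_name(host_term("Suidae", "family"), 1))      # pseudorabies virus
print(formal_name(host_term("Bovinae", "subfamily"), 1))  # a virus of a member of the Bovidae
print(formal_name("human", 4))                            # Epstein-Barr virus (human is the exception)
```

The number carries no taxonomic meaning, so the sketch treats it as an opaque label and validates only that it is positive.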
Nonetheless, classification of herpesviruses continued and expanded. At the time of the third ICTV report (Matthews, 1979), the family Herpesviridae was divided into 3 subfamilies (Alphaherpesvirinae, Betaherpesvirinae and Gammaherpesvirinae) and 5 unnamed genera, and 21 viruses were listed. A subsequent list compiled by the Study Group contained 89 viruses (Roizman et al., 1981).
At the time of the seventh report (Minson et al., 2000), the ICTV adopted the species concept, which recognizes that a virus and the species to which it belongs fall into different categories, the real and the conceptual (Van Regenmortel, 1990). This sea change put the activities of the ICTV on a more logical footing, and also limited its authority to determining formal taxonomical nomenclature and classification, and not virus names, abbreviations and vernacular usages of formal names. It also brought about a simplification by effectively removing any implied taxonomical standing for viruses denoted previously as tentative species or unassigned viruses. Moreover, it reduced tensions concerning the pervasive use of a dual system for herpesvirus names, with the effect that the ICTV approach has been adopted for some (e.g. equine abortion virus is now well known as equid herpesvirus 1) and the ad hoc name for others (e.g. Kaposi's sarcoma-associated herpesvirus, rather than human herpesvirus 8). The rules for virus taxonomical names are that they are written in italics with the first letter capitalized, and never abbreviated. Thus, for example, pseudorabies virus belongs to the species Suid herpesvirus 1, genus Varicellovirus, subfamily Alphaherpesvirinae, family Herpesviridae. If an abbreviation based on the formal name is used (in this case, SuHV1 or SuHV-1), it represents the virus and not the species.
As the classification developed, it became clear from genome studies that IcHV1 and OsHV1 are very distant relatives of each other and other herpesviruses. For this reason, a new order, Herpesvirales, was created (Davison et al., 2009). This accommodates three families, namely the revised family Herpesviridae, which contains mammal, bird and reptile viruses, the new family Alloherpesviridae, which incorporates bony fish and frog viruses, and the new family Malacoherpesviridae, which contains OsHV1. Also, species representing viruses of non-human primates were renamed after the host genus rather than the subfamily, in order to cope with their rising number.
The most recent update (2009) of herpesvirus taxonomy largely concerned the introduction of additional taxa to the family Alloherpesviridae. The current, complete list of herpesvirus taxa is given in the first column of Table 1. This table also provides the common names and acronyms (abbreviations) of the viruses. The right-hand part of the table conveys genomic information, which is discussed below. The order Herpesvirales now consists of 3 families, 3 subfamilies, 17 genera and 90 species. A total of 48 tentative species and unassigned viruses are also listed in Table 1, but, as discussed above, these viruses are not part of formal taxonomy and are included out of convenience rather than necessity. Many are longstanding entities that may no longer exist in laboratories and for which sequence data are not (and never will be) available. Moreover, some are now known or suspected to be strains of other viruses (see the footnotes to Table 1).
The ICTV Herpesvirales Study Group is continuing to progress the classification of herpesviruses as they are identified and characterized. It is also taking forward discussions that will shape herpesvirus taxonomy in future. These include revisiting the herpesvirus species definition and the criteria used for taxonomic assignment, particularly in regard to the possibilities of classifying the large number of herpesviruses detected only as DNA sequences by polymerase chain reaction (PCR).
The ICTV follows the principle that “a virus species is a polythetic class of viruses that constitutes a replicating lineage and occupies a particular ecological niche” (Van Regenmortel, 1992). The polythetic nature of virus species (that is, having some but not all properties in common) implies that a species cannot be delineated on the basis of a single property. In line with this principle, herpesviruses are classified as distinct species if “(a) their nucleotide sequences differ in a readily assayable and distinctive manner across the entire genome and (b) they occupy different ecological niches by virtue of their distinct epidemiology and pathogenesis or their distinct natural hosts” (Roizman et al., 1992; Minson et al., 2000; Davison et al., 2005a). However, the dominant criterion in virus classification, as in the classification of all organisms, is now sequence-based phylogeny. The impact of this development upon the continuing viability of herpesvirus classification under the polythetic rule will need careful consideration. Discussions are likely to be driven to some extent by the ongoing PCR-based discovery of large numbers of herpesviruses that potentially belong to new taxa (e.g. Ehlers et al., 2008). Finally, the nomenclature of herpesvirus species is inherently unstable, because of its dependence on host nomenclature and the motivation to rename species that become very numerous in certain host families. It will be a challenge to maintain a balance between utility and stability.
Genomics – the study of the structure, function and evolution of genomes – underlies most discoveries in modern biology. Ever since the inception of nucleic acid sequencing, and increasingly so over time, interpretation of the data has lagged behind its generation. Hence, understanding the meaning of genomic data is at a premium. Interpretation of sequence data is served greatly by computer analyses in areas such as comparative genomics, pattern-based bioinformatics and proteomics. However, it is not best advanced when treated as a robotic exercise. Genomics, even of herpesviruses, has an actively speculative edge where new discoveries are being made and old models are being refined.
The primary nucleotide sequence databases, of which perhaps the most prominent is GenBank at the National Center for Biotechnology Information (NCBI; http://www.ncbi.nlm.nih.gov), are a vital resource to experimenters. NCBI is also putting effort into the Reference Sequence Project (RefSeq; http://www.ncbi.nlm.nih.gov/RefSeq), which aims to provide a comprehensive, integrated, non-redundant, well annotated set of sequences for major research organisms, including viruses (http://www.ncbi.nlm.nih.gov/genomes/GenomesHome.cgi?taxid=10239) (Pruitt et al., 2003). The herpesvirus RefSeqs are under development (http://www.ncbi.nlm.nih.gov/genomes/GenomesGroup.cgi?taxid=548681).
The utility of the primary databases is limited by three major factors: the accuracy of the sequence, the quality of the annotation (i.e. the description of features in the genome—genes, RNAs, coding regions and so forth), and the improvements made to both over time. However, for understandable reasons, updating an accession is effectively restricted to the original depositor. The tendency is for this to be done rarely, if at all, and this is causing the primary databases to become antiquated. Against this background, an enterprise in updating herpesvirus accessions in primary sequence databases seems forlorn. RefSeq provides a way to counter the disadvantages of the primary databases by enabling updating and curation by experts other than the original depositor.
The right-hand part of Table 1 lists information on the 51 herpesvirus species or potential species from which at least one virus has been sequenced to date. These species are represented by 122 complete or almost complete genome sequences. Partial sequence information is available on nearly all the other herpesviruses ranked in formal taxa; indeed, this is now a requirement for classification. At an experimental level, many herpesviruses are manipulated in the form of bacterial artificial chromosomes (BACs), the sequences of several of which are available; for example, those representing GaHV2, HHV1 and HHV5. The latest sequencing techniques are starting to be applied to herpesviruses, including the Roche 454 instrument (e.g. GaHV2 and MuHV1) or Illumina Genome Analyzer (e.g. HHV1 and HHV5).
I have been active in updating the herpesvirus RefSeqs, and thus far have processed all members of the subfamily Alphaherpesvirinae, plus some other viruses outside this group (e.g. HHV5). In addition to identifying potential sequence errors and upgrading the standard of annotation, one of the main tasks has been to start applying a systematic nomenclature to genome features. Members of the best studied family (Herpesviridae) in the order Herpesvirales share 44 genes apparently inherited from a common ancestor (core genes; Table 2), and yet the names of these orthologous genes and their encoded proteins vary from virus to virus. This has led to confusion in the research field. My tack has been to retain original gene names so that workers can find their way round an accession, and to apply standard names to the encoded proteins. This is aimed at improving the utility of the database accessions; for example, database searches would then return the same name for orthologous proteins from different viruses. Table 2 shows the current scheme used to annotate the RefSeqs for members of the subfamily Alphaherpesvirinae. A more extensive form of this database is available from me in spreadsheet form. As well as providing additional information, this database permits the genes and their encoded proteins to be sorted according to their order along a particular genome.
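The annotation scheme described above (original gene names retained, standard names applied to the encoded proteins, and entries sortable by genome order) can be pictured as a small table of records. The following Python sketch is illustrative only: the class and function names are hypothetical, the coordinates are placeholders rather than real genome positions, and the handful of entries uses only gene and protein names that appear in this paper. It does not reproduce the actual spreadsheet.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Feature:
    virus: str    # virus acronym
    gene: str     # original gene name, retained so workers can navigate an accession
    protein: str  # standard protein name shared by orthologues
    start: int    # PLACEHOLDER coordinate, used only to demonstrate sorting


FEATURES = [
    Feature("HHV1", "US9", "membrane protein US9", 3000),
    Feature("HHV1", "UL56", "membrane protein UL56", 1000),
    Feature("HHV1", "US8A", "membrane protein US8A", 2000),
    Feature("HHV2", "UL56", "membrane protein UL56", 1000),
    Feature("HHV2", "UL16", "tegument protein UL16", 500),
]


def viruses_encoding(protein: str) -> list[str]:
    """Return the viruses whose annotation carries the given standard protein name."""
    return sorted({f.virus for f in FEATURES if f.protein == protein})


def genome_order(virus: str) -> list[str]:
    """Return the gene names of one virus sorted by position along its genome."""
    own = [f for f in FEATURES if f.virus == virus]
    return [f.gene for f in sorted(own, key=lambda f: f.start)]


print(viruses_encoding("membrane protein UL56"))
print(genome_order("HHV1"))
```

Because the search key is the standard protein name rather than the gene name, orthologues are retrieved together even where the original gene names differ between viruses.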
The long term aim of updating herpesvirus RefSeqs is to ensure that they are correct (free from errors), complete (adequately annotated), clear (using standard nomenclature) and current (up to date). From experience, I foresee that fulfilling this aim to an acceptable standard will be demanding and that many considerations will have to be weighed. Most importantly, the list of standard protein names must be considered as needing development in future, since the process of establishing names is fraught with pitfalls that are not developed further here. Thus, as with the names of herpesvirus species in Table 1, the protein names in Table 2 are provisional. It is intended that they will improve and harmonize as knowledge increases and data are imported from the other herpesvirus subfamilies. The substantial efforts of Mocarski (2007) to apply a standard nomenclature to the proteins encoded by core genes also deserve recognition in this regard.
The process of systematizing information as described above tends to yield new insights, particularly those arising from comparative genomics. This section highlights three examples that were unearthed while reannotating genomes of members of the subfamily Alphaherpesvirinae and members of the genus Cytomegalovirus in the subfamily Betaherpesvirinae. These examples illustrate the fact that new discoveries remain to be made even with well characterized herpesviruses.
Previously, I reported that orthologues of gene UL56 are not confined to members of the genus Simplexvirus in the subfamily Alphaherpesvirinae, where they were first identified, but are also found among members of the genera Varicellovirus and Iltovirus (Davison, 2007). Subsequent analysis (Fig. 1 and Table 2) indicated that orthologues are also present in members of the genus Mardivirus. These studies indicate that all members of the subfamily Alphaherpesvirinae except BoHV1 and BoHV5 encode a UL56 orthologue. The appropriate amendments were applied to the RefSeqs by the middle of 2007.
All the versions of the UL56 product (membrane protein UL56) have a C-terminal hydrophobic domain, and sequence similarity is limited to a central region consisting in most instances of two PPXY motifs (Fig. 1). Most of the proteins have additional PPXY elements elsewhere in their sequences. Moreover, two viruses (CeHV9 and PsHV1) appear to have additional genes related to UL56, whose products are termed membrane protein UL56A. The existence of these paralogues gives rise to the novel UL56 gene family. This systematic evaluation provides a framework within which to view the possible roles of the encoded proteins.
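Locating PPXY motifs in a protein sequence is a simple pattern search. The Python sketch below shows one way to do it, using a lookahead so that overlapping matches are not missed. The function name and the example sequence are invented for illustration; the sequence is not that of any UL56 protein.

```python
import re

# P-P-any residue-Y; the lookahead lets overlapping occurrences be reported
PPXY = re.compile(r"(?=(PP[A-Z]Y))")


def find_ppxy(protein: str) -> list[tuple[int, str]]:
    """Return (1-based position, motif) for every PPXY motif in a protein sequence."""
    return [(m.start() + 1, m.group(1)) for m in PPXY.finditer(protein.upper())]


# Hypothetical sequence containing two motifs
example = "MASPPPYAGGPPEYLL"
print(find_ppxy(example))
```

Positions are reported from 1, following the usual convention for protein residues.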
In cellular proteins, PPXY motifs interact with the WW domain, which is 35–40 residues in length and structured as a 3-stranded, antiparallel β-sheet with two ligand-binding grooves (Macias et al., 1996). The WW domain is invariably joined to one of a wide variety of other protein modules, and is thus implicated in assembly of multiprotein complexes (Ingham et al., 2005). The cellular processes that involve such complexes encompass transcription, RNA processing, protein trafficking, receptor signaling, control of the cytoskeleton and vacuolar protein sorting (Hettema et al., 2004). Vacuolar protein sorting is exploited for budding by some enveloped viruses, utilizing PPXY motifs in virus proteins (Wills et al., 1994; Martin-Serrano et al., 2005). These observations from cellular proteins are in general accord with what is known about membrane protein UL56.
In HHV1, UL56 is not required for growth of virus in cell culture. However, mutants, including one lacking the C-terminal hydrophobic domain, are compromised in pathogenicity, including neuroinvasiveness (Peles et al., 1990; Rösen-Wolff et al., 1991; Berkowitz et al., 1994; Kehm et al., 1996), although this appears to depend on the system used (Nash and Spivack, 1994). Moreover, mutants with lesions in UL56 and other genes have been examined for utility in oncolytic viral therapy (Takakuwa et al., 2003; Sugiura et al., 2004; Ushijima et al., 2007). The protein has been detected in virions (Kehm et al., 1994, 1998).
Additional information is available for HHV2, in which UL56 has been shown to encode a type 2 membrane protein that localizes to the Golgi apparatus and cytoplasmic vesicles, and is tail-anchored by the C-terminal hydrophobic domain so that the N terminus (containing the PPXY motifs) is located in the cytoplasm (Koshizuka et al., 2002). An association of the protein has been reported with a neuron-specific kinesin (KIF1A) involved in transport of synaptic vesicle precursors in the axon, leading to the suggestion that it may function in vesicular transport (Koshizuka et al., 2005). These features prompted a comparison (Koshizuka et al., 2005) with membrane protein US9, which is a tail-anchored, type 2 membrane protein involved in transport of virus proteins towards the axon terminus, probably in vesicles (Brideau et al., 1998; Lyman et al., 2007). An interaction detected between membrane protein UL56 and myristylated tegument protein (encoded by gene UL49A in HHV2; alternatively named UL49.5), and colocalization of the complex to the Golgi apparatus and aggresome-like structures, suggests that it may have a role in virus maturation and egress (Koshizuka et al., 2006). Membrane protein UL56 also interacts with the ubiquitin ligase Nedd4 via its PPXY motifs and promotes ubiquitination of Nedd4 (Ushijima et al., 2008).
Some functional investigations have been carried out on other orthologues. Deletion of the EHV1 or SuHV1 gene has no effect on growth in cell culture (Sun and Brown, 1994; Baumeister et al., 1995), whereas the VZV protein is required for efficient growth of virus in cell culture and in an animal model system (Zhang et al., 2007).
HHV1 gene US8A (alternatively named US8.5) appeared late on the scene. It was overlooked in the genome sequence analyses (McGeoch et al., 1985, 1988), and was discovered later by Georgopoulou et al. (1993). In a comparative description of the HHV2 gene content, Dolan et al. (1998) evaluated US8A as probably specific to the genus Simplexvirus, and noted a positional counterpart in EHV1, a member of the genus Varicellovirus. Georgopoulou et al. (1993) characterized an internally tagged form of the HHV1 protein as locating to nucleoli. However, as registered by Dolan et al. (1998), the sequence used in these experiments was frameshifted, so the 16 residues at the C terminus of the protein would have been replaced. This raises a question as to whether the mutated protein might have localized inappropriately. A comparative approach (Fig. 2) indicates that the US8A protein has the sequence properties of a tail-anchored, type 2 membrane protein (N terminus in the cytoplasm). It is notable that the adjacent gene (US9) also encodes a tail-anchored, type 2 membrane protein, though it is unrelated in sequence to membrane protein US8A. A gene that is positionally equivalent to gene US8A is present in members of other genera in the subfamily Alphaherpesvirinae, and the same protein name (membrane protein US8A) is currently assigned (Table 2).
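One common way to screen a sequence for the properties of a tail-anchored membrane protein is a sliding-window hydropathy scan. The Python sketch below uses the Kyte-Doolittle scale and flags a sequence when its only strongly hydrophobic windows lie close to the C terminus. This is a generic heuristic offered for illustration, not the comparative analysis behind Fig. 2. The window size (19), threshold (1.6) and tail length (40) are assumptions of the sketch, and the test sequences are invented.

```python
# Kyte-Doolittle hydropathy values for the 20 standard amino acids
KD = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def window_means(seq: str, window: int = 19) -> list[float]:
    """Mean hydropathy of every full-length window along the sequence."""
    seq = seq.upper()
    return [
        sum(KD[aa] for aa in seq[i:i + window]) / window
        for i in range(len(seq) - window + 1)
    ]


def c_terminal_anchor_candidate(seq: str, window: int = 19,
                                threshold: float = 1.6, tail: int = 40) -> bool:
    """True if hydrophobic windows exist and all start within the last `tail` residues."""
    hits = [i for i, m in enumerate(window_means(seq, window)) if m >= threshold]
    if not hits:
        return False
    return all(i >= len(seq) - tail for i in hits)


# Hypothetical sequences: hydrophilic region plus one hydrophobic stretch
hydrophilic = "MSDEKRNQSTDEKRNQSTGPDEKRSPPPYDEK"
hydrophobic = "LLIVALLVAILLVAVLLIA"
print(c_terminal_anchor_candidate(hydrophilic + hydrophobic + "RK"))  # hydrophobic stretch at the C terminus
print(c_terminal_anchor_candidate(hydrophobic + hydrophilic + "RK"))  # hydrophobic stretch at the N terminus
```

A positive result indicates only that a sequence is compatible with C-terminal anchoring; membrane topology would still need experimental support.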
HHV5 and its closest relative (PnHV2) are members of the genus Cytomegalovirus in the subfamily Betaherpesvirinae. The UL28 coding region in these viruses was predicted several years ago to be spliced to an unidentified upstream exon (Davison et al., 2003). The layout of UL28 in relation to genes upstream is shown in Fig. 3a. A subsequent comparative analysis involving members or potential members of the genus Cytomegalovirus from Old and New World primates (HHV5, PnHV2, CeHV5 and CeHV8 for the former; AoHV1 and SaHV3 for the latter) strongly suggested that the upstream exon is UL29 (Fig. 3b). Reverse-transcription PCR (RT-PCR) confirmed the expression of the predicted spliced HHV5 RNA (as represented by the 450 bp RT-PCR product in Fig. 3a), as well as the unspliced RNA (as represented by the 600 bp product). The appropriate amendment to the HHV5 RefSeq was made early in 2009. The intron has also been detected independently (Mitchell et al., 2009). Thus, two previous coding regions (UL28 and UL29) present in primate members of the genus Cytomegalovirus have been conflated to a single gene (UL29).
During transcript mapping experiments, we mapped the major 5′-end upstream from HHV5 UL29 to a nucleotide position approximately 800 bp upstream from the translational initiation codon, and 300 bp upstream from the UL30 translational initiation codon (as represented by the 700 bp 5′-RACE product in Fig. 3a). This transcriptional initiation site is located an appropriate distance downstream from a candidate TATA box (Fig. 3c). Comparative analysis then led to the discovery of a potential coding region (UL30A) located in the 300 bp region between the transcriptional initiation site and the UL30 translational initiation codon. This region potentially encodes a protein that is conserved in other Old World primate members of the genus Cytomegalovirus and is related to protein UL30 (Fig. 3d). However, none of the UL30A coding regions in these viruses possesses an appropriately positioned translational initiation codon (ATG). It is possible that translational initiation occurs from a non-ATG codon, in this case ACG, which has been shown to function as an initiation codon in eukaryotic systems (Anderson and Buzash-Pollert, 1985; Peabody, 1989) and viruses such as adeno-associated virus (Becerra et al., 1988) and Sendai virus (Curran and Kolakofsky, 1988). Indeed, a few respectable herpesvirus coding regions lack an ATG initiation codon and have been proposed to utilize a non-ATG codon; for example, that encoding HHV2 tegument protein UL16 (Dolan et al., 1998). The New World primate viruses that potentially belong to the genus Cytomegalovirus appear to have UL30A orthologues with ATG initiation codons, but lack UL30 orthologues (Fig. 3d). These findings indicate the presence of paralogues (UL30 and UL30A) that together constitute the novel UL30 gene family.
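The search for a candidate initiation codon in a reading frame that lacks ATG can be illustrated with a short scan. The Python sketch below walks one frame codon by codon from the start of a region, records in-frame ATG and ACG codons, and stops at the first stop codon. The function name and the test sequence are hypothetical, and the sketch makes no claim about where translation of UL30A actually begins.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def candidate_starts(dna: str, frame: int = 0,
                     starts: tuple[str, ...] = ("ATG", "ACG")) -> list[tuple[int, str]]:
    """Return (1-based position, codon) for in-frame start candidates before the first stop."""
    dna = dna.upper()
    found = []
    for i in range(frame, len(dna) - 2, 3):
        codon = dna[i:i + 3]
        if codon in STOP_CODONS:
            break  # the open reading frame ends here
        if codon in starts:
            found.append((i + 1, codon))
    return found


# Hypothetical region: codons GGC ACG GCT CTG ACG TAA ATG
region = "GGCACGGCTCTGACGTAAATG"
print(candidate_starts(region))                   # ATG and ACG both allowed
print(candidate_starts(region, starts=("ATG",)))  # ATG only
```

Restricting the search to ATG returns nothing for this sequence, whereas allowing ACG reveals in-frame candidates, which mirrors the situation described for the UL30A coding regions of the Old World primate viruses.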
The author has no conflict of interest.
I am grateful to Charles Cunningham (MRC Virology Unit, Glasgow, UK) for providing RNA mapping data.
Systematization proceeds best when interested people participate. I encourage those who wish to contribute to herpesvirus taxonomy to contact the chair of the Herpesvirales Study Group (currently Philip Pellett; email@example.com). Also, anyone with thoughts on annotating the herpesvirus RefSeqs should contact me.