Human herpesvirus 6 variants A and B (HHV-6A and HHV-6B) are closely related viruses that can be readily distinguished by comparison of restriction endonuclease profiles and nucleotide sequences. The viruses are similar with respect to genomic and genetic organization, and their genomes cross-hybridize extensively, but they differ in biological and epidemiologic features. Differences include infectivity of T-cell lines, patterns of reactivity with monoclonal antibodies, and disease associations. Here we report the complete genome sequence of HHV-6B strain Z29 [HHV-6B(Z29)], describe its genetic content, and present an analysis of the relationships between HHV-6A and HHV-6B. As sequenced, the HHV-6B(Z29) genome is 162,114 bp long and is composed of a 144,528-bp unique segment (U) bracketed by 8,793-bp direct repeats (DR). The genomic sequence allows prediction of a total of 119 unique open reading frames (ORFs), 9 of which are present only in HHV-6B. Splicing is predicted in 11 genes, resulting in the 119 ORFs composing 97 unique genes. The overall nucleotide sequence identity between HHV-6A and HHV-6B is 90%. The most divergent regions are DR and the right end of U, spanning ORFs U86 to U100. These regions have 85 and 72% nucleotide sequence identity, respectively. The amino acid sequences of 13 of the 17 ORFs at the right end of U differ by more than 10%, with the notable exception of U94, the adeno-associated virus type 2 rep homolog, which differs by only 2.4%. This region also includes putative cis-acting sequences that are likely to be involved in transcriptional regulation of the major immediate-early locus. The catalog of variant-specific genetic differences resulting from our comparison of the genome sequences adds support to previous data indicating that HHV-6A and HHV-6B are distinct herpesvirus species.
Sequence-based information is an essential precursor to many molecular, biological, and epidemiologic studies. In addition, sequences are important for confirming or clarifying biological and taxonomic classifications, as illustrated for herpesviruses by experiences with Marek’s disease virus, channel catfish virus, and human herpesvirus 6 (HHV-6) (4, 9, 19). While the wealth of information obtained from smaller DNA segments is useful, genetic descriptions of viruses revealed through the determination and analysis of complete genome sequences are uniquely valuable. Thus, the analysis of 17 complete herpesvirus genome sequences has provided detailed information about their coding capacity and genetic architecture, revealing various permutations of conserved gene blocks and clusters of unique genes. This information yielded numerous insights into evolutionary paths within the herpesvirus family.
HHV-6 variants A and B (HHV-6A and HHV-6B) are classified as members of the Betaherpesvirinae subfamily, in the Roseolovirus genus along with human herpesvirus 7 (HHV-7) (47). These viruses share extensive domains of similar genetic organization with other betaherpesviruses, such as human cytomegalovirus (HCMV) (14, 19, 30, 40, 41, 48). The complete genome sequence for HHV-6A strain U1102 [HHV-6A(U1102)] was described by Gompels et al. (19).
HHV-6 was first described by Salahuddin and coworkers as a novel human herpesvirus isolated from the blood of patients with AIDS and other lymphoproliferative diseases (49). Reports describing the isolation of similar viruses soon followed (reviewed in reference 3), including from patients during the acute phase of the common childhood illness roseola or exanthem subitum (60). Characteristics shared by all of these viruses include infection of activated primary CD4+ T cells (33, 34, 54, 55), cross-reactive antigens (6, 43), and similar genomic organizations (30, 32, 36). As the cellular and molecular biologic properties of these viruses were investigated, it became evident that they segregate into two groups that differ with respect to several genetic and biological properties. To recognize the differences between these viruses, a system was established that classified them as either variant A or B. Classification was based on differences in nucleotide sequences, reactivity with panels of monoclonal antibodies, and cell tropism (1). The question of whether the HHV-6 variants should be recognized as distinct viral species was deferred pending the accumulation of additional information.
Although the viruses are closely related, there is no genetic gradient between HHV-6A and HHV-6B, and recombinant viruses have never been detected. This situation is in contrast to that seen with the Epstein-Barr virus types, for which concentration of variant-specific changes in a small number of loci does not preclude intervariant recombination (25, 50). To understand the differences that may play a role in segregating the HHV-6 variants into discrete viral species, their genome sequences must be compared. In previous comparisons between subsets of HHV-6A and HHV-6B amino acid sequences, differences ranged from 1 to 5% in the set of genes shared by all herpesviruses (herpesvirus core genes), 19% in the gene encoding a strongly immunoreactive virion protein (U11), and 25% in the IE1 (U89) gene (8, 17, 29, 43, 59). In this report we present the genome sequence of HHV-6B strain Z29 [HHV-6B(Z29)]. We describe its general organization, protein-coding potential, and relationship with the HHV-6A sequence.
A previously described library of clones containing HHV-6B(Z29) restriction endonuclease fragments was used to determine the genome sequence (30). Several regions that were uncloned or cloned as part of a larger insert were amplified by using primers derived from terminal nucleotide sequences of bordering clones or the HHV-6A(U1102) genome sequence (19) (Table 1). The junctions between adjacent genomic restriction endonuclease fragments were confirmed by sequencing junction-spanning PCR amplicons.
PCR was performed with a proofreading enzyme (Taq Precision Plus; Stratagene, La Jolla, Calif.) on HHV-6B(Z29) nucleocapsid DNA prepared as previously described (32). In most cases, cycles (30) consisted of denaturation (94°C for 45 s), annealing (60 s), and extension (72°C) of 1 min per kb. Primer sequences, genomic coordinates, annealing temperatures, and amplimer sizes are listed in Table 1. PCR amplimers were affinity purified by using standard methods (Qiagen, Santa Clarita, Calif.) prior to direct sequencing. A subset of PCR amplimers was cloned into a pCR-Blunt vector (Invitrogen, San Diego, Calif.) prior to sequencing.
Nucleotide sequences were determined with a fourfold or greater redundancy using commercially available primers or by primer walking using custom primers. Double-stranded coverage was not possible in two regions of the direct repeat (DR) because of the presence of repeats or homopolymeric stretches. These regions include a 454-bp segment within the unique region of DR (DRL coordinates 5734 to 6188) and a 694-bp region extending across the (TAACCC)78 repeat array at the right end of DR to the junction of the right end of DR with the left end of the unique segment (U). These regions were sequenced multiple times from different templates, including other PCR amplimers and a lambda phage clone (λH6Z-851) (30) with various primers. In addition, double-stranded information was not obtained for two small regions (totaling 118 bases) at the boundaries of the internal repeat R1 (Fig. 1A).
Repeat R3 is contained within HindIII fragment C, cloned as pH6Z-204 (30). Two complementary sets of nested deletion clones with termini spaced at approximately 200-bp intervals were generated from subclones of pH6Z-204 with exonuclease III (Erase-a-Base; Promega, Wis.). These subclones were then used to sequence both strands of the repeat.
Sequences were assembled and analyzed by using the Wisconsin Package, version 9.0 (Genetics Computer Group, Madison, Wis.). Database searches were done with nonredundant versions of GenBank posted on February 9, 10, and 15, 1999.
The sequence reported has been deposited with GenBank under accession no. AF157706.
The HHV-6B genetic content described below is presented in the context of elegant descriptions of HHV-6A and HHV-7 genetic architecture by others (19, 37, 41). Thus, we will not describe coding content in detail but will expand on issues that are unique to HHV-6B. The focus will be on a comparative description between the HHV-6B and HHV-6A genomes. In any such analysis of herpesvirus genomes, it must be remembered that the reported sequences represent a snapshot of a single example from the heterogeneous population of molecules that might be present in an individual or that might have varied on passage in cell culture. Such variation is frequently seen in regions containing repetitive elements (e.g., the het region in DR) (32) but is not necessarily limited to these regions (5).
Over 98% of the sequence was determined on both strands, with an average of fourfold redundancy; the exceptions were three highly repetitive regions described in Materials and Methods. A representative complete genome sequence was compiled by assembling a representative DR element and grafting it to the termini of U. The junctions of U with DRL and DRR were confirmed independently. The DR sequence was assembled by using the following sequences, from left to right: a 1.5-kb cloned PCR amplimer mapping to the 5′ end of DR (TL in Table 1), a 3.8-kb BamHI genomic clone (pH6Z-109, BamHI fragment L) (30), a 5.2-kb PCR amplimer that extends from BamHI L across the DRL-U junction (DRL/U in Table 1), plus clones that span the junctions of DR-DR (from circular or concatemeric genomes) and U-DRR. We previously described three segments that are included in the complete genome sequence (29, 31, 43). One 20-kb segment spans U40 to U57 (GenBank accession no. L16947) and includes the origin of lytic replication (oriLyt). Another 20-kb segment spans U69 to U84 (GenBank accession no. L14772) and includes the homolog of the herpes simplex virus type 1 UL9 origin binding protein. The third is a smaller 3.2-kb segment encoding the antigenic virion protein 101K, which is the product of the U11 gene (GenBank accession no. L13162).
The HHV-6B(Z29) genome sequence as assembled is 162,114 bp long. This is in close agreement with values of 159 to 164 kb determined by summation of restriction endonuclease fragment lengths (30, 32). The genome is composed of a 144,528-bp U flanked by 8,793-bp DR segments, DRL and DRR (Fig. 1A).
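The segment lengths quoted above can be checked against the total with simple arithmetic. The following is a minimal sketch; the function and variable names are illustrative and are not part of the original analysis.

```python
# Segment lengths reported in the text (bp).
U_LEN = 144_528   # unique segment (U)
DR_LEN = 8_793    # one direct repeat (DR)


def genome_length(u_len: int, dr_len: int, dr_copies: int = 2) -> int:
    """Length of a linear genome made of U flanked by direct repeats."""
    return u_len + dr_copies * dr_len


total = genome_length(U_LEN, DR_LEN)
dr_fraction = 2 * DR_LEN / total  # share of the genome contributed by DRL + DRR

print(total)                        # 162114
print(round(100 * dr_fraction, 1))  # 10.8
```

The two DR copies together therefore account for roughly 11% of the assembled genome.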
G+C contents are 40.8 and 59.1% in U and DR, respectively, with an overall G+C content of 42.8%. Nearly identical uneven base distributions between U and DR are also observed in HHV-6A and HHV-7 (19, 37, 41). Similar distributions of low and high G+C content have been described between U segments and long repeats that are present at either genomic termini or termini of long invertible segments of other herpesviruses (2, 7). As previously described, a region with unusually low G+C content (32.2%) is the 1,367-bp region between U41 and U42 that contains the HHV-6B oriLyt (12).
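As a consistency check, the overall G+C content should equal the length-weighted mean of the U and DR values. The sketch below assumes only the segment lengths and percentages quoted in the text; `gc_content` is an illustrative helper for raw sequence and is not the tool used in this work.

```python
def gc_content(seq: str) -> float:
    """Fraction of G or C among unambiguous bases (A, C, G, T) in seq."""
    seq = seq.upper()
    gc = seq.count("G") + seq.count("C")
    at = seq.count("A") + seq.count("T")
    return gc / (gc + at) if (gc + at) else 0.0


def weighted_gc(segments):
    """Length-weighted mean G+C fraction for (length_bp, gc_fraction) pairs."""
    total_len = sum(length for length, _ in segments)
    return sum(length * gc for length, gc in segments) / total_len


# U once and DR twice, with the values reported in the text.
overall = weighted_gc([(144_528, 0.408), (8_793, 0.591), (8_793, 0.591)])
print(round(100 * overall, 1))  # 42.8
```

The weighted mean reproduces the reported overall value of 42.8%.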
A shared characteristic of betaherpesviruses is CpG suppression with a concomitant increase in TpG in the major immediate-early (IE-A) locus; this feature is in contrast to the global CpG suppression in most gammaherpesvirus genomes and the lack of apparent CpG suppression in alphaherpesviruses (19, 21). Like HHV-6A, HHV-6B is CpG deficient, with a corresponding increase in TpG frequency in the IE-A locus (coordinates 127342 to 139167). CpG deficits in the IE regions of betaherpesviruses have been hypothesized to reflect localized methylation by the host cell during latency (19, 21).
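Dinucleotide suppression of this kind is commonly expressed as an observed/expected ratio, where the expected count is derived from the mononucleotide composition of the same window. The exact procedure used for the analysis above is not stated in the text, so the following is only a sketch of the usual calculation.

```python
def dinucleotide_obs_exp(seq: str, dinuc: str) -> float:
    """Observed/expected ratio for a dinucleotide such as 'CG' or 'TG'.

    Expected count = count(first base) * count(second base) / sequence length.
    Ratios well below 1 indicate suppression; ratios above 1 indicate excess.
    """
    seq = seq.upper()
    n = len(seq)
    if n < 2:
        return 0.0
    # Count overlapping occurrences explicitly (str.count skips overlaps).
    observed = sum(1 for i in range(n - 1) if seq[i:i + 2] == dinuc)
    expected = seq.count(dinuc[0]) * seq.count(dinuc[1]) / n
    return observed / expected if expected else 0.0


# Toy sequence: 4 C and 4 G but only one CpG, so CpG is under-represented.
print(dinucleotide_obs_exp("CCCCGGGG", "CG"))  # 0.5
```

Applied to a window such as the IE-A locus, a CpG ratio below 1 together with a TpG ratio above 1 would correspond to the pattern described here.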
HHV-6B, HHV-6A, and HHV-7 have similarly organized DR segments, which are composed of terminal, unique, and junctional regions. The coding content of the unique region of DR is described in the section on gene content. The assembled HHV-6B DR is 8% longer than that described for HHV-6A as a result of small insertions in the unique region and differences in the copy number of repeats found near the termini (Table 2). The terminal repeats are composed of perfect and imperfect copies of the hexanucleotide TAACCC (telomeric repeat sequence [TRS]). This sequence is also present in repeat arrays at the termini of vertebrate chromosomes (38), near the termini of HHV-6A and HHV-7 (14, 18, 19, 52), and at the junction region between the internal inverted repeats IRS and IRL of Marek’s disease virus (26). Additionally, scattered single copies of TRS are also present in U, distributed with a polarity similar to that of HHV-6A, in which TAACCC is found to the left of oriLyt and the complementary sequence, GGGTTA, to the right of oriLyt (18, 19). This arrangement confers an overall dyad symmetry to the genome, radiating from oriLyt.
Sequences near the termini of the HHV-6B DR were described previously (57). We confirmed and extended these results by sequencing across the TRS arrays into the adjacent unique DR sequences, by sequencing additional clones that span the junction between the termini of circularized or concatemerized genomes, and by sequencing across the junction between the right end of DR and the left end of U. In summary, DR termini are composed of copies of TRS that are flanked on their left by pac1 at the left terminus and on their right by pac2 at the right terminus (Fig. 2A); pac1 and pac2 are conserved cis-acting herpesvirus packaging signals. At the left terminus of DR, a pac1 cleavage sequence is located 18 nucleotides (nt) from the predicted genomic terminus; adjacent to it are multiple copies of TRS interspersed with the related hexamers TAGGTC and TAGCCC. The right terminus of DR consists of 78 perfectly reiterated copies of TRS, followed by a pac2 signal located 29 nt from the predicted genomic terminus (Fig. 2B and C).
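Perfect TRS arrays such as the one at the right end of DR can be located by scanning for uninterrupted tandem copies of the hexamer. The sketch below is illustrative only: it counts perfect copies, so the imperfect left-terminal array (with interspersed TAGGTC and TAGCCC units) would be reported as several shorter runs. The toy sequence is constructed for the example and is not viral sequence.

```python
import re

TRS = "TAACCC"  # telomeric repeat sequence named in the text


def reverse_complement(seq: str) -> str:
    """Reverse complement of an unambiguous DNA string."""
    return seq.upper().translate(str.maketrans("ACGT", "TGCA"))[::-1]


def longest_tandem_run(seq: str, unit: str = TRS) -> int:
    """Copy number of the longest uninterrupted tandem array of `unit`."""
    runs = re.findall(f"(?:{unit})+", seq.upper())
    return max((len(run) // len(unit) for run in runs), default=0)


print(reverse_complement(TRS))  # GGGTTA, the orientation found right of oriLyt

# Toy example: a 78-copy perfect array embedded in flanking bases.
toy = "GG" + TRS * 78 + "AT"
print(longest_tandem_run(toy))  # 78
```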
There were copy number differences in the heterogeneous TRS arrays between the sequence described here and the previously reported left terminal sequence of HHV-6B(Z29) obtained independently by Thomson et al. (57). This copy number variability was also present in different clones obtained from the same PCR amplification (15). Additionally, in electrophoretic analyses, the left terminal BamHI and SalI restriction endonuclease fragments were determined to be 2.7 and 2.0 kb long, respectively (30, 32), compared with 1,389 and 656 bp, respectively, predicted from the sequence. This difference suggests that smaller segments were selectively amplified by PCR from the pool of heterogeneous versions of the region. These results are consistent with the observations of Lindquester and Pellett (32), who found that the length of DR changed from 13 to 10 kb on viral passage in cell culture. The length heterogeneity mapped to the left end of the DR elements. On the basis of the restriction mapping and sequence data described above, it is likely that the variable regions correspond to the heterogeneous TRS arrays. Sequence analysis of uncultured virus will be required to more completely understand the structure of this region in wild-type virus.
The precise genomic termini of HHV-6B have not been directly determined but can be inferred from the sequences of fragments from at or near the genomic termini and that span DR-U boundaries, in the context of motifs conserved at the termini of other herpesvirus genomes. An alignment of our sequences with all previously published HHV-6A and HHV-6B sequences from the terminal region is shown in Fig. 2B. As can be seen in the alignment, terminal and DRR-DRL junction sequences are highly conserved between linear and concatemeric or circular genomes. Additionally, HHV-6B and HHV-6A sequences are highly conserved in this region. Interestingly, of the seven DRR-DRL junction clones that we analyzed, all except P4 had one to eight additional nucleotides at the junction. The mechanism for inserting these nucleotides is not obvious, although the variability is unlikely to be an artifact since it was observed in independently derived clones from different viral stocks and HHV-6A. Similar heterogeneity was also observed in clones derived from plasmid concatemers that had been packaged into extracellular virions or intracellular nucleocapsids (11).
In addition to the TRS arrays, five major repeat elements are located in U: R0, R1, R2A, R2B, and R3 (Fig. 1A). Copy number and coordinates for these arrays in both HHV-6A and HHV-6B genomes are given in Table 2. These repeat elements are located in regions of the HHV-6B genome that have lower nucleotide sequence identity with HHV-6A (Fig. 1B). R0 is unique to HHV-6B and is located near the junction of DRL and U and is contained within the putative HHV-6B open reading frame (ORF) B4. R1 is located near the 3′ end of the U86 ORF. Translation of R1 results in a series of serine and arginine (SR) repeats at the carboxy terminus of the U86 protein, the HCMV IE2 (UL122) homolog (19, 42). HHV-6B R1 has greater sequence variation than does HHV-6A R1; the HHV-6B repeat array is assembled from 10 different units, while HHV-6A R1 is assembled from 3 different units. The SR repeats are unique to the HHV-6 version of the protein and reflect divergence from other betaherpesviruses (19).
R2A and R2B are located in the region between U86 and U90 (coordinates 131902 to 138003). This region has only 57.5% nucleotide identity between HHV-6A and HHV-6B. R2A is not unique to HHV-6B; two, rather than five, copies are present in HHV-6A, and they are more divergent than in HHV-6B. R2B is related to HHV-6A R2 but is much shorter. R2B is 94 nt long, compared with the 1.2-kb R2. R2A contains several TATA-like sequences, while R2B has multiple potential binding sites for the transcription factor HNF-5 (TRTTTGY) (16), suggesting a possible role for these sequences in transcription regulation. A point of interest is that a plasmid clone (pH6Z-231) (30) was used as the sequencing template for the region encompassing R2A and R2B; the corresponding region in HHV-6A was refractory to cloning and was sequenced from a PCR-derived template. However, no large deletion is present in pH6Z-231 since the sequenced plasmid had a length similar to the predicted length of the corresponding HHV-6B restriction endonuclease fragment. In addition, R2A and R2B are present in HHV-6B(HST) (27).
R3 is located upstream of the IE-A locus that spans U86 to U91 and is hypothesized to contain cis-acting regulatory sequences that might play a role in transcription regulation of this locus. HHV-6B R3 is composed of 26 copies of 103-, 104-, or 105-bp imperfect repeat units. Individual units from HHV-6B vary considerably in sequence. An alignment of R3 consensus sequences for HHV-6B strains Z29 and HST (27) and HHV-6A (19) is shown in Fig. 3. Sequence variation is scattered throughout individual units of HHV-6B R3, with conserved pockets at positions 32 to 52 and 94 to 96 of the consensus sequence. One of the conserved pockets in all of the individual units of both variants is a potential binding site for the transcription factor PEA3 (AGGAA[A/G]). The PEA3 motif has been found in other viral genomes, including the polyomavirus enhancer and the adenovirus enhancer core element (20, 35). The presence of multiple PEA3 binding sites in R3 is intriguing, as they may represent primary targets of signal transduction in HHV-6A and HHV-6B. Other potential transcription factor binding sites identified in HHV-6A R3 include NF-κB and AP2 (58). In contrast, HHV-6B(Z29) R3 has no NF-κB sites, and AP2 sites (CCC[A/C]N[G/C][G/C][G/C]) are present in only 11 of the 26 repeat units.
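Degenerate consensus motifs such as those quoted for PEA3 (AGGAA[A/G], i.e., AGGAAR in IUPAC notation) and HNF-5 (TRTTTGY) can be located by translating the IUPAC codes into a regular expression. The sketch below is illustrative rather than a description of the search actually performed; it scans the given strand only, and the test sequences are invented.

```python
import re

# IUPAC nucleotide codes mapped to regular-expression character classes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "S": "[CG]", "W": "[AT]",
    "K": "[GT]", "M": "[AC]", "B": "[CGT]", "D": "[AGT]",
    "H": "[ACT]", "V": "[ACG]", "N": "[ACGT]",
}


def motif_positions(seq: str, motif: str) -> list[int]:
    """0-based start positions of all (possibly overlapping) motif matches."""
    body = "".join(IUPAC[code] for code in motif.upper())
    # A lookahead lets overlapping occurrences be reported as well.
    pattern = re.compile(f"(?={body})")
    return [m.start() for m in pattern.finditer(seq.upper())]


print(motif_positions("CCAGGAAGTTAGGAAACC", "AGGAAR"))  # [2, 10]
print(motif_positions("AATGTTTGCAA", "TRTTTGY"))        # [2]
```

To cover both strands, the same function could be run on the reverse complement of the sequence.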
HHV-6B protein-encoding ORFs were first considered as significant if the translated proteins had sequence counterparts in HHV-6A. Proteins with similarity to HHV-6B-encoded proteins were identified by searching GenBank with the BLAST family of programs. ATG-initiated ORFs as small as 177 nt (59 amino acids) that had no significant overlap with other ORFs and had appropriately located polyadenylation signals were also considered as possibly significant. Using these criteria, we identified 127 ORFs; 8 are diploid because they are present in each copy of DR, leaving 119 that are unique (Table 3). These 119 unique ORFs compose 97 genes, based on the predicted splicing patterns for 11 genes. Of the 119 ORFs analyzed, 110 (92%) had their highest similarity score with the set of HHV-6A ORFs initially described by Gompels et al. (19) and modified by Megaw et al. (37). The remaining nine do not have HHV-6A counterparts and are unique to HHV-6B. With the exception of the acceptor site of U91EX2, exon boundaries were in agreement with previously published acceptor and donor splice sites for HHV-6A (37). The nomenclature used is based on that previously employed for HHV-6A and HHV-7 (19, 37, 41). ORFs with HHV-6A counterparts were given the same name; ORFs unique to HHV-6B are designated B1 through B9. Spliced genes are identified by the 5′-proximal exon as was done for HHV-7 (RK) (37). The deduced protein-coding capacity of the HHV-6B genome is listed in Table 3, and the arrangement of the ORFs is shown in Fig. 1A.
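The size criterion described above can be illustrated with a simple scan for ATG-initiated ORFs. This is a sketch under stated assumptions, not the procedure used with the Wisconsin Package: it examines the three forward frames only, treats the 177-nt minimum as excluding the stop codon, and ignores the overlap and polyadenylation criteria. The toy sequence is invented.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_orfs(seq: str, min_nt: int = 177) -> list[tuple[int, int]]:
    """Return (start, end) of ATG-initiated ORFs on the forward strand.

    `start` is the 0-based index of the ATG; `end` is one past the stop codon.
    Only ORFs whose coding part (ATG up to, not including, the stop) is at
    least `min_nt` long are reported.
    """
    seq = seq.upper()
    orfs = []
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i  # first in-frame ATG opens the ORF
            elif start is not None and codon in STOP_CODONS:
                if i - start >= min_nt:
                    orfs.append((start, i + 3))
                start = None  # look for the next ATG after this stop
    return orfs


# Toy ORF: ATG + 58 alanine codons = 59 codons (177 nt), then a stop.
toy = "CC" + "ATG" + "GCA" * 58 + "TAA" + "GG"
print(find_orfs(toy))  # [(2, 182)]

# Tallies reported in the text.
unique_orfs = 127 - 8
print(unique_orfs, round(100 * 110 / unique_orfs))  # 119 92
```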
Nine ORFs (DR4, DR5, DR8, U1, U61, U78, U88, U92, and U93) described by Gompels et al. (19) do not have counterparts in the HHV-6B genome, as a result of the lack of an initiation codon, truncation, or frameshift mutations. HHV-6B encodes positional counterparts of HHV-6A ORFs LT1, LJ1, and RJ1, but these were excluded from consideration as candidates for being expressed as proteins because they are composed almost entirely of TRS arrays, because of the large variations in TRS copy number described above, and because of the lack of amino acid sequence conservation between the variants. Interestingly, DR3, U6, U9, U22, U83, and U94 are present in HHV-6A and HHV-6B but not in HHV-7 and thus are unique to the HHV-6 variants.
The nine putative unique HHV-6B ORFs, B1 through B9, are predicted to encode proteins of 265 amino acids or less (Table 3). B4 and B9 are located near the junctions of U and DR, a region of sequence divergence between the variants (see below). B4 spans R0, while B9 spans a region composed of four copies of an imperfect 62- or 63-bp element. None of the ORFs unique to HHV-6B had significant similarity with any other proteins in GenBank. Experimental data must be obtained to determine if any of these ORFs encode functional proteins.
Amino acid identity of HHV-6B proteins with their HHV-6A counterparts ranged from 99.5% in the spliced U66 gene to 61.8% for the spliced U91 gene (Table 3). U66 is a highly conserved herpesvirus protein involved in DNA packaging (46). U91 is located in the genomic region that is most divergent between the variants. The function of U91 is not known, but it is likely to be involved in gene regulation since it is part of the IE-A locus (51).
Several HHV-6B ORFs give rise to proteins longer than their HHV-6A counterparts as a result of differences near either their 5′ or 3′ ends. U8, U21, U23, U55, and U83 have 5′ extensions, while U10 and U44 have 3′ extensions. In addition, HHV-6B U47 has a 3′ extension and a 5′ truncation. These ORFs are dispersed in the genome, and none are homologs of conserved herpesvirus genes. In fact, U47 and U55 are located between conserved herpesvirus gene blocks (Fig. 1A). Interestingly, HHV-6B(Z29) U12, which encodes one of the G-protein-coupled receptor homologs, is truncated because of an in-frame termination codon at amino acid 195 of U12EX2. The termination codon lies within transmembrane region 5 of the seven predicted membrane-spanning domains and creates a protein that is 146 amino acids shorter than HHV-6A U12 (19). The HHV-6B(Z29) alteration is not likely to be a cloning or sequencing artifact or error because it was present in plasmid clones derived nearly a decade ago from purified viral DNA and in PCR amplimers derived directly from more recent viral DNA preparations. Nonetheless, the termination codon present in HHV-6B(Z29) is not present in two HHV-6B clinical isolates; translation of the ORF from these viruses should lead to a protein corresponding to the HHV-6A protein (15). Additionally, Isegawa et al. (24) reported that expression of HHV-6B(HST) U12 results in a full-length protein that functions as a β-chemokine receptor. These results indicate that expression of full-length U12 is not required for viral replication in cell culture.
The HHV-6A and HHV-6B genomic sequences were aligned in several pieces by using GAP, and the segments were then assembled into complete genomes containing alignment gaps. The resulting alignment was visualized with PLOTSIMILARITY (Fig. 1B). Most of the narrow valleys of low identity, e.g., in the vicinity of U41/U42 and U47, correspond to regions of multiple insertions in one variant relative to the other. The baseline identity level in these and other regions can be easily discerned. Some of the valleys are augmented by differences in copy number of a repeat element between the two genomes, e.g., TRS and R3.
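The kind of profile produced from a gapped pairwise alignment can be approximated by computing percent identity in sliding windows. The sketch below is not the PLOTSIMILARITY algorithm, whose scoring may differ; it assumes two equal-length aligned strings with "-" for gaps, counts gap columns as mismatches, and uses window and step sizes chosen only for illustration.

```python
def window_identity(aln_a: str, aln_b: str, window: int = 100, step: int = 50):
    """Return (start, identity) pairs for sliding windows over an alignment.

    Identity is the fraction of columns in the window where both sequences
    carry the same base; columns containing a gap count as mismatches, so
    insertions in one genome show up as valleys.
    """
    if len(aln_a) != len(aln_b):
        raise ValueError("aligned sequences must have equal length")
    if not aln_a:
        return []
    profile = []
    last_start = max(len(aln_a) - window, 0)
    for start in range(0, last_start + 1, step):
        col_a = aln_a[start:start + window]
        col_b = aln_b[start:start + window]
        matches = sum(a == b and a != "-" for a, b in zip(col_a, col_b))
        profile.append((start, matches / len(col_a)))
    return profile


# Toy alignment: the second window contains one gap column.
print(window_identity("ACGTACGT", "ACGTAC-T", window=4, step=4))
# [(0, 1.0), (4, 0.75)]
```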
As can be seen, there is high similarity across the middle of the alignment (positions 30000 through 128000), with regions of extensive dissimilarity toward the genomic ends. Highly variable regions are localized to the DRL-U junction, the left end of U, the region spanning U86 to U100, and all of DR. The most variable region is the region between U86 and U90, where there is only 63.2% nucleotide identity between HHV-6B and HHV-6A. Interestingly, the right end of U, spanning ORFs U86 to U100, differs by more than 10%, with the exception of the region encoding U94 (the adeno-associated virus type 2 rep homolog), which has 96.5% nucleotide sequence identity between the variants. In addition, concatemers of HHV-6A and HHV-6B ORFs U75, U76, U77, U79, U81, U82, U83, U84, U85, U86, U90, U91, U94, U95, and U100 that had been aligned codon by codon to correspond to the amino acid sequence alignments used to determine Ks and Ka values (described below) were assembled, and the resulting alignment was visualized as for Fig. 1B. Similar results were observed; i.e., U86, U90, U91, U95, and U100 had the greatest amount of sequence variation (not shown). Three notable areas of low sequence identity located in the region spanning the herpesvirus core genes are the U41-U42 intergenic region (which includes oriLyt), U47, and U54. As noted by others, genes at the junction of conserved herpesvirus gene blocks, e.g., U47 and U54, are frequently more divergent than their conserved neighbors (19); however, as discussed later, the basis for variation at these positions in lineages of colinear genomes is not clear.
Genetic variation between HHV-6A and HHV-6B was further examined by determining the distribution of nucleotide substitutions within coding sequences. Estimates of the number of nucleotide substitutions per synonymous site (Ks) or per nonsynonymous site (Ka), that is, substitutions that are silent or result in amino acid changes, were computed for the 88 genes present in both HHV-6A and HHV-6B (Table 3). The expected outcome is that Ka will be smaller than Ks, unless positive selective pressure is being exerted on that particular sequence (13, 28). A Ka/Ks ratio of greater than 1 can be an indication that a particular sequence is under a strong selective pressure toward change in the encoded amino acid. Ka/Ks ratios were plotted against the codon length of the aligned pairs, which allows possible stochastic effects of Ka/Ks ratios obtained from short sequences to be visualized (Fig. 4). Of the 88 genes analyzed, U24, U54, U90, U91, and U95 have Ka/Ks ratios of greater than 1. The values for U24 and U91 possibly reflect stochastic effects due to their small size. Of the remaining ORFs with Ka/Ks ratios greater than 1, U90 and U95 appear to be under strong positive selection toward sequence divergence. The products of both genes are hypothesized to have regulatory functions and therefore may have important roles in the establishment of the variant-specific niche in the host. U90 has been shown to be a transcriptional activator of the human immunodeficiency virus type 1 long terminal repeat (61). U95 is a member of the HCMV US22 gene family, and two family members from HCMV have been shown to transactivate gene expression (53). These data add support to the hypothesis based on sequence differences that the right-most 24 kb of U of HHV-6A and HHV-6B genomes is not under strong sequence conservation.
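The interpretation of Ka/Ks ratios described above can be summarized as a simple classification that also takes alignment length into account. The sketch below is illustrative only: the gene names and the Ka, Ks, and codon-count values are invented placeholders and are not taken from Table 3, and the `min_codons` cutoff is a hypothetical threshold, since the text assesses short genes visually rather than with a fixed limit.

```python
def classify_selection(genes: dict, min_codons: int = 150) -> dict:
    """Label each gene from its Ka/Ks ratio and aligned codon length.

    `genes` maps a gene name to (Ka, Ks, codon_length).
    `min_codons` is an assumed cutoff below which a ratio above 1 is
    treated as possibly stochastic rather than evidence of selection.
    """
    labels = {}
    for name, (ka, ks, codons) in genes.items():
        ratio = ka / ks if ks > 0 else float("inf")
        if ratio <= 1:
            label = "purifying_or_neutral"
        elif codons < min_codons:
            label = "above_1_but_short"
        else:
            label = "candidate_positive_selection"
        labels[name] = (ratio, label)
    return labels


# Placeholder inputs for illustration; not values from this study.
example = {
    "geneA": (0.02, 0.10, 400),
    "geneB": (0.30, 0.20, 60),
    "geneC": (0.25, 0.10, 500),
}
for gene, (ratio, label) in classify_selection(example).items():
    print(gene, round(ratio, 2), label)
```

Plotting the ratio against codon length, as in Fig. 4, conveys the same information without committing to a particular cutoff.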
The relationship of HHV-6B to other herpesviruses is similar to that described for HHV-6A (19). HHV-7 is the next-closest relative of HHV-6B; homologs to 82 HHV-7 genes that are likely to encode proteins are present in HHV-6B (37, 41). Amino acid identities between HHV-6B genes and their HHV-7 homologs range from 75% for U66EX1 and U77 to 22% for U20 (Table 3).
A 30-kb region encompassing part of U74 through the carboxy terminus of U94 has been reported for HHV-6B(HST) (27). In pairwise comparisons between the two variant B strains, the amino acid identity for complete ORFs ranged from 92% (U90EX3) to 100% (U75). For comparison, amino acid identity between HHV-6A and HHV-6B ORFs in this region ranges from 57.1% (U91EX1) to 98.9% (U77).
R1 and R3 are regions of greater divergence (Fig. 5); nucleotide identities between R1 and R3 from strains Z29 and HST are 92.4 and 94.4% without gaps, respectively. There are strain differences in the copy number of the repeat elements; R1 has 54 copies in strain Z29 and 53 in strain HST, while R3 has 26 copies in strain Z29 and 24 copies in strain HST. Additionally, Z29 U86 is encoded by a single ORF, while there are two ORFs (U86 and U87) in the HST sequence as a result of a one-base deletion at position 16394 of the HST sequence. The amino-terminal ends of Z29 U86 and HST U87 are related, but the carboxy-terminal end of Z29 U86 is related to HST U86. Overall, the region spanning U75 through the end of R3 has high nucleotide identity between the strains (98.4%). For comparison, the corresponding region in HHV-6A has 79.3% nucleotide identity with HHV-6B (Fig. 5). Most of the intravariant sequence differences are localized in repeat elements R1 and R3; greater interstrain divergence (both insertions and substitutions) has previously been described in regions adjacent to repetitive regions of other herpesviruses (7, 56). The observed sequence differences between the two HHV-6B strains may reflect either geographic or etiologic differences; HST was isolated from a Japanese exanthem subitum patient, whereas Z29 was isolated from an AIDS patient from Zaire.
In this work we present the general features of the HHV-6B genome sequence, describe its coding potential, and describe its relationships with other herpesviruses. This information will facilitate examination of the biological significance of the genetic differences between HHV-6B and other herpesviruses, in particular between HHV-6A and HHV-6B. The HHV-6B genome sequence allows prediction of 97 unique genes, 88 of which have HHV-6A counterparts, while 82 have counterparts in HHV-7. Thirty-nine of these genes are conserved among all mammalian herpesviruses (herpesvirus core genes), others have been found only in betaherpesviruses, while still others are found only in members of the genus Roseolovirus and some are found only in HHV-6A and HHV-6B. Below we discuss the general aspects of sequence conservation and divergence across the herpesvirus family, as well as the relationship between the HHV-6 variants.
Herpesviruses encode a set of conserved genes, the herpesvirus core genes, that are grouped into seven gene blocks (7). Within gene blocks, the order and transcriptional polarity of the component genes are maintained. The gene blocks have different genomic locations, order, and orientations in the different herpesvirus subfamilies. Products of the core genes include structural components, such as capsid proteins and glycoproteins. Others include enzymes required for DNA replication, such as the major DNA binding protein and the DNA polymerase. The HHV-6B gene organization shares overall similarity with other betaherpesviruses, such as HCMV and HHV-7.
Betaherpesviruses encode several genes that are common to the viruses of this subfamily and that are absent in alpha- and gammaherpesviruses. Many of these genes belong to the US22 family of genes, examples of which are scattered throughout the HCMV genome (7). The function of these genes during infection is unclear, although HCMV TRS1 and IRS1 are transcriptional activators (53). Within the betaherpesviruses, divergence between viruses of the Cytomegalovirus and Roseolovirus genera is reflected by genes that are specific to members of Roseolovirus, including U20, U21, U23, U24, and U26.
It is particularly interesting that the roseoloviruses encode homologs (U73) of both the origin binding protein of alphaherpesviruses and their binding sites in origins of lytic DNA replication (23). As described here and elsewhere (19, 29, 31, 37, 41), the other proteins that are involved in replication at the replication fork are more highly conserved between the roseoloviruses and their HCMV counterparts than with any alphaherpesvirus. This strengthens the previous suggestion (14) that the mechanisms of DNA replication initiation and elongation for roseoloviruses have greater similarities with alphaherpesviruses and cytomegaloviruses, respectively.
Among the roseoloviruses, HHV-7 encodes no genes without counterparts in HHV-6A and HHV-6B, whereas both HHV-6A and HHV-6B encode DR3, U6, U9, U22, U83, and U94, which have no HHV-7 counterparts. In addition, HHV-6A and HHV-6B each encode variant-specific genes.
Comparisons of the HHV-6A and HHV-6B genomes confirmed that while the two genomes are colinear, there are regions of significant variation, including DR, a region spanning the junction of DRL-U and the extreme left end of U, and a 24-kb segment located to the right of U86 (except for U94). The region spanning ORFs U2 through U85, which encompasses approximately 75% of U, is more highly conserved. Of the 89 ORFs in this region, 66 have over 92% amino acid identity. All of the genes belonging to the herpesvirus conserved gene blocks have greater than 94% amino acid identity.
U47 and U54 have less than 90% amino acid identity with their HHV-6A counterparts and Ka/Ks ratios of greater than 0.8, possibly reflecting either gene products whose functions do not require specific sequences or gene products that have different functions in each variant. It is interesting that these genes, which are the most divergent in the segment spanned by the herpesvirus core genes, map at the junctions of blocks of conserved genes. Extensive amino acid sequence variation of genes at these locations has been previously found in comparisons of herpesvirus genomes. This variation is easily rationalized for comparisons between herpesvirus genomes that have been rearranged through these sites, e.g., varicella-zoster virus and Epstein-Barr virus (10), but the basis for the susceptibility of these genes to variation is not obvious in the case of viruses with colinear genomes, e.g., HCMV, HHV-6A, HHV-6B, and HHV-7, for which these sites have not been the location of genomic rearrangements during evolution of the lineage.
The segment from U86 to the right end of U is the most divergent between the variants. This region is likely to be important in defining the biological differences between the variants. Interestingly, this region contains genes that have anomalous sequence compositions due to CpG suppression and complex splicing patterns. As an example, the variants differ in temporal regulation and splicing patterns of U91 transcripts in T-cell lines (39). It will be important to determine whether R3 is involved in transcriptional regulation of the IE-A locus and to ascertain the effect of sequence differences and copy number on R3-mediated transcriptional regulation.
Another source of phenotypic difference between the variants could be the gp82-gp105 complex, which is a major envelope glycoprotein that is composed of a number of related polypeptides (44, 45). This glycoprotein is encoded by U100, which encompasses 11 exons (45) and is located in one of the most divergent regions of the genome. Homologs of the gene have been found only in roseoloviruses, with the intron-exon structure being conserved between the HHV-6 variants and HHV-7 (19, 37, 41). Differential splicing accounts for the presence of multiple related protein species, at least in the case of HHV-6A and as postulated for HHV-6B (45). The gp82-gp105 complex is likely to be important to the biology of HHV-6 since there are variant-specific neutralizing epitopes (44). HHV-6A and HHV-6B U100 share 79.9% amino acid identity, which is much lower than for other glycoproteins, such as glycoproteins B, H, L, and M (Table 3). Because glycoproteins are important determinants of specificity in the initial physical interaction between virus and the host cell, this complex of related proteins may confer different biological properties on the variants.
In addition to the effects of the more dramatically divergent regions, the smaller genetic differences elsewhere in the genome probably reflect subtle adaptations to specific biological niches and cumulatively are likely to have an important effect on the biology of the variants.
The genetic differences found between HHV-6A and HHV-6B are substantially greater than those found between HHV-7 strains RK and JI, even in the conserved region spanning ORFs U2 to U85. Across their lengths, the two HHV-7 genomes differed by a total of only 179 nt, an average of 1 per kb (37). In contrast, the sequence identity between HHV-6A and HHV-6B is 85% in DR, a mean of 95% in the region spanning U2 to U85, and 72% in the region spanning U86 to U100, for an overall identity of 90% (Fig. 1B). Additionally, sequence variation between HHV-6B strains Z29 and HST in the 30-kb region spanning the IE-A locus is substantially less than that between the variants. The accumulated biological, genetic, and epidemiologic data thus converge and make it clear that while HHV-6A and HHV-6B are closely related viruses, they have independent biological niches and meet the criteria for classification into distinct species.
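The way the regional identities combine into an overall figure can be checked with a length-weighted average. The short Python sketch below is a rough consistency check, not the calculation performed for this study, and it rests on several assumptions: both DR copies are counted at 8,793 bp each, the U2 to U85 region is taken as 75% of the 144,528-bp U segment, the U86 to U100 region is taken as the 24-kb segment at the right end of U, and the remaining left end of U is left out because no single identity value is quoted for it. The names `regions` and `weighted_identity` are hypothetical.

```python
# Lengths quoted for the HHV-6B(Z29) genome.
DR_LENGTH_BP = 8_793
U_LENGTH_BP = 144_528

# region name -> (approximate length in bp, fractional nucleotide identity)
regions = {
    "DR (both copies)": (2 * DR_LENGTH_BP, 0.85),
    "U2 to U85": (0.75 * U_LENGTH_BP, 0.95),  # assumption: ~75% of U
    "U86 to U100": (24_000, 0.72),            # assumption: the 24-kb segment
}


def weighted_identity(region_table):
    """Length-weighted mean identity across the listed regions."""
    total_length = sum(length for length, _ in region_table.values())
    identical = sum(length * ident for length, ident in region_table.values())
    return identical / total_length


overall = weighted_identity(regions)
print(f"length-weighted identity over the listed regions: {overall:.1%}")
```

Under these assumptions the weighted value falls close to the reported overall identity of 90%, because the well-conserved U2 to U85 region dominates the average by length.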
We thank Jodi B. Black, Gary J. Lindquester, and Robert D. Allen for their contributions to this work. We also thank William C. Reeves for his support.
G.D. was a Visiting Fellow at the Centers for Disease Control and Prevention. S.D. was supported by NIH grants KO4 AI01240 and R21 AI34231.