Primate cytomegalovirus (CMV) genomes contain tandemly repeated gene clusters putatively encoding divergent CXC chemokine ligand-like proteins (vCXCLs) and G protein-coupled receptor-like proteins (vGPCRs). In human, chimpanzee and rhesus CMVs, respectively, the vCXCL cluster contains two, three and six genes, and the vGPCR cluster contains two, two and five genes. We report that (i) green monkey CMV strains fall into two groups, containing either eight and five genes or seven and six genes in the respective clusters, and (ii) owl monkey CMV has two and zero genes. Phylogenetic analysis suggested that the vCXCL cluster evolved from a CXCL chemokine gene (probably GRO-α) that was captured in an incompletely spliced form by an ancestor of Old and New World primate CMVs, and that the vGPCR cluster evolved from a GPCR gene captured by an Old World primate CMV. Both clusters appear to have evolved via complex duplication and deletion events.
Several cytomegaloviruses (CMVs) infect Old World primate hosts, including green monkey CMV (GMCMV; alternatively named simian or vervet CMV; species Cercopithecine herpesvirus 5), rhesus macaque CMV (RhCMV; species Macacine herpesvirus 3), baboon CMV (BaCMV; not yet assigned species status), chimpanzee CMV (CCMV; species Panine herpesvirus 2) and human CMV (HCMV; species Human herpesvirus 5). Additional CMVs infect New World primate hosts, including owl monkey CMV (OMCMV; alternatively named aotine herpesvirus 1; not yet assigned species status). These viruses belong, actually or potentially, to the genus Cytomegalovirus, subfamily Betaherpesvirinae, family Herpesviridae, order Herpesvirales. Various strains have been isolated, and, in the description below, their names are given in parentheses immediately after the virus name abbreviations.
One of the features of Old World primate CMV genomes is the presence of two sets of tandemly repeated genes (in the sense of their identification as potential coding regions), one putatively encoding viral CXC chemokine ligand-like proteins (vCXCLs) and the other putatively encoding viral G protein-coupled receptor-like proteins (vGPCRs). Characterization of these gene clusters has not been straightforward.
In summary, currently available data demonstrate that wild-type, unrearranged primate CMV genomes have the following numbers of genes in the vCXCL and vGPCR clusters, respectively: HCMV, two and two; CCMV, three and two; RhCMV, six and five; and GMCMV, at least five in each. Both clusters are absent from tree shrew herpesvirus (TSCMV, a member of species Tupaiid herpesvirus 1 in subfamily Betaherpesvirinae whose host is believed by some to be a primitive prosimian) and non-primate CMVs.
Although the functions of most primate CMV vCXCLs and vGPCRs are unknown, most of them are dispensable in fibroblast cell cultures. Nonetheless, in our opinion, each paralog is likely to have its own role during infection in vivo, whether as a chemokine or in some other capacity. The limited information available points to a role in virus spread and dissemination in vivo for HCMV UL146 vCXCL, which binds in a species-specific fashion to human CXCR2, has potent calcium mobilization and chemotactic properties that are similar to those of cellular GRO-α (also known as CXCL1), and acts in vitro to attract neutrophils (Penfold et al., 1999; Sparer et al., 2004). The CCMV vCXCL1 ortholog has similar properties (Miller-Kittrell et al., 2007). The HCMV US28 vGPCR is similar to cellular CXCR2 and CX3CR1, binding fractalkine and a broad range of other ligands (Kledal et al., 1998; Kuhn et al., 1995). It has been shown to be capable of stimulating chemotactic migration of smooth muscle cells, as well as of sequestering the soluble cellular ligand CCL5 (RANTES) from infected cell culture medium (Bodaghi et al., 1998; Streblow et al., 1999). All five RhCMV vGPCRs are expressed as surface antigens in DNA-transfected cells, but only the protein encoded by the distal member of the cluster has chemokine ligand-binding properties equivalent in range to those of the HCMV US28 vGPCR, including the ability to bind to cellular CX3CL1 (Penfold et al., 2003). However, cells expressing this RhCMV vGPCR, unlike those expressing the HCMV US28 vGPCR, do not mobilize intracellular calcium ions.
In this paper, we have derived the sequences of the vCXCL and vGPCR clusters in five GMCMV strains and an OMCMV strain. We have compared the organization of the two clusters with those in other known primate CMVs, and attempted to deduce some of the duplication and deletion events that might have occurred during their capture and subsequent divergent evolution into the different viral lineages.
The vCXCL and vGPCR clusters are positioned similarly in each of the Old World primate CMV genomes examined so far. In HCMV, they are located at approximately 0.75 and 0.95 map units, with the vCXCL cluster oriented right to left and the vGPCR cluster oriented left to right. Depicted with transcription proceeding from left to right, the vCXCL cluster is flanked by other genes in the arrangement UL144/UL145/vCXCL cluster/UL147A/UL148, and the vGPCR cluster in the arrangement US24/US26/vGPCR cluster/US29/US30 (Fig. 1c). The complete sequences of the clusters in five GMCMV strains, namely GMCMV(Colburn), GMCMV(GR2757), GMCMV(2715), GMCMV(4915) and GMCMV(SA6), were derived by anchoring the data in flanking genes.
In initial experiments (data not shown), PCR and sequencing of GMCMV(Colburn) and GMCMV(GR2757) using primers based on the GMCMV(Stealth) data (Table 1) showed that both strains had an extra gene internally in the vCXCL cluster, and that GMCMV(GR2757) had one fewer gene internally in the vGPCR cluster. GMCMV(Colburn) failed to yield PCR products for the vGPCR cluster. Since the GMCMV(Stealth) sequence was anchored only in upstream flanking genes for each cluster, it was recognized that the data obtained were likely to be incomplete. The subsequently derived complete genome sequences of GMCMV(Colburn) and GMCMV(2715) confirmed this, revealing a total of eight genes in the vCXCL cluster for each strain, and five genes in the vGPCR cluster of GMCMV(2715). GMCMV(Colburn) had suffered a 6.8 kbp deletion extending from a position upstream from US26, retaining a single intact vGPCR gene at the distal end of the cluster. Like GMCMV(GR2757), GMCMV(2715) had one more gene in the vCXCL cluster than GMCMV(Stealth), and one fewer gene in the vGPCR cluster.
Complete sequences for both clusters from GMCMV(GR2757), GMCMV(4915) and GMCMV(SA6) were obtained using additional PCR primers based on the GMCMV(2715) data (Table 1), in some instances in combination with GMCMV(Stealth) primers. GMCMV(SA6) had suffered a 3.3 kbp deletion that removed most of the vCXCL cluster as well as UL147A and part of UL148, leaving the remnant of the distal vCXCL gene as an in-frame fusion with UL148.
On the basis of DNA sequence relatedness and similarities in gene organization, the five strains sequenced and the two for which data had been published fell into two groups: group I consisting of GMCMV(Colburn), GMCMV(GR2757), GMCMV(2715) and GMCMV(SA6), and group II consisting of GMCMV(Stealth), GMCMV(4915) and GMCMV(9610). Overlooking GMCMV(Colburn) and GMCMV(SA6), in which deletions had occurred, and GMCMV(Stealth) and GMCMV(9610), for which the published data are incomplete, group I strains are characterized by eight genes in the vCXCL cluster and five genes in the vGPCR cluster, and group II strains by seven genes in the vCXCL cluster and six genes in the vGPCR cluster.
Simplified depictions of the arrangements of genes in the vCXCL and vGPCR clusters of GMCMV and other primate CMVs are compared in Fig. 1a and Fig. 1c, respectively. The nomenclature employed effectively follows the approach by which HCMV vCXC-1 and vCXC-2 were first named on the basis of sequence similarity, in the absence of functional information (Penfold et al., 1999). We emphasize that this is an attempt at a unitary scheme for the purposes of the discussion below, and that it carries no implications about functionality. On the basis of positional and sequence relationships (explained below), the numbering scheme for the GMCMV and RhCMV genes indicates proposed orthology, with the vCXCL genes named sequentially from right to left in the direction that they are transcribed. Similarly, perceived orthology is inherent within the CCMV and HCMV nomenclature. However, an orthologous relationship is not implied among or between the GMCMV/RhCMV, CCMV/HCMV or OMCMV groups, where the high level of divergence confounds its establishment in most cases. Fig. 1 emphasizes the fact that GMCMV strains in group I lack the vGPCR3A gene, and those in group II lack the vCXCL1B gene. As described below, BaCMV sequences are incomplete for both clusters, but the available data indicate that the gene number and arrangements are the same as in RhCMV.
In conducting the phylogenetic analyses described below, limitations in the published sequences were registered. Data for GMCMV(Stealth) do not extend beyond a point in vCXCL4, and they also do not extend beyond a point in vGPCR4. Data for GMCMV(9610) do not extend beyond vGPCR4, and are not available for the vCXCL cluster. A frameshift in RhCMV(180.92) vGPCR5 was corrected for the analysis, and two frameshifts in RhCMV(68-1) vGPCR3A (Hansen et al., 2003) were amended by reference to an apparently correct version (Penfold et al., 2003). Complete sequences were not available for vCXCL1, vCXCL5 and vCXCL6 in RhCMV(WT2), and for vCXCL6 in RhCMV(WT3). The sequences available for the BaCMV vCXCL cluster terminate within vCXCL6, and those for the vGPCR cluster extend from a point within vGPCR1 to a point within vGPCR5. The BaCMV vGPCR4 sequence contains an in-frame termination codon, which was corrected (TGA to AGA) for the analysis.
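The kind of check that exposes an in-frame termination codon, such as the one corrected in BaCMV vGPCR4, can be sketched as follows. This is a minimal Python illustration and not the procedure used for the analysis: the function name `internal_stops` and the toy coding sequence are hypothetical, and only the TGA-to-AGA substitution is taken from the text.

```python
from typing import List

STOP_CODONS = {"TAA", "TAG", "TGA"}  # standard stop codons

def internal_stops(cds: str) -> List[int]:
    """Return 0-based offsets of in-frame stop codons that precede the final codon."""
    cds = cds.upper()
    if len(cds) % 3 != 0:
        raise ValueError("coding sequence length must be a multiple of 3")
    # Scan codon by codon, excluding the last codon (the legitimate terminator).
    return [i for i in range(0, len(cds) - 3, 3) if cds[i:i + 3] in STOP_CODONS]

# Toy sequence (invented): ATG GCT TGA CGT TAA, with a premature TGA at offset 6.
toy_cds = "ATGGCTTGACGTTAA"
hits = internal_stops(toy_cds)

# Apply the same kind of single-base correction described in the text (TGA -> AGA).
repaired = toy_cds
for pos in hits:
    if repaired[pos:pos + 3] == "TGA":
        repaired = repaired[:pos] + "AGA" + repaired[pos + 3:]

print(hits, internal_stops(repaired))
```

A scan of this type only flags candidates; whether a given stop codon reflects a sequencing error or a genuine mutation still has to be judged against other strains, as was done here.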
The precursors of cellular CXCLs consist of 100–110 amino acid residues, with a cleavable signal peptide sequence just prior to the CXC motif. The structure of the mature protein depends upon four conserved C residues. The shortest GMCMV vCXCL, vCXCL3A, contains only 81–82 residues and lacks the third and fourth C residues; this appears to be the only example where the function as an active chemokine may have been compromised. Initially, the vCXCL4 coding region appeared to be even smaller (62 residues) and to lack the fourth C residue in all five of the GMCMV strains. However, utilization of conserved potential splice donor and acceptor sites introduced a second exon that is similar to the C-terminus of vCXCL5 and contains the additional C residue, resulting in a precursor of 92 residues. These splice sites are located identically in RhCMV vCXCL4 (Oxford et al., 2008) and BaCMV vCXCL4.
Among the four GMCMV strains for which sequences of the complete vCXCL cluster were derived (i.e. GMCMV(Colburn), GMCMV(2715) and GMCMV(GR2757) in group I, and GMCMV(4915) in group II), pairwise alignments of the orthologs showed a wide spectrum (36–97%) of sequence conservation. That between GMCMV(2715) and GMCMV(GR2757) ranged from 42% (vCXCL4) to 97% (the highest observed in any pairwise comparison; vCXCL1B), and that between GMCMV(2715) and GMCMV(4915) ranged from 36% (the lowest observed in any pairwise comparison; vCXCL4) to 54% (vCXCL1B/vCXCL1). The amino acid sequences of the GMCMV vCXCLs were aligned with those of other primate CMV vCXCLs, including the major variants of the HCMV vCXCLs and incorporating primate versions of GRO-α, which is the cellular CXCL most closely related to the vCXCLs. All versions of Old World primate vCXCL3/vCXCL3B and vCXCL5 proteins retain the ELRCXC motif characteristic of GRO-α (or its minor derivative, ELHCXC), but the others vary considerably. A subset consisting of most of the sequences from the complete alignment is shown in Fig. 2.
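A pairwise conservation figure of the kind quoted above can be computed from two aligned sequences as sketched below. This is a hedged Python illustration only: the text does not state how gapped columns were treated, so excluding them is an assumption, and the function name `percent_identity` and the toy fragments are invented for the example.

```python
def percent_identity(seq_a: str, seq_b: str, gap: str = "-") -> float:
    """Percentage of identical residues over columns where neither sequence has a gap."""
    if len(seq_a) != len(seq_b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    identical = 0
    for a, b in zip(seq_a, seq_b):
        if a == gap or b == gap:
            continue  # assumption: gapped columns are excluded from the count
        compared += 1
        if a == b:
            identical += 1
    if compared == 0:
        raise ValueError("no ungapped columns to compare")
    return 100.0 * identical / compared

# Toy aligned fragments (made up for illustration, not real vCXCL sequences).
toy_a = "MKL-ELRCQC"
toy_b = "MRLAELHCQC"
print(round(percent_identity(toy_a, toy_b), 1))  # 7 identical of 9 compared -> 77.8
```

Counting gapped columns as mismatches instead would lower the values for orthologs of unequal length, so the convention matters when comparing figures between studies.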
A Bayesian phylogenetic tree derived from the complete alignment is shown in Fig. 3. The versions of GMCMV vCXCL1A group into a clade (100% of trees), as do those of vCXCL1B (100%). These two clades group (100%), and are sister to RhCMV vCXCL1 (100%). The versions of RhCMV vCXCL2 group (100%), and those of GMCMV vCXCL2 group less convincingly (95%). The grouping of these two clades is poorly supported (57%), but they are sister to vCXCL1A/vCXCL1B/vCXCL1 (99%). The grouping of vCXCL3A, vCXCL3B, vCXCL3 and vCXCL4 is supported moderately (97%). The three examples of mammalian cellular GRO-α genes fall loosely as a clade in this section of the tree (see Fig. 2 for the degree of conservation). The versions of GMCMV vCXCL3A group (100%), as do those of RhCMV vCXCL3 (100%). However, grouping of GMCMV vCXCL3B versions is not supported. The RhCMV vCXCL4 versions group (100%), but clustering of the GMCMV vCXCL4 versions is less convincing (94%). The grouping of RhCMV vCXCL4 with GMCMV vCXCL4 is also not strongly supported (82%), although these genes share the same splicing pattern, which is not found in any other vCXCL gene. The vCXCL6 versions form individual groups for GMCMV and RhCMV (100% for each), but the evidence that these two clades are sisters is less convincing (74%). vCXCL6 is sister to vCXCL3A/vCXCL3B/vCXCL3/vCXCL4 in the majority of trees (96%). The GMCMV and RhCMV vCXCL5 versions form a clade (100%). The BaCMV vCXCLs generally group with those of GMCMV rather than RhCMV.
The gene complement in the vCXCL cluster is simpler in CCMV and HCMV, consisting of three and two genes, respectively (Fig. 1). The highly divergent versions of HCMV vCXCL1 group with the single known version of CCMV vCXCL1 (100%), with allele G4 showing the closest similarity (98%). Similarly, CCMV vCXCL2 groups with HCMV vCXCL2 (100%), and this clade is sister to GMCMV and RhCMV vCXCL6 (97%). This part of the tree represents the only (moderately) convincing relationship between the GMCMV/RhCMV and CCMV/HCMV groups. The origins of CCMV vCXCL3 are not discernible. The New World primate virus, OMCMV, has two genes in the vCXCL cluster, though their origins in relation to Old World primate viruses are also not discernible.
Cellular GPCRs consist of about 320 amino acid residues encompassing seven hydrophobic transmembrane domains (TMDs), with the cytoplasmic C-terminal tail involved in G-protein signaling and serine/threonine kinase interactions. Key functional features include a cytoplasmic [DE]RY motif. All genes within the Old World primate vGPCR clusters appear functionally intact and the paralogs within each cluster are much more diverged than any of the orthologs.
Among the four GMCMV strains for which sequences of the complete vGPCR cluster were derived (i.e. GMCMV(SA6), GMCMV(2715) and GMCMV(GR2757) in group I, and GMCMV(4915) in group II), pairwise alignments showed that vGPCR5 is least conserved (91% of residues). However, the majority of the variation in vGPCR5 is present in an S/T-rich N-terminal extension of about 140 residues (truncated in Fig. 4), which lacks counterparts in other vGPCRs, with 57–85% of residues conserved. Conservation in the GPCR domains ranges from 90% (vGPCR3B) to 99% (vGPCR2). The amino acid sequences of GMCMV vGPCRs were aligned with those of other primate CMV vGPCRs. OMCMV was not included because it lacks the vGPCR cluster. A subset consisting of some of the sequences from the complete alignment is shown in Fig. 4.
A Bayesian phylogenetic tree derived from the complete alignment is shown in Fig. 5. The tree includes Old World primate CMV data for two vGPCRs encoded by genes UL33 and UL78, which are not part of the cluster and, since orthologs are present in non-primate CMVs, probably represent genes captured earlier in the evolution of subfamily Betaherpesvirinae. All three versions of vGPCR3A are truncated so that, although they retain the seven TMDs, they lack most of the C-terminal intracellular domain. Two of these three also lack the [DE]RY motif, which is marked in Fig. 4. In all of the vGPCR5 genes, this motif is modified to DRCY.
The greater length and conservation of vGPCR sequences provide a more reliable view of relationships than was derived for the vCXCLs. In Fig. 5, each vGPCR groups into a fully supported (100%) clade containing all the orthologous GMCMV and RhCMV members. vGPCR3A is sister to vGPCR3B (100%), and together they are sister to vGPCR3 (100%). The vGPCR3A/vGPCR3B/vGPCR3, vGPCR4 and vGPCR2 clades group (100%). Again, the BaCMV vGPCRs generally group with those of GMCMV rather than RhCMV. CCMV and HCMV vGPCR1 group together (100%), as do CCMV and HCMV vGPCR2 (100%). All of the vGPCRs in the cluster form a clade separate from UL33 and UL78 (97%).
The prevailing view is that, in general, primate CMVs have evolved along with their hosts throughout primate evolution. Divergence dates inferred from host phylogeny are 42.9 million years ago (Mya) for Old World monkeys from New World monkeys, 30.5 Mya for apes from Old World monkeys, 9.9 Mya for macaques and baboons from green monkeys, 6.6 Mya for macaques from baboons and 6.6 Mya for humans from chimpanzees (Steiper and Young, 2006). This speciation order is largely congruent with CMV phylogeny based on a conserved gene (DNA polymerase), except that BaCMV is more closely related to GMCMV than RhCMV (Fig. 7). BaCMV is also generally more closely related to GMCMV than RhCMV in the vCXCL and vGPCR clusters. From a parsimonious standpoint, GMCMV seems to have proliferated extra genes in each cluster since its divergence from BaCMV, which has a gene layout in each cluster like that of RhCMV rather than GMCMV.
The evolutionary outcome of the interactions between virus and host is clearly discernible in the vCXCL and vGPCR clusters, both in terms of sequence diversity, even among orthologs (especially in the vCXCL cluster), and in terms of variation in gene number between virus species. Several factors are important in discerning the evolutionary steps that may have occurred. First, an accurate understanding of the gene arrangement must be derived for each virus. This includes accounting for errors or mutations in published sequences, recognizing the effects of large deletions or rearrangements that may occur during passage of virus in vitro, anchoring derived sequences in flanking genes to ensure that the whole cluster is defined, and identifying protein-coding regions optimally. Second, as reliable a sequence alignment as achievable must be used to derive a statistically supported phylogeny. Relationships can then be assessed with discrimination, and strengthened by other features shared between genes, such as relative position or splicing. Third, gene loss, as well as gene duplication, must be considered as an evolutionary step. These alternatives may sometimes be distinguished on the basis of residual similarities between DNA sequences. Respectively, GMCMV, BaCMV, RhCMV, CCMV, HCMV and OMCMV have seven or eight, six, six, three, two and two genes in the vCXCL cluster and five or six, five, five, two, two and zero genes in the vGPCR cluster. Since only a subset of the events that have occurred are likely to be represented in extant viruses, any view on the evolution of the clusters will be necessarily incomplete.
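For readers who wish to work with the gene counts just listed, they can be held in a small lookup table. The sketch below is an illustrative Python arrangement only: the values are those given in the text (with the two GMCMV groups separated as described in the Results), while the names `GENE_COUNTS` and `cluster_table` are hypothetical.

```python
# (vCXCL genes, vGPCR genes) per virus, as summarized in the text.
GENE_COUNTS = {
    "GMCMV group I": (8, 5),
    "GMCMV group II": (7, 6),
    "BaCMV": (6, 5),
    "RhCMV": (6, 5),
    "CCMV": (3, 2),
    "HCMV": (2, 2),
    "OMCMV": (2, 0),
}

def cluster_table(counts: dict) -> str:
    """Format the counts as a fixed-width text table."""
    lines = [f"{'virus':<16}{'vCXCL':>6}{'vGPCR':>6}"]
    for virus, (n_cxcl, n_gpcr) in counts.items():
        lines.append(f"{virus:<16}{n_cxcl:>6}{n_gpcr:>6}")
    return "\n".join(lines)

print(cluster_table(GENE_COUNTS))
```

Laid out this way, it is easy to see that the two GMCMV groups carry the same total number of genes across both clusters, with the extra gene located in the vCXCL cluster in group I and in the vGPCR cluster in group II.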
From a parsimonious standpoint, certain evolutionary steps can be proposed for the vCXCL cluster in GMCMV and RhCMV, on a scale of increasing phylogenetic support. In an ancestor of Old and New World primate CMVs (more than 42.9 Mya), there was a single vCXCL gene that in the lineage leading to GMCMV, BaCMV and RhCMV gave rise to four daughters, one becoming vCXCL1/vCXCL2, one vCXCL3/vCXCL4, one vCXCL5, and one vCXCL6. vCXCL3 was duplicated in the GMCMV lineage (possibly after divergence from the BaCMV lineage) to generate vCXCL3A and vCXCL3B, and then, apparently as the most recent duplication event based on levels of paralogous divergence, vCXCL1A and vCXCL1B were generated from vCXCL1 in GMCMV group I. However, events might not have been parsimonious. For example, the most recent event in the GMCMV vCXCL cluster might not have involved a duplication event to give rise to vCXCL1A and vCXCL1B in group I. Instead, this duplication might have occurred in a parent of both lineages, followed by deletion of vCXCL1B in group II. Examination of DNA sequence alignments (Fig. 6) showed that the insertion/deletion event involved the complete coding region of vCXCL1B plus 150 bp extending upstream to the termination codon of vCXCL1A. It did not affect the 78 bp sequence between vCXCL1B and vCXCL2, which is conserved in groups I and II and is unrelated to the region upstream from vCXCL1B. In group I, the initiation codons of vCXCL1A and vCXCL1B are located in a conserved sequence of 29 bp, and the termination codons in a different conserved sequence of 21 bp. These sequences are also conserved at the initiation and termination codons of vCXCL1A in group II. Thus, loss of vCXCL1B might have involved straightforward homologous recombination between the duplicated motifs at the 3’-ends of vCXCL1A and vCXCL1B.
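The proposed loss of vCXCL1B by recombination between repeated 3’-end motifs can be pictured with a toy model. The Python sketch below is a deliberately simplified assumption, not a reconstruction of the actual event: the sequences are invented, the 6 bp motif stands in for the 21 bp conserved sequence, and the function name `delete_between_repeats` is hypothetical.

```python
def delete_between_repeats(seq: str, motif: str) -> str:
    """Model recombination between the first two direct copies of `motif`.

    One copy of the motif is retained, and everything from the end of the first
    copy to the end of the second copy is lost.
    """
    first = seq.find(motif)
    second = seq.find(motif, first + len(motif)) if first != -1 else -1
    if first == -1 or second == -1:
        raise ValueError("motif must occur at least twice")
    # Keep the sequence through the first copy, then resume after the second copy.
    return seq[: first + len(motif)] + seq[second + len(motif):]

# Invented layout: flank | gene A body | motif | spacer | gene B body | motif | flank
left_flank, gene_a, motif = "TTTT", "AAAAAA", "CGTACG"
spacer, gene_b, right_flank = "GG", "CCCCCC", "TTAA"
two_gene = left_flank + gene_a + motif + spacer + gene_b + motif + right_flank

one_gene = delete_between_repeats(two_gene, motif)
print(one_gene)
```

In this model, the deleted segment runs from just after the end of the first gene to the end of the second gene, which matches the extent of the insertion/deletion described in the text, and the downstream flank is left untouched.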
In further support of the occurrence of gene deletion during evolution of the vCXCL cluster, the six GMCMV strains for which relevant sequence data are available contain an 83 bp sequence at the 5’-end of the vCXCL1 coding region that is essentially repeated in the region close upstream from vCXCL1A, in the non-coding region between this gene and UL145; that is, a vestigial vCXCL gene (Fig. 6). This repeated sequence contains a third copy of the conserved sequence at the initiation codons of vCXCL1A and vCXCL1B, and is not present in the RhCMV or BaCMV vCXCL clusters. Therefore, it is plausible that the lineage leading to GMCMV groups I and II, but not the other two lineages, once possessed, and later lost, an additional vCXCL gene upstream from vCXCL1A.
A final feature of the vCXCL data is the unexpectedly large divergence of GMCMV(2715) vCXCL4 from all other GMCMV vCXCL4 versions. Indeed, Fig. 2 and Fig. 3 show that both GMCMV(2715) vCXCL3B and vCXCL4 are more similar to BaCMV vCXCL3 and vCXCL4, respectively, than to the other GMCMV versions of these genes. Furthermore, a linked 66 bp non-coding segment in an unusually large gap immediately upstream of the BaCMV vCXCL4 gene showed 85% nucleotide identity with GMCMV(2715), but minimal identity with the equivalent region from any other GMCMV (data not shown). This pattern could be the result of a homologous recombination exchange event between a progenitor of the GMCMV(2715) virus and a BaCMV-like virus. However, it is also possible that this additional complexity instead reflects a further level of expansion and differential deletion events involving vCXCL3 and vCXCL4 that occurred in an ancestor of one or both lineages.
A parsimonious approach also permits certain evolutionary steps to be proposed for the vGPCR cluster in GMCMV and RhCMV, with full phylogenetic support. In an ancestor of Old World primate CMVs (42.9–30.5 Mya), there was a single vGPCR gene that in the lineage leading to GMCMV and RhCMV gave rise to three daughters, one becoming vGPCR1, one vGPCR2/vGPCR3/vGPCR4, and one vGPCR5. vGPCR3 then duplicated in GMCMV (possibly after divergence from the BaCMV lineage) to give rise to vGPCR3A and vGPCR3B in group II. However, similar to the situation in the vCXCL cluster, evidence is available from DNA sequence alignments for a less parsimonious route, involving earlier generation of vGPCR3A and vGPCR3B in a progenitor of both GMCMV lineages and then loss of vGPCR3A to give rise to group I. In comparison with group II, group I sequences lack a region of 851 bp that extends from 35 bp upstream from the vGPCR3A initiation codon to 63 bp upstream from the termination codon (Fig. 6). In group I, the region between vGPCR2 and vGPCR3B is highly similar to coding and non-coding sections at the 3’-end of vGPCR3A in group II, effectively resulting in a vestigial vGPCR3A gene located between vGPCR2 and vGPCR3B. The RhCMV and BaCMV vGPCR clusters lack this feature, thus confirming that the events involving vGPCR3A are specific to the GMCMV lineage.
GMCMV strains in group I have eight genes in the vCXCL cluster and five in the vGPCR cluster, whereas those in group II have seven and six genes, respectively. Multiple strains in each group were detected, indicating that the differences in gene number originated from genuine evolutionary events, rather than from deletions taking place in cell culture, although the latter may account for the unique deletion mutations found in the GMCMV(SA6) vCXCL and GMCMV(Colburn) vGPCR clusters. The existence of two distinct GMCMV groups was also supported by data from gene UL144, which is located in the flanking sequence upstream from the vCXCL cluster: members of group I differ from members of group II at the same seven residues (data not shown). The precise origins of the groups are not understood. In a broad sense, green monkeys consist of several subspecies of Cercopithecus aethiops (or, in the alternative taxonomy, species of the genus Chlorocebus). The two virus groups may have evolved in the same host subspecies or in different subspecies, perhaps spreading to other subspecies in primate housing facilities.
It is more problematic to extend the findings from Old World monkey CMVs to include the evolution of CCMV, HCMV and OMCMV. Little can be said beyond the observation that vCXCL2 in CCMV and HCMV may be orthologous to vCXCL6 in GMCMV and RhCMV. In the absence of convincing phylogenetic data, and given the potential involvement of both duplications and deletions, it is unrewarding to speculate in detail about additional evolutionary events in more ancient ancestral viruses. Nonetheless, the fact that the closest cellular relative of the vCXCL cluster is GRO-α suggests that this gene was acquired by capture (lateral transfer), and that its descendants in the GMCMV and RhCMV vCXCL3A/vCXCL3B/vCXCL3/vCXCL4 lineage have been selected to retain greatest similarity to the parental gene. Alternative explanations, that GRO-α was inserted into the vCXCL cluster more recently, and has founded only the vCXCL3A/vCXCL3B/vCXCL3/vCXCL4 lineage, or that the vCXCL cluster was derived from a captured gene other than GRO-α (for example, the interleukin-8 (IL-8) gene), with the vCXCL3A/vCXCL3B/vCXCL3/vCXCL4 lineage converging to resemble GRO-α, seem less plausible. Functional aspects of CMV vCXCLs remain largely unexplored, and it is not known how many are CXC chemokines. There is a considerable degree of variation among human CXC chemokines, and many lack the ELR element of the ELRCXC motif. Both types exhibit chemokine functions, those possessing the ELR element tending to promote the attraction of neutrophils, and those lacking it tending to promote the attraction of lymphocytes and monocytes (Laing and Secombes, 2004). Therefore, the observation that most vCXCLs lack the ELR element (Fig. 2) does not rule out chemokine or chemokine-related functions.
Gene capture has evidently been a prominent feature of herpesvirus evolution, and there are multiple examples in all herpesviruses of genes that have clear cellular counterparts. Most captured genes lack introns, but some are expressed by splicing patterns that, at least partially, resemble those of the cellular genes. In principle, gene capture may be achieved by recombination with cDNA copies (presumably retrovirus-mediated) of cellular transcripts in unspliced, partially spliced or fully spliced forms, or by recombination with cellular genomic DNA to acquire genes in unspliced form. In addition, special mechanisms have been proposed in certain instances, such as that of a cluster of nine adjacent, unspliced genes in Kaposi’s sarcoma-associated herpesvirus (a member of the Gammaherpesvirinae), which may have been captured by sequential, accidental initiation by cellular mRNAs of DNA strand growth at a bidirectional origin of DNA replication (Nicholas et al., 1997; Nicholas et al., 1998).
The observation that GMCMV and RhCMV vCXCL4 contain an equivalent of the second intron located within the I codon (V codon in BaCMV) of a largely conserved motif (EVIATLKNG) encoded by GRO-α (Fig. 2) and IL-8 (data not shown) is strong evidence that the cellular gene was captured in an unspliced or partially spliced form (with subsequent loss of the first intron), and that the second intron was present in the vCXCL that underwent the initial duplication event. This intron has been retained in lineages leading to GMCMV, BaCMV and RhCMV vCXCL4 (which, incidentally, along with vCXCL3A, vCXCL3B and vCXCL3 retain strong sequence similarity to GRO-α; Fig. 2), but has been lost from other lineages, perhaps on more than one occasion. The vCXCL family thus joins other examples of captured CMV genes that have retained cellular splicing patterns. The cyclooxygenase-2 (vCOX-2) gene is present as a partially spliced gene in RhCMV (Hansen et al., 2003; Rue et al., 2004), as well as in GMCMV with one fewer intron (A. Dolan, unpublished data), but is absent from CCMV and HCMV. The interleukin-10 (vIL-10) gene is present as a partially spliced gene in RhCMV and GMCMV and with one fewer intron in HCMV, but has evidently been lost from CCMV (Davison et al., 2003; Lockridge et al., 2000). These examples lend further credence to the idea that more captured genes than are apparent in extant CMVs may have been acquired in unspliced or partially spliced forms, via RNAs or from the cellular genome, with subsequent loss of introns, rather than via fully spliced RNAs.
Finally, it is of interest to note that nine human CXCL genes (including GRO-α and IL-8) form a tandemly repeated cluster of largely similarly spliced genes on chromosome 4. The processes of duplication and deletion that appear to have been involved in evolution of the primate CMV vCXCL cluster may also have occurred in the generation of the host CXCL cluster. Furthermore, similar processes may have operated throughout CMV genome evolution, as evidenced by the many sets of multicopy gene paralogs, clustered or otherwise, that occur in primate and non-primate CMV genomes.
Low passage strains GMCMV(2715) and GMCMV(4915), and the original vervet strain GMCMV(SA6) isolated in 1957, were obtained in 1982 from Richard Heberling at the Southwest Foundation for Research and Education in San Antonio, Texas. High passage strain GMCMV(Colburn), which was isolated ostensibly from a human, was obtained in 1976 from Milan Fiala at UCLA (Huang et al., 1978; LaFemina and Hayward, 1980). GMCMV(GR2757), for which the passage history is unknown, was obtained in 1974 from Eng-Sheng Huang, University of North Carolina at Chapel Hill (Huang and Pagano, 1974; Jeang and Hayward, 1983). OMCMV(S34E) was isolated from an owl monkey (Daniel et al., 1971), and was obtained in 2004 from the American Type Culture Collection (ATCC) as VR-606.
The GMCMV viruses were passaged at low MOI in diploid human fibroblast cells under the standard conditions used to avoid accumulation of defective virus genomes (LaFemina et al., 1989). OMCMV was passaged similarly on the owl monkey kidney cell line OMK(637-69), which was obtained from the ATCC as CRL-1556. DNA was isolated from virus particles obtained by standard procedures from infected cell cultures exhibiting maximal CPE. Briefly, GMCMV virions were purified from culture medium by sucrose density gradient ultracentrifugation, pelleted, incubated with DNase I, and lysed in SDS/Tris-HCl/EDTA. OMCMV virions were isolated in the cytoplasmic fraction of infected cells, incubated with DNase I, pelleted and lysed in SDS/Tris-HCl/EDTA. DNA was purified by phenol extraction, ethanol precipitation and dialysis.
The primers listed in Table 1 were used in appropriate pairwise combinations to amplify and sequence overlapping fragments encompassing the vCXCL and vGPCR clusters. Strategically, the design of PCR primers for anchoring the upstream ends of the vCXCL and vGPCR clusters of the five GMCMV strains was based on published GMCMV(Stealth) sequence data. These primers were then supplemented or replaced by additional primers designed for each strain on the basis of primer-walking. The design of PCR primers for anchoring the downstream ends of both clusters in GMCMV(GR2757), GMCMV(4915) and GMCMV(SA6) was based on the complete genome sequences of GMCMV(2715) and GMCMV(Colburn). PCR products were sequenced on both strands using an ABI310 instrument.
The sequences of the vCXCL and vGPCR clusters in GMCMV(2715), GMCMV(Colburn) and OMCMV were extracted from complete genome sequences, which were derived in our laboratories by standard plasmid shotgun methods. There were no differences between the GMCMV(2715) or GMCMV(Colburn) genome sequences and comparable, incomplete sequences obtained for these strains by PCR.
The accession numbers for new DNA sequences containing the vCXCL cluster were: GMCMV(GR2757), FJ883003; GMCMV(4915), FJ883005; and GMCMV(SA6), FJ883007. Those for the vGPCR cluster were: GMCMV(GR2757), FJ883004; GMCMV(4915), FJ883006; and GMCMV(SA6), FJ883008. The accession numbers of previously published DNA sequences containing all or part of the vCXCL gene cluster were: GMCMV(Stealth), AF145588; RhCMV(22659), EU130540; RhCMV(CNPRC), EF990255; RhCMV(WT2), EF990256; and RhCMV(WT3), EU003822. Those for the vGPCR gene cluster were: GMCMV(Stealth), U27802; and GMCMV(9610), AY340790, AY340791, AY340792, AY340793 and AY340794. The accession numbers of genome sequences from which the vCXCL and vGPCR clusters and DNA polymerase genes were extracted were: GMCMV(Colburn), FJ483969; GMCMV(2715), FJ483968; BaCMV(OCOM4-37), AC090446 (working draft version 27); RhCMV(68-1), AY186194; RhCMV(180.92), DQ120516; CCMV(Heberling), AF480884; HCMV(Merlin), AY446894; and OMCMV(S34E), FJ483970. Additional sequences for vCXCL genes in other HCMV strains were: additional alleles of HCMV UL146 (accession numbers in Dolan et al., 2004); and additional alleles of HCMV UL147 identified by BLAST searches of GenBank. The accession numbers of the sequences of rhesus macaque, chimpanzee and human CXCL1 were NP_001028050, XP_001156038 and NP_001502, respectively. The locations of introns in the human GRO-α and IL-8 genes were derived from the annotation for human chromosome 4 (NT_006216).
ClustalW (Thompson et al., 1994) was used to produce alignments of nucleic acid or predicted amino acid sequences, and the latter were adjusted manually. Bayesian inference of phylogeny was carried out on aligned sequences using MrBayes 3 (Ronquist and Huelsenbeck, 2003). For each analysis, the program was run for 2 million generations, during which 20001 trees were sampled, with the initial 15001 discarded as burn-in. The Jones-Taylor-Thornton amino acid substitution matrix was used with a gamma-plus-invariant site categories model and variable rates molecular clock model. Phylogenetic analysis of primate CMV DNA polymerases was performed using Mega4 (ClustalW alignment followed by neighbour-joining; Tamura et al., 2007). Phylogenetic trees were drawn using Mega4. TMDs were predicted from predicted primary amino acid sequences using the Phobius web server (Käll et al., 2007).
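The sampling figures quoted for the Bayesian analyses can be checked with a little arithmetic. The sketch below assumes that a tree is saved at generation 0 and then at a fixed interval, which is how 20001 samples would arise from 2 million generations; that assumption, and the function name, are illustrative and are not stated in the text.

```python
def mcmc_sampling_summary(generations, trees_sampled, burn_in):
    """Derive the sampling interval, the number of trees retained after
    burn-in, and the generation of the last discarded tree.

    Assumes one tree is saved at generation 0 and one every `interval`
    generations thereafter (an assumption made for this sketch).
    """
    interval = generations // (trees_sampled - 1)
    retained = trees_sampled - burn_in
    last_discarded_generation = (burn_in - 1) * interval
    return interval, retained, last_discarded_generation


interval, retained, last_discarded = mcmc_sampling_summary(2_000_000, 20_001, 15_001)
print(interval, retained, last_discarded)
```

Under this assumption, trees were sampled every 100 generations, the first 1.5 million generations (75% of the run) were discarded as burn-in, and 5000 trees contributed to each consensus.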
We are grateful to Derrick Dargan for providing OMCMV DNA, Duncan McGeoch for commenting on the manuscript, and Sarah Heaggans for assistance with PCR sequence editing and assembly. The studies at Johns Hopkins University were supported by National Institutes of Health Research Grant R01 A124576 to GSH from the National Institute of Allergy and Infectious Diseases. The studies at the MRC Virology Unit were supported by the UK Medical Research Council. We thank Susan Neubauer and Earl Blewett (Oklahoma State University) for making unpublished DNA sequence data for BaCMV available.