This study, containing the largest number of dUTPase amino acid sequences to date, allows us to examine the evolution of the dut
gene in detail. In addition to the organisms listed in Table , dut
has also been detected in rhesus monkeys, dogs, cows, rabbits, and chickens (46
). dUTPase is a beneficial component of the viral replication machinery but is nonessential in dividing host tissues for herpesviruses (35
), poxviruses (19
), and nonprimate lentiviruses (36
). All currently known functional dUTPases have the motifs described here (Fig. and ), with the exception of unrelated dCTPase-dUTPase bifunctional enzymes in phage (76
) and an unrelated dUTP-specific protein in Leishmania
). Given the widespread distribution of obligatory dut
genes in host organisms, the beneficial nature of the gene for viruses in some host tissues, and the similarity of many viral dUTPases to host dUTPases (Fig. ), it is likely that several viruses have acquired dut
genes from their hosts.
Our phylogeny (Fig. ) clearly reveals this pattern. The poxviruses included in this study (Table ) encode a dUTPase, whereas molluscum contagiosum virus (MCV) does not. Recent studies based on other proteins found similar relationships among the poxviruses studied here and placed the MCV sequences basal to the group (58
). All six poxvirus dUTPases studied here cluster with those of eukaryotes regardless of which phylogenetic method is used and whether the OSM, the MIR, or the combined data set is analyzed. Five poxvirus dUTPases (VACW, VARI, CPOX, VACL, and SPOX) group together with a bootstrap value of 67% (100% excluding SPOX). This implies a common eukaryotic origin for these five poxvirus dUTPases. Thus, it appears that a host dut
gene was acquired subsequent to the divergence of MCV but prior to that of swinepox, vaccinia, variola, and cowpox viruses. In addition, ORF virus dUTPase (ORFV) clusters with mammalian dUTPases (100%) and is more similar to mammalian dUTPases than to dUTPases from other poxviruses (Table ). This pattern is due mostly to similarity in the OSM rather than the MIR (data not shown), which may imply extreme divergence from dUTPases of other poxviruses combined with convergence to the host OSM. Alternatively, a separate transfer may have resulted in the observed ORF virus dut
gene. The dUTPase sequence from a baculovirus infecting gypsy moths and its similarity to dUTPases of poxviruses has been recently published (30
). Our preliminary analysis of this sequence shows that it has close similarity with eukaryotic dUTPases in general, indicating that it is another possible example of horizontal transfer (data not shown).
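The bootstrap values quoted in this discussion can be read as the percentage of resampled (replicate) trees in which a given group of sequences appears as a clade. The following is a minimal sketch of that bookkeeping only, not the procedure or software used in this study. The function name `clade_support`, the representation of each tree as a set of clades, and the toy replicates are all assumptions made for illustration; the taxon labels are borrowed from the text but the numbers are not the paper's results.

```python
def clade_support(replicate_trees, clade):
    """Return the percentage of replicate trees that contain `clade`.

    Each replicate tree is represented (hypothetically) as a set of clades,
    and each clade is a frozenset of taxon labels.
    """
    if not replicate_trees:
        raise ValueError("need at least one replicate tree")
    target = frozenset(clade)
    hits = sum(1 for tree in replicate_trees if target in tree)
    return 100.0 * hits / len(replicate_trees)


# Toy bootstrap replicates (illustrative only, not data from this study).
replicates = [
    {frozenset({"VACW", "VARI"}), frozenset({"VACW", "VARI", "CPOX"})},
    {frozenset({"VACW", "VARI"}), frozenset({"VACW", "VARI", "SPOX"})},
    {frozenset({"VARI", "CPOX"}), frozenset({"VACW", "VARI", "CPOX"})},
    {frozenset({"VACW", "VARI"}), frozenset({"VACW", "VARI", "CPOX"})},
]

support_pair = clade_support(replicates, {"VACW", "VARI"})
support_alt = clade_support(replicates, {"VARI", "CPOX"})
```

In this toy set the VACW plus VARI grouping is recovered in three of four replicates, so it would be reported with 75% support, while the alternative VARI plus CPOX grouping is recovered only once.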
It has been suggested that avian adenovirus dut
(AVAD) is related to an analogous unidentified reading frame in human adenoviruses (75
). The avian adenovirus sequence groups with animal dUTPases with a bootstrap of 60%, with multicellular eukaryotic dUTPases at 66%, and with eukaryotic dUTPases in general at 99%. This result strongly implies a eukaryotic dut
transfer as the origin of this gene. If this gene is in fact related to the unidentified reading frame in human adenoviruses, it must have diverged considerably in that lineage after acquisition.
Paramecium bursaria chlorella virus dUTPase (PBCV) also clearly clusters with eukaryotic dUTPases with a 99% bootstrap, supporting monophyly. All other single-copy dUTPases from eukaryotic DNA viruses (IHER, SHER, ASFV, and ONPH) are scattered about the topology, very distant from the eukaryotic dUTPases (Fig. ). If they also originally acquired dut
genes from eukaryotes, sufficient divergence has occurred to obscure this fact. Bacteriophage SPβ encodes a dUTPase (BPSP) that clusters with host B. subtilis
dUTPase (BSUB) with a bootstrap of 100%. Archaeal virus dUTPase (SIRV) clusters with archaeal dUTPases (MJAN and DAMB) with a high bootstrap (100%). Although this phylogeny is consistent with that reported by Prangishvili et al. (54
), no assertions of horizontal transfer have been made for this viral gene until now.
Additional potential cases for horizontal transfer.
One siphovirus (bacteriophage) dUTPase sequence clusters with those of eubacteria no matter which of the two phylogenetic methods or data sets (OSM, MIR, or combined) is used. Bacteriophage T5 (BPT5) was isolated from E. coli
, a member of the gamma subclass of the class proteobacterium
), and its sequence clusters basal to those of gamma and beta proteobacteria (ECOL, HINF, PAER, CBUR, and NGON) with a bootstrap value of 69% (Fig. ). Bacteriophage r1t was isolated from firmicute host Lactococcus lactis
), and its dUTPase sequence (BPRT) clusters with that of firmicute B. subtilis
(BSUB). The BPRT sequence is 97% identical to a recently deposited L. lactis
dUTPase sequence. Preliminary analysis indicates a bootstrap of more than 96% for monophyly of the L. lactis
and BPRT dUTPase sequences (data not shown). The dUTPase of bacteriophage phi PVL, isolated from S. aureus, was also recently deposited in GenBank. Preliminary analysis of this sequence shows similarity to dUTPases of eubacteria (data not shown).
The chlamydial (CTRA) and spirochete (TPAL) dUTPase sequences cluster together with a bootstrap of 96%. Chlamydiales
comprise eubacterial groups that are entirely separate from firmicutes and proteobacteria. Although not supported by a bootstrap value of at least 50%, epsilon and alpha proteobacterial dUTPases (HPYL and BJAP) cluster with those of Chlamydiales
, within dUTPases of firmicutes. This is inconsistent with the organismal phylogeny, as epsilon and alpha proteobacteria are more closely related to other (beta and gamma) proteobacteria (13
). It is possible that dUTPase sequences are insufficient to resolve these relationships, given the few representative sequences from these taxa currently available. Alternatively, members of these groups may have exchanged dut
genes. Several other genes of Helicobacter pylori
) and Chlamydia trachomatis
) are suspected to have been acquired horizontally from other eubacteria, as well as eukaryotes. More dUTPase sequences from members of underrepresented taxa will be necessary to resolve this issue.
Two retrovirus lineages contain a complete dut
gene (MMTV-like and nonprimate lentiviruses). The gene is located in a different genomic region in each lineage (Fig. ) (41
), yet the protein sequence is sufficiently similar between these lineages to imply a possible common origin relative to other dUTPase protein sequences available. The shortest distance tree supports monophyly of the dUTPase sequences in retroviruses, although this topology is not significantly shorter than the search tree in which the nonprimate lentivirus dUTPases are polyphyletic with respect to those of MMTV relatives. There are close relatives of each lineage that do not encode a dUTPase (Fig. ). While our best tree is consistent with horizontal transfer between retroviral lineages, the lack of a statistically significant difference between the two trees prevents a definitive claim for horizontal transfer in this case.
Alpha- and gammaherpesviruses encode a recognizable dUTPase, while the related betaherpesviruses encode an analogous reading frame containing no recognizable motifs. dUTPase activity has not yet been assayed in betaherpesviruses. If the gene is not present in betaherpesviruses, then perhaps alpha- and gammaherpesviruses are more closely related to each other and inherited a duplicated dut
gene from a common ancestor. This evolutionary scenario would be consistent with a reported herpesvirus polymerase phylogeny (24
). Channel catfish virus and salmonid herpesvirus also encode a dUTPase, but it is a single copy and quite different from those reported in vertebrates. Fish herpesviruses are known to be the most distant members of the herpesviruses (10
). These viruses may have acquired the gene independently or early in the evolution of herpesviruses, before the duplication observed in alpha- and gammaherpesvirus lineages. Either scenario is consistent with reported herpesvirus phylogenies (24
The fact that the location of dut
is variable in the closely related genomes of viruses and eubacteria (Fig. ) demonstrates that it has moved. Despite its variable position, dut
was observed adjacent to other genes involved in nucleotide metabolism such as ribonucleotide reductase, transcription initiation factors, primase, and DNA synthesis flavoprotein (Fig. ). In eubacteria the proximity of dut
to other genes needed for similar functions might be beneficial. There is evidence that genes coding for related biochemical functions in eubacteria frequently occur adjacent to one another (62
). In retroviruses the location of dut
affects its level of expression. The gag
portion of the gag/pol
polycistron is translated approximately 20 times more frequently than the entire polycistron (7
). Thus, in MMTV relatives one would expect a 20-fold-greater level of dut
expression than in nonprimate lentiviruses.
Mechanisms of horizontal transfer.
Retroviruses may acquire sequences relatively easily from their hosts or from each other because of viral recombination or incorporation of host mRNA into the retroviral genome (23
). A mature dut
RNA message could theoretically be copackaged in a retrovirus and then incorporated into its genome. This might explain why none of the retroviral dut
sequences have introns even though their vertebrate counterparts do, a pattern also observed in c- and v-oncogenes, for example c-myc
). dUTPase is encoded in a different region in two different retrovirus lineages (MMTV relatives and nonprimate lentiviruses), despite its absence in close relatives of these lineages. It is most reasonable to assume a horizontal transfer between the two lineages or independent acquisition of the gene, whereas a loss in all other relatives is highly unlikely. If convergence or parallelism were operating because both dUTPases function in a retroviral background, one might expect the generally conserved OSM to reflect this. It is the MIRs that support monophyly of the MMTV and nonprimate lentivirus dUTPases rather than the OSM (data not shown). In addition, a spuma-related retrovirus encodes a dUTPase in a third unique location (8
). We have another study in progress to determine the relationship of this third type of retroviral dUTPase to the others. Due to the rapid rate of evolution in RNA genomes and consequent high sequence divergence, the source of these genes (host or retroviral) may be impossible to determine. It has recently been proposed that the outer domain of gp120 in the primate lentivirus human immunodeficiency virus (HIV) also originated as a host dUTPase sequence (1
). While it is possible that gp120 evolved from a dUTPase-like sequence, the extreme lack of conserved dUTPase residues between and within the dUTPase OSM in gp120 makes the identity of the original protein impossible to confirm.
Lack of introns in DNA virus dUTPases implies a retroviral intermediate.
Some hosts have introns in their dut
genes, although viral dut
genes do not. For example, the human dut
gene spans about 14 kb and contains approximately five introns (30a
). Mouse (30a
) and rhesus monkey (46) genes are also reported to have introns. In our study, dut
in C. elegans
was found to have two introns in each of its three copies (data not shown). One of the introns in C. elegans dut
corresponds exactly to the position of an intron in H. sapiens dut
in motif IV. To explain the presence of intronless dut
genes in DNA viruses, we hypothesize that these viruses acquired a cDNA copy of an RNA message by a mechanism analogous to the one that produced v-oncogenes. The agent responsible for mediating this event must encode a reverse transcriptase. Transfer may have occurred in different cellular compartments, since poxviruses replicate in the cytoplasm, whereas adenoviruses replicate in the nucleus. In contrast, the process by which a eukaryotic nucleus might acquire a retroviral or DNA virus dut
gene would not necessarily require an intermediary retroid agent. Introns would be inserted subsequent to the transfer. Since the viral, not the eukaryotic, topology is discordant, and given the presence of dut
introns in eukaryotes, it is most likely that the viruses acquired dut
from their eukaryotic hosts rather than vice versa. Convergence or parallelism of viral copies with host genes is unlikely because MIRs alone support the monophyly of viral dUTPases with their hosts (data not shown) nearly as well as the entire sequence (Fig. ). If convergence or parallelism were operating, one would expect the OSM to support monophyly and the MIRs to contribute noise or a different topology. In addition, functional but unrelated dUTPases in T4 phage and Leishmania
spp. are a clear example of functional convergence in the absence of sequence homology (4
Implications for folding and subunit assembly.
While the location of structural and functional residues has been determined by X-ray crystallography in single-copy dUTPases (9
), structures have not been reported for any dUTPases with other motif arrangements. On the basis of the conserved structural and functional residues in the alignment (Fig. ), it is possible to hypothesize the type of folding and assembly that might occur in duplicated and triplicated dUTPases (Fig. ). Alpha- and gammaherpesviruses have a duplication such that the first copy conserves motif III and the second copy conserves I, II, IV, and V (Fig. and ). The available evidence indicates that herpesvirus dUTPases function as monomers (5
). Some residues involved in secondary structure (50
) are conserved as well, implying that much of the structure surrounding motifs I, II, IV, and V is similar (data not shown). The structure of the amino portion of the alpha- and gammaherpesvirus dUTPase remains ambiguous, although motif III is likely present in an active site as is found in homotrimers composed of single copies. The large 21- to 38-residue insertion between motifs IV and V in carboxy copies of herpesvirus dUTPases may aid this by allowing the flexible tail (containing motif V) as found in E. coli
) to double back to complete the active site in the monomer (Fig. ).
FIG. 5. Assembly of dUTPase subunits. The four known dUTPase crystals (FIV, EIAV, E. coli, and H. sapiens) indicate a homotrimer folding to form three separate active sites in which each of the five conserved motifs is present. (A) H. sapiens (single).
Duplicated genes typically undergo one of several well-documented fates. Generally, one gene will retain its function while the other diverges into a pseudogene or evolves a new function. Alternatively, both copies may retain similar functions, as in the globin family. Fused tandem copies functioning in concert, each with a complete OSM, have been observed in aspartic acid proteases. The fate of alpha- and gammaherpesvirus dut genes is unusual in that it is the only example to our knowledge of fused tandem copies each potentially contributing a subset of motifs to the catalytic site.
The dUTPase of C. elegans has a novel arrangement in that it is tandemly triplicated such that a single peptide could theoretically fold into a shape similar to that of a single-copy homotrimer (Fig. ). The way in which the peptide may fold between motif V of one copy and motif I of the next is ambiguous. There are two introns of the same size and in the same position within each copy of the C. elegans dut gene, one of which interrupts the coding regions. The DNA sequences of the three copies are fairly similar (60% identical in a three-way comparison of all three coding regions), although the introns are more divergent (data not shown). This is consistent with the 100% bootstrap for the cluster containing the amino acid sequences of each copy (Fig. ). The conservation of gene organization and sequence in these copies may indicate that the triplication is fairly recent.
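The three-way identity figure quoted above can be illustrated with a short calculation over aligned sequences. This is a sketch under stated assumptions rather than the method used in this study: the function name `three_way_identity` is hypothetical, the toy sequences are invented, and the choice to skip any alignment column containing a gap is an assumption, since the treatment of gaps is not described here.

```python
def three_way_identity(seq_a, seq_b, seq_c, gap="-"):
    """Percent of aligned columns at which all three sequences are identical.

    Assumption: columns containing a gap in any sequence are excluded
    from both the numerator and the denominator.
    """
    if not (len(seq_a) == len(seq_b) == len(seq_c)):
        raise ValueError("sequences must be aligned to the same length")
    compared = 0
    identical = 0
    for x, y, z in zip(seq_a, seq_b, seq_c):
        if gap in (x, y, z):
            continue  # skip gapped columns (assumed convention)
        compared += 1
        if x == y == z:
            identical += 1
    if compared == 0:
        raise ValueError("no ungapped columns to compare")
    return 100.0 * identical / compared


# Invented toy alignment of three coding-region fragments (not real data).
copy1 = "ATGGCTAAAG"
copy2 = "ATGGCAAAAG"
copy3 = "ATGACTAAGG"

identity = three_way_identity(copy1, copy2, copy3)
```

For the toy alignment, seven of the ten columns agree across all three copies, giving 70% three-way identity. A three-way figure is necessarily no higher than any of the corresponding pairwise identities, which is worth keeping in mind when comparing it with pairwise values such as the 97% reported for BPRT and L. lactis.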
The patterns of dut
horizontal transfer observed between archaea, eubacteria, and eukaryotes and their viruses have important implications for clinical and genomic research. In terms of clinical implications, other DNA and RNA viruses may acquire the dut
gene and thereby expand their pathology to include nondividing tissues. The dUTPase protein has been proposed as a target for antiviral drugs and cancer chemotherapy (47
). It is thought that primate lentiviruses such as HIV, which do not encode dUTPase, may activate and use dut
genes in human endogenous retroviruses (HERVs) (48
). Uncontrolled pools of dUTP are considered toxic to rapidly dividing tissue because they lead to incorporation of dUTP in DNA, excessive DNA repair, and cell death. Already a common target for chemotherapy, thymidylate synthase is an enzyme downstream of dUTPase in the biochemical pathway for conversion of dUTP to TTP. Targeting the dUTPase protein may likewise have potential for controlling the rapid growth of tissues (47
). Understanding the nature and relationships of native and HERV dUTPases in humans will be important for identifying specific areas of the protein to target, and such studies are in progress.
In this study we have established a compelling case for at least five horizontal acquisitions of viral dut genes from eukaryotic, eubacterial, and archaeal hosts. These include the dUTPases of poxviruses, avian adenovirus, Paramecium bursaria chlorella virus, bacteriophage SPβ, and archaeal virus SIRV. A sixth potential case is that ORF virus acquired its dUTPase separately from those of other poxviruses. We have identified potential transfer events among several eubacteria, two between eubacteria and their viruses, and one among retroviruses. We have demonstrated that dut is present in variable genomic locations in eubacteria and RNA and DNA viruses. Finally, potential structures are hypothesized by mapping the primary structures of various uncrystallized motif arrangements onto the tertiary structures of known dUTPase crystals. dut provides an example of a gene undergoing ubiquitous horizontal transfer relative to viral core genes, with different arrangements of the same conserved motifs functioning in different genetic backgrounds. This study illustrates that both sequence similarity and genomic location need to be considered when reconstructing the evolutionary history of individual genes and the genomes in which they are found. Future comparative genomic studies will reveal how many more genes move as often and how many other dUTPase motif arrangements may exist.