What directs the location at which transposable elements integrate? A simple random insertion strategy is fraught with peril, as it may lead both to insertional mutations for the host and to “dead” copies of the element. Natural selection will thus favor members of a family of transposable elements that have hit upon strategies to maximize their own transmission frequencies (9). One such strategy is for the transposable element to prefer insertion sites close to genes whose high levels of expression are guaranteed in the host genome, provided that the insertions do not disrupt expression of the genes themselves.
LTR (long terminal repeat-bearing) retrotransposons can be divided into four major classes: the retroviruses, the Ty3/Gypsy group, the Ty1/Copia group, and the caulimoviruses. Each class has a distinctive structural organization of open reading frames (ORFs) and enzymatic modules (29). Members of the Ty3/Gypsy group are of interest because, in sharp contrast to their neighbors, the retroviruses, they sometimes possess a remarkable site specificity of insertion. The Saccharomyces cerevisiae Ty3 element, for example, inserts within 5 bp of the start site of RNA polymerase III (Pol III)-transcribed genes (5). This turns out to be an excellent choice of insertion preference. Pol III-transcribed genes possess internal promoters; thus, the insertions are expected to have a minimal effect on gene expression levels (13). This remarkable preference is proposed to be brought about by the tethering of the Ty3 integration machinery to the Pol III transcription apparatus through protein-protein contacts with the transcription factors TFIII-B and -C (14). While many other members of the Ty3/Gypsy group have been sequenced, much less is known about their specificities (but see reference 7).
While the site specificity of Ty3 (and related retrotransposons) has attracted considerable biochemical and genetic effort, the evolutionary scheme by which this specificity was acquired or lost is still unknown (6). Previous attempts to determine the phylogeny of the Ty3/Gypsy group have been restricted by the lack of resolution afforded by analyses of individual enzymatic domains (25). While increased resolution might be obtained by combining the information from all well-conserved domains, this is valid only if the elements have not swapped individual enzymatic domains. We investigated the possibility of domain swapping by comparing the phylogenies of members of the Ty3/Gypsy group based on the reverse transcriptase domain, the RNase H domain, or the core region of the integrase domain. Where sufficient resolution was afforded in the individual analyses (at least 50% bootstrap support in a neighbor-joining analysis [22]), no disagreements among the three phylogenies were observed, suggesting the absence of any domain swapping (data not shown). We therefore combined all three data sets to obtain maximum resolution. Neighbor-joining analysis was carried out with PAUP* (version 4d64). The resulting phylogeny is shown in Fig. 1 and has been rooted with the divergent Ty1/Copia retrotransposons as an outgroup.
FIG. 1 Phylogenetic tree of the Ty3/Gypsy group and related classes of LTR retrotransposons. The tree is based on an alignment of the sum of the amino acids in the reverse transcriptase, RNase H, and integrase domains (approximately 700 amino acid positions) ...
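The consistency check and the concatenation step described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the procedure run in PAUP*: each single-domain tree is assumed to have been reduced to a list of rooted clades with bootstrap percentages over the same set of elements, and the function names `clades_conflict`, `find_conflicts`, and `concatenate`, as well as the toy elements A, B, and C, are hypothetical.

```python
from itertools import combinations


def clades_conflict(a, b):
    """Two rooted clades conflict if they overlap but neither contains the other."""
    return bool(a & b) and not (a <= b or b <= a)


def find_conflicts(trees, min_support=50):
    """Return pairs of well-supported clades from different trees that conflict.

    `trees` maps a domain name to a list of (clade, bootstrap_percent) pairs,
    where each clade is a frozenset of element names. Clades below
    `min_support` are ignored, mirroring the 50% bootstrap cutoff in the text.
    """
    conflicts = []
    for (name1, clades1), (name2, clades2) in combinations(trees.items(), 2):
        for c1, s1 in clades1:
            for c2, s2 in clades2:
                supported = s1 >= min_support and s2 >= min_support
                if supported and clades_conflict(c1, c2):
                    conflicts.append((name1, c1, name2, c2))
    return conflicts


def concatenate(alignments):
    """Join per-domain alignments end to end for elements present in all of them.

    Each alignment is a dict mapping an element name to its aligned sequence.
    """
    common = set.intersection(*(set(a) for a in alignments))
    return {name: "".join(a[name] for a in alignments) for name in sorted(common)}


# Toy data only (hypothetical elements and support values, not results from the paper).
toy_trees = {
    "RT": [(frozenset({"A", "B"}), 90), (frozenset({"A", "B", "C"}), 40)],
    "RNaseH": [(frozenset({"A", "B"}), 75)],
    "IN": [(frozenset({"B", "C"}), 30)],
}

if not find_conflicts(toy_trees):
    # No well-supported disagreement, so the domains may be combined.
    combined = concatenate([{"A": "MK", "B": "ML"}, {"A": "GP", "B": "GA"}])
    print(combined)
```

In the toy data the only incompatible grouping (B with C) has support below the cutoff, so it is not counted as evidence of domain swapping; lowering `min_support` would report it.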
Some surprising groupings that were not evident in previous analyses (25) emerged from this analysis of the Ty3/Gypsy group of elements. For example, the three lineages of Ty3/Gypsy retrotransposons that have been proposed to bear envelope domains, namely the well-characterized Gypsy group from Drosophila melanogaster, the plant errantiviruses Athila and Cyclops (28), and Osvaldo from Drosophila buzzatii, appear to be well separated. In addition, there is a clear grouping of Ty3 with Skipper, Tf2, and a number of additional plant and fungal retrotransposons.
The integrase domain of the Ty3/Gypsy group and retroviruses has been typically classified into three distinct subdomains (12). The N-terminal subdomain contains an HH-CC motif implicated in binding to LTR sequences but not in binding to target site DNA (12). The central core subdomain contains the catalytic D,D35-E motif. Both these domains were used in the above-described phylogenetic analysis. The C-terminal subdomain of the integrase has been implicated in nonspecific binding to the target site in some cases (10); however, a recent study has also demonstrated the role of this subdomain in binding to the LTR of Ty3 elements (18). The C-terminal subdomain is the least well conserved of the three subdomains, showing great variation in sequence and length, and is consequently not useful for a universal phylogenetic analysis.
We were intrigued by the possibility that the C-terminal subdomain contains the factors necessary for the integration specificities of these elements. Thus, we turned our attention to this subdomain. We were not surprised to find a high degree of conservation within this subdomain in the Ty3 lineage (Fig. ), such subdomains being related by ancestry. More surprising was the finding that this subdomain is also well conserved among several additional members of the Ty3/Gypsy class of retrotransposons as well as certain vertebrate retroviruses (Fig. ). The most conserved region of this module can be loosely identified as G-(D/E)-X10–20 (mostly hydrophobic residues)-K-L-X2-(R/K)-(F/Y/W)-X-G-P-(F/Y)-X-(I/V), where the letters are single-letter amino acid codes and X refers to any amino acid (a number or range after X gives the number of such residues). For brevity, and to highlight the best-conserved residues, this module is hereafter referred to as GPY/F. The GPY/F module is not universal in the Ty3/Gypsy group. Members of the Mag and Gypsy lineages and some in the Osvaldo lineage (Fig. ) do not contain the GPY/F module. This situation is mirrored in the retroviral clade, where only a subset of retroviruses (indicated in Fig. ) contains the GPY/F module. It is worth mentioning that all the retroviruses indicated in Fig. that contain the GPY/F module group together phylogenetically (data not shown, but see reference 29). This differential retention of the GPY/F module is illustrated in the schematic shown in Fig. 3. Notably, two of the three clades that have lost the GPY/F module have acquired another ORF, which has been shown to be env.
FIG. 2 Alignment of the GPY/F module. Coding regions from different elements starting from 40 residues downstream of the core integrase domain are aligned to highlight the conserved residues in a number of Ty3/Gypsy retrotransposons (upper group) and certain ...
FIG. 3 Schematic evolution of the integrase domain in the Ty3/Gypsy group. The integrase domains of representatives from the Ty3/Gypsy groups identified in Fig. are presented to the scale shown. The HH-CC and D,D35-E motifs of the N-terminal and ...
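The loose GPY/F consensus given in the text can be written as a regular expression for scanning protein sequences. The sketch below is an illustration only and is not the search method used in this study: it ignores the "mostly hydrophobic" qualifier on the variable-length spacer, it will miss family members that deviate from the consensus, and the names `GPYF_PATTERN` and `find_gpyf` and the toy sequence are hypothetical.

```python
import re

# G-(D/E)-X10-20-K-L-X2-(R/K)-(F/Y/W)-X-G-P-(F/Y)-X-(I/V), with X as any residue.
# Assumption: the spacer is matched as any 10 to 20 residues, without
# enforcing the "mostly hydrophobic" composition noted in the text.
GPYF_PATTERN = re.compile(
    r"G[DE][A-Z]{10,20}KL[A-Z]{2}[RK][FYW][A-Z]GP[FY][A-Z][IV]"
)


def find_gpyf(protein):
    """Return (start, matched_segment) for each non-overlapping consensus match."""
    return [(m.start(), m.group()) for m in GPYF_PATTERN.finditer(protein.upper())]


# Synthetic sequence built to satisfy the consensus (not a real integrase).
toy = "MM" + "GD" + "A" * 12 + "KLAARFAGPYAI" + "MM"
for start, segment in find_gpyf(toy):
    print(start, segment)
```

Because the consensus is described as loose, a strict pattern of this kind is better suited to flagging candidate regions for manual alignment than to deciding presence or absence of the module.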
It is striking that the integrase domain does not end with the GPY/F module in members of the Ty3 lineage (Fig. ). In some of these elements, this portion of the ORF has been noted to have homology to a chromodomain (1). The chromatin organization modifier domain (chromodomain) comprises about 50 amino acids and was originally identified as a protein sequence motif common to the Drosophila chromatin proteins Polycomb (Pc) and heterochromatin protein 1 (HP1) (20). Subsequently, these domains were identified in a variety of proteins that play a role in chromatin modification (1), including factors that activate and repress transcription. Functional studies have shown that chromodomains are responsible for targeting these proteins to their chromatin sites of action (17). Recent nuclear magnetic resonance studies have also suggested that the chromodomain may mediate interactions between different proteins as a stand-alone protein module (3). Using a combination of BLASTP and TBLASTN searches (2), we identified a chromodomain in all members of the Ty3 lineage (Fig. ) except for the Ty3 element itself. In addition to the members of the Ty3 lineage represented in the phylogenetic diagram, we could identify chromodomains in four additional members that could not be included in Fig. as their sequences were incomplete. An alignment of these chromodomain motifs, made with CLUSTAL W (27), is presented in Fig. 4. Much like authentic chromodomain-containing proteins, some retrotransposons match the universal chromodomain consensus better than others. However, all members of this lineage, except the Ty3 element itself, bear a large proportion of the critical (conserved) amino acid residues. When present in the Ty3 lineage, chromodomains are found at the C-terminal end of the integrase domain, never as a stand-alone ORF, as previously noted (15). It is interesting that while Ty3 does not bear a chromodomain, it does bear a module of approximately the same size. This suggests a model for Ty3 in which an ancestral chromodomain module may have been replaced or become specialized. This specialization in Ty3 has been proposed to reflect the selection imposed by the haploid-diploid life cycle of S. cerevisiae, where any gene disruption is expected to have the most severe consequence (4).
FIG. 4 Alignment of putative chromodomains from various retrotransposons. The first block includes representatives of authentic chromodomains from a variety of chromatin-modifying proteins: D. melanogaster Polycomb and Su(Var) 3-9, Homo sapiens HP1, Mus musculus ...
We can thus propose a scheme for the segmental evolution of the integrase domain in the Ty3/Gypsy group, summarized in Fig. 3. The common ancestor of the Ty3/Gypsy group contained only a GPY/F domain. This was supplemented by a chromodomain (or analog) in the Ty3 group, by an envelope domain in the plant errantiviruses, and by a still unknown domain in Cer1. The GPY/F domain has been eliminated (replaced) on potentially three independent occasions, with the evolution of the Gypsy, Mag, and Osvaldo lineages.
Considerable biochemical attention has been paid to the integration specificities in the Ty3/Gypsy group (7). In a recent study, a chimeric Ty3/Moloney murine leukemia virus integrase was created, with the C-terminal end of the viral integrase being replaced with that of Ty3 (8). While this chimeric integrase was functional, it lacked the specificity of Ty3 for tRNA genes in human cell lines. A caveat in interpreting this result is that if Ty3 has indeed specialized its interaction module, this specificity may not be observed outside the yeast genome. We are confident that our analysis of these domains will help streamline future efforts to elucidate the evolution of different integration strategies in the Ty3/Gypsy group and perhaps the vertebrate retroviruses as well. The presence of chromodomains in other members of the Ty3 lineage may implicate this module in directing targeting to sites in the host genome via protein-protein interactions (with authentic-chromodomain-containing proteins serving as an analogy). However, this specificity may be loosely defined relative to that of the Ty3 element itself. Thus, it is likely that other members of the Ty3 lineage are specific for locations in or near expressed genes, for example, and only examination of numerous copies will elucidate the region-specific targeting that is brought about by the chromodomain. It is possible that the newly defined GPY/F domain may also be involved in some (as yet unknown) specificity that allows directed integration into safe havens and, thus, long-term maintenance of the elements in the host genome. The presence of env domains in retroviruses of both vertebrates and insects may obviate the need for such a domain, although strong site preference has been found in some members of the Gypsy lineage (see reference 7 for instance), the biochemical basis of which remains unknown.
Acquisition of target specificity based on a chromodomain is a novel means by which a transposable element can direct integration in a host genome: by relying on protein-protein interactions, targeting becomes independent of the actual DNA sequence that is the target. The alternative would be to encode a site-specific endonuclease, as is the case with group II introns (23) and some non-LTR retrotransposable elements (30). It is perhaps not surprising that this domain inclusion usually occurs in the Ty3/Gypsy group at the end of the integrase ORF, where it probably has a minimal impact on the core set of enzymatic activities of the element. Similar strategies exist not only in vertebrate retroviruses but also in the Ty1/Copia group of retrotransposons. In the latter group, targeting of Ty5 to silenced regions of the S. cerevisiae chromosome was shown to be abolished by a single amino acid change in a domain downstream of the core integrase domain (11).
Nucleotide sequence accession numbers.
The alignments used in Fig. 1 have been deposited in the EMBL online database (9a) under accession no. DS36733 (reverse transcriptase), DS36732 (RNase H), and DS36734 (integrase).