Typical metazoan core promoter elements, such as TATA boxes and Inr motifs, have yet to be identified in early-evolving eukaryotes, underscoring the extensive divergence of these organisms. Towards the identification of core promoters in protists, we have studied transcription of protein-encoding genes in one of the earliest-diverging lineages of Eukaryota, that represented by the parasitic protist Trichomonas vaginalis. A highly conserved element, composed of a motif similar to a metazoan initiator (Inr) element, surrounds the start site of transcription in all examined T. vaginalis genes. In contrast, a metazoan-like TATA element appears to be absent in trichomonad promoters. We demonstrate that the conserved motif found in T. vaginalis protein-encoding genes is an Inr promoter element. This trichomonad Inr is essential for transcription, responsible for accurate start site selection, and interchangeable between genes, demonstrating its role as a core promoter element. The sequence requirements of the trichomonad Inr are similar to those of metazoan Inrs, and the trichomonad Inr can be replaced by a mammalian Inr. These studies show that the Inr is a ubiquitous, core promoter element for protein-encoding genes in an early-evolving eukaryote. Functional and structural similarities between this protist Inr and the metazoan Inr strongly indicate that the Inr promoter element evolved early in eukaryotic evolution.
Typical metazoan promoters use a TATA box, at −25 to −30 from the transcription start site, for accurate start site selection by RNA polymerase II (47). Some metazoan promoters, however, lack this TATA box and instead use a functionally analogous initiator element (Inr), with the consensus PyPyA+1NT/APyPy, to direct transcription initiation (3, 15, 27, 39, 40). The Inr is the only element in metazoan protein-encoding genes known to be a functional analog of the TATA box, in that it is sufficient for directing accurate transcription initiation in genes that lack TATA boxes (39). Genes of the Archaea, the ancestor of the eukaryotic cell (46), use TATA boxes to direct initiation but appear to lack metazoan-like Inr elements (43). Recent studies on the transcription of Archaea genes have shown that the minimal proteins required for accurate transcription initiation (TBP, TFIIB, and RNA polymerase) of eukaryotic and archaeal promoters are similar, indicating that they have a common ancestral transcription machinery (34).
Despite considerable effort, the quest to identify sequence-specific core promoter elements in early-diverging eukaryotic lineages has been largely unsuccessful. The lack of typical eukaryotic promoters is likely due to the immense divergence that has occurred since these organisms branched from the main line of eukaryotic evolution about a billion years ago. The most highly studied early-evolving eukaryotes are the parasitic protists, which are known for the evolution of unusual molecular mechanisms, such as RNA editing (37) and transcription of protein-encoding genes by RNA polymerase I (35), as found in kinetoplastids. Regulation of gene transcription in kinetoplastids has evolved in concert with ubiquitous trans-splicing, resulting in the apparent lack of sequence-specific promoters. Instead, genes are transcribed in large polycistronic units and formation of discrete mRNAs is regulated via trans-splicing (for reviews, see references 10 and 21). Although other protists have not evolved this elaborate mechanism for regulation of gene expression, the properties governing gene expression in this diverse group of eukaryotes are largely unknown.
An interest in the evolution and regulation of gene transcription in early-diverging eukaryotes led us to examine the promoter structure of protein-encoding genes of the parasitic protist Trichomonas vaginalis. This organism is a common human pathogen belonging to the phylum Parabasalia, one of the earliest-diverging eukaryotic lineages (8, 14). Trichomonads are characterized by a number of primitive cellular features, such as the absence of two classic eukaryotic organelles, peroxisomes and mitochondria (42). We have found that, without exception, T. vaginalis genes contain a highly conserved element that surrounds the start site of transcription (32). This element bears a remarkable resemblance to the metazoan initiator (Inr) (3, 15, 27, 39, 40). Using transfection assays to assess the effect of mutations in this conserved element on transcriptional activity, we demonstrate that it is essential for transcription, directs the selection of transcription start sites, and is exchangeable between trichomonad genes. Structural and functional analyses of the element indicate that it is a homologue of the metazoan Inr element. These studies identify the first cognate core promoter element found in both early-diverging and metazoan eukaryotes and indicate an early evolution of the Inr in Eukaryota.
T. vaginalis C1 (ATCC 30001) was grown in Diamond’s medium, supplemented with 10% (vol/vol) horse serum and iron as described previously (32).
The αSCS-CAT (α-succinyl coenzyme A [CoA] synthetase-chloramphenicol acetyltransferase) construct has been described previously (11). αSCS-CAT Inr mutants were constructed by the unique restriction site elimination method of site-directed mutagenesis (Chameleon kit; Stratagene). Mutagenesis reactions to generate the Inr1 to Inr3, Inr5 to Inr13, Inr17, Inr20, Inr21, Inr33, Inr34, cHSP70 Inr, Fd Inr, and TdT Inr constructs contained the appropriate mutagenesis primer (Table 1), the pBS XhoI selection primer (Table 1), which changes the pBluescript XhoI site to a HpaI site, and the αSCS-CAT plasmid as a template. Mutagenesis reactions to generate the Inr22 and Inr24 to Inr32 constructs contained the appropriate mutagenesis primer (Table 1), the pBS HpaI selection primer (Table 1), which changes the HpaI site back to a XhoI site, and the Fd Inr plasmid as a template.
To generate the αTUB-Neo (α-tubulin–neomycin phosphotransferase) construct, primers αTub 1 and αTub 2 (Table 1), which correspond to the 5′ and 3′ ends of the α-tubulin (αTUB) gene and contain KpnI and BamHI restriction sites, respectively, were used in an inverse PCR with a genomic clone of the αTUB gene to generate the 1-kb 5′ and 1.3-kb 3′ αTUB untranslated regions (UTRs) and the pBluescript vector. The coding region of the neomycin (Neo) gene was generated by PCR with primers Neo 1 and Neo 2 (Table 1), which also contain KpnI (5′ end) and BamHI (3′ end) sites, and the pKm2 plasmid (4). The PCR products were then gel purified, digested with KpnI and BamHI, and ligated to each other to form αTUB-Neo. Constructs for the selectable-transfection experiments were generated by digesting the αTUB-Neo and αSCS-CAT Inr constructs with SacI and EcoRV and ligating them together to form the αSCS-CAT/αTUB-Neo constructs with the chloramphenicol acetyltransferase (CAT) and Neo genes in opposite transcriptional orientations.
Plasmid DNAs were purified by column chromatography (Qiagen). Constructs and Inr mutations were verified by sequencing with the Sequenase kit (Amersham).
Transient and selectable transfections of T. vaginalis C1 (ATCC 30001) were performed as described previously (11), except that cells were electroporated in Cytomix (120 mM KCl, 0.15 mM CaCl2, 10 mM K2HPO4-KH2PO4 [pH 7.6], 25 mM HEPES [pH 7.6], 2 mM EGTA, 5 mM MgCl2, 0.375% glycerol) (44). DNA constructs are maintained as episomes in both transient and selectable transfectants (11).
Cell extracts were prepared from transfected cells as described previously (11) and assayed for CAT activity by the phase extraction method of Seed and Sheen (36). Each 100-μl CAT assay mixture contained 50 μl of cell extract, 250 mM Tris-HCl (pH 8.0), 0.2 μCi of [14C]chloramphenicol (Amersham; 50 to 60 mCi/mmol) and 300 μM butyryl-CoA (Sigma). Reaction mixtures were incubated at 37°C for 2 to 4 h and then extracted with 200 μl of a 2:1 tetramethyl-p-phenylenediamine–xylene mixture. One hundred fifty microliters of the organic phase was removed and added to 5 ml of scintillation fluid, and the activity was determined in a liquid scintillation counter. Protein concentrations of the cell extracts were determined by the Bradford assay (Bio-Rad). CAT activities were normalized based on the protein concentration of the extracts and presented as the percentage of that of the wild-type control.
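The normalization described above (scintillation counts divided by extract protein, then expressed relative to the wild-type construct) can be sketched as follows. This is only an illustrative calculation: the function names and the example numbers are hypothetical and are not taken from the study, and any background subtraction that may have been applied is not modeled.

```python
def normalized_cat_activity(cpm: float, protein_ug: float) -> float:
    """Return CAT counts per microgram of extract protein.

    cpm        -- counts from the organic phase (hypothetical input)
    protein_ug -- protein in the assayed extract, e.g. from a Bradford assay
    """
    if protein_ug <= 0:
        raise ValueError("protein amount must be positive")
    return cpm / protein_ug


def percent_of_wild_type(mutant_activity: float, wild_type_activity: float) -> float:
    """Express a normalized mutant activity as a percentage of wild type."""
    if wild_type_activity <= 0:
        raise ValueError("wild-type activity must be positive")
    return 100.0 * mutant_activity / wild_type_activity


# Hypothetical example values, for illustration only.
wild_type = normalized_cat_activity(cpm=12000, protein_ug=40)
mutant = normalized_cat_activity(cpm=1800, protein_ug=50)
relative = percent_of_wild_type(mutant, wild_type)
```

Dividing by protein before taking the ratio means that differences in extract yield between transfections do not masquerade as differences in promoter strength.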
Total RNA from selectable transformants was obtained by a LiCl-urea method (32), and poly(A)+ RNA was isolated with an oligo(dT) column (Pharmacia). Each primer extension reaction mixture contained 2.5 μg of poly(A)+ RNA, except for Inr16, which had 15 μg of poly(A)+ RNA. Primer extension reactions using poly(A)+ RNA and the CAT-PE primer (Table 1) were performed as described before (32), except that hybridization was carried out at 42°C for 4 h and the reverse transcriptase reactions were performed at 42°C.
Preparation of T. vaginalis nuclear extracts was based on the method of Dignam et al. (12). T. vaginalis cells were harvested by centrifugation at 1,500 × g for 10 min at 4°C. Cell pellets were washed twice with phosphate-buffered saline and resuspended in 5 packed cell volumes of buffer A (20 mM HEPES-KOH [pH 7.6], 10 mM KCl, 1.5 mM MgCl2, 0.1 mM EDTA, 1 mM dithiothreitol (DTT), 10 μg of leupeptin per ml, 50 μg of TLCK (Nα-p-tosyl-l-lysine chloromethyl ketone) per ml, 1 mM phenylmethylsulfonyl fluoride, 0.25% Nonidet P-40). The cell suspension was transferred to a glass Dounce homogenizer (Wheaton), and the cells were lysed with 10 to 20 strokes of a type A pestle. The cell lysate was then spun at 3,000 × g for 10 min at 4°C. The nuclear pellet was resuspended in 4 volumes of buffer C (20 mM HEPES-KOH [pH 7.6], 420 mM KCl, 1.5 mM MgCl2, 0.1 mM EDTA, 1 mM DTT, 10 μg of leupeptin per ml, 50 μg of TLCK per ml, 1 mM phenylmethylsulfonyl fluoride, 20% glycerol) and incubated at 4°C while being stirred for 30 min. The extract was spun at 100,000 × g for 1 h at 4°C, and the supernatant was dialyzed against 1 liter of buffer D (20 mM HEPES-KOH [pH 7.6], 100 mM KCl, 1.5 mM MgCl2, 0.1 mM EDTA, 1 mM DTT, 20% glycerol) overnight at 4°C.
A probe corresponding to −15 to +15 of the wild-type αSCS Inr was prepared by annealing two 37-mer oligonucleotides (GS wt A and GS wt B) (Table 1) and end labeling with [γ-32P]ATP and T4 polynucleotide kinase. Cold, mutant primers (GS Inr3 A and B, GS Inr12 A and B, Inr13 A and B, and Inr20 A and B) (Table 1) also corresponding to nucleotides −15 to +15 were annealed and used in competition assays. The labeled probe was purified on an 8% polyacrylamide gel. Each binding reaction mixture contained 5,000 cpm of labeled probe, 20 μg of nuclear extract, 500 ng of poly(dI-dC), 10 mM HEPES-KOH (pH 7.6), 40 mM KCl, 1 mM MgCl2, 1 mM DTT, 0.015% Nonidet P-40, and 10% glycerol. Reaction mixtures were incubated for 30 min on ice and run on a prerun 8% 0.5× Tris-borate-EDTA polyacrylamide gel for 100 min at 15 mA.
Very little is known about transcription control in early-evolving eukaryotes. Analyses of eukaryotic promoters and their interacting proteins have been confined, in large part, to the crown group of eukaryotes composed of animals, plants, and fungi. As illustrated previously (16), these organisms are quite similar in evolutionary terms and, in fact, represent only a fraction of the diversity exhibited by eukaryotic cells. By comparison, single-celled eukaryotes, such as those represented by trichomonads, kinetoplastids, and entamoebids, comprise a larger, more diverse group, and yet little is known regarding the mechanisms used to regulate gene transcription in these organisms. To address this problem, we have analyzed the 5′ UTRs of protein-encoding genes from one of the earliest-evolving eukaryotes studied to date, T. vaginalis, to identify promoter elements and to gain insight into the evolution of regulated gene transcription in eukaryotes.
An alignment of the 5′ UTRs of all available T. vaginalis protein-encoding genes reveals the presence of a conserved motif positioned 6 to 20 nucleotides upstream from the translation initiation ATG codon (Fig. 1) (32). This motif, with the consensus sequence TCA+1T/CT/A, surrounds the start site of transcription of all genes for which the 5′ end of the mRNA has been determined (Fig. 1A). It also resembles the metazoan initiator (Inr) element that has been shown to be responsible for directing the start of transcription of metazoan genes that lack TATA boxes (39). The ubiquitous conservation of this element at the transcription start site and its similarity to metazoan Inr elements led us to question whether this apparent Inr element acts as a core promoter element. To address this, we have designed constructs that contain the α-succinyl CoA synthetase (αSCS) gene promoter with either wild-type or mutant versions of the element cloned directly upstream of the CAT gene. These constructs were then used to transiently transfect T. vaginalis cells, and the resulting CAT activities were measured. Using this approach, we show that deletion of the apparent Inr element completely abolishes detectable CAT activity (Fig. 2A, Inr5). Given the pyrimidine-rich nature of the sequences surrounding the start site of transcription of T. vaginalis genes (Fig. 1), we tested whether this element could be replaced with a random, pyrimidine-rich sequence and found that this severely reduces CAT activity, to only 6% ± 5% (mean ± standard deviation) (Fig. 2A, Inr13). These data demonstrate that this element is required for T. vaginalis promoter activity and that it cannot be replaced by random pyrimidine-rich sequences. Although we cannot rule out the presence of other, as-yet-unidentified T. vaginalis core promoter elements, these data also show that they cannot function without the conserved TCA+1CT/A element.
To determine whether this trichomonad Inr-like element is a core promoter element, we have tested whether these conserved motifs are interchangeable between T. vaginalis genes. The conserved sequences surrounding the start sites of the genes encoding cytosolic heat shock protein 70 (cHSP70; TCATTTTTTAATA) and ferredoxin (Fd; TCACTTCTCTTTA) (see Fig. 1) were substituted for the TCA+1CTTCA+1CATTA motif in the αSCS promoter. Swapping these sequences had virtually no effect on CAT activity, since essentially wild-type levels of CAT activity resulted (Fig. 2A). Moreover, replacement of the nucleotide sequence of the apparent trichomonad αSCS Inr element with that of a metazoan Inr, in particular the murine terminal deoxynucleotidyltransferase (TdT) gene Inr (CCCTCATTCTGGAG) (40), only slightly reduced activity, to 89% ± 17% of that of the wild type. These data show that the T. vaginalis element is not a gene-specific element and establish its identity as a core promoter element. Furthermore, they show that this element in T. vaginalis can be replaced in vivo by a metazoan Inr, strongly suggesting that the two elements perform homologous functions. Henceforth, we will refer to the trichomonad element as an Inr. Additional analyses, described below, confirm that this element is both structurally and functionally an Inr.
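As a small illustration of why these four sequences are interchangeable at the sequence level, the sketch below scans each one for the trichomonad consensus TCA+1YT/A. It assumes that the αSCS motif written above as TCA+1CTTCA+1CATTA corresponds to the plain string TCACTTCACATTA; the dictionary keys, the function name, and the 0-based indexing are illustrative choices, not part of the original analysis.

```python
import re

# Trichomonad Inr consensus TCA(+1)-Y-T/A, written as a lookahead so that
# overlapping occurrences would also be reported.
TRICHOMONAD_INR = re.compile(r"(?=(TCA[CT][AT]))")

# Sequences as given in the text (aSCS with the +1 annotations removed,
# which is an assumption about how the annotated motif reads).
SEQS = {
    "aSCS": "TCACTTCACATTA",
    "cHSP70": "TCATTTTTTAATA",
    "Fd": "TCACTTCTCTTTA",
    "TdT": "CCCTCATTCTGGAG",
}


def find_inr_motifs(seq):
    """Return (0-based index of the A at +1, matched 5-mer) for each hit."""
    return [(m.start() + 2, m.group(1))
            for m in TRICHOMONAD_INR.finditer(seq.upper())]


hits = {name: find_inr_motifs(seq) for name, seq in SEQS.items()}
```

Under these assumptions every sequence, including the murine TdT Inr, contains at least one match, and the αSCS sequence contains two, consistent with its two tandem start sites.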
With rare exception, the start site of transcription of T. vaginalis protein-encoding genes has been mapped to adenosine residues within the Inr element (Fig. 1). Given the importance of this residue in the selection of transcription start sites in metazoan genes that require an Inr for activity, we tested whether conserved adenosines in T. vaginalis Inrs play a critical role in transcription. Our previous analyses have shown that the αSCS Inr has two TCA+1CT/A motifs in tandem, resulting in strong transcription start sites at both adenosines (32). To investigate the requirement for an adenosine residue at the +1 position, we have mutated the A’s at +1 in this element to either a G, C, or T (Fig. 2B). When only one of the two start sites is mutated individually to a G, C, or T, there is a modest reduction in CAT activity, to 30 to 84% of that of the wild type (Fig. 2B, Inr1, Inr2, and Inr6 to Inr9). This is presumably due to the nonmutated start site being used (see below for further discussion). However, when both the 5′ and 3′ start sites are mutated to a G or C, CAT activity is greatly reduced, to 5% ± 2% and 7% ± 2%, respectively (Fig. 2B, Inr3 and Inr10). Unexpectedly, when both start sites are mutated to T, CAT activity is only reduced to 19% ± 9% (Fig. 2B, Inr11). This is due to the creation of a new TCA+1NT/A motif (discussed in detail below). The effect of mutating the +1 position was confirmed by using another Inr element, the Fd Inr, which has only one transcription start site (Fig. 1) (32). Mutation of the +1 A residue to either a G, C, or T in the Fd Inr nearly eliminates CAT activity, to 1 to 7% of that of the wild type (Fig. 2C, Inr25 to Inr27). These results demonstrate the requirement of an adenosine residue at the +1 position of the T. vaginalis Inr element, since replacement of this nucleotide with any other nucleotide in two different Inrs severely reduces transcription.
Functional analysis of the mammalian Inr has revealed a loose consensus of YYA+1NT/AYY for Inr activity, with the most critical residues for determining the strength of an Inr being the A at +1, the T or A at +3, and the pyrimidine at −1 (19, 22). The consensus sequence of the T. vaginalis Inr (TCA+1YT/A) looks very similar to the metazoan Inr consensus sequence, differing primarily in that the T. vaginalis Inr consensus sequence is more highly conserved (Fig. 1). To determine whether these conserved sequences play an essential functional role and whether the critical residues in the trichomonad and metazoan Inrs are similar, we carried out a detailed mutational analysis of the conserved residues of both the αSCS and Fd Inrs. First, we mutated the C at −1 to a G in the αSCS Inr and to a T in the Fd Inr. These mutations reduced CAT activity to 14% ± 3% and 38% ± 18%, respectively, of that of the wild type (Fig. 3A, Inr12; Fig. 3B, Inr24), indicating that a C is strongly preferred at this position while another pyrimidine is somewhat tolerated, similar to that observed for the mammalian Inr (19, 22). Next, we mutated the T at −2 to a G or A in the αSCS Inr construct and a C in the Fd Inr construct. These results show that a pyrimidine at this position allows for full CAT activity, that an A is tolerated (CAT activity = 46% ± 20%), but that a G is very detrimental, reducing CAT activity to 11% ± 7% (Fig. 3A, Inr20 and Inr21; Fig. 3B, Inr22). Mutation of the C at the +2 position in the Fd Inr to a G or an A had only a modest effect on CAT activity (activity reduced to 60% ± 19% or 73% ± 20%, respectively) (Fig. 3B, Inr28 and Inr29). Finally, when we mutated the +3 T to either an A or a C, the CAT activity was reduced to 75% ± 18% or 55% ± 10%, respectively, whereas a G mutation at this position severely reduced activity to only 15% ± 3% (Fig. 3A, Inr17; Fig. 3B, Inr30 and Inr31).
These results show a preference for either a T or A at the +3 position. Taken together, our mutational analyses indicate that the sequence requirements for Inr activity in T. vaginalis are an A at +1, a C at −1, a Y at −2, and a T or an A at +3. These requirements are qualitatively the same as those for mammalian Inr activity (19, 22) but differ quantitatively since mutations at these four nucleotides are more deleterious for trichomonad Inr activity.
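The four requirements summarized above can be written down as a simple position-by-position check on the 5-nucleotide core (−2 to +3, with no position 0). The sketch below is a qualitative simplification: it treats each requirement as all-or-nothing, whereas the measured activities are graded (for example, an A at −2 retained partial activity). The names `REQUIREMENTS`, `OFFSETS`, and `violated_positions` are hypothetical.

```python
# Allowed bases at each required position, following the summary above:
# Y at -2, C at -1, A at +1, and T or A at +3. Position +2 is unconstrained.
REQUIREMENTS = {-2: "CT", -1: "C", 1: "A", 3: "AT"}

# Map promoter coordinates onto string indices of a 5-mer spanning -2..+3.
OFFSETS = {-2: 0, -1: 1, 1: 2, 2: 3, 3: 4}


def violated_positions(core):
    """Return the promoter positions of a 5-mer that break a requirement."""
    if len(core) != 5:
        raise ValueError("core must span positions -2 to +3 (5 nucleotides)")
    core = core.upper()
    return [pos for pos, allowed in REQUIREMENTS.items()
            if core[OFFSETS[pos]] not in allowed]


wild_type_check = violated_positions("TCACT")   # a wild-type core: no violations
minus_one_check = violated_positions("TGACT")   # C to G at -1
```

A check of this kind flags which position a given mutation disturbs; it does not predict how much activity is lost, which the transfection data show to depend on the substituted base.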
Alignment of the sequences surrounding the mapped start sites of T. vaginalis genes (Fig. 1) suggests that the Inr may extend 3′ of the core Inr motif, TCA+1YT/A, whereas the metazoan Inr has been shown to be comprised of only a 5-nucleotide core sequence. To determine the core sequences required for trichomonad Inrs, we have tested whether the nucleotides immediately 3′ of the TCA+1YT/A consensus motif are important for T. vaginalis Inr activity. To do this, +6 to +11 of the Fd Inr was mutated by replacing this pyrimidine-rich sequence with a GC-rich sequence (Fig. 3C, Inr32). This mutation had no effect on CAT activity. This result and the fact that the TdT Inr, which has virtually no sequence similarity in this 3′ region to trichomonad Inrs, can replace the T. vaginalis Inr indicate that the core trichomonad Inr motif is composed of 5 nucleotides.
Examination of an alignment of the upstream regions of all available T. vaginalis genes fails to reveal a conserved TATA-like element at −30, suggesting that T. vaginalis promoters may be analogous to TATA-less metazoan promoters that rely entirely on an Inr element to select the start site of transcription. Although TATA-like elements are not typically found at −30 in T. vaginalis genes, the αSCS promoter does, in fact, have an AT-rich sequence, TAAAAT, at this position (Fig. 1). To examine the function of this sequence, we replaced it with a standard TATAAA sequence, to determine whether this change would augment transcription. To the contrary, a standard TATAAA box at this position results in a slight reduction in CAT activity, to 73% ± 23% (Fig. 3C, Inr33). To test whether the AT-rich motif at −30 can be replaced by a GC-rich element, we inserted a GC-rich sequence, TGCGCT, at this site. Although this reduces CAT activity to 43% ± 9% (Fig. 3C, Inr34), indicating that an AT-rich sequence at −30 is favored for full promoter activity, these data demonstrate that such an element is not required. These data indicate that a classic TATA box is not needed at −30 for efficient transcription of T. vaginalis genes.
Our structural analyses of the trichomonad Inr show that mutation of specific nucleotides within this motif may severely inhibit or abolish transcriptional activity. To determine whether these mutations have a direct effect on transcription start site selection, as would be expected if this element performs the role of a metazoan Inr, we have analyzed selectable transfectants expressing different Inr mutations. T. vaginalis cells were transfected with αSCS-CAT/αTUB-Neo constructs that contain various Inr mutations as well as the neomycin phosphotransferase selectable marker under the control of a promoter with a wild-type Inr. Transfectants were selected, poly(A)+ RNA was prepared, and transcription start sites were determined by primer extension analysis. The CAT construct containing the wild-type αSCS Inr had two strong primer extension products at each TCA+1YT/A motif (Fig. 4), initiating at the adenosine start sites previously mapped for the endogenous αSCS mRNA. When the 5′ A at +1 in this construct was mutated to a T (Inr8), an extension product was seen only at the 3′ start site. Conversely, when the 3′ start site was mutated to a T (Inr9), the principal start site used was the 5′ A; however, a minor extension product mapped to a T residue of a new TCATT sequence created by this mutation. Mutating both start site A’s in the αSCS Inr to T’s abolished the two wild-type TCATT/A motifs but created a new Inr with the sequence TCATT (discussed above) (Fig. 2B). Analysis of this mutant reveals an extension product only at the A of the newly created TCATT sequence (Fig. 4, Inr11). These results show that a TCA+1YT/A motif selects the transcription start site at A+1 and indicate that the position of the Inr relative to other sequence elements may be important in selecting the start site nucleotide.
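The way an A-to-T change at a start site can create a new TCATT motif is easy to verify on the sequence itself. The sketch below assumes the αSCS Inr region reads TCACTTCACATTA (the annotated motif with the +1 labels removed) and that the two start-site adenosines sit at 0-based indices 2 and 7 of that string; the helper names are hypothetical. It only locates TCA+1NT/A motifs and says nothing about which site is actually used or how strongly.

```python
import re

# Assumed plain-text form of the wild-type aSCS Inr region.
WILD_TYPE = "TCACTTCACATTA"

# TCA(+1)-N-T/A, as a lookahead so overlapping motifs are not missed.
MOTIF = re.compile(r"(?=(TCA[ACGT][AT]))")


def mutate(seq, changes):
    """Return seq with the bases at the given 0-based indices replaced."""
    bases = list(seq)
    for index, base in changes.items():
        bases[index] = base
    return "".join(bases)


def candidate_start_sites(seq):
    """Return the 0-based index of the A at +1 for every motif in seq."""
    return [m.start() + 2 for m in MOTIF.finditer(seq)]


five_prime_mutant = mutate(WILD_TYPE, {2: "T"})        # corresponds to Inr8
double_mutant = mutate(WILD_TYPE, {2: "T", 7: "T"})    # corresponds to Inr11

sites_wild_type = candidate_start_sites(WILD_TYPE)
sites_five_prime = candidate_start_sites(five_prime_mutant)
sites_double = candidate_start_sites(double_mutant)
```

Under these assumptions the double mutant loses both original motifs but gains a single TCATT whose adenosine lies two nucleotides downstream of the original 3′ start site, in line with the single extension product reported for Inr11.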
We also have examined the effect of mutating both T’s at −2 in the αSCS Inr (Fig. 4, Inr21). In this case, accurate transcription initiation is lost, since three extra start sites are found while only one of the wild-type sites is used. Next, we mutated the 5′ T at +3 to a G in the αSCS Inr (Fig. 4, Inr16). This results in an extension product only at the 3′ TCA+1YT/A motif, indicating that the T/A at +3 is required for start site selection. Mutation of the C at −1 to a T in the Fd Inr (Fig. 4, Inr24) did not change the transcription start site, confirming that a pyrimidine can function at this position to accurately select the start site. It is noteworthy that nearly all alternative start sites observed upon mutation of the Inr map to adenosine residues, demonstrating a strong preference to initiate transcription at an A.
Analyses of the metazoan Inr indicate the presence of nuclear factors which may be involved in recognition of the Inr (31, 39). As a step toward determining whether T. vaginalis nuclear proteins directly interact with the Inr, we have identified an Inr-specific binding activity in T. vaginalis nuclear extracts. By using an electrophoretic mobility shift assay with a 37-bp double-stranded DNA probe containing the αSCS Inr, a binding activity that is specific for the αSCS Inr is detected (Fig. 5). Probes containing either a pyrimidine-rich, non-Inr sequence (Inr13) or Inr mutations at positions +1, −1, or −2 (Inr3, Inr12, and Inr20) do not allow binding (data not shown). In addition, the binding activity obtained with the wild-type αSCS Inr can be competed away with the wild-type αSCS Inr but not with a non-Inr sequence (Inr13) or with Inrs that have mutations at +1 (Inr3), −1 (Inr12), or −2 (Inr20) (Fig. 5, lanes 3 to 9). These data illustrate the presence of a protein(s) that can specifically recognize Inr sequences and that is able to distinguish between wild-type and mutant Inrs. Since Inr mutations that affect activity disrupt this binding activity, the protein or proteins responsible are likely to be involved in recognition of the T. vaginalis Inr by the transcriptional machinery.
Examination of the 5′ UTRs of all available protein-encoding genes from one of the earliest-diverging eukaryotes studied to date, T. vaginalis (8, 14), has revealed a highly conserved TCA+1YT/A motif surrounding the start site of transcription (32). The functional analysis of this motif, presented here, shows that this element is essential for transcription, is interchangeable between trichomonad genes, and can be replaced by a mammalian initiator (Inr) element. We demonstrate that this motif is a promoter element which is both structurally and functionally similar to metazoan Inrs (3, 15, 27, 39, 40). Specific, conserved nucleotides comprising the core of this promoter element are shown to be necessary for accurate selection of the start of transcription of trichomonad genes, demonstrating its function as a bona fide Inr. These studies show that the Inr acts as a ubiquitous core promoter element in trichomonads.
It is remarkable that this early-diverging eukaryote appears not to use TATA elements to direct transcription initiation but instead invariably uses an Inr with strong similarity to metazoan Inrs. The structural and functional similarities between trichomonad and metazoan Inrs are particularly striking, since this is the first cognate promoter shown to be used by both protist and metazoan genes. There are, however, interesting differences between this protist Inr and its metazoan homologue. As revealed by our transcription analyses of mutant trichomonad Inrs, this Inr appears to have stricter sequence requirements than those observed for metazoan Inrs (19, 22). This is also reflected in a stronger consensus sequence for trichomonad Inrs. It is noteworthy that the Inr is found in all T. vaginalis genes, indicating that it is essential for transcription of all protein-encoding genes. This differs from the situation with metazoan Inrs, where only a small subset of genes have been shown to rely on the Inr for transcriptional activity (39). The trichomonad Inr also differs from its metazoan counterpart in its close proximity to the ATG translation initiation codon (see Fig. 1). The Inr is invariably located within 20 nucleotides of the ATG and may be as close as 6 nucleotides, resulting in unusually short 5′ UTRs for trichomonad mRNAs. The selection for a strict spatial conservation between the Inr and the ATG initiation codon is likely due to requirements for efficient translation of the mRNAs; however, since nothing is known about the translational machinery of trichomonads, this remains speculative.
Studies showing that transcription of protein-encoding genes in T. vaginalis is insensitive to the fungal toxin α-amanitin, an inhibitor of RNA polymerase II, have raised the question whether these genes are transcribed by this polymerase (33). The fact that the homologous Inr in metazoa is an RNA polymerase II promoter (39) indicates that RNA polymerase II transcribes these genes in T. vaginalis as well. RNA polymerase II promoters in other protists appear not to use either metazoan-like Inrs or TATA boxes, with the possible exception of the apicomplexan genus Toxoplasma (5, 9, 21, 30, 38, 41). The recent observation that a sequence similar to that of the Inr described here is required for the transcription of a Toxoplasma gene (30) raises the possibility that the Inr plays a more general role in the transcription of protist genes. However, it should be noted that this study did not test whether this sequence element actually functions as an Inr. Since so little is currently known about core promoter elements in protists, future studies on genes from this diverse group of eukaryotes will be necessary to determine whether the Inr, or elements which are functionally homologous to the Inr, is a common, essential feature of protist promoters.
The presence of a highly conserved ubiquitous Inr element that is indispensable for transcription initiation of T. vaginalis protein-encoding genes strongly indicates that the Inr evolved early during eukaryotic evolution. Although this promoter element is not generally conserved in all eukaryotes, little divergence has occurred between trichomonad and metazoan Inrs, supporting a common origin. Nevertheless, we cannot rule out the possibility that the metazoan and trichomonad Inrs evolved independently of one another. However, it seems more likely that the lack of conservation of the Inr in previously examined, early-evolving eukaryotes reflects either the divergence of these organisms or the limited number of sequence-specific protist promoters that have been identified. Indeed, conserved elements do surround and contain the transcription start site of genes in the entamoebid Entamoeba histolytica (38) and the diplomonad Giardia lamblia (13); however, these elements have no sequence similarity to each other or to trichomonad or metazoan Inrs. Finally, evidence of proteins that interact with metazoan Inrs exists (31, 39), but sequence-specific, Inr-binding proteins have been difficult to identify. Purification of the trichomonad nuclear protein that specifically recognizes a functional, but not a nonfunctional, Inr should advance our understanding of transcription in eukaryotes, since this protein will likely be a homologue of metazoan Inr-binding proteins.
We thank Arnie Berk and Steve Smale for helpful comments on the manuscript, Doris Quon for assistance with primer extension analyses, Maria Delgadillo for assistance with site-directed mutagenesis, Kayvan Niazi for preparing the αTUB-Neo construct, and members of our laboratory for helpful advice and discussion.
This work was supported by an NIH grant (AI30537) to P.J.J. and a USPHS predoctoral training award (GM07185) to D.R.L. P.J.J. is the recipient of a Burroughs-Wellcome Fund Scholar Award.