The identification of novel DNA sequence motifs potentially participating in the regulation of gene transcription is a difficult task due to the small size and relative simplicity of the sequences involved. One possible way of overcoming this difficulty is to examine the promoter region of genes with similar expression profiles. Parameters of interest include similar tissue and cell-type specificity and quantitatively similar levels of mRNA in wild-type backgrounds. Tcp10b and Tctex1 are genes exhibiting these properties in that both are expressed at similar levels in pachytene spermatocytes of male mouse germ cells with little to no expression elsewhere. An analysis of the promoter region of these genes has uncovered a novel 20-nucleotide motif perfectly conserved in both. We have characterized the binding properties of this motif and show that it is specifically recognized by a 43 kD nuclear protein. The complex is highly stable and exhibits strong specificity. Furthermore, results from analyzing the sequence of several vertebrate genomes for the presence of the motif are consistent with the existence of a novel motif in the vicinity of several hundred genes.
Tctex1 and Tcp10 are two gene complexes located within the mouse t complex of chromosome 17. Tctex1 consists of multiple (possibly four) contiguous genes in the inbred strain C3H that cluster into two types, A and B, based on differences in their promoter regions [1, 2]. This arrangement appears to be conserved in other strains, including CAST/Ei, 129/SvJ, Balb/cJ and C57BL/6J, but not in SPRET/Ei, where a single copy appears to be present (A. Planchart, unpublished). The Tcp10 complex consists of three contiguous genes referred to as Tcp10a, Tcp10b and Tcp10c [3, 4]. Tctex1 is expressed ubiquitously but most abundantly in the testis, whereas expression from the Tcp10 complex is testis-restricted. The expression of both gene complexes is first observed during the pachytene stage of spermatogenesis [1, 3].
Tctex1 encodes a dynein light chain found in flagellar axonemal inner and outer dynein arms and in cytoplasmic dynein. It is involved in the transport of rhodopsin in rod photoreceptors and interacts with the poliovirus receptor, CD155. In the t haplotype, a variant form of the t complex found in approximately 25% of feral mice, the Tctex1 gene family harbors multiple mutations, at least one of which eliminates the start codon in the B subset. Mutations in the A subset are thought to affect the protein’s function. Tctex1 maps to a region of the t complex known to be involved in transmission ratio distortion (TRD; reviewed in ) in t haplotype males; thus, Tctex1 is a candidate for one of the proximal distorters, the other being the recently cloned Tagap1, a GTPase-activating protein. The genes of the Tcp10 complex have no known function, although computationally derived annotations suggest that the protein encoded by Tcp10c has patterns found in proteins that function in G-protein coupled receptor pathways (MGI Accession ID 98543). Transcription from either complex is not under the control of a TATA-box promoter, a phenomenon frequently seen in testis-expressed genes [13–15].
Functional and sequence characterizations of the upstream controlling regions of the genes within the Tctex1 complex have been performed. A Germ-cell Inhibitory Motif (GIM) has been identified in the ‘A’ subset of the C3H Tctex1 complex. It consists of an octanucleotide, ACCCTGAG, a sequence that bears some similarity to the mammalian AP-2 binding site; in 129/SvJ, the last two nucleotides of the GIM are switched (ACCCTGGA, A. Planchart, unpublished). Interestingly, in the t haplotype alleles of the Tctex1 genes, the GIM is absent, having undergone a loss of nucleotides within the motif and surrounding sequence. Tctex1 expression in the testis of t haplotype males is highly upregulated compared to wild-type males, and this phenomenon was attributed to the loss of the GIM.
An extensive analysis of the promoter region of Tcp10bt, the t haplotype allele of Tcp10b, has been conducted [16–18]. Promoter “bashing” approaches revealed that the sequence from −973 to −1 (where +1 indicates the start site of transcription) is sufficient for the proper temporal and tissue-specific expression of a LacZ reporter gene in transgenic mice. Electrophoretic mobility shift assays (EMSA) uncovered three regions within the Tcp10bt promoter that are specifically bound by testis-derived nuclear proteins [16, 18]. One site in particular, the so-called TBP3 site, contains an AP-2 half-site, which the authors hypothesize is part of a complex transcription factor binding site in which the AP-2 transcription factor oligomerizes with a testis-specific factor, thus converting the ubiquitously recognized AP-2 site into a testis-specific transcription factor binding site. Two other sites (BP1 and BP2) are also bound specifically by a testis-only nuclear factor, yet these sites possess no recognizable transcription factor binding sites. Whereas BP2 is within the 973-nucleotide region that governs the proper expression of the reporter gene, BP1 lies outside of this region.
In this report, we describe a novel binding site that is found within the promoter regions of Tctex1, Tcp10b and Tcp10c, but not Tcp10a. This site, which we call motif A1, is a 20-mer with perfect identity in the two gene families. It is located in the interval between BP2 and BP3 of Tcp10b, a region that does not appear to have been characterized by Ewulonu et al. (1996). We provide evidence for specific binding by a nuclear factor and report the approximate half-life of the protein:DNA complex, an approximate binding constant and the relative molecular weight of the protein. In addition, we report on a genome-wide survey of the motif’s prevalence and its proximity to known or novel genes.
Brain, liver and testes from adult C3H/HeJ males and the NIH/3T3 cell line were used for isolating a crude nuclear extract using polyethylenimine. Crude nuclear protein pellets were resuspended in storage buffer (50 mM Tris pH 7.9, 12.5% glycerol, 1.85 mg mL⁻¹ KCl, 0.1 mM EDTA, 10 mM 2-mercaptoethanol and protease inhibitor cocktail), quantified by Bradford assay, adjusted to 2 μg μL⁻¹, aliquoted, flash-frozen in liquid N2 and stored at −80°C until ready to use.
Lyophilized, complementary oligonucleotides (IDT, Coralville, IA), corresponding to the 20-mer motif (motif A1) common to Tctex1 and Tcp10, or to mutated versions of the 20-mer motif (motifs A2 and A3), were resuspended to a final concentration of 100 pmol μL⁻¹ in water. Labeling reactions were performed as follows: 400 pmol of each oligonucleotide were mixed and heated to 95°C in an MJ Research PTC100 thermal cycler for 3 minutes, followed by slow cooling to room temperature and incubation on ice for 1 h to allow the oligonucleotides to anneal to each other. Afterwards, end-labeling of the double-stranded probe was performed with 10 U of T4 polynucleotide kinase (PNK; New England Biolabs) supplemented with PNK buffer and 10 μCi of γ-32P ATP in a final reaction volume of 20 μL at 37°C for 1 h. Unincorporated nucleotides were removed with the QIAquick nucleotide removal kit (Qiagen, Valencia, CA), and the probe was eluted from the column with 50 μL of water and stored at −20°C until needed.
Binding reactions (20 μL) consisting of 5 μL of protein extract (10 μg crude extract), 4 μL of 5X binding buffer (60 mM HEPES pH 7.9, 300 mM KCl, 5 mM DTT, 1.5 mM EDTA, 50% v/v glycerol), 100 μg BSA, 1 μL labeled probe (20 pmol μL⁻¹; 40,000 counts per minute, CPM) and 1 μg poly(dI:dC) non-competitive DNA were incubated with or without unlabeled competitor (motifs A1, A2 or A3; Table 1) at varying excess concentrations (0 to 50-fold) at 30°C for 30 min. Complexes were resolved on 4.75% native polyacrylamide gels (pre-run at 125 V for 30 minutes) for 2.5 h at 125 V. Gels were transferred onto filter paper, dried under vacuum and placed on X-ray film with an intensifying screen at −80°C.
The complex half-life was measured by a second EMSA in which a binding reaction was set up as described above with motif A1, including the addition of a 15-fold excess of unlabeled motif A1 as a competitor. Aliquots were removed at varying time points and resolved on a 4.75% native polyacrylamide gel as described above. After drying the gel and exposing it to X-ray film, the locations of the complexes were determined by superimposing the autoradiograph onto the dried gel. The corresponding regions were cut out, added to scintillation cocktail and counted in a liquid scintillation counter. The data were log-transformed, plotted and fitted to a straight line by least squares regression analysis using SigmaPlot 8. The complex half-life was determined from the graph.
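The log-linear fit used for the half-life can be sketched as follows. This is a minimal illustration in Python, not the SigmaPlot analysis itself; the function name, time points and counts are hypothetical, and the counts are generated from an ideal exponential decay rather than taken from the experiment.

```python
import math

def fit_half_life(times_min, counts):
    """Fit ln(counts) = intercept + slope * t by ordinary least squares.

    Returns (half_life_min, slope, intercept). For a first-order decay the
    slope is negative and the half-life is ln(2) / -slope.
    """
    logs = [math.log(c) for c in counts]
    n = len(times_min)
    mean_t = sum(times_min) / n
    mean_y = sum(logs) / n
    sxx = sum((t - mean_t) ** 2 for t in times_min)
    sxy = sum((t - mean_t) * (y - mean_y) for t, y in zip(times_min, logs))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_t
    half_life = -math.log(2) / slope
    return half_life, slope, intercept

# Hypothetical, noise-free placeholder data: an ideal decay constructed with a
# 42-minute half-life (the value reported in the text), NOT measured counts.
times = [0, 10, 20, 40, 60, 90]
counts = [40000 * 0.5 ** (t / 42.0) for t in times]

half_life, slope, intercept = fit_half_life(times, counts)
print(f"estimated half-life: {half_life:.1f} min")
```

With real scintillation counts the points scatter around the line, so the residuals are worth inspecting before reading a half-life off the fit.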
An approximate binding constant for the protein:DNA complex was determined by a third EMSA in which varying concentrations of cold competitor DNA were added. The complexes were resolved as described above and the resulting autoradiograph was subjected to scanning densitometry. FUJI’s MultiGauge software was used to determine the spot densities. Data were log-transformed and plotted as described above. The binding constant was extrapolated from the graph.
The sequence specificity of the binding site was determined by the use of double stranded oligomers that differed from motif A1 by the introduction of mutations. EMSA analysis with these mutant motifs was carried out as described above.
A 20 μL binding reaction was incubated for 30 minutes at 30°C. Afterwards, the droplet was transferred to Parafilm, placed on ice and crosslinked by irradiating at 254 nm for 10 minutes from an 18.4 W light source (corresponding to a total energy of 11 kJ). SDS-PAGE loading buffer with 2-mercaptoethanol was added, and the crosslinked complex was boiled for 5 minutes and loaded onto a 9% SDS-PAGE Laemmli gel. After electrophoresis, the gel was stained with Coomassie, dried and exposed to X-ray film.
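The quoted total energy follows directly from the lamp power and exposure time. The short check below assumes the full rated 18.4 W is emitted for the whole 10 minutes; it gives the energy emitted by the source, not the dose absorbed by the sample, and the variable names are ours.

```python
# Energy emitted by the UV source over the crosslinking exposure.
lamp_power_w = 18.4        # rated power of the light source, in watts (J/s)
exposure_s = 10 * 60       # 10 minutes, in seconds

energy_j = lamp_power_w * exposure_s
energy_kj = energy_j / 1000.0

print(f"{energy_kj:.2f} kJ")
```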
The occurrence of motif A1 in the mouse genome was determined by BLAST analysis of the mouse genome assembly, build 34 (parameter: e = 10). Motifs were subsequently aligned and a sequence logo was generated to illustrate the consensus sequence. The Tctex1 and Tcp10b promoter sequences are available from NCBI (Accession IDs AC092482 and M84175, respectively).
Genes with similar expression profiles lead naturally to the hypothesis that their transcription is regulated by common mechanisms. Although Tctex1 expression is ubiquitously detected at low levels by RT-PCR and Northern analysis, like Tcp10b, it is abundantly expressed in mouse pachytene spermatocytes. The promoters of both gene complexes have been extensively analyzed, yet to our knowledge no inter-promoter comparison for the purpose of uncovering common motifs has previously been performed. Searching the 5′ upstream regions of the Tctex1 and Tcp10b genes for common motifs revealed a conserved 20-nucleotide motif of sequence 5′-AAGAATGAGAAGCAATTCAA-3′ in Tcp10b but inverted in Tctex1. We call this sequence element motif A1. A specific sequence of this length is expected to occur by chance about once in 10¹² nucleotides (4²⁰ ≈ 1.1 × 10¹²), barring any sequence bias or extreme lack of complexity. This led us to hypothesize that it may be a binding site for a nuclear factor that is a component of a gene regulatory system common to both gene complexes, so we investigated its prevalence in the mouse genome by using the motif as a BLAST query against available genomic sequence at NCBI. The results are shown in Table 1. A total of 355 instances of the motif were found in the vicinity of known genes or hypothetical loci, spread across all autosomes and the X chromosome, but not the Y or the mitochondrial genome. The distance from putative transcription start sites is highly variable, ranging from 0.6 kbp (Tcp10b) to 1.7 Mbp (Cdh6). In many instances, more than one identical copy of the motif was found in the vicinity of a gene (Speer3, Vlrg6, Ephb1), whereas in others the flanking residues had diverged between duplications (1110001A23Rik, 4921506J03Rik, Lrfn5, Klhl1 and Cdh6 (3 occurrences)). The motif was found in either orientation in relation to a gene, something that is characteristic of enhancers and repressors [23–25].
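The chance expectation for a 20-mer and the idea of finding the motif in either orientation can be illustrated with a short Python sketch. It uses exact string matching on a made-up toy sequence, which is a simplification of the BLAST search actually used here; the function names and the toy sequence are hypothetical.

```python
MOTIF_A1 = "AAGAATGAGAAGCAATTCAA"

def reverse_complement(seq):
    """Return the reverse complement of an uppercase DNA sequence."""
    comp = {"A": "T", "C": "G", "G": "C", "T": "A"}
    return "".join(comp[base] for base in reversed(seq))

def find_motif(sequence, motif):
    """List (position, strand) for exact matches on either strand.

    Positions are 0-based on the forward strand; '-' means the window
    equals the reverse complement of the motif (inverted orientation).
    """
    rc = reverse_complement(motif)
    hits = []
    for i in range(len(sequence) - len(motif) + 1):
        window = sequence[i:i + len(motif)]
        if window == motif:
            hits.append((i, "+"))
        elif window == rc:
            hits.append((i, "-"))
    return hits

# With four equally likely bases, a given 20-mer is expected about once
# every 4**20 positions (roughly 1.1e12) on a single strand.
expected_spacing = 4 ** len(MOTIF_A1)

# Hypothetical toy sequence: one forward and one inverted copy of the motif.
toy = "GGC" + MOTIF_A1 + "TTACG" + reverse_complement(MOTIF_A1) + "CC"
hits = find_motif(toy, MOTIF_A1)
print(expected_spacing, hits)
```

Exact matching misses copies with diverged flanking residues, which is one reason a tolerant search such as BLAST with a permissive e-value recovers more instances.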
A smaller number of hits were found in regions of the genome that have not been fully characterized (data not shown).
The 355 motif sequences were aligned and the alignment was used to calculate the best motif pattern across all 20 nucleotide sites using WebLogo. As shown in Fig. 1, the greatest sequence conservation resides in nucleotides 4 to 14 of the motif (corresponding to AATGAGAAGCA), whereas the residues flanking this core are not as strongly conserved (sites 2–3 and 15–17) or not conserved at all (sites 1 and 18–20). The motif is found in other genomes, including human (627 instances), rat (267 instances), zebrafish (147 instances), Fugu (500 instances) and Drosophila (165 instances), although a gene-by-gene comparison with mouse was not performed. The additional sequences derived from these genomes indicate that the most important sites within the core are positions 7 to 14, GAGAAGCA. When TRANSFAC was searched using TESS (http://www.cbil.upenn.edu/tess/), no matches to the motif were found, nor was it recognized as a repetitive or simple sequence element by RepeatMasker (http://repeatmasker.org).
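The per-position conservation that a sequence logo displays can be approximated as information content in bits. The sketch below is a simplified Python illustration, not WebLogo itself: it omits the small-sample correction that logo tools may apply, and the four aligned sequences are hypothetical stand-ins for the real alignment.

```python
import math
from collections import Counter

def information_content(aligned):
    """Bits of information per alignment column for DNA.

    Computed as 2 - H, where H is the Shannon entropy (base 2) of the
    observed base frequencies in the column. No small-sample correction.
    """
    bits = []
    for col in range(len(aligned[0])):
        counts = Counter(seq[col] for seq in aligned)
        total = sum(counts.values())
        entropy = -sum((n / total) * math.log2(n / total)
                       for n in counts.values())
        bits.append(2.0 - entropy)
    return bits

# Hypothetical toy alignment of the 11-nucleotide core (positions 4 to 14),
# with variation only in the first and last columns.
toy_alignment = [
    "AATGAGAAGCA",
    "AATGAGAAGCA",
    "CATGAGAAGCA",
    "GATGAGAAGCT",
]
bits = information_content(toy_alignment)
print([round(b, 2) for b in bits])
```

Columns where every sequence agrees score the maximum of 2 bits, and columns with mixed bases score lower, which is how the strongly conserved core stands out from its flanks.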
To determine if the motif was specifically recognized by a nuclear factor, we performed an electrophoretic mobility shift assay (EMSA) using radiolabeled motif A1 and testis nuclear protein extract. The results, shown in Fig. 2, are consistent with a specific interaction between the motif and a nuclear factor: a single complex was observed and, more importantly, the protein:DNA complex disappeared after addition of a 50-fold excess cold motif A1, but an excess of cold non-competitive DNA had no effect. Similar results were obtained when liver and brain extracts were substituted for the testis extract; however, extracts derived from ovaries or NIH/3T3 cells failed to form a complex, suggesting that the nuclear factor is not expressed in these tissues (data not shown).
The half-life of the complex was determined in a second EMSA in which a constant amount of cold competitor (15-fold) was added to the binding reaction. The loss of the complex signal was monitored by measuring the complex intensity by scintillation counting at different time points and plotting this value as a function of time. A drop in complex intensity to half-maximum was observed after 42 minutes in the presence of 15-fold cold competitor, indicating that the interaction between the protein and the probe is stable. A third EMSA, in which different amounts of cold motif A1 (0, 1, 3, 5, 7, 10, 15, 20 and 50-fold) were added, was performed in order to determine how much of the motif was bound per μg of crude protein, which gives a rough indication of the strength and specificity of the protein:DNA complex. Again, the data were fitted to a straight line and an approximate binding constant was calculated as the concentration of probe per μg of crude protein at the point where the complex intensity was half-maximum. The gel and resulting graph are shown in Fig. 3. The binding constant was calculated to be 62 pmol of binding site bound per μg of crude protein (62 pmol μg⁻¹).
In order to test the computational results suggesting that the specificity of binding resides in residues 4–14 of the motif, we designed two mutant versions of it. Motif A2 had the flanking residues mutated (5′-AGATTTGAGAAGCAAATTAA-3′) whereas motif A3 had mutations in residues 6, 9 and 14 (5′-AAGAAGGAAAAGCGATTCAA-3′). When motif A2 was used to create the complex and subsequently analyzed by EMSA, the intensity of the complex was not significantly different from the complex formed with A1 (Fig. 4). This result is consistent with our earlier finding that these residues are not highly conserved. However, when motif A3 was used a significant drop in complex intensity to approximately one fourth of that observed with motif A1 was noted (Fig. 5), indicating that the specificity of binding resides within the core identified computationally.
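The positions altered in each mutant can be checked directly against motif A1. The short Python sketch below compares the three sequences given above using 1-based positions; the function name is ours.

```python
MOTIF_A1 = "AAGAATGAGAAGCAATTCAA"
MOTIF_A2 = "AGATTTGAGAAGCAAATTAA"
MOTIF_A3 = "AAGAAGGAAAAGCGATTCAA"

def mutated_positions(reference, variant):
    """Return the 1-based positions at which variant differs from reference."""
    if len(reference) != len(variant):
        raise ValueError("sequences must have the same length")
    return [i for i, (r, v) in enumerate(zip(reference, variant), start=1)
            if r != v]

a2_sites = mutated_positions(MOTIF_A1, MOTIF_A2)
a3_sites = mutated_positions(MOTIF_A1, MOTIF_A3)
print("A2 differs at:", a2_sites)
print("A3 differs at:", a3_sites)
```

The comparison confirms that A3 differs from A1 at residues 6, 9 and 14, and that every change in A2 lies outside positions 7 to 14, the GAGAAGCA stretch highlighted by the cross-genome comparison.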
Lastly, in order to determine the approximate molecular weight of the nuclear protein that binds to motif A1, we analyzed UV-crosslinked complexes by SDS-PAGE (Fig. 5). The results of this experiment show that a DNA:protein complex of approximately 55 kD is formed by UV-crosslinking. Subtracting the molecular weight of the motif (approximately 12 kD) yields an estimated molecular weight of 43 kD for the protein. Once again, the specificity of the interaction was underscored by the absence of a crosslinked complex when the binding reaction was performed in the presence of a 50-fold excess of cold motif A1 (Fig. 5). When an excess of bovine serum albumin was used in place of the nuclear extract, no complex was observed (data not shown).
The discovery of novel motifs involved in the regulation of gene transcription is critical to our complete understanding of the mechanisms that govern proper spatial and temporal gene expression. However, this task is made difficult by the size and relative simplicity of these motifs, since they are expected to occur frequently and in regions of the genome bereft of transcriptional activity. One strategy for overcoming this pitfall is to cluster orthologous genes from divergent taxa and search regions upstream of the transcription start site for conserved sequence blocks. Another strategy, employed here, is to examine genes with similar expression profiles and cell-type specificity for shared elements that may be involved in regulating their overlapping expression profiles. The promoter regions of Tctex1 and Tcp10b have been studied individually [2, 16]. Their high levels of expression in pachytene spermatocytes, as well as their low (Tctex1) or absent (Tcp10b) expression in other tissues, suggested to us that they may be regulated by the same mechanism, and the discovery of motif A1 bolstered this hypothesis.
However, our results show that the motif is specifically recognized by a nuclear factor present in several tissues, consistent with the observation that motif A1 is found in genes expressed in a variety of tissues and cell types. It is interesting that NIH/3T3 cells and ovary do not appear to express the protein. This suggests either that expression of the nuclear factor requires a level of tissue organization that cultured NIH/3T3 cells lack, or that a signal required for its expression is absent from NIH/3T3 cells and ovary. Although we have yet to uncover a link common to all the genes in Table 1, it remains a possibility that they act in concert in an uncharacterized gene network. We anticipate that the kinetics and affinity of the protein for motif A1 will support our findings that the complex is highly stable and probably has a low binding constant, but this awaits purification of the nuclear factor that binds motif A1.
The prevalence of motif A1 in the mouse genome and the variability in its position and orientation relative to the purported transcription start site of nearby genes are suggestive of a role in cis-acting gene regulation, possibly as an enhancer or repressor of expression of genes under the transcriptional control of RNA polymerase II. Its presence in other vertebrate organisms is suggestive of strong evolutionary conservation, particularly given the observation that the central region of motif A1, which we show is the core site of recognition (Fig. 4), is highly conserved across taxa (data not shown). The high occurrence of the motif in Fugu is interesting and seems to indicate that a larger number of genes in this organism are under the influence of the motif’s hypothesized effect on gene regulation than in mice, rats, zebrafish or Drosophila. If this is the case, it is consistent with the hypothesis that speciation and species differences are largely due to differences in gene expression and not to differences in the genes themselves [28–30].
Other questions remain unresolved, such as the identity of the nuclear factor that binds to motif A1, how it might interact with the transcription machinery, and the effect of motif A1 on the regulation of gene transcription. One possibility, given the proximity of the motif to a large cadre of genes with seemingly unrelated expression profiles and functions, is that the motif is part of a general mechanism used by the cell to either enhance or repress expression based on a number of different external cues; its role in regulation in such a situation could be due to tissue and/or cell-type specific expression of other factors that interact with the protein bound to motif A1. A second possibility, as stated previously, is that the genes where motif A1 is found interact in an uncharacterized network.
We thank David Barnes, Mary Ann Handel and Charles Wray for critical comments on the manuscript. A.P. thanks Peter Schlax and Paula Schlax for helpful suggestions on experimental protocols. This work was supported by NIH Grant P20 RR-016463 from the INBRE Program of the National Center for Research Resources to A.P.